CN110765759B - Intention recognition method and device - Google Patents


Info

Publication number
CN110765759B
Authority
CN
China
Prior art keywords
key element
intention
corpus
result
target sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911000072.5A
Other languages
Chinese (zh)
Other versions
CN110765759A (en)
Inventor
陈甜甜
井玉欣
周学阳
崔妲珅
宋忠森
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Puxin Hengye Technology Development Beijing Co ltd
Original Assignee
Puxin Hengye Technology Development Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Puxin Hengye Technology Development Beijing Co ltd
Priority to CN201911000072.5A
Publication of CN110765759A
Application granted
Publication of CN110765759B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis

Abstract

The invention discloses an intention recognition method and device. The method comprises the following steps: acquiring a target sentence to be subjected to intention recognition; processing the target sentence to obtain the key elements of the target sentence; and respectively matching each key element of the target sentence with a pre-constructed key element parameter set of each intention to obtain an intention recognition result. The method enables a chat robot to adapt to business requirements in a timely manner.

Description

Intention recognition method and device
Technical Field
The invention relates to the field of intelligent chat robots, in particular to an intention recognition method and device.
Background
In recent years, intelligent chat robots have developed rapidly and are widely used in various fields. During a man-machine conversation, the chat robot needs to accurately grasp the user's intention. That is, for a sentence input by the user, the chat robot should use intention recognition technology to determine which task the user expects to complete, then perform subsequent tasks such as task slot value extraction, and finally complete the task assigned by the user. It can be seen that intention recognition determines whether the chat robot can interact with the user intelligently and accurately, and is an important technology in the field of intelligent chat robots.
Because business is diverse and develops rapidly, the existing intention recognition methods need to be optimized to ensure that the chat robot adapts to business requirements in a timely manner.
Disclosure of Invention
In view of the above, the invention provides an intention recognition method and device to ensure that a chat robot adapts to business requirements in a timely manner.
In order to achieve the above purpose, the present invention provides the following technical solutions:
An intention recognition method, the method comprising:
acquiring a target sentence to be subjected to intention recognition;
processing the target sentence to obtain the key elements of the target sentence;
and respectively matching each key element of the target sentence with a pre-constructed key element parameter set of each intention to obtain an intention recognition result.
Optionally, the processing of the target sentence to obtain the key elements of the target sentence includes:
performing word segmentation and part-of-speech tagging on the target sentence to obtain a word segmentation result and a part-of-speech tagging result;
performing dependency syntax analysis on the target sentence to obtain a dependency syntax analysis result;
and extracting key elements from the target sentence according to the part-of-speech tagging result and the dependency syntax analysis result.
Optionally, the key element parameter set of each intention is constructed in the following manner:
acquiring the corpus corresponding to each intention;
performing word segmentation and part-of-speech tagging on each corpus to obtain a word segmentation result and a part-of-speech tagging result;
performing dependency syntax analysis on each corpus to obtain a dependency syntax analysis result;
extracting the key elements of each corpus from that corpus according to the part-of-speech tagging result and the dependency syntax analysis result;
obtaining an original key element parameter set of each intention according to the key elements of each corpus;
and performing similar word expansion on the original key element parameter set of each intention to obtain the key element parameter set of each intention.
Optionally, the extracting of the key elements of each corpus according to the part-of-speech tagging result and the dependency syntax analysis result includes:
extracting nine key elements, namely a core noun, a subject noun, a subject modifier, an object noun, an object modifier, a non-subject verb, a subject verb, a core entity and a modifier entity, from each corpus according to three parts of speech (noun, verb and entity word) and four dependency syntactic relations (core relation, subject-predicate relation, verb-object relation and attributive relation).
Optionally, the obtaining of the original key element parameter set of each intention according to the key elements of each corpus includes:
combining the key elements of the corpora corresponding to each intention, by type, into nine original key element subsets, namely a core noun subset, a subject noun subset, a subject modifier subset, an object noun subset, an object modifier subset, a non-subject verb subset, a subject verb subset, a core entity subset and a modifier entity subset, wherein the original key element parameter set of each intention comprises the nine original key element subsets.
Optionally, performing similar word expansion on the original key element parameter set of each intention, including:
acquiring a preset dictionary;
calculating the similarity between each original key element in a target original key element subset of the original key element parameter set of each intention and each word in the preset dictionary;
determining that words in the preset dictionary whose similarity is greater than a preset similarity threshold are synonyms of the original key element;
and adding the synonyms to the target original key element subset.
Optionally, the extracting of key elements from the target sentence according to the part-of-speech tagging result and the dependency syntax analysis result includes:
extracting nine key elements, namely a core noun, a subject noun, a subject modifier, an object noun, an object modifier, a non-subject verb, a subject verb, a core entity and a modifier entity, from the target sentence according to three parts of speech (noun, verb and entity word) and four dependency syntactic relations (core relation, subject-predicate relation, verb-object relation and attributive relation).
Optionally, the matching of each key element of the target sentence with the pre-constructed key element parameter set of each intention to obtain an intention recognition result includes:
matching each key element of the target sentence with the pre-constructed key element parameter set of each intention to obtain the intention set hit by each key element;
and determining the intersection of the intention sets hit by the key elements as the intention recognition result.
An intention recognition device, the device comprising:
a target sentence acquisition unit, used for acquiring a target sentence to be subjected to intention recognition;
a processing unit, used for processing the target sentence to obtain the key elements of the target sentence;
and a matching unit, used for respectively matching each key element of the target sentence with a pre-constructed key element parameter set of each intention to obtain an intention recognition result.
Optionally, the processing unit includes:
a word segmentation and part-of-speech tagging unit, used for performing word segmentation and part-of-speech tagging on the target sentence to obtain a word segmentation result and a part-of-speech tagging result;
a dependency syntax analysis unit, used for performing dependency syntax analysis on the target sentence to obtain a dependency syntax analysis result;
and a key element extraction unit, used for extracting key elements from the target sentence according to the part-of-speech tagging result and the dependency syntax analysis result.
As can be seen from the above technical solution, compared with the prior art, the present invention discloses an intention recognition method and device. The method includes: acquiring a target sentence to be subjected to intention recognition; processing the target sentence to obtain the key elements of the target sentence; and respectively matching each key element of the target sentence with a pre-constructed key element parameter set of each intention to obtain an intention recognition result. The method enables a chat robot to adapt to business requirements in a timely manner.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of an intent recognition method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method of constructing a set of key element parameters for each intent in accordance with the present disclosure;
FIG. 3 is a schematic flow chart of a method for processing the target sentence to obtain the key element of the target sentence according to the present invention;
FIG. 4 is a schematic structural diagram of an intention recognition device according to an embodiment of the present invention.
Detailed Description
For reference and clarity, the technical terms, shorthand and abbreviations used hereinafter are summarized as follows:
Intention recognition: identifying the intention of a sentence, such as "check weather" or "check postal code".
Dependency syntax analysis: analyzing the dependency relationships between the words of a sentence, that is, indicating the syntactic collocation relationships between the words.
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terms "first", "second" and the like in the description, in the claims and in the above-described figures are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances, and are merely a way of distinguishing objects having the same attributes when describing the embodiments of the invention. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The existing intention recognition methods mainly fall into two types:
One is an intention recognition method based on rule template construction.
In this method, sentence components are analyzed from intention corpora collected in advance, using related technologies such as word segmentation and dependency syntax analysis. A suitable rule template is then constructed manually for each intention based on the combination relations among the components. In the prediction stage, a question that successfully matches a rule template is assigned the intention of that template.
This method has the advantages of simple and easily understood rules and high accuracy. Its disadvantages are a low recall rate, poor generality, and complex rule writing that depends entirely on manual work and cannot be generated dynamically from the corpus.
The other is an intention recognition method based on machine learning.
In this method, a large number of intention corpora are manually labeled in advance, sentence features are extracted through natural language processing, and a statistical machine learning model or a neural network is constructed and trained to obtain an intention recognition model.
This method has the advantage that lengthy rules do not need to be constructed. Its disadvantages are that a large number of corpora are often required for training, that misclassified samples are hard to interpret, and that the model must be retrained for adjustment and correction and therefore cannot be updated in a timely manner.
It can be seen that both rule template construction-based and machine learning-based intent recognition methods suffer from certain drawbacks.
On the one hand, although the rule template-based intention recognition method is simple, the conventional method only uses keywords or dependency syntax analysis and, in the prediction stage, matches the analysis result against preset intention templates. Because the templates are limited and fixed, the recall rate is low and the generalization performance is poor.
On the other hand, the machine learning-based intention recognition method places requirements on the amount of corpus data. A large amount of intention corpus data needs to be manually labeled in advance, the recognition effect is hard to control because corpora of different scales and algorithms of different structures behave differently, and the recognition result for misclassified samples cannot be manually interpreted and adjusted in a timely manner.
Most importantly, neither method allows intention recognition tasks to be added dynamically. The types of intention recognition task that the system can support are limited and are designed for pre-specified tasks. If business personnel want to add a new intention recognition task, continuous support from algorithm personnel and updates to the system are required. It is not possible, after a single delivery of the model, for business personnel to add and configure new intention recognition tasks as needed and have the model adapt to and process them accurately and automatically.
The method based on rule template construction requires sentence features to be identified manually, and rule writing is technically complex: specialized developers must write a corresponding recognition rule template for each new intention recognition task, which is a cumbersome process. The machine learning-based method is a supervised learning method, so extending the intention recognition tasks requires an algorithm engineer to collect corpora for the extended task and retrain the original model. Although this process could in theory be realized by automatic training, so that the model trains itself, the training still requires a large amount of corpus data and time, automatic training is complex to implement, and only a small number of large enterprises currently adopt it.
Because business is diverse and develops rapidly, these problems affect the performance of the robot. At the same time, the delivered robot system is difficult to adapt to business requirements in a timely manner, and the continuous updates needed afterwards greatly increase operating costs.
In view of the above problems, the present invention aims to provide a lightweight and agile intention recognition method that can, for dynamically added intention recognition tasks, adaptively generate a recognition scheme corresponding to each intention.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
Referring to FIG. 1, FIG. 1 is a schematic flow chart of an intention recognition method according to an embodiment of the present invention. The method includes the following steps:
S101: Acquire a target sentence to be subjected to intention recognition.
In this embodiment, the target sentence to be subjected to intention recognition may specifically be a sentence input by the user.
S102: Process the target sentence to obtain the key elements of the target sentence.
In this embodiment, the target sentence may be processed to obtain its key elements. As one implementation, the key elements of the target sentence include nine key elements: a core noun, a subject noun, a subject modifier, an object noun, an object modifier, a non-subject verb, a subject verb, a core entity and a modifier entity.
S103: Match each key element of the target sentence with a pre-constructed key element parameter set of each intention to obtain an intention recognition result.
In this embodiment, each key element of the target sentence may be matched with the corresponding subset of the pre-constructed key element parameter set of each intention. If a subset is hit, the intention to which that subset belongs may be an intention recognition result.
In this embodiment, the key element parameter set of each intention may be constructed in advance by means of model training and then stored. Before intention recognition is performed on the target sentence, the stored key element parameter set of each intention is obtained. The key element parameter set of each intention may include a plurality of subsets. As one possible implementation, it may include nine subsets: a core noun subset, a subject noun subset, a subject modifier subset, an object noun subset, an object modifier subset, a non-subject verb subset, a subject verb subset, a core entity subset and a modifier entity subset.
This embodiment discloses an intention recognition method, which includes: acquiring a target sentence to be subjected to intention recognition; processing the target sentence to obtain the key elements of the target sentence; and respectively matching each key element of the target sentence with a pre-constructed key element parameter set of each intention to obtain an intention recognition result. In this method, the key element parameter set of each intention is constructed in advance. When intention recognition is performed on a target sentence, the key elements of the target sentence are extracted and matched with the key element parameter set of each intention to obtain the intention recognition result. The chat robot can therefore adapt to business requirements in a timely manner.
On the basis of the embodiments disclosed above, the invention discloses an implementation for constructing the key element parameter set of each intention, which is described in detail through the following embodiment.
Referring to FIG. 2, FIG. 2 is a flow chart of a method for constructing the key element parameter set of each intention. The method includes the following steps:
S201: Acquire the corpus corresponding to each intention.
In this embodiment, the corpus corresponding to each intention is smaller than the corpus required by the existing machine learning-based intention recognition methods; for example, the corpus may contain at least 5 sentences.
It should be noted that, after the corpus corresponding to each intention is obtained, a user-specified list of specialized terms and a user-specified set of entity names may be loaded, added to the word segmentation dictionary, and assigned corresponding parts of speech. In addition, the corpus corresponding to each intention may be preprocessed. Specifically, according to the entity name set, the entity words in the corpus may be replaced with a preset unified code (e.g. entity_xxx), and the word segmentation dictionary may be updated with both the entity words and the preset unified codes.
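The preprocessing described above can be illustrated with a minimal sketch. The function names, the three-digit numbering of the `entity_xxx` code, the `"entity"` part-of-speech tag and the use of a plain dict as the word segmentation dictionary are all assumptions made for illustration; the source only states that entity words are replaced with a preset code and that the dictionary is updated.

```python
import re


def build_entity_codes(entity_names, prefix="entity_"):
    """Assign each entity name a placeholder code (hypothetical numbering scheme)."""
    # Sorting makes the name -> code table deterministic across runs.
    return {name: f"{prefix}{i:03d}" for i, name in enumerate(sorted(entity_names))}


def replace_entities(sentence, codes):
    """Replace every entity word in `sentence` with its placeholder code."""
    if not codes:
        return sentence
    # Longest names first, so a short name never splits a longer one.
    names = sorted(codes, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))
    # A single regex pass avoids re-matching text that was already replaced.
    return pattern.sub(lambda match: codes[match.group(0)], sentence)


def update_segmentation_dictionary(seg_dict, codes, pos="entity"):
    """Register both the entity words and their codes (assumed dict-based dictionary)."""
    for name, code in codes.items():
        seg_dict[name] = pos
        seg_dict[code] = pos
    return seg_dict
```

For example, with the entity name set {"Beijing", "Shanghai"}, the sentence "check the weather in Beijing" would be rewritten with the code assigned to "Beijing", so that later steps treat every city name uniformly as an entity word.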
S202: Perform word segmentation and part-of-speech tagging on each corpus to obtain a word segmentation result and a part-of-speech tagging result.
In this embodiment, a word segmentation tool may be used to segment each corpus and tag its parts of speech, so as to obtain a word segmentation result and a part-of-speech tagging result.
S203: Perform dependency syntax analysis on each corpus to obtain a dependency syntax analysis result.
In this embodiment, a dependency syntax analysis tool may be used to perform dependency syntax analysis on each corpus to obtain a dependency syntax analysis result.
S204: Extract the key elements of each corpus from that corpus according to the part-of-speech tagging result and the dependency syntax analysis result.
As one implementation, nine key elements, namely a core noun, a subject noun, a subject modifier, an object noun, an object modifier, a non-subject verb, a subject verb, a core entity and a modifier entity, can be extracted from each corpus according to three parts of speech (noun, verb and entity word) and four dependency syntactic relations (core relation, subject-predicate relation, verb-object relation and attributive relation).
Assume that the four main dependency syntactic relations are the core relation HED, the subject-predicate relation SBV, the verb-object relation VOB and the attributive relation ATT, and that the nine key elements of each corpus are the core noun wn1, the subject noun wn2, the subject modifier wn3, the object noun wn4, the object modifier wn5, the non-subject verb wv1, the subject verb wv2, the core entity we1 and the modifier entity we2. Then:
1. When the HED of the sentence is a noun, the following operations are performed:
a) If the HED is a noun and is the only relation component in the sentence, the HED is marked as the core noun wn1 when it is a non-entity word, and as the core entity we1 otherwise;
b) If the HED is a noun and a noun ATT component exists: when the ATT is an entity word and the HED is a non-entity word, they are marked as we2 and we1 respectively; when the ATT is a non-entity word and the HED is an entity word, they are marked as we1 and we2 respectively; when both are non-entity words, they are marked as we2 and wn3 respectively; otherwise, they are marked as we1.
2. When the HED of the sentence is a verb, the following operations are performed:
a) If the sentence contains a single verb and an SBV exists, the noun SBV and the noun VOB of the HED, together with the noun ATT of the SBV and the noun ATT of the VOB, are searched for in turn. When the SBV has no noun ATT, the SBV is marked as wn1 if it is a non-entity word, and as we2 otherwise. When the SBV has a noun ATT: if the SBV is a non-entity word and the ATT is an entity word, they are marked as wn1 and we2 respectively; if the SBV is an entity word and the ATT is a non-entity word, they are marked as we2 and wn1 respectively; if neither the SBV nor the ATT is an entity word, they are marked as wn2 and wn3 respectively; if both the SBV and the ATT are entity words, they are marked as we2.
When the VOB has no noun ATT, the VOB is marked as wn4 if it is a non-entity word, and as we2 otherwise. When the VOB has a noun ATT: if the VOB is a non-entity word and the ATT is an entity word, they are marked as wn4 and we2 respectively; if the VOB is an entity word and the ATT is a non-entity word, they are marked as we2 and wn4 respectively; if both the VOB and the ATT are non-entity words, they are marked as wn4 and wn5 respectively; if both the VOB and the ATT are entity words, they are marked as we2. Finally, the HED is marked as wv2.
b) If the sentence contains a single verb and no SBV exists, the noun VOB of the HED and the noun ATT of the VOB are searched for. When the VOB has no noun ATT, the VOB is marked as wn1 if it is a non-entity word, and as we1 otherwise. When the VOB has a noun ATT: if the VOB is a non-entity word and the ATT is an entity word, they are marked as wn1 and we2 respectively; if the VOB is an entity word and the ATT is a non-entity word, they are marked as we2 and wn1 respectively; if both the VOB and the ATT are non-entity words, they are marked as wn2 and wn3 respectively; if both the VOB and the ATT are entity words, they are marked as we1. Finally, the HED is marked as wv1.
c) If the sentence contains multiple verbs, the verb VOB of the HED is searched for recursively until the last verb VOB is found, and the noun VOB of that last verb and the noun ATT of that noun VOB are then searched for. When no noun VOB exists, the last verb VOB is marked as wv1. When a noun VOB exists without a noun ATT, the VOB is marked as we1 if it is an entity word, and as wn1 otherwise. When a noun VOB exists with a noun ATT: if the VOB is a non-entity word and the ATT is an entity word, they are marked as wn1 and we2 respectively; if the VOB is an entity word and the ATT is a non-entity word, they are marked as we2 and wn1 respectively; if both the VOB and the ATT are non-entity words, they are marked as wn2 and wn3 respectively; if both the VOB and the ATT are entity words, they are marked as we1. Finally, the last verb VOB is marked as wv1.
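All of the rules above start by locating the HED, its noun SBV and noun VOB, and their noun ATT components in the dependency parse. The following sketch covers only that lookup step, including the chain of verb VOBs from rule 2 c); it does not assign the wn/wv/we labels. The `Token` structure, the tag names "n" and "v", and the choice to represent an entity word as a noun carrying an `is_entity` flag are assumptions for illustration, not the output format of any particular parser.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Token:
    word: str
    pos: str                 # assumed tags: "n" (noun), "v" (verb)
    head: int                # index of the governing token, -1 for the root
    rel: str                 # dependency label: "HED", "SBV", "VOB", "ATT", ...
    is_entity: bool = False  # True if the word comes from the entity name set


def child(tokens: List[Token], parent: Optional[int], rel: str, pos: str) -> Optional[int]:
    """Index of the first token depending on `parent` with this relation and
    part of speech, or None if there is no such token."""
    if parent is None:
        return None
    for index, token in enumerate(tokens):
        if token.head == parent and token.rel == rel and token.pos == pos:
            return index
    return None


def last_verb(tokens: List[Token], hed: int) -> int:
    """Follow verb VOB dependents from the HED until no further verb VOB exists."""
    current, seen = hed, {hed}
    while True:
        nxt = child(tokens, current, "VOB", "v")
        if nxt is None or nxt in seen:  # `seen` guards against malformed parses
            return current
        seen.add(nxt)
        current = nxt


def locate_components(tokens: List[Token]) -> Dict[str, Optional[int]]:
    """Find the token indices that the labeling rules operate on."""
    hed = next((i for i, t in enumerate(tokens) if t.rel == "HED"), None)
    if hed is None:
        return {}
    # With a single verb, the last verb in the chain is the HED itself.
    verb = last_verb(tokens, hed) if tokens[hed].pos == "v" else None
    sbv = child(tokens, hed, "SBV", "n")
    vob = child(tokens, verb, "VOB", "n")
    return {
        "HED": hed,
        "HED_ATT": child(tokens, hed, "ATT", "n"),
        "SBV": sbv,
        "SBV_ATT": child(tokens, sbv, "ATT", "n"),
        "LAST_VERB": verb,
        "VOB": vob,
        "VOB_ATT": child(tokens, vob, "ATT", "n"),
    }
```

Once these indices are known, each rule reduces to checking which components exist and whether each one is an entity word.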
S205: Obtain the original key element parameter set of each intention according to the key elements of each corpus.
As one implementation, the key elements of the corpora corresponding to each intention can be combined, by type, into nine original key element subsets: a core noun subset, a subject noun subset, a subject modifier subset, an object noun subset, an object modifier subset, a non-subject verb subset, a subject verb subset, a core entity subset and a modifier entity subset. The original key element parameter set of each intention comprises these nine original key element subsets.
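As a minimal sketch of this merging step, the key elements extracted from each corpus sentence can be grouped by type into per-intention sets. The input layout (one dict of element type to word per sentence) and the function name are assumptions chosen for illustration; the element type names follow the wn/wv/we notation used above.

```python
ELEMENT_TYPES = ("wn1", "wn2", "wn3", "wn4", "wn5", "wv1", "wv2", "we1", "we2")


def build_original_parameter_sets(extracted):
    """Merge per-sentence key elements into nine subsets per intention.

    `extracted` maps each intention to a list with one dict per corpus
    sentence, e.g. {"wn1": "weather", "wv1": "check"} (assumed layout).
    """
    parameter_sets = {}
    for intent, sentences in extracted.items():
        # Every intention gets all nine subsets, even if some stay empty.
        subsets = {element_type: set() for element_type in ELEMENT_TYPES}
        for elements in sentences:
            for element_type, word in elements.items():
                if word:  # skip key elements that were not found
                    subsets[element_type].add(word)
        parameter_sets[intent] = subsets
    return parameter_sets
```

Using sets means that a key element appearing in several corpus sentences of the same intention is stored only once.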
S206: Perform similar word expansion on the original key element parameter set of each intention to obtain the key element parameter set of each intention.
As one implementation, a preset dictionary may be acquired; the similarity between each original key element in a target original key element subset of the original key element parameter set of each intention and each word in the preset dictionary is calculated; words in the preset dictionary whose similarity is greater than a preset similarity threshold are determined to be synonyms of the original key element; and the synonyms are added to the target original key element subset. It should be noted that the preset dictionary may be a predefined synonym table or a set of word vectors trained on an external corpus.
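The expansion step can be sketched for the word-vector variant of the preset dictionary. Cosine similarity and the default threshold of 0.8 are assumptions; the source does not specify the similarity measure or the threshold value, and the function names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_subset(subset, vectors, threshold=0.8):
    """Add every dictionary word whose similarity to an original key element
    exceeds `threshold`. `vectors` maps word -> vector (assumed dictionary form)."""
    expanded = set(subset)
    for element in subset:
        if element not in vectors:
            continue  # elements without a vector cannot be expanded
        for word, vector in vectors.items():
            if word != element and cosine(vectors[element], vector) > threshold:
                expanded.add(word)
    return expanded
```

Only the original key elements are compared against the dictionary, so a synonym added during expansion does not trigger further expansion of its own.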
On the basis of the embodiments disclosed above, the invention discloses an implementation for processing the target sentence to obtain the key elements of the target sentence, which is described in detail through the following embodiment.
Referring to FIG. 3, FIG. 3 is a flow chart of a method for processing the target sentence to obtain the key elements of the target sentence. The method includes the following steps:
S301: Perform word segmentation and part-of-speech tagging on the target sentence to obtain a word segmentation result and a part-of-speech tagging result.
In this embodiment, a word segmentation tool may be used to segment the target sentence and tag its parts of speech, so as to obtain a word segmentation result and a part-of-speech tagging result.
S302: Perform dependency syntax analysis on the target sentence to obtain a dependency syntax analysis result.
In this embodiment, a dependency syntax analysis tool may be used to perform dependency syntax analysis on the target sentence to obtain a dependency syntax analysis result.
S303: Extract key elements from the target sentence according to the part-of-speech tagging result and the dependency syntax analysis result.
As one implementation, nine key elements, namely a core noun, a subject noun, a subject modifier, an object noun, an object modifier, a non-subject verb, a subject verb, a core entity and a modifier entity, can be extracted from the target sentence according to three parts of speech (noun, verb and entity word) and four dependency syntactic relations (core relation, subject-predicate relation, verb-object relation and attributive relation).
In the invention, matching each key element of the target sentence with the pre-constructed key element parameter set of each intention to obtain the intention recognition result specifically includes: matching each key element of the target sentence with the pre-constructed key element parameter set of each intention to obtain the intention set hit by each key element; and determining the intersection of the intention sets hit by the key elements as the intention recognition result.
Assume that the key elements of the target sentence comprise a core noun tn1, a subject noun tn2, a subject modifier tn3, an object noun tn4, an object modifier tn5, a non-subject verb tv1, a subject verb tv2, a core entity te1 and a modifier entity te2.
Each key element of the target sentence is matched with the key element parameter set of each intention to obtain the intention set hit by each key element. These intention sets comprise a core noun result set RN1, a subject result set RN2, an object result set RN3, a non-subject verb result set RV1, a subject verb result set RV2, a core entity result set RE1 and a modifier entity result set RE2.
The specific operation is as follows:
1. Traversing each intention; for intention i:
a) If tn1 is not null and belongs to SN1 of intent i, add i to RN1.
b) If tn2 belongs to SN2 of intent i and tn3 belongs to SN3 of intent i, add i to RN2; if tn2 belongs to SN2 of intent i and tn3 is null, add i to RN2; if the concatenation of the character strings of tn2 and tn3 belongs to SN1 of intent i, add i to RN2; if tn2 belongs to SN1 of intent i and tn3 is null, add i to RN2.
c) If tn4 belongs to SN4 of intent i and tn5 belongs to SN5 of intent i, add i to RN3; if tn4 belongs to SN4 of intent i and tn5 is null, add i to RN3; if the concatenation of the character strings of tn4 and tn5 belongs to SN1 of intent i, add i to RN3; if tn4 belongs to SN1 of intent i and tn5 is null, add i to RN3.
d) If tv1 belongs to the SV1 set of intent i, intent i is added to RV1.
e) If tv2 belongs to the SV2 set of intent i, intent i is added to RV2.
f) If te1 belongs to the SE1 set of intent i, intent i is added to RE1.
g) If te2 belongs to the SE2 set of intent i, intent i is added to RE2.
2. Summarizing the hit results of all the key elements to obtain the final prediction result:
a) If none of the non-empty target key elements hits any intention, that is, RN1, RN2, RN3, RV1, RV2, RE1 and RE2 are all empty sets, the final intention recognition result is that no intention is recognized;
b) If RV1 or RV2 is non-empty while RN1, RN2 and RN3 are all empty, the final intention recognition result is that no intention is recognized;
c) Otherwise, the intersection of the intention matching result sets of the non-empty target key elements is returned as the intention recognition result.
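The two steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names (`make_params`, `make_target`, `noun_hit`, `match`, `aggregate`), the toy intentions and the choice to represent null elements as `None` are hypothetical. Pairing each result set with one target element when taking the final intersection (RN2 with tn2, RN3 with tn4, and so on) is one reading of the text, not something it states explicitly.

```python
from typing import Dict, Optional, Set

Elements = Dict[str, Optional[str]]   # tn1..tn5, tv1, tv2, te1, te2 (None = null)
ParamSet = Dict[str, Set[str]]        # SN1..SN5, SV1, SV2, SE1, SE2

ELEMENTS = ("tn1", "tn2", "tn3", "tn4", "tn5", "tv1", "tv2", "te1", "te2")
SUBSETS = ("SN1", "SN2", "SN3", "SN4", "SN5", "SV1", "SV2", "SE1", "SE2")
# Target element assumed to "own" each result set in the final intersection.
OWNER = {"RN1": "tn1", "RN2": "tn2", "RN3": "tn4", "RV1": "tv1",
         "RV2": "tv2", "RE1": "te1", "RE2": "te2"}


def make_params(**subsets: Set[str]) -> ParamSet:
    """Hypothetical helper: parameter set whose unspecified subsets are empty."""
    return {name: set(subsets.get(name, ())) for name in SUBSETS}


def make_target(**elements: str) -> Elements:
    """Hypothetical helper: key elements whose unspecified entries are None."""
    return {name: elements.get(name) for name in ELEMENTS}


def noun_hit(noun, mod, p: ParamSet, noun_key: str, mod_key: str) -> bool:
    """Rules 1b and 1c: a noun with an optional modifier against one intention."""
    if noun is None:
        return False
    if noun in p[noun_key] and (mod is None or mod in p[mod_key]):
        return True
    # Fall back to the core noun subset SN1; concatenation order follows the text.
    return (noun if mod is None else noun + mod) in p["SN1"]


def match(target: Elements, intents: Dict[str, ParamSet]) -> Dict[str, Set[str]]:
    """Step 1: collect, for each key element, the intentions it hits."""
    res: Dict[str, Set[str]] = {name: set() for name in OWNER}
    for intent, p in intents.items():
        if target["tn1"] is not None and target["tn1"] in p["SN1"]:
            res["RN1"].add(intent)
        if noun_hit(target["tn2"], target["tn3"], p, "SN2", "SN3"):
            res["RN2"].add(intent)
        if noun_hit(target["tn4"], target["tn5"], p, "SN4", "SN5"):
            res["RN3"].add(intent)
        for elem, subset, out in (("tv1", "SV1", "RV1"), ("tv2", "SV2", "RV2"),
                                  ("te1", "SE1", "RE1"), ("te2", "SE2", "RE2")):
            if target[elem] is not None and target[elem] in p[subset]:
                res[out].add(intent)
    return res


def aggregate(target: Elements, res: Dict[str, Set[str]]) -> Set[str]:
    """Step 2: an empty return value means no intention is recognized."""
    if not any(res.values()):                                    # rule 2a
        return set()
    if (res["RV1"] or res["RV2"]) and not (res["RN1"] or res["RN2"] or res["RN3"]):
        return set()                                             # rule 2b
    active = [res[r] for r, elem in OWNER.items() if target[elem] is not None]
    return set.intersection(*active) if active else set()       # rule 2c


# Toy parameter sets for two made-up intentions.
intents = {
    "raise_limit": make_params(SN2={"limit"}, SN3={"loan"}, SV2={"increase"}),
    "repay_loan": make_params(SN2={"loan"}, SV2={"repay", "increase"}),
}
target = make_target(tn2="limit", tn3="loan", tv2="increase")
recognized = aggregate(target, match(target, intents))

verb_only = make_target(tv2="increase")
unrecognized = aggregate(verb_only, match(verb_only, intents))
```

Here the subject hits only `raise_limit` while the verb hits both intentions, so the intersection keeps `raise_limit`. The verb-only target falls under rule 2b and yields no intention.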
The method has been described in detail in the embodiments disclosed above. Because the method can be implemented by devices of various forms, the invention also discloses a device, specific embodiments of which are described in detail below.
Referring to fig. 4, fig. 4 is a schematic structural diagram of an intent recognition device according to an embodiment of the present invention, where the device includes:
a target sentence acquisition unit 41 for acquiring a target sentence to be subjected to intention recognition;
a processing unit 42, configured to process the target sentence to obtain a key element of the target sentence;
and a matching unit 43, configured to match each key element of the target sentence with a pre-constructed key element parameter set of each intention, so as to obtain an intention recognition result.
As an embodiment, the processing unit includes:
a word segmentation and part-of-speech tagging unit, configured to perform word segmentation and part-of-speech tagging on the target sentence to obtain a word segmentation result and a part-of-speech tagging result;
a dependency syntax analysis unit, configured to perform dependency syntax analysis on the target sentence to obtain a dependency syntax analysis result;
and a key element extraction unit, configured to extract key elements from the target sentence according to the part-of-speech tagging result and the dependency syntax analysis result.
It should be noted that the specific implementation of each unit has been described in detail in the method embodiments and is not repeated here.
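As a rough illustration of how the three units of fig. 4 compose, the sketch below wires them together as plain callables. The class name `IntentionRecognitionDevice` and the stub units are hypothetical placeholders; real units would wrap the segmentation, parsing and matching logic of the method embodiments.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set

Elements = Dict[str, Optional[str]]


@dataclass
class IntentionRecognitionDevice:
    acquire: Callable[[], str]             # target sentence acquisition unit 41
    process: Callable[[str], Elements]     # processing unit 42
    match: Callable[[Elements], Set[str]]  # matching unit 43

    def recognize(self) -> Set[str]:
        """Run the units in order: acquire, extract key elements, match."""
        sentence = self.acquire()
        elements = self.process(sentence)
        return self.match(elements)


# Stub units, for illustration only.
device = IntentionRecognitionDevice(
    acquire=lambda: "loan limit increase",
    process=lambda s: {"tv2": s.split()[-1]},
    match=lambda e: {"raise_limit"} if e.get("tv2") == "increase" else set(),
)
result = device.recognize()
```

Keeping the units as separate callables mirrors the statement that the units may or may not be physically separate: each can be swapped without touching the others.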
To sum up:
The method requires no manual updating by algorithm engineers and places very low demands on corpus size: business personnel only need to provide a small number of example sentences for each intention.
The intention recognition method works from a small corpus, is simple to configure, has low computing power requirements and is highly automated. It does not require editing complex rule templates, training a separate machine learning model on a large corpus for each intention, or support from specialized technical personnel. Business personnel can use it directly with a low learning threshold, and it supports adaptive retraining when intention recognition tasks are dynamically added or deleted, which makes it highly agile. The algorithm is based on syntactic component analysis and is therefore highly interpretable. In short, the method greatly reduces the algorithm research and development steps required by the intention recognition flow, simplifies the implementation of the intention recognition model, and reduces the workload of training and subsequently maintaining the model.
The invention uses dependency syntax analysis to extract the key elements under each intention as the parameter set of the intention recognition model. This can be done even when the number of samples is small, which solves the problem that machine-learning-based intention recognition methods require a large amount of intention-labeled data, and the result is more interpretable than machine learning models based on complex algorithms.
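A minimal sketch of how the per-intention parameter set might be assembled from already extracted key elements is shown below. The function name `build_parameter_sets`, the mapping of element names to subset names (tn1 to SN1, and so on) and the toy corpus are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set

# Assumed mapping from key element name to parameter subset name.
ELEMENT_TO_SUBSET = {
    "tn1": "SN1", "tn2": "SN2", "tn3": "SN3", "tn4": "SN4", "tn5": "SN5",
    "tv1": "SV1", "tv2": "SV2", "te1": "SE1", "te2": "SE2",
}


def build_parameter_sets(
    extracted: Dict[str, List[Dict[str, Optional[str]]]],
) -> Dict[str, Dict[str, Set[str]]]:
    """Merge the key elements of every corpus sentence of an intention by type."""
    param_sets: Dict[str, Dict[str, Set[str]]] = {}
    for intent, sentences in extracted.items():
        subsets: Dict[str, Set[str]] = {s: set() for s in ELEMENT_TO_SUBSET.values()}
        for elements in sentences:
            for key, value in elements.items():
                if value is not None:          # null elements contribute nothing
                    subsets[ELEMENT_TO_SUBSET[key]].add(value)
        param_sets[intent] = subsets
    return param_sets


# Key elements already extracted from two example sentences of one intention.
extracted = {
    "raise_limit": [
        {"tn2": "limit", "tn3": "loan", "tv2": "increase"},
        {"tn2": "quota", "tv2": "increase", "te1": None},
    ],
}
param_sets = build_parameter_sets(extracted)
```

Because the parameter sets are plain sets built in one pass, rebuilding them after sentences are added to or removed from an intention's corpus only requires calling the function again.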
The invention uses word vector techniques to expand the key element sets with similar words, which improves the generalization performance of the model compared with rule matching methods.
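The similar-word expansion can be sketched as below. The text only requires a similarity measure and a preset threshold; cosine similarity, the threshold value 0.8, the function names and the two-dimensional toy vectors are all assumptions chosen for illustration.

```python
import math
from typing import Dict, List, Set


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_subset(subset: Set[str], vectors: Dict[str, List[float]],
                  threshold: float = 0.8) -> Set[str]:
    """Add every dictionary word whose similarity to an original element exceeds the threshold."""
    expanded = set(subset)
    for element in subset:
        if element not in vectors:   # no vector available, nothing to compare
            continue
        for word, vec in vectors.items():
            if word not in expanded and cosine(vectors[element], vec) > threshold:
                expanded.add(word)
    return expanded


# Toy word vectors standing in for the preset dictionary.
vectors = {
    "limit": [1.0, 0.0],
    "quota": [0.9, 0.1],
    "repay": [0.0, 1.0],
}
expanded = expand_subset({"limit"}, vectors)
```

Only the original elements are used as seeds, so a word added by expansion does not pull in further words of its own.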
The method supports dynamic addition or modification of the intention corpus and can automatically retrain on the new corpus to obtain a new intention recognition model without manual modification of the model, which solves the problem that existing intention recognition models cannot be generated automatically from a dynamically changing intention corpus. Compared with automatic training based on complex machine learning models, the method is simpler and faster and does not need a larger corpus or more computing power.
The method provided by the invention is simple to maintain after delivery and does not require professional technicians to update the model or expand rules.
In this specification, the embodiments are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for relevant details, refer to the description of the method.
It should be further noted that the device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the device embodiments provided by the invention, a connection between modules indicates that the modules have a communication connection, which may be implemented as one or more communication buses or signal lines. Those of ordinary skill in the art can understand and implement the invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the invention may be implemented by software plus the necessary general-purpose hardware, or by special-purpose hardware including application-specific integrated circuits, special-purpose CPUs, special-purpose memories, special-purpose components and the like. Generally, any function performed by a computer program can readily be implemented by corresponding hardware, and the specific hardware structure used to implement a given function can vary, for example analog circuits, digital circuits or dedicated circuits. For the present invention, however, a software implementation is the preferred embodiment in most cases. Based on this understanding, the technical solution of the invention, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product stored in a readable storage medium, such as a floppy disk, a USB disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk of a computer, and including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods of the embodiments of the invention.
In summary, the above embodiments are only intended to illustrate the technical solution of the invention, not to limit it. Although the invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that the technical solutions described in the above embodiments can still be modified, or some of their technical features can be replaced by equivalents, and that such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the invention.

Claims (9)

1. A method of intent recognition, the method comprising:
acquiring a target sentence to be subjected to intention recognition;
processing the target sentence to obtain a key element of the target sentence;
matching each key element of the target sentence with a pre-constructed key element parameter set of each intention to obtain an intention recognition result;
the key element parameter set of each intention is constructed in the following way:
acquiring corpus corresponding to each intention;
loading a preset professional language vocabulary and entity name set, and adding the professional language vocabulary and the entity name set into a word segmentation dictionary;
performing data preprocessing on the corpus corresponding to each intention, wherein the data preprocessing comprises the following steps: replacing entity words in each corpus with preset unified characters;
performing word segmentation and part-of-speech tagging on each corpus to obtain a word segmentation result and a part-of-speech tagging result;
performing dependency syntax analysis on each corpus to obtain a dependency syntax analysis result;
extracting key elements of each corpus from each corpus according to the part-of-speech tagging result and the dependency syntax analysis result;
obtaining an original key element parameter set of each intention according to the original key elements of each corpus;
and performing similar word expansion on the original key element parameter set of each intention to obtain the key element parameter set of each intention.
2. The method of claim 1, wherein the processing the target sentence to obtain the key element of the target sentence comprises:
performing word segmentation and part-of-speech tagging on the target sentence to obtain a word segmentation result and a part-of-speech tagging result;
performing dependency syntax analysis on the target sentence to obtain a dependency syntax analysis result;
and extracting key elements from the target sentence according to the part-of-speech tagging result and the dependency syntax analysis result.
3. The method of claim 1, wherein the extracting key elements of each corpus from each corpus according to the part-of-speech tagging result and the dependency syntax analysis result comprises:
extracting, from each corpus, nine key elements, namely a core noun, a subject noun, a subject modifier noun, an object noun, an object modifier noun, a non-subject verb, a subject verb, a core entity and a modifier entity, according to three parts of speech, namely noun, verb and entity word, and four dependency syntactic relations, namely a core relation, a subject-predicate relation, a verb-object relation and an attributive relation.
4. The method according to claim 3, wherein said obtaining the original key element parameter set of each intention according to the key elements of each corpus comprises:
combining the key elements of each corpus corresponding to each intention into nine original key element subsets according to type, wherein the nine original key element subsets comprise a core noun subset, a subject noun subset, a subject modifier noun subset, an object noun subset, an object modifier noun subset, a non-subject verb subset, a subject verb subset, a core entity subset and a modifier entity subset, and the original key element parameter set of each intention comprises the nine original key element subsets.
5. The method of claim 1, wherein performing a similar word expansion on the original key element parameter set of each intention comprises:
acquiring a preset dictionary;
calculating the similarity between each original key element in a target original key element subset of the original key element parameter set of each intention and each word in the preset dictionary;
determining the words in the preset dictionary whose similarity is greater than a preset similarity threshold as synonyms of the original key element;
and adding the synonyms to the target original key element subset.
6. The method of claim 2, wherein the extracting key elements from the target sentence according to the part-of-speech tagging result and the dependency syntax analysis result comprises:
extracting, from the target sentence, nine key elements, namely a core noun, a subject noun, a subject modifier noun, an object noun, an object modifier noun, a non-subject verb, a subject verb, a core entity and a modifier entity, according to three parts of speech, namely noun, verb and entity word, and four dependency syntactic relations, namely a core relation, a subject-predicate relation, a verb-object relation and an attributive relation.
7. The method according to claim 1, wherein the matching each key element of the target sentence with a pre-constructed key element parameter set of each intention to obtain an intention recognition result includes:
matching each key element of the target sentence with each pre-constructed intention recognition key element parameter set to obtain an intention set hit by each key element;
determining the intersection of the intent sets hit by each key element as the intent recognition result.
8. An intent recognition device, the device comprising:
a target sentence acquisition unit for acquiring a target sentence to be subjected to intention recognition;
a processing unit for processing the target sentence to obtain a key element of the target sentence;
a matching unit for respectively matching each key element of the target sentence with a pre-constructed key element parameter set of each intention to obtain an intention recognition result;
the key element parameter set of each intention is constructed in the following way:
acquiring corpus corresponding to each intention;
loading a preset professional language vocabulary and entity name set, and adding the professional language vocabulary and the entity name set into a word segmentation dictionary;
performing data preprocessing on the corpus corresponding to each intention, wherein the data preprocessing comprises the following steps: replacing entity words in each corpus with preset unified characters;
performing word segmentation and part-of-speech tagging on each corpus to obtain a word segmentation result and a part-of-speech tagging result;
performing dependency syntax analysis on each corpus to obtain a dependency syntax analysis result;
extracting key elements of each corpus from each corpus according to the part-of-speech tagging result and the dependency syntax analysis result;
obtaining an original key element parameter set of each intention according to the original key elements of each corpus;
and performing similar word expansion on the original key element parameter set of each intention to obtain the key element parameter set of each intention.
9. The apparatus of claim 8, wherein the processing unit comprises:
a word segmentation and part-of-speech tagging unit for performing word segmentation and part-of-speech tagging on the target sentence to obtain a word segmentation result and a part-of-speech tagging result;
a dependency syntax analysis unit for performing dependency syntax analysis on the target sentence to obtain a dependency syntax analysis result;
and a key element extraction unit for extracting key elements from the target sentence according to the part-of-speech tagging result and the dependency syntax analysis result.
CN201911000072.5A 2019-10-21 2019-10-21 Intention recognition method and device Active CN110765759B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911000072.5A CN110765759B (en) 2019-10-21 2019-10-21 Intention recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911000072.5A CN110765759B (en) 2019-10-21 2019-10-21 Intention recognition method and device

Publications (2)

Publication Number Publication Date
CN110765759A CN110765759A (en) 2020-02-07
CN110765759B true CN110765759B (en) 2023-05-19

Family

ID=69332584

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911000072.5A Active CN110765759B (en) 2019-10-21 2019-10-21 Intention recognition method and device

Country Status (1)

Country Link
CN (1) CN110765759B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111460117B (en) * 2020-03-20 2024-03-08 平安科技(深圳)有限公司 Method and device for generating intent corpus of conversation robot, medium and electronic equipment
CN111783425B (en) * 2020-06-28 2023-04-18 中国平安人寿保险股份有限公司 Intention identification method based on syntactic analysis model and related device
CN111984789B (en) * 2020-08-26 2024-01-30 普信恒业科技发展(北京)有限公司 Corpus classification method, corpus classification device and server
CN112115705A (en) * 2020-09-23 2020-12-22 普信恒业科技发展(北京)有限公司 Method and device for screening electronic resume
CN112364139B (en) * 2020-11-02 2023-12-19 南京京恒信息技术有限公司 Medical dialogue system intention recognition and classification method based on deep learning
CN112328763A (en) * 2020-11-04 2021-02-05 北京京东尚科信息技术有限公司 Intention recognition method, device, dialogue method and system
CN112784574B (en) * 2021-02-02 2023-09-15 网易(杭州)网络有限公司 Text segmentation method and device, electronic equipment and medium
CN115270786B (en) * 2022-09-27 2022-12-27 炫我信息技术(北京)有限公司 Method, device and equipment for identifying question intention and readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104866511A (en) * 2014-02-26 2015-08-26 华为技术有限公司 Method and equipment for adding multi-media files
CN109657062A (en) * 2018-12-24 2019-04-19 万达信息股份有限公司 A kind of electronic health record text resolution closed-loop policy based on big data technology
CN110147445A (en) * 2019-04-09 2019-08-20 平安科技(深圳)有限公司 Intension recognizing method, device, equipment and storage medium based on text classification

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10387892B2 (en) * 2008-05-06 2019-08-20 Netseer, Inc. Discovering relevant concept and context for content node
CN101510221B (en) * 2009-02-17 2012-05-30 北京大学 Enquiry statement analytical method and system for information retrieval
CN107977387A (en) * 2016-10-25 2018-05-01 北京酷我科技有限公司 A kind of song recommendations method and system based on semantics recognition
US10594560B2 (en) * 2017-03-27 2020-03-17 Cisco Technology, Inc. Intent driven network policy platform
CN107862005A (en) * 2017-10-25 2018-03-30 阿里巴巴集团控股有限公司 User view recognition methods and device
CN108304466B (en) * 2017-12-27 2022-01-11 中国银联股份有限公司 User intention identification method and user intention identification system
CN110147544A (en) * 2018-05-24 2019-08-20 清华大学 A kind of instruction generation method, device and relevant device based on natural language
CN109241538B (en) * 2018-09-26 2022-12-20 上海德拓信息技术股份有限公司 Chinese entity relation extraction method based on dependency of keywords and verbs
CN109346078B (en) * 2018-11-09 2021-06-18 泰康保险集团股份有限公司 Voice interaction method and device, electronic equipment and computer readable medium
CN109582968A (en) * 2018-12-04 2019-04-05 北京容联易通信息技术有限公司 The extracting method and device of a kind of key message in corpus
CN109597994B (en) * 2018-12-04 2023-06-06 挖财网络技术有限公司 Short text problem semantic matching method and system
CN109992651B (en) * 2019-03-14 2024-01-02 广州智语信息科技有限公司 Automatic identification and extraction method for problem target features

Also Published As

Publication number Publication date
CN110765759A (en) 2020-02-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant