CN116127011A - Intention recognition method, device, electronic equipment and storage medium

Info

Publication number: CN116127011A
Application number: CN202211469093.3A
Authority: CN
Inventors: 范智超; 蒋宁; 夏粉; 吴海英
Original assignee: Mashang Consumer Finance Co Ltd
Current assignee: Mashang Consumer Finance Co Ltd
Priority date: 2022-11-22
Filing date: 2022-11-22
Publication date: 2023-05-16

Abstract

The application provides an intention recognition method, an intention recognition device, electronic equipment and a storage medium, wherein the intention recognition method comprises the following steps: determining whether the current sentence in the dialogue contains keywords in a keyword list; determining a target label corresponding to the keyword when the current sentence contains the keyword, and taking the target label as a first intention label when the intention label corresponding to the current sentence is the target label; sequentially positioning local text information in the dialogue according to a preset step length and a preset window size; under the condition that the first intention label comprises a second-level label, determining the second-level label as a reference intention label, inputting the first-level label and local text information to which the reference intention label belongs into a second classification model to obtain a second classification result, wherein the second classification result comprises the second intention label; the first intention label and the second intention label are used as intention labels of the dialogue. According to the technical scheme, the identified intention labels can be output in real time in the conversation process, and communication strategy support is provided for customer service.

Description

Intention recognition method, device, electronic equipment and storage medium

Technical Field

The present application relates to the field of natural language processing technologies, and in particular, to an intent recognition method, an intent recognition device, an electronic device, and a storage medium.

Background

While customer service converses with a user, the user may express multiple intentions, and in some cases these intentions must be accurately identified and recorded so that downstream staff can process them or so that subsequent work is facilitated. In general, the intentions expressed by a user during a conversation may be recorded from the customer service's memory, but this is prone to errors and omissions. Alternatively, a multi-intention classification model can be used to classify the dialogue content to obtain a plurality of intentions; however, this approach places high demands on the model and has a high training cost, and particularly when the dialogue is long, the classification performance of the model is poor and it is difficult to obtain multiple accurate intentions.

Disclosure of Invention

In view of this, the embodiments of the present application provide an intent recognition method, apparatus, electronic device, and storage medium, which can output, in real time, recognized intent labels during a session between customer service and a user through two methods of keyword matching and sliding window enumeration, so as to obtain a multi-label combination of the session, thereby being beneficial to providing communication policy support for the customer service during the session.

In a first aspect, embodiments of the present application provide an intent recognition method, including: determining whether a current sentence in a dialogue contains keywords in a keyword list according to the keyword list, wherein the keyword list comprises a plurality of labels and at least one keyword corresponding to each label, and the label recall rate of each keyword in the keyword list is greater than or equal to a first preset threshold; when the current sentence contains a keyword, determining a target label corresponding to the keyword, acquiring context information containing the current sentence, inputting the context information into a first classification model to obtain a first classification result, wherein the first classification result is used for determining whether an intention label corresponding to the current sentence is the target label, and taking the target label as the first intention label when the intention label corresponding to the current sentence is the target label; sequentially positioning local text information in the dialogue according to a preset step length and a preset window size; under the condition that the first intention label comprises a second-level label, determining the second-level label as a reference intention label, inputting a first-level label and local text information to which the reference intention label belongs into a second classification model to obtain a second classification result, wherein the second classification result comprises the second intention label, the scene types of the first-level label corresponding to the second intention label and the scene type of the first-level label corresponding to the first intention label are the same, and the types of the intention labels classified by the first classification model and the second classification model are different; the first intention label and the second intention label are used as intention labels of the dialogue.

In a second aspect, embodiments of the present application provide an intent recognition device, comprising: the determining module is used for determining whether the current sentence in the dialogue contains keywords in the keyword list according to the keyword list, wherein the keyword list comprises a plurality of labels and at least one keyword corresponding to each label, and the label recall rate of each keyword in the keyword list is larger than or equal to a first preset threshold value; the first classification module is used for determining a target label corresponding to the keyword when the current sentence contains the keyword, acquiring context information containing the current sentence, inputting the context information into the first classification model to obtain a first classification result, wherein the first classification result is used for determining whether an intention label corresponding to the current sentence is the target label, and taking the target label as the first intention label when the intention label corresponding to the current sentence is the target label; the second classification module is used for sequentially positioning local text information in the dialogue according to a preset step length and a preset window size, determining the second-level label as a reference intention label under the condition that the first intention label comprises the second-level label, inputting the first-level label and the local text information to which the reference intention label belongs into a second classification model to obtain a second classification result, wherein the second classification result comprises the second intention label, the scene types of the first-level label corresponding to the second intention label and the scene type of the first-level label corresponding to the first intention label are the same, and the types of the intention labels classified by the first classification model and the second classification model are different; the determining module is further used for taking the first intention label and the second intention label as intention labels of the dialog.

In a third aspect, embodiments of the present application provide an electronic device, including: a processor; a memory for storing processor-executable instructions, wherein the processor is configured to perform the intent recognition method as described in the first aspect above.

In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program for executing the intention recognition method described in the first aspect.

In a fifth aspect, embodiments of the present application provide a computer program product comprising instructions that, when executed by a processor of a computer device, enable the computer device to perform the method of intent recognition as described in the first aspect above.

The embodiment of the application provides an intention recognition method, an intention recognition device, electronic equipment and a storage medium, which are characterized in that whether keywords in a keyword list exist in a current sentence or not is determined based on the keyword list with high recall rate of the keywords, the context information of the current sentence is positioned when the keywords exist in the current sentence, the context information is classified by using a first classification model to determine whether an intention label corresponding to the context information is a target label corresponding to the keywords, and if so, the target label corresponding to the keywords is used as the first intention label of the current sentence, so that the accuracy rate of intention recognition can be further improved under the condition of ensuring high recall rate. Moreover, the intention labels are determined in a keyword matching mode, so that the intention recognition efficiency can be improved, and the recognized intention labels are classified through the first classification model, so that the accuracy of recognition results can be improved while the efficiency is ensured. Further, the local text information is positioned through a sliding window enumeration method, and under the condition that the first intention label comprises a second-level label, the first-level label corresponding to the first intention label and the local text information are input into a second classification model, so that a second intention label is obtained. 
Therefore, the shortcoming of performing intention recognition by keyword matching alone can be overcome: intentions that keyword matching cannot recognize can still be recognized, and missing intention labels can be avoided. In addition, a conflict between the scene type of the first-level label corresponding to the first intention label and that of the first-level label corresponding to the second intention label can be avoided, which reduces the possibility that the second classification model outputs an intention label inconsistent with the scene type of the conversation. Furthermore, through the two methods of keyword matching and sliding window enumeration, the identified intention labels can be output in real time during the conversation between customer service and the user, so that a multi-label combination of the conversation is obtained, which in turn provides communication strategy support for the customer service during the conversation.

Drawings

Fig. 1 is a schematic system architecture diagram of an intent recognition system according to an exemplary embodiment of the present application.

Fig. 2 is a flow chart of an intent recognition method according to an exemplary embodiment of the present application.

Fig. 3 is a flowchart illustrating an intention recognition method according to another exemplary embodiment of the present application.

Fig. 4 is a schematic structural view of an intention recognition device according to an exemplary embodiment of the present application.

Fig. 5 is a block diagram of an electronic device for performing an intention recognition method according to an exemplary embodiment of the present application.

Detailed Description

The following description of the technical solutions in the embodiments of the present application will be made clearly and completely with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.

Summary of the application

When a user communicates with customer service, the user may express multiple intentions in a single conversation. For example, in a telemarketing scenario, a user may express to customer service intentions such as "purchase", "sign up", "renew", and "consult purchase conditions"; in a resource acquisition scenario, a user may express to customer service intentions such as "repayment in advance", "repayment in delay", "account cancellation", and "interest deduction". Customer service needs to record intention labels on the current work order, according to the intentions expressed by the user during the conversation, for downstream staff to process. When the dialogue is long and the user expresses many intentions, recording intention labels on the current work order from memory is difficult for the customer service, since it places high demands on memory, on command of business knowledge, and so on. Moreover, recording intention labels in this way is error-prone and it is difficult to record all intentions; that is, the accuracy of the intention labels is hard to ensure, and intention labels are easily missed.

Besides recording intention labels from the customer service's memory, a multi-intention classification model can be used to classify the whole dialogue to obtain a plurality of intention labels, but this approach places high demands on the model. If a large deep learning model is used as the classifier, the training cost is high, the inference performance is poor, and work order submission is inefficient.

To address the above technical problems, the present application analyzes, in real time and in two mutually complementary ways, the intentions contained in local content during the user dialogue, so that as many of the intentions in the dialogue as possible can be identified while the accuracy of the recognition result is ensured, and missed intentions are avoided.

Exemplary System

Fig. 1 is a schematic system architecture diagram of an intent recognition system 100 according to an exemplary embodiment of the present application, where, as shown in fig. 1, the system 100 includes: user terminal device 110, outbound device 120, and intent recognition device 130.

Taking the telemarketing scenario as an example, the user terminal device 110 may be a mobile phone, a tablet, a personal computer, a personal digital assistant, etc., and enables communication between a user and customer service. The customer service may be a manual customer service or an intelligent customer service, and the outbound device 120 may include a server, a landline telephone, or the like. For example, the customer service may place a call to the user via the outbound device 120 and establish a communication connection with the user terminal device 110 via a network so that the user and the customer service can conduct a conversation. The intention recognition device 130 can establish a communication connection with the outbound device 120 through a network and is used for monitoring the dialogue content between the user and customer service in real time, retrieving keywords in the current sentence of the dialogue, locating context information containing the current sentence according to the keywords, and then determining an intention label of the current sentence according to the context information. In addition, the intention recognition device 130 may sequentially locate local text information in the monitored dialogue content according to the preset step size and the preset window size, and perform intention recognition on the local text information. The intention recognition device 130 may feed back the intention labels recognized in these two ways to the customer service in real time, so that the customer service can adjust the communication strategy according to the current intention label or, when the reliability of the current intention label is low, confirm the current intention with the user again, for example by asking a question. At the end of the conversation, the intention recognition device 130 may obtain a plurality of intention labels corresponding to the full conversation. 
Here, the intent recognition device 130 may be a computing device separate from the outbound device 120 or a computing device integrated on the outbound device 120.

In other scenarios, the outbound device 120 may also connect a call to the user terminal device 110 to establish a communication connection with the user terminal device 110 so that the user and customer service may conduct a conversation. Optionally, the customer service and the user can perform a dialogue in a voice manner, and also can perform a dialogue in a text manner.

It is to be understood that the above application scenarios are only shown for facilitating understanding of the spirit and principles of the present application, and that the embodiments of the present application are not limited thereto. Rather, embodiments of the present application may be applied to any scenario where applicable.

Exemplary method

Fig. 2 is a flow chart of an intent recognition method according to an exemplary embodiment of the present application. The method of fig. 2 may be performed by a computing device (e.g., the intent recognition device 130 of fig. 1, or other electronic device). As shown in fig. 2, the intention recognition method includes the following.

210: determining, according to a keyword list, whether the current sentence in the dialogue contains keywords in the keyword list, wherein the keyword list comprises a plurality of labels and at least one keyword corresponding to each label, and the label recall rate of each keyword in the keyword list is greater than or equal to a first preset threshold value.

Specifically, the keyword table may include a plurality of tags and keywords corresponding to each tag, and each tag may correspond to one or more keywords. A tag in the keyword table represents the intent associated with its keywords. For example, keywords may be extracted from a large number of historical dialog samples that have already been labeled with a tag (e.g., a happy icon tag), so that each extracted keyword corresponds to that tag; a keyword table may be constructed in this way.

Further, when a keyword appears in a dialogue in practice, the user may intend to express the intention corresponding to that keyword in the keyword table, or the user may merely happen to use the keyword without intending to express that intention. Thus, each keyword has a tag recall rate, which represents how reliably the keyword recalls its corresponding tag. Specifically, the tag recall rate of each keyword may be computed when the keywords corresponding to each tag are constructed in the keyword table.

For example, when constructing the keywords corresponding to tag A in the keyword table, the historical dialogue samples corresponding to tag A may be collected first, where the total number of historical dialogue samples corresponding to tag A is N. Suppose one keyword corresponding to tag A is keyword B, and keyword B appears in M of the N historical dialogue samples; the tag recall rate of keyword B is then M/N. The higher the tag recall rate of a keyword, the higher the probability that the intention expressed by dialogue content containing the keyword is consistent with the intention represented by the corresponding tag, i.e., the more reliably the keyword recalls its tag.

Therefore, when the keyword list is constructed, the label recall rate of the keywords under each label can be judged first, the keywords with the label recall rate larger than or equal to the first preset threshold value are brought into the keyword list, and the keywords with the label recall rate smaller than the first preset threshold value are discarded, so that the keyword list with high label recall rate can be obtained.
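The construction described above can be sketched as follows. This is a minimal illustration only: the function names, the toy samples, the substring test used to decide whether a sample contains a keyword, and the threshold of 0.5 are all assumptions made for the example and are not specified by the application. The recall rate is computed as the share M/N of a tag's N samples that contain the keyword.

```python
from collections import defaultdict


def tag_recall(keyword, samples):
    """Fraction of one tag's samples that contain the keyword (M / N)."""
    if not samples:
        return 0.0
    hits = sum(1 for text in samples if keyword in text)  # M
    return hits / len(samples)                            # M / N


def build_keyword_table(candidates, samples_by_tag, threshold):
    """Keep only candidate keywords whose tag recall rate meets the threshold."""
    table = defaultdict(list)
    for tag, keywords in candidates.items():
        samples = samples_by_tag.get(tag, [])
        for kw in keywords:
            if tag_recall(kw, samples) >= threshold:
                table[tag].append(kw)
    return dict(table)


# Hypothetical labeled history for a single tag.
samples_by_tag = {
    "renew": [
        "I want to renew my plan",
        "please renew it",
        "can I extend the plan",
        "renew for a year",
    ],
}
candidates = {"renew": ["renew", "extend"]}
table = build_keyword_table(candidates, samples_by_tag, threshold=0.5)
```

In this toy data, "renew" appears in three of the four samples and is kept, while "extend" appears in only one and is discarded.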

If the current sentence in the dialogue contains a keyword in the keyword table, the user may be expressing an intention in the current sentence, and that intention is highly likely to be the one represented by the tag corresponding to the keyword. Therefore, identifying the intention of the user by judging whether keywords in the keyword list exist in the current sentence can improve the accuracy of intention identification.

In the conversation process of the user and the customer service, the sentence expressed by the user can be sequentially used as the current sentence, or each sentence in the conversation can be sequentially used as the current sentence. For the current sentence, it may be determined whether keywords in the keyword table exist in the current sentence. For example, word segmentation is performed on the current sentence to obtain a plurality of words, and for each word, the keywords in the keyword table may be traversed to determine whether the word matches a certain keyword in the keyword table. If there is a match between a word in the current sentence and a keyword in the keyword list, i.e., the word is a keyword in the keyword list, it is determined that the current sentence contains a keyword in the keyword list. It should be understood that there may be no keywords in the current sentence, there may be one keyword, or there may be multiple keywords.
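The matching step can be sketched as below. The sketch assumes whitespace tokenization in place of a real word segmenter, and it replaces the traversal of the keyword table with a dictionary lookup, which gives the same matches; `invert_table` and `match_keywords` are hypothetical names introduced for this example.

```python
def invert_table(keyword_table):
    """Map each keyword to its tag so that a word can be looked up directly."""
    return {kw: tag for tag, kws in keyword_table.items() for kw in kws}


def match_keywords(sentence, keyword_to_tag):
    """Return (keyword, tag) pairs for every word found in the keyword table."""
    tokens = sentence.lower().split()  # assumption: whitespace segmentation
    return [(tok, keyword_to_tag[tok]) for tok in tokens if tok in keyword_to_tag]


# Hypothetical keyword table: tag -> keywords.
keyword_table = {"renew": ["renew", "renewal"], "cancel": ["cancel"]}
lookup = invert_table(keyword_table)

one_match = match_keywords("I would like to renew", lookup)
no_match = match_keywords("hello there", lookup)
two_matches = match_keywords("Please cancel the renewal", lookup)
```

The three calls illustrate the three cases mentioned above: one keyword, no keyword, and several keywords in the current sentence.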

When the dialogue is performed in a text manner, the current sentence may be text information; when the conversation is performed in a voice manner, the current sentence may be voice information or text information into which the voice information is converted.

220: when the current sentence contains the keyword, determining a target label corresponding to the keyword, acquiring context information containing the current sentence, inputting the context information into a first classification model to obtain a first classification result, wherein the first classification result is used for determining whether an intention label corresponding to the current sentence is the target label, and taking the target label as the first intention label when the intention label corresponding to the current sentence is the target label.

The length of the context information may meet a preset length requirement, for example, K sentences before the current sentence, and L sentences after the current sentence may be spliced to obtain the context information, where K and L may be equal or unequal.
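A minimal sketch of this splicing, assuming the dialogue is held as a list of sentences and that sentences are joined with a single space; the function name `build_context` and the clamping at the start and end of the dialogue are assumptions of the example.

```python
def build_context(sentences, index, k, l):
    """Splice up to k sentences before and l sentences after sentences[index]."""
    start = max(0, index - k)                  # do not run past the first sentence
    end = min(len(sentences), index + l + 1)   # do not run past the last sentence
    return " ".join(sentences[start:end])


sentences = ["s1", "s2", "s3", "s4", "s5"]
middle = build_context(sentences, index=2, k=1, l=2)
at_start = build_context(sentences, index=0, k=2, l=1)
```

When the current sentence is near the beginning of the dialogue, as in a real-time setting, fewer than K preceding sentences may be available and the context is simply shorter.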

In the case that keywords exist in the current sentence, the possibility that the current sentence contains the intention of the user is high; moreover, since the matching between the words in the current sentence and the keywords in the keyword list is realized by the strong matching mode of keyword matching, the intention contained in the current sentence is highly likely to be consistent with the intention represented by the target label corresponding to the matched keyword. In order to further verify whether the current sentence contains the intention represented by the target tag corresponding to the keyword, the context information containing the current sentence may be located based on the keyword, and the intention of the user may be more accurately identified based on the context information of the current sentence.

For example, the context information is classified by using the first classification model to obtain a classification result (first classification result). The classification result may indicate whether the intention label corresponding to the context information is consistent with the target label corresponding to the keyword. For example, the classification result includes two types, one type corresponds to a specific intention label, such as a target label corresponding to a keyword, and the target label can be output at this time; the other type is other, and the other type indicates that the intention label corresponding to the context information is not the target label corresponding to the keyword, i.e. the intention label is not obtained at the moment. Optionally, the classification result may include yes or no, and if yes, outputting the target label corresponding to the keyword as the intention label; if not, the intention label is not output.

Wherein the first classification model may be a BERT model or other natural language processing model.
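The verification step can be illustrated as follows. The classifier is passed in as a plain callable so that the sketch does not depend on any particular library; `toy_first_model` is a hypothetical rule-based stand-in for a trained model such as BERT and is not part of the application.

```python
from typing import Callable, Optional

OTHER = "other"  # class meaning "not the target label"


def verify_target_label(
    context: str,
    target_label: str,
    classify: Callable[[str], str],
) -> Optional[str]:
    """Return the target label only if the first model confirms it for this context."""
    predicted = classify(context)
    return target_label if predicted == target_label else None


def toy_first_model(context: str) -> str:
    """Hypothetical stand-in for a trained classifier."""
    return "renew" if "another year" in context else OTHER


confirmed = verify_target_label("I want to renew for another year", "renew", toy_first_model)
rejected = verify_target_label("my friend told me to renew", "renew", toy_first_model)
```

Here the second context contains the keyword but the model classifies it as "other", so no intention label is output.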

230: sequentially positioning the local text information in the dialogue according to the preset step length and the preset window size.

The preset window may represent a length of the partial text information, and the preset step size may represent a length of an interval between two adjacent partial text information. For example, the preset step length is 1, the preset window is 3, the number of sentences existing in the current dialogue is 6, and four pieces of local text information are sequentially obtained: sentence 1+ sentence 2+ sentence 3; sentence 2+ sentence 3+ sentence 4; sentence 3+ sentence 4+ sentence 5; sentence 4+ sentence 5+ sentence 6. The method of sequentially locating local text information in a dialog by a preset step size and a preset window size may be referred to as a sliding window enumeration method.
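The sliding window enumeration in this example can be sketched as below, assuming the dialogue is a list of sentences and that window size and step are both counted in sentences; the function name is hypothetical.

```python
def sliding_windows(sentences, window, step):
    """Enumerate every full window of `window` sentences, advancing by `step`."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    last_start = len(sentences) - window  # last index where a full window fits
    return [sentences[i:i + window] for i in range(0, last_start + 1, step)]


dialogue = [f"sentence {n}" for n in range(1, 7)]  # six sentences
windows = sliding_windows(dialogue, window=3, step=1)
```

With six sentences, a window of 3, and a step of 1, the sketch yields the four pieces of local text information listed above; a dialogue shorter than the window yields none.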

240: under the condition that the first intention label comprises a second-level label, determining the second-level label as a reference intention label, inputting a first-level label and local text information to which the reference intention label belongs into a second classification model to obtain a second classification result, wherein the second classification result comprises the second intention label, the scene types of the first-level label corresponding to the second intention label and the scene type of the first-level label corresponding to the first intention label are the same, and the types of the intention labels classified by the first classification model and the second classification model are different.

For each local text information, a second classification model may be used to classify it to obtain a classification result (second classification result). The types of the intention labels which can be classified by the first classification model and the second classification model are different, and the second classification model can identify the intention labels which cannot be recalled through the keywords.

Similar to the first classification model, the second classification model may also be a BERT model or other natural language processing model.

The first intention label obtained through the first classification model may include a first-level label, a second-level label, or a second-level label together with its corresponding first-level label. A first-level label is a label that has no upper-level label, i.e., it is a parent label; a second-level label is a label that has a first-level label above it, i.e., the second-level label is a child label, and that first-level label is the parent label of the second-level label, also called the primary label of the secondary label. The same secondary label may correspond to the same primary label or to different primary labels. For example, the primary label is renewal, and the corresponding secondary label is package option; the primary label is consultation, and the corresponding secondary label is package option. The primary labels and secondary labels may be set in advance as needed. For example, the two-level labels that can be identified by the first classification model and by the second classification model may be set in advance for different scene types. For example, the list of two-level labels identifiable by the first classification model includes: the primary label is repayment in advance, and the corresponding secondary label is disapproval of interest; the list of two-level labels identifiable by the second classification model includes: the primary label is delayed repayment, and the corresponding secondary label is disapproval of interest. As another example, the list of two-level labels identifiable by the first classification model includes: the primary label is renewal, and the corresponding secondary label is package option; the list of two-level labels identifiable by the second classification model includes: the primary label is consultation, and the corresponding secondary label is package option.

Because the accuracy of the intention labels obtained by the keyword matching mode is high, when the first intention label comprises the second-level label, the second-level label can be determined to be the reference intention label, and the first-level label corresponding to the reference intention label can be used for the subsequent reference of intention recognition of the local text information. For example, the first-level label corresponding to the reference intention label and the local text information are input into the second classification model to obtain the second intention label, so that the conflict of the scene type of the first-level label corresponding to the first intention label and the first-level label corresponding to the second intention label can be avoided, namely, the second classification model is prevented from outputting the intention label which is inconsistent with the scene type related to the dialogue.

For example, the first-level label corresponding to the second intention label and the first-level label corresponding to the first intention label have the same scene type; that is, the specific content of the two first-level labels may differ, but their scene types are the same. The scene type corresponding to each first-level label can be set in advance as needed.
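One way to picture step 240 is sketched below. The application does not specify how the first-level label is fed to the second classification model or how scene-type consistency is enforced; the sketch assumes the label is simply prepended to the window text and that consistency is checked after prediction. The `SCENE_TYPE` mapping, the scene names, and `toy_second_model` are hypothetical.

```python
# Hypothetical mapping from first-level label to scene type.
SCENE_TYPE = {
    "renewal": "sales",
    "consultation": "sales",
    "repayment in advance": "lending",
    "delayed repayment": "lending",
}


def classify_window(window_text, reference_primary, classify):
    """Classify one window, conditioned on the reference first-level label.

    Returns (primary, secondary) only when the predicted first-level label
    shares a scene type with the reference first-level label.
    """
    # Assumption: the label is passed to the model by plain text concatenation.
    model_input = f"{reference_primary} [SEP] {window_text}"
    primary, secondary = classify(model_input)
    scene = SCENE_TYPE.get(primary)
    if scene is None or scene != SCENE_TYPE.get(reference_primary):
        return None  # reject labels from an unknown or conflicting scene
    return primary, secondary


def toy_second_model(model_input):
    """Hypothetical stand-in for a trained second classification model."""
    if "which package" in model_input:
        return ("consultation", "package option")
    return ("delayed repayment", "disapproval of interest")


accepted = classify_window("which package is cheaper", "renewal", toy_second_model)
conflict = classify_window("I will pay late", "renewal", toy_second_model)
```

In the second call the toy model predicts a lending label while the reference label belongs to the sales scene, so the result is discarded.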

Similar to the first intent tag, the second intent tag may include a primary tag, a secondary tag, or a secondary tag and a primary tag corresponding to the secondary tag.

250: the first intention label and the second intention label are used as intention labels of the dialogue.

It should be understood that the first classification model may classify context information corresponding to different keywords respectively; the second classification model may classify local text information that includes different intents.

In the dialog, the local text information corresponding to the second intention label may be located after the context information. For example, if the reference intention label is not obtained for a sentence preceding a certain local text information and/or a sentence located in the local text information, the local text information may be directly input into the second classification model to obtain the classification result. Alternatively, first intention labels in the whole dialogue can be identified in a keyword matching mode, reference intention labels are determined based on the first intention labels, and then intention identification is performed on each local text information based on the reference intention labels, so that second intention labels are obtained.

The first intention tag (target tag) and the second intention tag may be the intention tags of the entire dialog.
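Putting the two ways together, the overall flow of steps 210 to 250 might look like the simplified sketch below. It deliberately omits the reference intention label and the scene-type check, uses whitespace tokenization, and takes both models as plain callables; all names, default parameters, and the toy models are assumptions of the example.

```python
def recognize_intents(sentences, keyword_to_tag, first_model, second_model,
                      k=1, l=1, window=3, step=1):
    """Collect intention labels for a dialogue using both recognition ways."""
    labels = []

    # Way 1: keyword match, then verification on the surrounding context.
    for i, sentence in enumerate(sentences):
        for token in sentence.lower().split():
            target = keyword_to_tag.get(token)
            if target is None:
                continue
            context = " ".join(sentences[max(0, i - k): i + l + 1])
            if first_model(context) == target and target not in labels:
                labels.append(target)

    # Way 2: sliding window enumeration with the second model.
    for start in range(0, len(sentences) - window + 1, step):
        text = " ".join(sentences[start:start + window])
        label = second_model(text)
        if label is not None and label not in labels:
            labels.append(label)

    return labels


# Toy dialogue and hypothetical stand-ins for the two trained models.
dialogue = [
    "hello",
    "i want to renew my plan",
    "also how much does the premium package cost",
    "thanks",
]
keyword_to_tag = {"renew": "renewal"}
first_model = lambda ctx: "renewal" if "renew" in ctx else "other"
second_model = lambda text: "consult price" if "how much" in text else None

intents = recognize_intents(dialogue, keyword_to_tag, first_model, second_model)
```

In this toy run the keyword path yields "renewal" and the sliding window path adds "consult price", an intention for which the keyword table has no entry.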

The embodiment of the application provides an intention recognition method. Based on a keyword table whose keywords have a high recall rate, the method determines whether the current sentence contains a keyword in the keyword table. When it does, the context information of the current sentence is located and classified by a first classification model to determine whether the intention label corresponding to the context information is the target label corresponding to the keyword; if so, the target label is taken as the first intention label of the current sentence, so that the accuracy of intention recognition can be further improved while a high recall rate is ensured. Moreover, determining intention labels by keyword matching improves the efficiency of intention recognition, and verifying the matched labels with the first classification model improves the accuracy of the recognition results while preserving that efficiency. Further, the local text information is located by a sliding window enumeration method, and when the first intention label includes a second-level label, the first-level label corresponding to the first intention label and the local text information are input into a second classification model to obtain a second intention label.
In this way, the limitation of recognizing intentions by keyword matching alone can be overcome: intentions that keyword matching cannot recognize can still be recognized, and missed intention labels can be avoided. A conflict between the scene type of the first-level label corresponding to the first intention label and that of the first-level label corresponding to the second intention label can also be avoided, which reduces the possibility that the second classification model outputs an intention label inconsistent with the scene type of the dialogue. In addition, by combining keyword matching and sliding window enumeration, the recognized intention labels can be output in real time during the conversation between customer service and the user, yielding a multi-label combination for the dialogue and providing communication strategy support for customer service during the conversation.
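The sliding window enumeration mentioned above can be illustrated with a minimal sketch. It assumes that the dialogue is held as a list of sentences and that the preset window size and preset step are counted in sentences; the function name `sliding_windows` and the sample values are hypothetical and are not taken from the application.

```python
from typing import List


def sliding_windows(sentences: List[str], window_size: int, step: int) -> List[List[str]]:
    """Enumerate local text spans of up to `window_size` sentences, advancing by `step`."""
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    windows = []
    for start in range(0, len(sentences), step):
        windows.append(sentences[start:start + window_size])
        # Stop once the window has reached the end of the dialogue.
        if start + window_size >= len(sentences):
            break
    return windows


# Illustrative dialogue of five sentences (placeholder strings).
dialogue = ["s1", "s2", "s3", "s4", "s5"]
windows = sliding_windows(dialogue, window_size=3, step=2)
```

Each element of `windows` plays the role of one piece of local text information that would be passed to the second classification model.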

In the embodiment of the application, the types of the intention labels which can be classified by the first classification model and the second classification model are different.

For example, when the keyword table is constructed, intention labels A and B have keywords with high label recall rates, so the keyword table can be built from intention label A and its high-recall keywords and from intention label B and its high-recall keywords. Thus, when the intention recognition method is executed, intention labels A and B can be recognized by keyword matching, that is, the first classification model can be used to classify intention labels A and B. By contrast, the keywords corresponding to intention labels C and D have low label recall rates, so intention labels C and D are not added to the keyword table. To recognize intention labels C and D in the text information when the intention recognition method is executed, the second classification model may be trained on historical dialogue samples relating to intention labels C and D, so that the second classification model can recognize them.

In this embodiment, the intention labels corresponding to the keywords with high label recall rate can be identified by adopting a keyword matching mode, and since the context information can be rapidly positioned by keyword matching, the efficiency and accuracy of intention identification can be improved; the intent labels corresponding to the keywords with low label recall rate can be identified by adopting a sliding window enumeration method, namely the intent labels which cannot be identified by keyword matching can be identified by adopting the sliding window enumeration method, so that missing intent labels can be avoided.

According to an embodiment of the present application, in a case where the first intention tag includes a second-level tag, determining the second-level tag as a reference intention tag includes: acquiring a first representation vector corresponding to the context information under the condition that the first intention label comprises a second-level label; acquiring a second representation vector of the local text information; calculating a similarity value between the first representation vector and the second representation vector; and determining the secondary label as the reference intention label under the condition that the similarity value is greater than or equal to a second preset threshold value.

There may be multiple sentences containing keywords in the dialogue. For each keyword, the context information of the keyword may be obtained and input into the first classification model to obtain a first classification result. Thus, when the dialogue has proceeded to a certain stage, there may be a plurality of first intention labels that include a second-level label.

In an example, the plurality of second-level labels corresponding to the plurality of first intention labels may all be determined as reference intention labels, resulting in a plurality of reference intention labels. The first-level labels corresponding to these reference intention labels are combined with the local text information and input into the second classification model to obtain a second intention label, wherein the first-level label corresponding to the second intention label does not conflict with the scene types of the first-level labels corresponding to the reference intention labels, for example, the scene types are the same.

Specifically, the second classification model may obtain the representation vectors of the first-level labels corresponding to the multiple reference intention labels, obtain the representation vector corresponding to the local text information, splice these representation vectors to obtain a spliced representation vector, and classify the spliced representation vector to obtain a second classification result.

In another example, for each first intent tag including a secondary tag, a semantic similarity value between context information and local text information corresponding to the first intent tag may be calculated. And when the semantic similarity value is greater than or equal to a second preset threshold value, determining the secondary label as a reference intention label. For example, a first representation vector corresponding to the context information and a second representation vector corresponding to the local text information may be obtained by an encoder. And calculating a similarity value between the first representation vector and the second representation vector, and determining the secondary label as a reference intention label under the condition that the similarity value is larger than or equal to a second preset threshold value. Alternatively, a maximum similarity value among a plurality of similarity values corresponding to the plurality of first intention tags (including the second-level tag) may be determined, and the second-level tag corresponding to the maximum similarity value may be determined as the reference intention tag.
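The two selection rules described above (threshold-based and maximum-similarity) can be sketched as follows. The application does not name a similarity measure, so cosine similarity is used here purely as an assumption; the vectors, label names, and the threshold value 0.8 are likewise hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_by_threshold(context_vectors: Dict[str, Sequence[float]],
                        local_vector: Sequence[float],
                        threshold: float) -> List[str]:
    """Keep every second-level label whose context is similar enough to the local text."""
    return [label for label, vec in context_vectors.items()
            if cosine_similarity(vec, local_vector) >= threshold]


def select_by_maximum(context_vectors: Dict[str, Sequence[float]],
                      local_vector: Sequence[float]) -> Optional[str]:
    """Keep only the second-level label with the highest similarity."""
    if not context_vectors:
        return None
    return max(context_vectors,
               key=lambda label: cosine_similarity(context_vectors[label], local_vector))


# Hypothetical first representation vectors, keyed by second-level label.
context_vectors = {"fee_dispute": [1.0, 0.0], "rude_agent": [0.0, 1.0]}
local_vector = [0.9, 0.1]  # hypothetical second representation vector

reference_labels = select_by_threshold(context_vectors, local_vector, threshold=0.8)
best_label = select_by_maximum(context_vectors, local_vector)
```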

Specifically, the first-level label corresponding to the reference intention label is combined with the local text information and input into the second classification model to obtain the second intention label. For example, the first-level label corresponding to the reference intention label and the local text information may be input into the second classification model, which obtains the representation vector of the first-level label and the representation vector of the local text information, splices them to obtain a spliced representation vector, and classifies the spliced representation vector to obtain a second classification result. Optionally, an encoder may be used to obtain the representation vectors of the first-level label corresponding to the reference intention label and of the local text information, and these representation vectors are then input into the second classification model, which splices and classifies them to obtain the second classification result.
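As a rough illustration of the splice-then-classify step, the sketch below concatenates two vectors and applies a single linear layer followed by softmax. The application does not disclose the architecture of the classification network, so the linear layer, the weights, and the vector values are all assumptions chosen only to make the example runnable.

```python
import math
from typing import List, Sequence


def splice(label_vector: Sequence[float], text_vector: Sequence[float]) -> List[float]:
    """Concatenate the first-level label vector and the local text vector."""
    return list(label_vector) + list(text_vector)


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(spliced: Sequence[float],
             weights: Sequence[Sequence[float]],
             biases: Sequence[float]) -> List[float]:
    """Assumed classification head: one linear layer plus softmax."""
    logits = [sum(w * x for w, x in zip(row, spliced)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


label_vector = [1.0, 0.0]   # hypothetical vector of the first-level label
text_vector = [0.5, 0.5]    # hypothetical vector of the local text information
spliced = splice(label_vector, text_vector)

# Hypothetical weights for two candidate intention labels.
weights = [[1.0, 0.0, 1.0, 0.0],
           [0.0, 1.0, 0.0, 1.0]]
biases = [0.0, 0.0]
probabilities = classify(spliced, weights, biases)
```

The index of the largest entry in `probabilities` would correspond to the intention label reported in the second classification result.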

In this embodiment, for the first intention tag including the second-level tag, by calculating a similarity value between a representation vector of the context information corresponding to the first intention tag and a representation vector corresponding to the local text information, a semantic similarity degree between the context information and the local text information may be determined. When the similarity value is larger than or equal to a second preset threshold value, the semantic similarity degree between the context information and the local text information is high, and the scene types of the first-level labels corresponding to the context information and the local text information are the same. When the similarity value is larger than or equal to a second preset threshold value, the second-level label is determined to be the reference intention label, and the first-level label and the local text information corresponding to the reference intention label are input into a second classification model to obtain the second intention label, so that the possibility that the second intention label is inconsistent with the scene type related to the conversation can be reduced, and the accuracy of the second intention label is improved.

According to an embodiment of the present application, the local text information is text information other than the context information of the current sentence in the complete text information of the dialogue; inputting the first-level label to which the reference intention label belongs and the local text information into a second classification model to obtain a second classification result includes: acquiring a third representation vector of the first-level label to which the reference intention label belongs; inputting the third representation vector and the second representation vector into the second classification model; splicing the third representation vector and the second representation vector by using the second classification model to obtain a spliced representation vector; classifying the spliced representation vector to obtain a reference classification result; and acquiring emotion recognition information of the user for the reference classification result in the dialogue, and determining the reference classification result as the second classification result if the emotion recognition information is a positive emotion.

The local text information is text information other than the context information of the current sentence in the complete text information of the conversation, for example, in the complete text information of the conversation, the context information may be located before or after the local text information.

The second classification model may include an encoding network and a classification network.

In an example, the first-level label to which the reference intention label belongs and the local text information may be input into the second classification model. The encoding network obtains a third representation vector of the first-level label and a second representation vector of the local text information, and may further splice the third representation vector and the second representation vector to obtain a spliced representation vector. The classification network then classifies the spliced representation vector to obtain a reference classification result, which includes a corresponding intention label.

In another example, a third representation vector of the first-level label corresponding to the reference intention label and a second representation vector corresponding to the local text information may be obtained by an encoder. The third representation vector and the second representation vector are input into the second classification model, spliced through the encoding network of the second classification model to obtain a spliced representation vector, and the spliced representation vector is classified through the classification network of the second classification model to obtain a reference classification result. The encoder in the embodiments of the present application may be the encoding network in the first classification model, the encoding network in the second classification model, or another natural language processing model.

In an example, the reference classification result may be directly determined as the second classification result, and the intent label corresponding to the reference classification result is the second intent label.

In another example, emotion recognition information of the user for the reference classification result in the conversation may be obtained; the emotion recognition information may be the user's reply to a question asked by customer service. For example, after the second classification model outputs the reference classification result, customer service may further confirm with the user whether the intention corresponding to the reference classification result is the intention the user actually expressed. Customer service can confirm this by asking a question, and the user's reply to that question is the emotion recognition information. The emotion recognition information may be input into an emotion recognition model, which can output one of two results: positive emotion or negative emotion. For example, if the user replies "Yes, that is what I mean", the emotion recognition model outputs a positive emotion; if the user replies "No, that is not what I mean", the emotion recognition model outputs a negative emotion. If the emotion recognition information is a positive emotion, the reference classification result may be determined as the second classification result. The emotion recognition model may be a separate model independent of the second classification model, or may be part of the second classification model.
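The confirmation step can be sketched as a simple gate around the reference classification result. The emotion recognition model is not specified in the application, so `toy_emotion_model` below is only a hypothetical keyword-based placeholder standing in for a trained model; the label name and replies are illustrative.

```python
import re
from typing import Callable, Optional


def toy_emotion_model(reply: str) -> str:
    """Hypothetical placeholder: a real system would use a trained classifier."""
    tokens = re.findall(r"[a-z']+", reply.lower())
    negative_cues = {"no", "not", "don't", "wrong"}
    return "negative" if any(t in negative_cues for t in tokens) else "positive"


def confirm_reference_result(reference_label: str,
                             user_reply: str,
                             emotion_model: Callable[[str], str] = toy_emotion_model
                             ) -> Optional[str]:
    """Return the reference label as the second intention label only on a positive reply."""
    return reference_label if emotion_model(user_reply) == "positive" else None


accepted = confirm_reference_result("cancel account", "Yes, that is what I mean")
rejected = confirm_reference_result("cancel account", "No, that is not what I mean")
```

Here `accepted` carries the label forward as the second intention label, while `rejected` is `None`, meaning the reference classification result is not adopted.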

In this embodiment, the accuracy of the second intention label may be improved by performing intention recognition on the local text information using both the third representation vector of the first-level label to which the reference intention label belongs and the second representation vector of the local text information. In addition, by acquiring and analyzing the emotion recognition information, and determining the reference classification result as the second classification result when the emotion recognition information is a positive emotion, the accuracy of the second intention label can be further improved.

According to an embodiment of the present application, obtaining context information including the current sentence and inputting the context information into a first classification model to obtain a first classification result includes: acquiring a first representation vector of the context information by using the first classification model; and classifying the first representation vector to obtain the first classification result, wherein the first classification result includes a first probability that the first representation vector belongs to the target label and a second probability that the first representation vector belongs to other labels, and when the first probability is greater than or equal to a third preset threshold, the intention label corresponding to the current sentence is consistent with the target label.

The first classification model may include an encoding network and a classification network. After the context information is input into the first classification model, a first representation vector corresponding to the context information may be obtained through the encoding network and classified by the classification network to obtain a first classification result. The first classification result includes a first probability that the first representation vector belongs to the target label and a second probability that it belongs to other labels. When the first probability is greater than or equal to a third preset threshold, the intention label corresponding to the current sentence is consistent with the target label, and the target label is taken as the first intention label. When the first probability is smaller than the third preset threshold, the intention label corresponding to the current sentence is inconsistent with the target label: it may be an intention label other than the target label, or the current sentence may contain no intention. Therefore, when the first probability is smaller than the third preset threshold, no intention label need be output for the current sentence.

The third preset threshold may be 0.5, 0.6, or another value; the specific value may be set according to actual needs.

In this embodiment, after determining the target tag corresponding to the current sentence by means of keyword matching, the context information of the current sentence may be classified by using the first classification model, and whether the intention tag of the current sentence including the keyword is the target tag may be further confirmed, so that accuracy of intention recognition may be improved.
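The match-then-verify flow can be sketched as below. The keyword table layout (label mapped to a list of keywords), the substring matching, and the `first_model` callable that returns the first probability are all assumptions made for illustration; the stub models simply return fixed probabilities.

```python
from typing import Callable, Dict, List, Optional, Tuple

KeywordTable = Dict[str, List[str]]  # assumed layout: label -> keywords


def match_keyword(sentence: str, table: KeywordTable) -> Optional[Tuple[str, str]]:
    """Return (keyword, target_label) for the first keyword found in the sentence."""
    for label, keywords in table.items():
        for keyword in keywords:
            if keyword in sentence:
                return keyword, label
    return None


def recognize_first_label(sentence: str,
                          context: str,
                          table: KeywordTable,
                          first_model: Callable[[str, str], float],
                          threshold: float = 0.5) -> Optional[str]:
    """Keyword match, then confirm the target label with the first classification model."""
    hit = match_keyword(sentence, table)
    if hit is None:
        return None
    _, target_label = hit
    first_probability = first_model(context, target_label)
    return target_label if first_probability >= threshold else None


table = {"bad staff attitude": ["attitude", "mood"]}
context = "I called yesterday. Your agent's attitude was terrible. I want to complain."

# Stub models standing in for the trained first classification model.
confirmed = recognize_first_label("Your agent's attitude was terrible.", context, table,
                                  first_model=lambda ctx, label: 0.8)
vetoed = recognize_first_label("Your agent's attitude was terrible.", context, table,
                               first_model=lambda ctx, label: 0.3)
unmatched = recognize_first_label("Hello there.", context, table,
                                  first_model=lambda ctx, label: 0.9)
```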

According to an embodiment of the application, the first classification model is obtained by training on a dataset comprising positive example samples and negative example samples. A positive example sample comprises a first sample and a labeling label of the first sample, where the first sample contains a keyword and the labeling label of the first sample is consistent with the label corresponding to the keyword. A negative example sample comprises a second sample and a labeling label of the second sample, where the second sample contains a keyword and the labeling label of the second sample is inconsistent with the label corresponding to the keyword. The first classification result comprises the target label corresponding to the keyword and other labels, the other labels being used to indicate that the first classification result is inconsistent with the label corresponding to the keyword. When the first classification result is the target label corresponding to the keyword, the target label corresponding to the keyword is determined as the first intention label.

Specifically, the data set for training the first classification model includes positive examples and negative examples, where both positive examples and negative examples are samples including keywords, the samples may be text information meeting a preset length requirement, and the keywords may be keywords in a keyword table. Before training the first classification model, the sample may be subjected to intention labeling to obtain a labeling label, which may specifically be manual labeling or machine labeling.

The labeling label of the positive example sample (first sample) is consistent with the label corresponding to the keyword, that is, the text information including the keyword includes the intention consistent with the intention represented by the label corresponding to the keyword. The labeling label of the negative example (second sample) is different from the label corresponding to the keyword, that is, the text information including the keyword includes an intention that is not consistent with the intention represented by the label corresponding to the keyword. For example, historical dialog samples used in the keyword list construction process may be used as the dataset for training the first classification model.

The first classification model is trained with the positive example samples and the negative example samples, so that it can classify context information containing keywords and confirm whether the intention label corresponding to the context information is consistent with the target label corresponding to the keyword.
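One way to assemble such a dataset is sketched below, assuming each sample is a pair of text and labeling label and that the keyword table is flattened into a keyword-to-label mapping. The function name `build_examples` and the sample texts are hypothetical.

```python
from typing import Dict, List, Tuple

Example = Tuple[str, str, str]  # (text, keyword, label corresponding to the keyword)


def build_examples(samples: List[Tuple[str, str]],
                   keyword_to_label: Dict[str, str]
                   ) -> Tuple[List[Example], List[Example]]:
    """Split keyword-containing samples into positive and negative examples."""
    positives: List[Example] = []
    negatives: List[Example] = []
    for text, labeling_label in samples:
        for keyword, keyword_label in keyword_to_label.items():
            if keyword not in text:
                continue  # only samples containing a keyword are used
            example = (text, keyword, keyword_label)
            if labeling_label == keyword_label:
                positives.append(example)   # annotation agrees with the keyword's label
            else:
                negatives.append(example)   # keyword present, but a different intention
    return positives, negatives


keyword_to_label = {"attitude": "bad staff attitude"}
samples = [
    ("your agent's attitude was terrible", "bad staff attitude"),
    ("thanks, I like your attitude", "praise"),
    ("please stop calling me", "stop calling"),
]
positives, negatives = build_examples(samples, keyword_to_label)
```

The third sample contains no keyword and therefore contributes to neither list.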

Different positive examples and negative examples can be constructed for different keywords, and the first classification model is trained by utilizing the positive examples and the negative examples corresponding to the different keywords, so that the first classification model can classify the context information containing the different keywords. In addition, aiming at different application scenes, the keyword list can be adjusted, and the corresponding relation between different labels and keywords is constructed.

In this embodiment, positive examples and negative examples are constructed by using text information including keywords, and training is performed on the first classification model by using the positive examples and the negative examples, so that the trained first classification model can further confirm intention labels including context information of the keywords in the process of intention recognition, and accuracy of intention recognition can be improved.

According to an embodiment of the present application, the intention recognition method further includes: determining at least one historical dialogue sample corresponding to each label of a plurality of labels included in a keyword table; for each label, extracting candidate keywords from the at least one historical dialogue sample; determining the number of samples, among the at least one historical dialogue sample, in which a candidate keyword appears; determining the ratio between that number of samples and the total number of the at least one historical dialogue sample as the label recall rate of the candidate keyword; and when the label recall rate of the candidate keyword is greater than or equal to a first preset threshold, taking the candidate keyword as a keyword corresponding to the label, and constructing the keyword table according to the plurality of labels and the at least one keyword corresponding to each label.

Specifically, when the keyword table is constructed, one or more history dialogue samples corresponding to each tag may be collected, the history dialogue samples may be text information selected from the history dialogue data to meet a preset length requirement, for example, the length of the history dialogue samples may be consistent with the length of the context information input into the first classification model. For each tag, candidate keywords may be extracted from a plurality of historical dialog samples corresponding to the tag, and the number of candidate keywords may be one or more. For each candidate keyword, the number of samples of the plurality of historical dialog samples in which the candidate keyword appears may be counted. If the number of samples of the candidate keyword is larger, the association degree between the candidate keyword and the label is higher, in other words, if the candidate keyword is contained in the text to be identified, the intention contained in the text to be identified is highly likely to be consistent with the intention represented by the label, namely, the recall rate of the candidate keyword to the label is high. Thus, the ratio between the number of samples in the plurality of historical dialog samples in which the candidate keyword appears and the total number of the plurality of historical dialog samples may be counted, and the ratio may be determined as the tag recall rate of the candidate keyword.

In order to obtain keywords with high label recall rate to construct a keyword table, a plurality of candidate keywords can be screened. For example, a first preset threshold may be set according to a specific application scenario, and candidate keywords with a label recall rate greater than or equal to the first preset threshold may be retained as keywords corresponding to labels in the keyword table.
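The label recall rate and the screening step can be sketched as follows. Substring matching for "the candidate keyword appears in the sample", the candidate lists, and the threshold value 0.5 are assumptions for illustration.

```python
from typing import Dict, List


def label_recall(keyword: str, samples: List[str]) -> float:
    """Fraction of the label's historical dialogue samples that contain the keyword."""
    if not samples:
        return 0.0
    hits = sum(1 for sample in samples if keyword in sample)
    return hits / len(samples)


def build_keyword_table(samples_by_label: Dict[str, List[str]],
                        candidates_by_label: Dict[str, List[str]],
                        first_threshold: float) -> Dict[str, List[str]]:
    """Keep candidates whose label recall rate reaches the first preset threshold."""
    table: Dict[str, List[str]] = {}
    for label, samples in samples_by_label.items():
        kept = [kw for kw in candidates_by_label.get(label, [])
                if label_recall(kw, samples) >= first_threshold]
        if kept:
            table[label] = kept
    return table


samples_by_label = {
    "bad staff attitude": [
        "your staff's attitude was rude",
        "what an attitude from that agent",
        "hello, I want to complain about the attitude",
        "the agent shouted at me",
    ],
}
candidates_by_label = {"bad staff attitude": ["attitude", "hello"]}

recall_attitude = label_recall("attitude", samples_by_label["bad staff attitude"])
recall_hello = label_recall("hello", samples_by_label["bad staff attitude"])
keyword_table = build_keyword_table(samples_by_label, candidates_by_label,
                                    first_threshold=0.5)
```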

Taking a telemarketing scenario as an example, the intentions a user expresses to customer service include "bad staff attitude", "cancel account" and "stop calling". For the intention label "bad staff attitude", the keywords contained in the local context information in historical dialogues include "attitude", "mood", and the like, and these keywords have a high recall rate over the dialogue data corresponding to that intention label, so "attitude" and "mood" can be added to the keyword table as keywords corresponding to the intention label "bad staff attitude". Following the same principle, keywords can be constructed for all intention labels, finally yielding the keyword table.

In this embodiment, by screening keywords with high tag recall rates from a plurality of historical dialog samples corresponding to a tag as keywords corresponding to the tag, and constructing a keyword table based on the tag and the keywords, the accuracy of subsequent intent recognition based on the keyword table can be improved, and the reliability of recognition results can be improved. In addition, the intention recognition is carried out in a keyword matching mode, so that the efficiency of the intention recognition can be improved, the intention label with high accuracy can be provided for customer service in real time in the conversation process of the customer service and the user, the customer service can know the user psychology conveniently, and the communication between the customer service and the user is promoted.

According to an embodiment of the present application, the intention recognition method further includes: when the current sentence contains a keyword, taking the sentence following the context information as the new current sentence; and when the current sentence does not contain a keyword, taking the sentence following the current sentence as the new current sentence.

Specifically, when the current sentence contains a keyword, the context information of the current sentence may be analyzed, for example by inputting it into the first classification model, and the sentence following the context information then becomes the new current sentence; that sentence may be one spoken by the user or one spoken by customer service. It is then determined again whether the current sentence contains a keyword in the keyword table, and if so, the context information of the current sentence is located again and analyzed. By iterating in this way, the intention expressed by the user can be recognized in real time during the conversation. Because the context information has already been analyzed, keyword matching resumes from the sentence following the context information, which improves the efficiency of intention recognition.

When the current sentence does not contain a keyword, the sentence following the current sentence becomes the new current sentence; that sentence may be one spoken by the user or one spoken by customer service. A current sentence may fail to contain any keyword in the keyword table because it is small talk that carries no intention. It is also possible that the words in the current sentence are keywords with a low label recall rate and therefore do not match any keyword in the keyword table, even though the current sentence may contain a user intention. Although the intention label corresponding to such a sentence is not recalled by keyword matching, it can be recalled by the sliding window enumeration of the embodiment of the application.

Therefore, the embodiment of the application can ensure the high-efficiency and high-accuracy identification of part of the intention labels in a keyword matching mode, and can also avoid missing the intention labels which cannot be identified in keyword matching in a sliding window enumeration mode.

Optionally, when the current sentence contains a keyword, the sentence following the current sentence (rather than the sentence following the context information) may be taken as the new current sentence, which can further improve the comprehensiveness of intention label recognition and avoid missing intention labels.
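The iteration over the dialogue can be sketched as a single loop. The sketch assumes the context information is the matched sentence plus `radius` sentences on each side, which the application does not specify; `skip_context=False` corresponds to the optional variant in which the sentence right after the current sentence is examined next.

```python
from typing import List, Tuple


def scan_dialogue(sentences: List[str],
                  keywords: List[str],
                  radius: int = 1,
                  skip_context: bool = True) -> List[Tuple[int, List[str]]]:
    """Return (sentence index, context) for every keyword hit found while scanning."""
    hits: List[Tuple[int, List[str]]] = []
    i = 0
    while i < len(sentences):
        if any(keyword in sentences[i] for keyword in keywords):
            start = max(0, i - radius)
            end = min(len(sentences), i + radius + 1)
            hits.append((i, sentences[start:end]))  # context would go to the first model
            # Resume after the analyzed context, or at the very next sentence.
            i = end if skip_context else i + 1
        else:
            i += 1  # no keyword: move to the next sentence
    return hits


dialogue = [
    "hello",
    "your attitude is bad",
    "really bad attitude",
    "ok",
    "cancel my account",
]
keywords = ["attitude", "cancel"]

hits = scan_dialogue(dialogue, keywords)
hits_every_sentence = scan_dialogue(dialogue, keywords, skip_context=False)
```

With `skip_context=True` the third sentence is not matched separately because it was already covered by the context of the second sentence.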

According to an embodiment of the present application, the intention recognition method further includes: under the condition that the first intention label does not comprise the second-level label, inputting the local text information into a second classification model to obtain a third classification result, wherein the third classification result comprises a third intention label and a score of the third intention label; and taking the first intention label and the third intention label as the intention labels of the conversation when the score of the third intention label is larger than or equal to a fourth preset threshold value, wherein the fourth preset threshold value is used for representing the reliability of the third intention label.

When the first intention label does not include a second-level label, the first intention label includes only a first-level label, and that first-level label has no sub-labels. In that case there is no question of this first-level label differing from (e.g., conflicting with) other first-level labels in scene type, so the local text information may be input directly into the second classification model to obtain the third classification result.

Specifically, the second classification model can be trained by using text information with a certain length and labeled labels, so that the second classification model can classify local text information and obtain the intention labels corresponding to the local text information. The length of the text information used to train the second classification model may be consistent with the length of the local text information.

In an example, the third classification result output by the second classification model may be a third intent tag, which may be stored as a final intent tag.

In another example, the third classification result output by the second classification model may be the third intention label and a score of the third intention label, that is, the reliability of the third intention label needs to be determined from the score. For example, the second classification model can recognize a plurality of intention labels, and a preset threshold (the fourth preset threshold) may be set for each of them. The preset threshold may be used to characterize the reliability, or difficulty, of recognizing the intention label: the smaller the fourth preset threshold, the greater the difficulty of recognizing the intention label. Thus, a threshold can be set for each intention label recognizable by the second classification model according to the actual situation. Alternatively, the preset threshold corresponding to each intention label may be determined from data in the training process of the second classification model, for example, the fourth preset threshold is obtained through multi-threshold training. Specifically, the model is trained continuously and evaluated on the validation set according to the importance of the business metrics, such as the accuracy required by the business, so that the accuracy of each category on the validation set is optimal; after training is finished, the locally optimal threshold of each label is obtained, that is, the fourth preset threshold.

When the second classification model outputs the intention label and the score of the intention label, the output intention label can be regarded as a pending intention label, the score of the pending intention label is compared with a preset threshold corresponding to the pending intention label, and if the score is greater than or equal to the preset threshold, the pending intention label can be regarded as a reliable intention label, and the intention label is stored; if the score is less than the preset threshold, the pending intent label may be considered an unreliable intent label, which is discarded. For example, the score of the output intention label of the second classification model is 0.7, and the preset threshold corresponding to the intention label is 0.5, and since 0.7 is greater than 0.5, the intention label is a reliable intention label, and the intention label can be used as the intention label of the dialog.

In this embodiment, by determining the threshold value of each intention label identifiable by the second classification model in advance, after the second classification model obtains the intention label and the score corresponding to the intention label, it is determined whether the score of the intention label is greater than or equal to the corresponding threshold value, if so, the intention label is determined to be the intention label of the local text information, otherwise, the intention label may be discarded, so that the reliability of the finally obtained intention label may be improved.
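The comparison of a pending intention label against its own threshold can be sketched as follows. The label names, the threshold values and the fallback `DEFAULT_THRESHOLD` are illustrative assumptions, not values given in this application.

```python
from typing import Dict, Optional

# Hypothetical per-label thresholds (the "fourth preset threshold").
LABEL_THRESHOLDS: Dict[str, float] = {
    "change payment date": 0.5,
    "cancel account": 0.8,
}
DEFAULT_THRESHOLD = 0.5  # assumed fallback for labels without a tuned threshold


def accept_pending_label(label: str,
                         score: float,
                         thresholds: Dict[str, float] = LABEL_THRESHOLDS
                         ) -> Optional[str]:
    """Return the label when its score reaches its threshold, else None."""
    threshold = thresholds.get(label, DEFAULT_THRESHOLD)
    return label if score >= threshold else None
```

With these assumed values, a score of 0.7 keeps "change payment date" (0.7 is at least 0.5) but discards "cancel account" (0.7 is below 0.8).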

According to an embodiment of the present application, the intention recognition method further includes: inputting, into an emotion recognition model, response information of the user to a question sentence in the dialogue that corresponds to the first intention label, the second intention label and/or the third intention label, to obtain a recognition result; and when the recognition result is affirmative, determining the first intention label, the second intention label and/or the third intention label as the target intention label.

In particular, the computing device may present the intention label obtained from the first classification model and/or the intention label obtained from the second classification model to customer service, so that customer service can confirm with the user whether the intention corresponding to that label is the user's real intention. For example, customer service may ask the user a question based on the intention corresponding to the label, and the computing device may determine from the user's response to that question whether the label is a target intention label, that is, an intention label corresponding to the dialogue. If the user's answer is affirmative, the label is a target intention label; if the answer is negative, the label is not a target intention label and may be discarded.

In a practical application scenario, there may be problems that some sentences expressed by a user are not direct enough or are not clear enough, or accuracy of the classification model itself is limited, and the problems may cause that the intention labels output by the classification model are not accurate enough. In this embodiment, by further confirming the reliability of the intention label to the user, the accuracy and reliability of the target intention label can be improved.

For example, the emotion recognition model may be used to recognize the response information of the user to obtain a recognition result, and when the recognition result is affirmative, the first intention label, the second intention label and/or the third intention label are determined to be target intention labels. Specifically, the training samples of the emotion recognition model may be general expressions of "affirmative" (affirmative data samples) and general expressions of "negative" (negative data samples). For example, an affirmative data sample is "Yes, that is what I meant", and such a sample may be classified as a positive example; a negative data sample is "No, that is not what I meant", and such a sample may be classified as a negative example. The emotion recognition model can be obtained by training a model on the positive and negative example data sets constructed in this way.

In one scenario, the intention label output by the classification model (the first or second classification model) is "cancel account", and customer service considers the reliability of this intention label to be low, so customer service can confirm with the user whether the intention label is accurate. For example, customer service can ask the user whether the user wants to cancel the account; the emotion recognition model then judges the user's response information and outputs "affirmative" or "negative". If the emotion recognition model outputs "affirmative", "cancel account" is a target intention label; if the emotion recognition model outputs "negative", "cancel account" is not a target intention label and can be discarded.

Since there are a plurality of expressions of "affirmative" and "negative", in this embodiment, whether the answer information of the user is an affirmative answer or a negative answer is recognized by the emotion recognition model, the accuracy of identifying the affirmative emotion and the negative emotion of the user can be improved, and the reliability of the obtained target intention label can be further improved.
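The confirmation step can be sketched as a small filter around the emotion recognition model. In the sketch below the model is passed in as a callable, and `toy_is_affirmative` is only a placeholder so the example runs; it is not the trained model, which is meant to handle the many ways of saying yes or no that a prefix check cannot.

```python
from typing import Callable, List


def confirm_labels(candidate_labels: List[str],
                   user_reply: str,
                   is_affirmative: Callable[[str], bool]) -> List[str]:
    """Keep the candidate labels only if the user's reply is affirmative."""
    return list(candidate_labels) if is_affirmative(user_reply) else []


def toy_is_affirmative(reply: str) -> bool:
    """Placeholder for the trained emotion recognition model (assumption)."""
    text = reply.strip().lower()
    return text.startswith(("yes", "right", "correct"))
```

For instance, `confirm_labels(["cancel account"], "Yes, that is what I meant", toy_is_affirmative)` keeps the label, while a reply of "No, that is not what I meant" discards it.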

According to an embodiment of the present application, after the first intention label and the second intention label are used as the intention labels of the dialog, the intention recognition method further includes: acquiring complete text information of a dialogue, wherein the length of the complete text information is larger than or equal to a fifth preset threshold value; inputting the complete text information into a long text multi-intention recognition model to obtain a long text intention label; the first intention label, the second intention label and the long text intention label are used as intention labels of the dialog.

After the dialogue is finished, the whole content (complete text information) of the dialogue can be input into a long text multi-intention recognition model, and the long text multi-intention recognition model can recognize intention spanning a large length in the complete text information.

Specifically, the context information located by keywords and the local text information located by sliding window enumeration are both local content within the entire call, so neither approach can cover intention labels that span long distances in the text information. That is, an intention label that can only be derived from text spread over a long distance cannot be recognized by either approach. To avoid missing such labels, the long text multi-intention recognition model can be used after the call ends to perform intention recognition on the complete text information and obtain the long text intention labels. The long text multi-intention recognition model may identify zero, one, or multiple long text intention labels, depending on the number of long-distance intentions actually contained in the complete text information to be recognized.

For example, the user may state the payment date at the beginning of the conversation and, after many intervening sentences, finally tell customer service that the payment date needs to be changed. Such an intention label can be recognized only across a large span of text. Neither intention recognition based on the context information located by keywords (which may be regarded as task one) nor intention recognition based on the local text information located by sliding window enumeration (which may be regarded as task two) can cover this scene, so the complete text information of the whole conversation may be analyzed by the long text multi-intention recognition model to obtain the long text intention label "change payment date".

The long text multi-intention recognition model can be obtained by training the model by utilizing a plurality of historical complete dialogue samples, and the length of the historical complete dialogue samples can be larger than or equal to a fifth preset threshold value, so that the trained long text multi-intention recognition model can carry out intention recognition on long text information with the length larger than or equal to the fifth preset threshold value. The plurality of historical complete dialogue samples can be marked with different intention labels, so that the trained long text multi-intention recognition model can recognize intention of long text information containing different intention. For example, the long text multi-intent recognition model may be a Longformer or other natural language processing model.

In this embodiment, the intention recognition is performed on the complete text information obtained after the dialogue is ended by using the long text multi-intention recognition model, so that the intention label crossing the long distance can be obtained, the label recall rate of the long text information is improved, and the missing of the intention label is avoided.
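A minimal sketch of this end-of-dialogue step is given below. The long text multi-intention recognition model is passed in as a callable returning zero or more labels; the function name, the value of `FIFTH_PRESET_THRESHOLD` and the choice to measure length in characters are assumptions made for the example.

```python
from typing import Callable, List, Sequence

FIFTH_PRESET_THRESHOLD = 200  # assumed value, measured in characters


def labels_after_dialogue(realtime_labels: Sequence[str],
                          sentences: Sequence[str],
                          long_text_model: Callable[[str], List[str]],
                          min_length: int = FIFTH_PRESET_THRESHOLD
                          ) -> List[str]:
    """Merge labels found during the dialogue with long text labels."""
    full_text = " ".join(sentences)  # complete text information
    labels = list(realtime_labels)
    # Only run the long text model on sufficiently long dialogues.
    if len(full_text) >= min_length:
        for label in long_text_model(full_text):
            if label not in labels:  # avoid duplicate labels
                labels.append(label)
    return labels
```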

In some embodiments, when the local text information is located before the current sentence containing the first intention label, only the local text information may be input into the second classification model to obtain the corresponding intention label. After the dialogue ends, all of the obtained intention labels can be checked further to determine whether any of them contradict each other. If contradictory intention labels exist, the intention label corresponding to the local text information can be discarded and the intention label obtained by keyword matching retained, so that the overall accuracy of the final set of intention labels can be improved.

For example, the intention label in the embodiment of the present application may be a two-level label, that is, the depth of the label is 2: the first-level label may represent the scene in which the user's problem is located, and the second-level label may represent the specific intention in that scene. Whether two intention labels contradict each other can be judged directly from their first-level labels. For example, if the first-level label of the intention label obtained by task one is "repayment in advance" and the first-level label of the intention label obtained by task two is "repayment in delay", the two scenes are contradictory, so the intention label "repayment in delay" obtained by task two can be discarded and "repayment in advance" retained. The contradictory intention labels in the present embodiment may be set in advance according to the actual scene.
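The contradiction check can be sketched as follows, treating each intention label as a (first-level, second-level) pair and the preset contradictions as unordered pairs of first-level labels. The second-level label names and the contents of `CONTRADICTORY_SCENES` are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

Label = Tuple[str, str]  # (first-level label, second-level label)

# Preset pairs of contradictory scenes (illustrative assumption).
CONTRADICTORY_SCENES: Set[FrozenSet[str]] = {
    frozenset({"repayment in advance", "repayment in delay"}),
}


def resolve_contradictions(keyword_labels: List[Label],
                           window_labels: List[Label],
                           contradictory: Set[FrozenSet[str]] = CONTRADICTORY_SCENES
                           ) -> List[Label]:
    """Keep all keyword-matched labels; drop sliding-window labels whose
    first-level label contradicts that of any keyword-matched label."""
    kept = list(keyword_labels)
    keyword_scenes = {first for first, _ in keyword_labels}
    for label in window_labels:
        first = label[0]
        if any(frozenset({first, scene}) in contradictory
               for scene in keyword_scenes):
            continue  # contradicts a keyword-matched label: discard
        kept.append(label)
    return kept
```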

Fig. 3 is a flowchart illustrating an intention recognition method according to another exemplary embodiment of the present application. The embodiment of fig. 3 is an example of the embodiment of fig. 2; to avoid repetition, the points they share are covered by the description of the above embodiment and are not described again here. As shown in fig. 3, the intention recognition method includes the following.

300: dialogue information is received.

Specifically, sentences expressed by the user and customer service can be received in real time.

310: determining, according to the keyword list, whether the current sentence in the dialogue contains keywords in the keyword list.

The keyword list comprises a plurality of tags and at least one keyword corresponding to each tag, the tag recall rate of each keyword in the keyword list is larger than or equal to a first preset threshold value, and the tag recall rate is used for indicating the recall reliability of each keyword to the tag corresponding to each keyword.

If the current sentence contains keywords in the keyword table, step 320 is performed, otherwise step 350 is performed.

320: locating the context information containing the current sentence according to the keywords.

330: classifying the context information by using the first classification model to obtain a first classification result, and outputting the target label as a first intention label in real time when the first classification result is the target label corresponding to the keyword.

The context information meets the requirement of a preset length, and the first classification result is used for indicating whether the intention label corresponding to the context information is consistent with the target label corresponding to the keyword. For example, the first classification result may be a target tag, which may be a first intent tag; or the first classification result may be other labels, where the other labels indicate that the intent label corresponding to the context information is inconsistent with the target label corresponding to the keyword, and no label may be output according to the first classification result.

Specifically, when customer service considers that the accuracy of the first intention label is not high, customer service may ask the user a follow-up question. The user's response to the follow-up question is input into the emotion recognition model to confirm whether the first intention label is a reliable intention label; if so, the label is retained, otherwise it is discarded.

340: taking the sentence that follows the context information as the new current sentence.

Step 310 is repeatedly performed.

350: taking the sentence that follows the current sentence as the new current sentence.

Step 310 is repeatedly performed.

360: sequentially locating the local text information in the dialogue according to the preset step length and the preset window size.

370: when the first intention label comprises a second-level label, determining the second-level label as a reference intention label, inputting a first-level label and local text information to which the reference intention label belongs into a second classification model, and when the first intention label does not comprise the second-level label, inputting the local text information into the second classification model, and outputting a pending intention label and a score corresponding to the pending intention label by the second classification model.

Specifically, when a first intention label has been identified before the local text information and the first intention label comprises a second-level label, the second-level label is determined to be a reference intention label, and the first-level label to which the reference intention label belongs and the local text information are input into the second classification model; when the first intention label does not include a second-level label, only the local text information is input into the second classification model. When no first intention label has been recognized before the local text information, the local text information is likewise input into the second classification model. The output of the second classification model includes the pending intention label and its corresponding score. For the determination of the reference intention label and the classification performed by the second classification model, refer to the description in the above embodiment.

380: judging whether the score corresponding to the pending intention label is greater than or equal to a preset threshold.

If the score corresponding to the pending intent label is greater than or equal to the preset threshold, step 390 is executed, otherwise step 360 is repeatedly executed.

390: outputting the pending intention label in real time.

Specifically, when customer service considers that the accuracy of the pending intention label is not high, customer service may ask the user a follow-up question. The user's response to the follow-up question is input into the emotion recognition model to confirm whether the pending intention label is a reliable intention label; if so, the label is retained, otherwise it is discarded.

391: at the end of the dialog, the complete text information is entered into a long text multi-intent recognition model to obtain a long text intent tag.

Steps 310-350 may be considered task one and steps 360-390 may be considered task two, both of which continue as the dialogue progresses. The intention recognition process of task one may end when the current sentence is the last sentence in the dialogue; the intention recognition process of task two may end when the local text information already contains the last sentence in the dialogue. The intention labels output by task one and task two are aggregated and transmitted, together with the intention labels output by the long text multi-intention recognition model, to downstream services. Further, in a telemarketing scenario, the aggregated intention labels can be checked by a human customer service agent to decide whether to apply them to the final work order. For example, the human agent may adjust the aggregated intention labels according to the actual situation.
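The control flow of the two tasks can be sketched as below. The first classification model is represented by a callable that returns whether the context matches the target label. The function names, the dictionary form of the keyword list, and the assumption that the context window starts at the current sentence are all illustrative; the way the context is actually located is described in the above embodiment.

```python
from typing import Callable, Dict, Iterator, List, Optional


def match_keyword(sentence: str,
                  keyword_table: Dict[str, List[str]]) -> Optional[str]:
    """Step 310: return the label of the first keyword found, if any."""
    for label, keywords in keyword_table.items():
        if any(keyword in sentence for keyword in keywords):
            return label
    return None


def task_one(sentences: List[str],
             keyword_table: Dict[str, List[str]],
             context_size: int,
             first_model: Callable[[List[str], str], bool]) -> List[str]:
    """Steps 310-350: keyword matching checked by the first model."""
    labels: List[str] = []
    i = 0
    while i < len(sentences):
        target = match_keyword(sentences[i], keyword_table)
        if target is None:
            i += 1  # step 350: move to the next sentence
            continue
        # Step 320 (assumed): context starts at the current sentence.
        context = sentences[i:i + context_size]
        if first_model(context, target):  # step 330
            labels.append(target)
        i += max(1, len(context))  # step 340: continue after the context
    return labels


def sliding_windows(sentences: List[str],
                    window_size: int,
                    step: int) -> Iterator[List[str]]:
    """Step 360: yield local text windows until one holds the last sentence."""
    start = 0
    while start < len(sentences):
        yield sentences[start:start + window_size]
        if start + window_size >= len(sentences):
            break  # this window already contains the last sentence
        start += step
```

Each window produced by `sliding_windows` would then be passed to the second classification model (steps 370-390), with its score checked against the per-label threshold.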

Exemplary apparatus

Fig. 4 is a schematic structural diagram of an intention recognition device 400 according to an exemplary embodiment of the present application. As shown in fig. 4, the intention recognition apparatus 400 includes: a determination module 410, a first classification module 420, and a second classification module 430.

A determining module 410, configured to determine, according to a keyword table, whether a current sentence in a dialogue includes keywords in the keyword table, where the keyword table includes a plurality of tags and at least one keyword corresponding to each tag, and a tag recall rate of each keyword in the keyword table is greater than or equal to a first preset threshold; the first classification module 420 is configured to determine, when the current sentence includes a keyword, a target tag corresponding to the keyword, obtain context information including the current sentence, input the context information to the first classification model, obtain a first classification result, where the first classification result is used to determine whether an intention tag corresponding to the current sentence is a target tag, and take the target tag as the first intention tag when the intention tag corresponding to the current sentence is the target tag; the second classification module 430 is configured to sequentially locate local text information in the dialogue according to a preset step size and a preset window size, determine that the second-level tag is a reference intention tag if the first intention tag includes the second-level tag, input the first-level tag to which the reference intention tag belongs and the local text information into the second classification model to obtain a second classification result, where the second classification result includes the second intention tag, and the first-level tag corresponding to the second intention tag is the same as the first-level tag corresponding to the first intention tag in terms of scene type, and the types of the intention tags classified by the first classification model and the second classification model are different; the determining module 410 is further configured to take the first intent tag and the second intent tag as intent tags of the dialog.

The embodiment of the application provides an intention recognition device. Based on a keyword list whose keywords have a high recall rate, the device determines whether the current sentence contains a keyword in the keyword list; when it does, the device locates the context information of the current sentence and classifies the context information with a first classification model to determine whether the intention label corresponding to the context information is the target label corresponding to the keyword. If so, the target label corresponding to the keyword is taken as the first intention label of the current sentence, so that the accuracy of intention recognition can be further improved while a high recall rate is ensured. Moreover, determining intention labels by keyword matching improves the efficiency of intention recognition, and checking the matched intention labels with the first classification model improves the accuracy of the recognition results while preserving that efficiency. Further, the local text information is located by sliding window enumeration, and when the first intention label comprises a second-level label, the first-level label corresponding to the first intention label and the local text information are input into a second classification model to obtain a second intention label. This compensates for the limitations of intention recognition by keyword matching: intentions that keyword matching cannot recognize can be recognized, and missed intention labels can be avoided. It also avoids a conflict between the scene type of the first-level label corresponding to the first intention label and that of the first-level label corresponding to the second intention label, which reduces the possibility that the second classification model outputs an intention label inconsistent with the scene type of the conversation. In addition, with the two methods of keyword matching and sliding window enumeration, the recognized intention labels can be output in real time during the conversation between customer service and the user, so that a multi-label combination for the conversation is obtained and communication strategy support is provided to customer service during the conversation.

According to an embodiment of the present application, the second classification module 430 is configured to: acquiring a first representation vector corresponding to the context information under the condition that the first intention label comprises a second-level label; acquiring a second representation vector of the local text information; calculating a similarity value between the first representation vector and the second representation vector; and determining the secondary label as the reference intention label under the condition that the similarity value is greater than or equal to a second preset threshold value.
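The similarity test that decides whether the second-level label is used as the reference intention label can be sketched as follows. Cosine similarity is used here as one common choice; the similarity measure is not fixed by the text above, and the function names and the threshold value passed in are assumptions.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def reference_label(second_level_label: str,
                    context_vec: Sequence[float],
                    local_vec: Sequence[float],
                    second_threshold: float) -> Optional[str]:
    """Return the second-level label as the reference intention label when
    the context and local text vectors are similar enough, else None."""
    similar = cosine_similarity(context_vec, local_vec) >= second_threshold
    return second_level_label if similar else None
```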

According to an embodiment of the present application, the local text information is text information other than the context information of the current sentence in the complete text information of the dialogue, and the second classification module 430 is configured to: acquiring a third representation vector of the first-level label to which the reference intention label belongs; inputting the third representation vector and the second representation vector into the second classification model; splicing the third representation vector and the second representation vector by using the second classification model to obtain a spliced representation vector; classifying the spliced representation vector to obtain a reference classification result; and acquiring emotion recognition information of the user for the reference classification result in the dialogue, and determining the reference classification result as the second classification result if the emotion recognition information is positive emotion.
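The splicing and classification steps can be illustrated with a toy linear classifier over the concatenated vector. The architecture of the second classification model is not specified here, so the linear layer, the softmax output, and the parameters `weights` and `bias` are assumptions used only to make the data flow concrete.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_spliced(label_vec: Sequence[float],
                     text_vec: Sequence[float],
                     weights: Sequence[Sequence[float]],
                     bias: Sequence[float],
                     labels: Sequence[str]) -> Tuple[str, float]:
    """Splice the first-level label vector with the local text vector,
    then score each candidate label with a linear layer and softmax."""
    spliced = list(label_vec) + list(text_vec)  # spliced representation
    logits = [sum(w * x for w, x in zip(row, spliced)) + b
              for row, b in zip(weights, bias)]
    probs = softmax(logits)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs[best]
```

The returned label and score correspond to the reference classification result, which is then confirmed against the user's response as described above.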

According to an embodiment of the present application, the first classification module 420 is configured to: acquiring a first representation vector of the context information by using the first classification model; and classifying the first representation vector to obtain a first classification result, wherein the first classification result comprises a first probability that the first representation vector belongs to the target label and a second probability that the first representation vector belongs to other labels, and when the first probability is greater than or equal to a third preset threshold, the intention label corresponding to the current sentence is consistent with the target label.

According to an embodiment of the present application, the intention recognition device 400 further comprises a construction module for: determining at least one historical dialogue sample corresponding to each label in a plurality of labels included in the keyword list; extracting candidate keywords from the at least one historical dialogue sample for each label; determining the number of samples, among the at least one historical dialogue sample, in which a candidate keyword appears; determining the ratio between that number of samples and the total number of the at least one historical dialogue sample as the label recall rate of the candidate keyword; and when the label recall rate of the candidate keyword is greater than or equal to a first preset threshold, taking the candidate keyword as a keyword corresponding to the label, and constructing the keyword list from the labels and the at least one keyword corresponding to each label.
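The label recall rate and the construction of the keyword list can be sketched as follows. Candidate keyword extraction is assumed to have been done already and is passed in as `candidates_by_label`; the function names and the use of plain substring matching are assumptions.

```python
from typing import Dict, List


def label_recall(keyword: str, samples: List[str]) -> float:
    """Share of a label's historical dialogue samples containing the keyword."""
    if not samples:
        return 0.0
    hits = sum(1 for sample in samples if keyword in sample)
    return hits / len(samples)


def build_keyword_list(samples_by_label: Dict[str, List[str]],
                       candidates_by_label: Dict[str, List[str]],
                       first_threshold: float) -> Dict[str, List[str]]:
    """Keep, per label, the candidates whose label recall rate reaches
    the first preset threshold; labels with no keyword are omitted."""
    table: Dict[str, List[str]] = {}
    for label, samples in samples_by_label.items():
        kept = [k for k in candidates_by_label.get(label, [])
                if label_recall(k, samples) >= first_threshold]
        if kept:
            table[label] = kept
    return table
```

For a label with three samples, a candidate that appears in two of them has a label recall rate of 2/3 and is kept under a threshold of 0.6, while a candidate that appears in only one is dropped.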

According to an embodiment of the present application, the second classification module 430 is further configured to: and under the condition that the first intention label does not comprise the second-level label, inputting the local text information into the second classification model to obtain a third classification result, wherein the third classification result comprises the third intention label and the score of the third intention label. The determining module 410 is further configured to: and taking the first intention label and the third intention label as the intention labels of the conversation when the score of the third intention label is larger than or equal to a fourth preset threshold value, wherein the fourth preset threshold value is used for representing the reliability of the third intention label.

According to an embodiment of the present application, the intention recognition device 400 further includes an acquisition module. After the first intention label and the second intention label are taken as intention labels of the dialogue, the acquisition module is configured to: acquiring complete text information of the dialogue, wherein the length of the complete text information is greater than or equal to a fifth preset threshold; and inputting the complete text information into a long text multi-intention recognition model to obtain a long text intention label. The determining module 410 is configured to take the first intention label, the second intention label, and the long text intention label as intention labels of the dialogue.

It should be understood that, for the operations and functions of the determining module 410, the first classification module 420, the second classification module 430, the construction module, and the acquisition module in the above embodiments, reference may be made to the description of the intention recognition method provided in the embodiment of fig. 2 or fig. 3 above; to avoid repetition, they are not described again here.

Fig. 5 is a block diagram of an electronic device 500 for performing an intent recognition method according to an exemplary embodiment of the present application.

Referring to fig. 5, an electronic device 500 includes a processor 510; and memory resources represented by memory 520 for storing instructions, such as application programs, executable by processor 510. The application program stored in memory 520 may include one or more modules each corresponding to a set of instructions. Further, the processor 510 is configured to execute instructions to perform the intent recognition method described above.

The electronic device 500 may also include a power component configured to perform power management of the electronic device 500, a wired or wireless network interface configured to connect the electronic device 500 to a network, and an input/output (I/O) interface 530. The electronic device 500 may be operated based on an operating system stored in the memory 520, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.

A non-transitory computer readable storage medium is also provided; when instructions stored in the storage medium are executed by a processor of the electronic device 500, the electronic device 500 is enabled to perform the intention recognition method.

All the above optional solutions may be combined arbitrarily to form an optional embodiment of the present application, which is not described here in detail.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.

In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.

The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.

It should be noted that in the description of the present application, the terms "first," "second," "third," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Furthermore, in the description of the present application, unless otherwise indicated, the meaning of "a plurality" is two or more.

The foregoing description of the preferred embodiments of the present invention is not intended to limit the invention to the precise form disclosed, and any modifications, equivalents, and alternatives falling within the spirit and principles of the present invention are intended to be included within the scope of the present invention.

Claims

1. An intent recognition method, comprising:

determining whether a current sentence in a dialogue contains keywords in a keyword list according to the keyword list, wherein the keyword list comprises a plurality of labels and at least one keyword corresponding to each label, and the label recall rate of each keyword in the keyword list is greater than or equal to a first preset threshold;

determining a target label corresponding to the keyword when the current sentence contains the keyword, acquiring context information containing the current sentence, inputting the context information into a first classification model to obtain a first classification result, wherein the first classification result is used for determining whether an intention label corresponding to the current sentence is the target label, and taking the target label as a first intention label when the intention label corresponding to the current sentence is the target label;

sequentially locating local text information in the dialogue according to a preset step length and a preset window size;

when the first intention label comprises a second-level label, determining the second-level label as a reference intention label, and inputting the first-level label to which the reference intention label belongs, together with the local text information, into a second classification model to obtain a second classification result, wherein the second classification result comprises a second intention label, the scene type of the first-level label corresponding to the second intention label is the same as the scene type of the first-level label corresponding to the first intention label, and the types of the intention labels classified by the first classification model and the second classification model are different;

and taking the first intention label and the second intention label as intention labels of the dialogue.
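
The keyword check and the window-based location of local text recited in claim 1 can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the table contents, the `primary/secondary` label strings, simple substring matching, and the helper names `match_target_label` and `locate_local_text` do not come from the application, which does not prescribe a matching rule or a data layout.

```python
from typing import Dict, List, Optional

# Hypothetical keyword list: label -> keywords (illustrative values only).
KEYWORD_LIST: Dict[str, List[str]] = {
    "repayment/ask_due_date": ["due date", "when to repay"],
    "complaint": ["complain"],
}


def match_target_label(sentence: str, table: Dict[str, List[str]]) -> Optional[str]:
    """Return the label of the first keyword found in the sentence, else None.

    Substring matching is an assumption; the claim only requires determining
    whether the current sentence contains a keyword from the list.
    """
    lowered = sentence.lower()
    for label, keywords in table.items():
        if any(keyword in lowered for keyword in keywords):
            return label
    return None


def locate_local_text(sentences: List[str], window_size: int, step: int) -> List[List[str]]:
    """Slide a window of `window_size` sentences over the dialogue by `step`."""
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    windows: List[List[str]] = []
    for start in range(0, len(sentences), step):
        windows.append(sentences[start:start + window_size])
        # Stop once a window has reached the end of the dialogue.
        if start + window_size >= len(sentences):
            break
    return windows


if __name__ == "__main__":
    dialogue = ["Hello", "When is my due date?", "It is the 5th", "Thanks", "Bye"]
    print(match_target_label(dialogue[1], KEYWORD_LIST))
    print(locate_local_text(dialogue, window_size=3, step=2))
```

In this sketch a step smaller than the window size yields overlapping windows, so a sentence near a window boundary still appears with some of its neighbours in at least one window.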

2. The intention recognition method according to claim 1, wherein determining the second-level label as a reference intention label when the first intention label comprises a second-level label comprises:

acquiring a first representation vector corresponding to the context information under the condition that the first intention label comprises a second-level label;

acquiring a second representation vector of the local text information;

calculating a similarity value between the first representation vector and the second representation vector;

and determining the second-level label as the reference intention label when the similarity value is greater than or equal to a second preset threshold.
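
The similarity test of claim 2 can be sketched as follows. The claim does not name a similarity measure; cosine similarity is used here only as one common choice, and the function names and the threshold value in the usage lines are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def is_reference_label(first_vec: Sequence[float],
                       second_vec: Sequence[float],
                       second_threshold: float) -> bool:
    """True when the context vector and the local-text vector are similar enough
    for the second-level label to serve as the reference intention label."""
    return cosine_similarity(first_vec, second_vec) >= second_threshold


if __name__ == "__main__":
    context_vec = [0.2, 0.9, 0.1]   # hypothetical first representation vector
    local_vec = [0.25, 0.8, 0.0]    # hypothetical second representation vector
    print(is_reference_label(context_vec, local_vec, second_threshold=0.8))
```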

3. The intention recognition method according to claim 2, wherein the local text information is text information, in the complete text information of the dialogue, other than the context information of the current sentence; and inputting the first-level label to which the reference intention label belongs and the local text information into the second classification model to obtain the second classification result comprises:

acquiring a third representation vector of the first-level label to which the reference intention label belongs;

inputting the third representation vector and the second representation vector into the second classification model;

splicing the third representation vector and the second representation vector by using the second classification model to obtain a spliced representation vector;

classifying the spliced representation vector to obtain a reference classification result;

and acquiring emotion recognition information of the user in the dialogue with respect to the reference classification result, and determining the reference classification result as the second classification result if the emotion recognition information indicates a positive emotion.

4. The intention recognition method according to claim 1, wherein acquiring the context information containing the current sentence and inputting the context information into the first classification model to obtain the first classification result comprises:

acquiring a first representation vector of the context information by using the first classification model;

classifying the first representation vector to obtain the first classification result, wherein the first classification result comprises a first probability that the first representation vector belongs to the target label and a second probability that the first representation vector belongs to other labels, and when the first probability is greater than or equal to a third preset threshold, the intention label corresponding to the current sentence is consistent with the target label.
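
The two-probability decision of claim 4 can be illustrated with a small sketch. It assumes that the first classification model emits two raw scores (target label, other labels) that are normalised with a softmax; the claim itself only requires a first and a second probability, so the softmax, the function names, and the example threshold are assumptions.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a sequence of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matches_target_label(logits: Sequence[float], third_threshold: float) -> bool:
    """logits[0] scores the target label, logits[1] scores 'other labels'.

    Returns True when the first probability reaches the third preset threshold.
    """
    first_probability, _second_probability = softmax(logits)
    return first_probability >= third_threshold


if __name__ == "__main__":
    # Hypothetical scores from the first classification model.
    print(matches_target_label([2.0, 0.0], third_threshold=0.8))
    print(matches_target_label([0.0, 0.0], third_threshold=0.8))
```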

5. The intent recognition method as claimed in claim 1, wherein the method further comprises:

determining at least one historical dialogue sample corresponding to each label in the plurality of labels included in the keyword list;

extracting candidate keywords from the at least one historical dialogue sample for each label;

determining the number of samples of the at least one historical dialog sample in which the candidate keyword appears;

determining a ratio between the number of samples and the total number of the at least one historical dialogue sample as the label recall rate of the candidate keyword;

and under the condition that the label recall rate of the candidate keywords is larger than or equal to the first preset threshold, taking the candidate keywords as keywords corresponding to the labels, and constructing the keyword list according to the labels and at least one keyword corresponding to each label.
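
The label recall rate of claim 5 is the number of a label's historical samples that contain the candidate keyword divided by the total number of samples for that label. A minimal sketch of building the keyword list from it follows; the sample texts, the candidate keywords, substring matching, and the function names are illustrative assumptions.

```python
from typing import Dict, List


def label_recall_rate(keyword: str, samples: List[str]) -> float:
    """Fraction of a label's historical dialogue samples that contain the keyword."""
    if not samples:
        return 0.0
    hits = sum(1 for sample in samples if keyword in sample)
    return hits / len(samples)


def build_keyword_list(samples_by_label: Dict[str, List[str]],
                       candidates_by_label: Dict[str, List[str]],
                       first_threshold: float) -> Dict[str, List[str]]:
    """Keep, per label, the candidate keywords whose recall rate meets the threshold."""
    keyword_list: Dict[str, List[str]] = {}
    for label, samples in samples_by_label.items():
        kept = [
            keyword
            for keyword in candidates_by_label.get(label, [])
            if label_recall_rate(keyword, samples) >= first_threshold
        ]
        if kept:
            keyword_list[label] = kept
    return keyword_list


if __name__ == "__main__":
    samples = {"repayment": ["I want to repay today", "how to repay", "hello"]}
    candidates = {"repayment": ["repay", "hello"]}
    # "repay" appears in 2 of 3 samples, "hello" in 1 of 3.
    print(build_keyword_list(samples, candidates, first_threshold=0.5))
```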

6. The intent recognition method as claimed in claim 1, wherein the method further comprises:

inputting the local text information into a second classification model under the condition that the first intention label does not comprise a second-level label, and obtaining a third classification result, wherein the third classification result comprises a third intention label and a score of the third intention label;

and taking the first intention label and the third intention label as the intention labels of the dialogue when the score of the third intention label is greater than or equal to a fourth preset threshold, wherein the fourth preset threshold is used for representing the reliability of the third intention label.

7. The intention recognition method according to any one of claims 1 to 6, wherein after the first intention label and the second intention label are taken as the intention labels of the dialogue, the method further comprises:

acquiring complete text information of the dialogue, wherein the length of the complete text information is greater than or equal to a fifth preset threshold;

inputting the complete text information into a long text multi-intention recognition model to obtain a long text intention label;

and taking the first intention label, the second intention label, and the long text intention label as the intention labels of the dialogue.

8. An intent recognition device, comprising:

a determining module, configured to determine, according to a keyword list, whether a current sentence in a dialogue contains keywords in the keyword list, wherein the keyword list comprises a plurality of labels and at least one keyword corresponding to each label, and the label recall rate of each keyword in the keyword list is greater than or equal to a first preset threshold;

a first classification module, configured to determine a target label corresponding to the keyword when the current sentence contains the keyword, acquire context information containing the current sentence, input the context information into a first classification model to obtain a first classification result, wherein the first classification result is used for determining whether an intention label corresponding to the current sentence is the target label, and take the target label as a first intention label when the intention label corresponding to the current sentence is the target label;

a second classification module, configured to sequentially locate local text information in the dialogue according to a preset step length and a preset window size, determine the second-level label as a reference intention label when the first intention label comprises a second-level label, and input the first-level label to which the reference intention label belongs, together with the local text information, into a second classification model to obtain a second classification result, wherein the second classification result comprises a second intention label, the scene type of the first-level label corresponding to the second intention label is the same as the scene type of the first-level label corresponding to the first intention label, and the types of the intention labels classified by the first classification model and the second classification model are different;

wherein the determining module is further configured to take the first intention label and the second intention label as intention labels of the dialogue.

9. An electronic device, comprising:

a processor; and

a memory for storing instructions executable by the processor,

wherein the processor is configured to perform the intent recognition method as claimed in any one of the preceding claims 1 to 7.

10. A computer-readable storage medium, characterized in that the storage medium stores a computer program for executing the intention recognition method according to any one of the preceding claims 1 to 7.