CN113343677B - Intention identification method and device, electronic equipment and storage medium


Info

Publication number
CN113343677B
Authority
CN
China
Prior art keywords
training
sentence
intention
sub
sentences
Prior art date
Legal status
Active
Application number
CN202110597354.9A
Other languages
Chinese (zh)
Other versions
CN113343677A (en)
Inventor
马丹
曾增烽
Current Assignee
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd
Priority to CN202110597354.9A
Publication of CN113343677A
Application granted
Publication of CN113343677B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The embodiments of the application relate to the field of artificial intelligence and disclose an intention identification method and apparatus, an electronic device, and a storage medium. The intention identification method comprises the following steps: performing sentence division processing on each training sentence in a training data set to obtain a plurality of training sub-sentences of each training sentence; adding a first intention role label to each training sub-sentence, and determining a second intention role label for each training sentence according to the first intention role labels of its training sub-sentences; training a sequence labeling model according to the training data set and the second intention role label of each training sentence to obtain an intention identification model; and acquiring a sentence to be recognized, inputting it into the intention identification model, and obtaining third intention role labels for a plurality of sub-sentences to be recognized of that sentence. In this way, multiple intentions contained in one sentence can be identified effectively, and intention identification efficiency is improved effectively. The application also relates to blockchain technology; for example, the above data may be written into a blockchain for use in scenarios such as intention recognition.

Description

Intention identification method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to an intention recognition method and apparatus, an electronic device, and a storage medium.
Background
With the rapid development of computer technology, the number of professional terms in various specialized fields keeps growing, and a large number of such terms are produced in the related work of enterprises. The task of labeling these terms arises along with them: the intention types of the professional terms are identified according to the labels. When terms need to be labeled, a manual approach is generally adopted, requiring professional staff to label them. This operation is cumbersome, time-consuming and labor-intensive, prone to labeling errors, and does not improve working efficiency.
Disclosure of Invention
The embodiments of the application provide an intention identification method and apparatus, an electronic device, and a storage medium, which can effectively identify multiple intentions contained in one sentence, intelligently decompose the sentence into sub-sentences with different intentions, and identify the intention type of each sub-sentence, thereby effectively improving intention identification efficiency. Each intention of the user can be fully recovered, which improves the user experience.
In a first aspect, an embodiment of the present application discloses an intention identifying method, where the method includes:
acquiring a training data set, wherein the training data set comprises a plurality of training sentences, and performing sentence division processing on each training sentence in the training data set to obtain a plurality of training sub-sentences of each training sentence;
adding a first intention role label to each training sub-sentence in the plurality of training sub-sentences, and determining a second intention role label corresponding to each training sentence according to the first intention role label of each training sub-sentence;
training a sequence labeling model according to the training data set and a second intention role label corresponding to each training sentence in the training data set to obtain an intention identification model;
obtaining a statement to be recognized, inputting the statement to be recognized into the intention recognition model, and obtaining a third intention role label of a plurality of sub-statements to be recognized corresponding to the statement to be recognized, where the third intention role label is used for indicating an intention type, and the intention type includes one or more of an object intention type, an action intention type, a situation intention type, and a question intention type.
In a second aspect, an embodiment of the present application discloses an intention identification apparatus, including:
the sentence dividing unit is used for acquiring a training data set, wherein the training data set comprises a plurality of training sentences, and performing sentence dividing processing on each training sentence in the training data set to obtain a plurality of training sub-sentences of each training sentence;
a determining unit, configured to add a first intention role label to each of the plurality of training sub-sentences, and determine, according to the first intention role label of each training sub-sentence, a second intention role label corresponding to each training sentence;
the training unit is used for training the sequence labeling model according to the training data set and a second intention role label corresponding to each training sentence in the training data set to obtain an intention identification model;
the recognition unit is configured to acquire a statement to be recognized, input the statement to be recognized into the intention recognition model, and obtain a third intention role tag of a plurality of sub statements to be recognized corresponding to the statement to be recognized, where the third intention role tag is used to indicate an intention type, and the intention type includes one or more of an object intention type, an action intention type, a situation intention type, and a question intention type.
In a third aspect, an embodiment of the present application discloses an electronic device, including a processor and a memory, where the memory is used for storing a computer program, the computer program includes program instructions, and the processor is configured to call the program instructions to execute the method of the first aspect.
In a fourth aspect, embodiments of the present application disclose a computer-readable storage medium storing a computer program, the computer program comprising program instructions that, when executed by a processor, cause the processor to perform the method of the first aspect.
In the embodiment of the application, the electronic device may obtain a training data set including a plurality of training sentences, and perform sentence division processing on each training sentence in the training data set to obtain a plurality of training sub-sentences of each training sentence. A first intention role label is then added to each of the training sub-sentences, and a second intention role label for each training sentence is determined according to the first intention role labels of its training sub-sentences. Next, the sequence labeling model is trained according to the training data set and the second intention role label of each training sentence to obtain the intention recognition model. The intention recognition model can then be used to recognize a sentence to be recognized, yielding the third intention role labels of the plurality of sub-sentences to be recognized corresponding to that sentence, where a third intention role label may be used to indicate an intention type. By implementing this method, multiple intentions contained in one sentence can be identified effectively: a sentence is intelligently decomposed into sub-sentences with different intentions, and the intention type of each sub-sentence is identified, so that intention identification efficiency is improved effectively. Each intention of the user can be fully recovered, which improves the user experience.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of an intention identification method provided in an embodiment of the present application;
FIG. 2 is a schematic flow chart diagram of another method for identifying intentions provided by embodiments of the present application;
FIG. 3 is a schematic structural diagram of an intention identifying apparatus provided in an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating an intention identification method according to an embodiment of the present application. The intention identifying method described in this embodiment is applied to an electronic device, and may be executed by the electronic device, where the electronic device may be a server or a terminal. As shown in fig. 1, the intention identifying method includes the steps of:
s101: and acquiring a training data set, and performing sentence splitting processing on each training sentence in the training data set to obtain a plurality of training sub-sentences of each training sentence.
In one implementation, a training data set for training a sequence labeling model may be obtained, so that the sequence labeling model can be trained on the training data set; the trained sequence labeling model is the intention recognition model. The training data set may include a plurality of training sentences, and each training sentence may be a question. For example, in a search tool, a user may perform a related search using the search system, entering a question by voice or by typing, such as "account password is forgotten, roll-out transaction fails, how to do", and the search tool then looks up a solution to the question. That is, historical search records may be obtained from the search engine, and the records containing questions may be extracted; the obtained questions are the training sentences described above, and together they form the training data set. The training data set may also be obtained in other ways, for example from an intelligent question-answering robot, whose dialogue system contains questions and answers; the questions constitute the training data set to be obtained. Alternatively, the training data set may be a data set from a specific field, such as banking or insurance; for example, questions commonly asked of an intelligent question-answering robot used in the banking field may be collected. After the training data set is obtained, sentence division processing may be performed on each training sentence in the training data set to obtain a plurality of training sub-sentences of each training sentence.
In one implementation, the following description uses any one training sentence among the plurality of training sentences as an example of obtaining a plurality of training sub-sentences. For any training sentence in the training data set, sentence division processing may be performed according to a target sentence division rule to obtain a plurality of initial training sub-sentences of that training sentence. Optionally, the target sentence division rule may split the training sentence at punctuation marks; the initial training sub-sentences may then be any combination of the resulting clauses, combined in the order in which the clauses appear in the original training sentence. For example, the training sentence "account password forgotten, roll-out transaction failed, how to do" is divided at the commas to obtain 3 clauses ("account password forgotten", "roll-out transaction failed", "how to do"), and the 3 clauses are then combined to obtain a plurality of initial training sub-sentences, such as "account password forgotten", "account password forgotten, how to do", "roll-out transaction failed, how to do", and so on. Considering that a single training sentence may yield many initial training sub-sentences in this way, training the subsequent sequence labeling model with too many of them could affect the quality of the resulting intention recognition model. A preset threshold may therefore be used to filter the initial training sub-sentences. Specifically, a first number of initial training sub-sentences may be determined, and it is judged whether this first number exceeds the preset threshold. If it does, a second number of initial training sub-sentences may be deleted from the plurality of initial training sub-sentences to obtain the plurality of training sub-sentences of the training sentence, the second number being the difference between the first number and the preset threshold. If the first number does not exceed the preset threshold, the plurality of initial training sub-sentences may be determined as the plurality of training sub-sentences of the training sentence.
For example, if the first number of initial training sub-sentences of a certain training sentence is 5 and the preset threshold is 2, the first number (5) exceeds the preset threshold (2), so 3 (the second number = 5 - 2) initial training sub-sentences may be deleted from the 5 initial training sub-sentences, and the 2 remaining initial training sub-sentences are the plurality of training sub-sentences of that training sentence. If the first number of initial training sub-sentences of a certain training sentence is 2 and the preset threshold is 2, the first number (2) equals the preset threshold (2), and the 2 initial training sub-sentences are the plurality of training sub-sentences of the training sentence.
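A minimal sketch of this sentence division and threshold filtering is shown below. The comma-based splitting rule, the combination strategy, the function names and the way surplus sub-sentences are dropped are illustrative assumptions, not part of the claimed method:

```python
import re
from itertools import combinations

def split_training_sentence(sentence, preset_threshold=2, delimiters="，,。？?"):
    """Divide a training sentence at punctuation marks and combine the clauses
    (keeping their original order) into initial training sub-sentences, then
    keep at most `preset_threshold` of them."""
    clauses = [c for c in re.split("[" + re.escape(delimiters) + "]", sentence) if c]
    initial_sub_sentences = []
    for r in range(1, len(clauses) + 1):
        # combinations() preserves the original clause order inside each combination
        for combo in combinations(clauses, r):
            initial_sub_sentences.append("，".join(combo))
    first_number = len(initial_sub_sentences)
    if first_number > preset_threshold:
        # Delete a second number (= first number - preset threshold) of sub-sentences;
        # a quality criterion (e.g. keeping only question-like combinations) could decide
        # which ones to drop, here we simply keep the first `preset_threshold`.
        return initial_sub_sentences[:preset_threshold]
    return initial_sub_sentences

# e.g. split_training_sentence("账户密码忘记了，转出交易失败，怎么办")
```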
Optionally, when screening, the initial training sub-sentences of good quality may be retained, where quality may be judged by whether the initial training sub-sentence is a question. For example, "account password forgotten" merely states a problem and is not a question, so that initial training sub-sentence may be deleted. By contrast, "account password forgotten, how to do" asks how to handle a forgotten account password, and "roll-out transaction failed, how to do" asks how to handle a failed transfer-out transaction; both of these 2 initial training sub-sentences are questions and may therefore be retained.
In one implementation, the acquired training data set may contain abnormal sentences among the training sentences. An abnormal sentence can be understood as one that asks no question, i.e. a training sentence that is not a question; for example, "roll-out transaction failed" states a problem but does not ask anything. In this case, after the training data set is obtained, it may be filtered to ensure that all training sentences in the filtered data set are questions, which improves the reliability of the training data in subsequent model training and thus the accuracy of intention recognition with the model. Specifically, each training sentence in the training data set may be checked to detect whether it is an abnormal sentence, i.e. a training sentence that is not a question. If the detection result is that the training sentence is an abnormal sentence, the training sentence is filtered out; if it is not an abnormal sentence, the step of performing sentence division processing on each training sentence in the training data set to obtain a plurality of training sub-sentences of each training sentence can be executed.
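A crude sketch of this abnormal-sentence filter follows. The question-marker heuristic is purely an assumption for illustration; in practice a dedicated classifier could perform the detection:

```python
def is_question(sentence, question_markers=("怎么办", "如何", "吗", "?", "？")):
    """Return True if the training sentence looks like a question (i.e. is not
    an abnormal sentence). The marker list is an illustrative assumption."""
    return any(marker in sentence for marker in question_markers)

def filter_training_data_set(training_sentences):
    """Keep only the training sentences that are questions."""
    return [s for s in training_sentences if is_question(s)]
```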
S102: and adding a first intention role label to each training sub-sentence in the plurality of training sub-sentences, and determining a second intention role label corresponding to each training sentence according to the first intention role label of each training sub-sentence.
In one implementation, for any training sub-sentence among the plurality of training sub-sentences, a plurality of intention types included in the training sub-sentence may be determined. Optionally, the intention types may include the following four types: object (Arg for short), action (Act for short), situation or problem (Pro for short), and question (Que for short). Arg, Act, Pro and Que may serve as the intention type tags, or target intention type tags, in the following description. After the plurality of intention types included in the training sub-sentence are determined, the training sub-sentence may be divided according to those intention types to obtain a plurality of segment sentences, where each segment sentence corresponds to one of the intention types and the intention types of different segment sentences differ. Then, the position of each word in each segment sentence within that segment sentence may be determined, so that a fourth intention role label corresponding to each word in the training sub-sentence can be determined according to the intention type of each segment sentence and the position of each word within it; the fourth intention role label may include an intention type tag and a position tag. Finally, the first intention role label of the training sub-sentence may be determined according to the fourth intention role labels of its words.
In one implementation, for any training sentence among the plurality of training sentences, a third number of training sub-sentences included in that training sentence may be determined, and sequence number labels may be determined according to the third number, where the sequence number labels are used to indicate the different training sub-sentences. For example, if the third number is 2, the sequence number labels may be 1, 2 and 12, where 1 indicates that a word is in the first training sub-sentence, 2 indicates that a word is in the second training sub-sentence, and 12 indicates that a word is in both the first and the second training sub-sentence. After the sequence number labels are determined, the second intention role label of the training sentence can be determined according to the sequence number labels and the first intention role label of each training sub-sentence of the training sentence.
S103: and training the sequence labeling model according to the training data set and a second intention role label corresponding to each training sentence in the training data set to obtain an intention identification model.
In one implementation, the sequence labeling model may include a feature extraction module and an intention recognition module. The training data set may be input into the feature extraction module, which performs feature extraction on it to obtain feature vector data corresponding to the training data set, where the feature vector data may include the feature vectors corresponding to the words in each training sentence. Optionally, taking the feature extraction module as a BERT layer (which may be understood as a BERT model) as an example, the input of the BERT model is a sentence with the Chinese character as the unit, and the output of the BERT model is a feature vector for each character of the sentence, i.e. a character vector learned for each character. After the feature vector data is obtained, it may be passed through the intention recognition module to obtain a predicted intention role label for each training sentence of the training data set, so that the model parameters of the sequence labeling model can be optimized according to the predicted intention role label and the second intention role label of each training sentence to obtain the intention recognition model.
In one implementation, optimizing the model parameters of the sequence labeling model according to the predicted intention role label and the second intention role label of each training sentence to obtain the intention recognition model may be implemented as follows. The gradient of a loss function is calculated from the predicted intention role label and the second intention role label; the choice of loss function is not limited in this application. The model parameters of the sequence labeling model are then updated according to the gradient of the loss function, and it is detected whether the loss function satisfies a preset convergence condition; when it does, the parameter updating is stopped and the intention recognition model is obtained. The preset convergence condition may be that the gradient of the loss function is smaller than a preset threshold, that the change in the weights between two iterations is smaller than a preset threshold, or that the number of iterations of the model reaches a preset maximum; when any one of these conditions is met, the training of the sequence labeling model may be stopped.
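A hedged sketch of the feature extraction module, the intention recognition module and the training loop is given below, assuming PyTorch and the Hugging Face transformers library, with a per-character linear classification head; the optimizer, learning rate, loss function and convergence threshold are illustrative choices, not prescribed by the method:

```python
import torch
import torch.nn as nn
from transformers import BertModel

class SequenceLabelingModel(nn.Module):
    """Feature extraction module (BERT) followed by an intention recognition
    module (here a simple per-character linear classifier)."""
    def __init__(self, num_labels, pretrained_name="bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(pretrained_name)
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        # BERT outputs a feature vector for every character of the sentence
        hidden = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        return self.classifier(hidden)  # per-character label logits

def train_intention_model(model, data_loader, num_labels, max_epochs=10, loss_threshold=1e-3):
    optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)
    loss_fn = nn.CrossEntropyLoss(ignore_index=-100)  # -100 marks padding positions
    model.train()
    for epoch in range(max_epochs):                   # preset maximum number of iterations
        for input_ids, attention_mask, labels in data_loader:
            logits = model(input_ids, attention_mask)
            loss = loss_fn(logits.view(-1, num_labels), labels.view(-1))
            optimizer.zero_grad()
            loss.backward()                           # gradient of the loss function
            optimizer.step()                          # update the model parameters
        if loss.item() < loss_threshold:              # a simple preset convergence condition
            return model
    return model
```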
S104: and acquiring a sentence to be recognized, inputting the sentence to be recognized into the intention recognition model, and obtaining a third intention role label of a plurality of sub sentences to be recognized corresponding to the sentence to be recognized.
In one implementation, a sentence to be recognized may be obtained and input into the intention recognition model to obtain a candidate intention role label for each word (character) of the sentence; from these candidate intention role labels, the third intention role labels of the plurality of sub-sentences to be recognized corresponding to the sentence may then be determined, where a third intention role label may be used to indicate an intention type. Optionally, the candidate intention role label of each word may include a target sequence number tag, a target position tag and a target intention type tag. For example, if the sentence to be recognized is "account password forgotten, roll-out transaction failed, how to do", then after the sentence passes through the intention recognition model, the candidate intention role labels of its characters may be "account B-Arg1", "user I-Arg1", "secret I-Arg1", "code I-Arg1", "forget B-Pro1", "remember I-Pro1", ", O", "roll B-Arg2", "out I-Arg2", "trade I-Arg2", "easy I-Arg2", "lose B-Pro2", "lose I-Pro2", ", O", "how B-Que12", "how I-Que12" and "do I-Que12", each Chinese character of the sentence receiving one label. Taking "account B-Arg1" as an example, the candidate intention role label of the character "account" is "B-Arg1", where "B" is the target position tag, "Arg" is the target intention type tag, and "1" is the target sequence number tag.
Then, the plurality of sub-sentences to be recognized of the sentence to be recognized may be determined according to the target sequence number tag of each word. Optionally, the words whose target sequence number tags belong to the same sub-sentence may be combined into one sub-sentence to be recognized: for example, all words whose target sequence number tag contains 1 form the first sub-sentence to be recognized, and all words whose target sequence number tag contains 2 form the second sub-sentence to be recognized. When the target sequence number tag of a word is not a single number, for example "how B-Que12" in the above example, whose target sequence number tag is "12", the word is combined both with the words whose target sequence number tag is 1 (as part of the first sub-sentence to be recognized) and with the words whose target sequence number tag is 2 (as part of the second sub-sentence to be recognized). Thus, according to the target sequence number tag of each word, the two sub-sentences to be recognized of the sentence to be recognized are "account password forgotten, how to do" and "roll-out transaction failed, how to do", respectively. After the plurality of sub-sentences to be recognized are determined, the third intention role label of each sub-sentence to be recognized can be determined according to the target position tag and the target intention type tag of each of its words, as shown in the sketch below.
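The grouping of predicted labels into sub-sentences to be recognized by their target sequence number tags could be sketched as follows; treating the trailing digits of each tag as the sequence numbers is an assumption about the tag format made for illustration:

```python
def group_by_sequence_number(characters, predicted_tags):
    """Assemble the sub-sentences to be recognized from the per-character
    candidate intention role labels. A tag whose sequence-number part holds
    several digits (e.g. "B-Que12") places the character in every one of the
    corresponding sub-sentences; characters tagged "O" are skipped."""
    sub_sentences = {}
    for ch, tag in zip(characters, predicted_tags):
        if tag == "O":
            continue
        sequence_numbers = "".join(c for c in tag if c.isdigit())
        for number in sequence_numbers:
            sub_sentences.setdefault(int(number), []).append((ch, tag))
    return sub_sentences

# Example labels taken from the description above (Chinese characters shown for clarity):
characters = list("账户密码忘记，转出交易失败，怎么办")
tags = ["B-Arg1", "I-Arg1", "I-Arg1", "I-Arg1", "B-Pro1", "I-Pro1", "O",
        "B-Arg2", "I-Arg2", "I-Arg2", "I-Arg2", "B-Pro2", "I-Pro2", "O",
        "B-Que12", "I-Que12", "I-Que12"]
sub_sentences = group_by_sequence_number(characters, tags)
# sub_sentences[1] covers "账户密码忘记怎么办", sub_sentences[2] covers "转出交易失败怎么办"
```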
In one implementation, for any of the obtained sub-sentences to be recognized, the third intention role label of that sub-sentence may be determined as follows. A plurality of target segment sentences of the sub-sentence may be determined according to the target position tag of each of its words. The target position tag is the B or I in the candidate intention role label of each word, where "B" indicates that the element is at the beginning of a segment and "I" indicates that the element is in the middle of a segment. The plurality of target segment sentences can thus be obtained from the target position tags of the words of the sub-sentence. For example, for the sub-sentence to be recognized "account password forgotten, how to do", the candidate intention role labels of its words are "account B-Arg1", "user I-Arg1", "secret I-Arg1", "code I-Arg1", "forget B-Pro1", "remember I-Pro1", "how B-Que12", "how I-Que12" and "do I-Que12". According to the target position tags included in these candidate intention role labels, the sub-sentence can be divided into 3 target segment sentences: "account password", "forgotten" and "how to do". In the same way, the sub-sentence to be recognized "roll-out transaction failed, how to do" may also be divided into 3 target segment sentences: "roll-out transaction", "failed" and "how to do".
Then, after the plurality of target segment sentences of the sub-sentence to be recognized are obtained, the third intention role label of the sub-sentence can be determined according to the target intention type tag of each target segment sentence. Optionally, for any target segment sentence among the plurality of target segment sentences, the target intention type tag of that segment sentence may be determined from the target intention type tags of its words; the target intention type tag shared by its words is the target intention type tag of the target segment sentence. For example, for the target segment sentence "account password", the candidate intention role labels of its words are "account B-Arg1", "user I-Arg1", "secret I-Arg1" and "code I-Arg1"; the target intention type tag included in each of these candidate intention role labels is "Arg", so the target intention type tag of the segment sentence is "Arg". After the target intention type tag of every target segment sentence of the sub-sentence to be recognized is obtained, the third intention role label of the sub-sentence can be determined from these tags. The third intention role label may be used to indicate an intention type, and the intention type may include one or more of an object intention type, an action intention type, a situation intention type and a question intention type. The third intention role label of the sub-sentence to be recognized may comprise the target intention type tag of each of its target segment sentences. For example, the third intention role labels of the plurality of sub-sentences to be recognized corresponding to the sentence to be recognized are: account password: Arg, forgotten: Pro, how to do: Que; roll-out transaction: Arg, failed: Pro, how to do: Que.
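Continuing the sketch above, splitting one sub-sentence to be recognized into target segment sentences by its B/I position tags and reading off the target intention type tag of each segment could look like this; the way the labels are parsed is again an illustrative assumption:

```python
def decode_sub_sentence(characters_with_tags):
    """Split a sub-sentence to be recognized into target segment sentences using
    the B/I position tags, and take the shared intention type tag of its
    characters as each segment's target intention type tag."""
    segments = []
    for ch, tag in characters_with_tags:
        position, rest = tag.split("-", 1)                          # "B-Arg1" -> "B", "Arg1"
        intent_type = "".join(c for c in rest if not c.isdigit())   # "Arg1" -> "Arg"
        if position == "B" or not segments:                         # "B" starts a new target segment sentence
            segments.append(([ch], intent_type))
        else:                                                       # "I" continues the current segment
            segments[-1][0].append(ch)
    # The third intention role label of the sub-sentence: one (segment, type) pair per segment
    return [("".join(chars), intent_type) for chars, intent_type in segments]

# e.g. decode_sub_sentence(sub_sentences[1])
# -> [("账户密码", "Arg"), ("忘记", "Pro"), ("怎么办", "Que")]
```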
In the embodiment of the application, multiple intentions contained in one sentence can be identified effectively: a sentence is intelligently decomposed into sub-sentences with different intentions, and the intention type of each sub-sentence is identified, so that intention identification efficiency is improved effectively. Each intention of the user can be fully recovered, which improves the user experience.
Referring to fig. 2, fig. 2 is a schematic flowchart illustrating another intention identification method according to an embodiment of the present application. The intention identification method described in this embodiment is applied to an electronic device, and may be executed by the electronic device, where the electronic device may be a server or a terminal. As shown in fig. 2, the intention identification method includes the steps of:
s201: and acquiring a training data set, and performing sentence splitting processing on each training sentence in the training data set to obtain a plurality of training sub-sentences of each training sentence.
S202: for any of a plurality of training sub-sentences, a plurality of intent types included in the training sub-sentence is determined.
S203: and dividing the training sub-sentences according to the plurality of intention types to obtain a plurality of segment sentences.
The intention types may include the following four intention types: the object intention type, the action intention type, the situation intention type, and the question intention type.
In steps S202-S203, the intention types included in each training sub-sentence may be determined according to the above four intention types, and one training sub-sentence may include multiple intention types. For example, for the training sentence "account password forgotten, roll-out transaction failed, how to do", consider the training sub-sentence "account password forgotten, how to do". The "account password" in this training sub-sentence is the object of the sentence, "forgotten" is the situation of the sentence, and "how to do" is the question of the sentence, so the plurality of intention types included in the training sub-sentence are object, situation and question, respectively. After the plurality of intention types included in the training sub-sentence are determined, the training sub-sentence may be divided according to them to obtain a plurality of segment sentences, where each segment sentence corresponds to one of the intention types and the intention types of different segment sentences differ. For example, in the training sub-sentence "account password forgotten, how to do" above, the intention type of "account password" is object, that of "forgotten" is situation, and that of "how to do" is question; the training sub-sentence may therefore be divided into 3 segment sentences: "account password", "forgotten" and "how to do".
S204: the position of each word in each segment sentence in the segment sentence is determined.
In one implementation, the application may add labels to a training sentence using the BIO labeling scheme, so as to obtain in a subsequent step the second intention role label corresponding to the training sentence. BIO labeling means that each element is labeled as "B-X", "I-X" or "O", where "B-X" indicates that the segment containing the element is of type X and the element is at the beginning of the segment, "I-X" indicates that the segment containing the element is of type X and the element is in the middle of the segment, and "O" indicates that the element does not belong to any type. Therefore, the position of each word within its segment sentence needs to be determined, so that the position tag can be determined from the position; the position tag is B or I as described above. For example, the position of "account" in the segment sentence "account password" is the beginning of the segment sentence, while the positions of "user", "secret" and "code" in "account password" are the middle of the segment sentence.
S205: and determining a fourth intention role label corresponding to each word in the training sub-sentence according to the intention type corresponding to each segment sentence and the position of each word in each segment sentence in the segment sentence.
S206: and determining a first intention character label of the training sub-sentence according to a fourth intention character label corresponding to each word.
In steps S205-S206, the fourth intention role label may include an intention type tag and a position tag. A fourth intention role label may be added to each word using the BIO scheme described above, where X is the intention type tag of the application and B and I are the position tags. Optionally, the fourth intention role label of each word in each segment sentence may be determined according to the intention type of the segment sentence and the position of the word within it. Taking the segment sentence "account password" as an example: step S202 shows that the intention type of "account password" is object, so the intention type tag of the segment sentence is Arg; step S204 shows that the position of "account" in the segment sentence is its beginning, so the position tag of "account" is B, while the positions of "user", "secret" and "code" are the middle of the segment sentence, so their position tags are all I. The fourth intention role labels of the words in the segment sentence "account password" are therefore "account B-Arg", "user I-Arg", "secret I-Arg" and "code I-Arg". In this way the fourth intention role label of each word in each segment sentence, and hence of each word in the training sub-sentence, can be determined; the fourth intention role labels of the words in the training sub-sentence are exactly the fourth intention role labels of the words in its segment sentences. For example, the fourth intention role labels of the words in the training sub-sentence "account password forgotten, how to do" are "account B-Arg", "user I-Arg", "secret I-Arg", "code I-Arg", "forget B-Pro", "remember I-Pro", "how B-Que", "how I-Que" and "do I-Que". The first intention role label of the training sub-sentence is then determined from the fourth intention role labels of its words: the first intention role label of the training sub-sentence may be the set of fourth intention role labels of its words.
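A brief sketch of building the per-character fourth intention role labels (and thereby the first intention role label of the training sub-sentence) from its segment sentences; the (text, tag) pair representation is an assumption made for illustration:

```python
def label_training_sub_sentence(segment_sentences):
    """Build the fourth intention role labels of a training sub-sentence from its
    segment sentences. `segment_sentences` is a list of (segment_text, intention_type_tag)
    pairs, e.g. [("账户密码", "Arg"), ("忘记", "Pro"), ("怎么办", "Que")]. The resulting
    per-character labels together form the first intention role label of the sub-sentence."""
    labels = []
    for text, intent_type in segment_sentences:
        for i, ch in enumerate(text):
            position = "B" if i == 0 else "I"    # first character of the segment gets "B"
            labels.append((ch, position + "-" + intent_type))
    return labels
```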
S207: and determining a second intention role label corresponding to each training sentence according to the first intention role label of each training sub-sentence.
In one implementation, for any training sentence among the plurality of training sentences, a third number of training sub-sentences included in the training sentence is determined, and sequence number labels are determined according to the third number, where the sequence number labels are used to indicate the different training sub-sentences. For example, if the third number is 2, the sequence number labels may be 1, 2 and 12, where 1 indicates that a word is in the first training sub-sentence, 2 indicates that a word is in the second training sub-sentence, and 12 indicates that a word is in both the first and the second training sub-sentence. After the sequence number labels are determined, the second intention role label of the training sentence can be determined from the sequence number labels and the first intention role label of each training sub-sentence of the training sentence. Optionally, a sequence number label may be appended to the first intention role label of each training sub-sentence to obtain the second intention role label of the training sentence. For example, the first intention role label of the first training sub-sentence "account password forgotten, how to do" is "account B-Arg", "user I-Arg", "secret I-Arg", "code I-Arg", "forget B-Pro", "remember I-Pro", "how B-Que", "how I-Que" and "do I-Que", and the first intention role label of the second training sub-sentence "roll-out transaction failed, how to do" is "roll B-Arg", "out I-Arg", "trade I-Arg", "easy I-Arg", "lose B-Pro", "lose I-Pro", "how B-Que", "how I-Que" and "do I-Que".
Then, by appending the sequence number labels to the first intention role labels, the second intention role label of the training sentence "account password forgotten, roll-out transaction failed, how to do" is obtained: "account B-Arg1", "user I-Arg1", "secret I-Arg1", "code I-Arg1", "forget B-Pro1", "remember I-Pro1", "roll B-Arg2", "out I-Arg2", "trade I-Arg2", "easy I-Arg2", "lose B-Pro2", "lose I-Pro2", "how B-Que12", "how I-Que12" and "do I-Que12". Here the "1" in "B-Arg1" indicates that the word appears in the first training sub-sentence, the "2" in "B-Arg2" indicates that the word appears in the second training sub-sentence, and the "12" in "I-Que12" indicates that the word appears in both the first and the second training sub-sentence.
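The merging of the sub-sentence labels into the second intention role label of the whole training sentence could be sketched as below; locating each segment in the original sentence by substring search, and tagging unmatched characters (such as punctuation) as "O", are simplifying assumptions made for illustration:

```python
def label_training_sentence(sentence, sub_sentence_segments):
    """Build the second intention role label of a training sentence. Each character
    receives its BIO intention type tag plus the sequence numbers of every training
    sub-sentence that contains it; characters outside all segments are tagged "O"."""
    tags = ["O"] * len(sentence)
    numbers = [""] * len(sentence)
    for sequence_number, segments in enumerate(sub_sentence_segments, start=1):
        for text, intent_type in segments:
            start = sentence.find(text)                   # position of the segment in the sentence
            if start < 0:
                continue
            for offset in range(len(text)):
                tags[start + offset] = ("B-" if offset == 0 else "I-") + intent_type
                numbers[start + offset] += str(sequence_number)  # shared segments collect both numbers
    return [(ch, tag if tag == "O" else tag + number)
            for ch, tag, number in zip(sentence, tags, numbers)]

# e.g. label_training_sentence("账户密码忘记，转出交易失败，怎么办",
#         [[("账户密码", "Arg"), ("忘记", "Pro"), ("怎么办", "Que")],
#          [("转出交易", "Arg"), ("失败", "Pro"), ("怎么办", "Que")]])
# gives "账 B-Arg1", ..., "怎 B-Que12", "么 I-Que12", "办 I-Que12", with the commas tagged "O".
```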
S208: and training the sequence labeling model according to the training data set and a second intention role label corresponding to each training sentence in the training data set to obtain an intention identification model.
S209: and acquiring a sentence to be recognized, inputting the sentence to be recognized into the intention recognition model, and obtaining a third intention role label of a plurality of sub sentences to be recognized corresponding to the sentence to be recognized.
For specific implementation of steps S201 and S208 to S209, reference may be made to the detailed description of steps S101 and S103 to S104 in the foregoing embodiment, and details are not repeated here.
In the embodiment of the application, an intention recognition model can be built using a large amount of labeled data, and the model is simple to build. The intention recognition model can be applied to multi-intention scenarios and can effectively identify the multiple intentions contained in a sentence; labeling of intention types in a multi-intention scenario does not depend on common natural language processing tools such as word segmentation. A sentence can be intelligently decomposed into sub-sentences with different intentions and the intention type of each sub-sentence identified, so that intention identification efficiency is improved effectively. Each intention of the user can be fully recovered, which improves the user experience.
Referring to fig. 3, fig. 3 is a schematic structural diagram of an intention identifying apparatus according to an embodiment of the present application, where the intention identifying apparatus includes:
a sentence dividing unit 301, configured to obtain a training data set, where the training data set includes a plurality of training sentences, and perform sentence dividing processing on each training sentence in the training data set to obtain a plurality of training sub-sentences of each training sentence;
a determining unit 302, configured to add a first intention role label to each of the plurality of training sub-sentences, and determine, according to the first intention role label of each training sub-sentence, a second intention role label corresponding to each training sentence;
a training unit 303, configured to train a sequence labeling model according to the training data set and a second intention role label corresponding to each training sentence in the training data set, so as to obtain an intention identification model;
an identifying unit 304, configured to obtain a statement to be identified, and input the statement to be identified into the intention identification model, to obtain a third intention role tag of a plurality of sub statements to be identified corresponding to the statement to be identified, where the third intention role tag is used to indicate an intention type, and the intention type includes one or more of an object intention type, an action intention type, a situation intention type, and a question intention type.
In an implementation manner, the sentence dividing unit 301 is specifically configured to:
for any training sentence in the training sentences in the training data set, performing sentence division processing on the training sentences according to a target sentence division rule to obtain a plurality of initial training sub-sentences of the training sentences;
determining a first number of the initial training sub-sentences, and judging whether the first number of the initial training sub-sentences exceeds a preset threshold value;
if the first number of the initial training sub-sentences exceeds a preset threshold, deleting a second number of the initial training sub-sentences from the plurality of initial training sub-sentences to obtain a plurality of training sub-sentences of the training sentences, wherein the second number is a difference value between the first number and the preset threshold;
and if the first number of the initial training sub-sentences does not exceed the preset threshold, determining the plurality of initial training sub-sentences as a plurality of training sub-sentences of the training sentence.
In an implementation manner, the determining unit 302 is specifically configured to:
determining a plurality of intent types included in the training sub-sentence for any of the plurality of training sub-sentences;
dividing the training sub-sentences according to the plurality of intention types to obtain a plurality of segment sentences, wherein each segment sentence corresponds to one intention type in the plurality of intention types, and the intention types corresponding to the segment sentences are different;
determining the position of each word in each section statement in the section statement;
determining a fourth intention role label corresponding to each word in the training sub-sentence according to the intention type corresponding to each segment sentence and the position of each word in each segment sentence in the segment sentence, wherein the fourth intention role label comprises an intention type label and a position label;
and determining a first intention role label of the training sub-sentence according to a fourth intention role label corresponding to each word.
In an implementation manner, the determining unit 302 is specifically configured to:
determining a third number of a plurality of training sub-sentences included in the training sentence aiming at any training sentence in the plurality of training sentences, and determining the sequence number labels according to the third number, wherein the sequence number labels are used for indicating different training sub-sentences;
and determining a second intention role label corresponding to the training sentence according to the sequence number label and the first intention role label of each training sub-sentence in the training sentence.
In one implementation, the sequence annotation model includes a feature extraction module and an intent recognition module; the training unit 303 is specifically configured to:
passing the training data set through the feature extraction module to obtain feature vector data corresponding to the training data set, wherein the feature vector data comprises feature vectors corresponding to words in each training statement;
enabling the feature vector data to pass through the intention identification module to obtain a predicted intention role label corresponding to each training statement of the training data set;
and optimizing the model parameters of the sequence labeling model according to the predicted intention role label corresponding to each training statement and the second intention role label corresponding to each training statement to obtain an intention identification model.
In an implementation manner, the identifying unit 304 is specifically configured to:
inputting the sentence to be recognized into the intention recognition model to obtain candidate intention role labels corresponding to each word in the sentence to be recognized, wherein the candidate intention role labels comprise a target sequence number label, a target position label and a target intention type label;
determining a plurality of sub-sentences to be identified of the sentences to be identified according to the target sequence number label corresponding to each word;
and determining a third intention role label of the sub-sentence to be identified according to the target position label and the target intention type label of each word in the sub-sentence to be identified aiming at any sub-sentence to be identified in the plurality of sub-sentences to be identified.
In an implementation manner, the identifying unit 304 is specifically configured to:
determining a plurality of target segmented sentences of the sub-sentences to be identified according to the target position tags corresponding to each word in the sub-sentences to be identified;
for any target segmented sentence in the target segmented sentences, determining a target intention type label of the target segmented sentence according to a target intention type label corresponding to each word in the target segmented sentence;
and determining the target intention type label of each target subsection statement in the sub-statement to be identified as a third intention role label of the sub-statement to be identified.
It can be understood that the functions of the functional units of the intent recognition apparatus described in the embodiment of the present application may be specifically implemented according to the method in the embodiment of the method described in fig. 1 or fig. 2, and the specific implementation process thereof may refer to the description related to the embodiment of the method in fig. 1 or fig. 2, which is not described herein again.
In the embodiment of the present application, a sentence dividing unit 301 obtains a training data set, where the training data set includes a plurality of training sentences, and performs sentence dividing processing on each training sentence in the training data set to obtain a plurality of training sub-sentences of each training sentence; the determining unit 302 adds a first intention role label to each of the multiple training sub-sentences, and determines a second intention role label corresponding to each training sentence according to the first intention role label of each training sub-sentence; the training unit 303 trains a sequence labeling model according to the training data set and a second intention role label corresponding to each training sentence in the training data set to obtain an intention identification model; the identification unit 304 obtains a to-be-identified statement, and inputs the to-be-identified statement into the intention identification model, to obtain a third intention role tag of a plurality of to-be-identified sub-statements corresponding to the to-be-identified statement, where the third intention role tag is used to indicate an intention type, and the intention type includes one or more of an object intention type, an action intention type, a situation intention type, and a question intention type. A plurality of intentions contained in one statement can be effectively identified, and the intention identification efficiency is effectively improved.
Referring to fig. 4, fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. The electronic device includes: a processor 401, a memory 402, and a network interface 403. Data may be exchanged between the processor 401, the memory 402, and the network interface 403.
The processor 401 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or any conventional processor.
The memory 402 may include both read-only memory and random access memory, and provides program instructions and data to the processor 401. A portion of the memory 402 may also include random access memory. Wherein the processor 401, when calling the program instruction, is configured to perform:
acquiring a training data set, wherein the training data set comprises a plurality of training sentences, and performing sentence division processing on each training sentence in the training data set to obtain a plurality of training sub-sentences of each training sentence;
adding a first intention role label to each training sub-sentence in the plurality of training sub-sentences, and determining a second intention role label corresponding to each training sentence according to the first intention role label of each training sub-sentence;
training a sequence labeling model according to the training data set and a second intention role label corresponding to each training sentence in the training data set to obtain an intention identification model;
and obtaining a statement to be recognized, inputting the statement to be recognized into the intention recognition model, and obtaining a third intention role tag of a plurality of sub statements to be recognized corresponding to the statement to be recognized, wherein the third intention role tag is used for indicating an intention type, and the intention type comprises one or more of an object intention type, an action intention type, a situation intention type and a question intention type.
In one implementation, the processor 401 is specifically configured to:
for any training sentence in the training sentences in the training data set, performing sentence division processing on the training sentences according to a target sentence division rule to obtain a plurality of initial training sub-sentences of the training sentences;
determining a first number of the initial training sub-sentences, and judging whether the first number of the initial training sub-sentences exceeds a preset threshold value;
if the first number of the initial training sub-sentences exceeds a preset threshold, deleting a second number of the initial training sub-sentences from the plurality of initial training sub-sentences to obtain a plurality of training sub-sentences of the training sentences, wherein the second number is a difference value between the first number and the preset threshold;
and if the first number of the initial training sub-sentences does not exceed the preset threshold, determining the plurality of initial training sub-sentences as a plurality of training sub-sentences of the training sentence.
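By way of illustration, the clause-splitting and filtering step described above can be sketched in a few lines of Python. The delimiter set, the threshold value, and the choice to drop the trailing clauses when trimming are assumptions made for this sketch; the actual target sentence division rule is not specified here.

```python
import re

# Assumed clause delimiters; the concrete target sentence division rule is an assumption.
CLAUSE_DELIMITERS = r"[，,；;。.!？?！]"

def split_training_sentence(sentence: str, preset_threshold: int = 8) -> list:
    """Split one training sentence into training sub-sentences and cap their number."""
    initial_sub_sentences = [c.strip() for c in re.split(CLAUSE_DELIMITERS, sentence) if c.strip()]
    first_number = len(initial_sub_sentences)
    if first_number > preset_threshold:
        # The "second number" is the difference between the first number and the threshold;
        # which sub-sentences to delete is an assumption (here: the trailing ones).
        second_number = first_number - preset_threshold
        return initial_sub_sentences[:first_number - second_number]
    return initial_sub_sentences
```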
In one implementation, the processor 401 is specifically configured to:
determining, for any of the plurality of training sub-sentences, a plurality of intent types included in the training sub-sentence;
dividing the training sub-sentence according to the plurality of intention types to obtain a plurality of segment sentences, wherein each segment sentence corresponds to one intention type in the plurality of intention types, and different segment sentences correspond to different intention types;
determining the position of each word in each section statement in the section statement;
determining a fourth intention role label corresponding to each word in the training sub-sentence according to the intention type corresponding to each segment sentence and the position of each word in each segment sentence in the segment sentence, wherein the fourth intention role label comprises an intention type label and a position label;
and determining a first intention role label of the training sub-sentence according to a fourth intention role label corresponding to each word.
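As a concrete illustration, the per-word labeling described above can be expressed with a BIO-style scheme. The short intent-type codes (OBJ/ACT/SIT/QUE) and the B/I position tags are hypothetical choices for this sketch; the embodiment only requires that the fourth intention role label combine an intention type label with a position label.

```python
# Hypothetical intent-type codes for the four intention types named in the embodiment.
INTENT_CODES = {"object": "OBJ", "action": "ACT", "situation": "SIT", "question": "QUE"}

def label_training_sub_sentence(segments):
    """Build the per-word "fourth" intention role labels for one training sub-sentence.

    `segments` is a list of (segment_text, intent_type) pairs, one segment sentence
    per intention type contained in the sub-sentence. Each word (character) gets a
    label combining its position within the segment with the segment's intention type.
    """
    labels = []
    for text, intent_type in segments:
        code = INTENT_CODES[intent_type]
        for position, _word in enumerate(text):
            position_tag = "B" if position == 0 else "I"  # first word vs. following words
            labels.append(f"{position_tag}-{code}")
    return labels

# Example: an action segment followed by an object segment.
# label_training_sub_sentence([("查询", "action"), ("保单", "object")])
# -> ['B-ACT', 'I-ACT', 'B-OBJ', 'I-OBJ']
```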
In one implementation, the processor 401 is specifically configured to:
for any training sentence in the plurality of training sentences, determining a third number of training sub-sentences included in the training sentence, and determining the sequence number label according to the third number, wherein the sequence number label is used for indicating different training sub-sentences;
and determining a second intention role label corresponding to the training sentence according to the sequence number label and the first intention role label of each training sub-sentence in the training sentence.
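Continuing the sketch above, the sentence-level label sequence can be formed by prefixing each sub-sentence's labels with a sequence number label. The 'S1', 'S2', ... naming is an assumption made for illustration only.

```python
def label_training_sentence(sub_sentence_labels):
    """Combine the first intention role labels of a sentence's training sub-sentences
    into the sentence-level "second" label sequence by adding a sequence number label
    that distinguishes the different sub-sentences."""
    third_number = len(sub_sentence_labels)  # number of training sub-sentences
    sentence_labels = []
    for seq_no in range(1, third_number + 1):
        for label in sub_sentence_labels[seq_no - 1]:
            sentence_labels.append(f"S{seq_no}|{label}")
    return sentence_labels

# label_training_sentence([['B-ACT', 'I-ACT'], ['B-OBJ', 'I-OBJ']])
# -> ['S1|B-ACT', 'S1|I-ACT', 'S2|B-OBJ', 'S2|I-OBJ']
```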
In one implementation, the sequence annotation model includes a feature extraction module and an intent recognition module; the processor 401 is specifically configured to:
passing the training data set through the feature extraction module to obtain feature vector data corresponding to the training data set, wherein the feature vector data comprises feature vectors corresponding to words in each training statement;
passing the feature vector data through the intention identification module to obtain a predicted intention role label corresponding to each training statement of the training data set;
and optimizing the model parameters of the sequence labeling model according to the predicted intention role label corresponding to each training statement and the second intention role label corresponding to each training statement to obtain an intention identification model.
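A minimal sketch of such a two-module sequence labeling model is shown below in PyTorch. The embedding/BiLSTM encoder, the layer sizes, and the cross-entropy objective are illustrative assumptions; the embodiment only requires a feature extraction module that yields one feature vector per word and an intention recognition module that maps it to a predicted intention role label.

```python
import torch
import torch.nn as nn

class SequenceLabelingModel(nn.Module):
    def __init__(self, vocab_size: int, num_labels: int, embed_dim: int = 128, hidden_dim: int = 256):
        super().__init__()
        # Feature extraction module: embedding layer plus a bidirectional LSTM encoder.
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.encoder = nn.LSTM(embed_dim, hidden_dim // 2, batch_first=True, bidirectional=True)
        # Intention recognition module: per-word classifier over the intention role label set.
        self.classifier = nn.Linear(hidden_dim, num_labels)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        features, _ = self.encoder(self.embedding(token_ids))  # one feature vector per word
        return self.classifier(features)                       # label scores per word

# Training optimizes the model parameters against the second intention role labels,
# e.g. with a per-token cross-entropy loss:
#   loss_fn = nn.CrossEntropyLoss(ignore_index=-100)
#   loss = loss_fn(logits.view(-1, num_labels), gold_label_ids.view(-1))
```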
In one implementation, the processor 401 is specifically configured to:
inputting the sentence to be recognized into the intention recognition model to obtain a candidate intention role label corresponding to each word in the sentence to be recognized, wherein the candidate intention role label comprises a target sequence number label, a target position label and a target intention type label;
determining a plurality of sub sentences to be identified of the sentence to be identified according to the target sequence number label corresponding to each word;
and for any sub-sentence to be identified in the plurality of sub-sentences to be identified, determining a third intention role label of the sub-sentence to be identified according to the target position label and the target intention type label of each word in the sub-sentence to be identified.
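Under the same label format assumed in the earlier sketches ('S<n>|<position>-<intent>'), the grouping by target sequence number label can be written as follows; the label format itself is an assumption carried over from those sketches.

```python
from collections import OrderedDict

def group_into_sub_sentences(words, predicted_labels):
    """Group each (word, candidate intention role label) pair of the sentence to be
    recognized by its target sequence number label, recovering the sub-sentences
    to be recognized in order of first appearance."""
    sub_sentences = OrderedDict()
    for word, label in zip(words, predicted_labels):
        sequence_tag, role = label.split("|", 1)  # e.g. 'S1', 'B-ACT'
        sub_sentences.setdefault(sequence_tag, []).append((word, role))
    return sub_sentences
```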
In one implementation, the processor 401 is specifically configured to:
determining a plurality of target segmented sentences of the sub-sentence to be identified according to the target position tags corresponding to the words in the sub-sentence to be identified;
for any target segmented sentence in the target segmented sentences, determining a target intention type label of the target segmented sentence according to the target intention type label corresponding to each word in the target segmented sentence;
and determining the target intention type label of each target segmented sentence in the sub-sentence to be identified as the third intention role label of the sub-sentence to be identified.
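Finally, within one sub-sentence to be recognized, the B/I position tags assumed above can be used to cut out the target segmented sentences and read off one target intention type label per segment; in this sketch the resulting collection of labels serves as the third intention role label.

```python
def third_intention_role_label(word_roles):
    """`word_roles` is a list of (word, '<position>-<intent>') pairs for one
    sub-sentence to be recognized; returns (segment_text, intent_type) pairs."""
    segments = []
    current_text, current_type = "", None
    for word, role in word_roles:
        position_tag, intent_type = role.split("-", 1)
        if position_tag == "B" or intent_type != current_type:
            if current_text:
                segments.append((current_text, current_type))
            current_text, current_type = word, intent_type
        else:
            current_text += word
    if current_text:
        segments.append((current_text, current_type))
    return segments

# third_intention_role_label([('查', 'B-ACT'), ('询', 'I-ACT'), ('保', 'B-OBJ'), ('单', 'I-OBJ')])
# -> [('查询', 'ACT'), ('保单', 'OBJ')]
```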
In a specific implementation, the processor 401 and the memory 402 described in this embodiment of the present application may perform the implementations of the intention identification method described in fig. 1 or fig. 2 of this application, and may also perform the implementations of the intention identification apparatus described in fig. 3 of this application, which are not described herein again.
In this embodiment of the application, the processor 401 may obtain a training data set, where the training data set includes a plurality of training sentences, and perform sentence division processing on each training sentence in the training data set to obtain a plurality of training sub-sentences of each training sentence; adding a first intention role label to each training sub-sentence in the plurality of training sub-sentences, and determining a second intention role label corresponding to each training sentence according to the first intention role label of each training sub-sentence; training a sequence labeling model according to the training data set and a second intention role label corresponding to each training sentence in the training data set to obtain an intention identification model; obtaining a statement to be recognized, inputting the statement to be recognized into the intention recognition model, and obtaining a third intention role label of a plurality of sub-statements to be recognized corresponding to the statement to be recognized, where the third intention role label is used for indicating an intention type, and the intention type includes one or more of an object intention type, an action intention type, a situation intention type, and a question intention type. A plurality of intentions contained in one statement can be effectively identified, and the intention identification efficiency is effectively improved.
The embodiment of the present application further provides a computer-readable storage medium, in which program instructions are stored. When executed, the program instructions may perform some or all of the steps of the intention identification method in the embodiment corresponding to fig. 1 or fig. 2.
It should be noted that, for simplicity of description, the above-mentioned embodiments of the method are described as a series of acts or combinations, but those skilled in the art should understand that the present application is not limited by the order of acts described, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing the related hardware, and the program may be stored in a computer-readable storage medium. The storage medium may include: a flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.
It should be emphasized that, to further ensure the privacy and security of the data, the data may also be stored in a node of a blockchain. A blockchain is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks associated by cryptographic methods, where each data block contains information on a batch of network transactions and is used to verify the validity (anti-counterfeiting) of the information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
The intention recognition method, the intention recognition apparatus, the electronic device, and the storage medium provided by the embodiments of the present application are described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the embodiments is only intended to help understand the method and core ideas of the present application. Meanwhile, a person skilled in the art may, according to the ideas of the present application, make changes to the specific implementations and the application scope. In summary, the content of this specification should not be construed as limiting the present application.

Claims (9)

1. An intent recognition method, comprising:
acquiring a training data set, wherein the training data set comprises a plurality of training sentences, each training sentence in the training data set is subjected to clause processing, and each clause obtained by the clause processing is combined randomly to obtain a plurality of initial training sub-sentences of each training sentence; screening the plurality of initial training sub-sentences of each training sentence to obtain a plurality of training sub-sentences of each training sentence;
adding a first intention role label to each training sub-sentence in the plurality of training sub-sentences, and determining a second intention role label corresponding to each training sentence according to the first intention role label of each training sub-sentence; the second intention role label corresponding to any training sentence is obtained by adding a sequence number label on the basis of the first intention role label of each training sub-sentence of any training sentence, wherein the sequence number label is used for indicating different training sub-sentences in any training sentence;
training a sequence labeling model according to the training data set and a second intention role label corresponding to each training sentence in the training data set to obtain an intention identification model;
obtaining a statement to be recognized, inputting the statement to be recognized into the intention recognition model, and obtaining a third intention role label of a plurality of sub-statements to be recognized corresponding to the statement to be recognized, wherein the third intention role label is used for indicating an intention type, and the intention type comprises one or more of an object intention type, an action intention type, a situation intention type and a question intention type; the step of inputting the sentence to be recognized into the intention recognition model to obtain a third intention role label of a plurality of sub sentences to be recognized corresponding to the sentence to be recognized includes: inputting the sentence to be recognized into the intention recognition model to obtain candidate intention role labels corresponding to each word in the sentence to be recognized, wherein the candidate intention role labels comprise a target sequence number label, a target position label and a target intention type label; determining a plurality of sub-sentences to be identified of the sentences to be identified according to the target sequence number label corresponding to each word; and aiming at any sub-sentence to be identified in the plurality of sub-sentences to be identified, determining a third intention role tag of the sub-sentence to be identified according to the target position tag and the target intention type tag of each word in the sub-sentence to be identified.
2. The method according to claim 1, wherein the sentence division processing is performed on each training sentence in the training data set, and the sentences obtained by the sentence division processing are arbitrarily combined to obtain a plurality of initial training sub-sentences of each training sentence; screening the plurality of initial training sub-sentences of each training sentence to obtain a plurality of training sub-sentences of each training sentence, including:
for any training sentence in the training sentences in the training data set, carrying out sentence division processing on the training sentences according to a target sentence division rule, and randomly combining the sentences obtained by the sentence division processing to obtain a plurality of initial training sub-sentences of the training sentences;
determining a first number of the initial training sub-sentences, and judging whether the first number of the initial training sub-sentences exceeds a preset threshold value;
if the first number of the initial training sub-sentences exceeds a preset threshold, deleting a second number of the initial training sub-sentences from the plurality of initial training sub-sentences to obtain a plurality of training sub-sentences of the training sentences, wherein the second number is a difference value between the first number and the preset threshold;
and if the first number of the initial training sub-sentences does not exceed the preset threshold, determining the plurality of initial training sub-sentences as a plurality of training sub-sentences of the training sentence.
3. The method of claim 1 or 2, wherein the adding a first intent role label to each of the plurality of training sub-sentences comprises:
determining a plurality of intent types included in the training sub-sentence for any of the plurality of training sub-sentences;
dividing the training sub-sentences according to the plurality of intention types to obtain a plurality of segment sentences, wherein each segment sentence corresponds to one intention type in the plurality of intention types, and the intention types corresponding to the segment sentences are different;
determining the position of each word in each section statement in the section statement;
determining a fourth intention role label corresponding to each word in the training sub-sentence according to the intention type corresponding to each segment sentence and the position of each word in each segment sentence in the segment sentence, wherein the fourth intention role label comprises an intention type label and a position label;
and determining a first intention role label of the training sub-sentence according to a fourth intention role label corresponding to each word.
4. The method of claim 3, wherein the determining a second intention role label corresponding to each training sentence according to the first intention role label of each training sub-sentence comprises:
determining a third number of a plurality of training sub-sentences included in the training sentence aiming at any training sentence in the plurality of training sentences, and determining the sequence number label according to the third number, wherein the sequence number label is used for indicating different training sub-sentences;
and determining a second intention role label corresponding to the training sentence according to the sequence number label and the first intention role label of each training sub-sentence in the training sentence.
5. The method of claim 1, wherein the sequence annotation model comprises a feature extraction module and an intent recognition module; the training a sequence labeling model according to the training data set and a second intention role label corresponding to each training sentence in the training data set to obtain an intention recognition model, comprising:
passing the training data set through the feature extraction module to obtain feature vector data corresponding to the training data set, wherein the feature vector data comprises feature vectors corresponding to words in each training statement;
enabling the feature vector data to pass through the intention identification module to obtain a predicted intention role label corresponding to each training statement of the training data set;
and optimizing the model parameters of the sequence labeling model according to the predicted intention role label corresponding to each training statement and the second intention role label corresponding to each training statement to obtain an intention identification model.
6. The method according to claim 1, wherein the determining the third intention role label of the sub-sentence to be identified according to the target location label and the target intention type label of each word in the sub-sentence to be identified comprises:
determining a plurality of target segmented sentences of the sub-sentences to be identified according to the target position tags corresponding to each word in the sub-sentences to be identified;
for any target segmented sentence in the target segmented sentences, determining a target intention type label of the target segmented sentence according to a target intention type label corresponding to each word in the target segmented sentence;
and determining a third intention role label of the sub-sentence to be identified according to the target intention type label of each target subsection sentence in the sub-sentence to be identified.
7. An intention recognition device, comprising:
a sentence dividing unit, configured to obtain a training data set, where the training data set includes a plurality of training sentences, perform sentence dividing processing on each training sentence in the training data set, and arbitrarily combine each sentence obtained through the sentence dividing processing to obtain a plurality of initial training sub-sentences of each training sentence; screening the plurality of initial training sub-sentences of each training sentence to obtain a plurality of training sub-sentences of each training sentence;
a determining unit, configured to add a first intention role label to each of the plurality of training sub-sentences, and determine, according to the first intention role label of each training sub-sentence, a second intention role label corresponding to each training sentence; the second intention role label corresponding to any training statement is obtained by adding a sequence number label on the basis of the first intention role label of each training sub-statement of any training statement, wherein the sequence number label is used for indicating different training sub-statements in any training statement;
the training unit is used for training the sequence labeling model according to the training data set and a second intention role label corresponding to each training sentence in the training data set to obtain an intention identification model;
the recognition unit is used for acquiring a statement to be recognized, inputting the statement to be recognized into the intention recognition model, and obtaining a third intention role tag of a plurality of sub-statements to be recognized corresponding to the statement to be recognized, wherein the third intention role tag is used for indicating an intention type, and the intention type comprises one or more of an object intention type, an action intention type, a situation intention type and a question intention type; the step of inputting the sentence to be recognized into the intention recognition model to obtain a third intention role label of a plurality of sub sentences to be recognized corresponding to the sentence to be recognized includes: inputting the sentence to be recognized into the intention recognition model to obtain candidate intention role labels corresponding to each word in the sentence to be recognized, wherein the candidate intention role labels comprise a target sequence number label, a target position label and a target intention type label; determining a plurality of sub-sentences to be identified of the sentences to be identified according to the target sequence number label corresponding to each word; and determining a third intention role label of the sub-sentence to be identified according to the target position label and the target intention type label of each word in the sub-sentence to be identified aiming at any sub-sentence to be identified in the plurality of sub-sentences to be identified.
8. An electronic device comprising a processor, a memory, wherein the memory is configured to store a computer program comprising program instructions, and wherein the processor is configured to invoke the program instructions to perform the method of any of claims 1-6.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to carry out the method according to any one of claims 1-6.
CN202110597354.9A 2021-05-28 2021-05-28 Intention identification method and device, electronic equipment and storage medium Active CN113343677B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110597354.9A CN113343677B (en) 2021-05-28 2021-05-28 Intention identification method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110597354.9A CN113343677B (en) 2021-05-28 2021-05-28 Intention identification method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113343677A CN113343677A (en) 2021-09-03
CN113343677B true CN113343677B (en) 2023-04-07

Family

ID=77472468

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110597354.9A Active CN113343677B (en) 2021-05-28 2021-05-28 Intention identification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113343677B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114357973B (en) * 2021-12-10 2023-04-07 马上消费金融股份有限公司 Intention recognition method and device, electronic equipment and storage medium
CN114970499B (en) * 2022-04-27 2024-05-31 上海销氪信息科技有限公司 Dialogue text enhancement method, device, equipment and storage medium
CN115936011B (en) * 2022-12-28 2023-10-20 南京易米云通网络科技有限公司 Multi-intention semantic recognition method in intelligent dialogue

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111738016A (en) * 2020-06-28 2020-10-02 中国平安财产保险股份有限公司 Multi-intention recognition method and related equipment

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10832003B2 (en) * 2018-08-26 2020-11-10 CloudMinds Technology, Inc. Method and system for intent classification
CN111400438A (en) * 2020-02-21 2020-07-10 镁佳(北京)科技有限公司 Method and device for identifying multiple intentions of user, storage medium and vehicle
CN111581361B (en) * 2020-04-22 2023-09-15 腾讯科技(深圳)有限公司 Intention recognition method and device
CN111723583B (en) * 2020-06-23 2023-02-10 中国平安人寿保险股份有限公司 Statement processing method, device, equipment and storage medium based on intention role
CN111859983B (en) * 2020-07-23 2022-07-15 中国平安人寿保险股份有限公司 Natural language labeling method based on artificial intelligence and related equipment

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111738016A (en) * 2020-06-28 2020-10-02 中国平安财产保险股份有限公司 Multi-intention recognition method and related equipment

Also Published As

Publication number Publication date
CN113343677A (en) 2021-09-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant