CN115827831A - Intention recognition model training method and device - Google Patents


Info

Publication number
CN115827831A
CN115827831A (application number CN202210841809.1A)
Authority
CN
China
Prior art keywords
text
intention
target
dialog
recognition model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210841809.1A
Other languages
Chinese (zh)
Inventor
阎覃
张天宇
孙子钧
赵薇
柳景明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Feixiang Xingxing Technology Co ltd
Original Assignee
Beijing Feixiang Xingxing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Feixiang Xingxing Technology Co ltd filed Critical Beijing Feixiang Xingxing Technology Co ltd
Priority to CN202210841809.1A priority Critical patent/CN115827831A/en
Publication of CN115827831A publication Critical patent/CN115827831A/en
Pending legal-status Critical Current

Landscapes

  • Machine Translation (AREA)

Abstract

This specification provides an intention recognition model training method and apparatus. The training method includes: acquiring a target dialog text and segmenting it into at least two text segments; inputting the at least two text segments into an initial intention recognition model comprising an encoding unit, a pooling unit, and a prediction unit; encoding the at least two text segments through the encoding unit to obtain a text feature that contains a segment feature for each text segment; pooling the text feature containing the segment features through the pooling unit, then converting the pooling result with the prediction unit to obtain and output predicted intention category information; and adjusting the parameters of the initial intention recognition model based on the standard intention category information and the predicted intention category information corresponding to the target dialog text, until an intention recognition model satisfying the training conditions is obtained.

Description

Intention recognition model training method and device
Technical Field
The present disclosure relates to the field of machine learning technologies, and in particular, to an intention recognition model training method and apparatus.
Background
With the development of computer technology, deep learning has penetrated the various service scenarios of modern life and become an important supporting technology for all kinds of internet services, applied to varying degrees in everyday scenarios such as search, recommendation, intelligent customer service, text processing, and autonomous driving. Intention recognition serves as an upstream processing technology for different downstream services, and its accuracy determines how those services are handled. For example, in a course recommendation scenario, analyzing the dialog content of a course recommendation call can determine how strongly a user needs to purchase a course; recommending courses on that basis reaches users with a strong need while avoiding disturbing users with a weak one. It follows that the accuracy of intention recognition determines how well downstream services are targeted. In the prior art, intention recognition is mostly realized with models; constrained by model architecture and training samples, however, the accuracy is low, and recognition cannot be completed from the global information of the conversation, so an effective scheme is needed to solve these problems.
Disclosure of Invention
In view of this, embodiments of the present specification provide an intention recognition model training method. The present specification also relates to an intention recognition model training apparatus, an intention recognition method, an intention recognition apparatus, a computing device, and a computer-readable storage medium, which are used to solve the technical defects in the prior art.
According to a first aspect of embodiments of the present specification, there is provided an intention recognition model training method, including:
acquiring a target dialog text, and segmenting the target dialog text into at least two text segments;
inputting the at least two text segments to an initial intent recognition model, wherein the initial intent recognition model comprises an encoding unit, a pooling unit, and a prediction unit;
encoding the at least two text segments through the encoding unit to obtain text characteristics, wherein the text characteristics comprise segment characteristics corresponding to each text segment;
performing pooling processing on the text features containing the segment features through the pooling unit, converting the result of the pooling processing by using the prediction unit, and acquiring and outputting prediction intention category information;
and performing parameter adjustment on the initial intention recognition model based on the standard intention category information and the prediction intention category information corresponding to the target dialog text until an intention recognition model meeting training conditions is obtained.
According to a second aspect of embodiments of the present specification, there is provided an intention recognition model training apparatus including:
the system comprises an acquisition module, a processing module and a display module, wherein the acquisition module is configured to acquire a target dialog text and segment the target dialog text into at least two text segments;
an input module configured to input the at least two text segments to an initial intent recognition model, wherein the initial intent recognition model includes an encoding unit, a pooling unit, and a prediction unit;
the encoding module is configured to encode the at least two text segments through the encoding unit to obtain text features, wherein the text features include segment features corresponding to each text segment;
the pooling module is configured to pool the text features containing the segment features through the pooling unit, convert the pooling processing result by using the prediction unit, obtain and output prediction intention type information;
and the training module is configured to perform parameter adjustment on the initial intention recognition model based on the standard intention category information and the prediction intention category information corresponding to the target dialog text until an intention recognition model meeting the training condition is obtained.
According to a third aspect of embodiments of the present specification, there is provided an intention identification method including:
acquiring a to-be-processed dialog text associated with a target user in a target service;
the dialog text to be processed is divided into at least two text segments to be processed;
inputting the at least two to-be-processed text segments into the intention recognition model trained by the above method for processing, to obtain intention category information corresponding to the to-be-processed dialog text;
and determining the participation intention of the target user in the target business according to the intention category information.
According to a fourth aspect of embodiments herein, there is provided an intention recognition apparatus including:
the text acquisition module is configured to acquire a to-be-processed dialog text associated with a target user in the target service;
the text segmentation module is configured to segment the dialog text to be processed into at least two text segments to be processed;
the input model module is configured to input the at least two to-be-processed text segments into the intention recognition model trained by the above method for processing, to obtain intention category information corresponding to the to-be-processed dialog text;
an intention determining module configured to determine an intention of the target user to participate in the target business according to the intention category information.
According to a fifth aspect of embodiments herein, there is provided a computing device comprising:
a memory and a processor;
the memory is for storing computer-executable instructions which, when executed by the processor, implement the intent recognition model training method or steps of an intent recognition method.
According to a sixth aspect of embodiments herein, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the intent recognition model training method or the steps of the intent recognition method.
To improve recognition accuracy, the intention recognition model training method provided in this specification first acquires a target dialog text and segments it into at least two text segments, which are then input together into an initial intention recognition model. An encoding unit in the model encodes the segments to obtain a text feature that contains the segment feature of each text segment, so that all segments are represented by a single text feature. A pooling unit then pools the text feature, fusing the segment representations (for example by average pooling), and a prediction unit converts the pooling result into predicted intention category information. Finally, the parameters of the initial model are adjusted based on the standard intention category information and the predicted intention category information corresponding to the target dialog text, until an intention recognition model satisfying the training conditions is obtained. Because the model learns from, and predicts on, a global representation of the dialog text, the trained model has higher prediction accuracy and is convenient for downstream services to use.
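The encode, pool, and predict flow summarized above can be sketched in plain Python. This is a minimal illustration rather than the patented implementation: the segment feature vectors stand in for the encoding unit's output, and the intent labels and linear class weights are invented for the example (a real prediction unit would be a learned layer with softmax).

```python
def mean_pool(segment_features):
    """Pooling unit: average the per-segment feature vectors into one
    global representation of the whole dialog text (average pooling)."""
    dim = len(segment_features[0])
    n = len(segment_features)
    return [sum(vec[i] for vec in segment_features) / n for i in range(dim)]

def predict_intent(pooled, class_weights):
    """Prediction unit: score each intention category against the pooled
    feature and return the best one. A real model would use a learned
    linear layer followed by softmax."""
    scores = {label: sum(w * x for w, x in zip(weights, pooled))
              for label, weights in class_weights.items()}
    return max(scores, key=scores.get)

# Toy example: two text segments with 3-dimensional segment features.
segment_features = [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
pooled = mean_pool(segment_features)  # [2.0, 1.0, 1.0]
class_weights = {"strong_intent": [1.0, 1.0, 0.0],
                 "weak_intent": [0.0, 0.0, 1.0]}
print(predict_intent(pooled, class_weights))  # strong_intent
```

During training, the predicted category would be compared with the standard intention category and the loss backpropagated through all three units.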
Drawings
FIG. 1 is a flowchart of an intent recognition model training method provided in an embodiment of the present specification;
FIG. 2 is a diagram illustrating an intent recognition model training method according to an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of an intention recognition model training apparatus according to an embodiment of the present disclosure;
FIG. 4 is a flow chart of an intent recognition method provided by an embodiment of the present description;
fig. 5 is a schematic structural diagram of an intention identifying apparatus provided in an embodiment of the present specification;
FIG. 6 is a process flow diagram of an intent recognition method provided by an embodiment of the present description;
fig. 7 is a block diagram of a computing device according to an embodiment of the present disclosure.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present specification. This specification may, however, be embodied in many forms other than those described herein, and those skilled in the art can make similar generalizations without departing from its spirit, so the specification is not limited by the specific embodiments disclosed below.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, etc. may be used in one or more embodiments herein to describe various information, the information should not be limited by these terms, which are only used to distinguish one type of information from another. For example, without departing from the scope of one or more embodiments of this specification, "first" may also be referred to as "second", and similarly "second" as "first". The word "if" as used herein may be interpreted as "when", "upon", or "in response to determining", depending on the context.
First, the noun terms to which one or more embodiments of the present specification relate are explained.
BERT: the BERT is called Bidirectional Encoder reproduction from transformations, is a pre-training language model for natural language processing, is a self-coding language model, and can acquire vector Representation of a text through Bidirectional information of a coded text.
Text classification: text classification is a classic task of natural language processing. The task is to assign the text to the category to which it belongs according to a certain classification system or standard. In the field of machine learning, given labeled training data, the model can predict the type of the text after training.
In the present specification, an intention recognition model training method is provided, and the present specification relates to an intention recognition model training apparatus, an intention recognition method, an intention recognition apparatus, a computing device, and a computer-readable storage medium, which are described in detail one by one in the following embodiments.
In practical applications, intention prediction can be regarded as a text classification task: a piece of text is first input into an encoding model to obtain a vector representation, a neural network then computes the probability of each category, and the category with the highest probability is taken as the category of the text. In language-model design, several effective encoders address the problem of capturing long-distance dependencies. LSTM models long-distance dependencies using a gating mechanism and gradient clipping, encoding an average longest distance of about 200 words. The Transformer uses a self-attention mechanism that allows direct connections between words and captures long-distance dependencies better; its encoding ability exceeds that of LSTM, but it is limited to a fixed-length context. The maximum input length of the Transformer-based BERT model is 512 tokens, but in a dialog scenario the total number of words usually exceeds this, so the model cannot encode the dialog in one pass and the context information cannot be fused for subsequent processing.
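The length limitation just described is what the segmentation step works around. A minimal sketch, under the simplifying assumption that a token is a whitespace-separated word (a real system would use the encoder's own tokenizer, such as BERT's WordPiece):

```python
def split_into_segments(dialog_text, max_len=512):
    """Split a long dialog text into segments no longer than the
    encoder's input limit, so each segment can be encoded in one pass."""
    tokens = dialog_text.split()
    return [" ".join(tokens[i:i + max_len])
            for i in range(0, len(tokens), max_len)]

# A 1100-word dialog becomes three segments of 512, 512, and 76 words.
long_dialog = " ".join(f"w{i}" for i in range(1100))
segments = split_into_segments(long_dialog)
print([len(s.split()) for s in segments])  # [512, 512, 76]
```

The segments are then encoded separately and their representations fused downstream, which is the role of the pooling unit.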
Fig. 1 is a flowchart illustrating an intention recognition model training method provided in an embodiment of the present specification, which specifically includes the following steps:
step S102, a target dialog text is obtained, and the target dialog text is segmented into at least two text segments.
Specifically, the target dialog text is text formed from the dialog content between a service operator and a user in an actual business project. Training the intention recognition model on such text makes it possible to determine, within that business project, a user's intention to participate, so that downstream business can be handled according to the recognition result. It should be noted that the trained intention recognition model has different recognition capabilities in different business scenarios, that is, the recognized intention categories differ; since its capability is influenced by the target dialog text, different target dialog texts can be selected in different business scenarios, which this embodiment does not limit.
In this embodiment, the intention recognition model training method is described taking the purchase of an online course service as the example business project; the same or corresponding descriptions in other scenarios may refer to this embodiment and are not repeated in detail.
Furthermore, the at least two text segments are the segments obtained by splitting the target dialog text. Splitting processes the target dialog text into pieces that fit the model's input length, so that training can proceed once they are input into the model, while in the prediction stage the model can combine the semantics of different segments of the same text to complete the prediction operation.
Furthermore, considering that the model training stage requires a large amount of text to train the initial intention recognition model into an intention recognition model meeting usage requirements, a dialog information set associated with the target service is determined first, and the target dialog texts for training are selected from it. In this embodiment, the specific implementation is as follows:
acquiring a conversation information set of a related target service; carrying out data cleaning on initial dialogue information contained in the dialogue information set to obtain a target dialogue information set containing target dialogue information; determining at least two dialog texts corresponding to the target dialog information in the target dialog information set, and splicing the at least two dialog texts to obtain the target dialog text; wherein each of the at least two dialog texts contains a speaker identification.
Specifically, the target service refers to a service item that users can participate in and that provides corresponding services for them; it also involves interaction between an operator and the user, which helps the target service serve users better and reach more of them. Correspondingly, the dialog information set is the set of dialog information, associated with the target service, between operators and users. The initial dialog information is the unprocessed dialog information in that set and includes non-standardized text content such as modal particles, repeated words, missing words, and the like. Data cleaning refers to filtering and/or standardizing the initial dialog information: removing unclear content, converting spoken content into written form, discarding initial dialog information that does not meet business requirements, and so on, thereby ensuring that the target dialog information contained in the target dialog information set meets the subsequent model training requirements. A dialog text is the speech content of any one speaker involved in the dialog information; that is, each dialog text corresponds to one speaker and contains a speaker identifier. The speaker identifier is an identifier representing the speaker and may be a text identifier, a character-string identifier, or the like, which is not limited here.
On this basis, when it is determined that an intention recognition model needs to be trained for the target service, the dialog information set associated with the target service can be obtained first, so that the trained model applies to prediction in the target service scenario. Since the initial dialog information in the set is not standardized, training the model directly on it could reduce prediction accuracy; therefore, before training, the initial dialog information is cleaned to standardize it, and initial dialog information that does not meet the requirements of the service scenario is removed, yielding the target dialog information set containing the target dialog information.
After the target dialog information set is obtained, note that each piece of target dialog information is formed from the dialog content of at least two users. Intention recognition should combine the speech of the user participating in the target service with the speech of the operator interacting with that user, and building training samples on both makes the model's prediction more precise. Therefore, the at least two dialog texts contained in each piece of target dialog information can be spliced to obtain a number of target dialog texts, each retaining the speaker identifier of every user, so that subsequent model training can conveniently combine the speaker identifiers with the text content.
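The splicing step can be sketched as follows; the bracketed speaker tags are an illustrative format, not the patent's actual identifier scheme:

```python
def splice_dialog(turns):
    """Concatenate per-speaker utterances into one target dialog text,
    keeping a speaker identifier in front of every turn."""
    return " ".join(f"[{speaker}] {utterance}" for speaker, utterance in turns)

turns = [("teacher", "How is the trial course going?"),
         ("user", "I liked it and may renew.")]
print(splice_dialog(turns))
# [teacher] How is the trial course going? [user] I liked it and may renew.
```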
In practical applications, the data cleaning stage is completed using multiple cleaning rules set for the target service, and different cleaning rules make different modifications to the initial dialog information; in a specific implementation, one or more cleaning rules can therefore be selected according to business requirements. For example, suppose the cleaning rules include rule A, rule B, and rule C. Rule A alone may be selected to clean the initial dialog information, the cleaned result being the target dialog information; or rules A, B, and C may all be applied, the initial dialog information processed by all three rules being the target dialog information.
In the target service scenario, the cleaning rules include, but are not limited to: removing initial dialog information whose text length does not reach a length threshold, removing initial dialog information with wrong speaker-identifier labels, removing meaningless characters (such as modal particles) from the initial dialog information, and repairing punctuation marks in the initial dialog information. In practice, the cleaning rules may be set according to actual requirements and selected as needed in the data cleaning stage, which is not limited here.
For example, in an online course renewal service scenario, once the user's renewal intention is determined, different approaches can be used for different renewal intentions when contacting the user: if the renewal intention is strong, course renewal can be recommended directly; if it is weak, a trial lesson can be recommended instead, improving the user's experience of the renewal service. Accurate recognition of the user's renewal intention is the most important step in this process. It can be realized with an intention recognition model, but before that, an intention recognition model suited to the current business scenario must be trained.
Further, a dialog information set for the online course renewal service is first obtained, that is, a set of dialogs between teachers and users. Since the initial dialog information in the set may not reach the sample quality standard, it is cleaned: initial dialog information with short dialog content can be filtered out, for example dialogs with fewer than 20 total sentences from both parties; dialogs with wrong identity labels can be filtered out, for example ones in which the teacher's words are labeled as the user's; dialogs in which one person speaks more than n consecutive sentences can be filtered out, for example more than 15 consecutive sentences; meaningless characters can be deleted, for example Chinese filler particles; and punctuation can be repaired, for example adding a period at the end of a sentence.
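These cleaning rules can be sketched as a single filter over one dialog. The thresholds and the filler-word list below are illustrative assumptions mirroring the examples above, not the patent's actual values:

```python
# Illustrative constants mirroring the examples in the text.
MIN_TOTAL_TURNS = 20
MAX_CONSECUTIVE_TURNS = 15
FILLER_WORDS = {"um", "uh"}  # English stand-ins for Chinese filler particles

def clean_dialog(turns):
    """Apply the cleaning rules to one dialog (a list of (speaker, text)
    pairs); return the cleaned turns, or None to discard the dialog."""
    if len(turns) < MIN_TOTAL_TURNS:
        return None  # rule: too few sentences from both parties
    run, prev = 0, None
    for speaker, _ in turns:
        run = run + 1 if speaker == prev else 1
        prev = speaker
        if run > MAX_CONSECUTIVE_TURNS:
            return None  # rule: one party speaks too many sentences in a row
    cleaned = []
    for speaker, text in turns:
        kept = [w for w in text.split() if w.lower() not in FILLER_WORDS]
        sentence = " ".join(kept)
        if sentence and sentence[-1] not in ".?!":
            sentence += "."  # rule: repair missing end-of-sentence punctuation
        cleaned.append((speaker, sentence))
    return cleaned
```

A dialog that survives all the discard rules comes back with fillers stripped and punctuation repaired; a dialog that violates any discard rule is dropped from the set.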
Furthermore, after the initial dialog information in the set has been cleaned, a target dialog information set composed of clearly expressed target dialog information is obtained. Since each piece of target dialog information is a dialog between a user and a teacher, and intention recognition in the model training stage should combine the dialog content of both, the dialog texts of the teacher and the user in each piece of target dialog information can be spliced together, yielding the target dialog text used for subsequent model training.
It should be noted that, when determining the speaker identifier of each dialog text, the identity of the speaker of the opening utterance in each piece of initial dialog information can be determined with a regular-expression search, and the identifiers of the subsequent utterances then follow from that determination. For example, suppose user A talks with user B over 10 exchanges, user A's first utterance being {a1} and user B's being {b1}. In the identity determination stage, the first and second utterances are identified first, to determine which user speaks first in the 10 exchanges; the speech characteristics of the first utterance are then used to attribute the remaining utterances, so that user A's speech content is determined and the remaining content is user B's.
Alternatively, a regular expression can be used to search for the identities of user A and user B, for example a pattern with alternatives such as "I am .{1,10} user A", "this side is .{1,10} user A", or "is .{1,5} a parent" (rendered here from the original Chinese pattern). If an utterance satisfies the search condition, the corresponding speaker is judged to be user A; if neither party satisfies it, the speaker with the most speech content is taken as user A (the party associated with the target service).
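An English analogue of this identity search can be sketched as follows; the pattern and the "teacher A" label are illustrative assumptions, since the original pattern is Chinese:

```python
import re

# Illustrative self-introduction pattern with bounded gaps, analogous to
# the {1,10}-style quantifiers described in the text.
IDENTITY_PATTERN = re.compile(
    r"i am .{1,10}teacher A|this side is .{1,10}teacher A|is .{0,5}a parent",
    re.IGNORECASE,
)

def identify_speaker(first_utterance):
    """Label the speaker of an opening turn: the service-side teacher if
    the self-introduction pattern matches, otherwise the user."""
    return "teacher A" if IDENTITY_PATTERN.search(first_utterance) else "user"

print(identify_speaker("Hello, I am the teacher A for your course."))  # teacher A
print(identify_speaker("Hi, when does the class start?"))              # user
```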
In practical applications, the speaker identifier may be determined according to actual requirements, for example by voice recognition or manual labeling, which is not limited here.
In summary, cleaning the samples before model training removes the influence of redundant information, which improves the model's predictive ability during the training stage. Meanwhile, splicing the at least two dialog texts contained in the target dialog information lets the samples carry the contextual information of the dialog, so the model can learn latent, identity-related features during training, further improving its predictive ability.
In addition, considering that the dialog content related to a business project may come from chat software or telephone calls, the audio must be converted to text before model training. In this embodiment, the specific implementation is as follows:
acquiring a conversation audio set associated with the target service; and respectively inputting the dialogue audio contained in the dialogue audio set into a voice recognition model for processing to obtain dialogue information containing the speaker identification, and forming the dialogue information set.
Specifically, the dialog audio set is the audio content corresponding to the target service, composed of dialog audio between at least two users. Correspondingly, the speech recognition model is a model that converts dialog audio into text; notably, while converting the audio, it can also label the dialog information by using the voice characteristics of the corresponding audio, thereby attaching a speaker identifier to the dialog information.
Based on the above, after the dialog audio set of the associated target service is acquired, the dialog audio contained in the dialog audio set can be respectively input to the speech recognition model for processing, the dialog information corresponding to each dialog audio can be obtained according to the processing result of the model, the dialog information contains the speaker identifier corresponding to each dialog sentence, and the dialog information set can be formed based on the dialog information.
It should be noted that the dialogue audio contained in the dialogue audio set is composed of the mutual dialogue contents between at least two users, and therefore the dialogue sentences contained in the dialogue information are also composed of the mutual dialogue text contents between at least two users.
In conclusion, during conversion the recognition capability of the speech recognition model is used to label the speaker identifier corresponding to each dialog sentence, so that speaker identity information is incorporated into the dialog information; the constructed samples therefore add a learning dimension for the intention recognition model and improve its intention recognition capability.
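The construction of the dialog information set from the dialog audio set can be sketched as below. `transcribe_with_speakers` is a stub standing in for the real speech recognition model (which is assumed to return one (speaker_id, sentence) pair per dialog sentence); the field names are illustrative, not part of the embodiment.

```python
# Stub: a real implementation would run ASR plus speaker labelling on the
# audio; here the transcript is read from the test fixture itself.
def transcribe_with_speakers(audio):
    return audio["expected_transcript"]

def build_dialog_info_set(dialog_audio_set):
    """Feed each dialog audio through the speech recognition model and
    collect dialog information that keeps a speaker identifier attached
    to every dialog sentence."""
    dialog_info_set = []
    for audio in dialog_audio_set:
        turns = transcribe_with_speakers(audio)
        dialog_info_set.append({"audio_id": audio["id"], "turns": turns})
    return dialog_info_set

audio_set = [{"id": "call-001",
              "expected_transcript": [("teacher", "Hello"), ("parent", "Hi")]}]
infos = build_dialog_info_set(audio_set)
print(infos[0]["turns"][1])
```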
Furthermore, in order to perform high-precision data cleansing on the initial session information, the data cleansing may be completed in combination with the data cleansing link, that is, the session information set is sequentially cleansed in combination with all data cleansing stages included in the data cleansing link, and in this embodiment, the specific implementation manner is as follows:
determining a data washing link comprising a plurality of data washing nodes; selecting a data cleaning rule corresponding to the ith data cleaning node in the data cleaning link, and performing data cleaning on the initial session information contained in the session information set to obtain an initial session information set; judging whether the data cleaning link contains an unexecuted data cleaning node or not; if yes, i is increased by 1, the initial dialogue information set is used as a dialogue information set, and a step of selecting a data cleaning rule corresponding to the ith data cleaning node in the data cleaning link is executed; if not, the initial dialog information set is used as a target dialog information set containing target dialog information.
Specifically, the data cleaning link is a link formed by at least two data cleaning nodes; the data cleaning nodes are executed in sequence within the link, and different data cleaning nodes correspond to different data cleaning rules. Correspondingly, a data cleaning node refers to a node that performs data cleaning processing on the initial dialog information contained in the dialog information set. The data cleaning rules include, but are not limited to, removing initial dialog information whose text length does not reach a length threshold, removing initial dialog information with speaker identifier labeling errors, removing meaningless characters (such as filler words) from the initial dialog information, repairing punctuation marks in the initial dialog information, and the like. In practical applications, the cleaning rules may be set according to actual requirements, and the data cleaning stages may be selected as required, which is not limited herein.
Based on this, in the data cleaning stage, a data cleaning link associated with the target service can be determined; then the data cleaning rule corresponding to the i-th data cleaning node is selected from the link, all the initial dialog information contained in the dialog information set is cleaned, and the initial dialog information set is obtained from the cleaning result, where i is a positive integer starting from 1. It is then judged whether an unexecuted data cleaning node exists in the data cleaning link; if so, i is incremented by 1, the initial dialog information set is taken as the dialog information set, and the data cleaning rule corresponding to the newly selected data cleaning node is executed to clean the dialog information set again. After all the data cleaning nodes of the data cleaning link have been executed, the finally obtained initial dialog information set is taken as the target dialog information set for subsequent model training.
It should be noted that, when the data cleansing rule corresponding to each data cleansing node cleanses the data of the initial session information, the data cleansing process in the foregoing embodiment may be referred to, and this embodiment is not described in detail herein.
In summary, the data cleaning link is adopted to perform data cleaning processing on the initial dialogue information, so that all the initial dialogue information contained in the dialogue information set can be ensured to be cleaned, and thus the target dialogue information set is more standard, so as to train an intention recognition model with higher prediction accuracy.
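The sequential cleaning chain described above can be sketched as follows, where each node applies its own rule and the output of node i becomes the input of node i+1. The two rules shown (dropping short entries, stripping filler words) are illustrative examples, not the embodiment's full rule set.

```python
def drop_short(info_set, min_len=5):
    # Rule 1: remove dialog texts whose length does not reach the threshold.
    return [t for t in info_set if len(t) >= min_len]

def strip_filler(info_set, fillers=("um ", "uh ")):
    # Rule 2: remove meaningless filler characters.
    cleaned = []
    for text in info_set:
        for f in fillers:
            text = text.replace(f, "")
        cleaned.append(text)
    return cleaned

def run_cleaning_chain(info_set, nodes):
    # Nodes are executed strictly in order; once no unexecuted node
    # remains, the result is the target dialog information set.
    for rule in nodes:
        info_set = rule(info_set)
    return info_set

raw = ["hi", "um hello, is this the parent of the student?"]
target = run_cleaning_chain(raw, [drop_short, strip_filler])
print(target)  # → ['hello, is this the parent of the student?']
```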
Furthermore, after the target dialog text is obtained, it needs to be segmented into at least two text segments so that they can subsequently be input into the model for processing. During segmentation, in order to make the length of each segmented text segment conform to the processing length expected by the model, the segmentation can be completed according to segmentation window parameters. In this embodiment, the specific implementation is as follows:
determining segmentation window parameters matched with the input parameters of the initial intention recognition model; and carrying out segmentation processing on the target dialog text according to the segmentation window parameters to obtain the at least two text segments.
Specifically, the input parameters specifically refer to parameters corresponding to input standards set by the initial intention recognition model, and are used for standardizing the text length of the input model; correspondingly, the segmentation window parameter specifically refers to the size of the segmentation window matched with the input parameter, and is used for segmenting the target dialog text into text lengths matched with the input parameter.
Based on this, after the target dialog text is obtained, and in order to enable the initial intention recognition model to complete intention recognition processing in combination with context, the input parameters corresponding to the initial intention recognition model can be determined before input, and the segmentation window parameters matching those input parameters can then be determined. The target dialog text is segmented according to the segmentation window parameters to obtain at least two text segments, so that the at least two text segments belonging to the target dialog text can be input into the initial intention recognition model in a unified manner and training can be completed on the premise that the model input requirements are met.
In a specific implementation, considering that the length of the target dialog text is not fixed, among the at least two text segments obtained after segmentation according to the segmentation window parameters there may be text segments whose length does not meet the model input parameters. To avoid affecting model training, such a text segment is either discarded or padded with characters. For example, if the model input parameter is 512 characters and the segmented text segments correspond to 512 characters and 12 characters respectively, the 12-character text segment can be discarded; or, if the segmented text segments correspond to 512 characters and 510 characters respectively, characters can be added to the 510-character segment to obtain a 512-character text segment before subsequent model processing. The padded characters are characters that do not affect the semantics of the text segment.
Following the above example, after the dialog contents between user A and user B are spliced, a target dialog text of length 4096 is obtained. The target dialog text can then be segmented according to the window size of 512 matching the parameters of the initial intention recognition model, yielding 8 text segments, S1 to S8, for subsequent input into the model for training.
In conclusion, the target dialog text is processed according to the segmentation window parameters matched with the input parameters to obtain text segments meeting the input requirements of the model, and model training is performed on the basis of the text segments, so that the model can be combined with context information of each text segment according to the hierarchical structure contained in the model in the training stage to improve the prediction accuracy of the model.
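The windowed segmentation with discard/pad handling can be sketched as below. The window of 512 matches the example above; the minimum-keep threshold and padding character are assumptions for illustration.

```python
def split_into_segments(text, window=512, min_keep=64, pad_char=" "):
    """Split text into fixed-size windows matching the model input
    parameter; drop near-empty tail segments and pad short ones."""
    segments = [text[i:i + window] for i in range(0, len(text), window)]
    kept = []
    for seg in segments:
        if len(seg) < min_keep:
            continue                                   # discard tiny tails
        if len(seg) < window:
            seg = seg + pad_char * (window - len(seg))  # pad to the window
        kept.append(seg)
    return kept

segments = split_into_segments("x" * 4096, window=512)
print(len(segments))  # → 8, matching the S1–S8 example above
```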
Step S104, inputting the at least two text segments into an initial intention recognition model, wherein the initial intention recognition model comprises a coding unit, a pooling unit and a prediction unit.
Specifically, once the at least two text segments corresponding to the target dialog text are obtained, all the text segments may be input into the initial intention recognition model, so that they are processed in turn by the coding unit, the pooling unit and the prediction unit; after the model outputs the prediction result, the model can be parameter-adjusted according to the sample label.
The initial intention recognition model specifically refers to an intention recognition model corresponding to a target service, and comprises a coding unit, a pooling unit and a prediction unit, wherein the coding unit is used for coding the text segments, and the pooling unit is used for pooling the coding processing result, so that the text segments can be fused together and input into the prediction unit for intention recognition on the basis of the fusion. The coding unit can be realized by a BERT model, the pooling unit can be realized by an average pooling mode, and the prediction unit can be realized by a linear transformation mode.
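The three-unit pipeline (encode each segment, pool, predict) can be sketched structurally as follows. The encoder is a toy stub standing in for BERT, and the weights and dimensions are illustrative values only.

```python
def encode(segment, d=4):
    # Stub encoding unit: one d-dimensional vector per character position,
    # where a real encoding unit would return BERT hidden states of shape (L, d).
    return [[float(ord(ch) % 7)] * d for ch in segment]

def avg_pool(vectors):
    # Pooling unit: elementwise average over a list of d-dimensional vectors.
    n, d = len(vectors), len(vectors[0])
    return [sum(v[j] for v in vectors) / n for j in range(d)]

def predict(feature, weight_rows):
    # Prediction unit: linear transform producing one score per intention category.
    return [sum(w * x for w, x in zip(row, feature)) for row in weight_rows]

segments = ["ab", "cd"]
segment_feats = [avg_pool(encode(s)) for s in segments]  # pooled per segment
text_feat = avg_pool(segment_feats)                      # fused text feature
scores = predict(text_feat, [[1, 0, 0, 0], [-1, 0, 0, 0]])
print(scores)  # → [2.25, -2.25]
```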
And step S106, the at least two text segments are coded through the coding unit to obtain text characteristics. Wherein the text features comprise segment features corresponding to each text segment.
Specifically, after the at least two text segments are input into the initial intention recognition model in a unified manner, they may be encoded by the encoding unit. Although the text segments are input into the model together, the encoding unit in the initial intention recognition model can only process one text segment at a time; therefore, during encoding, the encoding unit encodes each text segment in sequence, and the text features containing the segment features corresponding to each text segment are obtained from the encoding results.
The segment characteristics are vector expressions obtained after each text segment is coded, and the text characteristics are vector expressions integrating text characteristics corresponding to all the text segments. And the vector expression is used for representing the target text to be processed.
And step S108, performing pooling processing on the text features containing the segment features through the pooling unit, converting the pooling processing result by using the prediction unit, and obtaining and outputting prediction intention type information.
Specifically, after the encoding unit encodes at least two text segments, text features composed of segment features corresponding to the text segments are obtained. Furthermore, in order to enable the intention recognition model to complete the intention prediction of the user in a training stage by combining context information, the text features containing the segment features can be pooled by the pooling unit, so that the segment features corresponding to at least two text segments can be integrated and compressed together, and the result of the pooling processing is converted by the prediction unit to obtain the predicted intention category information corresponding to the target dialog text, thereby facilitating the subsequent model parameter adjustment.
The pooling treatment refers to that the fragment characteristics contained in the text characteristics are subjected to average pooling so as to represent all expressions of the target dialogue text through the pooling treatment result; correspondingly, the predicted intention category information specifically refers to information corresponding to an intention category obtained after the initial intention recognition model performs prediction processing on the target dialog text, and is used for subsequently adjusting parameters of the initial intention recognition model by combining with a label corresponding to the target dialog text.
Further, in performing pooling processing on the text features containing the segment features through the pooling unit, it is considered that the text features contain the segment features corresponding to each text segment. Therefore, during pooling, average pooling is first performed on each segment feature in the text features, and after the pooled feature corresponding to each text segment is obtained, average pooling is performed again to obtain the pooling processing result. In this embodiment, the specific implementation is as follows:
performing initial pooling processing on the text features containing the segment features through the pooling unit to obtain pooled text features, wherein the pooled text features contain pooled segment features corresponding to each text segment; performing target pooling on the pooled text features containing the pooled segment features through the pooling unit to obtain target text features, and taking the target text features as the result of the pooling processing; and converting the pooling processing result through the prediction unit to obtain prediction intention category information corresponding to the target dialog text, which is then output by the initial intention recognition model.
Specifically, the pooling text feature refers to a vector expression composed of features obtained by performing average pooling on text segments contained in the text feature through a pooling unit; correspondingly, the pooling segment characteristics are vector expressions obtained after each text segment is subjected to average pooling, the dimensionality of the pooling text characteristics is smaller than that of the text characteristics, and correspondingly, the dimensionality of the pooling segment characteristics is smaller than that of the segment characteristics. Correspondingly, the target text features specifically refer to vector expressions obtained after average pooling processing is performed on pooled text features including pooled segment features, the target text features are vector expressions obtained after all text segments are fused, and the dimensionality of the target text features is smaller than that of the pooled text features.
Based on this, after the encoding unit encodes at least two text segments, the text features including the segment features are obtained, and at this time, the text features including the segment features may be subjected to average pooling processing by the pooling unit, so that pooled text features including pooled segment features corresponding to each text segment may be obtained according to the processing result. And then, performing average pooling on the pooled text features containing the pooled segment features through a pooling unit to obtain target text features, so that the context information of each text segment feature can be fused into the target text features, finally, converting the target text features through a prediction unit to obtain prediction intention category information, and outputting a model for subsequent parameter adjustment.
The initial pooling refers to performing average pooling on each segment feature contained in the text features, where the segment feature of the i-th text segment is denoted h_i ∈ R^{L×d} (i starts from 1 and is a positive integer; L is the segment length and d is the hidden dimension). Average pooling yields the pooled segment feature corresponding to each text segment:

s_i = avgpool(h_i)

where avgpool is the function used for the average pooling processing. After the pooled text features containing the pooled segment features are obtained, average pooling can be performed again over all the pooled segment features to obtain the target text feature corresponding to the target dialog text:

r = avgpool(s_1, s_2, …, s_n)

Finally, a linear transformation is applied to the target text feature r corresponding to the target dialog text to obtain the probability of each intention category, p(c_n | r) = softmax(Wr), where c_n denotes each intention category and W denotes the parameter matrix to be learned for the text classification task. After the probability corresponding to each intention category is obtained, the predicted intention category corresponding to the target dialog text can be selected from these categories, to be used for parameter adjustment of the model after the loss value is calculated subsequently.
In conclusion, by adopting the pooling unit to perform average pooling twice, the text segments can be fused together to obtain the target text characteristics, the intention prediction can be completed by fully combining the connection among the text segments, and the model prediction precision is improved.
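The two pooling steps can be checked numerically. A useful property, under the assumption that all segments share the same length L (as produced by a fixed 512 window), is that the average of the segment averages equals a single global average over all positions; the toy tensors below (L=2, d=2) are illustrative.

```python
def avgpool(rows):
    # Elementwise average over a list of d-dimensional vectors.
    n, d = len(rows), len(rows[0])
    return [sum(r[j] for r in rows) / n for j in range(d)]

# Two segment features h_1, h_2, each with L=2 positions and d=2 dimensions.
h1 = [[1.0, 2.0], [3.0, 4.0]]
h2 = [[5.0, 6.0], [7.0, 8.0]]

s = [avgpool(h1), avgpool(h2)]  # initial pooling: pooled segment features s_i
r = avgpool(s)                  # target pooling: target text feature r
global_avg = avgpool(h1 + h2)   # single-pass average over all positions
print(r, global_avg)  # → [4.0, 5.0] [4.0, 5.0]
```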
Furthermore, after the pooling processing result is obtained, since the intent prediction belongs to the text classification task, it is necessary to select the most suitable intent prediction result from the plurality of intent prediction results for outputting, so as to ensure that the model can perform accurate parameter adjustment in the parameter adjustment stage, and in this embodiment, the specific implementation manner is as follows:
converting the result of the pooling processing through the prediction unit to obtain a category probability corresponding to each intention category information in at least two intention category information; and comparing the category probability of each intention category information, and selecting target intention category information from at least two intention category information according to the comparison result as the predicted intention category information.
Specifically, the intention category information specifically refers to intention description information corresponding to a plurality of categories preset in relation to the target service; correspondingly, the category probability specifically refers to the probability corresponding to each intention category information obtained after the target dialog text is processed by the initial intention recognition model, and the higher the probability is, the closer the target dialog text and the corresponding intention category is. Accordingly, the target intention category information specifically refers to intention category information obtained by comparing the category probabilities.
Based on this, after the pooling processing result is obtained, it can be linearly transformed by the prediction unit to obtain the category probability corresponding to each intention category information from the transformation result. The category probabilities of the intention category information are then compared, and the intention category information with the highest probability can be selected according to the comparison result as the predicted intention category information and output by the model, so that model parameter adjustment can subsequently be completed in combination with the sample label.
Referring to the schematic diagram shown in fig. 2, after the S1 to S8 text segments are uniformly input to the initial intention recognition model, the S1 to S8 text segments are sequentially encoded by the BERT model, so as to obtain text features including sequence representation of each text segment, and then the first average pooling process is performed on the sequence representation of each text segment, so as to obtain segment representation corresponding to each text segment. And then, carrying out average pooling treatment on the pooled text characteristics containing the fragment representation to obtain the text representation corresponding to the target dialog text.
Further, a linear transformation is performed on the text representation of the target dialog text, giving a probability p1 for continuation intention A1, p2 for continuation intention A2, and p3 for continuation intention A3. Comparing the probabilities p1, p2 and p3 determines that p1 > p2 > p3, so continuation intention A1 is selected as the predicted intention category information corresponding to the target dialog text and output by the initial intention recognition model, so that parameter adjustment can subsequently be completed in combination with the label.
In summary, by comparing the category probability of each intention category information, the intention category information with the highest probability can be output through the initial intention recognition model for determining the intention represented by the target dialog text, and then the loss value is calculated by combining the sample label, so that the parameter adjustment of the model is completed, and the model can have higher prediction accuracy.
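The probability comparison and selection step can be sketched as below; the category names A1–A3 follow the example above, and the linear scores are illustrative values.

```python
import math

def softmax(scores):
    # Convert linear scores Wr into category probabilities.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

categories = ["A1", "A2", "A3"]
probs = softmax([2.0, 1.0, 0.1])               # gives p1 > p2 > p3
predicted = categories[probs.index(max(probs))]  # highest-probability category
print(predicted)  # → A1
```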
And step S110, performing parameter adjustment on the initial intention recognition model based on the standard intention type information and the predicted intention type information corresponding to the target dialog text until an intention recognition model meeting training conditions is obtained.
Specifically, after the predicted intention category information is obtained, the standard intention category information corresponding to the target dialog text may first be acquired, the prediction accuracy of the model is then determined by combining the predicted intention category information and the standard intention category information, the model is parameter-adjusted according to the determination result, and training is performed again after the parameter adjustment is completed, until the model meets the training condition; the finally obtained model may then be used as the intention recognition model for the actual application stage. The standard intention category information refers to the real intention category information corresponding to the target dialog text; by comparing the predicted intention category information with the standard intention category information, the prediction accuracy of the initial intention recognition model can be determined, so that the initial intention recognition model can be parameter-adjusted to achieve high prediction accuracy.
Further, when the model is subjected to parameter adjustment, the training condition may select an iteration number condition, a loss value comparison condition, and the like, and when the training condition is the loss value comparison condition, the specific implementation manner is as follows:
determining the standard intention category information corresponding to the target dialog text in a training set; calculating a target loss value according to the standard intention category information and the prediction intention category information; adjusting parameters of the initial intention recognition model based on the target loss value until the intention recognition model meeting training conditions is obtained; wherein the training condition is a loss value comparison condition.
Specifically, the target loss value specifically refers to a loss value calculated according to standard intention category information and prediction intention category information in combination with a loss function, where the loss function includes, but is not limited to, a cross entropy loss function, a maximum loss function, an absolute loss function, and the like, and in a specific application, the target loss value may be selected according to an actual requirement, and the embodiment is not limited herein. Correspondingly, the loss value comparison condition specifically refers to a condition for comparing the loss value with a preset loss value threshold, and under the condition that the loss value is smaller than the preset loss value threshold, it is determined that the model of the current node meets the training condition.
Based on the above, after the predicted intention category information is obtained, the standard intention category information corresponding to the target dialog text can be selected from the training set to which the target dialog text belongs; then, in combination with a cross entropy loss function, a target loss value is calculated from the standard intention category information and the predicted intention category information, and the target loss value is compared with a preset loss value threshold. If the target loss value is greater than the preset loss value threshold, the initial intention recognition model is parameter-adjusted based on the target loss value, and after the parameter adjustment is completed, samples are again selected from the training set to continue training, until the loss value is smaller than the preset loss value threshold, at which point the model at the current stage is taken as the intention recognition model.
In conclusion, by combining the loss function to perform model parameter adjustment, the model can be more accurate in the parameter adjustment process, so that the model can learn the intention prediction capability, and the accurate prediction can be conveniently completed for the target service scene in the application stage.
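The loss-value comparison training condition can be sketched as follows, using cross entropy between the standard intention category (a class index) and the predicted probabilities. The probability values and loss threshold are assumed for illustration.

```python
import math

def cross_entropy(probs, target_index):
    # Cross entropy for a single sample with a one-hot standard label:
    # -log of the probability assigned to the true intention category.
    return -math.log(probs[target_index])

probs = [0.7, 0.2, 0.1]   # predicted intention category probabilities
target = 0                # index of the standard intention category
loss = cross_entropy(probs, target)

LOSS_THRESHOLD = 0.5      # preset loss value threshold (assumed)
training_done = loss < LOSS_THRESHOLD
print(round(loss, 4), training_done)  # → 0.3567 True
```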
In addition, considering that the models in different training stages have different fitting capabilities, in order to avoid reduction of prediction accuracy caused by over-training of the models, the models may be verified by combining a verification set, and in this embodiment, the specific implementation manner is as follows:
determining an intermediate intention recognition model according to a parameter adjusting result; extracting a verification dialog text in a verification set, and dividing the verification dialog text into at least two verification text segments; inputting the at least two verification text segments into the intermediate intention recognition model for processing to obtain verification intention category information; comparing target intention category information corresponding to the verification dialog text with the verification intention category information; and taking the intermediate intention recognition model as the intention recognition model when the comparison result meets the training condition.
Specifically, the intermediate intention recognition model is an intention recognition model obtained after preliminary training, and the precision of the intermediate intention recognition model does not meet the requirement; correspondingly, the verification set specifically refers to a set of sample pairs for verifying the prediction accuracy of the model.
Based on this, after the intermediate intention recognition model is obtained according to the parameter adjustment result, the model at the current stage can be verified in combination with a verification set in order to confirm the model prediction accuracy. First, a verification dialog text is extracted from the verification set and segmented into at least two verification text segments; second, the at least two verification text segments are input into the intermediate intention recognition model for processing to obtain verification intention category information; then, the target intention category information corresponding to the verification dialog text is compared with the verification intention category information, and the prediction accuracy of the model is determined according to the comparison result. If the comparison result does not satisfy the training condition, the model still needs training, and training continues; if the comparison result satisfies the training condition, the model is ready for use, and the intermediate intention recognition model is taken as the intention recognition model.
It should be noted that, for the processing process of the verification text segment by the intermediate intention recognition model, reference may be made to the corresponding or the same description in the above embodiments, and this embodiment is not described in detail herein.
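The verification comparison can be sketched as a simple accuracy check against the target intention categories; the accuracy threshold used as the training condition here is an assumption, since the embodiment leaves the exact comparison criterion open.

```python
def verify(predictions, targets, accuracy_threshold=0.9):
    """Compare the intermediate model's verification intention category
    information with the target intention category information; accept
    the model once accuracy meets the (assumed) training condition."""
    correct = sum(p == t for p, t in zip(predictions, targets))
    accuracy = correct / len(targets)
    return accuracy, accuracy >= accuracy_threshold

preds   = ["A1", "A2", "A1", "A3", "A1"]
targets = ["A1", "A2", "A1", "A3", "A2"]
accuracy, accepted = verify(preds, targets)
print(accuracy, accepted)  # → 0.8 False: keep training
```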
In order to improve the accuracy of model recognition, the intention recognition model training method provided in this specification may, after a target dialog text is obtained, segment the target dialog text into at least two text segments and input them into an initial intention recognition model in a unified manner. The coding unit in the model encodes the text segments to obtain text features containing the segment features corresponding to each text segment, so that all text segments are represented by one text feature. Finally, the initial intention recognition model is parameter-adjusted based on the standard intention category information and predicted intention category information corresponding to the target dialog text, until an intention recognition model satisfying the training condition is obtained. During training, the representations of the target dialog text are fused through the average pooling of the pooling unit, and model training is performed on this basis, so that the model learns to perform intention prediction based on the global representation of the dialog text; the trained model therefore has higher prediction accuracy and is convenient for downstream business use.
Corresponding to the above method embodiment, the present specification further provides an intention recognition model training device embodiment, and fig. 3 illustrates a schematic structural diagram of an intention recognition model training device provided in an embodiment of the present specification. As shown in fig. 3, the apparatus includes:
an obtaining module 302, configured to obtain a target dialog text and segment the target dialog text into at least two text segments;
an input module 304 configured to input the at least two text segments to an initial intent recognition model, wherein the initial intent recognition model comprises an encoding unit, a pooling unit, and a prediction unit;
the encoding module 306 is configured to perform encoding processing on the at least two text segments through the encoding unit to obtain text features, where the text features include segment features corresponding to each text segment;
a pooling module 308 configured to pool the text features containing the segment features through the pooling unit, and to convert the result of the pooling processing by using the prediction unit so as to obtain and output predicted intention category information;
a training module 310 configured to perform parameter adjustment on the initial intention recognition model based on the standard intention category information and the predicted intention category information corresponding to the target dialog text until an intention recognition model satisfying a training condition is obtained.
In an optional embodiment, the obtaining module 302 is further configured to:
acquiring a dialog information set associated with the target service; carrying out data cleaning on the initial dialog information contained in the dialog information set to obtain a target dialog information set containing target dialog information; determining at least two dialog texts corresponding to the target dialog information in the target dialog information set, and splicing the at least two dialog texts to obtain the target dialog text; wherein each of the at least two dialog texts contains a speaker identification.
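The splicing step above can be sketched as follows; the "[speaker]" prefix format and the space separator are assumptions, since the patent only requires that each dialog text carry a speaker identification.

```python
# Hypothetical sketch of splicing speaker-tagged dialog texts into one
# target dialog text. The "[speaker]" prefix and separator are
# assumptions, not a format specified by the patent.
def splice_dialog(turns):
    """turns: list of (speaker_id, utterance) pairs in dialog order."""
    return " ".join(f"[{speaker}] {text}" for speaker, text in turns)

turns = [("agent", "Hello, are you interested in the course?"),
         ("user", "Yes, tell me more.")]
target_dialog_text = splice_dialog(turns)
```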
In an optional embodiment, the obtaining module 302 is further configured to:
determining a data cleaning link comprising a plurality of data cleaning nodes; selecting the data cleaning rule corresponding to the ith data cleaning node in the data cleaning link, and performing data cleaning on the initial dialog information contained in the dialog information set to obtain an initial dialog information set; judging whether the data cleaning link contains an unexecuted data cleaning node; if so, incrementing i by 1, taking the initial dialog information set as the dialog information set, and returning to the step of selecting the data cleaning rule corresponding to the ith data cleaning node in the data cleaning link; if not, taking the initial dialog information set as the target dialog information set containing the target dialog information.
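The iterative data cleaning link can be sketched minimally as below: the ith node's rule cleans the current set and its output feeds node i+1 until no unexecuted node remains. The concrete rules (trimming whitespace, dropping empty records) are illustrative assumptions, not rules specified by the patent.

```python
# Minimal sketch of the iterative data cleaning link; each node applies
# its own rule, and node i's output becomes node i+1's input.
def run_cleaning_link(dialog_info_set, cleaning_nodes):
    current = dialog_info_set
    for clean_rule in cleaning_nodes:   # i-th data cleaning node
        current = clean_rule(current)   # output becomes the next input set
    return current                      # target dialog information set

nodes = [
    lambda records: [r.strip() for r in records],   # node 1: trim whitespace
    lambda records: [r for r in records if r],      # node 2: drop empty records
]
cleaned = run_cleaning_link(["  hi ", "", "ok"], nodes)
```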
In an optional embodiment, the obtaining module 302 is further configured to:
determining segmentation window parameters matched with the input parameters of the initial intention recognition model; and carrying out segmentation processing on the target dialog text according to the segmentation window parameters to obtain the at least two text segments.
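The window-based segmentation can be sketched as below; character-level slicing and the window value are assumptions for illustration — in practice the segmentation window parameter is matched to the encoder's maximum input length.

```python
# Sketch of segmenting the target dialog text with a window parameter
# matched to the model's input limit; character-level slicing is an
# illustrative assumption.
def segment_text(text, window):
    """Cut text into consecutive segments of at most `window` units."""
    return [text[i:i + window] for i in range(0, len(text), window)]

segments = segment_text("abcdefghij", window=4)
```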
In an alternative embodiment, the pooling module 308 is further configured to:
performing initial pooling processing on the text features containing the segment features through the pooling unit to obtain pooled text features, wherein the pooled text features contain the pooled segment feature corresponding to each text segment; performing target pooling processing on the pooled text features containing the pooled segment features through the pooling unit to obtain a target text feature, and taking the target text feature as the result of the pooling processing; and converting the result of the pooling processing through the prediction unit to obtain the prediction intention category information corresponding to the target dialog text, which is then output by the initial intention recognition model.
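The two-stage pooling can be sketched as follows: initial (mean) pooling inside each segment yields one pooled segment feature, then target (mean) pooling across segments yields one global text feature. Using average pooling at both stages follows the "average pooling mode" mentioned in the summary, but the exact operators are otherwise assumptions.

```python
# Sketch of the two-stage pooling over text features; mean pooling at
# both stages is an assumption consistent with the summary.
def mean_pool(vectors):
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def two_stage_pool(segment_features):
    """segment_features: per segment, a list of token feature vectors."""
    pooled_segments = [mean_pool(seg) for seg in segment_features]  # initial pooling
    return mean_pool(pooled_segments)                               # target pooling

target_text_feature = two_stage_pool(
    [[[1.0, 2.0], [3.0, 4.0]],   # token vectors of segment 1
     [[5.0, 6.0]]])              # token vectors of segment 2
```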
In an alternative embodiment, the pooling module 308 is further configured to:
converting the result of the pooling processing through the prediction unit to obtain a category probability corresponding to each intention category information in at least two intention category information; and comparing the category probability of each intention category information, and selecting target intention category information from at least two intention category information according to the comparison result as the predicted intention category information.
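The conversion step above can be sketched as a softmax over per-category scores followed by selecting the highest-probability category; the logit values and category names below are made up for illustration.

```python
# Sketch of the prediction unit's conversion: softmax yields one
# probability per intention category, and the most probable category is
# selected as the predicted intention category information.
import math

def softmax(logits):
    m = max(logits)                     # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_category(logits, categories):
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return categories[best], probs

label, probs = predict_category(
    [0.2, 1.5, -0.3], ["low intent", "high intent", "medium intent"])
```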
In an optional embodiment, the training module 310 is further configured to:
determining the standard intention category information corresponding to the target dialog text in a training set; calculating a target loss value according to the standard intention category information and the prediction intention category information; adjusting parameters of the initial intention recognition model based on the target loss value until the intention recognition model meeting training conditions is obtained; wherein the training condition is a loss value comparison condition.
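The loss-based adjustment criterion can be sketched as below. Cross-entropy and the 0.1 threshold are assumptions; the patent only states that the training condition is a loss value comparison condition.

```python
# Sketch of the target loss and training condition: cross-entropy
# between the standard (ground-truth) category and predicted
# probabilities, with training finished once the loss is below a
# threshold. Both choices are illustrative assumptions.
import math

def cross_entropy(standard_index, predicted_probs):
    return -math.log(predicted_probs[standard_index])

def meets_training_condition(loss, threshold=0.1):
    return loss < threshold

target_loss = cross_entropy(1, [0.05, 0.9, 0.05])  # standard category: index 1
```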
In an optional embodiment, the training module 310 is further configured to:
determining an intermediate intention recognition model according to the parameter adjusting result; extracting a verification dialog text in a verification set, and dividing the verification dialog text into at least two verification text segments; inputting the at least two verification text segments into the intermediate intention recognition model for processing to obtain verification intention category information; comparing target intention category information corresponding to the verification dialog text with the verification intention category information; and taking the intermediate intention recognition model as the intention recognition model when the comparison result meets the training condition.
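The verification step can be sketched as comparing predicted and target categories over the verification set; the stand-in predictor and the accuracy-style acceptance rule are assumptions, since the patent does not fix the comparison metric.

```python
# Sketch of validating the intermediate intention recognition model:
# accept it once the share of matching category labels is high enough.
def validation_accuracy(predict, verification_set):
    """verification_set: list of (verify_text, target_category) pairs."""
    hits = sum(1 for text, target in verification_set if predict(text) == target)
    return hits / len(verification_set)

stub_predict = lambda text: "high intent" if "buy" in text else "low intent"
accuracy = validation_accuracy(
    stub_predict,
    [("I want to buy the course", "high intent"),
     ("not interested", "low intent")])
```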
In an optional embodiment, the obtaining module 302 is further configured to:
acquiring a conversation audio set associated with the target service; and respectively inputting the dialogue audio contained in the dialogue audio set into a voice recognition model for processing to obtain dialogue information containing the speaker identification, and forming the dialogue information set.
In order to improve the accuracy of model recognition, the intention recognition model training device provided in this specification may, after a target dialog text is obtained, segment the target dialog text into at least two text segments and input the segments together into an initial intention recognition model. A coding unit in the model encodes the segments to obtain text features containing the segment features corresponding to each text segment, so that all text segments are represented by a single text feature. A pooling unit then fuses the representation of the target dialog text by average pooling, and a prediction unit converts the pooling result into predicted intention category information. Finally, parameters of the initial intention recognition model are adjusted based on the standard intention category information and the predicted intention category information corresponding to the target dialog text until an intention recognition model satisfying the training condition is obtained. Because the model learns from, and predicts intention on, a global representation of the dialog text during training, the trained model has higher prediction accuracy and is convenient for downstream business use.
The above is an illustrative scheme of the intention recognition model training apparatus of the present embodiment. It should be noted that the technical solution of the intention recognition model training device and the technical solution of the intention recognition model training method described above belong to the same concept, and details that are not described in detail in the technical solution of the intention recognition model training device can be referred to the description of the technical solution of the intention recognition model training method described above.
The present specification further provides an intention recognition method. Referring to fig. 4, fig. 4 shows a flowchart of an intention recognition method provided according to an embodiment of the present specification, which specifically includes the following steps:
step S402, obtaining the dialog text to be processed of the associated target user in the target service.
Step S404, the dialog text to be processed is divided into at least two text segments to be processed.
Step S406, inputting the at least two text segments to be processed into the intention recognition model obtained by the above intention recognition model training method for processing, so as to obtain intention category information corresponding to the dialog text to be processed.
Step S408, determining the participation intention of the target user for participating in the target service according to the intention category information.
Specifically, the target user refers to a user associated with the target service. For example, if the target service is a web class sales service, the target user is a user related to that service, including but not limited to a user who has consulted about a web class or a user who has purchased a web class; this embodiment is not limited in this respect. Correspondingly, the to-be-processed dialog text refers to the dialog text between the target user and a sales agent contacted through the web class sales service. Accordingly, the participation intention refers to the target user's intention to participate in the target business, that is, the intention to purchase the web class, including but not limited to a high, medium, or low purchase intention.
In practice, users with different participation intentions can be followed up in different ways. For example, a user with high purchase intention may be assigned a specialist to interface with the target user and introduce the course; a user with low purchase intention may have an identifier added to the target user's information so as to avoid subsequent disturbance; and a user with medium purchase intention may be recommended a trial lesson so that the target user can experience the course content.
In practical applications, when users of different target services are followed up based on participation intention, the follow-up manner may be set according to requirements; this embodiment is not limited in this respect.
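The routing by participation intention described above can be sketched as a simple lookup; the category names and action strings are illustrative assumptions.

```python
# Sketch of mapping a user's participation intention category to a
# follow-up action, mirroring the docking examples above.
FOLLOW_UP = {
    "high intent": "assign a specialist to introduce the course",
    "medium intent": "recommend a trial lesson",
    "low intent": "flag the account to avoid further contact",
}

def follow_up_action(intent_category):
    return FOLLOW_UP.get(intent_category, "no action")

action = follow_up_action("medium intent")
```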
The training process of the intention recognition model involved in the intention recognition method provided in this embodiment can be referred to the above embodiments, and this embodiment is not described in detail herein.
After obtaining the to-be-processed dialog text of the associated target user in the target service, the intention recognition method provided by this specification obtains at least two to-be-processed text segments through preprocessing and inputs them into the intention recognition model obtained by the above training method, thereby obtaining the intention category information corresponding to the to-be-processed dialog text. On this basis, the participation intention of the target user in the target service can be accurately determined, which allows the business party to follow up accordingly, ensures that the follow-up manner matches the user's intention toward the target service, and improves the user experience.
Corresponding to the above method embodiment, the present specification further provides an intention identification device embodiment, and fig. 5 shows a schematic structural diagram of an intention identification device provided in an embodiment of the present specification. As shown in fig. 5, the apparatus includes:
an obtaining text module 502 configured to obtain a to-be-processed dialog text of an associated target user in a target service;
a text segmentation module 504 configured to segment the dialog text to be processed into at least two text segments to be processed;
an input model module 506, configured to input the at least two text segments to be processed into the intention recognition model obtained by the above intention recognition model training method for processing, so as to obtain intention category information corresponding to the dialog text to be processed;
an intent determination module 508 configured to determine an engagement intention of the target user to engage in the target business according to the intention category information.
After obtaining the to-be-processed dialog text of the associated target user in the target service, the intention recognition device provided by this specification obtains at least two to-be-processed text segments through preprocessing and inputs them into the intention recognition model obtained by the above training method, thereby obtaining the intention category information corresponding to the to-be-processed dialog text. On this basis, the participation intention of the target user in the target service can be accurately determined, which allows the business party to follow up accordingly, ensures that the follow-up manner matches the user's intention toward the target service, and improves the user experience.
The above is a schematic scheme of the intention identifying apparatus of the present embodiment. It should be noted that the technical solution of the intention identifying device and the technical solution of the intention identifying method belong to the same concept, and for details that are not described in detail in the technical solution of the intention identifying device, reference may be made to the description of the technical solution of the intention identifying method.
The following will further describe the intention recognition method with reference to fig. 6 by taking an application of the intention recognition method provided in the present specification in a web lesson follow-up business scenario as an example. Fig. 6 shows a processing flow chart of an intention identifying method provided in an embodiment of the present specification, which specifically includes the following steps:
step S602, a dialogue audio set associated with the target service is obtained, dialogue audio contained in the dialogue audio set is respectively input into the speech recognition model for processing, dialogue information containing speaker identification is obtained, and a dialogue information set is formed.
Step S604, performing data cleaning on the initial session information included in the session information set to obtain a target session information set including the target session information.
Step S606, at least two dialog texts corresponding to the target dialog information in the target dialog information set are determined, and the target dialog text is obtained by splicing the at least two dialog texts.
Wherein each of the at least two dialog texts contains a speaker identification.
In step S608, the segmentation window parameters matching the input parameters of the initial intention recognition model are determined.
Step S610, segmenting the target dialog text according to the segmentation window parameters to obtain at least two text segments.
Step S612, inputting at least two text segments into an initial intention recognition model, wherein the initial intention recognition model includes a coding unit, a pooling unit, and a prediction unit.
Step S614, at least two text segments are coded through the coding unit to obtain text characteristics, wherein the text characteristics comprise segment characteristics corresponding to each text segment.
Step S616, performing initial pooling on the text features containing the segment features through a pooling unit to obtain pooled text features, wherein the pooled text features contain pooled segment features corresponding to each text segment.
And step S618, performing target pooling processing on the pooled text features containing the pooled segment features through the pooling unit to obtain target text features, and taking the target text features as a pooling processing result.
Step S620, converting the pooling processing result by the prediction unit to obtain a category probability corresponding to each of the at least two intention category information.
In step S622, the category probabilities of each of the intention category information are compared, and target intention category information is selected as predicted intention category information from at least two intention category information according to the comparison result.
Step S624, determining standard intention category information corresponding to the target dialog text in the training set.
In step S626, a target loss value is calculated from the standard intention category information and the prediction intention category information.
Step S628, performing parameter adjustment on the initial intention recognition model based on the target loss value until an intention recognition model meeting the training condition is obtained; wherein the training condition is a loss value comparison condition.
Step S630, obtaining the to-be-processed dialog text associated with the target user in the target service, and dividing the to-be-processed dialog text into at least two to-be-processed text segments.
Step S632, inputting at least two text segments to be processed into the intention recognition model for processing, and obtaining intention category information corresponding to the dialog text to be processed.
Step S634, determining the participation intention of the target user in the target service according to the intention category information.
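The steps above can be sketched end to end under heavy simplification: the encoder is stubbed as a fixed per-character score, pooling is averaging, and a toy decision rule stands in for the trained prediction head, so this only illustrates the data flow of steps S610-S634, not a real model.

```python
# End-to-end sketch of the segmentation -> encoding -> pooling ->
# prediction flow; every component here is an illustrative stub.
def encode(segment):
    # stub encoder: one scalar "feature" per character
    return [float(ord(c) % 7) for c in segment]

def pipeline(dialog_text, window=4):
    segments = [dialog_text[i:i + window]                 # S610: segmentation
                for i in range(0, len(dialog_text), window)]
    pooled = [sum(encode(s)) / len(s) for s in segments]  # S614-S616: encode + initial pooling
    global_feature = sum(pooled) / len(pooled)            # S618: target pooling
    # S620-S622: toy decision rule in place of the linear prediction head
    return "high intent" if global_feature > 3.0 else "low intent"

participation_intent = pipeline("hello world dialog")
```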
In summary, in order to improve the accuracy of model recognition, after a target dialog text is obtained it is segmented into at least two text segments, and the segments are input together into the initial intention recognition model. A coding unit in the model encodes the segments to obtain text features containing the segment features corresponding to each text segment, so that all text segments are represented by a single text feature. A pooling unit then fuses the representation of the target dialog text by average pooling, and a prediction unit converts the pooling result into predicted intention category information. Finally, parameters of the initial intention recognition model are adjusted based on the standard intention category information and the predicted intention category information corresponding to the target dialog text until an intention recognition model satisfying the training condition is obtained. Because the model learns from, and predicts intention on, a global representation of the dialog text during training, the trained model has higher prediction accuracy and is convenient for downstream business use.
Fig. 7 illustrates a block diagram of a computing device 700 provided in accordance with an embodiment of the present specification. The components of the computing device 700 include, but are not limited to, memory 710 and a processor 720. Processor 720 is coupled to memory 710 via bus 730, and database 750 is used to store data.
Computing device 700 also includes an access device 740 that enables computing device 700 to communicate via one or more networks 760. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. Access device 740 may include one or more of any type of network interface, wired or wireless, e.g., a Network Interface Card (NIC), an IEEE 802.11 Wireless Local Area Network (WLAN) wireless interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the present description, the above-described components of computing device 700, as well as other components not shown in FIG. 7, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 7 is for purposes of example only and is not limiting as to the scope of the present description. Those skilled in the art may add or replace other components as desired.
Computing device 700 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smartphone), wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 700 may also be a mobile or stationary server.
Wherein processor 720 is configured to implement the intent recognition model training method or the steps of the intent recognition method when executing the computer-executable instructions.
The above is an illustrative scheme of a computing device of the present embodiment. It should be noted that the technical solution of the computing device belongs to the same concept as the above-mentioned intention recognition model training method or the above-mentioned intention recognition method, and details that are not described in detail in the technical solution of the computing device can be referred to the above-mentioned description of the intention recognition model training method or the intention recognition method.
An embodiment of the present specification also provides a computer-readable storage medium storing computer instructions that, when executed by a processor, are used for an intention recognition model training method or an intention recognition method.
The above is an illustrative scheme of a computer-readable storage medium of the embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the above-mentioned intention recognition model training method or intention recognition method, and details that are not described in detail in the technical solution of the storage medium can be referred to the above-mentioned description of the intention recognition model training method or intention recognition method.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random-access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in the relevant jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals, in accordance with legislation and patent practice.
It should be noted that, for the sake of simplicity, the foregoing method embodiments are described as a series of acts or combinations, but those skilled in the art should understand that the present disclosure is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the present disclosure. Further, those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments and that acts and modules referred to are not necessarily required for this description.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to the related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are intended only to aid in the description of the specification. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the specification and its practical application, to thereby enable others skilled in the art to best understand the specification and its practical application. The specification is limited only by the claims and their full scope and equivalents.

Claims (14)

1. An intention recognition model training method, comprising:
acquiring a target dialog text, and segmenting the target dialog text into at least two text segments;
inputting the at least two text segments to an initial intent recognition model, wherein the initial intent recognition model comprises an encoding unit, a pooling unit, and a prediction unit;
encoding the at least two text segments through the encoding unit to obtain text characteristics, wherein the text characteristics comprise segment characteristics corresponding to each text segment;
performing pooling processing on the text features containing the segment features through the pooling unit, converting the result of the pooling processing by using the prediction unit, and acquiring and outputting prediction intention category information;
and performing parameter adjustment on the initial intention recognition model based on the standard intention category information and the prediction intention category information corresponding to the target dialog text until an intention recognition model meeting training conditions is obtained.
2. The method of claim 1, wherein obtaining the target dialog text comprises:
acquiring a dialog information set associated with the target service;
carrying out data cleaning on initial dialogue information contained in the dialogue information set to obtain a target dialogue information set containing target dialogue information;
determining at least two dialog texts corresponding to the target dialog information in the target dialog information set, and splicing the at least two dialog texts to obtain the target dialog text;
wherein each of the at least two dialog texts contains a speaker identification.
3. The method of claim 2, wherein the performing data cleansing on the initial dialog information included in the dialog information set to obtain a target dialog information set including target dialog information comprises:
determining a data cleaning link comprising a plurality of data cleaning nodes;
selecting a data cleaning rule corresponding to the ith data cleaning node in the data cleaning link, and performing data cleaning on the initial dialog information contained in the dialog information set to obtain an initial dialog information set;
judging whether the data cleaning link contains an unexecuted data cleaning node or not;
if yes, incrementing i by 1, taking the initial dialog information set as the dialog information set, and executing the step of selecting the data cleaning rule corresponding to the ith data cleaning node in the data cleaning link;
and if not, taking the initial dialog information set as a target dialog information set containing target dialog information.
4. The method of claim 1, wherein the segmenting the target dialog text into at least two text segments comprises:
determining a segmentation window parameter matched with an input parameter of the initial intention recognition model;
and carrying out segmentation processing on the target dialog text according to the segmentation window parameters to obtain the at least two text segments.
5. The method according to claim 1, wherein the pooling processing of the text features including the segment features by the pooling unit, and the converting of the result of the pooling processing by the prediction unit to obtain and output the prediction intention category information comprises:
performing initial pooling processing on the text features containing the segment features through the pooling unit to obtain pooled text features, wherein the pooled text features contain pooled segment features corresponding to each text segment;
performing target pooling on the pooled text features containing the pooled fragment features through the pooling unit to obtain target text features, and taking the target text features as the result of the pooling;
and converting the result of the pooling processing through the prediction unit to obtain the prediction intention category information corresponding to the target dialog text, which is output by the initial intention recognition model.
6. The method according to claim 5, wherein the converting, by the prediction unit, the pooling processing result to obtain the prediction intention category information corresponding to the target dialog text comprises:
converting the result of the pooling processing through the prediction unit to obtain a category probability corresponding to each intention category information in at least two intention category information;
and comparing the category probability of each intention category information, and selecting target intention category information from at least two intention category information according to the comparison result as the predicted intention category information.
7. The method according to any one of claims 1-6, wherein the parametrizing the initial intention recognition model based on the standard intention category information and the predicted intention category information corresponding to the target dialog text until an intention recognition model satisfying a training condition is obtained comprises:
determining the standard intention category information corresponding to the target dialog text in a training set;
calculating a target loss value according to the standard intention category information and the prediction intention category information;
adjusting parameters of the initial intention recognition model based on the target loss value until the intention recognition model meeting training conditions is obtained; wherein the training condition is a loss value comparison condition.
8. The method of any one of claims 1-6, further comprising:
determining an intermediate intention recognition model according to the parameter adjusting result;
extracting a verification dialog text in a verification set, and dividing the verification dialog text into at least two verification text segments;
inputting the at least two verification text segments into the intermediate intention recognition model for processing to obtain verification intention category information;
comparing target intention category information corresponding to the verification dialog text with the verification intention category information;
and taking the intermediate intention recognition model as the intention recognition model when the comparison result meets the training condition.
9. The method of claim 2, wherein acquiring the dialog information set associated with the target service comprises:
acquiring a dialog audio set associated with the target service;
and inputting each dialog audio contained in the dialog audio set into a speech recognition model for processing to obtain dialog information containing a speaker identifier, and forming the dialog information set.
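The dialog-information set of claim 9 pairs each recognized utterance with a speaker identifier. A small data-structure sketch of the assembly step (the ASR output format `(speaker_id, text)` is an assumption; the patent does not specify one):

```python
from dataclasses import dataclass

@dataclass
class DialogTurn:
    """One entry of dialog information: speaker identifier plus text."""
    speaker_id: str  # e.g. diarization label from the speech recognition step
    text: str

def build_dialog_set(asr_outputs):
    """Assemble the dialog information set from (speaker_id, text)
    pairs emitted by a speech recognition model (hypothetical format)."""
    return [DialogTurn(spk, txt) for spk, txt in asr_outputs]
```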
10. An intention recognition model training apparatus, comprising:
an acquisition module configured to acquire a target dialog text and segment the target dialog text into at least two text segments;
an input module configured to input the at least two text segments to an initial intent recognition model, wherein the initial intent recognition model includes an encoding unit, a pooling unit, and a prediction unit;
an encoding module configured to encode the at least two text segments through the encoding unit to obtain text features, wherein the text features include segment features corresponding to each text segment;
a pooling module configured to pool the text features containing the segment features through the pooling unit, and convert the pooling result by using the prediction unit to obtain and output predicted intention category information;
and a training module configured to perform parameter adjustment on the initial intention recognition model based on the standard intention category information and the predicted intention category information corresponding to the target dialog text until an intention recognition model satisfying a training condition is obtained.
11. An intent recognition method, comprising:
acquiring a to-be-processed dialog text associated with a target user in a target service;
segmenting the to-be-processed dialog text into at least two to-be-processed text segments;
inputting the at least two to-be-processed text segments into an intention recognition model obtained by the method of any one of claims 1 to 9 for processing, to obtain intention category information corresponding to the to-be-processed dialog text;
and determining, according to the intention category information, the intention of the target user to participate in the target service.
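The segmentation step of claim 11 splits a long dialog text into at least two segments before inference. A simple length-based segmentation sketch (the claim does not fix a segmentation rule; the 128-character limit is an arbitrary illustration):

```python
def segment_dialog(text, max_len=128):
    """Split a to-be-processed dialog text into consecutive segments of
    at most max_len characters; segments are later fed to the model."""
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]
```

A sentence- or turn-boundary segmentation would serve the same claim language; the only requirement stated is that at least two segments are produced for a sufficiently long text.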
12. An intention recognition apparatus, comprising:
a text acquisition module configured to acquire a to-be-processed dialog text associated with a target user in a target service;
a text segmentation module configured to segment the to-be-processed dialog text into at least two to-be-processed text segments;
a model input module configured to input the at least two to-be-processed text segments into an intention recognition model obtained by the method of any one of claims 1 to 9 for processing, to obtain intention category information corresponding to the to-be-processed dialog text;
and an intention determination module configured to determine, according to the intention category information, the intention of the target user to participate in the target service.
13. A computing device, comprising a memory and a processor, wherein the memory stores computer-executable instructions, and the processor executes the computer-executable instructions to implement the steps of the method of any one of claims 1 to 9 or 11.
14. A computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the method of any one of claims 1 to 9 or 11.
CN202210841809.1A 2022-07-18 2022-07-18 Intention recognition model training method and device Pending CN115827831A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210841809.1A CN115827831A (en) 2022-07-18 2022-07-18 Intention recognition model training method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210841809.1A CN115827831A (en) 2022-07-18 2022-07-18 Intention recognition model training method and device

Publications (1)

Publication Number Publication Date
CN115827831A true CN115827831A (en) 2023-03-21

Family

ID=85522859

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210841809.1A Pending CN115827831A (en) 2022-07-18 2022-07-18 Intention recognition model training method and device

Country Status (1)

Country Link
CN (1) CN115827831A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116756294A (en) * 2023-08-14 2023-09-15 北京智精灵科技有限公司 Construction method of dialogue intention recognition model, dialogue intention recognition method and dialogue intention recognition system
CN116756294B (en) * 2023-08-14 2023-12-26 北京智精灵科技有限公司 Construction method of dialogue intention recognition model, dialogue intention recognition method and dialogue intention recognition system

Similar Documents

Publication Publication Date Title
CN113255755B (en) Multi-modal emotion classification method based on heterogeneous fusion network
Macary et al. On the use of self-supervised pre-trained acoustic and linguistic features for continuous speech emotion recognition
CN111339305B (en) Text classification method and device, electronic equipment and storage medium
CN111966800B (en) Emotion dialogue generation method and device and emotion dialogue model training method and device
CN110990543A (en) Intelligent conversation generation method and device, computer equipment and computer storage medium
CN111739516A (en) Speech recognition system for intelligent customer service call
CN111930914B (en) Problem generation method and device, electronic equipment and computer readable storage medium
CN111897933A (en) Emotional dialogue generation method and device and emotional dialogue model training method and device
CN115269836A (en) Intention identification method and device
CN115964638A (en) Multi-mode social data emotion classification method, system, terminal, equipment and application
CN112836053A (en) Man-machine conversation emotion analysis method and system for industrial field
CN114360557A (en) Voice tone conversion method, model training method, device, equipment and medium
CN114386426B (en) Gold medal speaking skill recommendation method and device based on multivariate semantic fusion
CN115827831A (en) Intention recognition model training method and device
CN113793599A (en) Training method of voice recognition model and voice recognition method and device
CN116361442B (en) Business hall data analysis method and system based on artificial intelligence
CN117093864A (en) Text generation model training method and device
CN114911922A (en) Emotion analysis method, emotion analysis device and storage medium
CN114638238A (en) Training method and device of neural network model
CN114373448B (en) Topic detection method and device, electronic equipment and storage medium
CN114840697B (en) Visual question-answering method and system for cloud service robot
CN117453895B (en) Intelligent customer service response method, device, equipment and readable storage medium
CN115293148A (en) Theme identification method and device
CN115114907A (en) Text processing method and device
CN115858783A (en) Training method and device of theme recognition model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination