CN117150395A - Model training and intention recognition method and device, electronic equipment and storage medium - Google Patents

Model training and intention recognition method and device, electronic equipment and storage medium

Info

Publication number
CN117150395A
Authority
CN
China
Prior art keywords
classifier
sample
history
historical
analyzed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311055750.4A
Other languages
Chinese (zh)
Inventor
张蕾
冉猛
郭子滔
赵进
秦蛟禹
危枫
王晨子
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd
Priority to CN202311055750.4A
Publication of CN117150395A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2431 Multiple classes
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent


Abstract

The embodiment of the application provides a model training and intention recognition method and device, electronic equipment and a storage medium. The model training method comprises the following steps: acquiring a historical multi-round conversation text set, and performing splicing processing on each historical multi-round conversation text respectively to obtain a historical sample set based on relative position coding; creating a multi-intention understanding model to be trained comprising at least one classifier, and acquiring a historical sample subset corresponding to each classifier from the historical sample set, wherein each classifier is a multi-label classifier; for each classifier, training the classifier by using the historical sample subset corresponding to the classifier, and obtaining the multi-intention understanding model after all the classifiers are trained. According to the embodiment of the application, the multi-intention understanding model can learn the correlation between the semantic information of the conversation texts, together with their front-and-back position information, and the conversation intention categories, and at least one classifier is provided for analysis, so that the recognition result of the multi-intention understanding model is more accurate and comprehensive.

Description

Model training and intention recognition method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method and apparatus for model training and intent recognition, an electronic device, and a storage medium.
Background
With the increasing maturity of natural language processing technologies, multi-round conversation systems have been widely applied in numerous scenarios such as customer service, business consultation and online shopping. Text data in a multi-round conversation system for a customer service scenario carries the service requirements and product requirements expressed by users, so classifying these texts with labels makes it possible to understand user intent, helps customer service personnel mine the potential requirements of users, and supports timely product optimization and service improvement, so that business can be handled more effectively and service quality improved. However, the intent of a user in a multi-round conversation system for a customer service scenario usually involves multiple intents, so the task belongs to the multi-label text classification problem. Different from the multi-category text classification task, in which each text has only one category label, multi-label text classification provides more detailed text information and is therefore more meaningful and valuable.
At present, multi-label text classification methods mainly follow two ideas: problem transformation and algorithm adaptation. The problem transformation approach converts the multi-label classification problem into several simple single-label classification problems; however, because each label is processed independently, the correlation among labels is ignored and accuracy is poor. The algorithm adaptation approach adjusts a multi-category classification method to fit the multi-label classification problem; however, this approach usually considers only the correlation among the multiple labels of a single text, ignores the relations among the texts within a multi-round conversation, and accuracy is also poor.
Disclosure of Invention
In view of the above problems, embodiments of the present application provide a method, an apparatus, an electronic device, and a storage medium for model training and intent recognition, so as to improve accuracy of intent recognition.
According to an aspect of an embodiment of the present application, there is provided a model training method including:
acquiring a historical multi-round conversation text set, and performing splicing processing on each historical multi-round conversation text respectively to obtain a historical sample set based on relative position coding;
creating a multi-intention understanding model to be trained comprising at least one classifier, and acquiring a historical sample subset corresponding to each classifier from the historical sample set; the classifier is a multi-label classifier;
and training each classifier by utilizing a historical sample subset corresponding to the classifier, and obtaining a multi-intention understanding model after all the classifiers are trained.
Optionally, for any one historical multi-round conversation text, performing splicing processing on the historical multi-round conversation text, including:
step 1, initializing i=j=1, and initializing a j-th history splicing sequence d corresponding to the history multi-round conversation text j When the historical sample is empty, initializing the historical sample corresponding to the historical multi-round conversation text to be empty;
step 2, judging an ith historical conversation text s in the historical multi-round conversation text i And d is equal to j Whether the sum of the lengths of the two sections is smaller than or equal to a preset threshold value; if yes, executing the step 3; if not, executing the step 4;
step 3, at said d j Splicing the tail parts of the s i And at s i After thatAdding a position identifier, enabling i=i+1, and executing step 5;
step 4, at said d j A semantic identifier is added to the head of the history sample, a text separator is added to the tail of the history sample, and the d is added to the tail of the history sample j Let j=j+1, initialize d j Empty and step 5 is performed;
step 5, judging whether i and j are smaller than or equal to the total number of historical conversation texts in the historical multi-round conversation texts; if yes, executing the step 2; if not, obtaining a history sample corresponding to the history multi-round conversation text.
Optionally, the obtaining a history sample subset corresponding to each classifier from the history sample set includes: dividing the history sample set into a number of history sample auxiliary sets equal to the total number of classifiers, according to the total number of classifiers; for each classifier, selecting one history sample auxiliary set as the verification sample set corresponding to the classifier, taking the remaining history sample auxiliary sets as the training sample set corresponding to the classifier, and taking the training sample set and the verification sample set as the history sample subset corresponding to the classifier; wherein the verification sample sets corresponding to different classifiers are different.
Optionally, the training the classifier by using the historical sample subset corresponding to the classifier includes: inputting a history sample in a history sample subset corresponding to the classifier into the classifier, and identifying the history sample in the classifier to obtain a predicted intention category of the history sample identified by the classifier; calculating a model loss value according to the predicted intention category of the historical sample and the preset actual intention category of the historical sample; and when the model loss value meets a preset condition, determining that the classifier training is completed.
Optionally, the history sample comprises at least one history splicing sequence, the history splicing sequence comprises at least one history session text, and a position identifier is added after each history session text; the step of identifying the history sample in the classifier to obtain the predicted intention category of the history sample identified by the classifier comprises the following steps: sequentially aiming at each history splicing sequence in the history sample in the classifier, and acquiring a predicted candidate intention category of the history splicing sequence based on an embedded vector corresponding to the position identifier in the history splicing sequence; and determining the predicted intention category of the history sample based on the predicted candidate intention category of each history splicing sequence.
Optionally, the training the classifier by using the historical sample subset corresponding to the classifier includes: training the classifier by using an exponential moving average operation and an adversarial training operation based on the historical sample subset corresponding to the classifier.
According to another aspect of an embodiment of the present application, there is provided an intention recognition method including:
acquiring a multi-round conversation text to be analyzed, and performing splicing processing on the multi-round conversation text to be analyzed to obtain a sample to be analyzed based on relative position codes;
acquiring a pre-trained multi-intent understanding model, wherein the multi-intent understanding model comprises at least one classifier, and the classifier is a multi-label classifier; the multi-intent understanding model is trained by the method according to any one of the above;
and respectively identifying the sample to be analyzed by utilizing each classifier in the multi-intention understanding model, and determining the intention category of the multi-round conversation text to be analyzed based on the identification result of each classifier.
Optionally, the sample to be analyzed comprises at least one splicing sequence to be analyzed, the splicing sequence to be analyzed comprises at least one conversation text to be analyzed, and a position identifier is added after each conversation text to be analyzed; the method for determining the intention category of the multi-round conversation text to be analyzed based on the recognition results of the classifiers comprises the following steps: inputting the sample to be analyzed into each classifier, sequentially aiming at each splicing sequence to be analyzed in the sample to be analyzed in the classifier, acquiring candidate intention categories of the splicing sequence to be analyzed based on embedded vectors corresponding to the position identifiers in the splicing sequence to be analyzed, voting based on the candidate intention categories of the splicing sequence to be analyzed acquired by each classifier, and determining target candidate intention categories of the splicing sequence to be analyzed; and determining the intention category of the multi-round conversation text to be analyzed based on the target candidate intention category of each splicing sequence to be analyzed.
According to another aspect of an embodiment of the present application, there is provided a model training apparatus including:
the first splicing module is used for acquiring a historical multi-round conversation text set, and performing splicing processing on each historical multi-round conversation text respectively to obtain a historical sample set based on relative position coding;
the first acquisition module is used for creating a multi-intention understanding model to be trained comprising at least one classifier, and acquiring a historical sample subset corresponding to each classifier from the historical sample set; the classifier is a multi-label classifier;
the training module is used for training the classifiers by utilizing the historical sample subsets corresponding to the classifiers according to each classifier, and obtaining a multi-intention understanding model after the training of all the classifiers is completed.
Optionally, the first splicing module includes:
an initializing unit, configured to initialize i = j = 1, initialize the j-th history splicing sequence d_j corresponding to the historical multi-round conversation text to be empty, and initialize the history sample corresponding to the historical multi-round conversation text to be empty;
a first judging unit, configured to judge whether the sum of the lengths of the i-th historical conversation text s_i in the historical multi-round conversation text and d_j is smaller than or equal to a preset threshold value; if yes, call the first splicing unit; if not, call the second splicing unit;
a first splicing unit, configured to splice s_i at the tail of d_j, add a position identifier after s_i, let i = i + 1, and call the second judging unit;
a second splicing unit, configured to add a semantic identifier at the head of d_j, add a text separator at the tail of d_j, append d_j to the tail of the history sample, let j = j + 1, initialize d_j to be empty, and call the second judging unit;
the second judging unit is used for judging whether i and j are smaller than or equal to the total number of the historical conversation texts in the historical multi-round conversation texts; if yes, calling a first splicing unit; if not, obtaining a history sample corresponding to the history multi-round conversation text.
Optionally, the first acquisition module includes: a dividing unit, configured to divide the history sample set into a number of history sample auxiliary sets equal to the total number of classifiers, according to the total number of classifiers; a selecting unit, configured to select, for each classifier, one history sample auxiliary set as the verification sample set corresponding to the classifier, take the remaining history sample auxiliary sets as the training sample set corresponding to the classifier, and take the training sample set and the verification sample set as the history sample subset corresponding to the classifier; wherein the verification sample sets corresponding to different classifiers are different.
Optionally, the training module includes: the identification unit is used for inputting the history samples in the history sample subsets corresponding to the classifier into the classifier, and identifying the history samples in the classifier to obtain the predicted intention category of the history samples identified by the classifier; the calculation unit is used for calculating a model loss value according to the predicted intention category of the history sample and the preset actual intention category of the history sample; and the determining unit is used for determining that the classifier training is completed when the model loss value meets a preset condition.
Optionally, the history sample comprises at least one history splicing sequence, the history splicing sequence comprises at least one history session text, and a position identifier is added after each history session text; the identification unit is specifically configured to sequentially obtain, in the classifier, a predicted candidate intention category of the history splice sequence for each history splice sequence in the history sample, based on an embedded vector corresponding to the location identifier in the history splice sequence; and determining the predicted intention category of the history sample based on the predicted candidate intention category of each history splicing sequence.
Optionally, the training module is specifically configured to train the classifier by using an exponential moving average operation and an adversarial training operation based on the historical sample subset corresponding to the classifier.
According to another aspect of an embodiment of the present application, there is provided an intention recognition apparatus including:
the second splicing module is used for acquiring a multi-round conversation text to be analyzed, and carrying out splicing treatment on the multi-round conversation text to be analyzed to obtain a sample to be analyzed based on relative position codes;
a second acquisition module for acquiring a pre-trained multi-intent understanding model, the multi-intent understanding model comprising at least one classifier, the classifier being a multi-label classifier; the multi-intent understanding model is trained by the method according to any one of the above;
and the identification module is used for respectively identifying the sample to be analyzed by utilizing each classifier in the multi-intention understanding model, and determining the intention category of the multi-round conversation text to be analyzed based on the identification result of each classifier.
Optionally, the sample to be analyzed comprises at least one splicing sequence to be analyzed, the splicing sequence to be analyzed comprises at least one conversation text to be analyzed, and a position identifier is added after each conversation text to be analyzed; the identification module is specifically configured to input the sample to be analyzed into each classifier, sequentially, for each splicing sequence to be analyzed in the sample to be analyzed in the classifier, acquire the candidate intention category of the splicing sequence to be analyzed based on the embedded vector corresponding to the position identifier in the splicing sequence to be analyzed, perform a voting operation based on the candidate intention categories of the splicing sequence to be analyzed acquired by each classifier, and determine a target candidate intention category of the splicing sequence to be analyzed; and determine the intention category of the multi-round conversation text to be analyzed based on the target candidate intention category of each splicing sequence to be analyzed.
According to another aspect of an embodiment of the present application, there is provided an electronic apparatus including: one or more processors; and one or more computer-readable storage media having instructions stored thereon; the instructions, when executed by the one or more processors, cause the processors to perform the method of any of the above.
According to another aspect of embodiments of the present application, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, causes the processor to perform the method of any of the above.
According to the embodiment of the application, on one hand, the historical sample based on the relative position code is obtained by splicing the historical multi-round conversation text, so that the position information between the front conversation text and the rear conversation text in the multi-round conversation text can be added into the input sequence of the multi-intention understanding model, the multi-intention understanding model can learn the correlation between the semantic information of the conversation text and the front-rear position information of the conversation text and the conversation intention type label, and the analysis of the multi-intention understanding model is more accurate; on the other hand, at least one multi-label classifier is arranged on the multi-intention understanding model, the training of each classifier is carried out by utilizing samples corresponding to the classifier, the training process is simpler and more convenient, and the final intention category can be comprehensively analyzed by utilizing the prediction result of the at least one classifier, so that the recognition result of the multi-intention understanding model is more comprehensive and has interpretability.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are only some drawings of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic illustration of an overall process of an embodiment of the present application.
FIG. 2 is a flow chart of the steps of a model training method according to an embodiment of the present application.
Fig. 3 is a flowchart of a splicing process according to an embodiment of the present application.
Fig. 4 is a schematic structural diagram of a classifier according to an embodiment of the present application.
FIG. 5 is a schematic flow chart of the adversarial training according to the embodiment of the application.
FIG. 6 is a flow chart of a model training process according to an embodiment of the present application.
FIG. 7 is a flow chart of steps of an intent recognition method in accordance with an embodiment of the present application.
FIG. 8 is a flow chart of an intent recognition process in accordance with an embodiment of the present application.
FIG. 9 is a schematic diagram of a multi-intention understanding model process according to an embodiment of the present application.
Fig. 10 is a block diagram of a model training apparatus according to an embodiment of the present application.
Fig. 11 is a block diagram showing the structure of an intention recognition apparatus according to an embodiment of the present application.
Fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
It should be noted that all actions for acquiring signals, information or data in the present application are performed in compliance with the corresponding data protection legislation policy of the country of location and obtaining the authorization granted by the owner of the corresponding device.
At present, a great amount of multi-round session text data of users and customer service staff are generated in an online customer service system every day, and enterprises need to acquire the requirements of the users on related services and products from the text data, and perform traffic statistics, product upgrading, service optimization and the like. Therefore, multi-label text classification is particularly important, however, most methods are only focused on solving the problem of multi-label correlation of single text, neglecting the correlation of semantic information and the position information of the context of each text in multi-round conversation text, and meanwhile cannot effectively deal with the problem of unbalanced data sets of different categories.
Aiming at the problems, the embodiment of the application provides that a multi-intention understanding model comprising a plurality of classifiers is trained, the text splicing processing is adopted to add the position correlation among the conversational texts, the model training is carried out in a model training optimization mode, and the intention recognition is carried out by utilizing the multi-intention understanding model obtained by training, so that a more accurate intention recognition result is obtained.
Referring to fig. 1, a schematic diagram of an overall process of an embodiment of the present application is shown.
As shown in fig. 1, the overall process may include:
s1, acquiring a historical multi-round session text set and a multi-round session text to be analyzed, and respectively performing splicing processing according to a text sequence splicing Strategy (concat_strategy) to obtain a historical sample set and a sample to be analyzed.
The splicing process can solve the problem of correlation between text semantic information and text position information and the intention category labels.
S2, training by using a model training Optimization Strategy (optimization_strategy) based on a historical sample set to obtain a multi-round conversational text multi-intention understanding model based on relative position coding.
The model training optimization strategy can solve the problem of unbalanced training data sets in category.
S3, based on a sample to be analyzed, adopting a multi-intention integration Strategy (bagging_strategy), and predicting and obtaining intention types of the multi-round conversation text to be analyzed by using a trained multi-intention understanding model.
The multi-intention integration strategy can comprehensively analyze the prediction results of a plurality of classifiers in the multi-intention understanding model, so that the accuracy of the whole prediction result is improved.
The embodiment of the application is suitable for the fields of business analysis, user demand mining, user portrait construction and the like, and is beneficial to enterprises to acquire the demand information of users for each dimension of products and customer services from multi-round session texts, and carry out business volume statistics, product upgrading, service optimization and the like.
Referring to FIG. 2, a flowchart of the steps of a model training method of an embodiment of the present application is shown.
As shown in fig. 2, the model training method may include the following steps 201 to 203:
step 201, acquiring a historical multi-round conversation text set, and respectively performing splicing processing on each historical multi-round conversation text to obtain a historical sample set based on relative position coding.
The historical multi-round conversation text set comprises a plurality of historical multi-round conversation texts, and each historical multi-round conversation text comprises a plurality of historical conversation texts. For example, for a section of historical dialogue between a user and customer service, multiple pieces of user text in the historical dialogue can be extracted as historical multi-round conversation text, and each piece of user text is taken as a piece of historical conversation text.
In the embodiment of the application, for each historical multi-round conversation text in a historical multi-round conversation text set, the historical multi-round conversation text is spliced to obtain a historical sample based on relative position codes corresponding to the historical multi-round conversation text, and then the historical sample based on relative position codes corresponding to each historical multi-round conversation text forms a historical sample set based on relative position codes.
For any one history multi-round conversation text, according to a forward greedy algorithm, under the limiting condition that the length of a history splicing sequence is smaller than or equal to a preset threshold value, a plurality of history conversation texts contained in the history multi-round conversation text are spliced into at least one history splicing sequence in sequence.
In an alternative embodiment, for any one historical multi-round conversation text, the process of performing the splicing processing on the historical multi-round conversation text may include the following steps 1 to 5:
step 1, initializing i=j=1, and initializing a j-th history splicing sequence d corresponding to the history multi-round conversation text j And initializing a history sample corresponding to the history multi-round conversation text to be empty.
Step 2, judging an ith historical conversation text s in the historical multi-round conversation text i And d is equal to j Whether the sum of the lengths of the two pairs is smaller than or equal to a preset threshold value. If yes, executing the step 3; if not, executing the step 4.
The preset threshold is a difference value between a maximum length of a preset sequence and a preset symbol length.
Step 3, at said d j Splicing the tail parts of the s i And at s i After that, a location identifier is added, let i=i+1, and step 5 is performed.
Step 4, at said d j A semantic identifier is added to the head of the history sample, a text separator is added to the tail of the history sample, and the d is added to the tail of the history sample j Let j=j+1, initialize d j Is empty and step 5 is performed.
And 5, judging whether both i and j are smaller than or equal to the total number of the historical conversation texts in the historical multi-round conversation texts. If yes, executing the step 2; if not, obtaining a history sample corresponding to the history multi-round conversation text.
For example, a historical multi-round conversation text set HistConvs = {Conv_1, Conv_2, …, Conv_histID} is obtained. HistConvs contains a plurality of historical multi-round conversation texts, and each historical multi-round conversation text is spliced according to the text sequence splicing Strategy (Concat_Strategy), so as to generate the history sample set based on relative position coding corresponding to the historical multi-round conversation text set: HistAdjData = {Data_1, Data_2, …, Data_histID}.
Referring to fig. 3, a flowchart of a stitching process is shown in accordance with an embodiment of the present application.
As shown in fig. 3, the splicing process may include:
s1.1, input: historical multi-round conversational text Conv ID =[s 1 ,s 2 ,...,s n ]。
Wherein Conv ID It may refer to the history multi-turn conversation text in the history multi-turn conversation text set HistConvs described above, i.e. id=1, 2. Conv ID Containing n pieces of history session text s i ,i=1,2,…,n。
S1.2, initializing: the parameter i=j=1, the history multi-round conversational text Conv ID Corresponding j-th history splice sequence d j Empty, historical multi-round conversational text Conv ID Corresponding historical sample Data ID Is empty, the maximum sequence length max_length=l is preset.
S1.3, judging whether the length (d) j +s i ) The weight is less than or equal to (l-3); if yes, executing S1.4, otherwise executing S1.5.
Wherein length (d) j +s i ) Representation d j Historical multi-round conversation text Conv ID I-th historical conversation text s in (a) i 3 represents a preset symbol length, the symbols comprising a position identifier, a semantic identifier and a text separator, each symbol length being 1. It will be appreciated that the symbols and symbol lengths may be provided in other forms, and this embodiment is not limited thereto.
S1.4, let d j =d j +s i +[MASK]I=i+1, and S1.6 is performed.
I.e. at d j Tail splice s of (2) i Then at s i Then add a location identifier [ MASK ]]Let i=i+1.
S1.5, let d j =[CLS]+d j +[SEP]At Data ID Tail addition d j J=j+1, initialize d j Is empty and S1.6 is performed.
I.e. at d j Adds a semantic identifier [ CLS ] to the header of (a) a]At d j Adding a text separator [ SEP ] at the tail of the file]After that, d j Add to Data ID Let j=j+1, initialize d j Is empty.
S1.6, judging whether i is less than or equal to n and j is less than or equal to n; if yes, executing S1.3, otherwise executing S1.7.
Wherein n represents Conv ID List length (Conv) ID ) =n, i.e. Conv ID In the history session text total bar number.
S1.7, output Conv ID Corresponding historical sample Data ID =[d 1 ,d 2 ,...,d q ]。
Data ID Comprising q historical splice sequences d j ,1≤q≤n。d j Is a history splicing sequence in a history sample based on relative position coding constructed according to the idea of the Prompt method, and a construction template is as follows: d, d j =[CLS]s i [MASK]s i+1 [MASK]…s i+x [MASK][SEP]Wherein [ CLS ]]Is a semantic characterizer, [ SEP ]]Is a text separator, [ MASK ]]Is s i Is a location identifier of (a). Splicing Strategy Concat_Strategy according to the forward greedy algorithm (Forward Greedy Algorithm, FGA), at d j For Conv under the constraint that the length of (2) does not exceed a preset threshold ID S in (3) i The corresponding Data is obtained by carrying out the Prompt construction in sequence ID
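The following is a minimal Python sketch of this forward-greedy splicing, assuming the conversation texts are plain strings; for simplicity it counts characters rather than tokens and treats the whole [MASK]/[CLS]/[SEP] strings as opaque markers, so the length accounting and the function name are illustrative rather than taken from the embodiment.

```python
def concat_strategy(conv_texts, max_length=512):
    """Forward-greedy splicing of one multi-round conversation: each conversation
    text is followed by a [MASK] position identifier, and every full splice
    sequence is wrapped as [CLS] ... [SEP]."""
    budget = max_length - 3        # room reserved for [CLS], [SEP] and one [MASK]
    samples, d = [], ""
    i = 0
    while i < len(conv_texts):
        s = conv_texts[i]
        if len(d) + len(s) <= budget:
            d += s + "[MASK]"      # splice s_i and add its position identifier
            i += 1
        elif not d:
            # a single text longer than the budget gets its own (oversized) sequence
            samples.append("[CLS]" + s + "[MASK]" + "[SEP]")
            i += 1
        else:
            samples.append("[CLS]" + d + "[SEP]")   # close the current sequence d_j
            d = ""
    if d:
        samples.append("[CLS]" + d + "[SEP]")       # flush the last sequence
    return samples

# toy usage
print(concat_strategy(["why has my traffic not been added?",
                       "is it the membership gift package?"], max_length=60))
```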
Step 202, creating a multi-intention understanding model to be trained comprising at least one classifier, and acquiring a historical sample subset corresponding to each classifier from the historical sample set.
In the embodiment of the application, the multi-intention understanding model to be trained can comprise at least one classifier, the multi-intention understanding model can respectively carry out intention recognition by utilizing the at least one classifier, and then comprehensive analysis is carried out on the intention recognition results of the plurality of classifiers. In the model training process, in order to simplify the training process, training may be performed separately for each classifier. Wherein the classifier is a multi-label classifier.
In an alternative embodiment, the process of obtaining the history sample subset corresponding to each classifier from the history sample set may include: dividing the history sample set into a number of history sample auxiliary sets equal to the total number of classifiers, according to the total number of classifiers; for each classifier, selecting one history sample auxiliary set as the verification sample set corresponding to the classifier, taking the remaining history sample auxiliary sets as the training sample set corresponding to the classifier, and taking the training sample set and the verification sample set as the history sample subset corresponding to the classifier; wherein the verification sample sets corresponding to different classifiers are different.
Performing model training with the K-fold cross-validation method can alleviate the problem of class imbalance in the training data set, realizing a model training optimization strategy for class-imbalanced data sets.
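A possible Python sketch of this K-fold style split, assuming the history samples are simply shuffled and partitioned; the function and variable names are illustrative.

```python
import random

def kfold_subsets(hist_samples, k):
    """Randomly divide the history sample set into K auxiliary sets, then build
    one (train, valid) pair per classifier: classifier i validates on fold i
    and trains on the remaining folds, so every classifier sees a different
    validation set."""
    samples = list(hist_samples)
    random.shuffle(samples)
    folds = [samples[i::k] for i in range(k)]          # K roughly equal auxiliary sets
    datasets = []
    for i in range(k):
        valid = folds[i]
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        datasets.append({"train": train, "valid": valid})
    return datasets
```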
Step 203, for each classifier, training the classifier by using the historical sample subset corresponding to the classifier, and obtaining the multi-intention understanding model after all the classifiers are trained.
In an alternative embodiment, the step of training the classifier by using the historical sample subset corresponding to the classifier may include the following steps A1 to A3:
and A1, inputting a history sample in a history sample subset corresponding to the classifier into the classifier, and identifying the history sample in the classifier to obtain the predicted intention category of the history sample identified by the classifier.
Illustratively, the process of identifying the historical sample in the classifier to obtain the predicted intent category of the historical sample identified by the classifier may include: sequentially aiming at each history splicing sequence in the history sample in the classifier, and acquiring a predicted candidate intention category of the history splicing sequence based on an embedded vector corresponding to the position identifier in the history splicing sequence; and determining the predicted intention category of the history sample based on the predicted candidate intention category of each history splicing sequence.
For example, after obtaining the predicted candidate intention category of each history splice sequence in the history sample, the predicted candidate intention category of all the history splice sequences in the history sample may be taken as the predicted intention category of the history sample.
Referring to fig. 4, a schematic diagram of a classifier according to an embodiment of the present application is shown.
According to the classifier structure shown in FIG. 4, a history splicing sequence [CLS] s_1 [MASK] s_2 [MASK] … s_x [MASK] [SEP] in the history sample is input into an enhanced Transformer layer with RoPE (Rotary Position Embedding) in the classifier, i.e. a RoFormer layer; the RoFormer layer performs the tokenization operation to obtain the Token Embedding vector (embedded vector) of each unit position in the history splicing sequence; then only the Token Embedding vector corresponding to the unit position of the position identifier [MASK] of each historical conversation text s_i, i.e. T_[MASK], is passed through a Linear layer for the subsequent classification calculation; the probability of each intent category is then calculated by the Softmax (activation function) layer; finally, the intention category with the highest probability is selected as the predicted intention category through the Argmax (argument of the maximum) layer. Note that the Softmax layer and the Argmax layer are not shown in FIG. 4; for their specific structure, reference may be made to FIG. 9 below.
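As one way to realize the classifier of FIG. 4, the sketch below uses the Hugging Face transformers RoFormer encoder and classifies only the hidden states at the [MASK] positions; the checkpoint name, label count, and the assumption that the tokenizer maps [MASK] to its mask token are illustrative choices, not part of the embodiment.

```python
import torch
from torch import nn
from transformers import RoFormerModel, RoFormerTokenizerFast

class MaskPositionClassifier(nn.Module):
    """RoFormer encoder (Transformer enhanced with rotary position embedding, RoPE)
    followed by a Linear layer applied only to the [MASK] unit positions."""
    def __init__(self, model_name="junnyu/roformer_chinese_base", num_labels=10):
        super().__init__()
        self.encoder = RoFormerModel.from_pretrained(model_name)
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask, mask_token_id):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        mask_positions = input_ids == mask_token_id      # one [MASK] per conversation text
        return self.classifier(hidden[mask_positions])   # (num_masks, num_labels) logits

tokenizer = RoFormerTokenizerFast.from_pretrained("junnyu/roformer_chinese_base")
model = MaskPositionClassifier(num_labels=10)
enc = tokenizer("first utterance [MASK] second utterance [MASK]", return_tensors="pt")
logits = model(enc["input_ids"], enc["attention_mask"], tokenizer.mask_token_id)
pred = logits.softmax(dim=-1).argmax(dim=-1)   # highest-probability category per [MASK]
```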
And step A2, calculating a model loss value according to the predicted intention category of the history sample and the preset actual intention category of the history sample.
Illustratively, the model loss value may be calculated using, but not limited to: a cross entropy loss function, an absolute value loss function, a square loss function, an exponential loss function, and the like.
And A3, determining that the classifier training is completed when the model loss value meets the preset condition.
For example, when the model loss value is smaller than a preset loss threshold, it is determined that the classifier training is complete. The specific value of the loss threshold may be set according to practical experience, and this embodiment is not limited thereto.
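A brief sketch of the loss computation in steps A2 and A3, assuming one actual intention category per [MASK] position and a cross entropy criterion over the per-position logits; the threshold value is purely illustrative.

```python
import torch
from torch import nn

criterion = nn.CrossEntropyLoss()
loss_threshold = 0.05   # preset condition; the value here is only an example

def training_step(logits, gold_labels, optimizer):
    """logits: (num_masks, num_labels) per-[MASK] scores from the classifier;
    gold_labels: (num_masks,) actual intention category index per [MASK] position."""
    loss = criterion(logits, gold_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    finished = loss.item() < loss_threshold   # classifier training considered complete
    return loss.item(), finished
```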
Illustratively, in training a classifier, to improve the recognition capability of the intention understanding model, the classifier may also be trained for model optimization by using an exponential moving average (Exponential Moving Average, EMA) operation and an adversarial training (Adversarial Training, AT) operation based on the historical sample subset corresponding to the classifier.
For example, an EMA strategy may be used to smooth model weights according to equation 1, giving the model better generalization ability. The specific process of EMA operation may be handled based on practical experience and this embodiment will not be discussed in detail here.
v_t = β·v_{t-1} + (1 - β)·θ_t    (Equation 1)
In Equation 1, θ_t represents the model weight at time t, v_t represents the shadow weight of the model at time t, v_{t-1} represents the shadow weight of the model at time t-1, and β represents a preset weighting coefficient (for example, β is set to 0.999, although other values can be set).
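A minimal PyTorch-style sketch of the shadow-weight update of Equation 1; the class name and the moment at which the shadow weights are swapped in (e.g. before validation) are illustrative assumptions.

```python
class EMA:
    """Maintains v_t = beta * v_{t-1} + (1 - beta) * theta_t for every trainable parameter."""
    def __init__(self, model, beta=0.999):
        self.beta = beta
        self.shadow = {n: p.detach().clone() for n, p in model.named_parameters()
                       if p.requires_grad}

    def update(self, model):
        # called after every optimizer step
        for n, p in model.named_parameters():
            if n in self.shadow:
                self.shadow[n].mul_(self.beta).add_(p.detach(), alpha=1 - self.beta)

    def copy_to(self, model):
        """Load the smoothed shadow weights into the model, e.g. for evaluation."""
        for n, p in model.named_parameters():
            if n in self.shadow:
                p.data.copy_(self.shadow[n])
```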
For example, any AT operation mode may be selected for the AT operation, which is not limited in this embodiment. The AT operation may include, but is not limited to: FGM (Fast Gradient Method), FGSM (Fast Gradient Sign Method), PGD (Projected Gradient Descent), and so on.
Taking FGM as an example, adversarial training can be achieved by adding a perturbation to the Embedding layer according to Equation 2.
min_θ E_{(x,y)~D} [ max_{δ∈S} L(x + δ, y; θ) ]    (Equation 2)
Wherein D represents the distribution of input samples, x represents the input, y represents the label, θ is the model parameter, L(x + δ, y; θ) is the loss of a single sample, δ is the perturbation, and S is the perturbation space.
Referring to fig. 5, a schematic flow chart of the adversarial training according to the embodiment of the application is shown. As shown in fig. 5, the process of adversarial training may include: computing the normal loss (loss) in the forward pass and the normal parameter gradient (grad) in the backward pass; calculating the perturbation from the adversarial training formula and the gradient, and adding (attack) the calculated perturbation to the weights of the Embedding layer; calculating a new loss (adv_loss) and a new gradient (adv_grad) with the perturbed weights; if this is not the last step (in the case of PGD), calculating the perturbation from the adversarial training formula and the new gradient, applying the perturbation (attack) again, and proceeding to the next step (K-step); at the last step, restoring the initial weights of the Embedding layer, accumulating the original gradient and the new gradient (adv_grad), and updating the model weights.
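A possible PyTorch sketch of the FGM variant of the flow of FIG. 5, perturbing the embedding weights; the epsilon value and the embedding parameter-name filter are assumptions made for illustration.

```python
import torch

class FGM:
    """Fast Gradient Method: add an L2-normalised perturbation to the embedding
    weights, compute the adversarial loss, then restore the original weights."""
    def __init__(self, model, epsilon=1.0, emb_name="embeddings.word_embeddings"):
        self.model = model
        self.epsilon = epsilon
        self.emb_name = emb_name
        self.backup = {}

    def attack(self):
        for name, param in self.model.named_parameters():
            if param.requires_grad and self.emb_name in name and param.grad is not None:
                self.backup[name] = param.data.clone()
                norm = torch.norm(param.grad)
                if norm != 0:
                    param.data.add_(self.epsilon * param.grad / norm)  # r_adv step

    def restore(self):
        for name, param in self.model.named_parameters():
            if name in self.backup:
                param.data = self.backup[name]
        self.backup = {}

# one adversarial training step (sketch):
#   loss = compute_loss(model, batch); loss.backward()   # normal gradient
#   fgm.attack(); adv_loss = compute_loss(model, batch)   # forward on perturbed embeddings
#   adv_loss.backward(); fgm.restore()                    # accumulate adv grad, restore weights
#   optimizer.step(); optimizer.zero_grad()
```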
For example, based on the history sample set based on relative position coding HistAdjData = {Data_1, Data_2, …, Data_histID} corresponding to the historical multi-round conversation text set HistConvs = {Conv_1, Conv_2, …, Conv_histID}, a multi-intention understanding model of multi-round conversation text based on relative position coding is obtained through training with the model training Optimization Strategy (Optimization_Strategy): MultiIntRecognizer = {fold_1, fold_2, …, fold_K}, where fold_i represents the i-th classifier and K ∈ N.
Referring to FIG. 6, a flow chart of a model training process of an embodiment of the present application is shown.
As shown in fig. 6, the model training process may include:
s2.1, input: history sample set histadjdata= { Data 1 ,Data 2 ,…,Data histID }。
S2.2, dividing the HistAdjDate into K historical sample auxiliary sets randomly: subsets= { Set 1 ,Set 2 ,…,Set K }。
Wherein,x=m/K, m representing the total number of history samples contained in the set of history samples, K representing the total number of classifiers, +.>
S2.3, according to a data set dividing method of KFoldCV (K-fold cross validation), obtaining a historical sample subset for training and verifying K classifiers: KFOld-Datasets= { Dataset 1 ,Dataset 2 ,…,Dataset K }。
Wherein, dataset k ={Train k ,Valid k },Train k Training sample set representing kth classifier, train k =(Subsets-Set k ),Valid k Verification sample set representing kth classifier, valid k =Set k
S2.4, initializing: the parameter k=1, the multi-purpose understanding model multitracognizer= { }.
S2.5 based on Dataset k Training to obtain a fold by using EMA and AT strategies according to the RoFormer algorithm principle k And add it to a multitracognizer, k=k+1。
S2.6, judging whether K is less than or equal to K, if yes, executing S2.5, otherwise executing S2.7.
S2.7, outputting: multi-purpose understanding model multitracognizer= { fold 1 ,fold 2 ,…,fold K }。
In the embodiment of the application, a text sequence splicing strategy and a multi-intention understanding model structure aiming at the correlation between the text position information and the intention type label of the multi-round conversation are provided, the position information between the front text and the rear text in the multi-round conversation is added into the input sequence of the multi-intention understanding model, and the subsequent classification calculation is carried out by using the vector of the position unit in the sequence, so that the model can learn the correlation between the text semantic information and the context position information of the conversation and the intention type label of the conversation. The model training optimization strategy aiming at the class imbalance data set is also provided, the multi-intention understanding model carries out model training by using a K-fold cross validation method, various text features in a training sample are learned as much as possible through a plurality of classifiers, the expression capacity of the model is enhanced through countermeasure training, and the robustness and generalization capacity of the model are increased through exponential moving average so as to further improve the effect of multi-intention understanding.
Referring to fig. 7, a flowchart of the steps of a method for intent recognition is shown in an embodiment of the present application.
As shown in fig. 7, the intention recognition method may include the steps of:
Step 701, acquiring a multi-round conversation text to be analyzed, and performing splicing processing on the multi-round conversation text to be analyzed to obtain a sample to be analyzed based on relative position coding.
The multi-round conversation text to be analyzed comprises a plurality of pieces of conversation text to be analyzed. For example, for a section of dialogue to be analyzed between a user and customer service, multiple pieces of user text in the dialogue to be analyzed can be extracted as multiple rounds of dialogue text to be analyzed, and each piece of user text is used as one piece of dialogue text to be analyzed.
For the multi-round conversation text to be analyzed, according to a forward greedy algorithm, under the limiting condition that the length of the splicing sequence to be analyzed is smaller than or equal to a preset threshold value, a plurality of pieces of conversation text to be analyzed contained in the multi-round conversation text to be analyzed are spliced into at least one splicing sequence to be analyzed in sequence.
In an alternative embodiment, the process of performing splicing processing on the multi-round conversation text to be analyzed to obtain the sample to be analyzed based on the relative position code may include the following steps (1) to (5):
Step (1), initializing i = j = 1, initializing the j-th splicing sequence to be analyzed d_j corresponding to the multi-round conversation text to be analyzed to be empty, and initializing the sample to be analyzed corresponding to the multi-round conversation text to be analyzed to be empty.
Step (2), judging whether the sum of the lengths of the i-th conversation text to be analyzed s_i in the multi-round conversation text to be analyzed and d_j is smaller than or equal to a preset threshold value. If yes, executing step (3); if not, executing step (4). The preset threshold is the difference between a preset maximum sequence length and a preset symbol length.
Step (3), splicing s_i at the tail of d_j, adding a position identifier after s_i, letting i = i + 1, and executing step (5).
Step (4), adding a semantic identifier at the head of d_j, adding a text separator at the tail of d_j, appending d_j to the tail of the sample to be analyzed, letting j = j + 1, initializing d_j to be empty, and executing step (5).
Step (5), judging whether both i and j are smaller than or equal to the total number of conversation texts to be analyzed in the multi-round conversation text to be analyzed. If yes, executing step (2); if not, obtaining the sample to be analyzed corresponding to the multi-round conversation text to be analyzed.
Step 702, a pre-trained multi-intent understanding model is obtained, the multi-intent understanding model including at least one classifier.
The multi-purpose understanding model is trained by the model training method described in any of the embodiments above.
And step 703, respectively identifying the sample to be analyzed by utilizing each classifier in the multi-intention understanding model, and determining the intention category of the multi-round conversation text to be analyzed based on the identification result of each classifier.
In an alternative embodiment, the process of identifying the sample to be analyzed by using each classifier in the multi-intention understanding model and determining the intention type of the multi-round conversation text to be analyzed based on the identification result of each classifier may include the following steps B1 to B2:
step B1, inputting the sample to be analyzed into each classifier, sequentially inputting each splicing sequence to be analyzed in the sample to be analyzed into the classifier, acquiring candidate intention categories of the splicing sequence to be analyzed based on embedded vectors corresponding to the position identifiers in the splicing sequence to be analyzed, voting operation is performed based on the candidate intention categories of the splicing sequence to be analyzed acquired by each classifier, and determining target candidate intention categories of the splicing sequence to be analyzed.
In the embodiment of the present application, any suitable voting method may be used to perform the voting operation, which is not limited in this embodiment.
Taking relative majority voting (Plurality Voting, PV) as an example, the multi-intention prediction results of the K classifiers fold_k for the [MASK] unit position corresponding to s_i can be fused and analyzed according to Equation 3 below, which selects the candidate intention categories acquired by the majority of the classifiers:
{ l_t | Σ_{k=1}^{K} h^t_{k,i} > K/2 }    (Equation 3)
Wherein t represents the index of the intention category, and h^t_{k,i} represents the prediction result of the k-th classifier for the [MASK] unit position corresponding to s_i on the t-th intention category.
And B2, determining the intention category of the multi-round conversation text to be analyzed based on the target candidate intention category of each splicing sequence to be analyzed.
For example, after obtaining the target candidate intention category of each splicing sequence to be analyzed in the sample to be analyzed, the target candidate intention category of all splicing sequences to be analyzed in the sample to be analyzed can be used as the predicted intention category of the multi-round session text to be analyzed.
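A small Python sketch of this fusion step, assuming each classifier returns, for every [MASK] position, a set of predicted category labels, and that a label is kept when more than half of the classifiers vote for it; this "more than K/2" rule is one reading of the relative-majority voting described above, and the function names are illustrative.

```python
from collections import Counter

def fuse_predictions(per_classifier_labels):
    """per_classifier_labels: list over K classifiers, each a list over [MASK]
    positions of sets of predicted intention labels. Returns, per position,
    the labels voted for by a majority of the classifiers."""
    k = len(per_classifier_labels)
    num_positions = len(per_classifier_labels[0])
    fused = []
    for pos in range(num_positions):
        votes = Counter()
        for clf in per_classifier_labels:
            votes.update(clf[pos])          # each classifier contributes one vote per label
        fused.append({label for label, count in votes.items() if count > k / 2})
    return fused

def conversation_intents(fused_per_sequence):
    """Union of the target candidate categories over all splice sequences."""
    intents = set()
    for seq_labels in fused_per_sequence:
        for pos_labels in seq_labels:
            intents |= pos_labels
    return intents
```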
For example, based on the sample to be analyzed Data_newID corresponding to the multi-round conversation text to be analyzed, prediction is performed by using the multi-intention understanding model MultiIntRecognizer, and the intention category set ConvIntents = {int_1, int_2, …, int_e}, e ∈ N, of Conv_newID is obtained through the multi-intention integration Strategy (Bagging_Strategy).
Referring to FIG. 8, a flow chart of an intent recognition process is shown in accordance with an embodiment of the present application.
As shown in fig. 8, the intent recognition process may include:
s3.1, input: multi-round conversation text Data to be analyzed newID =[d 1 ,d 2 ,…,d q ]。
S3.2, initializing: the parameter j=1, and the intent class set convinteractions= { } of the multi-round conversation text to be analyzed is empty.
S3.3, using a MultiIntRactoginizer model for the analysis of the splice sequences d j Performing intention recognition to obtain the fold of each classifier k For d j Multi-purpose tag prediction result set Results j ={fold 1 (d j ),fold 2 (d j ),…,fold K (d j )}。
Wherein,
s3.4, according to Results j All of the folds in (a) k For each l in ClassLabels using the PV method t Voting to obtain d j Each s of (3) i Is a conversation intention category label set />
Wherein,
s3.5 according to Labels j Adding non-duplicate l to ConvIntins t ,j=j+1。
S3.6, judging whether j is less than or equal to q, if yes, executing S3.3, otherwise executing S3.7.
Wherein q represents the total number of splicing sequences to be analyzed in the sample to be analyzed, i.e. length(Data_newID) = q.
S3.7, outputting: intention category set ConvIntents = {int_1: l_x, int_2: l_y, …, int_e: l_z}.
Wherein fold_k(d_j) gives the candidate intention categories predicted by fold_k for the [MASK] unit position corresponding to each s_i; ClassLabels = {l_1, l_2, …, l_T} is the predefined set of intention categories, T ∈ N; h^t_{k,i} is the prediction result of fold_k for the [MASK] unit position corresponding to s_i on the conversation intention category l_t, recorded as 1 if l_t is predicted to be output, and 0 otherwise.
According to the embodiment of the application, the intention recognition is carried out on the multi-round conversation text to be analyzed according to the pre-trained multi-intention recognition model, and the problem of comprehensive analysis of the prediction results of the multiple classifiers in the multi-intention understanding model can be solved based on the multi-intention integration strategy of the multiple classifiers, so that the output multi-intention understanding result is more comprehensive and has interpretability.
The overall processing procedure in the embodiment of the present application is described below based on a specific example.
For example, a certain multi-round conversation text of the operator online customer service system is shown in Table 1:
Table 1
In the embodiment of the application, the whole processing process is as follows:
S11, acquiring the history multi-round conversation text set HistConvs = {Conv_1, Conv_2, …, Conv_9999} and the multi-round conversation text to be analyzed Conv_10000, and performing splicing processing according to the text sequence splicing strategy (Concat_Strategy), respectively generating the history sample set HistAdjData = {Data_1, Data_2, …, Data_9999} based on relative position coding corresponding to the history multi-round conversation text set, and the sample to be analyzed Data_10000 based on relative position coding corresponding to the multi-round conversation text to be analyzed.
Taking the multi-round conversation text to be analyzed Conv_10000 as an example, the specific steps of S11 are as follows:
S11.1, input: multi-round conversation text to be analyzed Conv_10000 = [s_1, s_2, s_3, s_4, s_5, s_6, s_7].
S11.2, initializing: parameters i = j = 1, the splice sequence to be analyzed d_1 is empty, the sample to be analyzed Data_10000 = [] corresponding to the multi-round conversation text to be analyzed Conv_10000 is empty, and the maximum sequence length max_length = 40 is set in this embodiment.
S11.3, judging that length(d_1 + s_1) = 15 is smaller than (max_length - 3) = 37, and thus S11.4 is performed.
S11.4, splicing s_1 in sequence at the tail of d_1 and performing prompt construction with [MASK], obtaining d_1 = "Got the traffic package, how come there is no added traffic?[MASK]", letting i = i + 1 = 2, and performing S11.6.
S11.6, judging that i = 2 and j = 1 are smaller than length(Conv_10000) = 7, thus S11.3 is performed.
The above steps S11.3, S11.4 and S11.6 are repeated until length(d_1 + s_4) = 47:
S11.3, judging that length(d_1 + s_4) = 47 is greater than 37, so S11.5 is performed.
S11.5, adding a [CLS] at the head of d_1 and a [SEP] at the tail of d_1, obtaining d_1 = "[CLS]Got the traffic package, how come there is no added traffic?[MASK]Is it the membership gift you gave me?[MASK]…[SEP]", appending it to the end of Data_10000 to obtain Data_10000 = {d_1}, letting j = j + 1 = 2, and initializing d_2 to be empty.
The steps of S11.3 to S11.6 described above are repeated until i=8:
S11.6, judging that i = 8 is larger than 7 (while j = 3 is smaller than 7), and thus S11.7 is executed.
S11.7, outputting the sample to be analyzed Data_10000 = [d_1, d_2, d_3] corresponding to Conv_10000.
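The splicing behaviour walked through in S11.1-S11.7 can be sketched as follows; the function name concat_strategy and the plain character-length measure are assumptions made for illustration, and a single session text is assumed never to exceed the length threshold on its own.

```python
from typing import List

def concat_strategy(conversation: List[str], max_length: int = 40) -> List[str]:
    """Sketch of the text sequence splicing strategy: append each session text
    followed by [MASK] to the current splice sequence; when the next text would
    exceed max_length - 3, wrap the finished sequence in [CLS]/[SEP] and start a new one."""
    sample: List[str] = []
    current = ""
    for text in conversation:
        if len(current) + len(text) <= max_length - 3:   # room is left in the current splice sequence
            current += text + "[MASK]"                   # prompt construction after the session text
        else:
            sample.append("[CLS]" + current + "[SEP]")   # close the finished splice sequence
            current = text + "[MASK]"                    # the same text starts the next splice sequence
    if current:
        sample.append("[CLS]" + current + "[SEP]")       # flush the last splice sequence
    return sample

# Hypothetical usage on a short conversation
print(concat_strategy(["utterance one", "utterance two", "utterance three"], max_length=40))
```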
S12, based on HistAdjData, training through the model training optimization strategy (Optimization_Strategy) to obtain the multi-round session multi-intent understanding model based on relative position coding. The specific steps are as follows:
S12.1, input: history sample set HistAdjData = {Data_1, Data_2, …, Data_9999}.
S12.2, in this embodiment, the number of classifiers is set to K = 3, and HistAdjData is randomly divided into 3 history sample auxiliary sets according to the session ID.
S12.3, obtaining the history sample subset Dataset_k for training and verifying each of the 3 classifiers according to the data set partitioning method of KFoldCV: for the k-th classifier, one history sample auxiliary set is taken as Valid and the remaining auxiliary sets are taken as Train.
Wherein Train represents the training sample set and Valid represents the verification sample set.
S12.4, initializing: parameter k = 1, the multi-intent understanding model MultiIntRecognizer = {}.
S12.5, based on Dataset_1, the classifier fold_1 is trained according to the RoFormer algorithm principle using the EMA and AT strategies, and is added to MultiIntRecognizer = {fold_1}, letting k = k + 1 = 2.
S12.6, judging that k = 2 is smaller than K = 3, and thus S12.5 is performed.
The steps of S12.5 to S12.6 described above are repeated until k=4:
S12.6, judging that k = 4 is greater than K = 3, and thus S12.7 is performed.
S12.7, outputting: MultiIntRecognizer = {fold_1, fold_2, fold_3}.
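A compact sketch of the S12 training loop is given below; the split here shuffles the history samples as a stand-in for the split by session ID, train_fn stands in for the RoFormer-based training with the EMA and AT strategies, and all function names are assumptions rather than details from the original disclosure.

```python
import random
from typing import Callable, List, Tuple

def kfold_datasets(hist_adj_data: List[dict], num_classifiers: int = 3,
                   seed: int = 0) -> List[Tuple[List[dict], List[dict]]]:
    """Sketch of S12.2/S12.3: split the history sample set into K auxiliary sets,
    then give each classifier one auxiliary set as Valid and the union of the
    remaining auxiliary sets as Train (the KFoldCV partitioning)."""
    data = hist_adj_data[:]
    random.Random(seed).shuffle(data)
    folds = [data[k::num_classifiers] for k in range(num_classifiers)]
    datasets = []
    for k in range(num_classifiers):
        valid = folds[k]
        train = [sample for j, fold in enumerate(folds) if j != k for sample in fold]
        datasets.append((train, valid))
    return datasets

def train_multi_int_recognizer(hist_adj_data: List[dict],
                               train_fn: Callable[[List[dict], List[dict]], object],
                               num_classifiers: int = 3) -> list:
    """Sketch of S12.4-S12.7: train one classifier per Dataset_k and collect them."""
    recognizer = []
    for train, valid in kfold_datasets(hist_adj_data, num_classifiers):
        recognizer.append(train_fn(train, valid))  # e.g. RoFormer-based training with EMA and AT
    return recognizer
```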
S13, based on Data_10000, prediction is performed using MultiIntRecognizer, and the final intention category set ConvIntents of Conv_10000 is obtained by the multi-intent integration strategy (Bagging_Strategy). The specific steps are as follows:
S13.1, input: sample to be analyzed Data_10000 = [d_1, d_2, d_3].
S13.2, initializing: parameter j = 1, the intention category set ConvIntents = {} corresponding to the multi-round conversation text to be analyzed Conv_10000 is empty.
S13.3, multi-intent recognition is performed on d_1 using MultiIntRecognizer according to the predefined intention category set ClassLabels = {l_1 = query, l_2 = consultation, l_3 = handling, l_4 = fault, l_5 = harassment, l_6 = other}, obtaining each fold_k's prediction result for d_1.
Referring to FIG. 9, a schematic diagram of the multi-intent understanding model processing is shown in accordance with an embodiment of the present application. As shown in FIG. 9, each fold_k produces its prediction result set Results_1 for d_1.
S13.4, according to all fold_k in Results_1, voting on each conversation intention category label l_t in ClassLabels using the PV method, and obtaining the conversation intention category label set of each s_i in d_1.
S13.5, according to Labels_1, adding the non-duplicate conversation intention category labels l_t to ConvIntents, obtaining ConvIntents = {query, consultation}, and letting j = j + 1 = 2.
S13.6, judging that j = 2 is smaller than length(Data_10000) = 3, thus S13.3 is performed.
The steps of S13.3 to S13.6 described above are repeated until j=4:
S13.6, judging that j = 4 is greater than 3, and thus S13.7 is performed.
S13.7, outputting the intention category set ConvIntents = {int_1: query, int_2: consultation, int_3: fault, int_4: other} corresponding to Conv_10000.
In the embodiment of the present application, when the multi-intent understanding problem of multi-round conversation text is processed, the result output by the multi-label text classification method is more comprehensive and more valuable; based on the correlation between the semantic information of the conversation text, the contextual position information of the conversation text and the conversation intention category labels, a text sequence splicing strategy and a multi-intent understanding model structure are provided, and a model training optimization strategy is provided so that the multi-intent understanding model can be trained on a class-imbalanced data set; finally, based on the prediction results of all the classifiers of the model, the multi-intent integration strategy is adopted to comprehensively obtain a multi-intent understanding result of the multi-round session with better interpretability.
Referring to fig. 10, a block diagram of a model training apparatus according to an embodiment of the present application is shown.
As shown in fig. 10, the model training apparatus may include the following modules:
the first splicing module 1001 is configured to obtain a set of historical multi-round session texts, and respectively splice each historical multi-round session text to obtain a set of historical samples based on relative position codes;
a first obtaining module 1002, configured to create a multi-purpose understanding model to be trained including at least one classifier, and obtain a subset of historical samples corresponding to each classifier from the set of historical samples; the classifier is a multi-label classifier;
the training module 1003 is configured to train each classifier by using a subset of the historical samples corresponding to the classifier, and obtain a multi-intention understanding model after training all the classifiers.
Optionally, the first splicing module 1001 includes:
an initializing unit, configured to initialize i = j = 1, initialize the j-th history splicing sequence d_j corresponding to the history multi-round conversation text to be empty, and initialize the history sample corresponding to the history multi-round conversation text to be empty;
a first judging unit, configured to judge whether the sum of the lengths of the i-th historical conversation text s_i in the historical multi-round conversation text and d_j is smaller than or equal to a preset threshold value; if yes, calling the first splicing unit; if not, calling the second splicing unit;
a first splicing unit, configured to splice s_i at the tail of d_j, add a position identifier after s_i, let i = i + 1, and call the second judging unit;
a second splicing unit, configured to add a semantic identifier at the head of d_j and a text separator at the tail of d_j, append d_j to the tail of the history sample, let j = j + 1, initialize d_j to be empty, and call the second judging unit;
the second judging unit, configured to judge whether i and j are smaller than or equal to the total number of the historical conversation texts in the historical multi-round conversation text; if yes, calling the first judging unit; if not, obtaining the history sample corresponding to the history multi-round conversation text.
Optionally, the first obtaining module 1002 includes: a dividing unit, configured to divide the history sample set into a number of history sample auxiliary sets equal to the total number of the classifiers; a selecting unit, configured to select, for each classifier, one history sample auxiliary set as the verification sample set corresponding to the classifier, take the remaining history sample auxiliary sets as the training sample set corresponding to the classifier, and take the training sample set and the verification sample set as the history sample subset corresponding to the classifier; wherein the verification sample sets corresponding to different classifiers are different.
Optionally, the training module 1003 includes: the identification unit is used for inputting the history samples in the history sample subsets corresponding to the classifier into the classifier, and identifying the history samples in the classifier to obtain the predicted intention category of the history samples identified by the classifier; the calculation unit is used for calculating a model loss value according to the predicted intention category of the history sample and the preset actual intention category of the history sample; and the determining unit is used for determining that the classifier training is completed when the model loss value meets a preset condition.
Optionally, the history sample comprises at least one history splicing sequence, the history splicing sequence comprises at least one history session text, and a position identifier is added after each history session text; the identification unit is specifically configured to sequentially obtain, in the classifier, a predicted candidate intention category of the history splice sequence for each history splice sequence in the history sample, based on an embedded vector corresponding to the location identifier in the history splice sequence; and determining the predicted intention category of the history sample based on the predicted candidate intention category of each history splicing sequence.
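To illustrate how the embedded vector at each position identifier could drive a multi-label prediction, the sketch below applies a sigmoid classification head to the encoder hidden states at the [MASK] positions; the module name MaskIntentHead, the use of generic PyTorch layers, and the 0.5 decision threshold are assumptions for illustration, not details taken from the original disclosure.

```python
import torch
import torch.nn as nn

class MaskIntentHead(nn.Module):
    """Sketch: map the hidden state at every [MASK] position to independent
    per-category probabilities (multi-label classification)."""
    def __init__(self, hidden_size: int, num_labels: int):
        super().__init__()
        self.linear = nn.Linear(hidden_size, num_labels)

    def forward(self, hidden_states: torch.Tensor, mask_positions: torch.Tensor) -> torch.Tensor:
        # hidden_states: (seq_len, hidden_size) encoder output for one splice sequence
        # mask_positions: indices of the [MASK] tokens, one per session text
        mask_vectors = hidden_states[mask_positions]      # (num_masks, hidden_size)
        return torch.sigmoid(self.linear(mask_vectors))   # (num_masks, num_labels)

# Hypothetical usage: the categories predicted for each [MASK] are those above 0.5
head = MaskIntentHead(hidden_size=768, num_labels=6)
hidden = torch.randn(40, 768)                 # encoder output for one splice sequence
probs = head(hidden, torch.tensor([14, 27]))  # two [MASK] positions
predicted = probs > 0.5
```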
Optionally, the training module 1003 is specifically configured to train the classifier by using an exponential moving average operation and an adversarial training operation based on the historical sample subset corresponding to the classifier.
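As a rough illustration of these two operations, the sketch below keeps an exponential moving average of the model weights and applies an FGM-style perturbation to the embedding weights during training; FGM is only one common way to realise adversarial training and is an assumption here, as are the decay and epsilon values.

```python
import torch

class EMA:
    """Sketch: keep an exponential moving average of the trainable parameters."""
    def __init__(self, model: torch.nn.Module, decay: float = 0.999):
        self.decay = decay
        self.shadow = {n: p.detach().clone()
                       for n, p in model.named_parameters() if p.requires_grad}

    def update(self, model: torch.nn.Module) -> None:
        for n, p in model.named_parameters():
            if p.requires_grad:
                self.shadow[n].mul_(self.decay).add_(p.detach(), alpha=1.0 - self.decay)

class FGM:
    """Sketch of adversarial training: perturb the embedding weights along the
    gradient direction, run an extra forward/backward pass, then restore them."""
    def __init__(self, model: torch.nn.Module, epsilon: float = 1.0, name: str = "embedding"):
        self.model, self.epsilon, self.name, self.backup = model, epsilon, name, {}

    def attack(self) -> None:
        for n, p in self.model.named_parameters():
            if p.requires_grad and self.name in n and p.grad is not None:
                self.backup[n] = p.detach().clone()
                norm = torch.norm(p.grad)
                if norm != 0:
                    p.data.add_(self.epsilon * p.grad / norm)

    def restore(self) -> None:
        for n, p in self.model.named_parameters():
            if n in self.backup:
                p.data = self.backup[n]
        self.backup = {}
```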
Referring to fig. 11, a block diagram of an intention recognition apparatus according to an embodiment of the present application is shown.
As shown in fig. 11, the intention recognition apparatus may include the following modules:
a second splicing module 1101, configured to obtain a multi-round session text to be analyzed, and perform splicing processing on the multi-round session text to be analyzed to obtain a sample to be analyzed based on relative position coding;
a second obtaining module 1102, configured to obtain a pre-trained multi-intent understanding model, where the multi-intent understanding model includes at least one classifier, and the classifier is a multi-label classifier; the multi-intent understanding model is trained by the method according to any one of the above;
the identifying module 1103 is configured to identify the sample to be analyzed by using each classifier in the multi-intention understanding model, and determine an intention type of the multi-round conversation text to be analyzed based on an identification result of each classifier.
Optionally, the sample to be analyzed comprises at least one splicing sequence to be analyzed, the splicing sequence to be analyzed comprises at least one conversation text to be analyzed, and a position identifier is added after each conversation text to be analyzed; the identifying module 1103 is specifically configured to input the sample to be analyzed into each classifier, sequentially aim at each splicing sequence to be analyzed in the sample to be analyzed in the classifier, obtain candidate intention category of the splicing sequence to be analyzed based on the embedded vector corresponding to the position identifier in the splicing sequence to be analyzed, perform voting operation based on the candidate intention category of the splicing sequence to be analyzed obtained by each classifier, and determine a target candidate intention category of the splicing sequence to be analyzed; and determining the intention category of the multi-round conversation text to be analyzed based on the target candidate intention category of each splicing sequence to be analyzed.
According to the embodiment of the application, on one hand, the historical sample based on the relative position code is obtained by splicing the historical multi-round conversation text, so that the position information between the front conversation text and the rear conversation text in the multi-round conversation text can be added into the input sequence of the multi-intention understanding model, the multi-intention understanding model can learn the correlation between the semantic information of the conversation text and the front-rear position information of the conversation text and the conversation intention type label, and the analysis of the multi-intention understanding model is more accurate; on the other hand, at least one multi-label classifier is arranged on the multi-intention understanding model, the training of each classifier is carried out by utilizing samples corresponding to the classifier, the training process is simpler and more convenient, and the final intention category can be comprehensively analyzed by utilizing the prediction result of the at least one classifier, so that the recognition result of the multi-intention understanding model is more comprehensive and has interpretability.
For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
In an embodiment of the application, an electronic device is also provided. The electronic device may include one or more processors and one or more computer-readable storage media having instructions stored thereon, such as an application program. The instructions, when executed by the one or more processors, cause the processors to perform the method of any of the embodiments described above.
Referring to fig. 7, a schematic diagram of an electronic device structure according to an embodiment of the present application is shown. As shown in fig. 7, the electronic device includes a processor 701, a communication interface 702, a memory 703, and a communication bus 704. The processor 701, the communication interface 702, and the memory 703 communicate with each other through the communication bus 704.
A memory 703 for storing a computer program.
The processor 701 is configured to implement the method of any of the above embodiments when executing the program stored in the memory 703.
The communication interface 702 is used for communication between the electronic device and other devices described above.
The communication bus 704 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be classified as an address bus, a data bus, a control bus, and the like. For ease of illustration, only one bold line is shown in the figure, but this does not mean that there is only one bus or only one type of bus.
The above-mentioned processor 701 may include, but is not limited to: central processing units (Central Processing Unit, CPU), network processors (Network Processor, NP), digital signal processors (Digital Signal Processing, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and the like.
The above-mentioned memory 703 may include, but is not limited to: Read-Only Memory (ROM), Random Access Memory (RAM), Compact Disc Read-Only Memory (CD-ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), hard disk, floppy disk, flash memory, and the like.
In an embodiment of the application, there is also provided a computer readable storage medium having stored thereon a computer program executable by a processor of an electronic device, the computer program, when executed by the processor, causing the processor to perform the method as described in any of the embodiments above.
In this specification, the embodiments are described in a progressive manner and are interrelated; each embodiment focuses on its differences from the other embodiments, and for the identical or similar parts the embodiments may be referred to one another.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or terminal device comprising the element.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (such as ROM, RAM, magnetic disk, optical disk) and including several instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those having ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, which are to be protected by the present application.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a usb disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk, etc.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application. In view of the foregoing, this description should not be construed as limiting the application.

Claims (12)

1. A method of model training, the method comprising:
acquiring a historical multi-round conversation text set, and respectively performing splicing treatment on each historical multi-round conversation text to obtain a historical sample set based on relative position codes;
creating a multi-intention understanding model to be trained comprising at least one classifier, and acquiring a historical sample subset corresponding to each classifier from the historical sample set; the classifier is a multi-label classifier;
and training each classifier by utilizing a historical sample subset corresponding to the classifier, and obtaining a multi-intention understanding model after all the classifiers are trained.
2. The method of claim 1, wherein for any one of the historical multi-turn conversation texts, performing a stitching process on the historical multi-turn conversation text comprises:
Step 1, initializing i = j = 1, initializing the j-th history splicing sequence d_j corresponding to the history multi-round conversation text to be empty, and initializing the history sample corresponding to the history multi-round conversation text to be empty;
step 2, judging whether the sum of the lengths of the i-th historical conversation text s_i in the historical multi-round conversation text and d_j is smaller than or equal to a preset threshold value; if yes, executing step 3; if not, executing step 4;
step 3, splicing s_i at the tail of said d_j, adding a position identifier after s_i, letting i = i + 1, and executing step 5;
step 4, adding a semantic identifier at the head of said d_j and a text separator at the tail of said d_j, appending d_j to the tail of the history sample, letting j = j + 1, initializing d_j to be empty, and executing step 5;
step 5, judging whether i and j are smaller than or equal to the total number of historical conversation texts in the historical multi-round conversation texts; if yes, executing the step 2; if not, obtaining a history sample corresponding to the history multi-round conversation text.
3. The method according to claim 1, wherein the obtaining a subset of the historical samples corresponding to each classifier from the set of historical samples comprises:
Dividing the history sample set into a number of history sample auxiliary sets equal to the total number of the classifiers;
selecting, for each classifier, one history sample auxiliary set as the verification sample set corresponding to the classifier, taking the remaining history sample auxiliary sets as the training sample set corresponding to the classifier, and taking the training sample set and the verification sample set as the history sample subset corresponding to the classifier; wherein the verification sample sets corresponding to different classifiers are different.
4. The method of claim 1, wherein training the classifier using the subset of historical samples corresponding to the classifier comprises:
inputting a history sample in a history sample subset corresponding to the classifier into the classifier, and identifying the history sample in the classifier to obtain a predicted intention category of the history sample identified by the classifier;
calculating a model loss value according to the predicted intention category of the historical sample and the preset actual intention category of the historical sample;
and when the model loss value meets a preset condition, determining that the classifier training is completed.
5. The method of claim 4, wherein the history sample comprises at least one history splice sequence comprising at least one history session text, each history session text being followed by a location identifier; the step of identifying the history sample in the classifier to obtain the predicted intention category of the history sample identified by the classifier comprises the following steps:
sequentially aiming at each history splicing sequence in the history sample in the classifier, and acquiring a predicted candidate intention category of the history splicing sequence based on an embedded vector corresponding to the position identifier in the history splicing sequence;
and determining the predicted intention category of the history sample based on the predicted candidate intention category of each history splicing sequence.
6. The method of claim 1, wherein training the classifier using the subset of historical samples corresponding to the classifier comprises:
based on the historical sample subset corresponding to the classifier, training the classifier by utilizing an exponential moving average operation and an adversarial training operation.
7. A method of intent recognition, the method comprising:
acquiring a multi-round conversation text to be analyzed, and performing splicing processing on the multi-round conversation text to be analyzed to obtain a sample to be analyzed based on relative position codes;
acquiring a pre-trained multi-intent understanding model, wherein the multi-intent understanding model comprises at least one classifier, and the classifier is a multi-label classifier; the multi-intent understanding model is trained by the method of any one of claims 1 to 6;
and respectively identifying the sample to be analyzed by utilizing each classifier in the multi-intention understanding model, and determining the intention category of the multi-round conversation text to be analyzed based on the identification result of each classifier.
8. The method according to claim 7, wherein the sample to be analyzed comprises at least one splice sequence to be analyzed, the splice sequence to be analyzed comprises at least one conversation text to be analyzed, and each conversation text to be analyzed is added with a position identifier; the method for determining the intention category of the multi-round conversation text to be analyzed based on the recognition results of the classifiers comprises the following steps:
Inputting the sample to be analyzed into each classifier, sequentially aiming at each splicing sequence to be analyzed in the sample to be analyzed in the classifier, acquiring candidate intention categories of the splicing sequence to be analyzed based on embedded vectors corresponding to the position identifiers in the splicing sequence to be analyzed, voting based on the candidate intention categories of the splicing sequence to be analyzed acquired by each classifier, and determining target candidate intention categories of the splicing sequence to be analyzed;
and determining the intention category of the multi-round conversation text to be analyzed based on the target candidate intention category of each splicing sequence to be analyzed.
9. A model training apparatus, the apparatus comprising:
the first splicing module is used for acquiring a historical multi-round conversation text set, and respectively carrying out splicing treatment on each historical multi-round conversation text to obtain a historical sample set based on relative position coding;
the first acquisition module is used for creating a multi-intention understanding model to be trained comprising at least one classifier, and acquiring a historical sample subset corresponding to each classifier from the historical sample set; the classifier is a multi-label classifier;
The training module is used for training each classifier by utilizing the historical sample subset corresponding to the classifier, and obtaining a multi-intention understanding model after the training of all the classifiers is completed.
10. An intent recognition device, the device comprising:
the second splicing module is used for acquiring a multi-round conversation text to be analyzed, and carrying out splicing treatment on the multi-round conversation text to be analyzed to obtain a sample to be analyzed based on relative position codes;
a second acquisition module for acquiring a pre-trained multi-intent understanding model, the multi-intent understanding model comprising at least one classifier, the classifier being a multi-label classifier; the multi-intent understanding model is trained by the method according to any one of the above;
and the identification module is used for respectively identifying the sample to be analyzed by utilizing each classifier in the multi-intention understanding model, and determining the intention category of the multi-round conversation text to be analyzed based on the identification result of each classifier.
11. An electronic device, comprising:
one or more processors; and
one or more computer-readable storage media having instructions stored thereon;
The instructions, when executed by the one or more processors, cause the processor to perform the model training method of any one of claims 1 to 6, or to perform the intent recognition method of any one of claims 7 to 8.
12. A computer readable storage medium, having stored thereon a computer program which, when executed by a processor, causes the processor to perform the model training method of any of claims 1 to 6 or to perform the intent recognition method of any of claims 7 to 8.
CN202311055750.4A 2023-08-21 2023-08-21 Model training and intention recognition method and device, electronic equipment and storage medium Pending CN117150395A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311055750.4A CN117150395A (en) 2023-08-21 2023-08-21 Model training and intention recognition method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311055750.4A CN117150395A (en) 2023-08-21 2023-08-21 Model training and intention recognition method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117150395A true CN117150395A (en) 2023-12-01

Family

ID=88897919

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311055750.4A Pending CN117150395A (en) 2023-08-21 2023-08-21 Model training and intention recognition method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117150395A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117556025A (en) * 2024-01-10 2024-02-13 川投信息产业集团有限公司 AI and visualization-based platform project service information optimization method and system
CN117556025B (en) * 2024-01-10 2024-04-02 川投信息产业集团有限公司 AI and visualization-based platform project service information optimization method and system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination