CN111221963A - Intelligent customer service data training model field migration method - Google Patents

Intelligent customer service data training model field migration method

Info

Publication number
CN111221963A
Authority
CN
China
Prior art keywords
model
field
data set
training
network model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911133457.9A
Other languages
Chinese (zh)
Other versions
CN111221963B (en)
Inventor
张翀
江岭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Xiaoduo Technology Co Ltd
Original Assignee
Chengdu Xiaoduo Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Xiaoduo Technology Co Ltd
Priority to CN201911133457.9A
Publication of CN111221963A
Application granted
Publication of CN111221963B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a field migration method for intelligent customer service data training models, which comprises the following steps: training an initial network model on the complete data set to obtain a general model, and training the initial network model on the data set of each field to obtain a plurality of field models; inputting a target data set into the general model and the corresponding field model for calculation, taking the intermediate output of the general model as a general sentence representation and the intermediate output of the field model as a field sentence representation; splicing the field sentence representation onto the tail of the general sentence representation to obtain a spliced sentence representation; and inputting the spliced sentence representation into an initial network model for training to obtain a target model. Because the target model is trained on spliced sentence representations formed by splicing the general and field sentence representations, it inherits both the general knowledge learned by the general model and the field knowledge learned by the field model, and can be fully adapted to the field of the target data.

Description

Intelligent customer service data training model field migration method
Technical Field
The invention belongs to the technical field of neural network data processing, and in particular relates to a field migration method for intelligent customer service data training models.
Background
Transfer learning in deep learning has been widely applied in the NLP field. Existing deep learning transfer methods include the following (both approaches are sketched in code after the list):
1. Parameter-based (the early transfer learning approach), i.e. the parameters of the pre-trained model are reused. The input of the target model is the numerically converted text, and the parameters of the pre-trained model are used directly as the initialization parameters of the target model.
Disadvantage of the parameter-based approach: the target model has the same large parameter scale as the pre-trained model, so the computational complexity is high and industrial application requirements cannot be met.
2. Representation-based: the text is numerically converted and input into the pre-trained model, and the intermediate output of the pre-trained model is taken as the input of the target model.
Advantage of the representation-based approach: the pre-trained model is run only once to compute the representation and is not iterated repeatedly, so the target model can use a small parameter scale; speed is greatly improved while recognition accuracy is comparable to the parameter-based approach.
Disadvantage of the representation-based approach: field mismatch. For example, when a pre-trained model built on questions from the clothing field is applied directly to the mobile phone field, the fields do not match and recognition accuracy hits a bottleneck.
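For concreteness, the two existing strategies can be sketched in a Keras-style program as follows; the toy model, the layer sizes and the layer name "sentence_repr" are illustrative assumptions rather than details taken from this patent.

import tensorflow as tf

num_classes = 20            # illustrative number of semantic categories
vocab_size, seq_len = 6000, 35

# A toy stand-in for the large pre-trained model.
inputs = tf.keras.Input(shape=(seq_len,))
embedded = tf.keras.layers.Embedding(vocab_size, 128)(inputs)
sentence_repr = tf.keras.layers.GlobalAveragePooling1D(name="sentence_repr")(embedded)
outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(sentence_repr)
pretrained = tf.keras.Model(inputs, outputs)

# 1. Parameter-based transfer: the target model reuses the pre-trained parameters directly,
#    so it keeps the full (large) parameter scale of the pre-trained model.
param_based_target = tf.keras.models.clone_model(pretrained)
param_based_target.set_weights(pretrained.get_weights())

# 2. Representation-based transfer: the pre-trained model is run once, its intermediate
#    output is taken as a fixed sentence representation, and only a small target model
#    is trained on top of that representation.
repr_extractor = tf.keras.Model(pretrained.input, pretrained.get_layer("sentence_repr").output)
small_target = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])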
Disclosure of Invention
To address the defects of the prior art, the invention provides a field migration method for intelligent customer service data training models. A general model suitable for all fields and a plurality of field models, each suitable for one field, are pre-trained. When a target data set is processed, the general model and the field model corresponding to the target data set compute and output a general sentence representation and a field sentence representation; the general model learns general knowledge, and the field model learns knowledge specific to its field. A target model is then trained on spliced sentence representations formed by splicing the general sentence representation with the field sentence representation. Because the spliced sentence representation combines the general knowledge learned by the general model with the field knowledge learned by the field model, the trained target model also learns both kinds of knowledge, can be fully adapted to the field of the target data, and improves the accuracy of semantic recognition. Moreover, because the general model and the field models are mature, fully trained models, the amount of target data needed to train the target model is greatly reduced, which improves model training efficiency.
To achieve the above purpose, the solution adopted by the invention is as follows. A field migration method for an intelligent customer service data training model comprises the following steps:
s1: training an initial network model on a semantic classification task using the complete data set to obtain a general model, and training the initial network model on the semantic classification task using the data set of each field to obtain a plurality of field models, wherein the complete data set comprises the data sets of all fields, each field comprises a plurality of data sets, and the data in the data sets are labeled with semantic categories in advance;
s2: inputting a target data set into the general model and the corresponding field model for calculation, taking the intermediate output of the general model as a general sentence representation and the intermediate output of the field model as a field sentence representation, wherein the target data set belongs to one of the field data sets, contains less data than that field data set, and its data are labeled with semantic categories in advance;
s3: splicing the field sentence representation onto the tail of the general sentence representation to obtain a spliced sentence representation;
s4: inputting the spliced sentence representation into an initial network model for training to obtain a target model, where the target model belongs to the same field as the target data set used.
The migration method further comprises: training an initialization model on the semantic classification task using the complete data set to obtain an additional model; inputting the target data set into the additional model for calculation and taking the intermediate output of the additional model as an additional sentence representation; splicing the field sentence representation onto the tail of the general sentence representation and the additional sentence representation onto the tail of the field sentence representation to obtain the spliced sentence representation; and inputting the spliced sentence representation into the initial network model for training to obtain the target model. The additional model computes the target data set to produce the additional sentence representation; the spliced sentence representation that includes it carries more knowledge and describes the text from one more angle, so the target model trained on it learns more, which improves its semantic recognition accuracy and its adaptability to the field of the target data.
Training the initial network model using the complete data set to obtain the general model comprises the following steps:
s111: performing numerical conversion on each sentence of each data set in the complete data set to obtain a vector of a specified length; first, a mapping table from Chinese characters to numbers is defined and generated so that each distinct Chinese character corresponds to a unique number; each sentence is then converted into a vector of the specified length according to the mapping table, padding with 0 where the sentence is too short to reach the specified length;
s112: forming the vectors obtained from the same data set into a matrix of that data set; since a data set contains a plurality of sentences, this yields a multi-dimensional matrix;
s113: inputting the matrix of the data set into the initial network model for iterative computation;
s114: calculating the value of the loss function of the initial network model and adjusting the undetermined parameters in each layer of the initial network model so that the average value of the loss function decreases after the adjustment;
s115: repeating S113-S114 until the value of the loss function no longer decreases or a preset number of iterations has been performed, the network with the parameters obtained from the last adjustment being the general model.
Training the initial network model using the data set of each field to obtain a plurality of field models comprises the following steps:
s121: performing numerical conversion on each sentence of each data set in the data sets of the same field to obtain a vector of a specified length, the field data sets being processed in the same way as the complete data set;
s122: forming the vectors obtained from the same data set into a matrix of that data set;
s123: inputting the matrix of the data set into the initial network model and performing iterative computation with a field-preference objective function added;
s124: calculating the value of the loss function of the initial network model and adjusting the undetermined parameters in each layer of the initial network model so that the average value of the loss function decreases after the adjustment;
s125: repeating S123-S124 until the value of the loss function no longer decreases or a preset number of consecutive iterations has been performed, the network with the parameters obtained from the last adjustment being the field model.
Training the initialization model using the complete data set to obtain the additional model comprises the following steps:
s131: performing numerical conversion on each sentence of each data set in the complete data set to obtain a vector of a specified length;
s132: forming the vectors obtained from the same data set into a matrix of that data set;
s133: inputting the matrix of the data set into the initial network model and performing iterative computation with a field-preference objective function that differs from the one used to train the field models;
s134: calculating the value of the loss function of the initial network model and adjusting the undetermined parameters in each layer of the initial network model so that the average value of the loss function decreases after the adjustment;
s135: repeating S133-S134 until the value of the loss function no longer decreases or a preset number of consecutive iterations has been performed, the network with the parameters obtained from the last adjustment being the additional model.
The field-preference objective function is a distance measurement function or an included-angle measurement function.
Inputting the target data set into the general model and the field model for calculation comprises the following steps:
s201: performing numerical conversion on each sentence of each data set in the target data set to obtain a vector of a specified length;
s202: forming the vectors obtained from the same data set into a matrix of that data set;
s203: inputting the matrix of the data set into the general model and computing once, the intermediate output being taken as the general sentence representation;
s204: inputting the matrix of the data set into the corresponding field model and computing once, the intermediate output being taken as the field sentence representation.
Inputting the spliced sentence representation into the initial network model for training to obtain the target model comprises the following steps: inputting the spliced sentence representation into the initial network model for iterative computation; calculating the value of the loss function of the initial network model and adjusting the undetermined parameters in each layer of the initial network model so that the average value of the loss function decreases after the adjustment; and repeating until the value of the loss function no longer decreases or a preset number of consecutive iterations has been performed, the network with the parameters obtained from the last adjustment being the target model.
The invention has the following beneficial effects:
(1) The method first pre-trains a general model suitable for all fields and a plurality of field models, each suitable for one field. When a target data set is processed, the general model and the field model corresponding to the target data set compute and output a general sentence representation and a field sentence representation; the general model has learned general knowledge and the field model has learned knowledge specific to its field. A target model is trained on the spliced sentence representation formed by splicing the general sentence representation with the field sentence representation. Because the spliced sentence representation combines the general knowledge learned by the general model with the field knowledge learned by the field model, the trained target model learns both the general knowledge and the field-specific knowledge, can be fully adapted to the field of the target data, and improves the accuracy of semantic recognition.
(2) Meanwhile, because the general model and the field models are mature, fully trained models, the amount of target data used to train the target model is greatly reduced, which improves model training efficiency.
Drawings
FIG. 1 is a diagram of the data training model field migration method according to the first embodiment of the present invention;
FIG. 2 is a diagram of the data training model field migration method according to the second embodiment of the present invention;
FIG. 3 is a flow chart of the universal model training of the present invention;
FIG. 4 is a flow chart of the present invention domain model training;
FIG. 5 is a flow chart of additional model training in accordance with the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings:
As shown in fig. 1, a field migration method for an intelligent customer service data training model includes the following steps:
S1: train the initial network model on the semantic classification task using the complete data set to obtain a fully trained general model, and train the initial network model on the semantic classification task using the data set of each field to obtain a plurality of fully trained field models. The complete data set comprises the data sets of a plurality of fields, each field comprises a plurality of data sets, and the data in the data sets are labeled with semantic categories in advance. The general model is unique, while the field models cover the many industry fields of customer service conversation scenarios, including dozens of consumer fields such as electrical appliances, clothing, shoes and bags, food, daily necessities, beauty and cosmetics, and accessories. The semantic categories are the pre-defined categories of questions that customers ask customer service in an e-commerce conversation scenario, such as "asking about the delivery time" or "asking whether there is a gift". During labeling, user chat corpora are assigned to the corresponding semantics; for example, "when will it ship" and "it has been so long, why has it not shipped" are both assigned to the semantic "asking about the delivery time". Under that semantic, many different questions all express "asking about the delivery time", and the other semantics are labeled in the same way. During training the model learns the different questions and their corresponding semantics, so that at prediction time the questions seen during training, or similar questions, are classified into the correct semantics. The robot reply content corresponding to each semantic is configured in advance, which realizes the robot's automatic response process.
S2: input the target data set into the general model and the corresponding field model for calculation, taking the intermediate output of the general model as the general sentence representation and the intermediate output of the field model as the field sentence representation. The target data set belongs to one of the field data sets, contains less data than that field data set, and its data are labeled with semantic categories in advance.
S3: splice the field sentence representation onto the tail of the general sentence representation to obtain the spliced sentence representation. The intermediate-layer output of both the general model and the field model is a 500-dimensional vector; for example, if the general sentence representation of a sentence is [1, …, 500] and its field sentence representation is [501, …, 1000], the spliced vector [1, …, 500, 501, …, 1000] has 1000 dimensions (a code sketch of this splicing follows S4).
S4: input the spliced sentence representation into an initial network model for training to obtain the target model; the target model belongs to the same field as the target data set used.
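A minimal NumPy sketch of the splicing in S3, assuming the two 500-dimensional intermediate outputs are already available as arrays with one row per sentence (the random values are placeholders):

import numpy as np

general_repr = np.random.rand(3, 500)   # intermediate output of the general model, one row per sentence
field_repr = np.random.rand(3, 500)     # intermediate output of the field model for the same sentences

# Splice the field sentence representation onto the tail of the general sentence representation:
# each pair of 500-dimensional vectors becomes one 1000-dimensional spliced sentence representation.
spliced_repr = np.concatenate([general_repr, field_repr], axis=-1)
assert spliced_repr.shape == (3, 1000)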
In another preferred embodiment, as shown in fig. 2, the migration method further includes: training the initialization model on the semantic classification task using the complete data set to obtain an additional model; inputting the target data set into the additional model for calculation and taking the intermediate output of the additional model as an additional sentence representation; splicing the field sentence representation onto the tail of the general sentence representation and the additional sentence representation onto the tail of the field sentence representation to obtain the spliced sentence representation; and inputting the spliced sentence representation into the initial network model for training to obtain the target model. The additional model computes the target data set to produce the additional sentence representation; the spliced sentence representation that includes it carries more knowledge, so the target model trained on it learns more and is better adapted to the field of the target data.
As shown in fig. 3, training the initial network model using the complete data set to obtain the general model includes the following steps (a code sketch follows S115):
s111: perform numerical conversion on each sentence of each data set in the complete data set to obtain a vector of a specified length. First, a mapping table from Chinese characters to numbers is defined and generated so that each distinct Chinese character corresponds to a unique number, for example "在" -> 1 and "吗" -> 2, so that the sentence "在吗" ("are you there?") becomes [1, 2]. The specified length of 35 is set according to message-length statistics from e-commerce customer service chats, i.e. at most 35 characters are processed to obtain a vector of length 35, and shorter sentences are left-padded with 0, so "在吗" becomes [0, 0, 0, 0, 0, …, 1, 2]. The semantic category labeled on each sentence also needs numerical conversion: the established semantic categories are mapped to numbers, so with n semantics each semantic corresponds to a number from 0 to n-1;
s112: form the vectors obtained from the same data set into a matrix of that data set; since a data set contains a plurality of sentences, this yields a multi-dimensional matrix. For example, if the data set contains only two occurrences of "在吗", the resulting matrix is the two-dimensional matrix [[0, …, 1, 2], [0, …, 1, 2]];
s113: input the matrix of the data set into the initial network model for iterative computation;
s114: calculate the value of the loss function of the initial network model and adjust the undetermined parameters in each layer of the initial network model so that the average value of the loss function decreases after the adjustment;
s115: repeat S113-S114 until the value of the loss function no longer decreases or a preset number of iterations has been performed; the network with the parameters obtained from the last adjustment is the general model.
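A minimal sketch of S111-S115, assuming a Keras-style network stands in for the initial network model; the character-mapping excerpt, the toy label set, the layer sizes and the optimizer are illustrative assumptions, while the vector length of 35, the 500-dimensional intermediate layer and stopping once the loss no longer decreases follow the description:

import numpy as np
import tensorflow as tf

SEQ_LEN = 35                                          # specified vector length (S111)
char_to_id = {"在": 1, "吗": 2}                        # mapping table from Chinese characters to numbers (toy excerpt)
label_to_id = {"询问在不在": 0, "询问发货时间": 1}       # semantic categories mapped to numbers 0..n-1

def sentence_to_vector(sentence):
    # S111: convert a sentence to a fixed-length vector, left-padding with 0.
    ids = [char_to_id.get(ch, 0) for ch in sentence][:SEQ_LEN]
    return [0] * (SEQ_LEN - len(ids)) + ids

# S112: stack the vectors of one data set into a matrix.
sentences = ["在吗", "在吗"]
labels = ["询问在不在", "询问在不在"]
X = np.array([sentence_to_vector(s) for s in sentences])
y = np.array([label_to_id[l] for l in labels])

# S113-S115: iterate, compute the loss, adjust the undetermined parameters, and stop once
# the loss no longer decreases or after a preset number of iterations.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(6000, 128),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(500, activation="relu", name="sentence_repr"),  # 500-dimensional intermediate output
    tf.keras.layers.Dense(len(label_to_id), activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
stop_when_flat = tf.keras.callbacks.EarlyStopping(monitor="loss", patience=3)
model.fit(X, y, epochs=100, callbacks=[stop_when_flat], verbose=0)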
As shown in fig. 4, the training of the initial network model using the data sets of the respective domains to obtain a plurality of domain models includes the following steps;
s121: perform numerical conversion on each sentence of each data set in the data sets of the same field to obtain a vector of a specified length; the field data sets are processed in the same way as the complete data set;
s122: form the vectors obtained from the same data set into a matrix of that data set;
s123: input the matrix of the data set into the initial network model and perform iterative computation with a field-preference objective function added. During training, a distance measurement function measures the distance between the intermediate-layer output of the general model and the intermediate-layer output of the field model, where the intermediate layer is a layer between the model's input layer and output layer and both intermediate-layer outputs are 500-dimensional floating-point vectors [x_0, …, x_500]. The optimization goal of the field-preference objective function is to make the distance between the two intermediate-layer output vectors large enough, so that the intermediate-layer outputs of the general model and the field model lie farther apart in their spatial distribution (a sketch of such an objective follows S125);
s124: calculate the value of the loss function of the initial network model and adjust the undetermined parameters in each layer of the initial network model so that the average value of the loss function decreases after the adjustment;
s125: repeat S123-S124 until the value of the loss function no longer decreases or a preset number of consecutive iterations has been performed; the network with the parameters obtained from the last adjustment is the field model. During field model training, choosing the sample content of the field data set well increases the field model's ability to learn field-related knowledge, which gives the target model better field adaptability.
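The field-preference objective of S123 can be sketched as a custom training step. The description only requires that the distance between the two intermediate-layer outputs be driven large; how that preference term is combined with the classification loss (the subtraction and the weight lam below) and the model-building details are assumptions made for illustration:

import tensorflow as tf

def build_model(vocab_size=6000, seq_len=35, repr_dim=500, num_classes=20):
    # Initial network model exposing both its 500-dimensional intermediate output and its prediction.
    inp = tf.keras.Input(shape=(seq_len,))
    x = tf.keras.layers.Embedding(vocab_size, 128)(inp)
    x = tf.keras.layers.GlobalAveragePooling1D()(x)
    repr_ = tf.keras.layers.Dense(repr_dim, activation="relu", name="sentence_repr")(x)
    out = tf.keras.layers.Dense(num_classes, activation="softmax")(repr_)
    return tf.keras.Model(inp, [repr_, out])

general_model = build_model()   # assumed already trained on the complete data set; kept frozen here
field_model = build_model()     # the field model being trained

cross_entropy = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()

def field_train_step(x_batch, y_batch, lam=0.1):
    general_repr, _ = general_model(x_batch, training=False)
    with tf.GradientTape() as tape:
        field_repr, probs = field_model(x_batch, training=True)
        classification_loss = cross_entropy(y_batch, probs)
        # Field preference: Euclidean distance between the two intermediate-layer outputs
        # (an included-angle / cosine measure could be used instead).
        separation = tf.reduce_mean(tf.norm(general_repr - field_repr, axis=-1))
        # Subtracting the separation pushes the field representation away from the general one;
        # the weight lam is an illustrative choice, not specified in the description.
        loss = classification_loss - lam * separation
    grads = tape.gradient(loss, field_model.trainable_variables)
    optimizer.apply_gradients(zip(grads, field_model.trainable_variables))
    return loss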
As shown in fig. 5, training the initialization model using the full dataset to obtain additional models comprises the following steps:
s131: perform numerical conversion on each sentence of each data set in the complete data set to obtain a vector of a specified length;
s132: form the vectors obtained from the same data set into a matrix of that data set;
s133: input the matrix of the data set into the initial network model and perform iterative computation with a field-preference objective function that differs from the one used to train the field models; when training the additional pre-trained model, this field-preference objective function is used to enlarge the spatial separation between the intermediate-layer representation of the general model and the intermediate-layer representation of the additional pre-trained model;
s134: calculate the value of the loss function of the initial network model and adjust the undetermined parameters in each layer of the initial network model so that the average value of the loss function decreases after the adjustment;
s135: repeat S133-S134 until the value of the loss function no longer decreases or a preset number of consecutive iterations has been performed; the network with the parameters obtained from the last adjustment is the additional model.
The field-preference objective function is a distance measurement function or an included-angle measurement function, as sketched below.
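The two candidate measures can be written, for example, as the following plain NumPy functions; the exact formulas are not fixed by the description, so these are illustrative:

import numpy as np

def distance_measure(u, v):
    # Euclidean distance between two intermediate-layer output vectors.
    return np.linalg.norm(u - v)

def included_angle_measure(u, v):
    # Included angle (in radians) between two intermediate-layer output vectors.
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos, -1.0, 1.0))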
Inputting the target data set into the general model and the field model for calculation comprises the following steps (a code sketch follows S204):
s201: perform numerical conversion on each sentence of each data set in the target data set to obtain a vector of a specified length;
s202: form the vectors obtained from the same data set into a matrix of that data set;
s203: input the matrix of the data set into the general model and compute once; the intermediate output is taken as the general sentence representation;
s204: input the matrix of the data set into the corresponding field model and compute once; the intermediate output is taken as the field sentence representation.
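A sketch of S203-S204, assuming the trained general model and field model each expose a named 500-dimensional intermediate layer as in the earlier sketches, and that X_target is the matrix already built from the target data set in S201-S202:

import tensorflow as tf

def sentence_representations(general_model, field_model, X_target, layer_name="sentence_repr"):
    # Wrap each trained model so that a single forward pass yields its intermediate-layer output.
    general_repr_model = tf.keras.Model(general_model.input,
                                        general_model.get_layer(layer_name).output)
    field_repr_model = tf.keras.Model(field_model.input,
                                      field_model.get_layer(layer_name).output)
    general_repr = general_repr_model.predict(X_target)   # S203: general sentence representation
    field_repr = field_repr_model.predict(X_target)       # S204: field sentence representation
    return general_repr, field_repr                       # spliced afterwards as in S3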
Inputting the spliced sentence representation into the initial network model for training to obtain the target model comprises the following steps: input the spliced sentence representation into the initial network model for iterative computation; calculate the value of the loss function of the initial network model and adjust the undetermined parameters in each layer of the initial network model so that the average value of the loss function decreases after the adjustment; repeat until the value of the loss function no longer decreases or a preset number of consecutive iterations has been performed, and the network with the parameters obtained from the last adjustment is the target model. As the samples are continuously updated, retraining the pre-trained models brings continuous improvement to the target model.
Example one
Assume a data set contains n training samples, one of which is the sentence "在吗" ("are you there?") labeled with the semantic of asking whether the agent is present; the data set also contains many other training samples, such as "when will it ship" labeled with the semantic "asking about the delivery time", and so on. Taking "在吗" as an example: 1. Numerical conversion: "在吗" is converted into the length-35 vector [0, …, 1, 2], and its semantic label is converted into the number 0. 2. Pre-trained representation: the vector from step 1 is passed through the general model and the field model to obtain two 500-dimensional vectors [1, …, 500] and [501, …, 1000], which are spliced into a 1000-dimensional vector [1, …, 1000]; the other n-1 training samples in the data set are converted in the same way, each yielding a 1000-dimensional vector. During the target model's computation, one batch of 1000-dimensional vectors is input at a time, fixed at 200 vectors per batch, corresponding to 200 training samples. The model predicts the semantic number of each vector, and the loss value is computed by the objective function, which evaluates the error between the predicted semantic number and the actual number; the more accurate the prediction, the smaller the loss. After each batch is computed, a loss value is obtained and the model parameters are optimized by gradient descent, then the next batch is input and its loss computed. This cycle repeats until the loss no longer decreases, at which point computation stops, the target model parameters are no longer adjusted, and the customer service robot model that predicts buyer semantics is obtained.
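The embodiment can be sketched as follows; the 1000-dimensional spliced vectors, the batch size of 200 and stopping once the loss no longer decreases come from the description above, while the random stand-in data, the hidden-layer size and the optimizer are illustrative assumptions:

import numpy as np
import tensorflow as tf

NUM_SEMANTICS = 20    # illustrative number of semantic categories, numbered 0..n-1
n = 1000              # illustrative number of training samples in the target data set

# Spliced 1000-dimensional sentence representations and their semantic numbers
# (random stand-ins here; in practice they come from the general and field models).
X_spliced = np.random.rand(n, 1000).astype("float32")
y = np.random.randint(0, NUM_SEMANTICS, size=n)

target_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1000,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(NUM_SEMANTICS, activation="softmax"),
])
target_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# 200 spliced vectors (200 training samples) per batch; gradient descent after every batch;
# training stops once the loss no longer decreases.
stop_when_flat = tf.keras.callbacks.EarlyStopping(monitor="loss", patience=2)
target_model.fit(X_spliced, y, batch_size=200, epochs=100, callbacks=[stop_when_flat], verbose=0)

# The finished customer service robot model predicts the semantic number of a buyer message
# from its spliced representation.
predicted_semantic = int(np.argmax(target_model.predict(X_spliced[:1]), axis=-1)[0])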
The above embodiments only express specific implementations of the present invention, and although their description is specific and detailed, they are not to be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these fall within the scope of the present invention.

Claims (10)

1. An intelligent customer service data training model field migration method, characterized by comprising the following steps:
s1: training an initial network model using the complete data set to obtain a general model, and training the initial network model using the data set of each field to obtain a plurality of field models;
s2: inputting a target data set into the general model and the corresponding field model for calculation, taking the intermediate output of the general model as a general sentence representation and the intermediate output of the field model as a field sentence representation;
s3: splicing the field sentence representation onto the tail of the general sentence representation to obtain a spliced sentence representation;
s4: inputting the spliced sentence representation into an initial network model for training to obtain a target model.
2. The intelligent customer service data training model field migration method of claim 1, wherein the migration method further comprises: training an initialization model using the complete data set to obtain an additional model; inputting the target data set into the additional model for calculation and taking the intermediate output of the additional model as an additional sentence representation; splicing the field sentence representation onto the tail of the general sentence representation and the additional sentence representation onto the tail of the field sentence representation to obtain the spliced sentence representation; and inputting the spliced sentence representation into the initial network model for training to obtain the target model.
3. The intelligent customer service data training model field migration method according to claim 1 or 2, characterized in that training the initial network model using the complete data set to obtain the general model comprises the following steps:
s111: performing numerical conversion on each sentence of each data set in the complete data set to obtain a vector of a specified length;
s112: forming the vectors obtained from the same data set into a matrix of that data set;
s113: inputting the matrix of the data set into the initial network model for iterative computation;
s114: calculating the value of the loss function of the initial network model and adjusting the undetermined parameters in each layer of the initial network model so that the average value of the loss function decreases after the adjustment;
s115: repeating S113-S114 until the value of the loss function no longer decreases or a preset number of iterations has been performed, the network with the parameters obtained from the last adjustment being the general model.
4. The intelligent customer service data training model field migration method according to claim 1 or 2, characterized in that training the initial network model using the data set of each field to obtain a plurality of field models comprises the following steps:
s121: performing numerical conversion on each sentence of each data set in the data sets of the same field to obtain a vector of a specified length;
s122: forming the vectors obtained from the same data set into a matrix of that data set;
s123: inputting the matrix of the data set into the initial network model and performing iterative computation with a field-preference objective function added;
s124: calculating the value of the loss function of the initial network model and adjusting the undetermined parameters in each layer of the initial network model so that the average value of the loss function decreases after the adjustment;
s125: repeating S123-S124 until the value of the loss function no longer decreases or a preset number of consecutive iterations has been performed, the network with the parameters obtained from the last adjustment being the field model.
5. The intelligent customer service data training model field migration method of claim 2, wherein training the initialization model using the complete data set to obtain the additional model comprises the following steps:
s131: performing numerical conversion on each sentence of each data set in the complete data set to obtain a vector of a specified length;
s132: forming the vectors obtained from the same data set into a matrix of that data set;
s133: inputting the matrix of the data set into the initial network model and performing iterative computation with a field-preference objective function that differs from the one used to train the field models;
s134: calculating the value of the loss function of the initial network model and adjusting the undetermined parameters in each layer of the initial network model so that the average value of the loss function decreases after the adjustment;
s135: repeating S133-S134 until the value of the loss function no longer decreases or a preset number of consecutive iterations has been performed, the network with the parameters obtained from the last adjustment being the additional model.
6. The intelligent customer service data training model field migration method of claim 5, wherein the field-preference objective function is a distance measurement function or an included-angle measurement function.
7. The intelligent customer service data training model field migration method of claim 1, wherein inputting the target data set into the general model and the field model for calculation comprises the following steps:
s201: performing numerical conversion on each sentence of each data set in the target data set to obtain a vector of a specified length;
s202: forming the vectors obtained from the same data set into a matrix of that data set;
s203: inputting the matrix of the data set into the general model and computing once, the intermediate output being taken as the general sentence representation;
s204: inputting the matrix of the data set into the corresponding field model and computing once, the intermediate output being taken as the field sentence representation.
8. The intelligent customer service data training model field migration method of claim 1, wherein inputting the spliced sentence representation into the initial network model for training to obtain the target model comprises the following steps: inputting the spliced sentence representation into the initial network model for iterative computation; calculating the value of the loss function of the initial network model and adjusting the undetermined parameters in each layer of the initial network model so that the average value of the loss function decreases after the adjustment; and repeating until the value of the loss function no longer decreases or a preset number of consecutive iterations has been performed, the network with the parameters obtained from the last adjustment being the target model.
9. The intelligent customer service data training model field migration method of claim 1, wherein training the initial network model using the complete data set to obtain the general model specifically means training the initial network model on a semantic classification task using the complete data set to obtain the general model; training the initial network model using the data set of each field to obtain a plurality of field models specifically means training the initial network model on the semantic classification task using the data set of each field to obtain a plurality of field models; and the data of all data sets are labeled with semantic categories in advance.
10. The intelligent customer service data training model field migration method of claim 1, wherein the target data set belongs to one of the field data sets, contains less data than that field data set, and its data are labeled with semantic categories in advance.
CN201911133457.9A 2019-11-19 2019-11-19 Intelligent customer service data training model field migration method Active CN111221963B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911133457.9A CN111221963B (en) 2019-11-19 2019-11-19 Intelligent customer service data training model field migration method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911133457.9A CN111221963B (en) 2019-11-19 2019-11-19 Intelligent customer service data training model field migration method

Publications (2)

Publication Number Publication Date
CN111221963A true CN111221963A (en) 2020-06-02
CN111221963B CN111221963B (en) 2023-05-12

Family

ID=70810181

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911133457.9A Active CN111221963B (en) 2019-11-19 2019-11-19 Intelligent customer service data training model field migration method

Country Status (1)

Country Link
CN (1) CN111221963B (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460455A (en) * 2018-02-01 2018-08-28 成都小多科技有限公司 Model treatment method and device
US20190333199A1 (en) * 2018-04-26 2019-10-31 The Regents Of The University Of California Systems and methods for deep learning microscopy
CN109711529A (en) * 2018-11-13 2019-05-03 中山大学 A kind of cross-cutting federal learning model and method based on value iterative network
CN110046248A (en) * 2019-03-08 2019-07-23 阿里巴巴集团控股有限公司 Model training method, file classification method and device for text analyzing
CN110309267A (en) * 2019-07-08 2019-10-08 哈尔滨工业大学 Semantic retrieving method and system based on pre-training model
CN110399492A (en) * 2019-07-22 2019-11-01 阿里巴巴集团控股有限公司 The training method and device of disaggregated model aiming at the problem that user's question sentence
CN110442684A (en) * 2019-08-14 2019-11-12 山东大学 A kind of class case recommended method based on content of text

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘文洁; 林磊; 孙承杰: "Semantic reasoning network based on transfer learning" (基于迁移学习的语义推理网络) *

Also Published As

Publication number Publication date
CN111221963B (en) 2023-05-12

Similar Documents

Publication Publication Date Title
Zou et al. Logistic regression model optimization and case analysis
CN109493166B (en) Construction method for task type dialogue system aiming at e-commerce shopping guide scene
CN111143540A (en) Intelligent question and answer method, device, equipment and storage medium
CN110781409B (en) Article recommendation method based on collaborative filtering
CN108334891A (en) A kind of Task intent classifier method and device
CN109582956A (en) text representation method and device applied to sentence embedding
CN111353033B (en) Method and system for training text similarity model
CN110334190A (en) A kind of reply automatic generation method towards open field conversational system
CN110209926A (en) Merchant recommendation method, device, electronic equipment and readable storage medium storing program for executing
CN108509573A (en) Book recommendation method based on matrix decomposition collaborative filtering and system
CN112182362A (en) Method and device for training model for online click rate prediction and recommendation system
CN114186084B (en) Online multi-mode Hash retrieval method, system, storage medium and equipment
CN111046170A (en) Method and apparatus for outputting information
CN112529151A (en) Data processing method and device
CN112906393A (en) Meta learning-based few-sample entity identification method
CN112884552A (en) Lightweight multimode recommendation method based on generation countermeasure and knowledge distillation
CN116956116A (en) Text processing method and device, storage medium and electronic equipment
CN117009650A (en) Recommendation method and device
CN114579640A (en) Financial time sequence prediction system and method based on generating type countermeasure network
CN111221963B (en) Intelligent customer service data training model field migration method
CN108197702B (en) Product design method based on evaluation network and recurrent neural network
CN110956528B (en) Recommendation method and system for e-commerce platform
CN114862514A (en) User preference commodity recommendation method based on meta-learning
CN110796195B (en) Image classification method including online small sample excitation
CN114764469A (en) Content recommendation method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant