CN111221963B - Intelligent customer service data training model field migration method - Google Patents
- Publication number
- CN111221963B (application CN201911133457.9A)
- Authority
- CN
- China
- Prior art keywords
- model
- training
- data set
- network model
- initial network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Biomedical Technology (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Databases & Information Systems (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a domain migration method for intelligent customer service data training models, which comprises the following steps: train an initial network model on the full data set to obtain a general model, and train the initial network model on the data set of each domain to obtain a plurality of domain models; input the target data set into the general model and the corresponding domain model for computation, taking the intermediate output of the general model as the general sentence representation and the intermediate output of the domain model as the domain sentence representation; splice the domain sentence representation onto the tail of the general sentence representation to obtain a spliced sentence representation; and input the spliced sentence representation into an initial network model for training to obtain the target model. Because the spliced sentence representation inherits both the general knowledge learned by the general model and the domain knowledge learned by the domain model, the target model trained on it fully adapts to the target data domain.
Description
Technical Field
The invention belongs to the technical field of neural network data processing, and particularly relates to a domain migration method for intelligent customer service data training models.
Background
Transfer learning in deep learning has been widely applied in the NLP field. Existing deep-learning transfer methods include the following:
1. Parameter-based (an early transfer-learning approach), i.e. reusing the parameters of a pre-trained model. The input of the target model is the numericalized text, and the parameters of the pre-trained model are used directly as the initialization parameters of the target model.
Drawback of the parameter-based approach: the target model must match the large parameter scale of the pre-trained model, so the computational complexity is high and industrial application requirements cannot be met.
2. Representation-based: the numericalized text is input into a pre-trained model, and the intermediate output of the pre-trained model is used as the input of the target model.
Advantage of the representation-based approach: because the pre-trained model computes the representation once rather than being iterated many times, the target model can use a small parameter scale, so speed improves greatly while recognition accuracy is comparable to the parameter-based approach.
Drawback of the representation-based approach: a domain-mismatch problem arises when a model pre-trained on clothing-domain questions is applied directly to the mobile-phone domain, and recognition accuracy hits a bottleneck.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides a domain migration method for intelligent customer service data training models. First, a general model applicable to all domains and a plurality of domain models, each applicable to one domain, are pre-trained. When a target data set is processed, the general model and the domain model corresponding to the target data set compute and output a general sentence representation and a domain sentence representation. The general model has learned general knowledge, while each domain model has learned knowledge specific to its domain through training. A spliced sentence representation, formed by splicing the general sentence representation and the domain sentence representation, is then used to train the target model. Because the spliced representation carries both the general knowledge learned by the general model and the domain knowledge learned by the domain model, the trained target model also learns both, fully adapts to the target data domain, and improves the accuracy of semantic recognition.
To achieve the above object, the present invention adopts the following solution. The domain migration method for intelligent customer service data training models comprises the following steps:
s1: train an initial network model for a semantic classification task on the full data set to obtain a general model, and train the initial network model for the semantic classification task on the data set of each domain to obtain a plurality of domain models; the full data set comprises the data sets of the individual domains, each domain comprises several data sets, and the data in the data sets are labeled with semantic categories in advance;
s2: input a target data set into the general model and the corresponding domain model for computation, taking the intermediate output of the general model as the general sentence representation and the intermediate output of the domain model as the domain sentence representation; the target data set belongs to one of the domain data sets, its data volume is smaller than that of the domain data set, and its data are all labeled with semantic categories in advance;
s3: splice the domain sentence representation onto the tail of the general sentence representation to obtain a spliced sentence representation;
s4: input the spliced sentence representation into an initial network model for training to obtain a target model; the obtained target model belongs to the same domain as the target data set used.
The migration method further comprises training an initialization model for the semantic classification task on the full data set to obtain an additional model; inputting the target data set into the additional model for computation, taking the intermediate output of the additional model as an additional sentence representation; splicing the domain sentence representation onto the tail of the general sentence representation and the additional sentence representation onto the tail of the domain sentence representation to obtain the spliced sentence representation; and inputting the spliced sentence representation into an initial network model for training to obtain the target model. The additional model computes an additional sentence representation from the target data set, so the spliced sentence representation carries more knowledge, describing the text from one or more further angles; the target model trained on it therefore learns more, which improves its semantic recognition accuracy and its fit to the target data domain.
Training the initial network model on the full data set to obtain the general model comprises the following steps:
s111: numerically convert each sentence of each data set in the full data set into a vector of specified length; first define and generate a mapping table from Chinese characters to numbers, where each distinct character corresponds to a unique number; then convert each sentence into a vector of specified length according to the mapping table, padding sentences shorter than the specified length with 0;
s112: the vectors obtained from the same data set form a matrix of that data set; since a data set comprises several sentences, this yields a multi-dimensional matrix;
s113: input the matrix of the data set into the initial network model for iterative computation;
s114: calculate the value of the loss function of the initial network model and adjust the undetermined parameters in each layer of the initial network model so that the average loss of the network model after adjustment decreases;
s115: repeat S113-S114 until the value of the loss function no longer decreases or a preset number of consecutive iterations has been performed; the network with the finally adjusted parameters is the general model.
Training the initial network model on the data set of each domain to obtain a plurality of domain models comprises the following steps:
s121: numerically convert each sentence of each data set in a domain's data set into a vector of specified length; the processing steps for a domain data set are the same as those for the full data set;
s122: the vectors obtained from the same data set form a matrix of that data set;
s123: input the matrix of the data set into the initial network model and perform iterative computation with an added domain-preference objective function;
s124: calculate the value of the loss function of the initial network model and adjust the undetermined parameters in each layer of the initial network model so that the average loss of the network model after adjustment decreases;
s125: repeat S123-S124 until the value of the loss function no longer decreases or a preset number of consecutive iterations has been performed; the network with the finally adjusted parameters is the domain model.
Training the initialization model on the full data set to obtain the additional model comprises the following steps:
s131: numerically convert each sentence of each data set in the full data set into a vector of specified length;
s132: the vectors obtained from the same data set form a matrix of that data set;
s133: input the matrix of the data set into the initial network model and perform iterative computation with an added domain-preference objective function different from the one used to train the domain models;
s134: calculate the value of the loss function of the initial network model and adjust the undetermined parameters in each layer of the initial network model so that the average loss of the network model after adjustment decreases;
s135: repeat S133-S134 until the value of the loss function no longer decreases or a preset number of consecutive iterations has been performed; the network with the finally adjusted parameters is the additional model.
The domain-preference objective function is a distance metric function or an angle metric function.
Inputting the target data set into the general model and the domain model for computation comprises:
s201: numerically convert each sentence of each data set in the target data set into a vector of specified length;
s202: the vectors obtained from the same data set form a matrix of that data set;
s203: input the matrix of the data set into the general model, compute once, and take the intermediate output as the general sentence representation;
s204: input the matrix of the data set into the corresponding domain model, compute once, and take the intermediate output as the domain sentence representation.
Inputting the spliced sentence representation into the initial network model for training to obtain the target model comprises: input the spliced sentence representation into the initial network model for iterative computation; calculate the value of the loss function of the initial network model and adjust the undetermined parameters in each layer so that the average loss after adjustment decreases; repeat until the value of the loss function no longer decreases or a preset number of consecutive iterations has been performed; the network with the finally adjusted parameters is the target model.
The beneficial effects of the invention are as follows:
(1) A general model applicable to all domains and a plurality of domain models, each applicable to one domain, are pre-trained. When the target data set is processed, the general model and the corresponding domain model compute and output the general sentence representation and the domain sentence representation. The general model has learned general knowledge and the domain model has learned domain-specific knowledge; training on the spliced sentence representation formed from the two therefore yields a target model that carries both the general knowledge learned by the general model and the domain knowledge learned by the domain model.
(2) Meanwhile, since the general model and the domain models are mature, already-trained models, the amount of target data needed to train the target model is greatly reduced, which improves training efficiency.
Drawings
FIG. 1 is a diagram of a method for migrating a domain of a data training model according to a first embodiment of the present invention;
FIG. 2 is a diagram of a data training model field migration method in a second embodiment of the present invention;
FIG. 3 is a flow chart of the generic model training of the present invention;
FIG. 4 is a flow chart of the model training in the field of the present invention;
FIG. 5 is a flow chart of additional model training according to the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings:
As shown in fig. 1, a domain migration method for an intelligent customer service data training model comprises the following steps:
s1: train an initial network model for the semantic classification task on the full data set to obtain a fully trained general model, and train the initial network model for the semantic classification task on the data set of each domain to obtain a plurality of fully trained domain models. The full data set comprises data sets from multiple domains, each domain comprises several data sets, and the data in the data sets are pre-labeled with semantic categories. The general model is unique, while the domain models cover the many industry domains of customer service dialogue scenarios, including dozens of consumer domains such as electrical appliances, clothing/shoes/bags, food, daily necessities, cosmetics, and accessories. A semantic category is a predetermined category of questions that customer service encounters in an e-commerce dialogue scenario, for example: asking about the shipping time, asking whether there is a gift, and so on. When labeling, user chat utterances are assigned to the corresponding semantics; for example, "when will it ship" and other such phrasings are assigned the semantic "ask about shipping time". The "ask about shipping time" semantic is thus expressed by a rich variety of phrasings, and the labeling process for other semantics is similar. During training, the model learns the different phrasings and their corresponding semantics, so that during prediction it can assign phrasings seen in training, or similar ones, to the correct semantic; since a robot reply is pre-configured for each semantic, this realizes automatic robot replies.
S2: input the target data set into the general model and the corresponding domain model for computation, taking the intermediate output of the general model as the general sentence representation and the intermediate output of the domain model as the domain sentence representation. The target data set belongs to one of the domain data sets, its data volume is smaller than that of the domain data set, and its data are all labeled with semantic categories in advance.
S3: splice the domain sentence representation onto the tail of the general sentence representation to obtain the spliced sentence representation. The intermediate-layer outputs of the general model and the domain model are both 500-dimensional vectors; for example, a sentence's general representation [1, …, 500] and domain representation [501, …, 1000] are spliced so that the resulting vector becomes 1000-dimensional: [1, …, 500, 501, …, 1000].
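The splicing in S3 can be sketched as follows — a minimal illustration using NumPy, not part of the patent; the literal values 1..500 and 501..1000 simply mirror the example above:

```python
import numpy as np

# Hypothetical 500-dimensional intermediate-layer outputs for one sentence
general_repr = np.arange(1, 501, dtype=np.float32)    # general sentence representation
domain_repr = np.arange(501, 1001, dtype=np.float32)  # domain sentence representation

# Splice the domain representation onto the tail of the general one,
# yielding the 1000-dimensional spliced sentence representation
spliced_repr = np.concatenate([general_repr, domain_repr])
```

The spliced vector is then what gets fed to the target model in S4.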
S4: input the spliced sentence representation into an initial network model for training to obtain the target model; the obtained target model belongs to the same domain as the target data set used.
In another preferred embodiment, as shown in fig. 2, the migration method further comprises training an initialization model for the semantic classification task on the full data set to obtain an additional model; inputting the target data set into the additional model for computation and taking its intermediate output as an additional sentence representation; splicing the domain sentence representation onto the tail of the general sentence representation, and the additional sentence representation onto the tail of the domain sentence representation, to obtain the spliced sentence representation; and inputting the spliced sentence representation into an initial network model for training to obtain the target model. The additional model computes an additional sentence representation from the target data set; the spliced representation that includes it carries more knowledge, so the target model trained on it learns more and is better suited to the target data domain.
As shown in fig. 3, training the initial network model on the full data set to obtain the general model comprises the following steps:
s111: numerically convert each sentence of each data set in the full data set into a vector of specified length. First define and generate a mapping table from Chinese characters to numbers, each distinct character corresponding to a unique number; then convert each sentence into a vector of specified length according to the mapping table, padding with 0 when the sentence is shorter than the specified length. For example, with "在" -> 1 and "吗" -> 2, the sentence "在吗" ("are you there?") becomes [1, 2]. A specified length of 35 is set according to the average message length counted in e-commerce customer service chat, i.e. at most 35 characters are processed to obtain a vector of length 35, and shorter vectors are padded with 0 up to length 35, so "在吗" becomes [0, …, 1, 2]. The semantic category labeling a sentence must also be converted to a number: each established semantic category is mapped to a numeric label, so with n semantics the labels run from 0 to n-1;
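The character-to-number conversion and zero-padding of S111 can be sketched as below; the id assignment order and the helper names are illustrative, not the patent's actual mapping table:

```python
# Sketch of the numerical conversion in S111: build a character-to-number
# mapping and pad each sentence on the left with 0 to the specified length.
MAX_LEN = 35

def build_vocab(sentences):
    """Assign each distinct character a unique number; 0 is reserved for padding."""
    vocab = {}
    for sentence in sentences:
        for ch in sentence:
            if ch not in vocab:
                vocab[ch] = len(vocab) + 1
    return vocab

def to_vector(sentence, vocab, max_len=MAX_LEN):
    ids = [vocab[ch] for ch in sentence][:max_len]
    return [0] * (max_len - len(ids)) + ids  # pad with 0 to the specified length

vocab = build_vocab(["在吗"])
vec = to_vector("在吗", vocab)  # [0, ..., 0, 1, 2]
```

Left-padding is chosen here to match the `[0, …, 1, 2]` form shown in the example.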
s112: the vectors obtained from the same data set form a matrix of that data set; a data set comprising several sentences yields a multi-dimensional matrix. For example, a data set containing only two sentences "在吗" forms the two-dimensional matrix [[0, …, 1, 2], [0, …, 1, 2]];
s113: input the matrix of the data set into the initial network model for iterative computation;
s114: calculate the value of the loss function of the initial network model and adjust the undetermined parameters in each layer of the initial network model so that the average loss of the network model after adjustment decreases;
s115: repeat S113-S114 until the value of the loss function no longer decreases or a preset number of consecutive iterations has been performed; the network with the finally adjusted parameters is the general model.
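The iterate-until-the-loss-stops-decreasing loop of S113–S115 can be sketched as follows; `train_step`, `max_iters`, and `patience` are placeholders of this sketch, not names from the patent:

```python
# Stopping criterion sketch: iterate until the loss no longer decreases
# (for `patience` consecutive checks) or a preset maximum number of
# iterations is reached. `train_step` stands in for one round of forward
# computation plus parameter adjustment.
def train_until_converged(train_step, max_iters=1000, patience=5):
    best_loss = float("inf")
    stale = 0
    for _ in range(max_iters):
        loss = train_step()
        if loss < best_loss:
            best_loss = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:  # loss value no longer decreasing
                break
    return best_loss

# Toy loss sequence that decreases and then plateaus
losses = iter([5.0, 3.0, 2.0, 1.5, 1.2, 1.1] + [1.1] * 10)
final_loss = train_until_converged(lambda: next(losses))
```

The same loop shape applies to the general, domain, additional, and target models, which differ only in their data and objective functions.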
As shown in fig. 4, training the initial network model on the data set of each domain to obtain a plurality of domain models comprises the following steps:
s121: numerically convert each sentence of each data set in a domain's data set into a vector of specified length; the processing steps for a domain data set are the same as those for the full data set;
s122: the vectors obtained from the same data set form a matrix of that data set;
s123: input the matrix of the data set into the initial network model and perform iterative computation with an added domain-preference objective function. During model training, a distance metric measures the distance between the intermediate-layer output of the general model and that of the domain model, where the intermediate layer is a layer between the model's input and output layers; both intermediate-layer outputs are 500-dimensional floating-point vectors [x_1, …, x_500]. The optimization objective of the domain-preference objective function is to make the distance between the two intermediate-layer output vectors large enough, so that the intermediate outputs of the general model and the domain model lie farther apart in the representation space;
s124: calculate the value of the loss function of the initial network model and adjust the undetermined parameters in each layer of the initial network model so that the average loss of the network model after adjustment decreases;
s125: repeat S123-S124 until the value of the loss function no longer decreases or a preset number of consecutive iterations has been performed; the network with the finally adjusted parameters is the domain model. During domain-model training, choosing the sample content of the domain data set judiciously increases the domain model's ability to learn domain-related knowledge, giving the target model a better domain fit.
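One way to realize the domain-preference objective of S123 is sketched below: subtract a weighted distance term from the task loss, so that minimizing the total pushes the domain model's intermediate output away from the general model's. The weighting factor 0.1 is a hypothetical hyperparameter of this sketch, not a value given in the patent:

```python
import numpy as np

def domain_preference_loss(task_loss, general_mid, domain_mid, weight=0.1):
    """Combined objective: classification loss minus a weighted Euclidean
    distance between the two intermediate-layer outputs, so minimizing it
    enlarges their separation in representation space."""
    distance = np.linalg.norm(general_mid - domain_mid)  # distance metric
    return task_loss - weight * distance

# Toy 500-dimensional intermediate outputs
general_mid = np.zeros(500, dtype=np.float64)
domain_mid = np.ones(500, dtype=np.float64)
total = domain_preference_loss(2.0, general_mid, domain_mid)
```

An angle metric (e.g. based on the cosine of the angle between the two vectors) could be substituted for the Euclidean distance with the same structure.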
As shown in fig. 5, training the initialization model on the full data set to obtain the additional model comprises the following steps:
s131: numerically convert each sentence of each data set in the full data set into a vector of specified length;
s132: the vectors obtained from the same data set form a matrix of that data set;
s133: input the matrix of the data set into the initial network model and perform iterative computation with an added domain-preference objective function different from the one used to train the domain models; the goal of using a domain-preference objective when training the additional pre-trained model is to enlarge the spatial separation between the intermediate-layer representation of the general model and that of the additional pre-trained model;
s134: calculate the value of the loss function of the initial network model and adjust the undetermined parameters in each layer of the initial network model so that the average loss of the network model after adjustment decreases;
s135: repeat S133-S134 until the value of the loss function no longer decreases or a preset number of consecutive iterations has been performed; the network with the finally adjusted parameters is the additional model.
The domain-preference objective function is a distance metric function or an angle metric function.
Inputting the target data set into the general model and the domain model for computation comprises:
s201: numerically convert each sentence of each data set in the target data set into a vector of specified length;
s202: the vectors obtained from the same data set form a matrix of that data set;
s203: input the matrix of the data set into the general model, compute once, and take the intermediate output as the general sentence representation;
s204: input the matrix of the data set into the corresponding domain model, compute once, and take the intermediate output as the domain sentence representation.
Inputting the spliced sentence representation into the initial network model for training to obtain the target model comprises: input the spliced sentence representation into the initial network model for iterative computation; calculate the value of the loss function of the initial network model and adjust the undetermined parameters in each layer so that the average loss after adjustment decreases; repeat until the value of the loss function no longer decreases or a preset number of consecutive iterations has been performed; the network with the finally adjusted parameters is the target model. As samples are continually updated, retraining the pre-trained models brings continual improvement to the target model.
Example 1
Assume one data set contains n training samples, one of which is the sentence "在吗" ("are you there?") labeled with its corresponding semantic; the data set also contains many other training samples with semantics such as "ask about shipping time". Taking "在吗" as the example: 1. Numerical conversion: "在吗" becomes the length-35 vector [0, …, 1, 2], and its corresponding semantic is converted to the numeral 0. 2. Pre-training representation: the vector from step 1 is converted by the general model and the domain model into two 500-dimensional vectors [1, …, 500] and [501, …, 1000], which are spliced into one 1000-dimensional vector [1, …, 1000]; the other n-1 training samples in the data set are converted into 1000-dimensional vectors in the same way. During target-model computation, one batch of 1000-dimensional vectors is input at a time, fixed at 200 vectors per batch, corresponding to 200 training samples. The model predicts the semantic number for each vector, and a loss value is computed by the objective function, which evaluates the error between the predicted and actual semantic numbers; the output of the objective function is called the loss value, and a lower loss means a more accurate prediction. After each batch is computed, a loss value is obtained and the model parameters are optimized by gradient descent; the loss is then computed again, and this cycle repeats until the loss no longer decreases, at which point computation stops and the parameters are no longer adjusted. The final result is the target model used for customer-service semantic prediction.
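The data flow of this example can be sketched end to end as follows. The two "pretrained models" are stubbed as random projections standing in for the real networks, and all weights and seeds here are hypothetical — the sketch only demonstrates the shapes moving through the pipeline:

```python
import numpy as np

# Stub "models": random 35 -> 500 projections in place of trained networks
rng = np.random.default_rng(0)
general_weights = rng.standard_normal((35, 500))
domain_weights = rng.standard_normal((35, 500))

# Step 1: numerical conversion, "在吗" -> [0, ..., 0, 1, 2] (length 35)
ids = np.array([0] * 33 + [1, 2], dtype=np.float64)

# Step 2: pre-training representation and splicing
general_repr = ids @ general_weights                  # stand-in for the general model
domain_repr = ids @ domain_weights                    # stand-in for the domain model
spliced = np.concatenate([general_repr, domain_repr])  # 1000-dimensional input

# Target-model computation consumes fixed batches of 200 such vectors
batch = np.stack([spliced] * 200)
```

From here, each batch would be fed through the target model, the loss computed against the numeric semantic labels, and the parameters updated by gradient descent as described above.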
The foregoing examples merely illustrate specific embodiments of the invention in detail and are not to be construed as limiting its scope. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the invention, all of which fall within the scope of the invention.
Claims (9)
1. A domain migration method for an intelligent customer service data training model, characterized in that the method comprises the following steps:
s1: training an initial network model by using all data sets to obtain a general model, and training the initial network model by using the data sets of all the fields to obtain a plurality of field models;
s2: inputting the target data set into a general model and corresponding domain model calculation, taking the intermediate output of the general model as general sentence representation, and taking the intermediate output of the domain model as domain sentence representation;
s3: splicing the domain sentence representation at the tail of the general sentence representation to obtain a spliced sentence representation;
s4: inputting the spliced sentence representation into an initial network model for training to obtain a target model, which specifically comprises the following steps: inputting the spliced sentence representation into an initial network model for iterative computation; calculating the value of the loss function of the initial network model, and adjusting the parameters to be determined in each layer structure of the initial network model so that the average value of the loss function of the network model after the parameters are adjusted is reduced; until the value of the loss function is not reduced any more or until the iteration calculation is continuously performed for preset times, the target structure of the parameter to be determined after the last adjustment is the target model.
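The stopping rule of step s4 (iterate until the loss no longer decreases, or a preset iteration count is reached) can be sketched with a toy gradient-descent loop. The linear classifier, the random data, the learning rate, and all dimensions below are illustrative assumptions, not the patent's actual network:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for the initial network model: a linear softmax classifier
# over spliced 1000-dimensional sentence representations.
n_samples, dim, n_classes = 200, 1000, 5
X = rng.standard_normal((n_samples, dim))
y = rng.integers(0, n_classes, size=n_samples)
W = np.zeros((dim, n_classes))          # parameters to be determined

def softmax_probs(W):
    logits = X @ W
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

def loss_fn(W):
    """Cross-entropy objective: lower loss means more accurate prediction."""
    probs = softmax_probs(W)
    return -np.log(probs[np.arange(n_samples), y]).mean()

def grad_fn(W):
    probs = softmax_probs(W)
    probs[np.arange(n_samples), y] -= 1
    return X.T @ probs / n_samples

# Iterate until the loss no longer decreases, or a preset iteration
# budget is exhausted -- the stopping rule described in step s4.
prev_loss, max_iters, lr = np.inf, 500, 0.1
for _ in range(max_iters):
    loss = loss_fn(W)
    if loss >= prev_loss:
        break                           # loss no longer decreasing: stop
    prev_loss = loss
    W -= lr * grad_fn(W)                # gradient-descent parameter update

print(prev_loss < np.log(n_classes))    # True: loss fell below its start value
```

The initial loss with zero weights equals ln(5) (uniform predictions over five classes), so the final check confirms that the adjusted parameters reduced the average loss.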
2. The intelligent customer service data training model field migration method according to claim 1, wherein: the migration method further comprises the steps of training an initialization model by using all data sets to obtain an additional model; inputting the target data set into an additional model for calculation, and taking the intermediate output of the additional model as an additional sentence representation; splicing the domain sentence representation at the tail of the general sentence representation, and splicing the additional sentence representation at the tail of the domain sentence representation to obtain a spliced sentence representation; and inputting the spliced sentence representation into an initial network model for training to obtain a target model.
3. The intelligent customer service data training model field migration method according to claim 1 or 2, wherein: training the initial network model by using all the data sets to obtain the general model comprises the following steps:
s111: performing numerical conversion on each statement of each data set in all data sets to obtain a vector with a specified length;
s112: a plurality of vectors obtained by processing the same data set form a matrix of the data set;
s113: inputting a matrix of the data set into an initial network model for iterative computation;
s114: calculating the value of the loss function of the initial network model, and adjusting the parameters to be determined in each layer structure of the initial network model so that the average value of the loss function of the network model after the parameters are adjusted is reduced;
s115: and repeating S113-S114 until the value of the loss function is not reduced any more or until the iteration calculation is continuously performed for preset times, wherein the target structure of the parameter to be determined after the last adjustment is a general model.
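Steps S111-S112 (numerically converting each statement to a fixed-length vector, then stacking one data set's vectors into a matrix) can be sketched as below. The character-level vocabulary, the padding scheme, and the example statements are assumptions for illustration only:

```python
import numpy as np

VECTOR_LEN = 35                  # specified length, as in the embodiment
vocab = {"<pad>": 0}             # hypothetical vocabulary, built on the fly

def numericalize(statement):
    """S111: convert one statement into a fixed-length numerical vector."""
    codes = [vocab.setdefault(ch, len(vocab)) for ch in statement]
    # Truncate long statements, pad short ones with 0 to the fixed length.
    codes = codes[:VECTOR_LEN] + [0] * max(0, VECTOR_LEN - len(codes))
    return np.array(codes)

# S112: vectors from the same data set form that data set's matrix.
dataset = ["when does my order arrive", "where is my package"]
matrix = np.stack([numericalize(s) for s in dataset])
print(matrix.shape)  # (2, 35)
```

This matrix is what S113 feeds into the initial network model for iterative computation.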
4. The intelligent customer service data training model field migration method according to claim 1 or 2, wherein: training the initial network model by using the data sets of all the fields to obtain a plurality of field models comprises the following steps:
s121: performing numerical conversion on each statement of each data set in the same field of data sets to obtain a vector with a specified length;
s122: a plurality of vectors obtained by processing the same data set form a matrix of the data set;
s123: inputting a matrix of the data set into an initial network model, and performing iterative computation by adding an objective function with field preference;
s124: calculating the value of the loss function of the initial network model, and adjusting the parameters to be determined in each layer structure of the initial network model so that the average value of the loss function of the network model after the parameters are adjusted is reduced;
s125: repeating S123-S124 until the value of the loss function is not reduced or until the iteration calculation is continuously performed for preset times, wherein the target structure of the parameter to be determined after the last adjustment is the field model.
5. The intelligent customer service data training model field migration method as claimed in claim 2, wherein: training the initialization model by using all the data sets to obtain the additional model comprises the following steps:
s131: performing numerical conversion on each statement of each data set in all data sets to obtain a vector with a specified length;
s132: a plurality of vectors obtained by processing the same data set form a matrix of the data set;
s133: inputting a matrix of the data set into an initial network model, and performing iterative computation by adding an objective function whose domain preference is different from that of the domain model training;
s134: calculating the value of the loss function of the initial network model, and adjusting the parameters to be determined in each layer structure of the initial network model so that the average value of the loss function of the network model after the parameters are adjusted is reduced;
s135: repeating S133-S134 until the value of the loss function is not reduced or until the iteration calculation is continuously performed for preset times, wherein the target structure of the parameter to be determined after the last adjustment is an additional model.
6. The intelligent customer service data training model field migration method as claimed in claim 5, wherein: the objective function of the field preference is a distance measurement function or an included angle measurement function.
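As a sketch of the two objective-function choices named in this claim, a distance measurement and an included-angle measurement between a sentence representation and a domain centroid could look like the following. The centroid and the penalty form are hypothetical; the patent does not fix them:

```python
import numpy as np

def euclidean_distance(u, v):
    """Distance measurement between a representation and a domain centroid."""
    return float(np.linalg.norm(u - v))

def included_angle(u, v):
    """Included-angle measurement: smaller angle = closer to the domain."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))  # radians

# Illustration: a vector near the (assumed) domain centroid scores lower
# under both measurements than one pointing away from it.
centroid = np.array([1.0, 0.0])
near = np.array([0.9, 0.1])
far = np.array([-1.0, 0.0])
print(euclidean_distance(near, centroid) < euclidean_distance(far, centroid))
print(included_angle(near, centroid) < included_angle(far, centroid))
```

Either measurement, added to the training loss, would bias ("prefer") representations toward the chosen domain region, which is the role the claim assigns to the field-preference objective function.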
7. The intelligent customer service data training model field migration method according to claim 1, wherein: inputting the target data set into the general model and the corresponding domain model for calculation comprises:
s201: performing numerical conversion on each statement of each data set in the target data set to obtain a vector with a specified length;
s202: a plurality of vectors obtained by processing the same data set form a matrix of the data set;
s203: inputting a matrix of the dataset into a general model to calculate once to obtain an intermediate output which is used as general sentence representation;
s204: and inputting the matrix of the data set into a corresponding domain model to calculate once to obtain an intermediate output which is used as domain sentence representation.
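Claim 7's use of an intermediate output as the sentence representation (steps S203-S204) can be illustrated with a toy two-layer network; the layer sizes, tanh activation, and random weights are assumptions, not the patent's architecture:

```python
import numpy as np

rng = np.random.default_rng(2)
W1 = rng.standard_normal((35, 500))   # input layer -> intermediate layer
W2 = rng.standard_normal((500, 10))   # intermediate layer -> output head

def forward(x, return_intermediate=False):
    hidden = np.tanh(x @ W1)          # intermediate output (500-dim)
    if return_intermediate:
        return hidden                 # taken as the sentence representation
    return hidden @ W2                # normal classification output

# A matrix formed from 4 numerically converted statements (S201-S202),
# run through the model once to extract the intermediate output (S203).
matrix = rng.standard_normal((4, 35))
reprs = forward(matrix, return_intermediate=True)
print(reprs.shape)  # (4, 500)
```

Running the same matrix through the general model and the domain model in this way yields the two 500-dimensional representations that claim 3 of the description splices into a 1000-dimensional vector.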
8. The intelligent customer service data training model field migration method according to claim 1, wherein: training the initial network model by using all data sets to obtain a general model, specifically training the initial network model by using all data sets aiming at semantic classification tasks to obtain the general model; training an initial network model by using the data sets of each field to obtain a plurality of field models, specifically training the initial network model by using the data sets of each field for semantic classification tasks to obtain a plurality of field models; the data of all the data sets are labeled with semantic categories in advance.
9. The intelligent customer service data training model field migration method according to claim 1, wherein: the target data set belongs to any field data set, the data quantity is smaller than that of the field data set, and the data in the target data set are labeled with semantic categories in advance.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911133457.9A CN111221963B (en) | 2019-11-19 | 2019-11-19 | Intelligent customer service data training model field migration method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111221963A CN111221963A (en) | 2020-06-02 |
CN111221963B true CN111221963B (en) | 2023-05-12 |
Family
ID=70810181
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911133457.9A Active CN111221963B (en) | 2019-11-19 | 2019-11-19 | Intelligent customer service data training model field migration method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111221963B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108460455A (en) * | 2018-02-01 | 2018-08-28 | 成都小多科技有限公司 | Model treatment method and device |
CN109711529A (en) * | 2018-11-13 | 2019-05-03 | 中山大学 | A kind of cross-cutting federal learning model and method based on value iterative network |
CN110046248A (en) * | 2019-03-08 | 2019-07-23 | 阿里巴巴集团控股有限公司 | Model training method, file classification method and device for text analyzing |
CN110309267A (en) * | 2019-07-08 | 2019-10-08 | 哈尔滨工业大学 | Semantic retrieving method and system based on pre-training model |
CN110399492A (en) * | 2019-07-22 | 2019-11-01 | 阿里巴巴集团控股有限公司 | The training method and device of disaggregated model aiming at the problem that user's question sentence |
CN110442684A (en) * | 2019-08-14 | 2019-11-12 | 山东大学 | A kind of class case recommended method based on content of text |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11222415B2 (en) * | 2018-04-26 | 2022-01-11 | The Regents Of The University Of California | Systems and methods for deep learning microscopy |
- 2019-11-19: CN application CN201911133457.9A filed; patent CN111221963B granted, status Active
Non-Patent Citations (1)
Title |
---|
Liu Wenjie; Lin Lei; Sun Chengjie. Semantic Reasoning Network Based on Transfer Learning. Intelligent Computer and Applications, 2018, Vol. 8, No. 06, pp. 195-198. *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zou et al. | Logistic regression model optimization and case analysis | |
CN108334891A (en) | A kind of Task intent classifier method and device | |
CN111639679B (en) | Small sample learning method based on multi-scale metric learning | |
CN111931513B (en) | Text intention recognition method and device | |
JP2020537777A (en) | Methods and devices for identifying the user's intent of speech | |
CN110348075A (en) | A kind of grinding surface roughness prediction technique based on improvement algorithm of support vector machine | |
CN111353033B (en) | Method and system for training text similarity model | |
US20220121934A1 (en) | Identifying neural networks that generate disentangled representations | |
CN111666427A (en) | Entity relationship joint extraction method, device, equipment and medium | |
WO2023137911A1 (en) | Intention classification method and apparatus based on small-sample corpus, and computer device | |
CN114186084B (en) | Online multi-mode Hash retrieval method, system, storage medium and equipment | |
CN110502757B (en) | Natural language emotion analysis method | |
CN111046170A (en) | Method and apparatus for outputting information | |
CN112906393A (en) | Meta learning-based few-sample entity identification method | |
CN112529151A (en) | Data processing method and device | |
CN109635294A (en) | Based on single semantic unregistered word processing method, intelligent answer method and device | |
CN116308754A (en) | Bank credit risk early warning system and method thereof | |
CN111221963B (en) | Intelligent customer service data training model field migration method | |
Ferreira et al. | Adversarial bandit for online interactive active learning of zero-shot spoken language understanding | |
CN107562714A (en) | A kind of statement similarity computational methods and device | |
CN113722439B (en) | Cross-domain emotion classification method and system based on antagonism class alignment network | |
CN115758145A (en) | Model training method, text recognition method, electronic device and readable storage device | |
CN109740163A (en) | Semantic expressiveness resource generation method and device applied to deep learning model | |
Riid et al. | Interpretability of fuzzy systems and its application to process control | |
CN113139382A (en) | Named entity identification method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||