CN115662401B - Customer service call voice recognition method based on continuous learning - Google Patents


Publication number
CN115662401B
Authority
CN
China
Prior art keywords
customer service
voice recognition
model
service call
continuous learning
Legal status
Active
Application number
CN202211604120.3A
Other languages
Chinese (zh)
Other versions
CN115662401A
Inventor
何学东
孙晓倩
常利建
杨华
潘瑞平
彭渤
杜维明
张伟蓉
王迪
陈晓龙
孙丽蓉
Current Assignee
State Grid Co ltd Customer Service Center
Original Assignee
State Grid Co ltd Customer Service Center
Application filed by State Grid Co ltd Customer Service Center
Priority to CN202211604120.3A
Publication of CN115662401A
Application granted
Publication of CN115662401B
Active legal status (current)
Anticipated expiration

Landscapes

  • Telephonic Communication Services (AREA)

Abstract

A customer service call voice recognition method based on continuous learning comprises three stages.

First, an initial speech recognition model is trained.

Second, the number of scenes is set to S. For s = 1 and for s = 2, 3, ..., S respectively, the speech data and text annotation data of the s-th 95598 customer service call business scene are obtained. A continuous learning method is then used to adjust the parameters of, respectively, the initial speech recognition model and the (s-1)-th 95598 customer service call speech recognition continuous learning model, which yields the s-th 95598 customer service call speech recognition continuous learning model.

Finally, the 95598 customer service call speech to be recognized is input into the S-th 95598 customer service call speech recognition continuous learning model to obtain its Chinese text.

The invention continuously adapts to changes in call business scenes and keeps improving the adaptability of the speech recognition model. It can keep learning as new call business scenes appear, maintaining or improving recognition performance in every scene, while overcoming catastrophic forgetting as the model adapts to new scenes.

Description

Customer service call voice recognition method based on continuous learning
Technical Field
The invention belongs to the technical field of voice recognition, and particularly relates to a customer service call voice recognition method based on continuous learning.
Background
Customer service is an important operational activity in the power supply services of a power enterprise. It affects the vital interests of power customers and the operating performance of the enterprise. It also bears on the social responsibility and public image of the power industry, a pillar industry of national importance. As power system reform and the nation's comprehensive deepening of reform proceed, creating a good customer experience and high customer satisfaction places greater demands on the management of power enterprises.
The 95598 national power supply service hotline is an important customer service channel of State Grid. Because it is convenient and fast, it has become the preferred way for power customers to raise their requests. Standardized 95598 work order files are a necessary precondition for subsequently analyzing and resolving those requests.
However, power customers at different levels differ in their ability to express themselves and in the accuracy of the information they provide. As a result, the completeness, accuracy, and objectivity of manually filled 95598 work order files cannot be guaranteed. To meet the technical requirements of telephone speech recognition, the power customer service center therefore applies artificial intelligence to recognize customers' telephone speech, which helps keep work order files standardized.
Existing speech recognition models for the customer service dialogue domain are deployed once training is complete and are not adjusted during use. They therefore cannot avoid the drop in recognition accuracy caused by changes in call business scenes. Meanwhile, State Grid is actively building a new type of power system based mainly on new energy, in full service of the dual-carbon (carbon peaking and carbon neutrality) goals. Amid this transformation of the power system, and as user groups and their patterns of electricity use change, new telephone scenes and conversation content keep emerging.
Therefore, as power services keep expanding and new telephone scenes keep emerging, a speech recognition method is urgently needed that continuously adapts to changes in call business scenes. Such a method should keep improving the model's adaptability while preserving its generality on the original telephone scenes. In other words, the model needs continuous learning capability for new call business scenes, so that its recognition performance in every scene is maintained or improved and catastrophic forgetting during adaptation to new scenes is avoided.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a customer service call voice recognition method based on continuous learning.
The invention solves the above technical problem through the following technical scheme:
A customer service call voice recognition method based on continuous learning, characterized in that the recognition method comprises the following steps:
S1, training an initial speech recognition model on a public Chinese speech recognition data set;
S2, setting the number of scenes to $S$. For $s = 1$, to adapt to the $s$-th 95598 customer service call business scene: obtaining the speech data and text annotation data of that scene; using a continuous learning method to adjust the parameters of the initial speech recognition model; and thereby obtaining the $s$-th 95598 customer service call speech recognition continuous learning model;
S3, for $s = 2, 3, \dots, S$, to adapt to the $s$-th 95598 customer service call business scene: obtaining the speech data and text annotation data of that scene; using the continuous learning method to adjust the parameters of the $(s-1)$-th 95598 customer service call speech recognition continuous learning model; and thereby obtaining the $s$-th 95598 customer service call speech recognition continuous learning model;
S4, inputting the 95598 customer service call speech to be recognized into the $S$-th 95598 customer service call speech recognition continuous learning model to obtain its Chinese text.
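Steps S1 to S4 chain the models so that each scene's model starts from the previous one. The control flow can be sketched in Python as below. `train_initial` and `adapt` are hypothetical placeholders for the actual training routines: initial training on the public set, and continuous-learning fine-tuning on one scene. The sketch therefore shows only the ordering of the models, not the training itself.

```python
from typing import Any, Callable, List, Sequence


def continual_pipeline(
    train_initial: Callable[[Any], Any],
    adapt: Callable[[Any, Any], Any],
    public_data: Any,
    scene_data: Sequence[Any],
) -> List[Any]:
    """Return [model_1, ..., model_S] in the order given by steps S1-S3.

    S1: train the initial model on the public data set.
    S2: for s = 1, adapt the initial model to scene 1.
    S3: for s = 2..S, adapt the (s-1)-th model to scene s.
    """
    current = train_initial(public_data)      # S1
    models: List[Any] = []
    for data in scene_data:                   # S2 (s = 1), then S3 (s >= 2)
        current = adapt(current, data)        # each scene starts from the previous model
        models.append(current)
    return models


if __name__ == "__main__":
    # Toy stand-ins: a "model" is just the tuple of data sets it has seen.
    models = continual_pipeline(
        train_initial=lambda d: (d,),
        adapt=lambda m, d: m + (d,),
        public_data="public",
        scene_data=["scene1", "scene2", "scene3"],
    )
    print(models[-1])  # ('public', 'scene1', 'scene2', 'scene3')
```

The last element of the returned list plays the role of the $S$-th model used for recognition in step S4.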
Moreover, the text of the public Chinese speech recognition data set in step S1 covers more than 4000 commonly used Chinese characters.
Furthermore, the speech data and text annotation data are obtained by manual annotation. Before the parameters of the initial speech recognition model and of the speech recognition continuous learning models are adjusted, data augmentation is used to enlarge the training data.
Furthermore, the public Chinese speech recognition data set, and the data set formed by the speech data and text annotation data, are each divided into a training set, a validation set, and a test set. The training set is used to train the model, the validation set to tune its hyperparameters, and the test set to evaluate its speech recognition performance.
Furthermore, the initial speech recognition model and the 95598 customer service call speech recognition continuous learning models use an end-to-end LAS (Listen, Attend and Spell) deep learning architecture consisting of three parts: a deep learning encoder, an attention mechanism, and a deep learning decoder;
all models use the same encoder network structure, and the same decoder network structure except for the last layer;
the last layer of the initial speech recognition model's decoder is a fully connected layer with $K$ output nodes. A Softmax activation function converts its outputs into $K$ probability values, where $K$ is the total number of characters in the character set the initial model can output. This character set comprises a space character, a start/stop character, an unknown-Chinese-character token, and every distinct Chinese character contained in the public Chinese speech recognition training set;
the last layer of the decoder of the $s$-th 95598 customer service call speech recognition continuous learning model is a fully connected layer with $K_s$ output nodes. A Softmax activation function converts its outputs into $K_s$ probability values. $K_s$ is the total number of characters in the character set the $s$-th model can output. This set comprises a space character, a start/stop character, an unknown-Chinese-character token, and every distinct Chinese character contained in the public Chinese speech recognition training set and in the training sets of the first $s$ 95598 call business scenes. $K_s$ is related to $K$, the size of the initial model's output character set, by:
$$K_s = K + \sum_{p=1}^{s} N_p \tag{1}$$
where:
$N_1$ is the total number of Chinese characters in the training set of the 1st 95598 customer service call business scene that do not appear in the public Chinese speech recognition training set;
for $p = 2, 3, \dots, s$, $N_p$ is the total number of Chinese characters in the training set of the $p$-th 95598 customer service call business scene that appear neither in the public Chinese speech recognition training set nor in the training sets of the first $p-1$ 95598 call business scenes;
for $s = 1, 2, \dots, S$, the characters corresponding to the first $K$ probability values output by the $s$-th model have the same meanings as the characters corresponding to the $K$ probability values output by the initial speech recognition model;
for $s = 1, 2, \dots, S$, the $(K+1)$-th to $(K+N_1)$-th probability values output by the $s$-th model correspond to all distinct Chinese characters in the 1st scene's training set that do not appear in the public Chinese speech recognition training set, sorted in ascending order of character code;
for $s = 2, 3, \dots, S$ and $p = 2, 3, \dots, s$, the $\left(K+\sum_{q=1}^{p-1}N_q+1\right)$-th to $\left(K+\sum_{q=1}^{p}N_q\right)$-th probability values output by the $s$-th model correspond to all distinct Chinese characters in the $p$-th scene's training set that appear neither in the public Chinese speech recognition training set nor in the training sets of the first $p-1$ 95598 call business scenes, sorted in ascending order of character code.
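Equation (1) and the index layout above amount to an append-only vocabulary. Each scene adds its unseen characters at the end, so every earlier index, including the first $K$, keeps its meaning.

A minimal Python sketch follows. It rests on three assumptions, none specified in the text:

- hypothetical special-token names;
- code-point ordering for the characters;
- small random weights for the new output rows (the text does not say how new output nodes are initialized).

```python
import numpy as np
from typing import Iterable, List, Tuple

# Hypothetical names for the three special symbols listed in the text.
SPECIAL_TOKENS = ["<space>", "<sos/eos>", "<unk>"]


def chars_of(transcripts: Iterable[str]) -> set:
    """Distinct non-whitespace characters in a collection of transcripts."""
    return {c for t in transcripts for c in t if not c.isspace()}


def initial_vocab(public_transcripts: Iterable[str]) -> List[str]:
    """Output character set of the initial model (size K).
    Ordering the public-set characters by code point is an assumption."""
    return SPECIAL_TOKENS + sorted(chars_of(public_transcripts))


def expand_vocab(vocab: List[str], scene_transcripts: Iterable[str]) -> List[str]:
    """Append the N_s characters of a new scene that are not yet in `vocab`,
    sorted by code point, so all earlier indices keep their meaning."""
    new_chars = sorted(chars_of(scene_transcripts) - set(vocab))
    return vocab + new_chars


def expand_output_layer(W: np.ndarray, b: np.ndarray, num_new: int,
                        rng: np.random.Generator,
                        scale: float = 0.01) -> Tuple[np.ndarray, np.ndarray]:
    """Grow a (K_prev, d) output layer to (K_prev + num_new, d).
    Old rows are copied unchanged; new rows get small random weights and
    zero bias (an assumed initialization)."""
    new_rows = scale * rng.standard_normal((num_new, W.shape[1]))
    return np.vstack([W, new_rows]), np.concatenate([b, np.zeros(num_new)])


if __name__ == "__main__":
    v0 = initial_vocab(["你好", "电费"])   # K = 3 specials + 4 characters = 7
    v1 = expand_vocab(v0, ["电表故障"])    # N_1 = 3 new characters
    v2 = expand_vocab(v1, ["电费账单"])    # N_2 = 2 new characters
    print(len(v0), len(v1), len(v2))       # 7 10 12
    W, b = expand_output_layer(np.ones((len(v0), 4)), np.zeros(len(v0)),
                               len(v1) - len(v0), np.random.default_rng(0))
    print(W.shape)                         # (10, 4)
```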
Moreover, the loss function of the initial speech recognition model is the label-smoothed cross-entropy loss $\mathcal{L}_{CE}$, defined as:
$$\mathcal{L}_{CE} = -\frac{1}{|X|}\sum_{x\in X}\sum_{l=1}^{L_x}\sum_{i=1}^{K} y_{l,i}(x)\,\ln p_{l,i}(x) \tag{2}$$
where:
$X$ is the set of all speech input signals in the public Chinese speech recognition training set;
$|X|$ is the number of elements in $X$;
$x$ is a speech input signal in $X$;
$L_x$ is the length of the text label corresponding to $x$;
$K$ is the total number of characters in the character set the initial speech recognition model can output;
$y_{l,i}(x)$ is the quantized soft label of the $l$-th character of the text label of $x$. If that character is the $i$-th character of the initial model's output character set, then $y_{l,i}(x) = 1-\varepsilon$; otherwise $y_{l,i}(x) = \varepsilon/(K-1)$;
$\varepsilon$ is a constant smoothing value;
$p_{l,i}(x)$ is the probability, output by the initial speech recognition model for input $x$, that the $l$-th character is the $i$-th character of its output character set;
$\ln$ is the natural logarithm.
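A NumPy sketch of equation (2) in the form reconstructed above follows. It assumes the soft labels are $1-\varepsilon$ on the reference character and $\varepsilon/(K-1)$ elsewhere. `probs` holds the decoder's Softmax outputs per utterance, a layout chosen here for illustration.

```python
import numpy as np
from typing import List


def smoothed_targets(labels: np.ndarray, K: int, eps: float) -> np.ndarray:
    """(L, K) soft labels: 1 - eps on the reference character and
    eps / (K - 1) on every other character (assumed smoothing form)."""
    L = len(labels)
    y = np.full((L, K), eps / (K - 1))
    y[np.arange(L), labels] = 1.0 - eps
    return y


def label_smoothed_ce(probs: List[np.ndarray], labels: List[np.ndarray],
                      eps: float) -> float:
    """Reconstructed equation (2): -sum_l sum_i y_{l,i} ln p_{l,i} per
    utterance, averaged over the |X| utterances.

    probs[j]  : (L_x, K) softmax outputs of the model for utterance j
    labels[j] : (L_x,) integer indices of the reference characters
    """
    total = 0.0
    for p, lab in zip(probs, labels):
        y = smoothed_targets(lab, p.shape[1], eps)
        total += -np.sum(y * np.log(np.clip(p, 1e-12, 1.0)))  # clip avoids log(0)
    return total / len(probs)


if __name__ == "__main__":
    uniform = np.full((2, 4), 0.25)  # 2 characters, K = 4, uniform prediction
    print(label_smoothed_ce([uniform], [np.array([0, 3])], eps=0.1))  # 2 * ln 4 ≈ 2.7726
```

With a uniform prediction the loss per character is $\ln K$ whatever the value of $\varepsilon$, because each soft-label row sums to 1.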
Furthermore, the loss function $\mathcal{L}_s$ of the $s$-th 95598 customer service call speech recognition continuous learning model contains three terms:

- a character-weighted classification loss term $\mathcal{L}_W$;
- a distillation loss term $\mathcal{L}_D$;
- a continuous learning regularization term $\mathcal{L}_R$.

It is defined as:
$$\mathcal{L}_s = \mathcal{L}_W + \lambda_1\mathcal{L}_D + \lambda_2\mathcal{L}_R \tag{3}$$
where:
$\lambda_1$ and $\lambda_2$ are the weights of the distillation loss term and the continuous learning regularization term, respectively;
the character-weighted classification loss term $\mathcal{L}_W$ increases the weights of the Chinese characters that correspond to keywords of the $s$-th 95598 customer service call business scene, which improves recognition accuracy on that scene's keywords;
the distillation loss term $\mathcal{L}_D$ is constructed for every speech input signal in the $s$-th scene's training set from the output of the $s$-th model as it stands before training begins. Its purpose is to counter the effect on training of the uneven distribution of Chinese characters across the 95598 customer service call business scenes;
the continuous learning regularization term $\mathcal{L}_R$ introduces a continuous learning strategy that improves the model's resistance to catastrophic forgetting;
the character-weighted classification loss term $\mathcal{L}_W$ uses a weighted, label-smoothed cross-entropy loss, defined as:
$$\mathcal{L}_W = -\frac{1}{|X_s|}\sum_{x\in X_s}\sum_{l=1}^{L_x}\sum_{i=1}^{K_s} w_i^{s}\, y_{l,i}(x)\,\ln p_{l,i}(x) \tag{4}$$
where:
$X_s$ is the set of all speech input signals in the training set of the $s$-th 95598 call business scene;
$|X_s|$ is the number of elements in $X_s$;
$x$ is a speech input signal in $X_s$;
$L_x$ is the length of the text label corresponding to $x$;
$K_s$ is the total number of characters in the character set the $s$-th 95598 customer service call speech recognition continuous learning model can output;
$w_i^{s}$ is the weight, in the $s$-th 95598 customer service call business scene, of the $i$-th character of the $s$-th model's output character set. In the $s$-th scene, the Chinese characters corresponding to that scene's keywords can be given larger weights, so that the trained $s$-th model recognizes the scene's keywords accurately;
$y_{l,i}(x)$ is the quantized soft label of the $l$-th character of the text label of $x$. If that character is the $i$-th character of the $s$-th model's output character set, then $y_{l,i}(x) = 1-\varepsilon$; otherwise $y_{l,i}(x) = \varepsilon/(K_s-1)$;
$\varepsilon$ is a constant smoothing value;
$p_{l,i}(x)$ is the probability, output by the $s$-th 95598 customer service call speech recognition continuous learning model for input $x$, that the $l$-th character is the $i$-th character of its output character set;
$\ln$ is the natural logarithm;
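Equation (4) differs from (2) only in the per-character weights $w_i^s$. The sketch below assumes the same smoothing form as before. It also uses a hypothetical rule, `keyword_weights`, that gives every character appearing in a scene keyword a fixed boost. The text only says keyword characters receive larger weights, not by how much.

```python
import numpy as np
from typing import Iterable, List


def keyword_weights(vocab: List[str], keywords: Iterable[str],
                    boost: float = 2.0) -> np.ndarray:
    """Hypothetical weighting rule: characters occurring in the scene's
    keywords get weight `boost`; all other characters get 1.0."""
    kw_chars = {c for k in keywords for c in k}
    return np.array([boost if tok in kw_chars else 1.0 for tok in vocab])


def weighted_smoothed_ce(probs: List[np.ndarray], labels: List[np.ndarray],
                         weights: np.ndarray, eps: float) -> float:
    """Reconstructed equation (4): like (2), but each class i is scaled by w_i^s."""
    total = 0.0
    for p, lab in zip(probs, labels):
        L, K = p.shape
        y = np.full((L, K), eps / (K - 1))   # assumed smoothing form
        y[np.arange(L), lab] = 1.0 - eps
        total += -np.sum(weights[None, :] * y * np.log(np.clip(p, 1e-12, 1.0)))
    return total / len(probs)


if __name__ == "__main__":
    vocab = ["<unk>", "电", "费", "表"]
    w = keyword_weights(vocab, ["电费"])     # weights 1.0, 2.0, 2.0, 1.0
    p = np.full((1, 4), 0.25)
    print(w, weighted_smoothed_ce([p], [np.array([1])], w, eps=0.1))
```

With all weights equal to 1, the function reduces to equation (2).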
the distillation loss term $\mathcal{L}_D$ is defined as:
$$\mathcal{L}_D = -\frac{1}{|X_s|}\sum_{x\in X_s}\sum_{l=1}^{L_x}\sum_{i=1}^{K_s} \tilde{p}_{l,i}(x)\,\ln p_{l,i}(x) \tag{5}$$
where:
$\tilde{p}_{l,i}(x)$ is the probability, output by the $s$-th 95598 customer service call speech recognition continuous learning model at the start of training for input $x$, that the $l$-th character is the $i$-th character of its output character set. The remaining symbols have the same meanings as the corresponding symbols in the character-weighted classification loss term of equation (4);
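The distillation term, as reconstructed in equation (5), can be sketched as follows. The model's outputs are recorded once, before training on scene $s$ starts, and then used as soft targets. `model_fn` in `snapshot_teacher` is a hypothetical stand-in for running the model on one utterance. Whether the weights $w_i^s$ also enter this term cannot be recovered from the text, so they are left out here.

```python
import numpy as np
from typing import Callable, Iterable, List


def distillation_loss(student: List[np.ndarray], teacher: List[np.ndarray]) -> float:
    """Assumed form of equation (5): -1/|X_s| * sum_x sum_l sum_i q_{l,i}(x) ln p_{l,i}(x).

    student[j] : (L_x, K_s) outputs of the model being trained
    teacher[j] : (L_x, K_s) outputs recorded from the same model before training
    """
    total = 0.0
    for p, q in zip(student, teacher):
        total += -np.sum(q * np.log(np.clip(p, 1e-12, 1.0)))
    return total / len(student)


def snapshot_teacher(model_fn: Callable[[object], np.ndarray],
                     inputs: Iterable[object]) -> List[np.ndarray]:
    """Record the frozen start-of-training outputs once, before any update.
    `model_fn` is a hypothetical callable returning an (L_x, K_s) array."""
    return [np.array(model_fn(x), copy=True) for x in inputs]


if __name__ == "__main__":
    q = np.array([[0.5, 0.5]])
    print(distillation_loss([q], [q]))                          # ln 2, the entropy of q
    print(distillation_loss([np.array([[0.9, 0.1]])], [q]))     # larger: student drifted
```

By Gibbs' inequality this cross-entropy is smallest when the student matches the snapshot. The term therefore pulls the updated model toward its own pre-training predictions.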
the continuous learning regularization term uses elastic weight consolidation, a continuous learning method based on a parameter regularization strategy. The corresponding term $\mathcal{L}_R$ is defined as:
$$\mathcal{L}_R = \sum_{m=1}^{M_s} F_m^{s}\left(\theta_m - \theta_m^{*}\right)^2 \tag{6}$$
where:
$M_s$ is the number of parameters of the $s$-th 95598 customer service call speech recognition continuous learning model;
$\theta_m$ is the $m$-th parameter of the $s$-th model;
$\theta_m^{*}$ is the $m$-th parameter of the $s$-th model at the start of training;
$F_m^{s}$ is the continuous learning weight of the $m$-th parameter. It describes how important the $m$-th parameter of the $s$-th model at the start of training is to all customer service call business scenes, and is computed iteratively as follows:
(1) for $s = 1$, $F_m^{1}$ is the second partial derivative, with respect to the $m$-th parameter, of the label-smoothed cross-entropy loss of the initial speech recognition model on the public Chinese speech recognition validation set;
(2) for $s = 2, 3, \dots, S$, $F_m^{s}$ is the sum of two terms: $F_m^{s-1}$, the continuous learning weight of the $m$-th parameter for the $(s-1)$-th customer service call business scene; and the second partial derivative, with respect to the $m$-th parameter, of the label-smoothed weighted cross-entropy loss that the model trained on the $(s-1)$-th scene incurs on that scene's validation set.
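The EWC term and the iterative importance update can be sketched on a toy parameter vector. The sketch assumes equation (6) has the form reconstructed above, with no factor of 1/2. It estimates the diagonal second derivatives by central finite differences, which is only feasible for tiny models. It also includes equation (3), with defaults set to the embodiment's values $\lambda_1 = 0$ and $\lambda_2 = 10000$.

```python
import numpy as np
from typing import Callable


def ewc_penalty(theta: np.ndarray, theta_star: np.ndarray,
                importance: np.ndarray) -> float:
    """Reconstructed equation (6): sum_m F_m^s (theta_m - theta*_m)^2."""
    return float(np.sum(importance * (theta - theta_star) ** 2))


def diag_second_derivative(loss: Callable[[np.ndarray], float],
                           theta: np.ndarray, h: float = 1e-3) -> np.ndarray:
    """Central finite-difference estimate of d^2 loss / d theta_m^2 for each m.
    Only practical for toy 1-D parameter vectors; real models would need
    autodiff or an approximation such as squared gradients."""
    theta = np.asarray(theta, dtype=float)
    f0 = loss(theta)
    diag = np.empty_like(theta)
    for m in range(theta.size):
        e = np.zeros_like(theta)
        e[m] = h
        diag[m] = (loss(theta + e) - 2.0 * f0 + loss(theta - e)) / h ** 2
    return diag


def next_importance(prev: np.ndarray, prev_val_loss: Callable[[np.ndarray], float],
                    prev_theta: np.ndarray) -> np.ndarray:
    """Step (2): F^s = F^{s-1} + diagonal second derivatives of the (s-1)-th
    scene's validation loss at the (s-1)-th model's parameters."""
    return prev + diag_second_derivative(prev_val_loss, prev_theta)


def total_loss(l_w: float, l_d: float, l_r: float,
               lam1: float = 0.0, lam2: float = 10000.0) -> float:
    """Equation (3), defaulting to the embodiment's lambda_1 = 0, lambda_2 = 10000."""
    return l_w + lam1 * l_d + lam2 * l_r


if __name__ == "__main__":
    a = np.array([1.0, 3.0])
    quad = lambda t: 0.5 * float(np.sum(a * t ** 2))   # toy loss, Hessian = diag(a)
    F1 = diag_second_derivative(quad, np.zeros(2))     # step (1): about [1, 3]
    F2 = next_importance(F1, quad, np.zeros(2))        # step (2): about [2, 6]
    print(F2, ewc_penalty(np.array([1.0, 1.0]), np.zeros(2), F2))  # penalty about 8
```

When a scene adds output nodes, $F^{s-1}$ has no entries for the new parameters. Padding them with zero importance is one possible choice; the text does not specify this.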
Advantages and beneficial effects of the invention:
Compared with the prior art, the invention provides a customer service call speech recognition method based on continuous learning with two advantages:

- Because 95598 telephone speech data involve customer privacy, little such speech is available. The method needs only a small amount of 95598 telephone speech data to build a 95598 telephone speech recognition model, and it lets the model continuously adapt to new telephone scenes and conversation content.
- Unlike speech recognition methods based on transfer learning, the method ensures that the model does not forget the original scenes after being trained on data from a new scene.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a step diagram of an embodiment of the present invention;
FIG. 3 is a schematic diagram of the continuous learning method of the present invention.
Detailed Description
The present invention is further illustrated by the following specific embodiment. The embodiment is illustrative rather than limiting and does not restrict the scope of protection of the invention.
A customer service call speech recognition method based on continuous learning, characterized in that it comprises the following steps:
S1, collecting speech data from the open speech data set AISHELL-1 to form an initial speech data set. The data set contains audio data and the corresponding text labels, and is divided into the AISHELL-1 training set, validation set, and test set.
S2, training the initial speech recognition model on the AISHELL-1 training set. During training, the SpecAugment speech data augmentation method is used to enlarge the training set, and the model's hyperparameters are tuned on the AISHELL-1 validation set. The initial speech recognition model uses a Transformer network structure: the encoder has 12 Transformer encoder blocks and the decoder has 6 Transformer decoder blocks. The loss function for training the initial speech recognition model is the label-smoothed cross-entropy loss $\mathcal{L}_{CE}$, given by:
$$\mathcal{L}_{CE} = -\frac{1}{|X|}\sum_{x\in X}\sum_{l=1}^{L_x}\sum_{i=1}^{K} y_{l,i}(x)\,\ln p_{l,i}(x) \tag{2}$$
where:
$X$ is the set of all speech input signals in the public Chinese speech recognition training set;
$|X|$ is the number of elements in $X$;
$x$ is a speech input signal in $X$;
$L_x$ is the length of the text label corresponding to $x$;
$K$ is the total number of characters in the character set the initial speech recognition model can output;
$y_{l,i}(x)$ is the quantized soft label of the $l$-th character of the text label of $x$. If that character is the $i$-th character of the initial model's output character set, then $y_{l,i}(x) = 1-\varepsilon$; otherwise $y_{l,i}(x) = \varepsilon/(K-1)$;
$\varepsilon$ is a constant smoothing value;
$p_{l,i}(x)$ is the probability, output by the initial speech recognition model for input $x$, that the $l$-th character is the $i$-th character of its output character set;
$\ln$ is the natural logarithm.
In this embodiment, $\varepsilon$ is set to a fixed constant. The initial speech recognition model is trained with the Adam optimizer, a learning rate of 0.0001, and 100 training epochs.
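SpecAugment masks random frequency bands and time spans of the input features. Below is a simplified NumPy sketch with frequency and time masking only; the full method also applies time warping. The mask counts and widths are illustrative defaults, not values from this embodiment.

```python
import numpy as np


def spec_augment(spec: np.ndarray, rng: np.random.Generator,
                 num_freq_masks: int = 2, max_freq_width: int = 27,
                 num_time_masks: int = 2, max_time_width: int = 40) -> np.ndarray:
    """Simplified SpecAugment on a (T frames, F mel bins) feature matrix.

    Applies frequency masking and time masking only (no time warping).
    Masked bands are filled with the mean of the input. The input array
    is left unchanged; a masked copy is returned.
    """
    out = spec.copy()
    T, F = out.shape
    fill = spec.mean()
    for _ in range(num_freq_masks):
        width = int(rng.integers(0, max_freq_width + 1))
        start = int(rng.integers(0, max(1, F - width + 1)))
        out[:, start:start + width] = fill      # mask a band of mel bins
    for _ in range(num_time_masks):
        width = int(rng.integers(0, max_time_width + 1))
        start = int(rng.integers(0, max(1, T - width + 1)))
        out[start:start + width, :] = fill      # mask a span of frames
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((200, 80))  # stand-in for 200 frames of 80-dim log-mel features
    aug = spec_augment(feats, rng)
    print(aug.shape, np.mean(aug == feats))  # same shape; fraction of untouched entries
```

Because a fresh mask is drawn on every call, applying the function on the fly during training yields a different variant of each utterance in every epoch. This is how augmentation enlarges the effective training set.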
S3, collecting telephone speech data from the 95598 customer channel and annotating it with text. From the 95598 speech audio and the corresponding annotation texts, 3 scene data sets are built: scene 1 (fault repair calls), scene 2 (electricity bill calls), and scene 3 (meter reading and charging calls). Each scene data set is divided into a training set, a validation set, and a test set.
S4, inputting the scene 1 training set into the trained initial speech recognition model. The model parameters are adjusted with the continuous learning method and the hyperparameters are tuned on the scene 1 validation set, which yields a 95598 customer service call speech recognition model.
S5, inputting the scene 2 training set and then the scene 3 training set into the trained 95598 customer service call speech recognition model. The model parameters continue to be adjusted with the continuous learning method and the hyperparameters are tuned on the corresponding validation set, which updates and upgrades the model.
S6, whenever data from a new scene appear, inputting the new scene data set into the existing 95598 customer service call speech recognition model. Its parameters continue to be adjusted with the continuous learning method, so that the model keeps being updated to meet the growing demands of power customer service.
Fig. 3 shows the continuous learning principle of this embodiment for 95598 customer service calls. Some parameters of the model are important for old tasks, and changing them can cause catastrophic forgetting on those tasks. When a new task is learned, changes should therefore be confined mainly to the parameters that are unimportant for the old tasks.
To this end, this embodiment uses the EWC (Elastic Weight Consolidation) continuous learning strategy. For the $s$-th 95598 call business scene, the loss function is:
$$\mathcal{L}_s = \mathcal{L}_W + \lambda_1\mathcal{L}_D + \lambda_2\mathcal{L}_R \tag{3}$$
where $\mathcal{L}_W$ is the character-weighted classification loss term, $\mathcal{L}_D$ is the distillation loss term, and $\mathcal{L}_R$ is the continuous learning regularization term. $\lambda_1$ and $\lambda_2$ are the weights of the distillation loss term and the continuous learning regularization term, respectively.
The character-weighted classification loss term of this embodiment is given by:
$$\mathcal{L}_W = -\frac{1}{|X_s|}\sum_{x\in X_s}\sum_{l=1}^{L_x}\sum_{i=1}^{K_s} w_i^{s}\, y_{l,i}(x)\,\ln p_{l,i}(x) \tag{4}$$
where:
$X_s$ is the set of all speech input signals in the training set of the $s$-th 95598 call business scene;
$|X_s|$ is the number of elements in $X_s$;
$x$ is a speech input signal in $X_s$;
$L_x$ is the length of the text label corresponding to $x$;
$K_s$ is the total number of characters in the character set the $s$-th 95598 customer service call speech recognition continuous learning model can output;
$w_i^{s}$ is the weight, in the $s$-th 95598 customer service call business scene, of the $i$-th character of the $s$-th model's output character set. In the $s$-th scene, the Chinese characters corresponding to that scene's keywords can be given larger weights, so that the trained $s$-th model recognizes the scene's keywords accurately;
$y_{l,i}(x)$ is the quantized soft label of the $l$-th character of the text label of $x$. If that character is the $i$-th character of the $s$-th model's output character set, then $y_{l,i}(x) = 1-\varepsilon$; otherwise $y_{l,i}(x) = \varepsilon/(K_s-1)$;
$\varepsilon$ is a constant smoothing value;
$p_{l,i}(x)$ is the probability, output by the $s$-th 95598 customer service call speech recognition continuous learning model for input $x$, that the $l$-th character is the $i$-th character of its output character set;
$\ln$ is the natural logarithm.
The distillation loss term of this embodiment is given by:
$$\mathcal{L}_D = -\frac{1}{|X_s|}\sum_{x\in X_s}\sum_{l=1}^{L_x}\sum_{i=1}^{K_s} \tilde{p}_{l,i}(x)\,\ln p_{l,i}(x) \tag{5}$$
where:
$\tilde{p}_{l,i}(x)$ is the probability, output by the $s$-th 95598 customer service call speech recognition continuous learning model at the start of training for input $x$, that the $l$-th character is the $i$-th character of its output character set. The remaining symbols have the same meanings as the corresponding symbols in the character-weighted classification loss term of equation (4).
In this embodiment, the weight $\lambda_1$ of the distillation loss term in the loss function is set to 0.
The continuous learning regularization term of the present embodiment is formulated as follows:
Figure DEST_PATH_IMAGE201
(6)
wherein:
Figure DEST_PATH_IMAGE203
the number of parameters of the s 95598 th customer service call speech recognition continuous learning model;
Figure DEST_PATH_IMAGE205
continuous learning model for s 95598 th customer service call speech recognitionmA parameter;
Figure DEST_PATH_IMAGE207
continuous learning model for s 95598 th customer service call speech recognition at the beginning of trainingmA parameter;
Figure DEST_PATH_IMAGE209
is a firstmThe continuous learning weight of each parameter describes the s 95598 th customer service call voice recognition continuous learning model at the beginning of trainingmThe iterative calculation process of the importance degree of each parameter to all customer service call service scenes is as follows:
(1) For s = 1, i.e. the 1st 95598 customer service call scenario, the continuous learning weight is the second partial derivative, with respect to the m-th parameter, of the label-smoothed weighted cross-entropy loss of the initial speech recognition model on the public Chinese speech recognition validation set;
(2) For s ≥ 2, i.e. the s-th 95598 customer service call scenario, the continuous learning weight is the continuous learning weight of the m-th parameter for the (s-1)-th customer service call scenario plus the second partial derivative, with respect to the m-th parameter, of the label-smoothed weighted cross-entropy loss of the speech recognition continuous learning model trained on the (s-1)-th scenario, evaluated on the (s-1)-th scenario's validation set.
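The iterative weight calculation above accumulates one diagonal second derivative per scenario, in the spirit of elastic weight consolidation (EWC), which is usually implemented with a diagonal Fisher-information estimate instead. The pure-Python sketch below is illustrative only: it works on a toy loss over a flat parameter list, estimates each second derivative with central finite differences (a real LAS model would need automatic differentiation or an approximation), accumulates the weights across scenarios, and evaluates a penalty of the assumed form sum over m of weight_m * (θ_m - θ_m at start)². Equation (6) is not reproduced in this text, so any constant factor in the patent's version is unknown; all names are hypothetical.

```python
from typing import Callable, List, Optional, Sequence

Params = List[float]


def diag_second_derivatives(
    loss: Callable[[Params], float], params: Sequence[float], h: float = 1e-3
) -> List[float]:
    """Central finite-difference estimate of d^2 loss / d(theta_m)^2 for every parameter m."""
    base = loss(list(params))
    curvature = []
    for m in range(len(params)):
        plus, minus = list(params), list(params)
        plus[m] += h
        minus[m] -= h
        curvature.append((loss(plus) - 2.0 * base + loss(minus)) / (h * h))
    return curvature


def update_weights(previous: Optional[Sequence[float]], curvature: Sequence[float]) -> List[float]:
    """s = 1: weights are the curvature itself; s >= 2: previous weights plus new curvature."""
    if previous is None:
        return list(curvature)
    return [w + c for w, c in zip(previous, curvature)]


def regularizer(params: Sequence[float], start_params: Sequence[float],
                weights: Sequence[float]) -> float:
    """Assumed EWC-style penalty: sum_m weight_m * (theta_m - theta_m_at_start)^2.

    zip() stops at the shortest list, so parameters appended after the weighted ones
    (e.g. new output-layer rows) are left unpenalized in this sketch.
    """
    return sum(w * (p - p0) ** 2 for w, p, p0 in zip(weights, params, start_params))
```

For scenario s, `weights` would come from `update_weights` applied to the curvature measured on scenario s-1's validation set, and `start_params` would be the snapshot taken before training on scenario s begins.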
In this embodiment, the weight $\lambda_r$ of the regularization term in the loss function is set to 10000.
New tasks are learned with the Adam optimizer at a learning rate of 0.0001 for 100 epochs. All trained speech recognition models use beam search to find the optimal Chinese text, with a beam width of 5.
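Beam search keeps only the beam-width best partial transcripts at each decoding step. Below is a generic sketch, assuming a hypothetical `step_log_probs` callback that returns the decoder's log-probabilities over the output character set for a given prefix (in the LAS model this would run the attention decoder). It applies no length normalization, which is an assumption, and defaults to the beam width of 5 used in this embodiment; `toy_step` is a made-up decoder for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Prefix = Tuple[int, ...]


def beam_search(
    step_log_probs: Callable[[Prefix], Sequence[float]],  # hypothetical decoder callback
    sos: int,
    eos: int,
    beam_width: int = 5,
    max_len: int = 100,
) -> Prefix:
    """Return the highest-scoring character-index sequence (without sos/eos)."""
    beams: List[Tuple[float, Prefix]] = [(0.0, (sos,))]
    finished: List[Tuple[float, Prefix]] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every output character.
        candidates = [
            (score + lp, prefix + (char,))
            for score, prefix in beams
            for char, lp in enumerate(step_log_probs(prefix))
        ]
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_width]:
            (finished if seq[-1] == eos else beams).append((score, seq))
        # Log-probabilities are <= 0, so live beams can only get worse:
        # stop once the best finished hypothesis beats the best live one.
        if not beams or (finished and max(f[0] for f in finished) >= beams[0][0]):
            break
    best_seq = max(finished + beams, key=lambda c: c[0])[1]
    return best_seq[1:-1] if best_seq[-1] == eos else best_seq[1:]


def toy_step(prefix: Prefix) -> List[float]:
    """Hypothetical 4-symbol decoder: 0 = sos, 1 = eos, 2 and 3 = characters."""
    probs = [1e-9, 1e-9, 0.6, 0.4] if len(prefix) == 1 else [1e-9, 0.9, 0.05, 0.05]
    return [math.log(p) for p in probs]
```

With this toy decoder, `beam_search(toy_step, sos=0, eos=1)` returns `(2,)`.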
To verify the practical effect of the invention, 178 h of open AISHELL-1 speech data and 24.3 h of 95598 customer service call speech data were used to evaluate the algorithm. The 95598 customer service call speech data were divided into 3 scenario data sets: 6 h of fault-repair speech (scenario 1), 8.7 h of electricity-bill speech (scenario 2), and 9.6 h of meter-reading and charging speech (scenario 3). Each data set was split into a training set, a validation set, and a test set in the ratio …:2:2.
In this embodiment, the character error rate (CER) is used as the evaluation metric for the recognition results. It is defined as:
$$\mathrm{CER} = \frac{I + D + S}{N}$$
(7)
wherein: I, D, and S are the numbers of characters that must be inserted, deleted, and substituted, respectively, when the recognized text is aligned with the reference sentence; N is the total number of characters in the reference sentence.
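In practice, I + D + S is taken as the minimum number of edits, i.e. the Levenshtein distance between the recognized and reference character sequences. A minimal pure-Python sketch (the function names are illustrative):

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Minimum insertions + deletions + substitutions aligning hyp with ref (Levenshtein)."""
    prev = list(range(len(hyp) + 1))
    for i, r_char in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h_char in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,                       # deletion: reference char missing from hyp
                cur[j - 1] + 1,                    # insertion: extra char in hyp
                prev[j - 1] + (r_char != h_char),  # match (0) or substitution (1)
            )
        prev = cur
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = (I + D + S) / N, with N the number of characters in the reference."""
    return edit_distance(ref, hyp) / len(ref)
```

For example, `character_error_rate("国家电网", "国家电力网")` returns 0.25 (one insertion over four reference characters); multiply by 100 to express the result as a percentage.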
1. Comparison of character error rates of speech recognition models with and without continuous learning
Table 1 lists the character error rates of the continuous-learning speech recognition model and of the speech recognition model without continuous learning on the speech data of each scenario. The initial speech recognition model, trained only on the public speech data set, performs very poorly on every 95598 call scenario, whereas training on the 95598 customer service call speech data sets improves recognition markedly. Thus, from a large public speech data set and a small 95598 customer service call speech data set, the proposed method can train a speech recognition model for 95598 customer service calls.
TABLE 1 Character error rates of speech recognition models with and without continuous learning
Furthermore, the table shows that after training on scenario 2 data, the character error rate on scenario 1 of the model without continuous learning rises from 27.268 to 28.196, while that of the continuous-learning model falls from 27.045 to 25.361. After training on scenario 3 data, the character error rates on scenarios 1 and 2 of the model without continuous learning do not rise, but they do not fall noticeably either, whereas those of the continuous-learning model fall. Hence, compared with the model without continuous learning, the continuous-learning model keeps performing well on earlier scenarios after training on each new scenario.
2. Comparison of character error rates of speech recognition models based on continuous learning and on transfer learning
Table 2 lists the character error rates of the proposed model and of the transfer-learning model on the speech data of each scenario. Compared with a model obtained by transfer learning directly on each scenario data set, the proposed model has a much lower error rate on every scenario after training on all 3 scenario data sets. The continuous-learning model therefore integrates new knowledge while retaining old knowledge, which gives it good recognition performance across scenarios.
TABLE 2 Character error rates of speech recognition models based on continuous learning and on transfer learning
In summary, starting from an initial model trained on a public speech data set, the invention can build a 95598 customer service call speech recognition model from only a small amount of 95598 customer service call speech data, and lets the model keep adapting to new call scenarios and conversation content. Unlike existing transfer-learning-based speech recognition methods, the method prevents the model from catastrophically forgetting earlier scenarios after it is trained on data from a new scenario.
In this embodiment, the 95598 customer call speech across the 3 service scenarios totals only 24.3 h; with more public speech data and more 95598 customer call speech data, the method would yield more accurate recognition results.
Although the embodiments and drawings of the present invention are disclosed for illustrative purposes, those skilled in the art will appreciate that various substitutions, changes, and modifications are possible without departing from the spirit and scope of the invention and the appended claims; therefore, the scope of the invention is not limited to what is disclosed in the embodiments and drawings.

Claims (6)

1. A customer service call voice recognition method based on continuous learning, characterized in that the recognition method comprises the following steps:
S1, train an initial speech recognition model with a public Chinese speech recognition data set;
S2, set the number of scenarios to S; for s = 1, to adapt to the s-th 95598 customer service call scenario, acquire the speech data and text annotation data of that scenario and adjust the parameters of the initial speech recognition model with a continuous learning method to obtain the s-th 95598 customer service call speech recognition continuous learning model;
S3, for s ≥ 2, to adapt to the s-th 95598 customer service call scenario, acquire the speech data and text annotation data of that scenario and adjust the parameters of the (s-1)-th 95598 customer service call speech recognition continuous learning model with a continuous learning method to obtain the s-th 95598 customer service call speech recognition continuous learning model;
S4, input the 95598 customer service call speech to be recognized into the S-th 95598 customer service call speech recognition continuous learning model to obtain its Chinese text;
for continuous learning on 95598 customer service calls, some parameters of the initial speech recognition model are important for old tasks, and changing them causes the model to catastrophically forget those tasks; therefore, when a new task is learned, only the parameters that are less important for the old tasks should be changed. To this end, the EWC (elastic weight consolidation) continuous learning strategy is used, and for the s-th 95598 call scenario the loss function is:
$$L_s = W_s + \lambda_d D_s + \lambda_r R_s \qquad (3)$$
where $W_s$ is the character-weighted classification loss term, $D_s$ is the distillation loss term, $R_s$ is the continuous learning regularization term, and $\lambda_d$ and $\lambda_r$ are the weights of the distillation loss term and the continuous learning regularization term, respectively;
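Equation (3) adds the three terms with their weights. A hypothetical helper that combines already-computed term values, using the values from the description's embodiment ($\lambda_d$ = 0, $\lambda_r$ = 10000) as defaults:

```python
def total_loss(w_s: float, d_s: float, r_s: float,
               lambda_d: float = 0.0, lambda_r: float = 10000.0) -> float:
    """L_s = W_s + lambda_d * D_s + lambda_r * R_s (defaults taken from the embodiment)."""
    return w_s + lambda_d * d_s + lambda_r * r_s
```

With $\lambda_r$ = 10000, a regularization value of only 1e-4 already adds about 1.0 to the loss, which is how the large weight discourages changes to parameters that mattered for earlier scenarios; with $\lambda_d$ = 0 the distillation term drops out entirely.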
the formula of the character weighted classification loss term is as follows:
[Equation for $W_s$ not reproduced in this text]
wherein: $D_s$ is the set of all speech input signals in the training set of the s-th 95598 call service scenario;
$|D_s|$ is the number of elements in $D_s$;
x is a speech input signal in $D_s$;
l(x) is the length of the text label corresponding to x;
$K_s$ is the total number of characters in the character set that the s-th 95598 customer service call speech recognition continuous learning model can output;
the weight, in the s-th 95598 customer service call scenario, of each character in the character set that the s-th 95598 customer service call speech recognition continuous learning model can output; for the s-th scenario, Chinese characters corresponding to keywords can be given larger weights so that the trained s-th model recognizes the keywords of that scenario accurately;
the quantized soft label for the l-th character of the text label corresponding to x: if the l-th character of the text label is the i-th character in the character set that the s-th 95598 customer service call speech recognition continuous learning model can output, the soft label is 1 - ε; otherwise it is ε/(K - 1);
ε is a smooth value and is a constant;
the probability that, after x is input into the s-th 95598 customer service call speech recognition continuous learning model, the l-th output character is the i-th character in the character set that the model can output;
log is a natural logarithmic function;
the distillation loss term is formulated as follows:
[Equation for $D_s$ not reproduced in this text]
wherein:
the probability that, when x is input into the s-th 95598 customer service call speech recognition continuous learning model as it stands at the start of training, the l-th output character is the i-th character in the character set that the model can output; the remaining symbols have the same meanings as the corresponding symbols in the character-weighted classification loss term $W_s$; the weight $\lambda_d$ of the distillation loss term in the loss function is set to 0;
the continuous learning regularization term formula is as follows:
[Equation for $R_s$ not reproduced in this text]
wherein: m is a group of s The number of parameters of the s 95598 th customer service call speech recognition continuous learning model;
Figure FDA0004070676230000032
an mth parameter of the continuous learning model for the s 95598 th customer service call voice recognition;
Figure FDA0004070676230000033
the mth parameter of the model for the s 95598 customer service call voice recognition continuous learning at the beginning of training;
Figure FDA0004070676230000034
the continuous learning weight of the mth parameter is characterized by describing the s 95598 customer service call voice recognition continuous learning model, and when the training is started, the mth parameter is used for matching all the customer services in the preambleThe iterative calculation process of the importance degree of the call service scene is as follows:
(1) For s = 1, i.e. the 1st 95598 customer service call scenario, the continuous learning weight is the second partial derivative, with respect to the m-th parameter, of the label-smoothed weighted cross-entropy loss of the initial speech recognition model on the public Chinese speech recognition validation set;
(2) For s ≥ 2, i.e. the s-th 95598 customer service call scenario, the continuous learning weight is the continuous learning weight of the m-th parameter for the (s-1)-th customer service call scenario plus the second partial derivative, with respect to the m-th parameter, of the label-smoothed weighted cross-entropy loss of the speech recognition continuous learning model trained on the (s-1)-th scenario, evaluated on the (s-1)-th scenario's validation set.
2. The customer service call voice recognition method based on continuous learning according to claim 1, characterized in that the text of the public Chinese speech recognition data set in step S1 covers more than 4000 commonly used Chinese characters.
3. The customer service call voice recognition method based on continuous learning according to claim 1, characterized in that the speech data and the text annotation data are obtained by manual annotation, and the training data are enlarged with data augmentation before the parameters of the initial speech recognition model and the speech recognition continuous learning models are adjusted.
4. The customer service call voice recognition method based on continuous learning according to claim 1, characterized in that the public Chinese speech recognition data set and the data set formed by the speech data and the text annotation data are each divided into a training set, a validation set, and a test set; the training set is used to train the model, the validation set to tune the model's hyperparameters, and the test set to evaluate the model's recognition performance.
5. The customer service call voice recognition method based on continuous learning according to claim 1, characterized in that the initial speech recognition model and the 95598 customer service call speech recognition continuous learning models use an end-to-end LAS (Listen, Attend and Spell) deep learning structure comprising 3 parts: a deep learning encoder, an attention mechanism, and a deep learning decoder;
the encoders of these models use the same network structure, and their decoders use the same network structure except for the last layer;
the last layer of the deep learning decoder of the initial speech recognition model is a fully connected layer with K output nodes whose outputs are converted into K probability values by a Softmax activation function, where K is the total number of characters in the character set that the initial speech recognition model can output; this character set comprises a space character, a start/stop character, an unknown-Chinese-character identifier, and all distinct Chinese characters in the public Chinese speech recognition training set;
the last layer of the deep learning decoder of the s-th 95598 customer service call speech recognition continuous learning model is a fully connected layer with $K_s$ output nodes whose outputs are converted into $K_s$ probability values by a Softmax activation function, where $K_s$ is the total number of characters in the character set that the s-th model can output; this character set comprises a space character, a start/stop character, an unknown-Chinese-character identifier, all distinct Chinese characters in the public Chinese speech recognition training set, and all distinct Chinese characters in the training sets of the first s 95598 call service scenarios; $K_s$ is related to the total number K of characters in the character set that the initial speech recognition model can output as follows:
$$K_s = K + \sum_{p=1}^{s} \Delta K_p$$
wherein: $\Delta K_1$ is the number of distinct Chinese characters in the 1st 95598 customer service call scenario training set that do not belong to the public Chinese speech recognition training set;
for p ≥ 2, $\Delta K_p$ is the number of distinct Chinese characters in the p-th 95598 customer service call scenario training set that belong neither to the public Chinese speech recognition training set nor to the training sets of the first p-1 95598 call service scenarios;
for s ≥ 1, the characters corresponding to the first K probability values output by the s-th 95598 customer service call speech recognition continuous learning model have the same meanings as the characters corresponding to the K probability values output by the initial speech recognition model;
for s ≥ 1, the (K + 1)-th to (K + $\Delta K_1$)-th probability values output by the s-th model correspond to all distinct Chinese characters in the 1st 95598 customer service call scenario training set that do not belong to the public Chinese speech recognition training set, sorted in ascending order of their ASCII codes;
for s ≥ 2 and 2 ≤ p ≤ s, the $(K + \sum_{j=1}^{p-1}\Delta K_j + 1)$-th to $(K + \sum_{j=1}^{p}\Delta K_j)$-th probability values output by the s-th model correspond to all distinct Chinese characters in the p-th 95598 customer service call scenario training set that belong neither to the public Chinese speech recognition training set nor to the training sets of the first p-1 95598 call service scenarios, sorted in ascending order of their ASCII codes.
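The vocabulary growth described in this claim can be sketched as follows. The sketch assumes that "Chinese character" means a code point in the CJK Unified Ideographs block (U+4E00 to U+9FFF) and uses Unicode code-point order for the ascending character-code sort. The second helper illustrates one way, which is an assumption since the claim does not specify initialization, to widen the decoder's last fully connected layer by the $\Delta K_s$ new characters while copying the existing rows, so the earlier outputs keep their meaning. All names are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple


def is_chinese_char(ch: str) -> bool:
    """Assumption: 'Chinese character' = CJK Unified Ideographs block U+4E00..U+9FFF."""
    return "\u4e00" <= ch <= "\u9fff"


def grow_vocabularies(
    base_vocab: Sequence[str],                      # the K entries of the initial model
    scenario_transcripts: Iterable[Iterable[str]],  # training transcripts of scenarios 1..S
) -> List[List[str]]:
    """Return the output character list of the s-th model for s = 1..S."""
    vocab, seen, per_scenario = list(base_vocab), set(base_vocab), []
    for transcripts in scenario_transcripts:
        # Delta K_p characters: new Chinese characters, sorted by code point.
        new_chars = sorted({c for text in transcripts for c in text
                            if is_chinese_char(c) and c not in seen})
        vocab = vocab + new_chars  # earlier indices keep their meaning
        seen.update(new_chars)
        per_scenario.append(vocab)
    return per_scenario


def widen_output_layer(
    weights: List[List[float]], bias: List[float], num_new: int, init: float = 0.0
) -> Tuple[List[List[float]], List[float]]:
    """Copy the existing rows of the last fully connected layer and append num_new rows."""
    fan_in = len(weights[0])
    new_weights = [row[:] for row in weights] + [[init] * fan_in for _ in range(num_new)]
    return new_weights, bias[:] + [init] * num_new
```

Each returned list has length $K + \Delta K_1 + \dots + \Delta K_s$, matching the relation for $K_s$ in this claim.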
6. The customer service call voice recognition method based on continuous learning according to claim 1, characterized in that the loss function used by the initial speech recognition model is the label-smoothed cross-entropy loss function $L_{init}$, defined as follows:
[Equation for $L_{init}$ not reproduced in this text]
wherein: $D_{init}$ is the set of all speech input signals in the public Chinese speech recognition training set;
$|D_{init}|$ is the number of elements in $D_{init}$;
x is a speech input signal in $D_{init}$;
l(x) is the length of the text label corresponding to x;
K is the total number of characters in the character set that the initial speech recognition model can output;
$q_{i,l}(x)$ is the quantized soft label for the l-th character of the text label corresponding to x: if the l-th character of the text label is the i-th character in the character set that the initial speech recognition model can output, $q_{i,l}(x)$ is 1 - ε; otherwise $q_{i,l}(x)$ is ε/(K - 1);
ε is the smoothing value, a constant;
the probability that, after x is input into the initial speech recognition model, the l-th output character is the i-th character in the character set that the initial speech recognition model can output;
log is a natural logarithmic function.
CN202211604120.3A 2022-12-14 2022-12-14 Customer service call voice recognition method based on continuous learning Active CN115662401B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211604120.3A CN115662401B (en) 2022-12-14 2022-12-14 Customer service call voice recognition method based on continuous learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211604120.3A CN115662401B (en) 2022-12-14 2022-12-14 Customer service call voice recognition method based on continuous learning

Publications (2)

Publication Number Publication Date
CN115662401A CN115662401A (en) 2023-01-31
CN115662401B true CN115662401B (en) 2023-03-10

Family

ID=85023682

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211604120.3A Active CN115662401B (en) 2022-12-14 2022-12-14 Customer service call voice recognition method based on continuous learning

Country Status (1)

Country Link
CN (1) CN115662401B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116071079B (en) * 2023-03-30 2023-06-23 国家电网有限公司客户服务中心 Customer satisfaction prediction method based on customer service call voice

Citations (6)

Publication number Priority date Publication date Assignee Title
CN112786028A (en) * 2021-02-07 2021-05-11 百果园技术(新加坡)有限公司 Acoustic model processing method, device, equipment and readable storage medium
CN112951213A (en) * 2021-02-09 2021-06-11 中国科学院自动化研究所 End-to-end online voice detection and recognition method, system and equipment
CN113674745A (en) * 2020-04-30 2021-11-19 京东数字科技控股有限公司 Voice recognition method and device
CN113870845A (en) * 2021-09-26 2021-12-31 平安科技(深圳)有限公司 Speech recognition model training method, device, equipment and medium
CN114078471A (en) * 2020-08-20 2022-02-22 京东科技控股股份有限公司 Network model processing method, device, equipment and computer readable storage medium
CN115359784A (en) * 2022-10-21 2022-11-18 成都爱维译科技有限公司 Civil aviation land-air voice recognition model training method and system based on transfer learning

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
WO2015198317A1 (en) * 2014-06-23 2015-12-30 Intervyo R&D Ltd. Method and system for analysing subjects
CN109740155A (en) * 2018-12-27 2019-05-10 广州云趣信息科技有限公司 A kind of customer service system artificial intelligence quality inspection rule self concludes the method and system of model
EP4154248A2 (en) * 2020-10-02 2023-03-29 Google LLC Systems and methods for training dual-mode machine-learned speech recognition models


Non-Patent Citations (1)

Title
刘宇宸 et al. End-to-end speech translation with cross-modal information fusion. 2022, pp. 1-13. *

Also Published As

Publication number Publication date
CN115662401A (en) 2023-01-31


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant