CN115662401B

CN115662401B - Customer service call voice recognition method based on continuous learning

Info

Publication number: CN115662401B
Application number: CN202211604120.3A
Authority: CN
Inventors: 何学东; 孙晓倩; 常利建; 杨华; 潘瑞平; 彭渤; 杜维明; 张伟蓉; 王迪; 陈晓龙; 孙丽蓉
Original assignee: State Grid Co ltd Customer Service Center
Current assignee: State Grid Co ltd Customer Service Center
Priority date: 2022-12-14
Filing date: 2022-12-14
Publication date: 2023-03-10
Anticipated expiration: 2042-12-14
Also published as: CN115662401A

Abstract

A customer service call voice recognition method based on continuous learning. First, an initial voice recognition model is trained. Second, the number of scenes is set to S; for s = 1 and for s ≥ 2 respectively, voice data and text annotation data of the s-th 95598 customer service call service scene are obtained, and a continuous learning method is used to adjust the parameters of the initial voice recognition model (for s = 1) or of the (s-1)-th 95598 customer service call voice recognition continuous learning model (for s ≥ 2), yielding the s-th 95598 customer service call voice recognition continuous learning model. Finally, the 95598 customer service call voice to be recognized is input to the S-th 95598 customer service call voice recognition continuous learning model to obtain its Chinese text. The invention continuously adapts to changes in the call service scene and keeps improving the adaptability of the voice recognition model. The model can learn new call service scenes continuously, so that its recognition performance in every scene is maintained or improved, and the catastrophic forgetting that arises while adapting to a new call service scene is overcome.

Description

Customer service call voice recognition method based on continuous learning

Technical Field

The invention belongs to the technical field of voice recognition, and particularly relates to a customer service call voice recognition method based on continuous learning.

Background

In the power supply service of a power enterprise, customer service is an important operational activity. It bears on the vital interests of power customers and on the operating performance of the enterprise, and it also shapes the social responsibility and corporate image of the power industry as a pillar industry of the national economy. As power system reform and the country's comprehensive deepening of reform advance, delivering a good customer experience and high customer satisfaction poses a growing challenge to the management of power enterprises.

The 95598 national power supply service hotline is an important customer service channel of the State Grid Corporation. Because it is convenient and quick, it has become the preferred way for power customers to raise requests, and standardized 95598 work order records are a prerequisite for the subsequent analysis and resolution of those requests.

However, power customers differ in how well they express themselves and in how accurately they report information, so the completeness, accuracy, and objectivity of manually filled 95598 work order records cannot be guaranteed. To address this, the power customer service center applies artificial intelligence to recognize the speech in customers' telephone voice data, which helps standardize work order records.

Existing voice recognition models for customer service calls are deployed once training is complete and are not adjusted during use, so they cannot avoid the loss of recognition accuracy caused by changes in the call service scene. Meanwhile, the State Grid Corporation is actively promoting the construction of a new power system based mainly on new energy, in full support of the dual-carbon target. Under this transformation, user groups and electricity usage patterns are changing, and new telephone scenes and conversation contents keep emerging.

Therefore, as power services expand and new telephone scenes keep emerging, there is an urgent need for a voice recognition method that continuously adapts to changes in the call service scene. Such a method should keep improving the adaptability of the voice recognition model while preserving its generality on the original telephone scenes. That is, the model should be able to learn new call service scenes continuously, so that its recognition performance in every scene is maintained or improved, while the catastrophic forgetting that arises when adapting to a new call service scene is overcome.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a customer service call voice recognition method based on continuous learning.

The technical problem to be solved by the invention is realized by the following technical scheme:

A customer service call voice recognition method based on continuous learning, characterized in that the recognition method comprises the following steps:

S1, training an initial voice recognition model using a public Chinese voice recognition data set;

S2, setting the number of scenes to S; for s = 1, in order to adapt to the s-th 95598 customer service call service scene, obtaining voice data and text annotation data of the s-th 95598 customer service call service scene, and adjusting the parameters of the initial voice recognition model using a continuous learning method to obtain the s-th 95598 customer service call voice recognition continuous learning model;

S3, for s ≥ 2, in order to adapt to the s-th 95598 customer service call service scene, obtaining voice data and text annotation data of the s-th 95598 customer service call service scene, and adjusting the parameters of the (s-1)-th 95598 customer service call voice recognition continuous learning model using the continuous learning method to obtain the s-th 95598 customer service call voice recognition continuous learning model;

S4, inputting the 95598 customer service call voice to be recognized into the S-th 95598 customer service call voice recognition continuous learning model to obtain the Chinese text of the 95598 customer service call voice to be recognized.
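The control flow of steps S1 to S4 can be sketched as a simple fold over the scenes. This is only an illustrative outline: the function name `continual_adaptation` and the `adapt` callback are hypothetical stand-ins for the training procedure, which the text specifies separately.

```python
from typing import Callable, List, Sequence, TypeVar

Model = TypeVar("Model")
Scene = TypeVar("Scene")


def continual_adaptation(
    initial_model: Model,
    scenes: Sequence[Scene],
    adapt: Callable[[Model, Scene], Model],
) -> List[Model]:
    """Return [model_0, model_1, ..., model_S].

    model_0 is the initial model (step S1). model_s is obtained by adapting
    model_{s-1} to scene s (step S2 for s = 1, step S3 for s >= 2). The last
    entry is the model used for recognition in step S4.
    """
    models = [initial_model]
    for scene in scenes:
        # `adapt` is a hypothetical callback standing in for one round of
        # continual-learning parameter adjustment on a single scene.
        models.append(adapt(models[-1], scene))
    return models


# Toy usage: models are strings, adaptation just records the scene name.
history = continual_adaptation("init", ["scene1", "scene2"], lambda m, s: m + "+" + s)
final_model = history[-1]
```

Each model depends only on its predecessor and on the data of the current scene, which is why data from earlier scenes does not need to be replayed in this outline.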

Moreover, the text of the public Chinese voice recognition data set in step S1 covers more than 4000 commonly used Chinese characters.

Moreover, the voice data and text annotation data are obtained by manual annotation, and a data augmentation technique is used to enlarge the training data before the parameters of the initial voice recognition model and of the voice recognition continuous learning model are adjusted.

Moreover, the public Chinese voice recognition data set and each data set formed from the voice data and text annotation data are divided into a training set, a validation set, and a test set: the training set is used to train the model, the validation set to tune its hyperparameters, and the test set to evaluate its voice recognition performance.

Moreover, the initial voice recognition model and the 95598 customer service call voice recognition continuous learning models use an end-to-end LAS deep learning structure comprising three parts: a deep learning encoder, an attention mechanism, and a deep learning decoder;

the deep learning encoders use the same network structure; the deep learning decoders use the same network structure except for the last layer;

the last layer of the deep learning decoder of the initial voice recognition model is a fully connected layer with $K$ output nodes, whose outputs are converted into $K$ probability values by a Softmax activation function, where $K$ is the total number of characters in the character set that the initial voice recognition model can output. That character set comprises a space character, a start/stop character, an unknown-Chinese-character identifier, and all the distinct Chinese characters contained in the public Chinese voice recognition training set;

the last layer of the deep learning decoder of the s-th 95598 customer service call voice recognition continuous learning model is a fully connected layer with $K_s$ output nodes, whose outputs are converted into $K_s$ probability values by a Softmax activation function. $K_s$ is the total number of characters in the character set that the s-th 95598 customer service call voice recognition continuous learning model can output. That character set comprises a space character, a start/stop character, an unknown-Chinese-character identifier, and all the distinct Chinese characters contained in the public Chinese voice recognition training set and in the training sets of the first $s$ 95598 call service scenes. $K_s$ is related to $K$, the total number of characters in the character set that the initial voice recognition model can output, as follows:

$$K_s = K + \sum_{p=1}^{s} \Delta K_p \tag{1}$$

where:

$\Delta K_1$ is the total number of Chinese characters in the training set of the 1st 95598 customer service call service scene that do not belong to the public Chinese voice recognition training set;

for $2 \le p \le s$, $\Delta K_p$ is the total number of Chinese characters in the training set of the $p$-th 95598 customer service call service scene that belong neither to the public Chinese voice recognition training set nor to the training sets of the previous $p-1$ 95598 call service scenes;

for $s \ge 1$, the first $K$ probability values output by the s-th 95598 customer service call voice recognition continuous learning model correspond to the same characters as the $K$ probability values output by the initial voice recognition model;

for $s \ge 1$, the $(K+1)$-th to $(K+\Delta K_1)$-th probability values output by the s-th 95598 customer service call voice recognition continuous learning model correspond to all the distinct Chinese characters in the training set of the 1st 95598 customer service call service scene that do not belong to the public Chinese voice recognition training set, sorted in ascending order of their Chinese character ASCII codes;

for $s \ge 2$ and $2 \le p \le s$, the $(K_{p-1}+1)$-th to $K_p$-th probability values output by the s-th 95598 customer service call voice recognition continuous learning model correspond to all the distinct Chinese characters in the training set of the $p$-th 95598 customer service call service scene that belong neither to the public Chinese voice recognition training set nor to the training sets of the previous $p-1$ 95598 call service scenes, sorted in ascending order of their Chinese character ASCII codes.
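The incremental growth of the output character set can be sketched as follows. This is a minimal illustration under stated assumptions: the special token names are hypothetical, Unicode code points (`ord`) stand in for the character codes used for sorting, and sorting the initial character set is a choice made here only for determinism.

```python
from typing import Iterable, List

# Hypothetical names for the space, start/stop and unknown-character symbols.
SPECIAL_TOKENS = ["<space>", "<sos/eos>", "<unk>"]


def initial_vocab(public_texts: Iterable[str]) -> List[str]:
    """Character set of the initial model: special tokens plus every
    distinct character seen in the public training texts."""
    chars = {c for text in public_texts for c in text if not c.isspace()}
    return SPECIAL_TOKENS + sorted(chars, key=ord)


def extend_vocab(vocab: List[str], scene_texts: Iterable[str]) -> List[str]:
    """Append the characters of a new scene that are not yet in `vocab`.

    Existing entries keep their indices, so earlier output nodes keep their
    meaning; new characters are appended in ascending code-point order.
    """
    known = set(vocab)
    new_chars = {
        c for text in scene_texts for c in text
        if not c.isspace() and c not in known
    }
    return vocab + sorted(new_chars, key=ord)


vocab0 = initial_vocab(["你好", "好的"])        # size plays the role of K
vocab1 = extend_vocab(vocab0, ["电费你好"])     # scene 1 adds two characters
vocab2 = extend_vocab(vocab1, ["电表"])         # scene 2 adds one character
```

Because indices are only ever appended, the last decoder layer can be widened for each new scene while the weights of the existing output nodes are carried over.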

Moreover, the loss function $L_0$ used by the initial voice recognition model is a cross-entropy loss function based on label smoothing, defined as follows:

$$L_0 = -\frac{1}{|X_0|}\sum_{x \in X_0}\sum_{l=1}^{T_x}\sum_{i=1}^{K} y_{x,l,i}\,\ln p_{x,l,i} \tag{2}$$

where:

$X_0$ is the set formed by all voice input signals in the public Chinese voice recognition training set;

$|X_0|$ is the number of elements in the set $X_0$;

$x$ is a voice input signal in $X_0$;

$T_x$ is the length of the text label corresponding to $x$;

$K$ is the total number of characters in the character set that the initial voice recognition model can output;

$y_{x,l,i}$ is the quantized soft label of the $l$-th character of the text label corresponding to $x$: it takes one value if that $l$-th character is the $i$-th character in the character set that the initial voice recognition model can output, and another value otherwise;

$\varepsilon$ is the smoothing value, a constant;

$p_{x,l,i}$ is the probability, output after $x$ is input into the initial voice recognition model, that the $l$-th output character is the $i$-th character in the character set that the initial voice recognition model can output;

$\ln$ is the natural logarithm function.
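A label-smoothed cross-entropy of this kind can be sketched in plain Python. The exact soft-label values are not recoverable from this text, so the common scheme (1 - eps for the labelled character, eps / (K - 1) for every other character) is assumed here, as is averaging over utterances; the function names are illustrative.

```python
import math
from typing import List, Sequence


def soft_label(target: int, num_classes: int, eps: float) -> List[float]:
    """Assumed smoothing scheme: 1 - eps on the labelled character and
    eps / (num_classes - 1) on each of the others (sums to 1)."""
    off_value = eps / (num_classes - 1)
    return [1.0 - eps if i == target else off_value for i in range(num_classes)]


def label_smoothed_ce(
    probs: Sequence[Sequence[Sequence[float]]],
    targets: Sequence[Sequence[int]],
    eps: float,
) -> float:
    """probs[n][l][i] is the predicted probability that position l of
    utterance n is character i; targets[n][l] is the labelled character index.
    Returns the loss summed over positions and averaged over utterances."""
    total = 0.0
    for utt_probs, utt_targets in zip(probs, targets):
        for step_probs, target in zip(utt_probs, utt_targets):
            y = soft_label(target, len(step_probs), eps)
            total -= sum(y_i * math.log(p_i) for y_i, p_i in zip(y, step_probs))
    return total / len(probs)


# One utterance, one label position, three output characters.
example_probs = [[[0.5, 0.25, 0.25]]]
example_targets = [[0]]
hard_loss = label_smoothed_ce(example_probs, example_targets, eps=0.0)
smooth_loss = label_smoothed_ce(example_probs, example_targets, eps=0.1)
```

With `eps = 0` the function reduces to ordinary cross-entropy; a positive `eps` also penalizes low probabilities on the non-labelled characters, which discourages over-confident outputs.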

Moreover, the loss function $L_s$ used by the s-th 95598 customer service call voice recognition continuous learning model comprises three terms, namely a character-weighted classification loss term $W_s$, a distillation loss term $D_s$, and a continuous learning regularization term $R_s$, and is defined by the formula:

$$L_s = W_s + \lambda_d D_s + \lambda_r R_s \tag{3}$$

where:

$\lambda_d$ and $\lambda_r$ are the weight parameters of the corresponding loss terms;

the character-weighted classification loss term $W_s$ improves the recognition accuracy for the keywords of the s-th 95598 customer service call service scene by increasing the weights of the Chinese characters corresponding to those keywords;

the distillation loss term $D_s$ is constructed for each voice input signal in the training set of the s-th 95598 customer service call service scene from the s-th 95598 customer service call voice recognition continuous learning model as it stands before training begins, so as to overcome the influence on model training of the uneven distribution of Chinese characters across the 95598 customer service call service scenes;

the continuous learning regularization term $R_s$ improves the model's resistance to catastrophic forgetting by introducing a continuous learning strategy;

the character-weighted classification loss term $W_s$ uses a weighted cross-entropy loss function based on label smoothing, defined as follows:

$$W_s = -\frac{1}{|X_s|}\sum_{x \in X_s}\sum_{l=1}^{T_x}\sum_{i=1}^{K_s} w^{s}_{i}\, y^{s}_{x,l,i}\,\ln p^{s}_{x,l,i} \tag{4}$$

where:

$X_s$ is the set formed by all voice input signals in the training set of the s-th 95598 call service scene;

$|X_s|$ is the number of elements in the set $X_s$;

$x$ is a voice input signal in $X_s$;

$T_x$ is the length of the text label corresponding to $x$;

$K_s$ is the total number of characters in the character set that the s-th 95598 customer service call voice recognition continuous learning model can output;

$w^{s}_{i}$ is the weight, in the s-th 95598 customer service call service scene, of the $i$-th character in the character set that the s-th 95598 customer service call voice recognition continuous learning model can output. In the s-th 95598 customer service call service scene, the Chinese characters corresponding to the keywords of the scene can be given larger weights, so that the trained s-th 95598 customer service call voice recognition continuous learning model recognizes the keywords of the s-th 95598 customer service call service scene accurately;

$y^{s}_{x,l,i}$ is the quantized soft label of the $l$-th character of the text label corresponding to $x$: it takes one value if that $l$-th character is the $i$-th character in the character set that the s-th 95598 customer service call voice recognition continuous learning model can output, and another value otherwise;

$\varepsilon$ is the smoothing value, a constant;

$p^{s}_{x,l,i}$ is the probability, output after $x$ is input into the s-th 95598 customer service call voice recognition continuous learning model, that the $l$-th output character is the $i$-th character in the character set that the model can output;

$\ln$ is the natural logarithm function;
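The effect of per-character weights on a label-smoothed cross-entropy can be illustrated with a short sketch. As assumptions, the soft labels again use 1 - eps for the labelled character and eps / (K - 1) for the others, the loss is averaged over utterances, and the weight vector and function name are purely illustrative.

```python
import math
from typing import Sequence


def weighted_smoothed_ce(
    probs: Sequence[Sequence[Sequence[float]]],
    targets: Sequence[Sequence[int]],
    weights: Sequence[float],
    eps: float,
) -> float:
    """probs[n][l][i]: predicted probability of character i at position l of
    utterance n; targets[n][l]: labelled character index; weights[i]: weight
    of character i in the current scene (larger for keyword characters)."""
    num_classes = len(weights)
    off_value = eps / (num_classes - 1)  # assumed smoothing scheme
    total = 0.0
    for utt_probs, utt_targets in zip(probs, targets):
        for step_probs, target in zip(utt_probs, utt_targets):
            for i, p in enumerate(step_probs):
                y = 1.0 - eps if i == target else off_value
                total -= weights[i] * y * math.log(p)
    return total / len(probs)


probs = [[[0.5, 0.25, 0.25]]]
targets = [[0]]
# Uniform weights versus doubling the weight of character 0 (a "keyword" character).
uniform = weighted_smoothed_ce(probs, targets, [1.0, 1.0, 1.0], eps=0.1)
boosted = weighted_smoothed_ce(probs, targets, [2.0, 1.0, 1.0], eps=0.1)
```

Raising the weight of a keyword character increases the loss contributed by that character, so gradient updates pay more attention to recognizing it correctly.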

the distillation loss term $D_s$ is defined as follows, with sums over the voice input signals $x$ of the s-th scene training set $X_s$, the label positions $l = 1, \dots, T_x$, and the output characters $i = 1, \dots, K_s$:

$$D_s = -\frac{1}{|X_s|}\sum_{x \in X_s}\sum_{l=1}^{T_x}\sum_{i=1}^{K_s} \hat{p}^{s}_{x,l,i}\,\ln p^{s}_{x,l,i} \tag{5}$$

where:

$\hat{p}^{s}_{x,l,i}$ is the probability, output by the s-th 95598 customer service call voice recognition continuous learning model at the beginning of training with $x$ as input, that the $l$-th output character is the $i$-th character in the character set that the model can output; the remaining mathematical symbols have the same meanings as the corresponding symbols in the character-weighted classification loss term $W_s$;
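A distillation term of this kind compares the current model's outputs with the outputs recorded from the same model before training on the new scene. The sketch below assumes the usual cross-entropy form between the two distributions and averaging over utterances; the function name and the toy numbers are illustrative.

```python
import math
from typing import Sequence

Probs = Sequence[Sequence[Sequence[float]]]  # [utterance][position][character]


def distillation_loss(start_probs: Probs, current_probs: Probs) -> float:
    """Cross-entropy between the outputs recorded at the start of training
    (`start_probs`, kept fixed) and the outputs of the model being trained."""
    total = 0.0
    for start_utt, cur_utt in zip(start_probs, current_probs):
        for start_step, cur_step in zip(start_utt, cur_utt):
            total -= sum(
                q * math.log(p)
                for q, p in zip(start_step, cur_step)
                if q > 0.0  # terms with zero reference probability contribute nothing
            )
    return total / len(start_probs)


recorded = [[[0.5, 0.5]]]
unchanged = distillation_loss(recorded, [[[0.5, 0.5]]])
drifted = distillation_loss(recorded, [[[0.9, 0.1]]])
```

The loss is smallest when the current outputs match the recorded ones, so the term pulls the model back toward its behaviour at the start of training.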

the continuous learning method is the elastic weight consolidation method, which is based on a parameter regularization strategy, and the corresponding continuous learning regularization term $R_s$ is defined as follows:

$$R_s = \sum_{m=1}^{M_s} \Omega^{s}_{m}\left(\theta^{s}_{m} - \hat{\theta}^{s}_{m}\right)^{2} \tag{6}$$

where:

$M_s$ is the number of parameters of the s-th 95598 customer service call voice recognition continuous learning model;

$\theta^{s}_{m}$ is the $m$-th parameter of the s-th 95598 customer service call voice recognition continuous learning model;

$\hat{\theta}^{s}_{m}$ is the $m$-th parameter of the s-th 95598 customer service call voice recognition continuous learning model at the beginning of training;

$\Omega^{s}_{m}$ is the continuous learning weight of the $m$-th parameter. It describes how important the $m$-th parameter of the s-th 95598 customer service call voice recognition continuous learning model at the beginning of training is to all customer service call service scenes, and it is computed iteratively as follows:

(1) for $s = 1$, that is, for the 1st 95598 customer service call service scene, $\Omega^{1}_{m}$ is the second partial derivative, with respect to the $m$-th parameter, of the label-smoothing-based weighted cross-entropy loss of the initial voice recognition model on the public Chinese voice recognition validation set;

(2) for $s \ge 2$, that is, for the s-th 95598 customer service call service scene, $\Omega^{s}_{m}$ is the continuous learning weight $\Omega^{s-1}_{m}$ of the $m$-th parameter for the $(s-1)$-th customer service call service scene plus the second partial derivative, with respect to the $m$-th parameter, of the label-smoothing-based weighted cross-entropy loss of the voice recognition continuous learning model trained on the $(s-1)$-th customer service call service scene, evaluated on the validation set of the $(s-1)$-th customer service call service scene.
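The regularizer and the iterative importance weights can be illustrated on a toy problem. This sketch assumes a plain quadratic penalty without a leading constant and estimates the per-parameter second derivatives by central finite differences, which is workable only for tiny parameter vectors; all function names are illustrative.

```python
from typing import Callable, List, Sequence


def ewc_penalty(
    theta: Sequence[float], theta_start: Sequence[float], omega: Sequence[float]
) -> float:
    """Importance-weighted squared distance from the parameters at the
    start of training on the new scene."""
    return sum(w * (t - t0) ** 2 for w, t, t0 in zip(omega, theta, theta_start))


def diag_second_derivatives(
    loss_fn: Callable[[Sequence[float]], float], theta: Sequence[float], h: float = 1e-3
) -> List[float]:
    """Second partial derivative of loss_fn with respect to each parameter,
    estimated with a central finite difference (toy-scale only)."""
    base = loss_fn(theta)
    result = []
    for m in range(len(theta)):
        plus, minus = list(theta), list(theta)
        plus[m] += h
        minus[m] -= h
        result.append((loss_fn(plus) - 2.0 * base + loss_fn(minus)) / (h * h))
    return result


def update_importance(prev_omega: Sequence[float], second_derivs: Sequence[float]) -> List[float]:
    """Importance for scene s = importance for scene s-1 plus the new second derivatives."""
    return [a + b for a, b in zip(prev_omega, second_derivs)]


# Toy validation loss: parameter 0 matters three times as much as parameter 1.
toy_loss = lambda t: 3.0 * t[0] ** 2 + t[1] ** 2
omega_1 = diag_second_derivatives(toy_loss, [0.5, -1.0])
omega_2 = update_importance(omega_1, [1.0, 0.5])
penalty = ewc_penalty([1.0, 2.0], [0.0, 2.0], [3.0, 5.0])
```

Parameters with large accumulated importance are expensive to move, while parameters that earlier scenes did not rely on remain free to adapt to the new scene.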

The invention has the advantages and beneficial effects that:

Compared with the prior art, the invention provides a customer service call voice recognition method based on continuous learning. Because 95598 telephone voice data involves customer privacy, little voice data is available; the method needs only a small amount of 95598 telephone voice data to build a 95598 telephone voice recognition model, and it lets the model keep adapting to new telephone scenes and conversation contents. Compared with voice recognition methods based on transfer learning, the method ensures that the model does not forget the original scenes after being trained on new scene data.

Drawings

FIG. 1 is a flow chart of the present invention;

FIG. 2 is a step diagram of an embodiment of the present invention;

FIG. 3 is a schematic diagram of the continuous learning method of the present invention.

Detailed Description

The present invention is further illustrated by the following specific examples, which are illustrative rather than limiting and do not restrict the scope of the invention.

A customer service call voice recognition method based on continuous learning comprises the following steps:

S1, collecting voice data from the open voice data set AISHELL-1 to form an initial voice data set. The data set comprises audio data and the corresponding text labels and is divided into an AISHELL-1 training set, an AISHELL-1 validation set, and an AISHELL-1 test set.

S2, training an initial voice recognition model on the AISHELL-1 training set, using the SpecAugment voice data augmentation method to enlarge the training set during training, and tuning the hyperparameters of the model on the AISHELL-1 validation set. The initial voice recognition model uses a Transformer network structure in which the encoder uses 12 Transformer encoder modules and the decoder uses 6 Transformer decoder modules. The loss function $L_0$ for training the initial voice recognition model is a cross-entropy loss function based on label smoothing, with the formula:

$$L_0 = -\frac{1}{|X_0|}\sum_{x \in X_0}\sum_{l=1}^{T_x}\sum_{i=1}^{K} y_{x,l,i}\,\ln p_{x,l,i} \tag{2}$$

where:

$X_0$ is the set formed by all voice input signals in the training set, and $|X_0|$ is the number of elements in the set $X_0$;

$x$ is a voice input signal in $X_0$;

$T_x$ is the length of the text label corresponding to $x$;

$K$ is the total number of characters in the character set that the initial voice recognition model can output;

$y_{x,l,i}$ is the quantized soft label of the $l$-th character of the text label corresponding to $x$: it takes one value if that $l$-th character is the $i$-th character in the character set that the initial voice recognition model can output, and another value otherwise;

$\varepsilon$ is the smoothing value, a constant;

$p_{x,l,i}$ is the probability, output after $x$ is input into the initial voice recognition model, that the $l$-th output character is the $i$-th character in the character set that the initial voice recognition model can output;

$\ln$ is the natural logarithm function.

In this embodiment, the smoothing value $\varepsilon$ is set to a fixed constant. The initial voice recognition model is trained with the Adam optimizer at a learning rate of 0.0001 for 100 iteration rounds.
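The masking part of SpecAugment can be sketched on a spectrogram stored as a list of frames. This is a simplified illustration, not the configuration of the embodiment: the mask widths are hypothetical parameters, only one frequency mask and one time mask are applied, and the time-warping step of SpecAugment is omitted.

```python
import random
from typing import List


def spec_augment(
    spec: List[List[float]],
    max_freq_mask: int,
    max_time_mask: int,
    rng: random.Random,
) -> List[List[float]]:
    """Return a masked copy of `spec`, where spec[t][f] is the value of
    frequency bin f in frame t. One band of frequency bins and one run of
    frames are set to zero; widths and positions are drawn at random."""
    out = [row[:] for row in spec]
    n_time, n_freq = len(out), len(out[0])

    f_width = rng.randint(0, min(max_freq_mask, n_freq))
    f_start = rng.randint(0, n_freq - f_width)
    t_width = rng.randint(0, min(max_time_mask, n_time))
    t_start = rng.randint(0, n_time - t_width)

    for t in range(n_time):                      # frequency mask
        for f in range(f_start, f_start + f_width):
            out[t][f] = 0.0
    for t in range(t_start, t_start + t_width):  # time mask
        for f in range(n_freq):
            out[t][f] = 0.0
    return out


# Each call on the same utterance yields a differently masked training example.
clean = [[1.0] * 8 for _ in range(10)]
augmented = spec_augment(clean, max_freq_mask=3, max_time_mask=4, rng=random.Random(0))
```

Because the masks are redrawn every time an utterance is visited, the model rarely sees exactly the same input twice, which is how the augmentation effectively enlarges the training set.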

S3, collecting 95598 customer call telephone voice data and annotating it with text; using the 95598 voice audio data and the corresponding annotation texts to build 3 scene data sets, namely fault repair voice (scene 1), electricity bill voice (scene 2), and meter reading and charging voice (scene 3); and dividing each scene data set into a training set, a validation set, and a test set.

S4, inputting the scene 1 training set into the trained initial voice recognition model, adjusting the parameters of the model by the continuous learning method, and tuning the hyperparameters of the model on the scene 1 validation set to obtain a 95598 customer service call voice recognition model.

S5, inputting the scene 2 training set and then the scene 3 training set into the trained 95598 customer service call voice recognition model, continuing to adjust the parameters of the model by the continuous learning method, and tuning the hyperparameters of the model on the corresponding validation sets, thereby updating and upgrading the 95598 customer service call voice recognition model.

S6, whenever new scene data appears, inputting the new scene data set into the existing 95598 customer service call voice recognition model and continuing to adjust the parameters of the model by the continuous learning method, so that the model is updated continuously to meet the growing demands of power customer service.

The continuous learning principle of this embodiment for 95598 customer service calls is shown in FIG. 3. Some parameters in the model are important for the old tasks, and changing them causes catastrophic forgetting on those tasks; therefore, when a new task is learned, only the parameters that are unimportant for the old tasks may be changed.

For this reason, this embodiment uses the EWC (Elastic Weight Consolidation) continuous learning strategy. For the s-th 95598 call service scene, the loss function is:

$$L_s = W_s + \lambda_d D_s + \lambda_r R_s \tag{3}$$

where $W_s$ is the character-weighted classification loss term, $D_s$ is the distillation loss term, $R_s$ is the continuous learning regularization term, and $\lambda_d$ and $\lambda_r$ are the weight parameters of the distillation loss term and the continuous learning regularization term, respectively.
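As a small worked illustration of formula (3), the three terms can be combined as below. The default weights are the values this embodiment reports (0 for the distillation term and 10000 for the regularization term); the function name and the numeric inputs are made up for the example.

```python
def total_loss(
    w_s: float,
    d_s: float,
    r_s: float,
    lambda_d: float = 0.0,
    lambda_r: float = 10000.0,
) -> float:
    """Classification loss plus weighted distillation and regularization terms."""
    return w_s + lambda_d * d_s + lambda_r * r_s


# With lambda_d = 0 the distillation term drops out entirely, and even a tiny
# parameter drift (r_s = 1e-4) adds about 1.0 to the loss because lambda_r is large.
loss_embodiment = total_loss(w_s=2.0, d_s=5.0, r_s=1e-4)
loss_with_distillation = total_loss(w_s=2.0, d_s=5.0, r_s=0.0, lambda_d=0.5)
```

The large regularization weight reflects that the raw penalty, a sum of small squared parameter changes, is numerically tiny compared with the classification loss.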

The character-weighted classification loss term of this embodiment is given by:

$$W_s = -\frac{1}{|X_s|}\sum_{x \in X_s}\sum_{l=1}^{T_x}\sum_{i=1}^{K_s} w^{s}_{i}\, y^{s}_{x,l,i}\,\ln p^{s}_{x,l,i} \tag{4}$$

where:

$X_s$ is the set formed by all voice input signals in the training set of the s-th 95598 call service scene, and $|X_s|$ is the number of elements in the set $X_s$;

$x$ is a voice input signal in $X_s$;

$T_x$ is the length of the text label corresponding to $x$;

$K_s$ is the total number of characters in the character set that the s-th 95598 customer service call voice recognition continuous learning model can output;

$w^{s}_{i}$ is the weight, in the s-th 95598 customer service call service scene, of the $i$-th character in the character set that the s-th 95598 customer service call voice recognition continuous learning model can output. In the s-th 95598 customer service call service scene, the Chinese characters corresponding to the keywords of the scene can be given larger weights, so that the trained s-th 95598 customer service call voice recognition continuous learning model recognizes the keywords of the s-th 95598 customer service call service scene accurately;

$y^{s}_{x,l,i}$ is the quantized soft label of the $l$-th character of the text label corresponding to $x$: it takes one value if that $l$-th character is the $i$-th character in the character set that the s-th 95598 customer service call voice recognition continuous learning model can output, and another value otherwise;

$\varepsilon$ is the smoothing value, a constant;

$p^{s}_{x,l,i}$ is the probability, output after $x$ is input into the s-th 95598 customer service call voice recognition continuous learning model, that the $l$-th output character is the $i$-th character in the character set that the model can output;

$\ln$ is the natural logarithm function.

The distillation loss term of this embodiment is given by the following formula, with sums over the voice input signals $x$ of the s-th scene training set $X_s$, the label positions $l = 1, \dots, T_x$, and the output characters $i = 1, \dots, K_s$:

$$D_s = -\frac{1}{|X_s|}\sum_{x \in X_s}\sum_{l=1}^{T_x}\sum_{i=1}^{K_s} \hat{p}^{s}_{x,l,i}\,\ln p^{s}_{x,l,i} \tag{5}$$

where:

$\hat{p}^{s}_{x,l,i}$ is the probability, output by the s-th 95598 customer service call voice recognition continuous learning model at the beginning of training with $x$ as input, that the $l$-th output character is the $i$-th character in the character set that the model can output; the remaining mathematical symbols have the same meanings as the corresponding symbols in the character-weighted classification loss term $W_s$. In this embodiment, the weight $\lambda_d$ of the distillation loss term in the loss function is set to 0.

The continuous learning regularization term of this embodiment is given by:

$$R_s = \sum_{m=1}^{M_s} \Omega^{s}_{m}\left(\theta^{s}_{m} - \hat{\theta}^{s}_{m}\right)^{2} \tag{6}$$

where:

$M_s$ is the number of parameters of the s-th 95598 customer service call voice recognition continuous learning model;

$\theta^{s}_{m}$ is the $m$-th parameter of the s-th 95598 customer service call voice recognition continuous learning model, and $\hat{\theta}^{s}_{m}$ is its value at the beginning of training;

$\Omega^{s}_{m}$ is the continuous learning weight of the $m$-th parameter. It describes how important the $m$-th parameter of the s-th 95598 customer service call voice recognition continuous learning model at the beginning of training is to all customer service call service scenes, and it is computed iteratively as follows:

(1) for $s = 1$, that is, for the 1st 95598 customer service call service scene, $\Omega^{1}_{m}$ is the second partial derivative, with respect to the $m$-th parameter, of the label-smoothing-based weighted cross-entropy loss of the initial voice recognition model on the public Chinese voice recognition validation set;

(2) for $s \ge 2$, that is, for the s-th 95598 customer service call service scene, $\Omega^{s}_{m}$ is the continuous learning weight $\Omega^{s-1}_{m}$ of the $m$-th parameter for the $(s-1)$-th customer service call service scene plus the second partial derivative, with respect to the $m$-th parameter, of the label-smoothing-based weighted cross-entropy loss of the voice recognition continuous learning model trained on the $(s-1)$-th customer service call service scene, evaluated on the validation set of the $(s-1)$-th customer service call service scene.

In this embodiment, the weight $\lambda_r$ of the regularization term in the loss function is set to 10000. Each new task is learned with the Adam optimizer at a learning rate of 0.0001 for 100 iteration rounds. All trained voice recognition models are decoded with the Beam Search method to obtain the best Chinese text, with the beam width set to 5.
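Beam search decoding can be sketched independently of any particular model. In the sketch below, `step_fn` is a hypothetical callback that returns next-character probabilities for a given prefix, and the toy table is invented for illustration; the default width of 5 matches the value stated for this embodiment.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[Tuple[str, ...], float]  # (characters so far, log-probability)


def beam_search(
    step_fn: Callable[[Tuple[str, ...]], Dict[str, float]],
    beam_width: int = 5,
    max_len: int = 50,
    eos: str = "<eos>",
) -> Hypothesis:
    """Keep the `beam_width` best partial transcripts at every step and
    return the highest-scoring hypothesis found."""
    beams: List[Hypothesis] = [((), 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        candidates: List[Hypothesis] = []
        for prefix, score in beams:
            for token, prob in step_fn(prefix).items():
                if prob > 0.0:
                    candidates.append((prefix + (token,), score + math.log(prob)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            (finished if seq[-1] == eos else beams).append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off at max_len
    if not finished:
        return ((), 0.0)
    return max(finished, key=lambda c: c[1])


def toy_step(prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Invented distribution in which the greedy first choice is not the best path."""
    table = {
        (): {"a": 0.6, "b": 0.4},
        ("a",): {"x": 0.5, "y": 0.5},
        ("b",): {"z": 1.0},
    }
    return table.get(prefix, {"<eos>": 1.0})


best_wide = beam_search(toy_step, beam_width=2, max_len=5)
best_greedy = beam_search(toy_step, beam_width=1, max_len=5)
```

In the toy example a width of 1 (greedy decoding) commits to "a" and ends with total probability 0.3, while a width of 2 keeps "b" alive and finds the path with probability 0.4.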

To verify the practical effect of the invention, 178 h of AISHELL-1 open voice data and 24.3 h of 95598 customer service call voice data were collected to validate the algorithm. The 95598 customer service call voice data is divided into 3 scene data sets: 6 h of fault repair voice (scene 1), 8.7 h of electricity bill voice (scene 2), and 9.6 h of meter reading and charging voice (scene 3). Each data set is divided into a training set, a validation set, and a test set in a …:2:2 ratio.

In this embodiment, the character error rate (CER) is used as the evaluation index of the model's recognition results, expressed as:

$$\mathrm{CER} = \frac{I + D + S}{N} \times 100\% \tag{7}$$

where $I$, $D$, and $S$ are the numbers of characters that must be inserted, deleted, and substituted, respectively, when the recognized text is compared with the reference sentence, and $N$ is the total number of characters in the reference sentence.
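The character error rate can be computed from the Levenshtein edit distance between the reference text and the recognized text. The sketch below is a minimal illustration; the function names and the example strings are chosen here and do not come from the evaluation data.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions
    that turn `hyp` into `ref` (Levenshtein distance, one-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # character missing from the hypothesis
                cur[j - 1] + 1,          # extra character in the hypothesis
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = cur
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate in percent, relative to the reference length."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


# One substituted character out of four gives a CER of 25 percent.
example_cer = cer("电费账单", "电费帐单")
```

The rate is normalized by the reference length, so it can exceed 100 percent when the recognized text contains many extra characters.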

1. Comparison of character error rates of voice recognition models with and without continuous learning

Table 1 shows the character error rates of the voice recognition model of the invention and of a voice recognition model without continuous learning on the voice data of each scene. The initial voice recognition model, trained only on the public voice data set, recognizes every 95598 telephone service scene very poorly, whereas recognition improves markedly after training on the 95598 customer service call voice data sets. Thus, with a large public voice data set and a small 95598 customer service call voice data set, the proposed method can train a voice recognition model for 95598 customer service calls.

TABLE 1 Character error rates of voice recognition models with and without continuous learning

Furthermore, the table shows that after training on the scene 2 data, the character error rate of the model without continuous learning on scene 1 rises from 27.268 to 28.196, while that of the model based on continuous learning on scene 1 falls from 27.045 to 25.361. After training on the scene 3 data, the character error rates of the model without continuous learning on scenes 1 and 2 do not rise, but neither do they fall appreciably, whereas those of the model based on continuous learning on scenes 1 and 2 fall. Compared with the model without continuous learning, the model based on continuous learning therefore remains well adapted to the original scenes after passing through each new scene.

2. Comparison of character error rates of voice recognition models based on continuous learning and on transfer learning

Table 2 shows the character error rates of the voice recognition model of the invention and of a voice recognition model based on transfer learning on the voice data of each scene. Compared with the model that applies transfer learning directly to each scene data set, the model of the invention has a much lower character error rate in every scene after training on the 3 scene data sets. The model based on continuous learning can therefore absorb new knowledge while retaining old knowledge, so it recognizes speech well in all scenes.

TABLE 2 Character error rates of voice recognition models based on continuous learning and on transfer learning

In summary, starting from an initial model trained on a public voice data set, the invention can build a 95598 customer service call voice recognition model using only a small amount of 95598 customer service call voice data, and it lets the model keep adapting to new telephone scenes and conversation contents. Compared with existing voice recognition methods based on transfer learning, the method ensures that the model does not suffer catastrophic forgetting of the original scenes after being trained on new scene data.

In this embodiment, the 95598 customer call voice across the 3 service scenes totals only 24.3 h; with longer public voice data and longer 95598 customer call voice data, the invention achieves more accurate voice recognition results.

Although the embodiments of the present invention and the accompanying drawings are disclosed for illustrative purposes, those skilled in the art will appreciate that various substitutions, changes, and modifications are possible without departing from the spirit and scope of the invention and the appended claims; the scope of the invention is therefore not limited to what is disclosed in the embodiments and the drawings.

Claims

1. A customer service call voice recognition method based on continuous learning, characterized in that the recognition method comprises the following steps:

S2, denoting the scene number as s: for s = 1, in order to adapt to the s-th 95598 customer service call service scene, acquiring voice data and text marking data of the s-th 95598 customer service call service scene, and adjusting the parameters of the initial voice recognition model by a continuous learning method to obtain the s-th 95598 customer service call voice recognition continuous learning model;

S3, for s ≥ 2, in order to adapt to the s-th 95598 customer service call service scene, acquiring voice data and text marking data of the s-th 95598 customer service call service scene, and adjusting the parameters of the (s-1)-th 95598 customer service call voice recognition continuous learning model by a continuous learning method to obtain the s-th 95598 customer service call voice recognition continuous learning model;

S4, inputting the 95598 customer service call voice to be recognized into the s-th 95598 customer service call voice recognition continuous learning model to obtain the Chinese text of the 95598 customer service call voice to be recognized;

for the continuous learning of 95598 customer service calls, certain parameters in the initial speech recognition model are important for old tasks, and changing these parameters causes the model to catastrophically forget the old tasks, so that only the parameters that are less important for the old tasks may be changed when a new task is learned; for this purpose, the EWC continuous learning strategy is used, and for the s-th 95598 call service scene the loss function is:

$L_s = W_s + \lambda_d D_s + \lambda_r R_s \qquad (3)$

wherein W_s is the character-weighted classification loss term, D_s is the distillation loss term, R_s is the continuous learning regularization term, and λ_d and λ_r are the weight parameters of the distillation loss term and the continuous learning regularization term, respectively;

the formula of the character-weighted classification loss term is as follows:

wherein: D_s is the set formed by all voice input signals in the training set of the s-th 95598 call service scene;

|D_s| is the number of elements in the set D_s;

x is a voice input signal in D_s;

L(x) is the length of the text label corresponding to x;

K_s is the total number of characters in the character set that the s-th 95598 customer service call voice recognition continuous learning model can output;

the character weight is the weight, in the s-th 95598 customer service call service scene, of each character in the character set that the s-th 95598 customer service call voice recognition continuous learning model can output; for the s-th 95598 customer service call service scene, the Chinese characters corresponding to keywords can be given larger weights, so that the trained s-th 95598 customer service call voice recognition continuous learning model can accurately recognize the keywords of the s-th 95598 customer service call service scene;

the quantized soft label is defined for the l-th character of the text label corresponding to x: if the l-th character of the text label is the i-th character in the character set that the s-th 95598 customer service call voice recognition continuous learning model can output, the soft label is 1-ε, otherwise it is ε/(K-1);

ε is the smoothing value and is a constant;

the output probability is the probability that, after x is input into the s-th 95598 customer service call voice recognition continuous learning model, the l-th output character is the i-th character in the character set that the s-th 95598 customer service call voice recognition continuous learning model can output;

log is the natural logarithm function;
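The definitions above can be read as a weighted, label-smoothed cross entropy, and the following Python sketch illustrates that reading. The formula itself is not reproduced in this text, so its exact form is an assumption: the sketch averages over the label length L(x) and over the |D_s| utterances, and smooths over the K_s - 1 non-target characters. The names `soft_labels` and `weighted_classification_loss` and the input layout are hypothetical.

```python
import math

def soft_labels(target_index, num_chars, eps):
    """Label-smoothed target: 1 - eps on the labelled character, eps / (num_chars - 1) elsewhere."""
    off = eps / (num_chars - 1)
    return [1.0 - eps if i == target_index else off for i in range(num_chars)]

def weighted_classification_loss(batch, char_weights, eps=0.1):
    """batch: list of (label_indices, probs) pairs, one pair per utterance x.
    label_indices[l] is the index of the l-th labelled character.
    probs[l][i] is the model's probability that the l-th output character is character i
    (assumed strictly positive, as produced by a Softmax layer).
    char_weights[i] is the weight of character i (larger for keyword characters)."""
    num_chars = len(char_weights)
    total = 0.0
    for label_indices, probs in batch:
        utterance_loss = 0.0
        for l, target in enumerate(label_indices):
            q = soft_labels(target, num_chars, eps)
            utterance_loss -= sum(
                char_weights[i] * q[i] * math.log(probs[l][i]) for i in range(num_chars)
            )
        total += utterance_loss / len(label_indices)  # assumed normalisation by L(x)
    return total / len(batch)  # average over the |D_s| utterances
```

Raising the weight of a keyword character increases the penalty for misrecognizing it, which is the effect the claim attributes to the character weights.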

the distillation loss term is formulated as follows:

wherein: the probability value produced at the start of training is the probability that, when training of the s-th 95598 customer service call voice recognition continuous learning model starts and x is used as input, the l-th output character is the i-th character in the character set that the s-th 95598 customer service call voice recognition continuous learning model can output; the remaining mathematical symbols have the same meanings as in the character-weighted classification loss term W_s; the weight λ_d of the distillation loss term in the loss function is taken as 0;
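A distillation term of this kind is commonly computed as the cross entropy between the probabilities produced at the start of training and the probabilities produced by the model being trained. The sketch below assumes that form and averages per character and per utterance; the source formula is not reproduced in this text, and `distillation_loss` and its arguments are hypothetical names.

```python
import math

def distillation_loss(batch):
    """batch: list of (start_probs, current_probs) pairs, one pair per utterance x.
    start_probs[l][i]: probability from the model as it was when training began.
    current_probs[l][i]: probability from the model being trained.
    Both are assumed to be strictly positive Softmax outputs over the same character set."""
    total = 0.0
    for start_probs, current_probs in batch:
        utterance_loss = 0.0
        for p_start, p_now in zip(start_probs, current_probs):
            utterance_loss -= sum(a * math.log(b) for a, b in zip(p_start, p_now))
        total += utterance_loss / len(start_probs)  # assumed normalisation by L(x)
    return total / len(batch)
```

Under this assumed form the term is smallest when the current outputs match the start-of-training outputs, so it discourages drift on inputs the earlier model already handled.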

the continuous learning regularization term formula is as follows:

wherein: M_s is the number of parameters of the s-th 95598 customer service call voice recognition continuous learning model;

the m-th parameter of the s-th 95598 customer service call voice recognition continuous learning model;

the value of the m-th parameter of the s-th 95598 customer service call voice recognition continuous learning model at the start of training;

the continuous learning weight of the m-th parameter of the s-th 95598 customer service call voice recognition continuous learning model, which characterizes the importance of the m-th parameter, at the start of training, to all preceding customer service call service scenes; its iterative calculation process is as follows:

(1) for s = 1, for the s-th 95598 customer service call service scene,

the continuous learning weight is the second partial derivative, with respect to the m-th parameter, of the label-smoothing-based weighted cross entropy loss of the initial speech recognition model on the public Chinese speech recognition validation set;

(2) for s ≥ 2, for the s-th 95598 customer service call service scene,

the continuous learning weight is the second partial derivative, with respect to the m-th parameter, of the label-smoothing-based weighted cross entropy loss of the voice recognition continuous learning model trained on the (s-1)-th customer service call service scene, evaluated on the verification set of the (s-1)-th customer service call service scene, plus the continuous learning weight of the m-th parameter for the (s-1)-th customer service call service scene.
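The regularization term and the iterative weight calculation can be sketched as follows. This is an illustration under assumptions: the penalty is taken to be the usual EWC form (an importance-weighted squared distance from the start-of-training parameters), the second derivatives are estimated by finite differences on a toy loss, and all function names are hypothetical. In practice a deep learning framework would supply the derivatives.

```python
def second_derivative(loss_fn, params, m, h=1e-3):
    """Central finite-difference estimate of d^2 loss / d params[m]^2."""
    up = list(params)
    up[m] += h
    down = list(params)
    down[m] -= h
    return (loss_fn(up) - 2.0 * loss_fn(params) + loss_fn(down)) / (h * h)

def update_importance(previous, second_derivs):
    """s = 1: previous is None, so the weights are the second derivatives alone.
    s >= 2: add the new second derivatives to the weights carried over from scene s - 1."""
    if previous is None:
        return list(second_derivs)
    return [a + b for a, b in zip(previous, second_derivs)]

def ewc_penalty(params, start_params, importance):
    """Importance-weighted squared drift of each parameter from its start-of-training value."""
    return sum(w * (p - p0) ** 2 for p, p0, w in zip(params, start_params, importance))

def total_loss(w_s, d_s, r_s, lambda_d, lambda_r):
    """L_s = W_s + lambda_d * D_s + lambda_r * R_s, as in formula (3)."""
    return w_s + lambda_d * d_s + lambda_r * r_s
```

Because the weights accumulate across scenes, a parameter that mattered for any earlier scene stays expensive to change in every later one.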

2. The customer service call voice recognition method based on continuous learning of claim 1, wherein: the text of the public Chinese speech recognition data set in step S1 includes more than 4000 commonly used Chinese characters.

3. The customer service call voice recognition method based on continuous learning of claim 1, wherein: the voice data and the text marking data are obtained through a manual marking method, and the training data scale is enlarged through a data augmentation technology before the parameters of the initial voice recognition model and the voice recognition continuous learning model are adjusted.

4. The continuous learning-based customer service call voice recognition method according to claim 1, characterized in that: the public Chinese voice recognition data set and the data set formed by the voice data and the text marking data are each divided into a training set, a verification set and a test set, wherein the training set is used for training the model, the verification set is used for optimizing the hyper-parameters of the model, and the test set is used for testing the voice recognition effect of the model.

5. The customer service call voice recognition method based on continuous learning of claim 1, wherein: the initial speech recognition model and the 95598 customer service call speech recognition continuous learning model use an end-to-end LAS deep learning structure comprising three parts: a deep learning encoder, an attention mechanism and a deep learning decoder;

the last layer of the deep learning decoder of the initial voice recognition model is a fully connected layer with K output network nodes, whose outputs are converted into K probability values by a Softmax activation function, wherein K is the total number of characters in the character set that the initial voice recognition model can output, and the characters in that character set comprise a space character, a start/stop character, an unknown Chinese character identifier and all different Chinese characters contained in the public Chinese voice recognition training set;

the last layer of the deep learning decoder of the 95598 customer service call speech recognition continuous learning model is a fully connected layer with K_s output network nodes, whose outputs are converted into K_s probability values by a Softmax activation function, wherein K_s is the total number of characters in the character set that the s-th 95598 customer service call voice recognition continuous learning model can output; the characters in that character set comprise a space character, a start/stop character, an unknown Chinese character identifier, and all different Chinese characters contained in the public Chinese voice recognition training set and in all of the first s 95598 call service scene training sets; K_s and the total number K of characters in the character set that the initial speech recognition model can output are related as follows:

$K_s = K + \sum_{p=1}^{s} \Delta K_p$

wherein: ΔK_1 is the total number of Chinese characters in the 1st 95598 customer service call service scene training set that do not belong to the public Chinese speech recognition training set;

for p ≥ 2, ΔK_p is the total number of Chinese characters in the p-th 95598 customer service call service scene training set that belong neither to the public Chinese speech recognition training set nor to any of the first p-1 95598 call service scene training sets;

for s ≥ 1, the characters corresponding to the first K probability values output by the s-th 95598 customer service call voice recognition continuous learning model have the same meanings as the characters corresponding to the K probability values output by the initial voice recognition model;

for s ≥ 1, the (K+1)-th to (K+ΔK_1)-th probability values output by the s-th 95598 customer service call speech recognition continuous learning model correspond to all different Chinese characters in the 1st 95598 customer service call service scene training set that do not belong to the public Chinese speech recognition training set, sorted in ascending order of their Chinese character ASCII codes;

for s ≥ 2 and p ≥ 2, the $(K + \sum_{j=1}^{p-1} \Delta K_j + 1)$-th to $(K + \sum_{j=1}^{p} \Delta K_j)$-th probability values output by the s-th 95598 customer service call voice recognition continuous learning model correspond to all different Chinese characters in the p-th 95598 customer service call service scene training set that belong neither to the public Chinese speech recognition training set nor to any of the first p-1 95598 call service scene training sets, sorted in ascending order of their Chinese character ASCII codes.
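The way the output character set grows from scene to scene can be sketched in a few lines of Python. The sketch is illustrative only: `extend_charset` is a hypothetical helper, the base character set is a toy example, and new characters are ordered by Unicode code point as a stand-in for the character-code ordering named in the claim.

```python
def extend_charset(charset, scene_texts):
    """Append the characters first seen in this scene's training texts.
    Existing positions are left untouched, so earlier outputs keep their meaning.
    Returns the extended character list and the number of new characters (Delta K_p)."""
    known = set(charset)
    new_chars = sorted({ch for text in scene_texts for ch in text if ch not in known})
    return charset + new_chars, len(new_chars)

# Toy initial character set with K = 5 entries (assumed for illustration).
base = ["<space>", "<sos/eos>", "<unk>", "电", "费"]
set1, dk1 = extend_charset(base, ["电费停电"])   # scene 1 adds "停"
set2, dk2 = extend_charset(set1, ["停电报修"])   # scene 2 adds "修" and "报"
```

Here `len(set2)` equals `len(base) + dk1 + dk2`, which mirrors the relation between K_s and K, and the first K entries of every extended set are identical to the initial set.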

6. The continuous learning-based customer service call voice recognition method according to claim 1, characterized in that: the loss function used by the initial speech recognition model is a label-smoothing-based cross entropy loss function L_init, defined as follows:

wherein: D_init is the set formed by all voice input signals in the public Chinese voice recognition training set;

|D_init| is the number of elements in the set D_init;

x is a voice input signal in D_init;

L(x) is the length of the text label corresponding to x;

K is the total number of characters in the character set that the initial speech recognition model can output;

q_{i,l}(x) is the quantized soft label for the l-th character of the text label corresponding to x, that is, if the l-th character of the text label corresponding to x is the i-th character in the character set that the initial speech recognition model can output, q_{i,l}(x) is 1-ε, otherwise q_{i,l}(x) is ε/(K-1);

ε is the smoothing value and is a constant;

the output probability is the probability that, after x is input into the initial voice recognition model, the l-th output character is the i-th character in the character set that the initial voice recognition model can output;

log is the natural logarithm function.
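One form consistent with the definitions in this claim is written out below. It is a reconstruction under assumptions rather than the claimed formula: the averaging over |D_init| and over L(x) is assumed, and the symbol p_{i,l}(x) for the model's output probability is introduced here because the source symbol is not reproduced in this text.

```latex
L_{init} = -\frac{1}{|D_{init}|} \sum_{x \in D_{init}} \frac{1}{L(x)}
           \sum_{l=1}^{L(x)} \sum_{i=1}^{K} q_{i,l}(x) \, \log p_{i,l}(x),
\qquad
\sum_{i=1}^{K} q_{i,l}(x) = (1-\varepsilon) + (K-1)\,\frac{\varepsilon}{K-1} = 1
```

The second identity follows directly from the soft-label definition: for each label position the smoothed targets still form a probability distribution over the K characters.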