CN113012701B - Identification method, identification device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN113012701B
Authority
CN
China
Prior art keywords
word, context information, feature, error correction, model
Legal status
Active
Application number
CN202110281812.8A
Other languages
Chinese (zh)
Other versions
CN113012701A
Inventor
刘俊帅
夏光敏
王进
Current Assignee
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Application filed by Lenovo Beijing Ltd
Priority to CN202110281812.8A
Publication of CN113012701A
Application granted
Publication of CN113012701B


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/26: Speech recognition; speech to text systems
    • G10L 15/16: Speech classification or search using artificial neural networks
    • G10L 25/51: Speech or voice analysis techniques specially adapted for comparison or discrimination

Abstract

The application provides a recognition method, a recognition device, an electronic device, and a storage medium. A speech recognition error correction model is trained with training data that comprises error correction data for training texts and context information obtained by a punctuation prediction model based on training samples. This makes the training data of the speech recognition error correction model richer, lets the model learn richer context information, and improves the precision of the speech recognition error correction model. On this basis, third context information of each word feature is determined from the first context information and the second context information of the word feature, and the word features together with their third context information are input into the speech recognition error correction model, which improves the accuracy of its error correction. Further, because the corrected recognition result is more accurate, the punctuation prediction model performs punctuation prediction on a more accurate recognition result, so the accuracy of punctuation prediction is improved as well.

Description

Identification method, identification device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of speech recognition technologies, and in particular, to a recognition method, a recognition device, an electronic device, and a storage medium.
Background
At present, the recognition result of a speech recognition system may contain errors. To improve the accuracy of the recognition result, an error correction module can be used to correct the recognition result of the speech recognition system.
However, the error correction precision of existing error correction modules is limited, so the accuracy of the corrected result remains low.
Disclosure of Invention
The application provides the following technical solutions:
In one aspect, the present application provides an identification method, including:
acquiring word features of each word in a text to be processed that is recognized by a speech recognition system;
inputting the word features into a punctuation prediction model to obtain first context information of the word features from the punctuation prediction model;
inputting the word features into a speech recognition error correction model to obtain second context information of the word features from the speech recognition error correction model, wherein the speech recognition error correction model is trained with training data, and the training data comprises error correction data for training texts and context information obtained by the punctuation prediction model based on training samples;
determining third context information of the word features based on the first context information and the second context information of the word features;
and inputting the word features and the third context information of the word features into the speech recognition error correction model to obtain a text produced after the speech recognition error correction model performs error correction processing on the text to be processed.
The determining third context information of the word feature based on the first context information and the second context information of the word feature includes:
performing splicing processing on the first context information and the second context information of the word feature to obtain the third context information.
Alternatively, the determining third context information of the word feature based on the first context information and the second context information of the word feature includes:
performing dot product operation processing on the first context information and the second context information of the word feature to obtain the third context information.
Alternatively, the determining third context information of the word feature based on the first context information and the second context information of the word feature includes:
inputting the first context information and the second context information of the word feature into a first machine learning model for feature fusion to obtain the third context information output by the first machine learning model.
The punctuation prediction model includes a punctuation predictor sub-model and a self-encoder.
The inputting the word features into a punctuation prediction model to obtain first context information of the word features from the punctuation prediction model includes:
inputting the word features into the self-encoder, and obtaining parameters used by an intermediate layer of the self-encoder when processing first subcontext information of a to-be-processed word feature, wherein the to-be-processed word feature is the word feature immediately preceding the current word feature in the text to be processed;
obtaining a feature to be used based on the word feature and the parameters used by the intermediate layer of the self-encoder when processing the first subcontext information of the to-be-processed word feature;
inputting the feature to be used into an intermediate layer of the punctuation predictor sub-model to obtain the first context information of the word feature, wherein the intermediate layer of the punctuation predictor sub-model is used for processing the feature to be used.
The obtaining a feature to be used based on the word feature and the parameters used by the intermediate layer of the self-encoder when processing the first subcontext information of the to-be-processed word feature includes:
multiplying the parameters used by the intermediate layer of the self-encoder when processing the first subcontext information of the to-be-processed word feature with the word feature to obtain the feature to be used.
Alternatively, the obtaining a feature to be used includes:
inputting the word feature and the parameters used by the intermediate layer of the self-encoder when processing the first subcontext information of the to-be-processed word feature into a second machine learning model for feature fusion, to obtain the feature to be used output by the second machine learning model.
Another aspect of the present application provides an identification device, including:
an acquisition module, configured to acquire word features of each word in a text to be processed that is recognized by a speech recognition system;
a first obtaining module, configured to input the word features into a punctuation prediction model to obtain first context information of the word features from the punctuation prediction model;
a second obtaining module, configured to input the word features into a speech recognition error correction model to obtain second context information of the word features from the speech recognition error correction model, wherein the speech recognition error correction model is trained with training data, and the training data comprises error correction data for training texts and context information obtained by the punctuation prediction model based on training samples;
a determining module, configured to determine third context information of the word features based on the first context information and the second context information of the word features;
and a third obtaining module, configured to input the word features and the third context information of the word features into the speech recognition error correction model to obtain a text produced after the speech recognition error correction model performs error correction processing on the text to be processed.
A third aspect of the present application provides an electronic device, comprising:
a memory and a processor.
The memory is configured to store at least one set of instructions;
the processor is configured to call and execute the instruction set in the memory, and implements the following by executing the instruction set:
acquiring word features of each word in a text to be processed that is recognized by a speech recognition system;
inputting the word features into a punctuation prediction model to obtain first context information of the word features from the punctuation prediction model;
inputting the word features into a speech recognition error correction model to obtain second context information of the word features from the speech recognition error correction model, wherein the speech recognition error correction model is trained with training data, and the training data comprises error correction data for training texts and context information obtained by the punctuation prediction model based on training samples;
determining third context information of the word features based on the first context information and the second context information of the word features;
and inputting the word features and the third context information of the word features into the speech recognition error correction model to obtain a text produced after the speech recognition error correction model performs error correction processing on the text to be processed.
A fourth aspect of the present application provides a storage medium storing a computer program which, when executed by a processor, implements the steps of the identification method according to any one of the above.
Compared with the prior art, the beneficial effects of the present application are as follows:
In this application, the speech recognition error correction model is trained with training data that comprises error correction data for training texts and context information obtained by the punctuation prediction model based on training samples. This makes the training data of the speech recognition error correction model richer, ensures that the model can learn richer context information, and improves the accuracy of the speech recognition error correction model. On this basis, the third context information of the word features is determined based on the first context information and the second context information of the word features, and the word features and their third context information are input into the speech recognition error correction model, so the error correction accuracy of the speech recognition error correction model is improved.
Moreover, since the accuracy of correcting the recognition result is improved, the punctuation prediction model performs punctuation prediction on a more accurate recognition result, so the accuracy of punctuation prediction is improved as well.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application, and a person skilled in the art can obtain other drawings from these drawings without inventive effort.
Fig. 1 is a schematic flow chart of an identification method provided in embodiment 1 of the present application;
fig. 2 is a flow chart of an identification method provided in embodiment 2 of the present application;
fig. 3 is a flow chart of an identification method provided in embodiment 3 of the present application;
fig. 4 is a flow chart of an identification method provided in embodiment 4 of the present application;
fig. 5 is a schematic flow chart of an identification method provided in embodiment 5 of the present application;
fig. 6 is a schematic structural diagram of an electronic device provided in the present application.
Detailed Description
The following describes the technical solutions in the embodiments of the present application clearly and completely with reference to the accompanying drawings. It is evident that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
In order to solve the above problems, the present application provides an identification method, which is described next.
Referring to fig. 1, which is a schematic flow chart of an identification method provided in embodiment 1 of the present application, the method may be applied to an electronic device; the product type of the electronic device is not limited in this application. As shown in fig. 1, the method may include, but is not limited to, the following steps:
step S101, acquiring word characteristics of each word in the text to be processed, which is recognized by the voice recognition system.
The process of obtaining the word characteristics of each word in the text to be processed identified by the speech recognition system may include, but is not limited to:
and extracting word characteristics of each word in the text to be processed when the text to be processed is recognized by the voice recognition system.
Of course, the process of obtaining the word characteristics of each word in the text to be processed, which is recognized by the speech recognition system, may also include:
and searching word characteristics corresponding to each word in the text to be processed, which is recognized by the voice recognition system, from a pre-constructed word characteristic database.
The word characteristic database is formed by extracting word characteristics of a large number of texts and the extracted word characteristics and mapping relations of the words. In this embodiment, the obtaining manner of a large amount of text is not limited, and the text is downloaded from the network; alternatively, the manner in which text recognized by the speech recognition system is obtained may be used as a specific embodiment for obtaining a large amount of text.
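As a purely illustrative sketch of such a database (the function names, the Python types, and the idea of using a dictionary are assumptions for illustration, not part of the patent), the word-to-feature mapping can be built once over a corpus and then queried per recognized word:

    from typing import Callable, Dict, List

    def build_word_feature_db(corpus: List[List[str]],
                              extract: Callable[[str], List[float]]) -> Dict[str, List[float]]:
        # Store the mapping relation between each word and its extracted word feature.
        db: Dict[str, List[float]] = {}
        for sentence in corpus:
            for word in sentence:
                if word not in db:
                    db[word] = extract(word)
        return db

    def lookup_word_features(words: List[str],
                             db: Dict[str, List[float]]) -> List[List[float]]:
        # Look up the stored word feature for each word of the recognized text.
        return [db[word] for word in words]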
Step S102, inputting the word features into the punctuation prediction model to obtain the first context information of the word features from the punctuation prediction model.
The punctuation prediction model is capable of both predicting punctuation and extracting context information of word features. Specifically, when the word features are input into the punctuation prediction model, the punctuation prediction model can produce a punctuation prediction result and the first context information of the word features.
Punctuation predictive models can be, but are not limited to: a unidirectional long-short-term memory recurrent neural network model or a bidirectional long-short-term memory recurrent neural network model.
If the punctuation prediction model is a bidirectional long-short-term memory cyclic neural network model, the word features are input into the punctuation prediction model, and the first context information of the word features obtained by the punctuation prediction model can be determined by, but is not limited to, the following formula:
in the above-mentioned formula(s),first context information representing word characteristics of a t-th word in a text to be processed, biLSTM representing a two-way long-short-term memory cyclic neural network model, x t+1 Word characteristics representing the t+1st word in the text to be processed, +>First context information representing word characteristics of the t+1st word in the text to be processed.
Besides the first context information of the word features, inputting the word features into the punctuation prediction model also yields the punctuation prediction result obtained by the punctuation prediction model. The punctuation prediction result can be determined by the following formula:
y_punc = softmax(h_punc)
In the above formula, y_punc represents the punctuation prediction result, softmax() represents a probability normalization function, and h_punc represents the first context information of the word features in the text to be processed.
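As a minimal sketch of this branch (PyTorch is used for illustration only, and the feature dimension, hidden size, and number of punctuation classes are assumptions rather than values from the patent), a BiLSTM yields the per-word first context information h_punc, and a projected softmax yields the punctuation prediction y_punc:

    import torch
    import torch.nn as nn

    class PunctuationPredictor(nn.Module):
        def __init__(self, feat_dim=128, hidden=64, num_punc=5):
            super().__init__()
            self.bilstm = nn.LSTM(feat_dim, hidden, bidirectional=True, batch_first=True)
            self.proj = nn.Linear(2 * hidden, num_punc)

        def forward(self, word_feats):                         # (batch, seq, feat_dim)
            h_punc, _ = self.bilstm(word_feats)                # first context information
            y_punc = torch.softmax(self.proj(h_punc), dim=-1)  # y_punc = softmax(h_punc)
            return h_punc, y_punc

    h_punc, y_punc = PunctuationPredictor()(torch.randn(1, 10, 128))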
Step S103, inputting the word characteristics into the voice recognition error correction model, and obtaining second context information of the word characteristics obtained by the voice recognition error correction model.
In this embodiment, the speech recognition error correction model may be, but is not limited to: a unidirectional long-short-term memory recurrent neural network model or a bidirectional long-short-term memory recurrent neural network model.
Under the condition that the speech recognition error correction model is a bidirectional long-short-term memory recurrent neural network model, the word features are input into the speech recognition error correction model, and the second context information of the word features obtained by the speech recognition error correction model can be determined using the following formula:
h_t^ec = BiLSTM(x_{t+1}, h_{t+1}^ec)
In the above formula, h_t^ec represents the second context information corresponding to the word feature of the t-th word in the text to be processed, BiLSTM represents the bidirectional long-short-term memory recurrent neural network model, x_{t+1} represents the word feature of the (t+1)-th word in the text to be processed, and h_{t+1}^ec represents the second context information corresponding to the word feature of the (t+1)-th word in the text to be processed.
The speech recognition error correction model is obtained by training with training data, wherein the training data comprises data for correcting errors of training texts and context information obtained by the punctuation prediction model based on training samples.
Specifically, the training process of the speech recognition error correction model may include:
s1031, acquiring word characteristics of each word in the training text, punctuation marks of the training text and error correction labels marked for each word.
S1032, inputting the plurality of word features into the punctuation prediction model to obtain punctuation prediction results obtained by the punctuation prediction model and first context information of each word feature.
In this embodiment, the parameters of the punctuation prediction model may be obtained by training in advance on a plurality of complete training texts. Of course, the parameters of the punctuation prediction model may also be initially set parameters that have not yet been trained on complete training samples.
S1033, inputting the word features into the voice recognition error correction model to obtain second context information of the word features obtained by the voice recognition error correction model.
S1034, determining third context information of the word feature based on the first context information and the second context information of the word feature.
If the parameters of the punctuation prediction model have been trained in advance on a plurality of complete training texts, the punctuation prediction model has already learned relatively rich context information. Therefore, when a plurality of word features are input into the punctuation prediction model, the first context information obtained for each word feature is more accurate, which further ensures the richness and accuracy of the third context information of the word features.
The process of determining third context information for a word feature based on the first context information and the second context information for the word feature may include, but is not limited to:
s10341, performing splicing processing on the first context information and the second context information of the word characteristics to obtain third context information.
Step S10341 is now described by way of example: if the first context information of the word feature is [p1, p2, p3, …, pn] and the second context information of the word feature is [e1, e2, e3, …, en], splicing the first context information and the second context information of the word feature yields the third context information [p1, p2, p3, …, pn, e1, e2, e3, …, en].
Splicing the first context information and the second context information of the word feature to obtain the third context information ensures that neither the first context information nor the second context information is lost, which in turn helps improve the training precision of the speech recognition error correction model.
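A one-line sketch of this splicing operation (the shapes and the use of PyTorch are illustrative assumptions):

    import torch

    h_first = torch.randn(10, 64)                     # [p1, p2, ..., pn] per word
    h_second = torch.randn(10, 64)                    # [e1, e2, ..., en] per word
    h_third = torch.cat([h_first, h_second], dim=-1)  # [p1..pn, e1..en], shape (10, 128)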
The process of determining third context information for a word feature based on the first context information and the second context information for the word feature may also include, but is not limited to:
s10342, dot product operation processing is carried out on the first context information and the second context information of the word characteristics, and third context information is obtained.
Performing the dot product operation on the first context information and the second context information of the word feature to obtain the third context information saves computation time and improves the efficiency of obtaining the third context information, so training efficiency is improved along with the training accuracy of the speech recognition error correction model.
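A matching sketch of the dot product variant, interpreted here as an element-wise product so that the per-word vector structure is preserved (this interpretation and the shapes are assumptions):

    import torch

    h_first = torch.randn(10, 64)
    h_second = torch.randn(10, 64)
    h_third = h_first * h_second   # element-wise product keeps the shape (10, 64)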
Alternatively, the process of determining the third context information of the word feature based on the first context information and the second context information of the word feature may also include, but is not limited to:
s10343, inputting the first context information and the second context information of the word features into a first machine learning model for feature fusion, and obtaining third context information output by the first machine learning model.
When the first context information and the second context information of the word features are input into the first machine learning model for feature fusion, the first machine learning model is trained together with the speech recognition error correction model. This ensures the accuracy of the first machine learning model's training, and hence the accuracy of the third context information it outputs, so the training accuracy of the speech recognition error correction model is improved on the basis of rich and accurate training data.
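A minimal sketch of such a first machine learning model (the single linear layer, the tanh activation, and the dimensions are assumptions; the patent does not fix an architecture), trained jointly with the speech recognition error correction model:

    import torch
    import torch.nn as nn

    class FusionModel(nn.Module):
        # Fuses the first and second context information into third context information.
        def __init__(self, dim=64):
            super().__init__()
            self.fuse = nn.Linear(2 * dim, dim)

        def forward(self, h_first, h_second):
            return torch.tanh(self.fuse(torch.cat([h_first, h_second], dim=-1)))

    h_third = FusionModel()(torch.randn(10, 64), torch.randn(10, 64))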
S1035, inputting the third context information and the word characteristics into the voice recognition error correction model to obtain an error correction result output by the voice recognition error correction model, wherein the error correction result is a result of correcting the word characteristics.
S1036, judging whether the punctuation prediction model and the voice recognition error correction model meet the training ending condition based on the punctuation prediction result, the error correction result of each word characteristic, the punctuation mark of the training text and the error correction label marked for each word.
If not, step S1037 is performed.
In this embodiment, the training ending condition may be set as required, which is not limited in this application. For example, the training end condition may be, but is not limited to: the loss function value of the punctuation prediction model is converged and the loss function value of the speech recognition error correction model is converged; or, the obtained comprehensive loss function value is converged based on the loss function value of the punctuation prediction model and the loss function value of the voice recognition error correction model.
The steps S1031-S1037 can be understood as training the punctuation prediction model while training the speech recognition error correction model, so as to realize the joint learning of the speech recognition error correction model and the punctuation prediction model.
Based on the punctuation prediction result, the error correction result of each word feature, the punctuation sign of the training text and the error correction label marked for each word, a specific implementation process for judging whether the punctuation prediction model and the speech recognition error correction model meet the training ending condition may be:
S10361, determining an error correction loss function value based on the difference between the error correction result of the word features and the error correction labels annotated for the words;
s10362, determining punctuation loss function values based on differences between punctuation prediction results and punctuation marks of the training text;
and S10363, obtaining a comprehensive loss function value based on the error correction loss function value and the punctuation loss function value.
Deriving a composite loss function value based on the error correction loss function value and the punctuation loss function value may include, but is not limited to:
and adding the error correction loss function value and the punctuation loss function value to obtain a comprehensive loss function value.
Of course, the integrated loss function value is derived based on the error correction loss function value and the punctuation loss function value, and may also include, but is not limited to:
the comprehensive loss function value is calculated by using the following formula:
loss_cp = a × loss_ec + b × loss_punc
In the above formula, loss_cp represents the comprehensive loss function value, loss_ec represents the error correction loss function value, loss_punc represents the punctuation loss function value, and a and b represent different weights; a and b may be set as desired, and their values are not limited in this application.
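A sketch of this comprehensive loss (cross-entropy is used here as an illustrative choice for both losses, and all shapes, the vocabulary size, and the weights a and b are assumptions):

    import torch
    import torch.nn.functional as F

    logits_ec = torch.randn(10, 5000, requires_grad=True)  # per-word vocabulary scores
    labels_ec = torch.randint(0, 5000, (10,))              # error correction labels
    logits_punc = torch.randn(10, 5, requires_grad=True)   # per-word punctuation scores
    labels_punc = torch.randint(0, 5, (10,))               # punctuation marks

    a, b = 0.7, 0.3                                        # weights, set as needed
    loss_cp = a * F.cross_entropy(logits_ec, labels_ec) \
        + b * F.cross_entropy(logits_punc, labels_punc)
    loss_cp.backward()                                     # drives the joint update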
S10364, judging whether the comprehensive loss function value is converged.
S1037, updating the parameters of the punctuation prediction model and the parameters of the speech recognition error correction model, and returning to step S1031 until the training end condition is met.
Step S104, third context information of the word feature is determined based on the first context information and the second context information of the word feature.
Third context information of the word feature is determined based on the first context information and the second context information of the word feature such that the third context information contains more context information than the second context information.
Step S105, inputting the word characteristics and the third context information of the word characteristics into the voice recognition error correction model to obtain a text obtained after the voice recognition error correction model performs error correction processing on the text to be processed.
The process of inputting each word feature and the third context information of each word feature into the speech recognition error correction model to obtain the text obtained by performing error correction processing on the text to be processed by the speech recognition error correction model may include:
S1051, inputting each word feature and the third context information of each word feature into the speech recognition error correction model, and obtaining the context information of the word feature of the (t+1)-th word in the text to be processed using the following formula:
h_{t+1}^ec = BiLSTM(x_{t+1}, h'_t)
In the above formula, h_{t+1}^ec represents the context information of the word feature of the (t+1)-th word in the text to be processed, BiLSTM represents the bidirectional long-short-term memory recurrent neural network model, x_{t+1} represents the word feature of the (t+1)-th word in the text to be processed, and h'_t represents the third context information of the word feature of the t-th word in the text to be processed.
S1052, obtaining word characteristics of each word in the text to be processed after error correction processing by adopting the following formula:
y_ec = softmax(h_ec)
In the above formula, y_ec represents the word features after error correction processing, softmax() represents a probability normalization function, and h_ec represents the context information of the word features of the words in the text to be processed.
S1053, obtaining the text resulting from the error correction processing of the text to be processed, based on the error-corrected word features of each word in the text to be processed.
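A minimal end-to-end sketch of this correction pass (PyTorch, with illustrative dimensions and vocabulary size that are assumptions, not values from the patent): the error correction BiLSTM consumes each word feature concatenated with its third context information, and a softmax over the vocabulary yields the corrected word at each position:

    import torch
    import torch.nn as nn

    class ErrorCorrector(nn.Module):
        def __init__(self, feat_dim=128, ctx_dim=128, hidden=64, vocab=5000):
            super().__init__()
            self.bilstm = nn.LSTM(feat_dim + ctx_dim, hidden,
                                  bidirectional=True, batch_first=True)
            self.proj = nn.Linear(2 * hidden, vocab)

        def forward(self, word_feats, h_third):
            # Each word feature is fed together with its third context information.
            h_ec, _ = self.bilstm(torch.cat([word_feats, h_third], dim=-1))
            y_ec = torch.softmax(self.proj(h_ec), dim=-1)  # y_ec = softmax(h_ec)
            return y_ec.argmax(dim=-1)                     # corrected word ids

    corrected = ErrorCorrector()(torch.randn(1, 10, 128), torch.randn(1, 10, 128))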
In this application, the speech recognition error correction model is trained with training data that comprises error correction data for training texts and context information obtained by the punctuation prediction model based on training samples. This makes the training data of the speech recognition error correction model richer, ensures that the model can learn richer context information, and improves the accuracy of the speech recognition error correction model. On this basis, the third context information of the word features is determined based on the first context information and the second context information of the word features, and the word features and their third context information are input into the speech recognition error correction model, so the error correction accuracy of the speech recognition error correction model is improved.
Moreover, since the accuracy of correcting the recognition result is improved, the punctuation prediction model performs punctuation prediction on a more accurate recognition result, so the accuracy of punctuation prediction is improved as well.
As another alternative embodiment of the present application, referring to fig. 2, which is a schematic flow chart of an identification method provided in embodiment 2 of the present application, this embodiment is mainly a refinement of the identification method described in the foregoing embodiment 1. As shown in fig. 2, the method may include, but is not limited to, the following steps:
step S201, word characteristics of each word in the text to be processed recognized by the voice recognition system are obtained.
Step S202, inputting the word characteristics into a punctuation prediction model, and obtaining first context information of the word characteristics obtained by the punctuation prediction model.
Step S203, inputting the word characteristics into the voice recognition error correction model to obtain second context information of the word characteristics obtained by the voice recognition error correction model.
The speech recognition error correction model is obtained by training with training data, wherein the training data comprises data for correcting errors of training texts and context information obtained by the punctuation prediction model based on training samples.
The detailed procedure of steps S201-S203 can be referred to the related description of steps S101-S103 in embodiment 1, and will not be repeated here.
And step S204, performing splicing processing on the first context information and the second context information of the word characteristics to obtain third context information.
Step S204 is now described by way of example: if the first context information of the word feature is [p1, p2, p3, …, pn] and the second context information of the word feature is [e1, e2, e3, …, en], splicing them yields the third context information [p1, p2, p3, …, pn, e1, e2, e3, …, en].
Step S204 is a specific implementation of step S104 in embodiment 1.
Step S205, inputting the word characteristics and the third context information of the word characteristics into the voice recognition error correction model to obtain a text obtained after the voice recognition error correction model performs error correction processing on the text to be processed.
The detailed process of step S205 may be referred to the related description of step S105 in embodiment 1, and will not be described herein.
In this embodiment, the first context information and the second context information of the word feature are spliced to obtain the third context information, which ensures that neither the first context information nor the second context information is lost, and further ensures the accuracy of the error correction performed by the speech recognition error correction model.
As another alternative embodiment of the present application, referring to fig. 3, which is a schematic flow chart of an identification method provided in embodiment 3 of the present application, this embodiment is mainly a refinement of the identification method described in the foregoing embodiment 1. As shown in fig. 3, the method may include, but is not limited to, the following steps:
step 301, obtaining word characteristics of each word in the text to be processed, which is recognized by the voice recognition system.
Step S302, inputting the word characteristics into the punctuation prediction model, and obtaining first context information of the word characteristics obtained by the punctuation prediction model.
Step S303, inputting the word characteristics into the voice recognition error correction model to obtain second context information of the word characteristics obtained by the voice recognition error correction model.
The speech recognition error correction model is obtained by training with training data, wherein the training data comprises data for correcting errors of training texts and context information obtained by the punctuation prediction model based on training samples.
The detailed procedure of steps S301 to S303 can be referred to the relevant description of steps S101 to S103 in embodiment 1, and will not be repeated here.
Step S304, dot product operation processing is carried out on the first context information and the second context information of the word characteristics, and third context information is obtained.
Step S304 is a specific implementation of step S104 in embodiment 1.
Step S305, inputting the word characteristics and the third context information of the word characteristics into the voice recognition error correction model to obtain a text obtained after the voice recognition error correction model performs error correction processing on the text to be processed.
The detailed process of step S305 can be referred to the related description of step S105 in embodiment 1, and will not be repeated here.
Performing the dot product operation on the first context information and the second context information of the word feature to obtain the third context information saves computation time and improves the efficiency of obtaining the third context information, so error correction efficiency is improved while the accuracy of the text obtained after the speech recognition error correction model corrects the text to be processed is guaranteed.
As another alternative embodiment of the present application, referring to fig. 4, which is a schematic flow chart of an identification method provided in embodiment 4 of the present application, this embodiment is mainly a refinement of the identification method described in the foregoing embodiment 1. As shown in fig. 4, the method may include, but is not limited to, the following steps:
step S401, word characteristics of each word in the text to be processed recognized by the voice recognition system are obtained.
Step S402, inputting the word characteristics into the punctuation prediction model, and obtaining first context information of the word characteristics obtained by the punctuation prediction model.
Step S403, inputting the word characteristics into the voice recognition error correction model to obtain second context information of the word characteristics obtained by the voice recognition error correction model.
The speech recognition error correction model is obtained by training with training data, wherein the training data comprises data for correcting errors of training texts and context information obtained by the punctuation prediction model based on training samples.
The detailed process of steps S401 to S403 can be referred to the related description of steps S101 to S103 in embodiment 1, and will not be repeated here.
Step S404, inputting the first context information and the second context information of the word features into the first machine learning model for feature fusion, and obtaining third context information output by the first machine learning model.
Step S404 is a specific implementation of step S104 in embodiment 1.
Step S405, inputting the word characteristics and the third context information of the word characteristics into the voice recognition error correction model to obtain a text obtained after the voice recognition error correction model performs error correction processing on the text to be processed.
The detailed process of step S405 may be referred to the related description of step S105 in embodiment 1, and will not be described herein.
The first context information and the second context information of the word features are input into the first machine learning model for feature fusion, and the first machine learning model outputs the third context information. This ensures the accuracy of the third context information, and inputting accurate third context information into the speech recognition error correction model improves the accuracy of its error correction.
As another alternative embodiment of the present application, which is mainly a refinement of the identification method described in the foregoing embodiment 1, in this embodiment the punctuation prediction model may include a punctuation predictor sub-model and a self-encoder. In the case where the punctuation prediction model includes a punctuation predictor sub-model and a self-encoder, the training process of the speech recognition error correction model may include the following steps:
s2001, acquiring word characteristics of each word in the training text, punctuation marks of the training text and error correction labels marked for each word.
The detailed process of step S2001 can be referred to the description of step S1031 in embodiment 1, and will not be repeated here.
S2002, inputting the word features into the self-encoder, and obtaining the parameters used by the intermediate layer of the self-encoder when processing the first subcontext information of the to-be-processed word feature, wherein the to-be-processed word feature is the word feature immediately preceding the current word feature in the training text.
The self-encoder can be understood as: a machine learning model that learns the characteristic information (e.g., punctuation distribution information) of an input object in its feature space.
The self-encoder may be, but is not limited to: a unidirectional long-short-term memory recurrent neural network model or a bidirectional long-short-term memory recurrent neural network model.
When the self-encoder is a unidirectional or bidirectional long-short-term memory recurrent neural network model, the intermediate layer of the self-encoder can be understood as the hidden layer of that unidirectional or bidirectional long-short-term memory recurrent neural network model.
In this embodiment, the intermediate layer of the self-encoder may process the word features to obtain the first subcontext information of the word features using the following formula:
h_{t+1}^ae = W^ae · x_{t+1} + U^ae · h_t^ae
In the above formula, W^ae represents the parameters used by the intermediate layer of the self-encoder when processing word features, x_{t+1} represents the word feature of the (t+1)-th word in the training text, U^ae represents the parameters used by the intermediate layer of the self-encoder when processing the first subcontext information of a word feature, h_{t+1}^ae represents the first subcontext information of the word feature of the (t+1)-th word in the training text, and h_t^ae represents the first subcontext information of the word feature of the t-th word in the training text.
It will be appreciated that the to-be-processed word feature is one of the word features in the training text, so its first subcontext information is also calculated using the above formula.
Step S2003, obtaining the feature to be used based on the word feature and the parameters used when the intermediate layer of the self-encoder processes the first subcontext information of the to-be-processed word feature.
Obtaining the feature to be used in this way can be understood as: mapping the word feature, using the parameters employed by the intermediate layer of the self-encoder when processing the first subcontext information of the to-be-processed word feature, into the feature space required by the punctuation predictor sub-model. The resulting feature to be used thus conforms to the feature space required by the punctuation predictor sub-model while still carrying the characteristic information (such as punctuation distribution information) of the word feature in its original feature space.
Obtaining the feature to be used based on the word feature and the parameters used by the intermediate layer of the self-encoder may include, but is not limited to:
multiplying the parameters used by the intermediate layer of the self-encoder when processing the first subcontext information of the to-be-processed word feature with the word feature, to obtain the feature to be used (see the sketch after the next paragraph).
Of course, obtaining the feature to be used based on the word feature and those parameters may also include:
inputting the word feature and the parameters used when the intermediate layer of the self-encoder processes the first subcontext information of the to-be-processed word feature into a second machine learning model for feature fusion, to obtain the feature to be used output by the second machine learning model.
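A sketch of the multiplication variant mentioned above (the dimensions and the use of a plain weight matrix are assumptions for illustration):

    import torch

    W_ae = torch.randn(64, 128)  # intermediate-layer parameters of the self-encoder
    x = torch.randn(128)         # word feature in its original feature space
    x_to_use = W_ae @ x          # feature to be used, mapped into the 64-dim space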
Step S2004, inputting the feature to be used into the intermediate layer of the punctuation predictor sub-model to obtain the first context information of the word feature, wherein the intermediate layer of the punctuation predictor sub-model is used for processing the feature to be used.
Inputting the feature to be used into the intermediate layer of the punctuation predictor sub-model to obtain the first context information of the word feature can be understood as: the intermediate layer of the punctuation predictor sub-model processes the feature to be used with the following formula to obtain the first context information of the word feature:
h_{t+1}^punc = sigmoid(W^punc · z_{t+1} + U^punc · h_t^punc)
In the above formula, W^punc represents the parameters used by the intermediate layer of the punctuation predictor sub-model when processing word features, z_{t+1} represents the feature to be used, h_{t+1}^punc represents the first context information of the word feature of the (t+1)-th word in the training text, h_t^punc represents the first context information of the word feature of the t-th word in the training text, U^punc represents the parameters used by the intermediate layer of the punctuation predictor sub-model when processing the first context information of a word feature, and sigmoid() represents the sigmoid activation function.
Because the feature to be used conforms to the feature space required by the punctuation predictor sub-model and contains the characteristic information of the word feature in its original feature space, inputting it into the intermediate layer of the punctuation predictor sub-model ensures that the intermediate layer can process the feature to be used without losing that characteristic information, which guarantees the accuracy of the first context information of the word feature.
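A numeric sketch of this reconstructed recurrence (all dimensions are illustrative assumptions):

    import torch

    W = torch.randn(64, 64)   # parameters applied to the feature to be used
    U = torch.randn(64, 64)   # parameters applied to the previous first context info
    h_t = torch.zeros(64)     # first context information of the t-th word
    z = torch.randn(64)       # feature to be used for the (t+1)-th word
    h_next = torch.sigmoid(W @ z + U @ h_t)  # first context info of the (t+1)-th word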
Steps S2002-S2004 are a specific implementation of step S1032 in embodiment 1.
In this embodiment, the parameters of the punctuation predictor sub-model may be obtained by training in advance on a plurality of complete training texts. Of course, they may also be initially set parameters that have not yet been trained on complete training samples.
S2005, inputting the word characteristics into the voice recognition error correction model, and obtaining second context information of each word characteristic obtained by the voice recognition error correction model.
S2006, third context information of the word feature is determined based on the first context information and the second context information of the word feature.
If the parameters of the punctuation predictor sub-model have been trained in advance on a plurality of complete training texts, the punctuation prediction model has already learned relatively rich context information. Therefore, the first context information obtained by inputting the feature to be used into the punctuation predictor sub-model is more accurate, which further ensures the richness and accuracy of the third context information of the word features.
S2007, inputting the third context information and the word characteristics into a voice recognition error correction model to obtain an error correction result output by the voice recognition error correction model, wherein the error correction result is a result of correcting the word characteristics.
S2008, determining a self-encoder loss function value based on the first subcontext information of each word feature obtained by the self-encoder.
The process of determining the self-encoder loss function value based on the first subcontext information of each word feature obtained by the self-encoder may include, but is not limited to:
The self-encoder loss function value is calculated using the following formula:
L_ae = MLE(h_ae)
In the above formula, L_ae represents the self-encoder loss function value, h_ae represents the first subcontext information of the word features, and MLE() represents a maximum likelihood estimation function.
S2009, determining a punctuation predictor sub-model loss function value based on the first context information of each word feature.
The process of determining the punctuation predictor sub-model loss function value based on the first context information of each word feature may include:
calculating the punctuation predictor sub-model loss function value using the following formula:
L_punc = MLE(h_punc)
In the above formula, L_punc represents the punctuation predictor sub-model loss function value, h_punc represents the first context information of the word features, and MLE() represents a maximum likelihood estimation function.
S2010, obtaining a punctuation prediction loss function value based on the self-encoder loss function value and the punctuation predictor sub-model loss function value.
Obtaining the punctuation prediction loss function value based on the self-encoder loss function value and the punctuation predictor sub-model loss function value may include, but is not limited to:
adding the self-encoder loss function value and the punctuation predictor sub-model loss function value to obtain the punctuation prediction loss function value.
Another implementation of obtaining the punctuation prediction loss function value based on the self-encoder loss function value and the punctuation predictor sub-model loss function value may be:
Calculating punctuation prediction loss function values using the following formula:
L = γ × L_punc + (1 - γ) × L_ae
In the above formula, L_punc represents the punctuation predictor sub-model loss function value, L_ae represents the self-encoder loss function value, γ represents a hyperparameter with a value range of 0 to 1, and L represents the punctuation prediction loss function value.
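A numeric sketch of this weighted combination (the loss values and γ below are arbitrary illustrative numbers):

    import torch

    L_punc = torch.tensor(0.5)  # punctuation predictor sub-model loss function value
    L_ae = torch.tensor(0.2)    # self-encoder loss function value
    gamma = 0.6                 # hyperparameter, value range 0 to 1
    L = gamma * L_punc + (1 - gamma) * L_ae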
S2011, determining an error correction loss function value based on the difference between the error correction result of the word features and the error correction labels annotated for the words.
S2012, obtaining a comprehensive loss function value based on the error correction loss function value and the punctuation prediction loss function value.
The detailed process of step S2012 can be referred to the related description of step S10363 in embodiment 1, and is not repeated here.
S2013, judging whether the comprehensive loss function value is converged or not.
If not, go to step S2014.
Steps S2008-S2013 are a specific implementation of step S1036 in example 1.
S2014, updating the parameters of the punctuation prediction model and the parameters of the speech recognition error correction model, and returning to step S2001 until the training end condition is met.
In this embodiment, the training ending condition may be set as required, which is not limited in this application. For example, the training end condition may be, but is not limited to: the loss function value of the punctuation prediction model is converged and the loss function value of the speech recognition error correction model is converged; or, the obtained comprehensive loss function value is converged based on the loss function value of the punctuation prediction model and the loss function value of the voice recognition error correction model.
In this embodiment, on the basis of ensuring the accuracy of the first context information of the word feature, the accuracy of the third context information of the word feature can be ensured, so that the training accuracy of the speech recognition error correction model is ensured.
Corresponding to the foregoing training process of the self-encoder, the punctuation predictor sub-model, and the speech recognition error correction model, referring to fig. 5, which is a flow chart of a recognition method provided in embodiment 5 of the present application, this embodiment is mainly a refinement of the recognition method described in embodiment 1. The method may include, but is not limited to, the following steps:
step S501, obtaining word characteristics of each word in the text to be processed, which is recognized by the speech recognition system.
The detailed process of step S501 may be referred to the related description of step S101 in embodiment 1, and will not be described herein.
Step S502, inputting the word features into the self-encoder, and obtaining the parameters used by the intermediate layer of the self-encoder when processing the first subcontext information of the to-be-processed word feature, wherein the to-be-processed word feature is the word feature immediately preceding the current word feature in the text to be processed.
Step S503, obtaining the feature to be used based on the word feature and the parameters used when the intermediate layer of the self-encoder processes the first subcontext information of the to-be-processed word feature.
Obtaining the feature to be used based on the word feature and the parameters used when the intermediate layer of the self-encoder processes the first subcontext information of the to-be-processed word feature includes:
multiplying the parameters used by the intermediate layer of the self-encoder when processing the first subcontext information of the to-be-processed word feature with the word feature, to obtain the feature to be used.
Of course, obtaining the feature to be used in this way may also include:
inputting the word feature and the parameters used when the intermediate layer of the self-encoder processes the first subcontext information of the to-be-processed word feature into a second machine learning model for feature fusion, to obtain the feature to be used output by the second machine learning model.
Step S504, inputting the feature to be used into the intermediate layer of the punctuation predictor sub-model to obtain the first context information of the word feature, wherein the intermediate layer of the punctuation predictor sub-model is used for processing the feature to be used.
Steps S502-S504 are a specific implementation of step S102 in embodiment 1.
Step S505, inputting the word feature into the speech recognition error correction model, and obtaining the second context information of the word feature produced by the speech recognition error correction model.
The speech recognition error correction model is obtained by training with training data, wherein the training data comprises data for correcting errors of training texts and context information obtained by the punctuation prediction model based on training samples.
Step S506, determining the third context information of the word feature based on the first context information and the second context information of the word feature.
Step S507, inputting the word feature and the third context information of the word feature into the speech recognition error correction model to obtain the text produced after the speech recognition error correction model performs error correction processing on the text to be processed.
For the detailed procedure of steps S505-S507, reference may be made to the related description of steps S103-S105 in embodiment 1; it will not be repeated here.
In this embodiment, the feature to be used both conforms to the feature space required by the punctuation prediction sub-model and retains the characteristic information of the word feature in the original feature space. Inputting the feature to be used into the intermediate layer of the punctuation prediction sub-model therefore ensures that the intermediate layer can process it without losing that characteristic information. This guarantees the accuracy of the first context information of the word feature and, in turn, improves the error correction accuracy of the speech recognition error correction model.
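Putting steps S501-S507 together, the following is a minimal end-to-end sketch, assuming PyTorch-style modules; the module names and interfaces (`autoencoder.intermediate_parameters`, `punct_submodel.intermediate`, `correction_model.context`, `correction_model.correct`) are illustrative assumptions, not APIs defined in this application.

```python
import torch

def recognize(word_feats, autoencoder, punct_submodel, correction_model):
    # S501: `word_feats` holds one feature vector per word of the
    # to-be-processed text output by the speech recognition system.
    # S502: obtain the parameters used by the self-encoder's intermediate
    # layer for the preceding word feature's first sub-context information.
    W_mid = autoencoder.intermediate_parameters(word_feats)
    # S503: derive the feature to be used (option 1: multiplication).
    feats_to_use = word_feats @ W_mid
    # S504: first context information from the intermediate layer of the
    # punctuation prediction sub-model.
    first_ctx = punct_submodel.intermediate(feats_to_use)
    # S505: second context information from the error correction model.
    second_ctx = correction_model.context(word_feats)
    # S506: third context information (here fused by concatenation).
    third_ctx = torch.cat([first_ctx, second_ctx], dim=-1)
    # S507: error-corrected text from the error correction model.
    return correction_model.correct(word_feats, third_ctx)
```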
Corresponding to the embodiments of the identification method provided in this application, this application further provides an embodiment of an electronic device that applies the identification method.
Fig. 6 is a schematic structural diagram of embodiment 1 of an electronic device provided in the present application; as shown in fig. 6, the electronic device may include the following structures:
memory 100 and processor 200.
a memory 100 for storing at least one instruction set; and
a processor 200 for calling and executing the instruction set in the memory 100, the processor performing the following by executing the instruction set:
acquiring word characteristics of each word in the text to be processed, which is recognized by the voice recognition system;
inputting the word characteristics into a punctuation prediction model to obtain first context information of the word characteristics obtained by the punctuation prediction model;
inputting the word characteristics into a voice recognition error correction model to obtain second context information of the word characteristics obtained by the voice recognition error correction model, wherein the voice recognition error correction model is obtained by training with training data, and the training data comprises data for correcting errors of a training text and context information obtained by the punctuation prediction model based on training samples;
determining third context information of the word feature based on the first context information and the second context information of the word feature;
and inputting the word characteristics and third context information of the word characteristics into the voice recognition error correction model to obtain a text obtained by the voice recognition error correction model after error correction processing is carried out on the text to be processed.
Corresponding to the embodiment of the identification method provided by the application, the application also provides an embodiment of the identification device.
In this embodiment, the identifying device may include:
the acquisition module is used for acquiring word characteristics of each word in the text to be processed, which is recognized by the voice recognition system;
the first obtaining module is used for inputting the word characteristics into a punctuation prediction model and obtaining first context information of the word characteristics obtained by the punctuation prediction model;
the second obtaining module is used for inputting the word characteristics into a voice recognition error correction model to obtain second context information of the word characteristics obtained by the voice recognition error correction model, wherein the voice recognition error correction model is obtained by training with training data, and the training data comprises data for correcting errors of a training text and context information obtained by the punctuation prediction model based on training samples;
the determining module is used for determining third context information of the word feature based on the first context information and the second context information of the word feature;
and the third obtaining module is used for inputting the word characteristics and third context information of the word characteristics into the voice recognition error correction model to obtain a text obtained by the voice recognition error correction model after carrying out error correction on the text to be processed.
In this embodiment, the determining module may specifically be configured to:
performing splicing processing on the first context information and the second context information of the word characteristics to obtain the third context information;
or, performing dot product operation processing on the first context information and the second context information of the word characteristics to obtain the third context information;
or, inputting the first context information and the second context information of the word characteristics into a first machine learning model for feature fusion to obtain the third context information output by the first machine learning model, as sketched below.
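For illustration, the following is a minimal sketch of the three fusion strategies of the determining module, assuming PyTorch. The "dot product operation" is interpreted here as an element-wise product so that the result remains a vector (an assumption; the two context vectors are assumed to have the same dimension), and the first machine learning model is a hypothetical single linear layer.

```python
import torch
import torch.nn as nn

def fuse_by_splicing(first_ctx: torch.Tensor,
                     second_ctx: torch.Tensor) -> torch.Tensor:
    # Splicing: concatenate the two context vectors.
    return torch.cat([first_ctx, second_ctx], dim=-1)

def fuse_by_dot_product(first_ctx: torch.Tensor,
                        second_ctx: torch.Tensor) -> torch.Tensor:
    # Dot product operation, read here as an element-wise product
    # (a true inner product would collapse the contexts to a scalar).
    return first_ctx * second_ctx

class FirstMLModel(nn.Module):
    # Feature fusion via a first machine learning model (a hypothetical
    # single linear layer over the concatenated contexts).
    def __init__(self, d_ctx: int):
        super().__init__()
        self.fuse = nn.Linear(2 * d_ctx, d_ctx)

    def forward(self, first_ctx: torch.Tensor,
                second_ctx: torch.Tensor) -> torch.Tensor:
        return self.fuse(torch.cat([first_ctx, second_ctx], dim=-1))
```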
In this embodiment, the punctuation prediction model may include a punctuation prediction sub-model and a self-encoder;
accordingly, the first obtaining module may be specifically configured to:
inputting the word characteristics into the self-encoder, and obtaining the parameters used by an intermediate layer of the self-encoder when processing first sub-context information of the word characteristics to be processed, wherein the word characteristics to be processed are the first word characteristics arranged before the current word characteristics in the text to be processed;
obtaining the feature to be used based on the word characteristics and the parameters used when the intermediate layer of the self-encoder processes the first sub-context information of the word characteristics to be processed;
inputting the feature to be used into the intermediate layer of the punctuation prediction sub-model to obtain the first context information of the word characteristics, wherein the intermediate layer of the punctuation prediction sub-model is used for processing the feature to be used.
In this embodiment, the process by which the first obtaining module obtains the feature to be used based on the word characteristics and the parameters used when the intermediate layer of the self-encoder processes the first sub-context information of the word characteristics to be processed may specifically be:
multiplying the word characteristics by the parameters used by the intermediate layer of the self-encoder when processing the first sub-context information of the word characteristics to be processed, so as to obtain the feature to be used;
or, inputting the word characteristics and the parameters used by the intermediate layer of the self-encoder when processing the first sub-context information of the word characteristics to be processed into a second machine learning model for feature fusion, so as to obtain the feature to be used output by the second machine learning model.
Corresponding to the above-mentioned embodiment of the identification method provided by the present application, the present application further provides an embodiment of a storage medium.
In this embodiment, a storage medium stores a computer program implementing the identification method described in any one of the foregoing embodiments; when the computer program is executed by a processor, the steps of the identification method described in any one of the foregoing embodiments are implemented.
It should be noted that each embodiment emphasizes its differences from the other embodiments, and the same or similar parts between the embodiments may be referred to each other. The apparatus embodiments are described relatively simply because they are substantially similar to the method embodiments; for relevant points, reference is made to the description of the method embodiments.
Finally, it should also be noted that relational terms such as first and second are used herein solely to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
For convenience of description, the above devices are described as being divided into various units by function. Of course, when the present application is implemented, the functions of the units may be realized in one or more pieces of software and/or hardware.
From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by software plus a necessary general-purpose hardware platform. Based on such understanding, the technical solutions of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The software product may be stored in a storage medium such as a ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments, or in some parts of the embodiments, of the present application.
The foregoing has described in detail the identification method, apparatus and electronic device provided in the present application. Specific examples have been used herein to illustrate the principles and embodiments of the present application, and the above descriptions of the embodiments are only intended to help understand the method and core idea of the present application. Meanwhile, those skilled in the art may make modifications to the specific embodiments and application scope in accordance with the idea of the present application; in view of the above, the contents of this description should not be construed as limiting the present application.

Claims (10)

1. An identification method, comprising:
acquiring word characteristics of each word in the text to be processed, which is recognized by the voice recognition system;
inputting the word characteristics into a punctuation prediction model to obtain first context information of the word characteristics obtained by the punctuation prediction model;
inputting the word characteristics into a voice recognition error correction model to obtain second context information of the word characteristics obtained by the voice recognition error correction model, wherein the voice recognition error correction model is obtained by training with training data, and the training data comprises data for correcting errors of training texts and the context information obtained by the punctuation prediction model based on training samples;
determining third context information of the word feature based on the first context information and the second context information of the word feature;
and inputting the word characteristics and third context information of the word characteristics into the voice recognition error correction model to obtain a text obtained by the voice recognition error correction model after error correction processing is carried out on the text to be processed.
2. The method of claim 1, wherein the determining third context information of the word feature based on the first context information and the second context information of the word feature comprises:
performing splicing processing on the first context information and the second context information of the word feature to obtain the third context information.
3. The method of claim 1, wherein the determining third context information of the word feature based on the first context information and the second context information of the word feature comprises:
performing dot product operation processing on the first context information and the second context information of the word feature to obtain the third context information.
4. The method of claim 1, wherein the determining third context information of the word feature based on the first context information and the second context information of the word feature comprises:
inputting the first context information and the second context information of the word feature into a first machine learning model for feature fusion to obtain the third context information output by the first machine learning model.
5. The method of claim 1, wherein the punctuation prediction model comprises a punctuation prediction sub-model and a self-encoder;
the inputting the word characteristics into a punctuation prediction model to obtain first context information of the word characteristics obtained by the punctuation prediction model comprises the following steps:
inputting the word characteristics into the self-encoder, and obtaining the parameters used by an intermediate layer of the self-encoder when processing first sub-context information of the word characteristics to be processed, wherein the word characteristics to be processed are the first word characteristics arranged before the current word characteristics in the text to be processed;
obtaining a feature to be used based on the word characteristics and the parameters used when the intermediate layer of the self-encoder processes the first sub-context information of the word characteristics to be processed;
inputting the feature to be used into the intermediate layer of the punctuation prediction sub-model to obtain the first context information of the word characteristics, wherein the intermediate layer of the punctuation prediction sub-model is used for processing the feature to be used.
6. The method of claim 5, wherein the obtaining a feature to be used based on the word characteristics and the parameters used by the intermediate layer of the self-encoder when processing the first sub-context information of the word characteristics to be processed comprises:
multiplying the word characteristics by the parameters used by the intermediate layer of the self-encoder when processing the first sub-context information of the word characteristics to be processed to obtain the feature to be used.
7. The method of claim 5, wherein the obtaining a feature to be used based on the word characteristics and the parameters used by the intermediate layer of the self-encoder when processing the first sub-context information of the word characteristics to be processed comprises:
inputting the word characteristics and the parameters used by the intermediate layer of the self-encoder when processing the first sub-context information of the word characteristics to be processed into a second machine learning model for feature fusion to obtain the feature to be used output by the second machine learning model.
8. An identification device, comprising:
the acquisition module is used for acquiring word characteristics of each word in the text to be processed, which is recognized by the voice recognition system;
the first obtaining module is used for inputting the word characteristics into a punctuation prediction model and obtaining first context information of the word characteristics obtained by the punctuation prediction model;
the second obtaining module is used for inputting the word characteristics into a voice recognition error correction model to obtain second context information of the word characteristics obtained by the voice recognition error correction model, wherein the voice recognition error correction model is obtained by training with training data, and the training data comprises data for correcting errors of a training text and context information obtained by the punctuation prediction model based on training samples;
the determining module is used for determining third context information of the word feature based on the first context information and the second context information of the word feature;
and the third obtaining module is used for inputting the word characteristics and third context information of the word characteristics into the voice recognition error correction model to obtain a text obtained by the voice recognition error correction model after carrying out error correction on the text to be processed.
9. An electronic device, comprising:
a memory and a processor;
a memory for storing at least one set of instructions;
a processor for calling and executing the instruction set in the memory, by executing the instruction set:
acquiring word characteristics of each word in the text to be processed, which is recognized by the voice recognition system;
inputting the word characteristics into a punctuation prediction model to obtain first context information of the word characteristics obtained by the punctuation prediction model;
inputting the word characteristics into a voice recognition error correction model to obtain second context information of the word characteristics obtained by the voice recognition error correction model, wherein the voice recognition error correction model is obtained by training with training data, and the training data comprises data for correcting errors of training texts and the context information obtained by the punctuation prediction model based on training samples;
determining third context information of the word feature based on the first context information and the second context information of the word feature;
and inputting the word characteristics and third context information of the word characteristics into the voice recognition error correction model to obtain a text obtained by the voice recognition error correction model after error correction processing is carried out on the text to be processed.
10. A storage medium storing a computer program implementing the identification method according to any one of claims 1-7, the computer program being executed by a processor to implement the steps of the identification method according to any one of claims 1-7.
CN202110281812.8A 2021-03-16 2021-03-16 Identification method, identification device, electronic equipment and storage medium Active CN113012701B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110281812.8A CN113012701B (en) 2021-03-16 2021-03-16 Identification method, identification device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113012701A CN113012701A (en) 2021-06-22
CN113012701B true CN113012701B (en) 2024-03-22

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113362811B (en) * 2021-06-30 2023-03-24 北京有竹居网络技术有限公司 Training method of voice recognition model, voice recognition method and device
CN114049885B (en) * 2022-01-12 2022-04-22 阿里巴巴达摩院(杭州)科技有限公司 Punctuation mark recognition model construction method and punctuation mark recognition model construction device

Citations (9)

Publication number Priority date Publication date Assignee Title
JP2009276495A (en) * 2008-05-14 2009-11-26 Nippon Telegr & Teleph Corp <Ntt> Incorrect speech recognition correction support device, its method, program and its recording medium
CN104484322A (en) * 2010-09-24 2015-04-01 新加坡国立大学 Methods and systems for automated text correction
CN105869634A (en) * 2016-03-31 2016-08-17 重庆大学 Field-based method and system for feeding back text error correction after speech recognition
CN108766437A (en) * 2018-05-31 2018-11-06 平安科技(深圳)有限公司 Audio recognition method, device, computer equipment and storage medium
CN110069143A (en) * 2018-01-22 2019-07-30 北京搜狗科技发展有限公司 A kind of information is anti-error to entangle method, apparatus and electronic equipment
CN110705264A (en) * 2019-09-27 2020-01-17 上海智臻智能网络科技股份有限公司 Punctuation correction method, punctuation correction apparatus, and punctuation correction medium
CN110765772A (en) * 2019-10-12 2020-02-07 北京工商大学 Text neural network error correction model after Chinese speech recognition with pinyin as characteristic
WO2020186778A1 (en) * 2019-03-15 2020-09-24 平安科技(深圳)有限公司 Error word correction method and device, computer device, and storage medium
CN112101032A (en) * 2020-08-31 2020-12-18 广州探迹科技有限公司 Named entity identification and error correction method based on self-distillation

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US11520992B2 (en) * 2018-03-23 2022-12-06 Servicenow, Inc. Hybrid learning system for natural language understanding

Non-Patent Citations (2)

Title
Localized detection of speech recognition errors; Svetlana Stoyanchev et al.; 2012 IEEE Spoken Language Technology Workshop (SLT); pp. 25-30 *
Analysis of grammar error correction algorithm model construction based on deep learning technology; Jing Yan'e; Information Technology, Vol. 44, No. 9; pp. 143-147 *

Also Published As

Publication number Publication date
CN113012701A (en) 2021-06-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant