CN113012701A - Identification method, identification device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN113012701A
CN113012701A (application CN202110281812.8A)
Authority
CN
China
Prior art keywords
word
context information
error correction
feature
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110281812.8A
Other languages
Chinese (zh)
Other versions
CN113012701B (en)
Inventor
刘俊帅
夏光敏
王进
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lenovo Beijing Ltd filed Critical Lenovo Beijing Ltd
Priority to CN202110281812.8A
Publication of CN113012701A
Application granted
Publication of CN113012701B
Legal status: Active

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/26: Speech to text systems
    • G10L15/08: Speech classification or search
    • G10L15/16: Speech classification or search using artificial neural networks
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Abstract

The application provides a recognition method, a recognition device, an electronic device, and a storage medium. A speech recognition error correction model is trained with training data that comprises both data for correcting a training text and context information obtained by a punctuation prediction model from the training samples. This makes the training data of the speech recognition error correction model richer, ensures that the model learns richer context information, and improves the precision of the speech recognition error correction model. On this basis, third context information of the word features is determined from the first context information and the second context information of the word features, and the word features together with their third context information are input into the speech recognition error correction model, which improves the accuracy of error correction. Further, because the punctuation prediction model performs punctuation prediction on a recognition result that has already been corrected to higher accuracy, the accuracy of punctuation prediction can also be improved.

Description

Identification method, identification device, electronic equipment and storage medium
Technical Field
The present application relates to the field of speech recognition technologies, and in particular, to a recognition method, a recognition device, an electronic device, and a storage medium.
Background
At present, the recognition result of the speech recognition system may contain some errors, and in order to improve the accuracy of the recognition result, the error correction module may be used to correct the recognition result of the speech recognition system.
However, existing error correction modules have limited precision, which results in low error correction accuracy.
Disclosure of Invention
The application provides the following technical scheme:
one aspect of the present application provides an identification method, including:
acquiring word features of each word in a text to be processed, which is recognized by a speech recognition system;
inputting the word features into a punctuation prediction model to obtain first context information of the word features obtained by the punctuation prediction model;
inputting the word features into a speech recognition error correction model to obtain second context information of the word features obtained by the speech recognition error correction model, wherein the speech recognition error correction model is obtained by training with training data, and the training data comprises data for correcting the training text and context information obtained by the punctuation prediction model based on the training samples;
determining third context information of the word feature based on the first context information and the second context information of the word feature;
and inputting the word characteristics and the third context information of the word characteristics into the voice recognition error correction model to obtain a text obtained by performing error correction processing on the text to be processed by the voice recognition error correction model.
Determining third context information of the word feature based on the first context information and the second context information of the word feature, including:
and splicing the first context information and the second context information of the word features to obtain third context information.
Obtaining third context information of the word feature based on the first context information and the second context information of the word feature, including:
and performing dot product operation processing on the first context information and the second context information of the word features to obtain third context information.
Obtaining third context information of the word feature based on the first context information and the second context information of the word feature, including:
and inputting the first context information and the second context information of the word features into a first machine learning model for feature fusion to obtain third context information output by the first machine learning model.
The punctuation prediction model comprises a punctuation prediction submodel and an autoencoder;
the inputting the word features into a punctuation prediction model to obtain first context information of the word features obtained by the punctuation prediction model includes:
inputting the word features into the autoencoder, and obtaining parameters used when an intermediate layer of the autoencoder processes first sub-context information of a word feature to be processed, wherein the word feature to be processed is the first word feature arranged before the word features in the text to be processed;
obtaining a feature to be used based on the word features and the parameters used when the intermediate layer of the autoencoder processes the first sub-context information of the word feature to be processed;
inputting the feature to be used into an intermediate layer of the punctuation prediction submodel, and obtaining the first context information of the word features by processing the feature to be used with the intermediate layer of the punctuation prediction submodel.
The obtaining of the feature to be used based on the word features and the parameters used when the intermediate layer of the autoencoder processes the first sub-context information of the word feature to be processed includes:
multiplying the word features by the parameters used when the intermediate layer of the autoencoder processes the first sub-context information of the word feature to be processed, to obtain the feature to be used.
The obtaining of the feature to be used based on the word features and the parameters used when the intermediate layer of the autoencoder processes the first sub-context information of the word feature to be processed includes:
inputting the word features and the parameters used when the intermediate layer of the autoencoder processes the first sub-context information of the word feature to be processed into a second machine learning model for feature fusion, and obtaining the feature to be used output by the second machine learning model.
Another aspect of the present application provides an identification apparatus, including:
the acquisition module is used for acquiring the word characteristics of each word in the text to be processed, which is identified by the voice recognition system;
the first obtaining module is used for inputting the word features into a punctuation prediction model and obtaining first context information of the word features obtained by the punctuation prediction model;
a second obtaining module, configured to input the word feature into a speech recognition error correction model, and obtain second context information of the word feature, where the second context information is obtained by the speech recognition error correction model, the speech recognition error correction model is obtained by training with training data, and the training data includes data for correcting a training text and context information obtained by the punctuation prediction model based on the training sample;
a determining module, configured to determine third context information of the word feature based on the first context information and the second context information of the word feature;
and the third obtaining module is used for inputting the word characteristics and third context information of the word characteristics into the voice recognition error correction model to obtain a text obtained after the voice recognition error correction model performs error correction processing on the text to be processed.
A third aspect of the present application provides an electronic device comprising:
a memory and a processor.
A memory for storing at least one set of instructions;
a processor for calling and executing the set of instructions in the memory, by executing the set of instructions:
acquiring word features of each word in a text to be processed, which is recognized by a speech recognition system;
inputting the word features into a punctuation prediction model to obtain first context information of the word features obtained by the punctuation prediction model;
inputting the word features into a speech recognition error correction model to obtain second context information of the word features obtained by the speech recognition error correction model, wherein the speech recognition error correction model is obtained by training with training data, and the training data comprises data for correcting the training text and context information obtained by the punctuation prediction model based on the training samples;
determining third context information of the word feature based on the first context information and the second context information of the word feature;
and inputting the word characteristics and the third context information of the word characteristics into the voice recognition error correction model to obtain a text obtained by performing error correction processing on the text to be processed by the voice recognition error correction model.
A fourth aspect of the present application provides a storage medium storing a computer program which, when executed by a processor, implements the steps of the identification method according to any one of the above.
Compared with the prior art, the beneficial effects of the present application are:
in the present application, the speech recognition error correction model is trained with training data that comprises both data for correcting the training text and context information obtained by the punctuation prediction model from the training samples. The training data of the speech recognition error correction model is therefore richer, the model can learn richer context information, and the precision of the speech recognition error correction model is improved. On this basis, the third context information of the word features is determined from the first context information and the second context information of the word features, and the word features together with their third context information are input into the speech recognition error correction model, so that the accuracy of error correction of the speech recognition error correction model can be improved.
And on the basis of improving the accuracy of error correction of the identification result, the punctuation prediction model carries out punctuation prediction on the identification result with higher accuracy, so that the accuracy of punctuation prediction can be improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive labor.
Fig. 1 is a schematic flow chart of an identification method provided in embodiment 1 of the present application;
fig. 2 is a schematic flowchart of an identification method provided in embodiment 2 of the present application;
fig. 3 is a schematic flowchart of an identification method provided in embodiment 3 of the present application;
fig. 4 is a schematic flowchart of an identification method provided in embodiment 4 of the present application;
fig. 5 is a schematic flowchart of an identification method provided in embodiment 5 of the present application;
fig. 6 is a schematic structural diagram of an electronic device provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In order to solve the above problem, the present application provides an identification method, and the identification method provided by the present application is described next.
Referring to fig. 1, a schematic flow chart of an identification method provided in embodiment 1 of the present application is shown, where the identification method provided in the present application may be applied to an electronic device, and the present application does not limit a product type of the electronic device, and as shown in fig. 1, the method may include, but is not limited to, the following steps:
step S101, obtaining word characteristics of each word in the text to be processed, which is recognized by the voice recognition system.
The process of obtaining the word feature of each word in the text to be processed recognized by the speech recognition system may include, but is not limited to:
when the speech recognition system recognizes the text to be processed, the word features of each word in the text to be processed are extracted.
Of course, the process of obtaining the word feature of each word in the text to be processed, which is recognized by the speech recognition system, may also include:
and searching word characteristics corresponding to each word in the text to be processed identified by the voice recognition system from a pre-constructed word characteristic database.
The word feature database is constructed in advance by extracting word features from a large number of texts and recording the mapping relations between the extracted word features and the words. This embodiment does not limit how the large number of texts is obtained: the texts may be downloaded from the network, or the texts recognized by the speech recognition system may be used as a specific implementation of obtaining a large amount of text.
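As a minimal sketch of the database lookup described above, the word feature database can be viewed as a mapping from words to feature vectors; every name and value below is illustrative, not taken from the patent:

```python
# Hypothetical word feature database: a mapping from words to feature vectors.
word_feature_db = {
    "speech": [0.12, 0.85, 0.33],
    "recognition": [0.41, 0.07, 0.66],
}

def lookup_features(text_words, db, dim=3):
    """Return the stored feature vector for each word; fall back to a
    zero vector for words missing from the database."""
    return [db.get(w, [0.0] * dim) for w in text_words]

features = lookup_features(["speech", "recognition", "error"], word_feature_db)
```

In practice the fallback for out-of-database words would more likely be a learned unknown-word embedding than a zero vector; the zero vector simply keeps the sketch self-contained.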
Step S102, inputting the word features into the punctuation prediction model, and obtaining first context information of the word features obtained by the punctuation prediction model.
The punctuation prediction model has the capability of punctuation prediction and of obtaining context information of word features. Specifically, when the word features are input into the punctuation prediction model, the punctuation prediction model can obtain both a punctuation prediction result and the first context information of the word features.
The punctuation prediction model may be, but is not limited to: a unidirectional long short-term memory (LSTM) recurrent neural network model or a bidirectional long short-term memory (BiLSTM) recurrent neural network model.
If the punctuation prediction model is a bidirectional long short-term memory (BiLSTM) recurrent neural network model, the word features are input into the punctuation prediction model, and the first context information of the word features obtained by the punctuation prediction model can be determined by, but is not limited to, the following formula:

h_t^punc = BiLSTM(x_{t+1}, h_{t+1}^punc)

In the above formula, h_t^punc represents the first context information of the word feature of the t-th word in the text to be processed, BiLSTM represents the bidirectional long short-term memory recurrent neural network model, x_{t+1} represents the word feature of the (t+1)-th word in the text to be processed, and h_{t+1}^punc represents the first context information of the word feature of the (t+1)-th word in the text to be processed.
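The recurrence above runs right to left: the context of word t is built from the next word's feature and the next word's context. A minimal numeric sketch, using a plain tanh cell in place of a full BiLSTM backward pass (weights, dimensions, and the zero padding at the sequence end are all illustrative assumptions):

```python
import numpy as np

def backward_context(xs, W_x, W_h):
    """Compute h_t from x_{t+1} and h_{t+1}, right to left, mirroring
    h_t = BiLSTM(x_{t+1}, h_{t+1}); zero vectors stand in for the
    out-of-range x and h past the last word."""
    T = len(xs)
    hidden = W_h.shape[0]
    h_next = np.zeros(hidden)
    hs = [None] * T
    for t in range(T - 1, -1, -1):
        x_next = xs[t + 1] if t + 1 < T else np.zeros(W_x.shape[1])
        h_t = np.tanh(W_x @ x_next + W_h @ h_next)
        hs[t] = h_t
        h_next = h_t
    return hs
```

A real BiLSTM cell also carries gates and a cell state; the sketch only shows the direction of information flow that the formula specifies.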
The word features are input into the punctuation prediction model, and in addition to the first context information of the word features, the punctuation prediction result obtained by the punctuation prediction model can also be obtained. The punctuation prediction result can be determined by the following formula:

y^punc = softmax(h^punc)

In the above formula, y^punc represents the punctuation prediction result, softmax() represents a probability normalization function, and h^punc represents the first context information of the word features in the text to be processed.
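A small sketch of this normalization step; the three punctuation classes and the logit values are hypothetical, not from the patent:

```python
import numpy as np

def softmax(h):
    # Subtract the max before exponentiating, for numerical stability.
    e = np.exp(h - np.max(h))
    return e / e.sum()

# Hypothetical logits over three punctuation classes: none, comma, period.
h_punc = np.array([2.0, 0.5, -1.0])
y_punc = softmax(h_punc)
predicted_class = int(np.argmax(y_punc))  # index of the most likely punctuation
```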
And step S103, inputting the word characteristics into the voice recognition error correction model, and obtaining second context information of the word characteristics obtained by the voice recognition error correction model.
In this embodiment, the speech recognition error correction model may be, but is not limited to: a unidirectional long short-term memory (LSTM) recurrent neural network model or a bidirectional long short-term memory (BiLSTM) recurrent neural network model.
In the case that the speech recognition error correction model is a bidirectional long short-term memory (BiLSTM) recurrent neural network model, the word features are input into the speech recognition error correction model, and the second context information of the word features obtained by the speech recognition error correction model can be determined by the following formula:

h_t^ec = BiLSTM(x_{t+1}, h_{t+1}^ec)

In the above formula, h_t^ec represents the second context information corresponding to the word feature of the t-th word in the text to be processed, BiLSTM represents the bidirectional long short-term memory recurrent neural network model, x_{t+1} represents the word feature of the (t+1)-th word in the text to be processed, and h_{t+1}^ec represents the second context information corresponding to the word feature of the (t+1)-th word in the text to be processed.
The speech recognition error correction model is obtained by training with training data, where the training data comprises data for correcting the training text and context information obtained by the punctuation prediction model based on the training samples.
Specifically, the training process of the speech recognition error correction model may include:
and S1031, obtaining word features of each word in the training text, punctuation marks of the training text and error correction labels labeled for each word.
S1032, inputting the plurality of word features into the punctuation prediction model, and obtaining punctuation prediction results obtained by the punctuation prediction model and first context information of the word features.
In this embodiment, the parameters of the punctuation prediction model may be obtained by training on a plurality of complete training texts in advance. Of course, the parameters of the punctuation prediction model may also be initially set parameters that have not yet been trained on complete training samples.
S1033, inputting the plurality of word features into the speech recognition error correction model, and obtaining second context information of each word feature obtained by the speech recognition error correction model.
S1034, determining third context information of the word features based on the first context information and the second context information of the word features.
If the parameters of the punctuation prediction model are obtained by training a plurality of complete training texts in advance, the punctuation prediction model is proved to have learned richer context information, so that a plurality of word features are input into the punctuation prediction model, the accuracy of the first context information of each word feature obtained by the obtained punctuation prediction model is higher, and the richness and the accuracy of the third context information of the word features can be further ensured.
The process of determining the third context information of the word feature based on the first context information and the second context information of the word feature may include, but is not limited to:
s10341, the first context information and the second context information of the word features are spliced to obtain third context information.
For example, step S10341 is described, where for example, if the first context information of the word feature is [ p1, p2, p3, …, pn ], the second context information of the word feature is [ e1, e2, e3, …, en ], and the first context information and the second context information of the word feature are subjected to a concatenation process to obtain third context information [ p1, p2, p3, …, pn, e1, e2, e3, …, en ].
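The splicing example above can be reproduced directly; the vector values are the illustrative placeholders p1..pn and e1..en from the text, made concrete:

```python
import numpy as np

p = np.array([0.1, 0.2, 0.3])  # first context information (illustrative values)
e = np.array([0.4, 0.5, 0.6])  # second context information (illustrative values)
third = np.concatenate([p, e])  # splicing keeps every component of both vectors
```

Note that the result has the combined dimensionality of both inputs, which is why, as the next paragraph says, no information from either context vector is lost.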
And splicing the first context information and the second context information of the word characteristics to obtain third context information, wherein the first context information and the second context information are not lost, and the training precision of the speech recognition error correction model is further ensured to be improved.
The process of determining the third context information of the word feature based on the first context information and the second context information of the word feature may also include, but is not limited to:
s10342, performing dot product operation processing on the first context information and the second context information of the word features to obtain third context information.
And performing dot product operation processing on the first context information and the second context of the word features to obtain third context information, so that the operation time can be saved, the efficiency of obtaining the third context information is improved, and the training efficiency is improved while the training precision of the speech recognition error correction model is improved.
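The patent does not spell out the exact operation, but for the result to remain a context vector, the "dot product operation" is most naturally read as an element-wise (Hadamard) product; that reading is an assumption in this sketch:

```python
import numpy as np

p = np.array([0.1, 0.2, 0.3])  # first context information (illustrative)
e = np.array([0.4, 0.5, 0.6])  # second context information (illustrative)
# Element-wise product: same dimensionality as each input, so the fused
# vector is half the size of the concatenation and cheaper to process.
third = p * e
```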
Alternatively, the process of determining the third context information of the word feature based on the first context information and the second context information of the word feature may also include, but is not limited to:
s10343, inputting the first context information and the second context information of the word feature into the first machine learning model for feature fusion, and obtaining third context information output by the first machine learning model.
When the first context information and the second context information of the word features are input into the first machine learning model for feature fusion, the first machine learning model and the voice recognition error correction model are trained together, the accuracy of the training of the first machine learning model is ensured, the accuracy of the third context information output by the first machine learning model is further ensured, and the training accuracy of the voice recognition error correction model is improved on the basis of ensuring the richness and the accuracy of training data of the voice recognition error correction model.
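The patent does not fix the architecture of the first machine learning model; as one hypothetical instance, a single dense layer over the concatenated contexts would suffice (all weights and dimensions below are illustrative):

```python
import numpy as np

def fuse(p, e, W, b):
    """One hypothetical fusion layer standing in for the first machine
    learning model: third = tanh(W @ [p; e] + b)."""
    return np.tanh(W @ np.concatenate([p, e]) + b)

rng = np.random.default_rng(1)
W = rng.standard_normal((3, 6))  # fuses two 3-d contexts into one 3-d vector
b = np.zeros(3)
third = fuse(np.ones(3), np.ones(3), W, b)
```

Unlike splicing or the element-wise product, the weights W and b here are trainable, which is why the text trains this model jointly with the speech recognition error correction model.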
And S1035, inputting the third context information and the word characteristics into the voice recognition error correction model, and obtaining an error correction result output by the voice recognition error correction model, wherein the error correction result is a result of correcting the word characteristics.
S1036, judging whether the punctuation prediction model and the voice recognition error correction model meet the training end conditions or not based on the punctuation prediction result, the error correction result of each word feature, punctuation symbols of the training text and the error correction label marked for each word.
If not, step S1037 is performed.
In this embodiment, the training end condition may be set as needed, and is not limited in this application. For example, the end-of-training condition may be, but is not limited to: the loss function value of the punctuation prediction model is converged and the loss function value of the speech recognition error correction model is converged; or, the obtained comprehensive loss function value is converged based on the loss function value of the punctuation prediction model and the loss function value of the speech recognition error correction model.
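One hedged sketch of such a convergence test, treating the loss as converged when its change stays small for several consecutive steps (the threshold and patience values are illustrative choices, not from the patent):

```python
def has_converged(loss_history, eps=1e-4, patience=3):
    """Illustrative end-of-training test: converged when the loss changes
    by less than eps over each of the last `patience` steps."""
    if len(loss_history) <= patience:
        return False
    recent = loss_history[-(patience + 1):]
    return all(abs(recent[i + 1] - recent[i]) < eps for i in range(patience))
```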
Steps S1031 to S1037 may be understood as training the punctuation prediction model while training the speech recognition error correction model, so as to implement joint learning of the speech recognition error correction model and the punctuation prediction model.
Based on the punctuation prediction result, the error correction result of each word feature, the punctuation marks of the training text and the error correction label labeled for each word, a specific implementation process for judging whether the punctuation prediction model and the speech recognition error correction model meet the training end condition can be as follows:
s10361, determining an error correction loss function value based on the error correction result of the word features and the difference between error correction labels marked for the words;
s10362, determining a punctuation loss function value based on the difference between punctuation prediction results and punctuation symbols of the training text;
s10363, obtaining a comprehensive loss function value based on the error correction loss function value and the punctuation loss function value.
Obtaining the comprehensive loss function value based on the error correction loss function value and the punctuation loss function value may include, but is not limited to:
and adding the error correction loss function value and the punctuation loss function value to obtain a comprehensive loss function value.
Of course, the synthetic loss function value is obtained based on the error correction loss function value and the punctuation loss function value, and may include, but is not limited to:
and calculating to obtain a comprehensive loss function value by using the following formula:
loss_cp = a × loss_ec + b × loss_punc

In the above formula, loss_cp represents the comprehensive loss function value, loss_ec represents the error correction loss function value, loss_punc represents the punctuation loss function value, and a and b are different weights that can be set as needed; the application does not limit the values of a and b.
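The weighted combination is direct to implement; the default weights below are illustrative, since the patent deliberately leaves a and b unspecified:

```python
def combined_loss(loss_ec, loss_punc, a=0.5, b=0.5):
    """loss_cp = a * loss_ec + b * loss_punc; the weights a and b are
    tunable and the defaults here are illustrative, not from the patent."""
    return a * loss_ec + b * loss_punc

loss_cp = combined_loss(1.2, 0.8)
```

Setting a = b recovers the plain sum mentioned earlier (up to a constant factor), while unequal weights let the joint training emphasize one task over the other.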
S10364, judging whether the comprehensive loss function value is converged.
And S1037, updating the parameters of the punctuation prediction model and the parameters of the speech recognition error correction model, and returning to execute the step S1031 until the training end condition is met.
And step S104, determining third context information of the word features based on the first context information and the second context information of the word features.
And determining third context information of the word features based on the first context information and the second context information of the word features, so that the third context information contains more context information than the second context information.
And S105, inputting the word characteristics and the third context information of the word characteristics into the voice recognition error correction model, and obtaining a text obtained after the voice recognition error correction model performs error correction on the text to be processed.
The process of inputting the word features and the third context information of the word features into the speech recognition error correction model to obtain a text obtained by the speech recognition error correction model after performing error correction processing on the text to be processed may include:
S1051, inputting the word features and the third context information of the word features into the speech recognition error correction model, and obtaining the context information of the word feature of the (t+1)-th word in the text to be processed by the following formula:

h_{t+1}^ec = BiLSTM(x_{t+1}, c_t)

In the above formula, h_{t+1}^ec represents the context information of the word feature of the (t+1)-th word in the text to be processed, BiLSTM represents the bidirectional long short-term memory recurrent neural network model, x_{t+1} represents the word feature of the (t+1)-th word in the text to be processed, and c_t represents the third context information of the word feature of the t-th word in the text to be processed.
S1052, obtaining the word characteristics of each word in the text to be processed after error correction processing by adopting the following formula:
yec = softmax(hec)

In the above formula, yec represents the word features after error correction processing, softmax() represents the probability normalization function, and hec represents the context information of the word features of the words in the text to be processed.
S1053, based on the word characteristics of each word in the text to be processed after error correction, obtaining the text obtained after error correction of the text to be processed.
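Steps S1051 and S1052 can be sketched as follows. The recurrent step here is a toy stand-in for one BiLSTM step, and the feature values, weights, and dimensions are all hypothetical; a real model uses learned weight matrices and full vocabulary-sized outputs:

```python
import math

def softmax(scores):
    # Probability normalization, as in yec = softmax(hec).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def toy_recurrent_step(x_t1, h3_t, w_x=0.5, w_h=0.5):
    # Stand-in for one BiLSTM step: combines the word feature of the
    # (t+1)-th word with the third context information of the t-th word.
    return [w_x * x + w_h * h for x, h in zip(x_t1, h3_t)]

# Toy word feature and third context information (hypothetical values).
x_next = [1.0, 0.0, 2.0]
h3_prev = [0.0, 1.0, 0.0]
h_ec = toy_recurrent_step(x_next, h3_prev)
y_ec = softmax(h_ec)  # distribution over corrected words; argmax picks one
```

The argmax over `y_ec` corresponds to step S1053: selecting the corrected word feature for each position of the text to be processed.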
In the present application, the speech recognition error correction model is trained with training data that includes data for correcting errors in the training text and context information obtained by the punctuation prediction model based on the training samples. The training data is therefore richer, the speech recognition error correction model can learn richer context information, and the accuracy of the speech recognition error correction model is improved. On this basis, the third context information of the word features is determined based on the first context information and the second context information of the word features, and the word features and the third context information are input into the speech recognition error correction model, which can improve the error correction accuracy of the speech recognition error correction model.
On the basis of the improved error correction accuracy of the recognition result, the punctuation prediction model performs punctuation prediction on a more accurate recognition result, so the accuracy of punctuation prediction can also be improved.
As another alternative embodiment of the present application, referring to fig. 2, a schematic flow chart of an identification method provided in embodiment 2 of the present application is provided, and this embodiment mainly relates to a refinement scheme of the identification method described in embodiment 1 above, as shown in fig. 2, the method may include, but is not limited to, the following steps:
step S201, acquiring word characteristics of each word in the text to be processed, which is recognized by the voice recognition system.
Step S202, inputting the word features into the punctuation prediction model, and obtaining first context information of the word features obtained by the punctuation prediction model.
Step S203, inputting the word characteristics into the speech recognition error correction model, and obtaining second context information of the word characteristics obtained by the speech recognition error correction model.
The speech recognition error correction model is trained with training data; the training data includes data for correcting errors in the training text and context information obtained by the punctuation prediction model based on the training samples.
The detailed processes of steps S201 to S203 can refer to the related descriptions of steps S101 to S103 in embodiment 1, and are not described herein again.
And S204, splicing the first context information and the second context information of the word features to obtain third context information.
For example, if the first context information of the word feature is [ p1, p2, p3, …, pn ], the second context information of the word feature is [ e1, e2, e3, …, en ], the first context information and the second context information of the word feature are spliced to obtain third context information [ p1, p2, p3, …, pn, e1, e2, e3, …, en ].
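The splicing of step S204 is plain vector concatenation, as in the example above. A minimal sketch (the context values are hypothetical):

```python
def splice(first_ctx, second_ctx):
    # Concatenate ("splice") the two context vectors so that neither
    # source of context information is lost.
    return list(first_ctx) + list(second_ctx)

p = [0.1, 0.2, 0.3]   # first context information (punctuation prediction model)
e = [0.4, 0.5, 0.6]   # second context information (error correction model)
third = splice(p, e)  # [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
```

Note that the spliced vector is twice as long as either input, so the downstream model must accept the larger dimensionality.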
Step S204 is a specific implementation manner of step S104 in embodiment 1.
Step S205, inputting the word characteristics and the third context information of the word characteristics into the speech recognition error correction model, and obtaining a text obtained after the speech recognition error correction model performs error correction processing on the text to be processed.
The detailed process of step S205 can refer to the related description of step S105 in embodiment 1, and is not described herein again.
In this embodiment, the first context information and the second context information of the word feature are spliced to obtain the third context information, so that the first context information and the second context information are not lost, and the accuracy of error correction of the speech recognition error correction model is further ensured.
As another alternative embodiment of the present application, referring to fig. 3, a schematic flow chart of an identification method provided in embodiment 3 of the present application is provided, and this embodiment mainly relates to a refinement scheme of the identification method described in the foregoing embodiment 1, as shown in fig. 3, the method may include, but is not limited to, the following steps:
step S301, word characteristics of each word in the text to be processed, which is recognized by the voice recognition system, are obtained.
Step S302, inputting the word features into the punctuation prediction model, and obtaining first context information of the word features obtained by the punctuation prediction model.
Step S303, inputting the word characteristics into the speech recognition error correction model, and obtaining second context information of the word characteristics obtained by the speech recognition error correction model.
The speech recognition error correction model is trained with training data; the training data includes data for correcting errors in the training text and context information obtained by the punctuation prediction model based on the training samples.
The detailed processes of steps S301 to S303 can refer to the related descriptions of steps S101 to S103 in embodiment 1, and are not described herein again.
Step S304, performing dot product operation processing on the first context information and the second context information of the word features to obtain third context information.
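One reading of the "dot product operation" of step S304 is an element-wise (Hadamard) product, which keeps the third context information the same length as the inputs; this interpretation and the values below are assumptions for illustration:

```python
def elementwise_product(first_ctx, second_ctx):
    # Element-wise multiplication of the two context vectors: cheap to
    # compute and dimension-preserving, unlike concatenation.
    return [a * b for a, b in zip(first_ctx, second_ctx)]

p = [1.0, 2.0, 3.0]   # first context information (hypothetical)
e = [0.5, 0.5, 2.0]   # second context information (hypothetical)
third = elementwise_product(p, e)  # [0.5, 1.0, 6.0]
```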
Step S304 is a specific implementation manner of step S104 in embodiment 1.
Step S305, inputting the word characteristics and the third context information of the word characteristics into the speech recognition error correction model, and obtaining a text obtained after the speech recognition error correction model performs error correction processing on the text to be processed.
The detailed process of step S305 can refer to the related description of step S105 in embodiment 1, and is not described herein again.
Performing a dot product operation on the first context information and the second context information of the word features to obtain the third context information saves computation time and improves the efficiency of obtaining the third context information. The error correction efficiency is thus improved while the accuracy of the text obtained after the speech recognition error correction model performs error correction processing on the text to be processed is preserved.
As another alternative embodiment of the present application, referring to fig. 4, a schematic flow chart of an identification method provided in embodiment 4 of the present application is provided, and this embodiment mainly relates to a refinement scheme of the identification method described in the foregoing embodiment 1, as shown in fig. 4, the method may include, but is not limited to, the following steps:
step S401, word characteristics of each word in the text to be processed identified by the voice recognition system are obtained.
Step S402, inputting the word features into the punctuation prediction model, and obtaining first context information of the word features obtained by the punctuation prediction model.
And S403, inputting the word features into the speech recognition error correction model, and obtaining second context information of the word features obtained by the speech recognition error correction model.
The speech recognition error correction model is trained with training data; the training data includes data for correcting errors in the training text and context information obtained by the punctuation prediction model based on the training samples.
The detailed processes of steps S401 to S403 can refer to the related descriptions of steps S101 to S103 in embodiment 1, and are not described herein again.
Step S404, inputting the first context information and the second context information of the word features into a first machine learning model for feature fusion to obtain third context information output by the first machine learning model.
Step S404 is a specific implementation manner of step S104 in embodiment 1.
Step S405, inputting the word characteristics and the third context information of the word characteristics into the speech recognition error correction model, and obtaining a text obtained after the speech recognition error correction model performs error correction processing on the text to be processed.
The detailed process of step S405 can refer to the related description of step S105 in embodiment 1, and is not described herein again.
The first context information and the second context information of the word features are input into the first machine learning model for feature fusion, and the first machine learning model outputs the third context information. This guarantees the accuracy of the third context information, and inputting accurate third context information into the speech recognition error correction model improves the error correction accuracy of the speech recognition error correction model.
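A minimal "first machine learning model" for the feature fusion of step S404 can be sketched as a single linear layer over the two context vectors. The weights and bias below are fixed toy values; in practice they would be learned during training:

```python
def learned_fusion(first_ctx, second_ctx, w1, w2, bias):
    # A single linear layer that fuses the two context vectors into
    # the third context information.
    return [w1 * a + w2 * b + bias for a, b in zip(first_ctx, second_ctx)]

third = learned_fusion([1.0, 0.0], [0.0, 1.0], w1=0.7, w2=0.3, bias=0.0)
# third == [0.7, 0.3]
```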
As another alternative embodiment of the present application, which is mainly a refinement of the recognition method described in embodiment 1 above, in this embodiment the punctuation prediction model may include a punctuation prediction submodel and a self-encoder. In the case that the punctuation prediction model includes a punctuation prediction submodel and a self-encoder, the training process of the speech recognition error correction model may include the following steps:
s2001, acquiring word features of each word in the training text, punctuation marks of the training text and an error correction label labeled for each word.
The detailed process of step S2001 may be referred to the related description of step S1031 in embodiment 1, and is not described herein again.
And S2002, inputting the word features into the self-encoder to obtain parameters used when the middle layer of the self-encoder processes the first subcontext information of the word features to be processed, wherein the word features to be processed are the first word features arranged in front of the word features in the training text.
A self-encoder can be understood as a machine learning model that learns characteristic information (e.g., punctuation distribution information) of an input object within its feature space.
The self-encoder may be, but is not limited to, a unidirectional long short-term memory (LSTM) recurrent neural network model or a bidirectional LSTM (BiLSTM) recurrent neural network model.

When the self-encoder is a unidirectional or bidirectional LSTM recurrent neural network model, the intermediate layer of the self-encoder can be understood as the hidden layer of that model.
In this embodiment, the intermediate layer of the self-encoder may process the word features by using the following formula to obtain the first subcontext information of the word features:

hae(t+1) = Wae · x(t+1) + Uae · hae(t)

In the above formula, Wae represents the parameters used when the intermediate layer of the self-encoder processes the word features, x(t+1) represents the word feature of the (t+1)-th word in the training text, Uae represents the parameters used when the intermediate layer of the self-encoder processes the first subcontext information of the word features, hae(t+1) represents the first subcontext information of the word feature of the (t+1)-th word in the training text, and hae(t) represents the first subcontext information of the word feature of the t-th word in the training text.
It can be understood that the word feature to be processed is one of the word features in the training text, and the first subcontext information of the word feature to be processed is also calculated by using the above formula.
And step S2003, obtaining the to-be-used characteristics based on parameters and word characteristics used when the middle layer of the self-encoder processes the first subcontext information of the to-be-processed word characteristics.
Obtaining the feature to be used based on the parameters used when the intermediate layer of the self-encoder processes the first subcontext information of the word feature to be processed, together with the word features, can be understood as follows: those parameters are used to map the word features into a feature space that meets the requirements of the punctuation prediction submodel. The resulting feature to be used then conforms to the feature space of the punctuation prediction submodel while still containing the characteristic information (such as punctuation distribution information) of the word features in the original feature space.
The feature to be used is obtained based on the parameters used when the intermediate layer of the self-encoder processes the word feature and the word feature, and may include but is not limited to:
and multiplying the parameters used when the middle layer of the self-encoder processes the first subcontext information of the word feature to be processed by the word feature to obtain the feature to be used.
Of course, the obtaining of the feature to be used based on the parameter and the word feature used when the middle layer of the self-encoder processes the first subcontext information of the feature to be processed may also include:
and inputting parameters and word characteristics used when the first subcontext information of the word characteristics to be processed in the middle layer of the self-encoder is processed into a second machine learning model for characteristic fusion to obtain the characteristics to be used output by the second machine learning model.
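Of the two variants above, the direct multiplication case can be sketched as a matrix-vector product. The parameter matrix and word feature below are toy values standing in for the real self-encoder parameters:

```python
def matvec(W, x):
    # Multiply the intermediate-layer parameters W (a matrix here) by the
    # word feature x to map it into the punctuation submodel's feature space.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

W_ae = [[1.0, 0.0], [0.0, 2.0]]   # hypothetical self-encoder parameters
x = [3.0, 4.0]                    # hypothetical word feature
feature_to_use = matvec(W_ae, x)  # [3.0, 8.0]
```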
And step S2004, inputting the features to be used into the middle layer of the punctuation prediction submodel, and processing the features to be used by the middle layer of the punctuation prediction submodel to obtain first context information of the word features.
Inputting the feature to be used into the intermediate layer of the punctuation prediction submodel, and processing the feature to be used by the intermediate layer of the punctuation prediction submodel to obtain the first context information of the word feature, which can be understood as:
inputting the feature to be used into the intermediate layer of the punctuation prediction submodel, and processing the feature to be used by the intermediate layer of the punctuation prediction submodel by using the following formula to obtain the first context information of the word features:

hpunc(t+1) = sigmoid(Wpunc · xu(t+1) + Upunc · hpunc(t))

In the above formula, Wpunc represents the parameters used when the intermediate layer of the punctuation prediction submodel processes the word features, xu(t+1) represents the feature to be used, hpunc(t+1) represents the first context information of the word feature of the (t+1)-th word in the training text, hpunc(t) represents the first context information of the word feature of the t-th word in the training text, Upunc represents the parameters used when the intermediate layer of the punctuation prediction submodel processes the first context information of the word features, and sigmoid() represents the sigmoid activation function.
Since the feature to be used conforms to the feature space required by the punctuation prediction submodel and contains the characteristic information of the word features in the original feature space, inputting it into the intermediate layer of the punctuation prediction submodel ensures that the intermediate layer can process it without losing that characteristic information, which ensures the accuracy of the first context information of the word features.
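The recurrence of step S2004 can be sketched with scalar toy parameters. The weights `w` and `u` and the input sequence are hypothetical; a real submodel uses learned matrices over vector-valued features:

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def punct_step(w, xu, u, h_prev):
    # One recurrent step of the punctuation prediction submodel:
    # h(t+1) = sigmoid(W * xu(t+1) + U * h(t)), scalar toy version.
    return sigmoid(w * xu + u * h_prev)

h = 0.0
for xu in [1.0, -1.0, 2.0]:  # features to be used, one per word (toy values)
    h = punct_step(w=1.0, xu=xu, u=0.5, h_prev=h)
```

The sigmoid keeps each step's output in (0, 1), so the first context information stays bounded as it is threaded through the sequence.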
Steps S2002 to S2004 are a specific implementation of step S1032 in embodiment 1.
In this embodiment, the parameters of the punctuation prediction submodel may be obtained by training on a plurality of complete training texts in advance. Of course, the parameters of the punctuation prediction submodel may also be initially set parameters that have not yet been trained on a complete training sample.
And S2005, inputting the plurality of word features into the voice recognition error correction model, and obtaining second context information of each word feature obtained by the voice recognition error correction model.
And S2006, determining third context information of the word features based on the first context information and the second context information of the word features.
If the parameters of the punctuation prediction submodel were obtained by training on a plurality of complete training texts in advance, the punctuation prediction submodel has already learned richer context information. Inputting the features to be used into the punctuation prediction submodel then yields more accurate first context information, which further ensures the richness and accuracy of the third context information of the word features.
And S2007, inputting the third context information and the word characteristics into the voice recognition error correction model, and obtaining an error correction result output by the voice recognition error correction model, wherein the error correction result is a result of correcting the word characteristics.
And S2008, determining a loss function value of the self-encoder based on the first subcontext information of each word feature obtained by the self-encoder.
The process of determining the self-encoder loss function value based on the first subcontext information of each word feature obtained by the self-encoder may include, but is not limited to:
the self-encoder loss function value is calculated using the following formula:

Lae = MLE(hae)

In the above formula, Lae represents the self-encoder loss function value, hae represents the first subcontext information of the word features, and MLE() represents the maximum likelihood estimation function.
And S2009, determining a punctuation prediction sub-model loss function value based on the first context information of each word feature.
The process of determining a punctuation predictor sub-model loss function value based on the first context information of each word feature may include:
calculating the loss function value of the punctuation predictor model by using the following formula:
Lpunc = MLE(hpunc)

In the above formula, Lpunc represents the loss function value of the punctuation prediction submodel, hpunc represents the first context information of the word features, and MLE() represents the maximum likelihood estimation function.
And S2010, obtaining a punctuation prediction loss function value based on the self-encoder loss function value and the punctuation prediction sub-model loss function value.
The punctuation prediction loss function value is obtained based on the self-encoder loss function value and the punctuation predictor model loss function value, which may include but is not limited to:
and adding the loss function value of the self-encoder and the loss function value of the punctuation prediction submodel to obtain a punctuation prediction loss function value.
Another embodiment of obtaining the punctuation prediction loss function value based on the self-encoder loss function value and the punctuation prediction submodel loss function value may be:
the punctuation prediction loss function value is calculated using the following formula:

L = γ·Lpunc + (1 − γ)·Lae

In the above formula, Lpunc represents the loss function value of the punctuation prediction submodel, Lae represents the loss function value of the self-encoder, γ represents a hyperparameter with a value range of 0 to 1, and L represents the punctuation prediction loss function value.
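The weighted combination of the two loss values is a one-line computation; the loss values and γ below are hypothetical:

```python
def punctuation_prediction_loss(l_punc, l_ae, gamma):
    # L = γ·Lpunc + (1 − γ)·Lae, with the hyperparameter γ in [0, 1]
    # trading off the submodel loss against the self-encoder loss.
    assert 0.0 <= gamma <= 1.0
    return gamma * l_punc + (1.0 - gamma) * l_ae

loss = punctuation_prediction_loss(l_punc=0.8, l_ae=0.4, gamma=0.5)
```

Setting γ = 1 reduces L to the punctuation submodel loss alone; γ = 0 trains only against the self-encoder loss.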
S2011, determining an error correction loss function value based on the error correction result of the word feature and the difference between the error correction labels labeled for the words.
And S2012, obtaining a comprehensive loss function value based on the error correction loss function value and the punctuation loss function value.
The detailed process of step S2012 can be referred to the related description of step S10363 in embodiment 1, and is not described herein again.
And S2013, judging whether the comprehensive loss function value is converged.
If not, go to step S2014.
Steps S2008 to S2013 are a specific implementation of step S1036 in embodiment 1.
And S2014, updating the parameters of the punctuation prediction model and the parameters of the speech recognition error correction model, and returning to execute the step S2001 until the training end condition is met.
In this embodiment, the training end condition may be set as needed and is not limited in this application. For example, the training end condition may be, but is not limited to: the loss function value of the punctuation prediction model converges and the loss function value of the speech recognition error correction model converges; or, the comprehensive loss function value, obtained based on the loss function value of the punctuation prediction model and the loss function value of the speech recognition error correction model, converges.
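The loop of steps S2001 through S2014 can be sketched as follows. The quadratic loss and the gradient step are placeholders for the real comprehensive loss and the real joint update of the punctuation prediction model and the speech recognition error correction model:

```python
def train(initial_params, step_fn, loss_fn, tol=1e-6, max_epochs=1000):
    # Skeleton of the joint training loop: compute the comprehensive
    # loss, test convergence, otherwise update the parameters of both
    # models and repeat until the end condition is met.
    params = initial_params
    prev_loss = float("inf")
    loss = loss_fn(params)
    for _ in range(max_epochs):
        loss = loss_fn(params)
        if abs(prev_loss - loss) < tol:  # comprehensive loss converged
            break
        prev_loss = loss
        params = step_fn(params)  # joint parameter update
    return params, loss

# Toy quadratic "loss" and gradient step, for illustration only.
params, final_loss = train(
    initial_params=4.0,
    step_fn=lambda p: p - 0.5 * 2.0 * p,  # gradient step on p**2, lr = 0.5
    loss_fn=lambda p: p * p,
)
```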
In this embodiment, on the basis of ensuring the accuracy of the first context information of the word feature, the accuracy of the third context information of the word feature can be ensured, so as to ensure the training accuracy of the speech recognition error correction model.
Corresponding to the training process of the above self-encoder, punctuation prediction submodel and speech recognition error correction model, referring to fig. 5, a flow chart of a recognition method provided in embodiment 5 of the present application is shown, and this embodiment mainly describes a refinement scheme of the recognition method described in embodiment 1, as shown in fig. 5, the method may include, but is not limited to, the following steps:
step S501, word characteristics of each word in the text to be processed, which is recognized by the voice recognition system, are obtained.
The detailed process of step S501 can refer to the related description of step S101 in embodiment 1, and is not described herein again.
Step S502, inputting the word characteristics into the self-encoder, and obtaining parameters used when the middle layer of the self-encoder processes the first subcontext information of the word characteristics to be processed, wherein the word characteristics to be processed is the first word characteristics arranged in front of the word characteristics in the text to be processed.
Step S503, parameters and word characteristics used when the first subcontext information of the word characteristics to be processed is processed based on the middle layer of the self-encoder are obtained, and the characteristics to be used are obtained.
The method for obtaining the feature to be used based on the parameters used when the middle layer of the self-encoder processes the first subcontext information of the feature to be processed and the word feature comprises the following steps:
and multiplying the parameters used when the middle layer of the self-encoder processes the first subcontext information of the word feature to be processed by the word feature to obtain the feature to be used.
Of course, the obtaining of the feature to be used based on the parameter and the word feature used when the middle layer of the self-encoder processes the first subcontext information of the feature to be processed may also include:
and inputting parameters and word characteristics used when the first subcontext information of the word characteristics to be processed in the middle layer of the self-encoder is processed into a second machine learning model for characteristic fusion to obtain the characteristics to be used output by the second machine learning model.
Step S504, inputting the characteristics to be used into the middle layer of the punctuation prediction submodel, and processing the characteristics to be used by the middle layer of the punctuation prediction submodel to obtain first context information of the word characteristics.
Steps S502 to S504 are a specific implementation of step S102 in embodiment 1.
And step S505, inputting the word characteristics into the voice recognition error correction model, and obtaining second context information of the word characteristics obtained by the voice recognition error correction model.
The speech recognition error correction model is trained with training data; the training data includes data for correcting errors in the training text and context information obtained by the punctuation prediction model based on the training samples.
Step S506, third context information of the word features is determined based on the first context information and the second context information of the word features.
Step S507, inputting the word characteristics and the third context information of the word characteristics into the voice recognition error correction model, and obtaining a text obtained after the voice recognition error correction model performs error correction processing on the text to be processed.
The detailed processes of steps S505 to S507 can be referred to the related descriptions of steps S103 to S105 in embodiment 1, and are not described herein again.
In this embodiment, since the feature to be used conforms to the feature space required by the punctuation prediction submodel and contains the characteristic information of the word features in the original feature space, inputting the feature to be used into the intermediate layer of the punctuation prediction submodel ensures that the intermediate layer can process it without losing that characteristic information. The accuracy of the first context information of the word features is thereby ensured, which further improves the error correction accuracy of the speech recognition error correction model.
Corresponding to the embodiment of the identification method provided by the application, the application also provides an embodiment of the electronic equipment applying the identification method.
As shown in fig. 6, which is a schematic structural diagram of an embodiment 1 of an electronic device provided in the present application, the electronic device may include the following structures:
a memory 100 and a processor 200.
A memory 100 for storing at least one set of instructions;
a processor 200 for calling and executing the set of instructions in the memory 100, and executing the set of instructions to:
acquiring word characteristics of each word in a text to be processed, which is identified by a voice identification system;
inputting the word features into a punctuation prediction model to obtain first context information of the word features obtained by the punctuation prediction model;
inputting the word features into a speech recognition error correction model to obtain second context information of the word features obtained by the speech recognition error correction model, wherein the speech recognition error correction model is obtained by training with training data, and the training data comprises data for correcting the training text and context information obtained by the punctuation prediction model based on the training samples;
determining third context information of the word feature based on the first context information and the second context information of the word feature;
and inputting the word characteristics and the third context information of the word characteristics into the voice recognition error correction model to obtain a text obtained by performing error correction processing on the text to be processed by the voice recognition error correction model.
Corresponding to the embodiment of the identification method provided by the application, the application also provides an embodiment of an identification device.
In this embodiment, the identification device may include:
the acquisition module is used for acquiring the word characteristics of each word in the text to be processed, which is identified by the voice recognition system;
the first obtaining module is used for inputting the word features into a punctuation prediction model and obtaining first context information of the word features obtained by the punctuation prediction model;
a second obtaining module, configured to input the word feature into a speech recognition error correction model, and obtain second context information of the word feature, where the second context information is obtained by the speech recognition error correction model, the speech recognition error correction model is obtained by training with training data, and the training data includes data for correcting a training text and context information obtained by the punctuation prediction model based on the training sample;
a determining module, configured to determine third context information of the word feature based on the first context information and the second context information of the word feature;
and the third obtaining module is used for inputting the word characteristics and third context information of the word characteristics into the voice recognition error correction model to obtain a text obtained after the voice recognition error correction model performs error correction processing on the text to be processed.
In this embodiment, the determining module may be specifically configured to:
splicing the first context information and the second context information of the word characteristics to obtain third context information;
or, performing dot product operation processing on the first context information and the second context information of the word features to obtain third context information;
or inputting the first context information and the second context information of the word features into a first machine learning model for feature fusion to obtain third context information output by the first machine learning model.
In this embodiment, the punctuation prediction model may include a punctuation prediction sub-model and a self-encoder;
accordingly, the first obtaining module may be specifically configured to:
inputting the word feature into the self-encoder, and acquiring parameters used when the middle layer of the self-encoder processes first sub-context information of a word feature to be processed, where the word feature to be processed is the first word feature arranged before the current word feature in the text to be processed;
obtaining a feature to be used based on the word feature and the parameters used when the middle layer of the self-encoder processes the first sub-context information of the word feature to be processed;
and inputting the feature to be used into the middle layer of the punctuation prediction submodel, where the middle layer of the punctuation prediction submodel processes the feature to be used to obtain the first context information of the word feature.
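The three steps just described can be wired together as in the sketch below. Every function here is an invented stand-in: the real self-encoder and punctuation prediction submodel are trained models, not these toy formulas.

```python
# Hypothetical sketch of the first obtaining module's flow: middle-layer
# parameters from the self-encoder (for the preceding word feature) are
# combined with the current word feature, and the punctuation prediction
# submodel's middle layer produces the first context information.

def self_encoder_middle_layer_params(prev_word_feature):
    # Stand-in: parameters the self-encoder's middle layer used when
    # processing the first sub-context information of the preceding word.
    return [0.5 for _ in prev_word_feature]

def build_feature_to_use(params, word_feature):
    # Option 1 from the description: multiplication of parameters
    # and the current word feature.
    return [p * w for p, w in zip(params, word_feature)]

def punctuation_submodel_middle_layer(feature_to_use):
    # Stand-in for the submodel's middle layer producing first context info.
    return [f + 1.0 for f in feature_to_use]

prev_word_feature = [1.0, 2.0]  # word feature arranged before the current one
word_feature = [2.0, 4.0]       # current word feature

params = self_encoder_middle_layer_params(prev_word_feature)
feature_to_use = build_feature_to_use(params, word_feature)
first_context = punctuation_submodel_middle_layer(feature_to_use)
```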
In this embodiment, the process in which the first obtaining module obtains the feature to be used, based on the word feature and the parameters used when the middle layer of the self-encoder processes the first sub-context information of the word feature to be processed, may specifically be:
multiplying the parameters used when the middle layer of the self-encoder processes the first sub-context information of the word feature to be processed by the word feature to obtain the feature to be used;
or inputting the parameters used when the middle layer of the self-encoder processes the first sub-context information of the word feature to be processed, together with the word feature, into a second machine learning model for feature fusion to obtain the feature to be used output by the second machine learning model.
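The two alternatives above can be contrasted directly. Both variants below are invented illustrations: the element-wise product is one plausible reading of the multiplication described, and a simple averaging formula stands in for the second machine learning model.

```python
# Hypothetical sketch of the two ways the feature to be used may be built
# from the self-encoder's middle-layer parameters and the word feature.

middle_layer_params = [0.5, 2.0, 1.0]  # params used for the preceding word
word_feature = [2.0, 3.0, 4.0]         # current word feature

# Option 1: multiply the parameters by the word feature (element-wise).
feature_to_use_mul = [p * w for p, w in zip(middle_layer_params, word_feature)]

# Option 2: a second machine learning model performing feature fusion;
# an element-wise average stands in for that model here.
def second_fusion_model(params, feature):
    return [0.5 * (p + w) for p, w in zip(params, feature)]

feature_to_use_fused = second_fusion_model(middle_layer_params, word_feature)
```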
Corresponding to the foregoing embodiments of the identification method, the present application further provides an embodiment of a storage medium.
In this embodiment, the storage medium stores a computer program which, when executed by a processor, implements the steps of the identification method according to any one of the foregoing embodiments.
It should be noted that the embodiments in this specification are described with emphasis on their differences from one another; for identical or similar parts among the embodiments, reference may be made to each other. Since the device embodiments are substantially similar to the method embodiments, they are described briefly, and for relevant details, reference may be made to the description of the method embodiments.
Finally, it should also be noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
For convenience of description, the above device is described as being divided into various units by function, which are described separately. Of course, when the present application is implemented, the functionality of the units may be realized in one or more pieces of software and/or hardware.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus a necessary general-purpose hardware platform. Based on such understanding, the technical solutions of the present application may, in essence or in the part contributing to the prior art, be embodied in the form of a software product, which may be stored in a storage medium such as a ROM/RAM, a magnetic disk, or an optical disk, and which includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments, or in certain parts of the embodiments, of the present application.
The foregoing is a detailed description of the identification method, identification device, electronic equipment, and storage medium provided by the present application. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help understand the method and core idea of the present application. Meanwhile, for those skilled in the art, changes may be made to the specific implementations and the application scope according to the idea of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (10)

1. An identification method, comprising:
acquiring word features of each word in a text to be processed that is recognized by a speech recognition system;
inputting the word features into a punctuation prediction model to obtain first context information of the word features obtained by the punctuation prediction model;
inputting the word features into a speech recognition error correction model to obtain second context information of the word features obtained by the speech recognition error correction model, wherein the speech recognition error correction model is trained with training data, and the training data comprises data for correcting the training text and context information obtained by the punctuation prediction model based on the training samples;
determining third context information of the word feature based on the first context information and the second context information of the word feature;
and inputting the word features and the third context information of the word features into the speech recognition error correction model to obtain a text resulting from error correction processing performed by the speech recognition error correction model on the text to be processed.
2. The method of claim 1, wherein the determining third context information of the word feature based on the first context information and the second context information of the word feature comprises:
splicing the first context information and the second context information of the word features to obtain the third context information.
3. The method of claim 1, wherein the determining third context information of the word feature based on the first context information and the second context information of the word feature comprises:
performing a dot product operation on the first context information and the second context information of the word features to obtain the third context information.
4. The method of claim 1, wherein the determining third context information of the word feature based on the first context information and the second context information of the word feature comprises:
inputting the first context information and the second context information of the word features into a first machine learning model for feature fusion to obtain the third context information output by the first machine learning model.
5. The method of claim 1, wherein the punctuation prediction model comprises a punctuation prediction submodel and a self-encoder;
the inputting the word features into a punctuation prediction model to obtain first context information of the word features obtained by the punctuation prediction model includes:
inputting the word features into the self-encoder, and obtaining parameters used when a middle layer of the self-encoder processes first sub-context information of word features to be processed, wherein the word features to be processed are the first word features arranged before the word features in the text to be processed;
obtaining a feature to be used based on the word features and the parameters used when the middle layer of the self-encoder processes the first sub-context information of the word features to be processed;
and inputting the feature to be used into a middle layer of the punctuation prediction submodel, wherein the middle layer of the punctuation prediction submodel processes the feature to be used to obtain the first context information of the word features.
6. The method according to claim 5, wherein the obtaining a feature to be used based on the word features and the parameters used when the middle layer of the self-encoder processes the first sub-context information of the word features to be processed comprises:
multiplying the parameters used when the middle layer of the self-encoder processes the first sub-context information of the word features to be processed by the word features to obtain the feature to be used.
7. The method according to claim 5, wherein the obtaining a feature to be used based on the word features and the parameters used when the middle layer of the self-encoder processes the first sub-context information of the word features to be processed comprises:
inputting the parameters used when the middle layer of the self-encoder processes the first sub-context information of the word features to be processed, together with the word features, into a second machine learning model for feature fusion to obtain the feature to be used output by the second machine learning model.
8. An identification device comprising:
an acquiring module, configured to acquire word features of each word in a text to be processed that is recognized by a speech recognition system;
a first obtaining module, configured to input the word features into a punctuation prediction model and obtain first context information of the word features obtained by the punctuation prediction model;
a second obtaining module, configured to input the word features into a speech recognition error correction model and obtain second context information of the word features obtained by the speech recognition error correction model, wherein the speech recognition error correction model is trained with training data, and the training data includes data for correcting a training text and context information obtained by the punctuation prediction model based on the training samples;
a determining module, configured to determine third context information of the word feature based on the first context information and the second context information of the word feature;
and a third obtaining module, configured to input the word features and the third context information of the word features into the speech recognition error correction model to obtain a text resulting from error correction processing performed by the speech recognition error correction model on the text to be processed.
9. An electronic device, comprising:
a memory and a processor;
a memory for storing at least one set of instructions;
a processor for calling and executing the set of instructions in the memory, by executing the set of instructions:
acquiring word features of each word in a text to be processed that is recognized by a speech recognition system;
inputting the word features into a punctuation prediction model to obtain first context information of the word features obtained by the punctuation prediction model;
inputting the word features into a speech recognition error correction model to obtain second context information of the word features obtained by the speech recognition error correction model, wherein the speech recognition error correction model is trained with training data, and the training data comprises data for correcting the training text and context information obtained by the punctuation prediction model based on the training samples;
determining third context information of the word feature based on the first context information and the second context information of the word feature;
and inputting the word features and the third context information of the word features into the speech recognition error correction model to obtain a text resulting from error correction processing performed by the speech recognition error correction model on the text to be processed.
10. A storage medium storing a computer program which, when executed by a processor, implements the steps of the identification method according to any one of claims 1 to 7.
CN202110281812.8A 2021-03-16 2021-03-16 Identification method, identification device, electronic equipment and storage medium Active CN113012701B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110281812.8A CN113012701B (en) 2021-03-16 2021-03-16 Identification method, identification device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110281812.8A CN113012701B (en) 2021-03-16 2021-03-16 Identification method, identification device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113012701A true CN113012701A (en) 2021-06-22
CN113012701B CN113012701B (en) 2024-03-22

Family

ID=76408405

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110281812.8A Active CN113012701B (en) 2021-03-16 2021-03-16 Identification method, identification device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113012701B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009276495A (en) * 2008-05-14 2009-11-26 Nippon Telegr & Teleph Corp <Ntt> Incorrect speech recognition correction support device, its method, program and its recording medium
CN104484322A (en) * 2010-09-24 2015-04-01 新加坡国立大学 Methods and systems for automated text correction
CN105869634A (en) * 2016-03-31 2016-08-17 重庆大学 Field-based method and system for feeding back text error correction after speech recognition
CN108766437A (en) * 2018-05-31 2018-11-06 平安科技(深圳)有限公司 Audio recognition method, device, computer equipment and storage medium
CN110069143A (en) * 2018-01-22 2019-07-30 北京搜狗科技发展有限公司 A kind of information is anti-error to entangle method, apparatus and electronic equipment
CN110705264A (en) * 2019-09-27 2020-01-17 上海智臻智能网络科技股份有限公司 Punctuation correction method, punctuation correction apparatus, and punctuation correction medium
CN110765772A (en) * 2019-10-12 2020-02-07 北京工商大学 Text neural network error correction model after Chinese speech recognition with pinyin as characteristic
WO2020186778A1 (en) * 2019-03-15 2020-09-24 平安科技(深圳)有限公司 Error word correction method and device, computer device, and storage medium
US20200327284A1 (en) * 2018-03-23 2020-10-15 Servicenow, Inc. Hybrid learning system for natural language understanding
CN112101032A (en) * 2020-08-31 2020-12-18 广州探迹科技有限公司 Named entity identification and error correction method based on self-distillation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SVETLANA STOYANCHEV et al.: "Localized detection of speech recognition errors", 2012 IEEE Spoken Language Technology Workshop (SLT), pages 25-30 *
JING YAN'E: "Analysis of Grammar Error Correction Algorithm Model Construction Based on Deep Learning Technology", Information Technology, vol. 44, no. 9, pages 143-147 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023273612A1 (en) * 2021-06-30 2023-01-05 北京有竹居网络技术有限公司 Training method and apparatus for speech recognition model, speech recognition method and apparatus, medium, and device
CN114049885A (en) * 2022-01-12 2022-02-15 阿里巴巴达摩院(杭州)科技有限公司 Punctuation mark recognition model construction method and punctuation mark recognition model construction device
CN114049885B (en) * 2022-01-12 2022-04-22 阿里巴巴达摩院(杭州)科技有限公司 Punctuation mark recognition model construction method and punctuation mark recognition model construction device

Also Published As

Publication number Publication date
CN113012701B (en) 2024-03-22

Similar Documents

Publication Publication Date Title
CN109471915B (en) Text evaluation method, device and equipment and readable storage medium
CN105068998B (en) Interpretation method and device based on neural network model
CN113012701B (en) Identification method, identification device, electronic equipment and storage medium
CN111951780B (en) Multitasking model training method for speech synthesis and related equipment
CN113297366B (en) Emotion recognition model training method, device, equipment and medium for multi-round dialogue
CN111291187B (en) Emotion analysis method and device, electronic equipment and storage medium
CN114492363B (en) Small sample fine adjustment method, system and related device
CN112434142B (en) Method for marking training sample, server, computing equipment and storage medium
CN111859967B (en) Entity identification method and device and electronic equipment
CN111858854A (en) Question-answer matching method based on historical dialogue information and related device
CN111724766B (en) Language identification method, related equipment and readable storage medium
JP2021530066A (en) Problem correction methods, devices, electronic devices and storage media for mental arithmetic problems
CN113642652A (en) Method, device and equipment for generating fusion model
CN112767921A (en) Voice recognition self-adaption method and system based on cache language model
CN111291552A (en) Method and system for correcting text content
CN113657098B (en) Text error correction method, device, equipment and storage medium
CN112966476B (en) Text processing method and device, electronic equipment and storage medium
CN113486174B (en) Model training, reading understanding method and device, electronic equipment and storage medium
CN111859948A (en) Language identification, language model training and character prediction method and device
CN113488023A (en) Language identification model construction method and language identification method
CN112183060A (en) Reference resolution method of multi-round dialogue system
CN115147849A (en) Training method of character coding model, character matching method and device
CN115454788A (en) Log anomaly detection method, device, equipment and storage medium
CN112989040B (en) Dialogue text labeling method and device, electronic equipment and storage medium
CN114896966A (en) Method, system, equipment and medium for positioning grammar error of Chinese text

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant