CN114912418A - Text error correction method and system based on multivariate semantic features and mask strategy - Google Patents


Info

Publication number
CN114912418A
CN114912418A (application CN202111457771.XA)
Authority
CN
China
Prior art keywords
error
text
encoder
alternative
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111457771.XA
Other languages
Chinese (zh)
Inventor
郭磊
边延风
褚国庆
邵亚红
余明
杨荣繁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianyi Digital Life Technology Co Ltd
Original Assignee
Tianyi Digital Life Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianyi Digital Life Technology Co Ltd filed Critical Tianyi Digital Life Technology Co Ltd
Priority to CN202111457771.XA priority Critical patent/CN114912418A/en
Publication of CN114912418A publication Critical patent/CN114912418A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/10 Text processing
    • G06F 40/12 Use of codes for handling textual entities
    • G06F 40/126 Character encoding
    • G06F 40/20 Natural language analysis
    • G06F 40/232 Orthographic correction, e.g. spell checking or vowelisation
    • G06F 40/237 Lexical tools
    • G06F 40/242 Dictionaries
    • G06F 40/30 Semantic analysis
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Abstract

The application provides a text error correction method and system based on a masking strategy and multivariate semantic feature encoding. The method comprises the following steps: taking a speech transcription text to be corrected as input, and determining candidate error locations in the text using a pre-constructed confusion dictionary; encoding the text with the encoder of a trained autoencoder based on multivariate semantic feature extraction to obtain continuous feature vectors; inputting the feature vectors into a trained error detection model based on a masking strategy to obtain an error occurrence confidence for each determined candidate error location; and, for each candidate error location whose error occurrence confidence is above a predefined threshold, correcting the text at that location using the confusion dictionary to output the corrected speech transcription text.

Description

Text error correction method and system based on multivariate semantic features and mask strategy
Technical Field
The application relates to the field of artificial intelligence, in particular to a text error correction method based on multivariate semantic features and a mask strategy.
Background
With the continuous development of artificial intelligence and big data technology, big data analysis and mining business at China Telecom keeps growing, and China Telecom has become a major user of big-data AI technology. In actual telecommunication business, AI and related technologies are widely used for big-data information analysis and data mining. Analysis of call data from telecom's large-scale call services is of great importance in marketing and similar services. Such analysis often relies on text produced by speech transcription; however, owing to the limitations of speech transcription and noise from the call equipment, the transcribed text frequently contains errors, so error correction processing must be applied to it.
To solve this problem, a conventional solution uses dictionary confusion: a technician builds an error-correction confusion dictionary from homophones, near-homophones, and visually similar error-prone characters; an n-gram language model based on a Markov process computes the perplexity of a sentence for error detection; detected errors are corrected with the confusion dictionary; and the correction is accepted if the perplexity decreases. This method suffers from two main problems that hurt the accuracy of text error correction:
(1) Because word segmentation tools perform only moderately on the text to be checked, the segmenter may produce wrong segmentations, which distorts the results of both the language model and the confusion-dictionary correction, so that originally correct words are wrongly modified.
(2) When an n-gram language model is used for error detection, only the 1-3 characters preceding the current character can be considered, so global information about the text to be checked is hard to capture. Simply enlarging the language-model window produces large numbers of out-of-vocabulary and low-frequency words; after smoothing, the overall perplexity of sentences at inference time becomes too high, impairing the judgment of transcription errors. Moreover, the language model struggles to explicitly learn syntactic and semantic information, and these shortcomings easily cause large errors in locating and checking errors.
Therefore, to improve the detection and correction of speech transcription errors while preserving the high interpretability and business flexibility of confusion-dictionary-based correction, an improved method for correcting speech transcription text is desirable, one that increases both correction accuracy and single-sentence processing speed.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
To address the defects of the prior art, a method is provided that improves the text error correction accuracy of the confusion dictionary by combining a random masking strategy with multivariate semantic feature encoding. While preserving the high interpretability and business flexibility of confusion-dictionary-based correction, the random masking strategy and multivariate semantic feature encoding improve the detection and correction of speech transcription errors and speed up single-sentence processing. The method trains an efficient error detection model in a supervised manner, intelligently identifying likely error points and evaluating their probability; for a transcribed text fragment, all positions where errors may occur are obtained in a single forward pass.
According to a first aspect of the present application, there is provided a method for speech transcription text error correction, the method comprising: taking a speech transcription text to be corrected as input, and determining candidate error locations in the text using a pre-constructed confusion dictionary; encoding the text with the encoder of a trained autoencoder based on multivariate semantic feature extraction to obtain continuous feature vectors; inputting the feature vectors into a trained error detection model based on a masking strategy to obtain an error occurrence confidence for each determined candidate error location; and for each candidate error location whose error occurrence confidence is above a predefined threshold, correcting the text at that location using the confusion dictionary to output the corrected speech transcription text.
In the technical solution of the embodiments of the application, correcting the speech transcription text with the multivariate-semantic-feature encoder and the masking-strategy error detection model improves both the accuracy and the single-sentence processing speed.
In some embodiments, the confusion dictionary, in which collected erroneous words are recorded as "error phrase-correction phrase" pairs, is pre-constructed by gathering historical analysis data together with common error-prone words, homophones, near-homophones, and visually similar characters.
In some embodiments, determining candidate error locations in the speech transcription text using the pre-constructed confusion dictionary further comprises: tagging words that appear in the confusion dictionary with a forward maximum matching algorithm, thereby retrieving the possible errors collected in the dictionary and marking the candidate error locations.
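As an illustration, the forward maximum matching lookup described above can be sketched as follows. This is a minimal sketch, not the patent's implementation; the dictionary entries and the window size `max_len` are hypothetical.

```python
def forward_max_match(text, confusion_dict, max_len=4):
    """Scan `text` left to right, greedily matching the longest substring
    that appears as an error key in the confusion dictionary.
    Returns (start, end, phrase) spans marking candidate error locations."""
    spans = []
    i = 0
    while i < len(text):
        matched = False
        # Try the longest window first, shrinking down to a single character.
        for j in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + j]
            if piece in confusion_dict:
                spans.append((i, i + j, piece))
                i += j
                matched = True
                break
        if not matched:
            i += 1
    return spans

# Toy confusion dictionary: error phrase -> correction phrase (hypothetical entries).
confusion = {"帐户": "账户", "以经": "已经"}
print(forward_max_match("我的帐户以经开通", confusion))  # [(2, 4, '帐户'), (4, 6, '以经')]
```

Because matching is greedy from the left, longer dictionary phrases always take precedence over their prefixes, which mirrors how the confusion dictionary records multi-character error phrases.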
In some embodiments, encoding the speech transcription text with the encoder of the trained autoencoder based on multivariate semantic feature extraction to obtain continuous feature vectors further comprises: performing multivariate semantic feature extraction on the speech transcription text to obtain discrete semantic features, where the extraction comprises Chinese pinyin feature extraction, Chinese part-of-speech feature extraction, and text word position feature extraction; and mapping the obtained discrete semantic features into a low-dimensional continuous feature space to obtain continuous feature vectors. By encoding the multivariate semantic features extracted from the speech transcription text into continuous feature vectors with the autoencoder, the accuracy and interpretability of error detection and correction can be improved.
In some embodiments, the autoencoder and the error detection model are trained with multi-task learning based on the following operations, where the training data set comprises manually labeled speech transcription texts in which the determined candidate error locations are annotated as actually erroneous or error-free: inputting each training sample into the autoencoder to extract discrete semantic features for encoding; inputting each training sample into the error detection model, which masks the candidate error locations according to the masking strategy and predicts the actual characters at the mask positions from context; and computing and accumulating the losses of the autoencoder and the error detection model, and updating the parameters of both. Training the autoencoder based on multivariate semantic feature extraction and the masking-strategy error detection model with multi-task learning further enhances the interpretability and prediction accuracy of the model.
According to a second aspect of the present application, there is provided a system for speech transcription text error correction, the system comprising: an alternative error determination module configured to take the speech transcription text to be corrected as input and determine candidate error locations in it using a pre-constructed confusion dictionary; a semantic feature encoding module configured to encode the text with the encoder of a trained autoencoder based on multivariate semantic feature extraction to obtain continuous feature vectors; a transcription error detection module configured to input the resulting feature vectors into a trained masking-strategy-based error detection model to obtain an error occurrence confidence for each determined candidate error location; and a transcription error correction module configured to, for each candidate error location whose error occurrence confidence is above a predefined threshold, correct the text at that location using the confusion dictionary to output the corrected speech transcription text.
In some embodiments, the confusion dictionary, in which collected erroneous words are recorded as "error phrase-correction phrase" pairs, is pre-constructed by gathering historical analysis data together with common error-prone words, homophones, near-homophones, and visually similar characters.
In some embodiments, the semantic feature encoding module is further configured to: performing multi-element semantic feature extraction on the voice transcription text to obtain discrete semantic features, wherein the multi-element semantic feature extraction comprises Chinese pinyin feature extraction, Chinese part-of-speech feature extraction and text word position feature extraction; and mapping the obtained discrete semantic features into a low-dimensional continuous feature space to obtain continuous feature vectors.
In some embodiments, the autoencoder and the error detection model are trained with multi-task learning based on the following operations, where the training data set comprises manually labeled speech transcription texts in which the determined candidate error locations are annotated as actually erroneous or error-free: inputting each training sample into the autoencoder to extract discrete semantic features for encoding; inputting each training sample into the error detection model, which masks the candidate error locations according to the masking strategy and predicts the actual characters at the mask positions from context; and computing and accumulating the losses of the autoencoder and the error detection model, and updating the parameters of both.
According to a third aspect of the present application, there is provided a system for speech transcription text error correction, the system comprising: a memory storing a trained autoencoder based on multivariate semantic feature extraction, a masking-strategy-based error detection model, and computer-executable instructions; and at least one processor, wherein the instructions, when executed, cause the at least one processor to perform the method of any embodiment of the first aspect.
Compared with prior-art schemes, the method and system for speech transcription text error correction of the present application have the following advantages:
(1) An error detection model based on a masking strategy replaces the n-gram language model for error detection: words appearing in the confusion dictionary are replaced with masks, and the model predicts whether a text transcription error actually occurs at each mask position, intelligently identifying likely error points and evaluating their probability. For a transcribed text fragment, all positions where errors may occur are obtained in a single forward pass, improving both correction precision and processing speed;
(2) On the basis of multiple discrete features of the text to be checked, such as pinyin features, part-of-speech features, and word order features, the encoder maps the text into a low-dimensional continuous feature space, and these features are added to model training as explicit features, improving the accuracy and interpretability of error detection and correction; and
(3) The method introduces multi-task learning, enhancing the interpretability and prediction accuracy of the model by simultaneously learning the two tasks of multivariate semantic encoding and mask error prediction.
These and other features and advantages will become apparent upon reading the following detailed description and upon reference to the accompanying drawings. It is to be understood that both the foregoing general description and the following detailed description are explanatory only and are not restrictive of aspects as claimed.
Drawings
So that the manner in which the above recited features of the present application can be understood in detail, a more particular description of the disclosure, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only certain typical aspects of this application and are therefore not to be considered limiting of its scope, for the description may admit to other equally effective aspects.
FIG. 1 shows an example architecture diagram of a system for speech transcription text error correction according to one embodiment of the present application;
FIG. 2 illustrates an example flow diagram of a method for training an autoencoder and an error detection model using multi-task learning according to one embodiment of this application;
FIG. 3 illustrates an example flow diagram of a method for speech transcription text correction according to one embodiment of this application;
FIG. 4 illustrates an example flow diagram of a multivariate semantic feature coding part according to one embodiment of the application; and
FIG. 5 illustrates an example architecture diagram of a system for speech transcription text correction according to one embodiment of the present application.
Detailed Description
The present application is described in detail below with reference to the attached drawing figures, and features of the present application will become further apparent from the detailed description below.
FIG. 1 illustrates an example architecture diagram of a system 100 for speech transcription text correction according to one embodiment of the present application. As shown in fig. 1, the system 100 of the present application includes at least: an alternative error determination module 101, a semantic feature encoding module 102, a transcription error detection module 103, and a transcription error correction module 104.
The alternative error determination module 101 may be configured to take the speech transcription text to be corrected as input and determine candidate error locations in it using a pre-constructed confusion dictionary. In some cases, the confusion dictionary supporting error correction can be constructed in advance by collecting historical call analysis data, departmental business experience (such as outbound marketing), and common error-prone words and homophones; collected error words are recorded in the dictionary in the form "error phrase-correction phrase". Further, the module 101 may use a forward maximum matching algorithm to tag words from the confusion dictionary that appear in the speech transcription text to be corrected, thereby retrieving the collected possible errors and marking the candidate error locations in the text.
The semantic feature encoding module 102 may be configured to encode the speech transcription text to be corrected with the encoder of a trained autoencoder based on multivariate semantic feature extraction to obtain continuous feature vectors. Since human proofreaders typically consider a word's pronunciation, its position in the sentence, and its part of speech, the module 102 may be further configured to perform multivariate semantic feature extraction on the text to obtain discrete semantic features, where the extraction comprises Chinese pinyin feature extraction, Chinese part-of-speech feature extraction, and text word position feature extraction, and then to map the obtained discrete semantic features into a low-dimensional continuous feature space to obtain continuous feature vectors.
The transcription error detection module 103 may be configured to input the resulting feature vectors into a trained error detection model based on a masking strategy to obtain an error occurrence confidence for each of the determined candidate error locations. In some cases, the above-mentioned self-encoder based on multivariate semantic feature extraction and the error detection model based on the masking strategy are trained by using multitask learning, and a specific training process is shown in fig. 2 and described in further detail below.
The transcription error correction module 104 may be configured to, for each alternative error location, in response to an error occurrence confidence for the alternative error location being above a predefined threshold, correct the text of the alternative error location using a pre-constructed confusion dictionary, and output a corrected speech transcribed text after correcting all locations requiring correction.
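The thresholded correction step performed by the transcription error correction module can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function name, spans, confidences, and dictionary entries are hypothetical.

```python
def correct_text(text, spans, confusion_dict, confidences, threshold=0.5):
    """Apply confusion-dictionary corrections to every candidate error span
    whose error occurrence confidence exceeds the threshold; spans below
    the threshold are left untouched."""
    out = text
    # Replace from right to left so earlier character offsets stay valid
    # even when the correction phrase differs in length.
    for (start, end, phrase), conf in sorted(
            zip(spans, confidences), key=lambda x: -x[0][0]):
        if conf > threshold and phrase in confusion_dict:
            out = out[:start] + confusion_dict[phrase] + out[end:]
    return out

confusion = {"帐户": "账户", "以经": "已经"}
spans = [(2, 4, "帐户"), (4, 6, "以经")]
# Only the first span's confidence (0.9) clears the 0.5 threshold.
print(correct_text("我的帐户以经开通", spans, confusion, [0.9, 0.3]))  # 我的账户以经开通
```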
Those skilled in the art will appreciate that the system and its various modules of the present application may be implemented in hardware as well as software, and that the various modules may be combined or combined in any suitable manner.
FIG. 2 illustrates an example flow diagram of a method 200 for training the self-encoder and the error detection model using multi-task learning according to one embodiment of this application.
Before training the autoencoder based on multivariate semantic feature extraction and the error detection model based on the masking strategy, a manually labeled training data set needs to be constructed. Specifically, the training data set may be constructed as follows: first, words from the confusion dictionary are tagged in the existing speech transcription corpus with a forward maximum matching algorithm, retrieving the possible errors collected in the dictionary. Then, a batch of training data is built through manual annotation, identifying which of the retrieved candidate error locations are actually erroneous and which are error-free.
Multi-task learning is then adopted during training to train the autoencoder based on multivariate semantic feature extraction and the error detection model based on the masking strategy simultaneously. Multi-task learning means learning several related tasks in parallel while back-propagating their gradients at the same time; through a shared low-level representation, the tasks share and complement domain-related information, helping each other learn and improving generalization.
Specifically, during training, each sample in the constructed training data set can be input into the autoencoder, which comprises an encoder and a decoder, to extract discrete semantic features for encoding. In addition, each sample can be input into the error detection model, which masks the labeled candidate error locations according to the masking strategy and predicts the actual characters at the mask positions from context. Subsequently, with the manually labeled error positions as ground-truth labels, multi-task learning attends to the two tasks of multivariate semantic feature encoding and mask error prediction simultaneously: the losses of the two tasks (for example, l1 and l2) are computed and accumulated to update the parameters of the autoencoder and the error detection model, and these steps are repeated, iterating until convergence. In some cases, accumulating losses means combining them according to weights assigned to the tasks (e.g., weight w1 for task 1 and weight w2 for task 2) to obtain the total loss l = w1*l1 + w2*l2.
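The weighted loss accumulation l = w1*l1 + w2*l2 can be expressed directly. The equal default weights below are an illustrative assumption; the patent does not fix their values.

```python
def total_loss(l1, l2, w1=0.5, w2=0.5):
    """Accumulate the autoencoder loss l1 and the mask-prediction loss l2
    as a weighted sum: l = w1*l1 + w2*l2. During multi-task training this
    scalar would be back-propagated to update both models' parameters."""
    return w1 * l1 + w2 * l2

# Combine the two task losses with equal weights.
print(total_loss(0.8, 0.4))
```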
FIG. 3 shows an example flow diagram of a method 300 for speech transcription text error correction according to one embodiment of the present application. The method 300 begins at step 301, where the alternative error determination module 101 takes the speech transcription text to be corrected as input and determines candidate error locations in it using a pre-constructed confusion dictionary. In some cases, the confusion dictionary, in which collected erroneous words are recorded as "error phrase-correction phrase" pairs, may be constructed in advance from historical analysis data and common error-prone words, homophones, near-homophones, and visually similar characters. Further, the module 101 may use a forward maximum matching algorithm to tag words from the confusion dictionary that appear in the text, retrieving the collected possible errors and identifying candidate error locations.
In step 302, the semantic feature encoding module 102 may encode the speech transcription text to be corrected with the encoder of the trained autoencoder based on multivariate semantic feature extraction to obtain continuous feature vectors. Further, the module 102 may perform multivariate semantic feature extraction on the text to obtain discrete semantic features and map them into a low-dimensional continuous feature space to obtain continuous feature vectors. In some cases, the multivariate semantic feature extraction can include Chinese pinyin feature extraction, Chinese part-of-speech feature extraction, and word position feature extraction.
In step 303, the transcription error detection module 103 may input the obtained continuous feature vectors into the trained error detection model based on the masking strategy to obtain an error occurrence confidence for each determined candidate error location. The error detection model masks the words at the identified candidate error locations and predicts the actual characters at the mask positions from context, thereby judging the error occurrence confidence of each position.
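One way to realize the mask-and-predict confidence scoring is sketched below, under the assumption that the trained model exposes a per-position character probability distribution. The `stub_model` here merely stands in for that trained model and is not part of the patent; the scoring rule (one minus the mean probability of the transcribed characters) is likewise an illustrative choice.

```python
MASK = "[MASK]"

def error_confidence(text, span, predict_char_probs):
    """Mask the candidate span and score how unlikely the originally
    transcribed characters are given the context. `predict_char_probs`
    must return, for each masked position, a dict mapping candidate
    characters to probabilities."""
    start, end = span
    masked = text[:start] + MASK * (end - start) + text[end:]
    probs = predict_char_probs(masked, list(range(start, end)))
    # Confidence that the span is an error: 1 minus the mean probability
    # the model assigns to the characters that were actually transcribed.
    orig = text[start:end]
    mean_p = sum(p.get(ch, 0.0) for ch, p in zip(orig, probs)) / len(orig)
    return 1.0 - mean_p

# Stub that always favors the corrected phrase "账户" over the transcribed "帐户".
def stub_model(masked_text, positions):
    return [{"账": 0.9, "帐": 0.05}, {"户": 0.95}]

print(round(error_confidence("我的帐户开通了", (2, 4), stub_model), 3))
```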
In step 304, when the error occurrence confidence of a candidate error location is above a predefined threshold, the transcription error correction module 104 may correct the text at that location using the pre-constructed confusion dictionary and output the corrected speech transcription text.
FIG. 4 illustrates an example flow diagram of a multivariate semantic feature coding part 400 according to one embodiment of the application. As shown in fig. 4, the multivariate semantic feature extraction and encoding in step 302 in fig. 3 further comprises the steps of:
302-1: a pinyin extraction tool is used for extracting pinyin of each sentence in the text to be corrected as discrete characteristics for model learning;
302-2: extracting part-of-speech discrete characteristics to which each word in each sentence in the text to be corrected belongs by using a sequence labeling tool (part-of-speech labeling);
302-3: constructing discrete characteristics of the position of each word in each sentence in the text to be corrected; and
302-4: an autoencoder model is constructed to encode discrete semantic feature information into a continuous feature vector.
Thus, by fusing multivariate semantic features and mapping the discrete features into a continuous feature space with the autoencoder, the present application improves the model's expressive capability and prediction accuracy.
FIG. 5 illustrates an example architecture diagram of a system 500 for speech transcription text correction according to one embodiment of the present application. As shown in fig. 5, the system 500 may include a memory 501 and at least one processor 502.
The memory 501 may store the trained self-encoder and the masking-policy-based error detection model. The memory 501 may include RAM, ROM, or a combination thereof. The memory 501 may store computer-executable instructions that, when executed by the at least one processor 502, cause the at least one processor to perform the various functions described herein, including: taking the speech transcription text to be corrected as input and determining candidate error locations in it using a pre-constructed confusion dictionary; encoding the text with the encoder of the trained autoencoder based on multivariate semantic feature extraction to obtain continuous feature vectors; inputting the feature vectors into the trained masking-strategy-based error detection model to obtain an error occurrence confidence for each determined candidate error location; and, for each candidate error location whose error occurrence confidence is above a predefined threshold, correcting the text at that location using the confusion dictionary to output the corrected speech transcription text. In some cases, the memory 501 may include, among other things, a BIOS that controls basic hardware or software operations, such as interaction with peripheral components or devices. The processor 502 may include intelligent hardware devices (e.g., general-purpose processors, DSPs, CPUs, microcontrollers, ASICs, FPGAs, programmable logic devices, discrete gate or transistor logic components, discrete hardware components, or any combination thereof).
The various illustrative blocks and modules described in connection with the disclosure herein may be implemented or performed with a general purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices (e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration).
The functions described herein may be implemented in hardware, software executed by a processor, firmware, or any combination thereof. If implemented in software executed by a processor, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Other examples and implementations are within the scope of the disclosure and the following claims. For example, due to the nature of software, the functions described herein may be implemented using software executed by a processor, hardware, firmware, hard-wired, or any combination thereof. Features that implement functions may also be physically located at various locations, including being distributed such that portions of functions are implemented at different physical locations.
What has been described above includes examples of aspects of the claimed subject matter. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the claimed subject matter, but one of ordinary skill in the art may recognize that many further combinations and permutations of the claimed subject matter are possible. Accordingly, the disclosed subject matter is intended to embrace all such alterations, modifications and variations that fall within the spirit and scope of the appended claims.

Claims (10)

1. A method for error correction of speech transcribed text, said method comprising:
taking a speech transcription text to be corrected as input, and determining alternative error locations in the speech transcription text using a pre-constructed confusion dictionary;
encoding the speech transcription text with the encoder of a trained self-encoder based on multivariate semantic feature extraction to obtain continuous feature vectors;
inputting the obtained feature vectors into a trained error detection model based on a masking strategy to obtain an error occurrence confidence for each of the determined alternative error locations; and
for each alternative error location, in response to the error occurrence confidence for that alternative error location being above a predefined threshold, correcting the text at that alternative error location using the confusion dictionary to output corrected speech transcription text.
2. The method of claim 1, wherein the confusion dictionary is pre-constructed by gathering historical analysis data and common error-prone words, homophones, visually similar (near-form) words, and homographs, wherein the gathered erroneous words are recorded in the confusion dictionary as "erroneous phrase → corrected phrase" pairs.
3. The method of claim 1, wherein determining the alternative error locations in the speech transcription text using the pre-constructed confusion dictionary further comprises:
determining the alternative error locations in the speech transcription text by matching words against the confusion dictionary with a forward maximum matching algorithm to retrieve possible errors collected in the confusion dictionary.
4. The method of claim 1, wherein encoding the speech transcription text using an encoder in a trained self-encoder based on multivariate semantic feature extraction to obtain continuous feature vectors further comprises:
performing multivariate semantic feature extraction on the speech transcription text to obtain discrete semantic features, wherein the multivariate semantic feature extraction includes Chinese pinyin feature extraction, Chinese part-of-speech feature extraction, and text word position feature extraction; and
mapping the resulting discrete semantic features into a low-dimensional continuous feature space to obtain the continuous feature vectors.
5. The method of claim 1, wherein the self-encoder and the error detection model are jointly trained using multi-task learning based on the following operations, wherein a training data set includes manually labeled speech transcription text in which the determined alternative error locations are labeled as either actually erroneous or error-free:
inputting each training sample in the training data set into the self-encoder to extract discrete semantic features for encoding;
inputting each training sample in the training data set into the error detection model to mask the alternative error locations based on the masking strategy and predict the actual characters at the masked locations from context; and
calculating and accumulating the loss of the self-encoder and the loss of the error detection model, and updating the parameters of the self-encoder and the error detection model.
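The three training operations above can be outlined as a single multi-task step; the `encoder` and `detector` objects and their `loss` methods are placeholder stubs standing in for the real reconstruction and masked-prediction losses:

```python
class _Stub:
    """Placeholder model exposing only a constant loss (for illustration)."""
    def __init__(self, loss_value):
        self._loss = loss_value

    def loss(self, sample):
        return self._loss

def train_step(batch, encoder, detector, optimizer_step):
    """One multi-task step: accumulate both losses, then update both models."""
    total_loss = 0.0
    for sample in batch:
        total_loss += encoder.loss(sample)   # self-encoder loss
        total_loss += detector.loss(sample)  # mask-prediction loss
    optimizer_step(total_loss)  # in a real system, updates both models' parameters
    return total_loss

encoder, detector = _Stub(0.5), _Stub(0.25)
steps = []
total = train_step(["样本一", "样本二"], encoder, detector, steps.append)
```

Accumulating both losses before a single parameter update is what makes the training multi-task: the shared update couples the two objectives.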
6. A system for error correction of speech transcribed text, the system comprising:
an alternative error determination module configured to take a speech transcription text to be corrected as input and determine alternative error locations in the speech transcription text using a pre-constructed confusion dictionary;
a semantic feature encoding module configured to encode the speech transcription text with the encoder of a trained self-encoder based on multivariate semantic feature extraction to obtain continuous feature vectors;
a transcription error detection module configured to input the resulting feature vectors into a trained masking-strategy-based error detection model to obtain an error occurrence confidence for each of the determined alternative error locations; and
a transcription error correction module configured to, for each alternative error location, in response to the error occurrence confidence for that alternative error location being above a predefined threshold, correct the text at that alternative error location using the confusion dictionary to output corrected speech transcription text.
7. The system of claim 6, wherein the confusion dictionary is pre-constructed by gathering historical analysis data and common error-prone words, homophones, visually similar (near-form) words, and homographs, wherein the gathered erroneous words are recorded in the confusion dictionary as "erroneous phrase → corrected phrase" pairs.
8. The system of claim 6, wherein the semantic feature encoding module is further configured to:
performing multivariate semantic feature extraction on the speech transcription text to obtain discrete semantic features, wherein the multivariate semantic feature extraction includes Chinese pinyin feature extraction, Chinese part-of-speech feature extraction, and text word position feature extraction; and
mapping the resulting discrete semantic features into a low-dimensional continuous feature space to obtain the continuous feature vectors.
9. The system of claim 6, wherein the self-encoder and the error detection model are jointly trained using multi-task learning based on the following operations, wherein a training data set includes manually labeled speech transcription text in which the determined alternative error locations are labeled as either actually erroneous or error-free:
inputting each training sample in the training data set into the self-encoder to extract discrete semantic features for encoding;
inputting each training sample in the training data set into the error detection model to mask the alternative error locations based on the masking strategy and predict the actual characters at the masked locations from context; and
calculating and accumulating the loss of the self-encoder and the loss of the error detection model, and updating the parameters of the self-encoder and the error detection model.
10. A system for error correction of speech transcribed text, the system comprising:
a memory storing a trained self-encoder based on multivariate semantic feature extraction, a mask-strategy-based error detection model, and computer-executable instructions; and
at least one processor, wherein the computer-executable instructions, when executed, cause the at least one processor to perform the method of any one of claims 1-5.
CN202111457771.XA 2021-12-02 2021-12-02 Text error correction method and system based on multivariate semantic features and mask strategy Pending CN114912418A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111457771.XA CN114912418A (en) 2021-12-02 2021-12-02 Text error correction method and system based on multivariate semantic features and mask strategy


Publications (1)

Publication Number Publication Date
CN114912418A true CN114912418A (en) 2022-08-16

Family

ID=82763654

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111457771.XA Pending CN114912418A (en) 2021-12-02 2021-12-02 Text error correction method and system based on multivariate semantic features and mask strategy

Country Status (1)

Country Link
CN (1) CN114912418A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116070595A (en) * 2023-03-07 2023-05-05 深圳市北科瑞讯信息技术有限公司 Speech recognition text error correction method and device, electronic equipment and storage medium
CN116665675A (en) * 2023-07-25 2023-08-29 上海蜜度信息技术有限公司 Voice transcription method, system, electronic equipment and storage medium
CN116665675B (en) * 2023-07-25 2023-12-12 上海蜜度信息技术有限公司 Voice transcription method, system, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110210029B (en) Method, system, device and medium for correcting error of voice text based on vertical field
US11776531B2 (en) Encoder-decoder models for sequence to sequence mapping
US11238845B2 (en) Multi-dialect and multilingual speech recognition
US9508341B1 (en) Active learning for lexical annotations
JP5901001B1 (en) Method and device for acoustic language model training
US11170175B1 (en) Generating replacement sentences for a particular sentiment
CN111401084B (en) Method and device for machine translation and computer readable storage medium
JP2020537765A (en) How to Train Multilingual Speech Recognition Networks, Speech Recognition Systems and Multilingual Speech Recognition Systems
US20120323560A1 (en) Method for symbolic correction in human-machine interfaces
CN112215013B (en) Clone code semantic detection method based on deep learning
CN106503231B (en) Search method and device based on artificial intelligence
CN114912418A (en) Text error correction method and system based on multivariate semantic features and mask strategy
CN116127952A (en) Multi-granularity Chinese text error correction method and device
CN110826334A (en) Chinese named entity recognition model based on reinforcement learning and training method thereof
CN111178036B (en) Text similarity matching model compression method and system for knowledge distillation
CN111554275B (en) Speech recognition method, device, equipment and computer readable storage medium
CN110134950A (en) A kind of text auto-collation that words combines
CN116127953A (en) Chinese spelling error correction method, device and medium based on contrast learning
Heymann et al. Improving ctc using stimulated learning for sequence modeling
CN111209373A (en) Sensitive text recognition method and device based on natural semantics
Švec et al. Semantic entity detection from multiple ASR hypotheses within the WFST framework
CN110910905A (en) Mute point detection method and device, storage medium and electronic equipment
CN115240712A (en) Multi-mode-based emotion classification method, device, equipment and storage medium
CN114548075A (en) Text processing method, text processing device, storage medium and electronic equipment
CN110472243B (en) Chinese spelling checking method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination