CN116050391A - Speech recognition error correction method and device based on subdivision industry error correction word list - Google Patents

Speech recognition error correction method and device based on subdivision industry error correction word list

Info

Publication number
CN116050391A
CN116050391A
Authority
CN
China
Prior art keywords
error correction
text
word
preset
subdivision industry
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211439648.XA
Other languages
Chinese (zh)
Inventor
马晓亮
安玲玲
陈茂强
罗宇文
杜德泉
宋灿辉
李鑫
赵汝强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Yunqu Information Technology Co ltd
Guangzhou Institute of Technology of Xidian University
Original Assignee
Guangzhou Yunqu Information Technology Co ltd
Guangzhou Institute of Technology of Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Yunqu Information Technology Co ltd and Guangzhou Institute of Technology of Xidian University
Priority to CN202211439648.XA
Publication of CN116050391A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/232Orthographic correction, e.g. spell checking or vowelisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34Browsing; Visualisation therefor
    • G06F16/345Summarisation for human users
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/237Lexical tools
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063Training
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/16Speech classification or search using artificial neural networks
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063Training
    • G10L2015/0631Creating reference templates; Clustering
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Abstract

The application provides a speech recognition error correction method and device based on a subdivision industry error correction vocabulary. The speech recognition error correction method based on the subdivision industry error correction vocabulary comprises the following steps: acquiring a call voice to be recognized; performing voice recognition on the call voice to be recognized to obtain a text to be recognized; preprocessing the text to be recognized to obtain a continuous segmented text; acquiring a preset subdivision industry error correction word list, wherein the subdivision industry error correction word list comprises mapping relations between error words and correct words; and correcting the continuous segmented text according to the subdivision industry error correction word list to obtain a corrected text. By constructing a subdivision industry error correction word list and using it to optimize the results of general ASR speech recognition, the method and the device can improve the accuracy of speech recognition.

Description

Speech recognition error correction method and device based on subdivision industry error correction word list
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a speech recognition error correction method and device based on an error correction vocabulary of a subdivision industry.
Background
Currently, general automatic speech recognition (ASR) technology is widely applied, and several commercial ASR products are available off the shelf. However, these general ASR products are aimed mainly at everyday scenarios, and it is difficult for them to meet the accuracy requirements of engineering applications in subdivision industries. That is, the accuracy of speech recognition in the prior art is low.
Disclosure of Invention
The application aims to provide a voice recognition error correction method and device based on an error correction vocabulary of a subdivision industry, and aims to solve the problem of low accuracy of voice recognition in the prior art.
In one aspect, the present application provides a speech recognition error correction method based on a subdivision industry error correction vocabulary, where the speech recognition error correction method based on the subdivision industry error correction vocabulary includes:
acquiring a call voice to be recognized;
performing voice recognition on the call voice to be recognized to obtain a text to be recognized;
preprocessing the text to be recognized to obtain a continuous segmented text;
acquiring a preset subdivision industry error correction word list, wherein the subdivision industry error correction word list comprises mapping relations between error words and correct words;
and correcting the continuous segmented text according to the subdivision industry error correction word list to obtain a corrected text.
Optionally, the obtaining the preset subdivision industry error correction vocabulary includes:
performing misword prediction on a preset transcription sample text in a universal ASR transcription text set based on a preset BERT misword prediction model to obtain predicted miswords;
extracting keywords from the preset transcription sample text to obtain target keywords;
judging whether the target keywords are consistent with the predicted miswords;
if they are inconsistent, acquiring a manual error correction result;
and determining the subdivision industry error correction word list according to the mapping relation between error words and correct words in the manual error correction result.
Optionally, before the misword prediction is performed on the preset transcription sample text in the universal ASR transcription text set based on the preset BERT misword prediction model to obtain the predicted miswords, the method further includes:
acquiring a preset text training set;
carrying out random masking processing on the preset text training set based on an MLM model to obtain an MLM training set;
and training the BERT model according to the MLM training set to obtain a BERT misword prediction model.
Optionally, the performing random masking processing on the preset text training set based on the MLM model to obtain the MLM training set includes:
and randomly masking 15% of the words in each sample text of the preset text training set based on the MLM model to obtain the MLM training set.
Optionally, the extracting keywords from the preset transcription sample text to obtain target keywords includes:
inputting the preset transcription sample text into a preset convolutional neural network for a single convolution pass to obtain a text abstract;
performing word segmentation on the text abstract to obtain a plurality of text word segments;
and determining the text word segments with word frequency higher than a preset value as target keywords.
Optionally, the preset convolutional neural network comprises an input layer, a convolution layer and a pooling layer, wherein the input layer converts the input text into a two-dimensional matrix by using Word Embedding; the convolution layer uses Text-CNN to perform feature extraction on the two-dimensional matrix to obtain a plurality of feature vectors; and the pooling layer pools and splices the plurality of feature vectors to obtain the text abstract.
Optionally, the determining the text word segments with word frequency higher than the preset value as target keywords includes:
removing stop words from the text word segments to obtain filtered text word segments;
and determining the text word segments in the filtered text word segments with word frequency higher than the preset value as target keywords.
In one aspect, the present application provides a speech recognition error correction device based on a subdivision industry error correction vocabulary, the speech recognition error correction device based on the subdivision industry error correction vocabulary includes:
the first acquisition unit is used for acquiring the call voice to be recognized;
the voice recognition unit is used for carrying out voice recognition on the call voice to be recognized to obtain a text to be recognized;
the preprocessing unit is used for preprocessing the text to be recognized to obtain a continuous segmented text;
the second acquisition unit is used for acquiring a preset subdivision industry error correction word list, wherein the subdivision industry error correction word list comprises mapping relations of error words and correct words;
and the error correction unit is used for correcting the continuous segmented text according to the subdivision industry error correction word list to obtain corrected text.
In one aspect, the present application further provides an electronic device, including:
one or more processors;
a memory; and
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the processor to implement the subdivision industry error correction vocabulary based speech recognition error correction method of any one of the first aspects.
In one aspect, the present application also provides a computer readable storage medium having stored thereon a computer program to be loaded by a processor to perform the steps in the subdivision industry error correction vocabulary based speech recognition error correction method of any one of the first aspects.
The application provides a speech recognition error correction method based on a subdivision industry error correction vocabulary, which comprises the following steps: acquiring a call voice to be recognized; performing voice recognition on the call voice to be recognized to obtain a text to be recognized; preprocessing the text to be recognized to obtain a continuous segmented text; acquiring a preset subdivision industry error correction word list, wherein the subdivision industry error correction word list comprises mapping relations between error words and correct words; and correcting the continuous segmented text according to the subdivision industry error correction word list to obtain a corrected text. By constructing a subdivision industry error correction word list and using it to optimize the results of general ASR speech recognition, the method can improve the accuracy of speech recognition.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of a speech recognition error correction system based on a subdivision industry error correction vocabulary according to an embodiment of the present application;
FIG. 2 is a flow chart of one embodiment of a speech recognition error correction method based on a subdivision industry error correction vocabulary provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of one embodiment of a speech recognition error correction device based on a subdivision industry error correction vocabulary provided in an embodiment of the present application;
fig. 4 is a schematic structural diagram of an embodiment of an electronic device provided in an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
In the description of the present application, it should be understood that the terms "center," "longitudinal," "transverse," "length," "width," "thickness," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like indicate an orientation or positional relationship based on that shown in the drawings, merely for convenience of description and to simplify the description, and do not indicate or imply that the devices or elements referred to must have a particular orientation, be configured and operated in a particular orientation, and thus should not be construed as limiting the present application. Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more features. In the description of the present application, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
In this application, the term "exemplary" is used to mean "serving as an example, instance, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments. The following description is presented to enable any person skilled in the art to make and use the application. In the following description, details are set forth for purposes of explanation. It will be apparent to one of ordinary skill in the art that the present application may be practiced without these specific details. In other instances, well-known structures and processes have not been shown in detail to avoid unnecessarily obscuring the description of the present application. Thus, the present application is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.
It should be noted that, because the method in the embodiment of the present application is executed in the electronic device, the processing objects of each electronic device exist in the form of data or information, for example, time, which is substantially time information, it can be understood that in the subsequent embodiment, if the size, the number, the position, etc. are all corresponding data, so that the electronic device processes the data, which is not described herein in detail.
The embodiment of the application provides a speech recognition error correction method and device based on an error correction vocabulary of a subdivision industry, and the method and device are respectively described in detail below.
Referring to fig. 1, fig. 1 is a schematic diagram of a scenario of a speech recognition and error correction system based on a fine-division industry error correction vocabulary according to an embodiment of the present application, where the speech recognition and error correction system based on a fine-division industry error correction vocabulary may include an electronic device 100, and a speech recognition and error correction device based on a fine-division industry error correction vocabulary is integrated in the electronic device 100, such as the electronic device in fig. 1.
In this embodiment of the present application, the electronic device 100 may be an independent server, or a server network or server cluster formed by multiple servers. For example, the electronic device 100 described in the embodiment of the present application includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud server formed by a plurality of servers, where the cloud server is composed of a large number of computers or web servers based on cloud computing.
It will be understood by those skilled in the art that the application environment shown in fig. 1 is only one application scenario of the present application and does not limit its application scenarios; other application environments may include more or fewer electronic devices than shown in fig. 1. For example, only one electronic device is shown in fig. 1, and it will be understood that the speech recognition error correction system based on the subdivision industry error correction vocabulary may further include one or more other servers, which is not limited herein.
In addition, as shown in FIG. 1, the subdivision industry error correction vocabulary based speech recognition error correction system may also include a memory 200 for storing data.
It should be noted that, the schematic view of the speech recognition error correction system based on the subdivision industry error correction vocabulary shown in fig. 1 is only an example, and the speech recognition error correction system based on the subdivision industry error correction vocabulary and the scene described in the embodiments of the present application are for more clearly describing the technical solution of the embodiments of the present application, and do not constitute a limitation on the technical solution provided in the embodiments of the present application, and as one of ordinary skill in the art can know, along with the evolution of the speech recognition error correction system based on the subdivision industry error correction vocabulary and the appearance of a new service scene, the technical solution provided in the embodiments of the present application is equally applicable to similar technical problems.
Firstly, an embodiment of the present application provides a speech recognition error correction method based on a subdivision industry error correction vocabulary. The execution subject of the method is a speech recognition error correction device based on the subdivision industry error correction vocabulary, and the device is applied to an electronic device. The method includes: acquiring a call voice to be recognized; performing voice recognition on the call voice to be recognized to obtain a text to be recognized; preprocessing the text to be recognized to obtain a continuous segmented text; acquiring a preset subdivision industry error correction word list, wherein the subdivision industry error correction word list comprises mapping relations between error words and correct words; and correcting the continuous segmented text according to the subdivision industry error correction word list to obtain a corrected text.
Referring to fig. 2, fig. 2 is a flowchart of an embodiment of a speech recognition error correction method based on a fine-segment industry error correction vocabulary according to an embodiment of the present application. The speech recognition error correction method based on the subdivision industry error correction vocabulary comprises the following steps:
s201, acquiring a conversation voice to be recognized.
In the embodiment of the application, the call voice to be recognized can be a call record between customer service and a customer.
S202, performing voice recognition on the call voice to be recognized to obtain a text to be recognized.
Specifically, voice recognition is performed on the call voice to be recognized by using automatic speech recognition (ASR), a technology that converts human speech into text, to obtain the text to be recognized.
S203, preprocessing the text to be recognized to obtain continuous segmented text.
S204, acquiring a preset subdivision industry error correction word list, wherein the subdivision industry error correction word list comprises mapping relations of error words and correct words.
The subdivision industry error correction vocabulary can be used for carrying out error recognition and replacement of professional words. The subdivision industry error correction word list comprises mapping relations between error words and correct words. The subdivision industry error correction vocabulary may be manually set in advance.
In a specific embodiment, the BERT misword prediction model is pre-trained for building a subdivision industry error correction vocabulary. The obtaining of the preset subdivision industry error correction vocabulary may include:
(1) Specifically, a preset text training set is obtained.
The preset text training set is a manually annotated training set, and the number of samples in the preset text training set is smaller than that of the universal ASR transcription text set. The preset text training set comprises a plurality of sample texts, which are obtained through ASR transcription.
(2) And carrying out random masking treatment on the preset text training set based on the MLM model to obtain the MLM training set.
Specifically, 15% of the words in each sample text of the preset text training set are randomly masked based on the MLM model to obtain the MLM training set.
(3) And training the BERT model according to the MLM training set to obtain a BERT misword prediction model.
In essence, BERT learns a good feature representation for words by running a self-supervised learning method over a massive corpus, and this representation can then be used for tasks such as text classification and text generation. BERT incorporates the MLM (Masked Language Model) algorithm to pre-train a bidirectional Transformer and generate deep bidirectional language representations. The model randomly replaces tokens in each training sequence with a [MASK] tag with 15% probability, and the task is to let the model predict these [MASK]-tagged industry-specific words from the full context, thereby training the parameters of the Transformer model. This self-supervised training lets the model learn how to represent vocabulary, and the learned representation can be applied to downstream tasks.
The MLM (Masked Language Model) objective can be understood as a cloze task: words in a text are randomly masked at a certain ratio (i.e., replaced with the [MASK] tag), and the model predicts them from the surrounding context; the prediction errors are used to train the model parameters, so the model eventually learns to characterize and predict words. MLM predicts the vocabulary at the target positions with a deep Transformer model.
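As an illustration only (not the application's own implementation), the MLM training described above maps onto the masked-language-modeling utilities of the Hugging Face transformers library; the base checkpoint, file path and hyper-parameters below are assumptions:

```python
from datasets import load_dataset
from transformers import (BertForMaskedLM, BertTokenizerFast,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

# Assumed base checkpoint; the preset text training set is a plain-text file of
# manually annotated ASR transcripts, one sample per line (the path is a placeholder).
tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
model = BertForMaskedLM.from_pretrained("bert-base-chinese")

dataset = load_dataset("text", data_files={"train": "preset_text_training_set.txt"})
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"],
)

# Randomly mask 15% of the tokens, matching the masking ratio in the description.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-misword-prediction", num_train_epochs=3),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()  # yields the BERT misword prediction model
```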
BERT uses the Encoder part of the Transformer as its generation model. The multi-layer Transformer reads the input data in a single pass for bidirectional learning, so it can learn the contextual relations between the words in a text. The Encoder adds a positional encoding to the input vector $X$ to obtain a new word vector $X_{embedding}$:

$$X_{embedding} = X + X_{pos}$$

The purpose of adding $X_{pos}$ is to encode the position of each word in the text:

$$PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_{model}}}\right), \qquad PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$

where $pos$ represents the position of the word in the text and $i$ represents the dimension of the word vector.
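For illustration, the two positional-encoding formulas above can be implemented as follows (a NumPy sketch; the application itself does not prescribe an implementation):

```python
import numpy as np

def positional_encoding(max_len, d_model):
    """Sinusoidal positional encoding: sine on even dimensions, cosine on odd ones."""
    pos = np.arange(max_len)[:, None]        # word positions in the text
    i = np.arange(d_model)[None, :]          # word-vector dimensions
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angle[:, 0::2])
    pe[:, 1::2] = np.cos(angle[:, 1::2])
    return pe

# X_embedding = X + X_pos, where X is the (max_len, d_model) word-vector matrix.
X = np.random.randn(50, 128)
X_embedding = X + positional_encoding(50, 128)
```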
Each layer of the Encoder consists of a multi-head self-attention mechanism and a fully connected feed-forward neural network. The attention mechanism is computed as:

$$Attention(Q, K, V) = softmax\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

The query content and the key content to be attended to are converted into matrix representations $Q$ and $K$ respectively; the dot product $QK^{T}$ measures the similarity between the query content and the key content, and the scaling factor $\frac{1}{\sqrt{d_k}}$ is introduced to prevent overly large dot products from producing small gradients after the softmax. The multi-head self-attention mechanism then projects $Q$, $K$ and $V$ through $h$ linear transformations and concatenates the resulting attention values. With $W_i$ denoting a linear mapping and $i \in [1, h]$, the multi-head attention mechanism can be expressed as:

$$MultiHead(Q, K, V) = Concat(head_1, \ldots, head_h)W^{O}$$

$$head_i = Attention(QW_i^{Q}, KW_i^{K}, VW_i^{V})$$

where $d_{model}$ is the hidden-layer dimension of the output and also equals the dimension of the word vector. After the attention mechanism, the input and output of the layer are joined by a residual connection, and the hidden layer is normalized with Layer Normalization. The feed-forward neural network applies two linear mappings to the hidden layer and then an activation function to obtain the generated vector representation.
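A minimal NumPy sketch of the scaled dot-product attention and the multi-head combination defined above; the shapes and the way per-head projections are sliced out of full matrices are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)   # similarity between queries and keys
    return softmax(scores) @ V

def multi_head_attention(X, W_q, W_k, W_v, W_o, h):
    """Project Q, K, V through h linear maps, attend in each head, concatenate, project with W_o."""
    d_model = X.shape[-1]
    d_head = d_model // h
    heads = []
    for i in range(h):
        sl = slice(i * d_head, (i + 1) * d_head)     # columns used as this head's projection
        heads.append(attention(X @ W_q[:, sl], X @ W_k[:, sl], X @ W_v[:, sl]))
    return np.concatenate(heads, axis=-1) @ W_o
```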
The BERT used here is a pre-trained BERT, a general-purpose language model that performs well; it is fine-tuned with data from the call center subdivision industry, and the fine-tuned model can then be used for misword recognition. Words are randomly selected from a text transcribed by the universal ASR (i.e., the original words are replaced with [MASK] tags), the fine-tuned BERT predicts the original words at the [MASK] positions, and the prediction results are compared with the word segmentation results of the text abstract. Words whose comparison results are consistent are marked as correct; words whose comparison results are inconsistent are corrected manually, and the error correction word pairs are recorded in the subdivision industry error correction word list according to the correspondence between error words and correct words.
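For illustration, the masked-word prediction and comparison step can be sketched with the transformers fill-mask pipeline; the checkpoint path is a placeholder for the fine-tuned model, and the single-token masking and comparison logic are simplifying assumptions (a multi-character Chinese word would need one [MASK] per character):

```python
from transformers import pipeline

# Placeholder path to the BERT checkpoint fine-tuned on call-center subdivision-industry data.
fill_mask = pipeline("fill-mask", model="bert-misword-prediction")

def check_word(sentence, word, target_keywords):
    """Mask one word, let the fine-tuned BERT predict it, and return an
    (error word, predicted word) pair for manual review when the prediction
    disagrees with the keywords extracted from the text abstract."""
    masked = sentence.replace(word, fill_mask.tokenizer.mask_token, 1)
    predicted = fill_mask(masked)[0]["token_str"]
    if predicted == word or predicted in target_keywords:
        return None                      # comparison consistent: mark as correct
    return (word, predicted)             # comparison inconsistent: send to manual correction

pair = check_word("请帮我查询宽代余额", "代", target_keywords={"宽带", "余额"})
if pair is not None:
    print("needs manual correction:", pair)
```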
In this embodiment of the present application, obtaining a preset error correction vocabulary of a subdivision industry may include:
(1) Carrying out misword prediction on a preset transcription sample text in the universal ASR transcription text set based on a preset BERT misword prediction model to obtain predicted miswords.
The universal ASR transcription text set comprises a plurality of transcription sample texts, and the number of samples in the preset text training set is smaller than that of the universal ASR transcription text set.
(2) Extracting keywords from a preset transcription sample text to obtain target keywords;
(3) Judging whether the target keywords are consistent with the predicted miswords;
(4) If they are inconsistent, acquiring a manual error correction result;
(5) And determining an error correction word list of the subdivision industry according to the mapping relation between the error words and the correct words in the manual error correction result.
In a specific embodiment, extracting keywords from a preset transcription sample text to obtain target keywords includes:
(1) Inputting the preset transcription sample text into a preset convolutional neural network for a single convolution pass to obtain a text abstract.
Specifically, the preset convolutional neural network comprises an input layer, a convolution layer and a pooling layer. The input layer converts the input text into a two-dimensional matrix by using Word Embedding; the convolution layer uses Text-CNN to perform feature extraction on the two-dimensional matrix to obtain a plurality of feature vectors; and the pooling layer pools and splices the plurality of feature vectors to obtain the text abstract.
First, the audio is transcribed into text with the universal ASR, yielding the preset transcription sample text. Then, a convolutional neural network (CNN) performs one convolution pass to extract the important features of the initial preset transcription sample text and form a text abstract. Finally, a word segmentation tool segments the abstract of the preset transcription sample text, stop words are removed, and the remaining words are sorted by word frequency to obtain the target keywords of the audio, which can serve as the correct labels in the subdivision industry professional-word error correction task.
Convolutional neural networks can reduce the dimensionality of massive image data and extract its important features, enabling machines to recognize images automatically; they are widely used in computer vision and can likewise be used for feature extraction from text data. The preset convolutional neural network mainly comprises three layers: the first layer is the input layer, which uses Word Embedding to convert the input text into a two-dimensional matrix convenient for convolution; the second layer is the convolution layer, which uses Text-CNN to extract features from the word vectors so that the convolution considers not only word meaning but also word order and context; the third layer is the pooling layer, which reduces the dimensionality of the high-dimensional features by taking the maximum value of each feature vector as its representative, on the view that the maximum value represents the most important feature. Finally, the pooled values of all feature vectors are concatenated to obtain the final feature vector, i.e., the text abstract. A code sketch of this step is given after step (3) below.
Specifically, word Embedding is used to convert one-dimensional text into a high-dimensional Word vector representation, with each row of the Word vector representing a Word in the text. Assuming that there are n words in the text, each Word is converted into a vector representation in k dimensions, the Word vector matrix output by the Word encoding layer has the shape n×k, and the text with length n can be expressed as:
Figure BDA0003947825620000141
wherein the method comprises the steps of
Figure BDA0003947825620000142
Representing a concatenation operation on the word vector. Let x i:i+h-1 Representing x in a window of length h i To x i+h-1 Is used with a filter +.>
Figure BDA0003947825620000143
Convolving the window of length h to obtain feature c i :/>
c i =f(w·x i:i+h-1 +b)
Wherein the method comprises the steps of
Figure BDA0003947825620000144
Representing the bias term, f is a nonlinear activation function. Using filters for all windows { x } in the text 1:h ,x 2:h+1 ,…,x n-h+1:n Convolution operation to obtain a feature map:
c=[c 1 ,c 2 ,...,c n-h+1 ]
wherein the method comprises the steps of
Figure BDA0003947825620000145
Then, the text feature vector obtained through the pooling operation is +.>
Figure BDA0003947825620000146
The text abstract.
(2) Performing word segmentation on the text abstract to obtain a plurality of text word segments.
(3) Determining the text word segments with word frequency higher than a preset value as target keywords.
Specifically, determining the text word segments with word frequency higher than the preset value as target keywords includes: removing stop words from the text word segments to obtain filtered text word segments; and determining the text word segments in the filtered text word segments with word frequency higher than the preset value as target keywords. The stop words may be preset, and the preset value may be set according to the specific case.
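As mentioned after step (1), a compact sketch of the Text-CNN input, convolution and max-pooling layers follows; PyTorch, the filter sizes and the dimensions are assumptions, and only the feature-vector ("text abstract") computation is shown:

```python
import torch
import torch.nn as nn

class TextCNNSummarizer(nn.Module):
    """Input layer (embedding) -> convolution layer (Text-CNN) -> max-pooling layer."""
    def __init__(self, vocab_size, embed_dim=128, num_filters=64, window_sizes=(2, 3, 4)):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)   # n x k word-vector matrix
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, kernel_size=h) for h in window_sizes]
        )

    def forward(self, token_ids):                       # token_ids: (batch, n)
        x = self.embedding(token_ids).transpose(1, 2)   # (batch, k, n)
        features = []
        for conv in self.convs:
            c = torch.relu(conv(x))                      # feature map c = f(w·x + b)
            features.append(torch.max(c, dim=-1).values) # max-pooling over each feature map
        return torch.cat(features, dim=-1)               # concatenated text feature vector
```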
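And a sketch of steps (2) and (3): segmenting the abstract text, removing stop words, and keeping the words whose frequency exceeds the preset value as target keywords (jieba as the word-segmentation tool, the stop-word list and the threshold are all assumptions):

```python
from collections import Counter
import jieba

STOP_WORDS = {"的", "了", "是", "我", "你", "请"}   # illustrative stop-word list

def extract_keywords(text_abstract, min_freq=2):
    """Segment the text abstract, drop stop words, and keep the words whose
    frequency is not lower than the preset value as target keywords."""
    words = [w for w in jieba.lcut(text_abstract) if w.strip() and w not in STOP_WORDS]
    counts = Counter(words)
    return [w for w, freq in counts.most_common() if freq >= min_freq]

keywords = extract_keywords("宽带 报修 宽带 安装 工单 工单 查询")  # -> ['宽带', '工单']
```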
S205, correcting the continuous segmented text according to the correction word list of the subdivision industry to obtain corrected text.
Specifically, the error word in the continuous segmented text is replaced by the correct word, and the text after error correction is obtained.
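Finally, a minimal sketch of S204–S205: the subdivision industry error correction word list as a mapping from error words to correct words, applied to the continuous segmented text (the vocabulary entries and the segmented input are illustrative only):

```python
# Illustrative entries only; in practice the word list is built as described above.
correction_vocab = {
    "宽代": "宽带",   # hypothetical ASR confusion pair for a broadband-service term
    "流量报": "流量包",
}

def correct_segments(segments, vocab):
    """Replace every error word found in the word list with its correct word."""
    return [vocab.get(word, word) for word in segments]

corrected = correct_segments(["请", "帮", "我", "办理", "宽代", "业务"], correction_vocab)
print("".join(corrected))  # -> 请帮我办理宽带业务
```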
In order to better implement the voice recognition error correction method based on the fine division industry error correction vocabulary in the embodiment of the present application, on the basis of the voice recognition error correction method based on the fine division industry error correction vocabulary, a voice recognition error correction device based on the fine division industry error correction vocabulary is further provided in the embodiment of the present application, as shown in fig. 3, fig. 3 is a schematic structural diagram of one embodiment of the voice recognition error correction device based on the fine division industry error correction vocabulary provided in the embodiment of the present application, where the voice recognition error correction device 400 based on the fine division industry error correction vocabulary includes:
a first obtaining unit 401, configured to obtain a call voice to be recognized;
a voice recognition unit 402, configured to perform voice recognition on a call voice to be recognized, so as to obtain a text to be recognized;
a preprocessing unit 403, configured to preprocess a text to be identified to obtain a continuous segmented text;
a second obtaining unit 404, configured to obtain a preset subdivision industry error correction vocabulary, where the subdivision industry error correction vocabulary includes a mapping relationship between an error word and a correct word;
and the error correction unit 405 is configured to perform error correction on the continuous segmented text according to the segmentation industry error correction vocabulary, so as to obtain an error corrected text.
The embodiment of the application also provides electronic equipment, which integrates any voice recognition error correction device based on the subdivision industry error correction vocabulary. As shown in fig. 4, a schematic structural diagram of an electronic device according to an embodiment of the present application is shown, specifically:
the electronic device may include one or more processing cores 'processors 501, one or more computer-readable storage media's memory 502, a power supply 503, and an input unit 504, among other components. It will be appreciated by those skilled in the art that the electronic device structure shown in the figures is not limiting of the electronic device and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components. Wherein:
the processor 501 is a control center of the electronic device, and connects various parts of the entire electronic device using various interfaces and lines, and performs various functions of the electronic device and processes data by running or executing software programs and/or modules stored in the memory 502, and calling data stored in the memory 502, thereby performing overall monitoring of the electronic device. Optionally, processor 501 may include one or more processing cores; preferably, the processor 501 may integrate an application processor that primarily handles operating systems, user interfaces, applications, etc., with a modem processor that primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 501.
The memory 502 may be used to store software programs and modules, and the processor 501 executes various functional applications and data processing by executing the software programs and modules stored in the memory 502. The memory 502 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required for at least one function, and the like; the storage data area may store data created according to the use of the electronic device, etc. In addition, memory 502 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, the memory 502 may also include a memory controller to provide access to the memory 502 by the processor 501.
The electronic device further comprises a power supply 503 for powering the various components, preferably the power supply 503 is logically connected to the processor 501 via a power management system, whereby the functions of managing charging, discharging, and power consumption are performed by the power management system. The power supply 503 may also include one or more of any of a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
The electronic device may further comprise an input unit 504, which input unit 504 may be used for receiving input digital or character information and for generating keyboard, mouse, joystick, optical or trackball signal inputs in connection with user settings and function control.
Although not shown, the electronic device may further include a display unit or the like, which is not described herein. In particular, in this embodiment, the processor 501 in the electronic device loads executable files corresponding to the processes of one or more application programs into the memory 502 according to the following instructions, and the processor 501 executes the application programs stored in the memory 502, so as to implement various functions as follows:
acquiring a call voice to be recognized; performing voice recognition on the call voice to be recognized to obtain a text to be recognized; preprocessing the text to be recognized to obtain a continuous segmented text; acquiring a preset subdivision industry error correction word list, wherein the subdivision industry error correction word list comprises mapping relations between error words and correct words; and correcting the continuous segmented text according to the subdivision industry error correction word list to obtain a corrected text.
Those of ordinary skill in the art will appreciate that all or a portion of the steps of the various methods of the above embodiments may be performed by instructions, or by instructions controlling associated hardware, which may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, embodiments of the present application provide a computer-readable storage medium, which may include: read-only memory (ROM), random access memory (RAM), magnetic disk or optical disk, and the like. A computer program is stored thereon, and the computer program is loaded by a processor to perform the steps of any of the speech recognition error correction methods based on the subdivision industry error correction vocabulary provided by the embodiments of the present application. For example, the computer program loaded by the processor may perform the following steps:
acquiring a call voice to be recognized; performing voice recognition on the call voice to be recognized to obtain a text to be recognized; preprocessing the text to be recognized to obtain a continuous segmented text; acquiring a preset subdivision industry error correction word list, wherein the subdivision industry error correction word list comprises mapping relations between error words and correct words; and correcting the continuous segmented text according to the subdivision industry error correction word list to obtain a corrected text.
In the foregoing embodiments, the descriptions of the embodiments are focused on, and the portions of one embodiment that are not described in detail in the foregoing embodiments may be referred to in the foregoing detailed description of other embodiments, which are not described herein again.
In the implementation, each unit or structure may be implemented as an independent entity, or may be implemented as the same entity or several entities in any combination, and the implementation of each unit or structure may be referred to the foregoing method embodiments and will not be repeated herein.
The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
The foregoing describes in detail a speech recognition error correction method and device based on a subdivision industry error correction vocabulary provided in the embodiments of the present application. Specific examples are used herein to illustrate the principles and embodiments of the present application, and the above description of the embodiments is only intended to help understand the method and its core ideas. Meanwhile, those skilled in the art may make changes to the specific embodiments and the application scope according to the ideas of the present application. In summary, the contents of this specification should not be construed as limiting the present application.

Claims (10)

1. The speech recognition error correction method based on the subdivision industry error correction vocabulary is characterized by comprising the following steps of:
acquiring a conversation voice to be recognized;
performing voice recognition on the call voice to be recognized to obtain a text to be recognized;
preprocessing the text to be recognized to obtain a continuous segmented text;
acquiring a preset subdivision industry error correction word list, wherein the subdivision industry error correction word list comprises mapping relations of error words and correct words;
and correcting the error of the continuous segmented text according to the error correction word list of the subdivision industry to obtain corrected text.
2. The voice recognition error correction method based on the subdivision industry error correction vocabulary according to claim 1, wherein the obtaining the preset subdivision industry error correction vocabulary comprises:
performing misword prediction on a preset transcription sample text in a universal ASR transcription text set based on a preset BERT misword prediction model to obtain predicted miswords;
extracting keywords from the preset transcription sample text to obtain target keywords;
judging whether the target keywords are consistent with the predicted miswords;
if they are inconsistent, acquiring a manual error correction result;
and determining an error correction word list of the subdivision industry according to the mapping relation between the error words and the correct words in the manual error correction result.
3. The speech recognition error correction method based on the subdivision industry error correction vocabulary according to claim 2, wherein before the misword prediction is performed on the preset transcription sample text in the universal ASR transcription text set based on the preset BERT misword prediction model to obtain the predicted miswords, the method further comprises:
acquiring a preset text training set;
carrying out random masking treatment on a preset text training set based on an MLM model to obtain an MLM training set;
and training the BERT model according to the MLM training set to obtain a BERT misword prediction model.
4. The speech recognition error correction method based on the subdivision industry error correction vocabulary according to claim 3, wherein the performing random masking processing on the preset text training set based on the MLM model to obtain the MLM training set includes:
and randomly masking 15% of the words in each sample text of the preset text training set based on the MLM model to obtain the MLM training set.
5. The voice recognition error correction method based on the subdivision industry error correction vocabulary according to claim 2, wherein the keyword extraction is performed on the preset transcription sample text to obtain a target keyword, and the method comprises the following steps:
inputting a preset transcription sample text into a preset convolution neural network to carry out one-time convolution to obtain a text abstract;
performing word segmentation on the text abstract to obtain a plurality of text word segments;
and determining the text word segments with word frequency higher than a preset value as target keywords.
6. The voice recognition error correction method based on the subdivision industry error correction vocabulary of claim 5, wherein the preset convolutional neural network comprises an input layer, a convolution layer and a pooling layer, and the input layer converts the input text into a two-dimensional matrix by using Word Embedding; the convolution layer uses Text-CNN to perform feature extraction on the two-dimensional matrix to obtain a plurality of feature vectors; and the pooling layer pools and splices the plurality of feature vectors to obtain the text abstract.
7. The speech recognition error correction method based on the subdivision industry error correction vocabulary according to claim 5, wherein the determining the text word segments with word frequency higher than the preset value as target keywords comprises:
removing stop words from the text word segments to obtain filtered text word segments;
and determining the text word segments in the filtered text word segments with word frequency higher than the preset value as target keywords.
8. A speech recognition error correction device based on a subdivision industry error correction vocabulary, characterized in that the speech recognition error correction device based on the subdivision industry error correction vocabulary comprises:
the first acquisition unit is used for acquiring the call voice to be recognized;
the voice recognition unit is used for carrying out voice recognition on the call voice to be recognized to obtain a text to be recognized;
the preprocessing unit is used for preprocessing the text to be recognized to obtain a continuous segmented text;
the second acquisition unit is used for acquiring a preset subdivision industry error correction word list, wherein the subdivision industry error correction word list comprises mapping relations of error words and correct words;
and the error correction unit is used for correcting the continuous segmented text according to the subdivision industry error correction word list to obtain corrected text.
9. An electronic device, the electronic device comprising:
one or more processors;
a memory; and
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the processor to implement the subdivision industry error correction vocabulary based speech recognition error correction method of any one of claims 1-7.
10. A computer readable storage medium, having stored thereon a computer program, the computer program being loaded by a processor to perform the steps of the subdivision industry error correction vocabulary based speech recognition error correction method of any one of claims 1 to 7.
CN202211439648.XA 2022-11-17 2022-11-17 Speech recognition error correction method and device based on subdivision industry error correction word list Pending CN116050391A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211439648.XA CN116050391A (en) 2022-11-17 2022-11-17 Speech recognition error correction method and device based on subdivision industry error correction word list

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211439648.XA CN116050391A (en) 2022-11-17 2022-11-17 Speech recognition error correction method and device based on subdivision industry error correction word list

Publications (1)

Publication Number Publication Date
CN116050391A true CN116050391A (en) 2023-05-02

Family

ID=86131985

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211439648.XA Pending CN116050391A (en) 2022-11-17 2022-11-17 Speech recognition error correction method and device based on subdivision industry error correction word list

Country Status (1)

Country Link
CN (1) CN116050391A (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105244029A (en) * 2015-08-28 2016-01-13 科大讯飞股份有限公司 Voice recognition post-processing method and system
CN107045496A (en) * 2017-04-19 2017-08-15 畅捷通信息技术股份有限公司 The error correction method and error correction device of text after speech recognition
CN110135414A (en) * 2019-05-16 2019-08-16 京北方信息技术股份有限公司 Corpus update method, device, storage medium and terminal
CN110738042A (en) * 2019-09-12 2020-01-31 腾讯音乐娱乐科技(深圳)有限公司 Error correction dictionary creating method, device, terminal and computer storage medium
CN111161707A (en) * 2020-02-12 2020-05-15 龙马智芯(珠海横琴)科技有限公司 Method for automatically supplementing quality inspection keyword list, electronic equipment and storage medium
CN111814455A (en) * 2020-06-29 2020-10-23 平安国际智慧城市科技股份有限公司 Search term error correction pair construction method, terminal and storage medium
CN112489655A (en) * 2020-11-18 2021-03-12 元梦人文智能国际有限公司 Method, system and storage medium for correcting error of speech recognition text in specific field
WO2021189803A1 (en) * 2020-09-03 2021-09-30 平安科技(深圳)有限公司 Text error correction method and apparatus, electronic device, and storage medium
CN114580382A (en) * 2022-02-11 2022-06-03 阿里巴巴(中国)有限公司 Text error correction method and device
KR20220075807A (en) * 2020-11-30 2022-06-08 부산대학교 산학협력단 System and Method for correcting Context sensitive spelling error using Generative Adversarial Network
CN114639386A (en) * 2022-02-11 2022-06-17 阿里巴巴(中国)有限公司 Text error correction and text error correction word bank construction method
CN114818668A (en) * 2022-04-26 2022-07-29 北京中科智加科技有限公司 Method and device for correcting personal name of voice transcribed text and computer equipment
CN115186665A (en) * 2022-09-15 2022-10-14 北京智谱华章科技有限公司 Semantic-based unsupervised academic keyword extraction method and equipment


Similar Documents

Publication Publication Date Title
CN110609891B (en) Visual dialog generation method based on context awareness graph neural network
CN107066464B (en) Semantic natural language vector space
CN110580292B (en) Text label generation method, device and computer readable storage medium
CN110321563B (en) Text emotion analysis method based on hybrid supervision model
CN110069709B (en) Intention recognition method, device, computer readable medium and electronic equipment
CN111966812B (en) Automatic question answering method based on dynamic word vector and storage medium
CN110990555B (en) End-to-end retrieval type dialogue method and system and computer equipment
CN111680484B (en) Answer model generation method and system for visual general knowledge reasoning question and answer
CN116304745B (en) Text topic matching method and system based on deep semantic information
US20230298630A1 (en) Apparatuses and methods for selectively inserting text into a video resume
CN115545041A (en) Model construction method and system for enhancing semantic vector representation of medical statement
CN110867225A (en) Character-level clinical concept extraction named entity recognition method and system
CN113705207A (en) Grammar error recognition method and device
CN110705274B (en) Fusion type word meaning embedding method based on real-time learning
CN112307179A (en) Text matching method, device, equipment and storage medium
Ermatita et al. Sentiment Analysis of COVID-19 using Multimodal Fusion Neural Networks.
CN113095072A (en) Text processing method and device
CN115759119A (en) Financial text emotion analysis method, system, medium and equipment
CN116050391A (en) Speech recognition error correction method and device based on subdivision industry error correction word list
WO2023173554A1 (en) Inappropriate agent language identification method and apparatus, electronic device and storage medium
CN115017260A (en) Keyword generation method based on subtopic modeling
CN114595324A (en) Method, device, terminal and non-transitory storage medium for power grid service data domain division
CN116415624A (en) Model training method and device, and content recommendation method and device
CN112270185A (en) Text representation method based on topic model
CN113343666B (en) Method, device, equipment and storage medium for determining confidence of score

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination