CN116050391A - Speech recognition error correction method and device based on subdivision industry error correction word list - Google Patents

Speech recognition error correction method and device based on subdivision industry error correction word list

Info

Publication number
CN116050391A
CN116050391A
Authority
CN
China
Prior art keywords
error correction
text
word
preset
subdivision industry
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211439648.XA
Other languages
Chinese (zh)
Inventor
马晓亮
安玲玲
陈茂强
罗宇文
杜德泉
宋灿辉
李鑫
赵汝强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Yunqu Information Technology Co ltd
Guangzhou Institute of Technology of Xidian University
Original Assignee
Guangzhou Yunqu Information Technology Co ltd
Guangzhou Institute of Technology of Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Yunqu Information Technology Co ltd and Guangzhou Institute of Technology of Xidian University
Priority to CN202211439648.XA
Publication of CN116050391A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/232Orthographic correction, e.g. spell checking or vowelisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34Browsing; Visualisation therefor
    • G06F16/345Summarisation for human users
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/237Lexical tools
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063Training
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/16Speech classification or search using artificial neural networks
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063Training
    • G10L2015/0631Creating reference templates; Clustering
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Abstract

The application provides a speech recognition error correction method and device based on a subdivision industry error correction vocabulary. The speech recognition error correction method based on the subdivision industry error correction vocabulary comprises the following steps: acquiring a call voice to be recognized; performing voice recognition on the call voice to be recognized to obtain a text to be recognized; preprocessing the text to be recognized to obtain a continuous segmented text; acquiring a preset subdivision industry error correction word list, wherein the subdivision industry error correction word list comprises mapping relations between error words and correct words; and correcting the continuous segmented text according to the subdivision industry error correction word list to obtain a corrected text. By constructing a subdivision industry error correction word list and using it to optimize the results of general ASR speech recognition, the method and the device can improve the accuracy of speech recognition.

Description

Speech recognition error correction method and device based on subdivision industry error correction word list
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a speech recognition error correction method and device based on an error correction vocabulary of a subdivision industry.
Background
Currently, general automatic speech recognition (ASR) technology is widely applied, and several commercial ASR products are available off the shelf. However, these general ASR products are aimed mainly at everyday scenarios, and it is difficult for them to meet the accuracy requirements of engineering applications in subdivision industries. That is, the accuracy of speech recognition in the prior art is low.
Disclosure of Invention
The application aims to provide a voice recognition error correction method and device based on an error correction vocabulary of a subdivision industry, and aims to solve the problem of low accuracy of voice recognition in the prior art.
In one aspect, the present application provides a speech recognition error correction method based on a subdivision industry error correction vocabulary, where the speech recognition error correction method based on the subdivision industry error correction vocabulary includes:
acquiring a call voice to be recognized;
performing voice recognition on the call voice to be recognized to obtain a text to be recognized;
preprocessing the text to be recognized to obtain a continuous segmented text;
acquiring a preset subdivision industry error correction word list, wherein the subdivision industry error correction word list comprises mapping relations between error words and correct words;
and correcting the continuous segmented text according to the subdivision industry error correction word list to obtain a corrected text.
Optionally, the obtaining the preset subdivision industry error correction vocabulary includes:
performing misword prediction on a preset transcription sample text in a universal ASR transcription text set based on a preset BERT misword prediction model to obtain predicted miswords;
extracting keywords from the preset transcription sample text to obtain target keywords;
judging whether the target keywords are consistent with the predicted miswords;
if they are inconsistent, acquiring a manual error correction result;
and determining the subdivision industry error correction word list according to the mapping relation between error words and correct words in the manual error correction result.
Optionally, before the misword prediction is performed on the preset transcription sample text in the universal ASR transcription text set based on the preset BERT misword prediction model to obtain the predicted miswords, the method further includes:
acquiring a preset text training set;
carrying out random masking processing on the preset text training set based on an MLM model to obtain an MLM training set;
and training the BERT model according to the MLM training set to obtain a BERT misword prediction model.
Optionally, the performing random masking processing on the preset text training set based on the MLM model to obtain the MLM training set includes:
and randomly masking 15% of the words in each sample text of the preset text training set based on the MLM model to obtain the MLM training set.
Optionally, the extracting keywords from the preset transcription sample text to obtain target keywords includes:
inputting the preset transcription sample text into a preset convolutional neural network for a single convolution pass to obtain a text abstract;
performing word segmentation on the text abstract to obtain a plurality of text word segments;
and determining the text word segments with word frequency higher than a preset value as target keywords.
Optionally, the preset convolutional neural network comprises an input layer, a convolution layer and a pooling layer, wherein the input layer converts the input text into a two-dimensional matrix by using Word Embedding; the convolution layer uses Text-CNN to perform feature extraction on the two-dimensional matrix to obtain a plurality of feature vectors; and the pooling layer pools and splices the plurality of feature vectors to obtain the text abstract.
Optionally, the determining the text word segments with word frequency higher than the preset value as target keywords includes:
removing stop words from the text word segments to obtain filtered text word segments;
and determining the text word segments in the filtered text word segments with word frequency higher than the preset value as target keywords.
In one aspect, the present application provides a speech recognition error correction device based on a subdivision industry error correction vocabulary, the speech recognition error correction device based on the subdivision industry error correction vocabulary includes:
the first acquisition unit is used for acquiring the call voice to be recognized;
the voice recognition unit is used for carrying out voice recognition on the call voice to be recognized to obtain a text to be recognized;
the preprocessing unit is used for preprocessing the text to be recognized to obtain a continuous segmented text;
the second acquisition unit is used for acquiring a preset subdivision industry error correction word list, wherein the subdivision industry error correction word list comprises mapping relations of error words and correct words;
and the error correction unit is used for correcting the continuous segmented text according to the subdivision industry error correction word list to obtain corrected text.
In one aspect, the present application further provides an electronic device, including:
one or more processors;
a memory; and
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the processor to implement the subdivision industry error correction vocabulary based speech recognition error correction method of any one of the first aspects.
In one aspect, the present application also provides a computer readable storage medium having stored thereon a computer program to be loaded by a processor to perform the steps in the subdivision industry error correction vocabulary based speech recognition error correction method of any one of the first aspects.
The application provides a speech recognition error correction method based on a subdivision industry error correction vocabulary, which comprises the following steps: acquiring a call voice to be recognized; performing voice recognition on the call voice to be recognized to obtain a text to be recognized; preprocessing the text to be recognized to obtain a continuous segmented text; acquiring a preset subdivision industry error correction word list, wherein the subdivision industry error correction word list comprises mapping relations between error words and correct words; and correcting the continuous segmented text according to the subdivision industry error correction word list to obtain a corrected text. By constructing a subdivision industry error correction word list and using it to optimize the results of general ASR speech recognition, the method can improve the accuracy of speech recognition.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of a speech recognition error correction system based on a subdivision industry error correction vocabulary according to an embodiment of the present application;
FIG. 2 is a flow chart of one embodiment of a speech recognition error correction method based on a subdivision industry error correction vocabulary provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of one embodiment of a speech recognition error correction device based on a subdivision industry error correction vocabulary provided in an embodiment of the present application;
fig. 4 is a schematic structural diagram of an embodiment of an electronic device provided in an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
In the description of the present application, it should be understood that the terms "center," "longitudinal," "transverse," "length," "width," "thickness," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like indicate an orientation or positional relationship based on that shown in the drawings, merely for convenience of description and to simplify the description, and do not indicate or imply that the devices or elements referred to must have a particular orientation, be configured and operated in a particular orientation, and thus should not be construed as limiting the present application. Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more features. In the description of the present application, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
In this application, the term "exemplary" is used to mean "serving as an example, instance, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments. The following description is presented to enable any person skilled in the art to make and use the application. In the following description, details are set forth for purposes of explanation. It will be apparent to one of ordinary skill in the art that the present application may be practiced without these specific details. In other instances, well-known structures and processes have not been shown in detail to avoid unnecessarily obscuring the description of the present application. Thus, the present application is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.
It should be noted that, because the method in the embodiment of the present application is executed in the electronic device, the processing objects of each electronic device exist in the form of data or information, for example, time, which is substantially time information, it can be understood that in the subsequent embodiment, if the size, the number, the position, etc. are all corresponding data, so that the electronic device processes the data, which is not described herein in detail.
The embodiment of the application provides a speech recognition error correction method and device based on an error correction vocabulary of a subdivision industry, and the method and device are respectively described in detail below.
Referring to fig. 1, fig. 1 is a schematic diagram of a scenario of a speech recognition and error correction system based on a fine-division industry error correction vocabulary according to an embodiment of the present application, where the speech recognition and error correction system based on a fine-division industry error correction vocabulary may include an electronic device 100, and a speech recognition and error correction device based on a fine-division industry error correction vocabulary is integrated in the electronic device 100, such as the electronic device in fig. 1.
In this embodiment of the present application, the electronic device 100 may be an independent server, or a server network or server cluster formed by multiple servers. For example, the electronic device 100 described in the embodiment of the present application includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud server formed by a plurality of servers, where the cloud server is composed of a large number of computers or web servers based on cloud computing.
It will be understood by those skilled in the art that the application environment shown in fig. 1 is only one application scenario of the present application and does not limit its application scenarios; other application environments may include more or fewer electronic devices than shown in fig. 1. For example, only one electronic device is shown in fig. 1, and it will be understood that the speech recognition error correction system based on the subdivision industry error correction vocabulary may further include one or more other servers, which is not limited herein.
In addition, as shown in FIG. 1, the subdivision industry error correction vocabulary based speech recognition error correction system may also include a memory 200 for storing data.
It should be noted that, the schematic view of the speech recognition error correction system based on the subdivision industry error correction vocabulary shown in fig. 1 is only an example, and the speech recognition error correction system based on the subdivision industry error correction vocabulary and the scene described in the embodiments of the present application are for more clearly describing the technical solution of the embodiments of the present application, and do not constitute a limitation on the technical solution provided in the embodiments of the present application, and as one of ordinary skill in the art can know, along with the evolution of the speech recognition error correction system based on the subdivision industry error correction vocabulary and the appearance of a new service scene, the technical solution provided in the embodiments of the present application is equally applicable to similar technical problems.
Firstly, an embodiment of the present application provides a speech recognition error correction method based on a subdivision industry error correction vocabulary. The execution subject of the method is a speech recognition error correction device based on the subdivision industry error correction vocabulary, and the device is applied to an electronic device. The method includes: acquiring a call voice to be recognized; performing voice recognition on the call voice to be recognized to obtain a text to be recognized; preprocessing the text to be recognized to obtain a continuous segmented text; acquiring a preset subdivision industry error correction word list, wherein the subdivision industry error correction word list comprises mapping relations between error words and correct words; and correcting the continuous segmented text according to the subdivision industry error correction word list to obtain a corrected text.
Referring to fig. 2, fig. 2 is a flowchart of an embodiment of a speech recognition error correction method based on a fine-segment industry error correction vocabulary according to an embodiment of the present application. The speech recognition error correction method based on the subdivision industry error correction vocabulary comprises the following steps:
s201, acquiring a conversation voice to be recognized.
In the embodiment of the application, the call voice to be recognized can be a call record between customer service and a customer.
S202, performing voice recognition on the call voice to be recognized to obtain a text to be recognized.
Specifically, voice recognition is performed on the call voice to be recognized by using automatic speech recognition (ASR), a technology that converts human speech into text, to obtain the text to be recognized.
S203, preprocessing the text to be recognized to obtain continuous segmented text.
S204, acquiring a preset subdivision industry error correction word list, wherein the subdivision industry error correction word list comprises mapping relations of error words and correct words.
The subdivision industry error correction vocabulary can be used for carrying out error recognition and replacement of professional words. The subdivision industry error correction word list comprises mapping relations between error words and correct words. The subdivision industry error correction vocabulary may be manually set in advance.
In a specific embodiment, the BERT misword prediction model is pre-trained for building a subdivision industry error correction vocabulary. The obtaining of the preset subdivision industry error correction vocabulary may include:
(1) Specifically, a preset text training set is obtained.
The preset text training set is a manually annotated training set, and the number of samples in the preset text training set is smaller than that of the universal ASR transcription text set. The preset text training set comprises a plurality of sample texts, which are obtained through ASR transcription.
(2) And carrying out random masking treatment on the preset text training set based on the MLM model to obtain the MLM training set.
Specifically, 15% of the words in each sample text of the preset text training set are randomly masked based on the MLM model to obtain the MLM training set.
(3) And training the BERT model according to the MLM training set to obtain a BERT misword prediction model.
In essence, BERT learns a good feature representation for words by running a self-supervised learning method over a massive corpus, and this representation can then be used for tasks such as text classification and text generation. BERT incorporates the MLM (Masked Language Model) algorithm to pre-train a bidirectional Transformer and generate deep bidirectional language representations. The model randomly replaces tokens in each training sequence with a [MASK] tag with 15% probability, and the task is to let the model predict these [MASK]-tagged industry-specific words from the full context, thereby training the parameters of the Transformer model. This self-supervised training lets the model learn how to represent vocabulary, and the learned representation can be applied to downstream tasks.
The MLM (Masked Language Model) objective can be understood as a cloze task: words in a text are randomly masked at a certain ratio (i.e., replaced with the [MASK] tag), and the model predicts them from the surrounding context; the prediction errors are used to train the model parameters, so the model eventually learns to characterize and predict words. MLM predicts the vocabulary at the target positions with a deep Transformer model.
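As an illustration only (not the application's own implementation), the MLM training described above maps onto the masked-language-modeling utilities of the Hugging Face transformers library; the base checkpoint, file path and hyper-parameters below are assumptions:

```python
from datasets import load_dataset
from transformers import (BertForMaskedLM, BertTokenizerFast,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

# Assumed base checkpoint; the preset text training set is a plain-text file of
# manually annotated ASR transcripts, one sample per line (the path is a placeholder).
tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
model = BertForMaskedLM.from_pretrained("bert-base-chinese")

dataset = load_dataset("text", data_files={"train": "preset_text_training_set.txt"})
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"],
)

# Randomly mask 15% of the tokens, matching the masking ratio in the description.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-misword-prediction", num_train_epochs=3),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()  # yields the BERT misword prediction model
```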
BERT uses the Encoder part of the Transformer as its generation model. The multi-layer Transformer reads the input data in a single pass for bidirectional learning, so it can learn the contextual relations between the words in a text. The Encoder adds a positional encoding to the input vector $X$ to obtain a new word vector $X_{embedding}$:

$$X_{embedding} = X + X_{pos}$$

The purpose of adding $X_{pos}$ is to encode the position of each word in the text:

$$PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_{model}}}\right), \qquad PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$

where $pos$ represents the position of the word in the text and $i$ represents the dimension of the word vector.
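For illustration, the two positional-encoding formulas above can be implemented as follows (a NumPy sketch; the application itself does not prescribe an implementation):

```python
import numpy as np

def positional_encoding(max_len, d_model):
    """Sinusoidal positional encoding: sine on even dimensions, cosine on odd ones."""
    pos = np.arange(max_len)[:, None]        # word positions in the text
    i = np.arange(d_model)[None, :]          # word-vector dimensions
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angle[:, 0::2])
    pe[:, 1::2] = np.cos(angle[:, 1::2])
    return pe

# X_embedding = X + X_pos, where X is the (max_len, d_model) word-vector matrix.
X = np.random.randn(50, 128)
X_embedding = X + positional_encoding(50, 128)
```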
Each layer of the Encoder consists of a multi-head self-attention mechanism and a fully connected feed-forward neural network. The attention mechanism is computed as:

$$Attention(Q, K, V) = softmax\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

The query content and the key content to be attended to are converted into matrix representations $Q$ and $K$ respectively; the dot product $QK^{T}$ measures the similarity between the query content and the key content, and the scaling factor $\frac{1}{\sqrt{d_k}}$ is introduced to prevent overly large dot products from producing small gradients after the softmax. The multi-head self-attention mechanism then projects $Q$, $K$ and $V$ through $h$ linear transformations and concatenates the resulting attention values. With $W_i$ denoting a linear mapping and $i \in [1, h]$, the multi-head attention mechanism can be expressed as:

$$MultiHead(Q, K, V) = Concat(head_1, \ldots, head_h)W^{O}$$

$$head_i = Attention(QW_i^{Q}, KW_i^{K}, VW_i^{V})$$

where $d_{model}$ is the hidden-layer dimension of the output and also equals the dimension of the word vector. After the attention mechanism, the input and output of the layer are joined by a residual connection, and the hidden layer is normalized with Layer Normalization. The feed-forward neural network applies two linear mappings to the hidden layer and then an activation function to obtain the generated vector representation.
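A minimal NumPy sketch of the scaled dot-product attention and the multi-head combination defined above; the shapes and the way per-head projections are sliced out of full matrices are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)   # similarity between queries and keys
    return softmax(scores) @ V

def multi_head_attention(X, W_q, W_k, W_v, W_o, h):
    """Project Q, K, V through h linear maps, attend in each head, concatenate, project with W_o."""
    d_model = X.shape[-1]
    d_head = d_model // h
    heads = []
    for i in range(h):
        sl = slice(i * d_head, (i + 1) * d_head)     # columns used as this head's projection
        heads.append(attention(X @ W_q[:, sl], X @ W_k[:, sl], X @ W_v[:, sl]))
    return np.concatenate(heads, axis=-1) @ W_o
```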
The BERT used here is a pre-trained BERT, a general-purpose language model that performs well; it is fine-tuned with data from the call center subdivision industry, and the fine-tuned model can then be used for misword recognition. Words are randomly selected from a text transcribed by the universal ASR (i.e., the original words are replaced with [MASK] tags), the fine-tuned BERT predicts the original words at the [MASK] positions, and the prediction results are compared with the word segmentation results of the text abstract. Words whose comparison results are consistent are marked as correct; words whose comparison results are inconsistent are corrected manually, and the error correction word pairs are recorded in the subdivision industry error correction word list according to the correspondence between error words and correct words.
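For illustration, the masked-word prediction and comparison step can be sketched with the transformers fill-mask pipeline; the checkpoint path is a placeholder for the fine-tuned model, and the single-token masking and comparison logic are simplifying assumptions (a multi-character Chinese word would need one [MASK] per character):

```python
from transformers import pipeline

# Placeholder path to the BERT checkpoint fine-tuned on call-center subdivision-industry data.
fill_mask = pipeline("fill-mask", model="bert-misword-prediction")

def check_word(sentence, word, target_keywords):
    """Mask one word, let the fine-tuned BERT predict it, and return an
    (error word, predicted word) pair for manual review when the prediction
    disagrees with the keywords extracted from the text abstract."""
    masked = sentence.replace(word, fill_mask.tokenizer.mask_token, 1)
    predicted = fill_mask(masked)[0]["token_str"]
    if predicted == word or predicted in target_keywords:
        return None                      # comparison consistent: mark as correct
    return (word, predicted)             # comparison inconsistent: send to manual correction

pair = check_word("请帮我查询宽代余额", "代", target_keywords={"宽带", "余额"})
if pair is not None:
    print("needs manual correction:", pair)
```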
In this embodiment of the present application, obtaining a preset error correction vocabulary of a subdivision industry may include:
(1) Carrying out misword prediction on a preset transcription sample text in the universal ASR transcription text set based on a preset BERT misword prediction model to obtain predicted miswords.
The universal ASR transcription text set comprises a plurality of transcription sample texts, and the number of samples in the preset text training set is smaller than that of the universal ASR transcription text set.
(2) Extracting keywords from a preset transcription sample text to obtain target keywords;
(3) Judging whether the target keywords are consistent with the predicted miswords;
(4) If they are inconsistent, acquiring a manual error correction result;
(5) And determining an error correction word list of the subdivision industry according to the mapping relation between the error words and the correct words in the manual error correction result.
In a specific embodiment, extracting keywords from a preset transcription sample text to obtain target keywords includes:
(1) Inputting the preset transcription sample text into a preset convolutional neural network for a single convolution pass to obtain a text abstract.
Specifically, the preset convolutional neural network comprises an input layer, a convolution layer and a pooling layer. The input layer converts the input text into a two-dimensional matrix by using Word Embedding; the convolution layer uses Text-CNN to perform feature extraction on the two-dimensional matrix to obtain a plurality of feature vectors; and the pooling layer pools and splices the plurality of feature vectors to obtain the text abstract.
First, the audio is transcribed into text with the universal ASR, yielding the preset transcription sample text. Then, a convolutional neural network (CNN) performs one convolution pass to extract the important features of the initial preset transcription sample text and form a text abstract. Finally, a word segmentation tool segments the abstract of the preset transcription sample text, stop words are removed, and the remaining words are sorted by word frequency to obtain the target keywords of the audio, which can serve as the correct labels in the subdivision industry professional-word error correction task.
Convolutional neural networks can reduce the dimensionality of massive image data and extract its important features, enabling machines to recognize images automatically; they are widely used in computer vision and can likewise be used for feature extraction from text data. The preset convolutional neural network mainly comprises three layers: the first layer is the input layer, which uses Word Embedding to convert the input text into a two-dimensional matrix convenient for convolution; the second layer is the convolution layer, which uses Text-CNN to extract features from the word vectors so that the convolution considers not only word meaning but also word order and context; the third layer is the pooling layer, which reduces the dimensionality of the high-dimensional features by taking the maximum value of each feature vector as its representative, on the view that the maximum value represents the most important feature. Finally, the pooled values of all feature vectors are concatenated to obtain the final feature vector, i.e., the text abstract. A code sketch of this step is given after step (3) below.
Specifically, word Embedding is used to convert one-dimensional text into a high-dimensional Word vector representation, with each row of the Word vector representing a Word in the text. Assuming that there are n words in the text, each Word is converted into a vector representation in k dimensions, the Word vector matrix output by the Word encoding layer has the shape n×k, and the text with length n can be expressed as:
Figure BDA0003947825620000141
wherein the method comprises the steps of
Figure BDA0003947825620000142
Representing a concatenation operation on the word vector. Let x i:i+h-1 Representing x in a window of length h i To x i+h-1 Is used with a filter +.>
Figure BDA0003947825620000143
Convolving the window of length h to obtain feature c i :/>
c i =f(w·x i:i+h-1 +b)
Wherein the method comprises the steps of
Figure BDA0003947825620000144
Representing the bias term, f is a nonlinear activation function. Using filters for all windows { x } in the text 1:h ,x 2:h+1 ,…,x n-h+1:n Convolution operation to obtain a feature map:
c=[c 1 ,c 2 ,...,c n-h+1 ]
wherein the method comprises the steps of
Figure BDA0003947825620000145
Then, the text feature vector obtained through the pooling operation is +.>
Figure BDA0003947825620000146
The text abstract.
(2) Performing word segmentation on the text abstract to obtain a plurality of text word segments.
(3) Determining the text word segments with word frequency higher than a preset value as target keywords.
Specifically, determining the text word segments with word frequency higher than the preset value as target keywords includes: removing stop words from the text word segments to obtain filtered text word segments; and determining the text word segments in the filtered text word segments with word frequency higher than the preset value as target keywords. The stop words may be preset, and the preset value may be set according to the specific case.
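As mentioned after step (1), a compact sketch of the Text-CNN input, convolution and max-pooling layers follows; PyTorch, the filter sizes and the dimensions are assumptions, and only the feature-vector ("text abstract") computation is shown:

```python
import torch
import torch.nn as nn

class TextCNNSummarizer(nn.Module):
    """Input layer (embedding) -> convolution layer (Text-CNN) -> max-pooling layer."""
    def __init__(self, vocab_size, embed_dim=128, num_filters=64, window_sizes=(2, 3, 4)):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)   # n x k word-vector matrix
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, kernel_size=h) for h in window_sizes]
        )

    def forward(self, token_ids):                       # token_ids: (batch, n)
        x = self.embedding(token_ids).transpose(1, 2)   # (batch, k, n)
        features = []
        for conv in self.convs:
            c = torch.relu(conv(x))                      # feature map c = f(w·x + b)
            features.append(torch.max(c, dim=-1).values) # max-pooling over each feature map
        return torch.cat(features, dim=-1)               # concatenated text feature vector
```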
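And a sketch of steps (2) and (3): segmenting the abstract text, removing stop words, and keeping the words whose frequency exceeds the preset value as target keywords (jieba as the word-segmentation tool, the stop-word list and the threshold are all assumptions):

```python
from collections import Counter
import jieba

STOP_WORDS = {"的", "了", "是", "我", "你", "请"}   # illustrative stop-word list

def extract_keywords(text_abstract, min_freq=2):
    """Segment the text abstract, drop stop words, and keep the words whose
    frequency is not lower than the preset value as target keywords."""
    words = [w for w in jieba.lcut(text_abstract) if w.strip() and w not in STOP_WORDS]
    counts = Counter(words)
    return [w for w, freq in counts.most_common() if freq >= min_freq]

keywords = extract_keywords("宽带 报修 宽带 安装 工单 工单 查询")  # -> ['宽带', '工单']
```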
S205, correcting the continuous segmented text according to the correction word list of the subdivision industry to obtain corrected text.
Specifically, the error word in the continuous segmented text is replaced by the correct word, and the text after error correction is obtained.
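Finally, a minimal sketch of S204–S205: the subdivision industry error correction word list as a mapping from error words to correct words, applied to the continuous segmented text (the vocabulary entries and the segmented input are illustrative only):

```python
# Illustrative entries only; in practice the word list is built as described above.
correction_vocab = {
    "宽代": "宽带",   # hypothetical ASR confusion pair for a broadband-service term
    "流量报": "流量包",
}

def correct_segments(segments, vocab):
    """Replace every error word found in the word list with its correct word."""
    return [vocab.get(word, word) for word in segments]

corrected = correct_segments(["请", "帮", "我", "办理", "宽代", "业务"], correction_vocab)
print("".join(corrected))  # -> 请帮我办理宽带业务
```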
In order to better implement the voice recognition error correction method based on the fine division industry error correction vocabulary in the embodiment of the present application, on the basis of the voice recognition error correction method based on the fine division industry error correction vocabulary, a voice recognition error correction device based on the fine division industry error correction vocabulary is further provided in the embodiment of the present application, as shown in fig. 3, fig. 3 is a schematic structural diagram of one embodiment of the voice recognition error correction device based on the fine division industry error correction vocabulary provided in the embodiment of the present application, where the voice recognition error correction device 400 based on the fine division industry error correction vocabulary includes:
a first obtaining unit 401, configured to obtain a call voice to be recognized;
a voice recognition unit 402, configured to perform voice recognition on a call voice to be recognized, so as to obtain a text to be recognized;
a preprocessing unit 403, configured to preprocess a text to be identified to obtain a continuous segmented text;
a second obtaining unit 404, configured to obtain a preset subdivision industry error correction vocabulary, where the subdivision industry error correction vocabulary includes a mapping relationship between an error word and a correct word;
and the error correction unit 405 is configured to perform error correction on the continuous segmented text according to the segmentation industry error correction vocabulary, so as to obtain an error corrected text.
The embodiment of the application also provides electronic equipment, which integrates any voice recognition error correction device based on the subdivision industry error correction vocabulary. As shown in fig. 4, a schematic structural diagram of an electronic device according to an embodiment of the present application is shown, specifically:
the electronic device may include one or more processing cores 'processors 501, one or more computer-readable storage media's memory 502, a power supply 503, and an input unit 504, among other components. It will be appreciated by those skilled in the art that the electronic device structure shown in the figures is not limiting of the electronic device and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components. Wherein:
the processor 501 is a control center of the electronic device, and connects various parts of the entire electronic device using various interfaces and lines, and performs various functions of the electronic device and processes data by running or executing software programs and/or modules stored in the memory 502, and calling data stored in the memory 502, thereby performing overall monitoring of the electronic device. Optionally, processor 501 may include one or more processing cores; preferably, the processor 501 may integrate an application processor that primarily handles operating systems, user interfaces, applications, etc., with a modem processor that primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 501.
The memory 502 may be used to store software programs and modules, and the processor 501 executes various functional applications and data processing by executing the software programs and modules stored in the memory 502. The memory 502 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required for at least one function, and the like; the storage data area may store data created according to the use of the electronic device, etc. In addition, memory 502 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, the memory 502 may also include a memory controller to provide access to the memory 502 by the processor 501.
The electronic device further comprises a power supply 503 for powering the various components, preferably the power supply 503 is logically connected to the processor 501 via a power management system, whereby the functions of managing charging, discharging, and power consumption are performed by the power management system. The power supply 503 may also include one or more of any of a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
The electronic device may further comprise an input unit 504, which input unit 504 may be used for receiving input digital or character information and for generating keyboard, mouse, joystick, optical or trackball signal inputs in connection with user settings and function control.
Although not shown, the electronic device may further include a display unit or the like, which is not described herein. In particular, in this embodiment, the processor 501 in the electronic device loads executable files corresponding to the processes of one or more application programs into the memory 502 according to the following instructions, and the processor 501 executes the application programs stored in the memory 502, so as to implement various functions as follows:
acquiring a call voice to be recognized; performing voice recognition on the call voice to be recognized to obtain a text to be recognized; preprocessing the text to be recognized to obtain a continuous segmented text; acquiring a preset subdivision industry error correction word list, wherein the subdivision industry error correction word list comprises mapping relations between error words and correct words; and correcting the continuous segmented text according to the subdivision industry error correction word list to obtain a corrected text.
Those of ordinary skill in the art will appreciate that all or a portion of the steps of the various methods of the above embodiments may be performed by instructions, or by instructions controlling associated hardware, which may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, embodiments of the present application provide a computer-readable storage medium, which may include: read-only memory (ROM), random access memory (RAM), magnetic disk or optical disk, and the like. A computer program is stored thereon, and the computer program is loaded by a processor to perform the steps of any of the speech recognition error correction methods based on the subdivision industry error correction vocabulary provided by the embodiments of the present application. For example, the computer program loaded by the processor may perform the following steps:
acquiring a call voice to be recognized; performing voice recognition on the call voice to be recognized to obtain a text to be recognized; preprocessing the text to be recognized to obtain a continuous segmented text; acquiring a preset subdivision industry error correction word list, wherein the subdivision industry error correction word list comprises mapping relations between error words and correct words; and correcting the continuous segmented text according to the subdivision industry error correction word list to obtain a corrected text.
In the foregoing embodiments, the descriptions of the embodiments are focused on, and the portions of one embodiment that are not described in detail in the foregoing embodiments may be referred to in the foregoing detailed description of other embodiments, which are not described herein again.
In the implementation, each unit or structure may be implemented as an independent entity, or may be implemented as the same entity or several entities in any combination, and the implementation of each unit or structure may be referred to the foregoing method embodiments and will not be repeated herein.
The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
The foregoing describes in detail a speech recognition error correction method and device based on a subdivision industry error correction vocabulary provided in the embodiments of the present application. Specific examples are used herein to illustrate the principles and embodiments of the present application, and the above description of the embodiments is only intended to help understand the method and its core ideas. Meanwhile, those skilled in the art may make changes to the specific embodiments and the application scope according to the ideas of the present application. In summary, the contents of this specification should not be construed as limiting the present application.

Claims (10)

1. The speech recognition error correction method based on the subdivision industry error correction vocabulary is characterized by comprising the following steps of:
acquiring a conversation voice to be recognized;
performing voice recognition on the call voice to be recognized to obtain a text to be recognized;
preprocessing the text to be recognized to obtain a continuous segmented text;
acquiring a preset subdivision industry error correction word list, wherein the subdivision industry error correction word list comprises mapping relations of error words and correct words;
and correcting the error of the continuous segmented text according to the error correction word list of the subdivision industry to obtain corrected text.
2. The voice recognition error correction method based on the subdivision industry error correction vocabulary according to claim 1, wherein the obtaining the preset subdivision industry error correction vocabulary comprises:
performing misword prediction on a preset transcription sample text in a universal ASR transcription text set based on a preset BERT misword prediction model to obtain predicted miswords;
extracting keywords from the preset transcription sample text to obtain target keywords;
judging whether the target keywords are consistent with the predicted miswords;
if they are inconsistent, acquiring a manual error correction result;
and determining an error correction word list of the subdivision industry according to the mapping relation between the error words and the correct words in the manual error correction result.
3. The speech recognition error correction method based on the subdivision industry error correction vocabulary according to claim 2, wherein before the misword prediction is performed on the preset transcription sample text in the universal ASR transcription text set based on the preset BERT misword prediction model to obtain the predicted miswords, the method further comprises:
acquiring a preset text training set;
carrying out random masking treatment on a preset text training set based on an MLM model to obtain an MLM training set;
and training the BERT model according to the MLM training set to obtain a BERT misword prediction model.
4. The speech recognition error correction method based on the subdivision industry error correction vocabulary according to claim 3, wherein the performing random masking processing on the preset text training set based on the MLM model to obtain the MLM training set includes:
and randomly masking 15% of the words in each sample text of the preset text training set based on the MLM model to obtain the MLM training set.
5. The voice recognition error correction method based on the subdivision industry error correction vocabulary according to claim 2, wherein the keyword extraction is performed on the preset transcription sample text to obtain a target keyword, and the method comprises the following steps:
inputting a preset transcription sample text into a preset convolution neural network to carry out one-time convolution to obtain a text abstract;
performing word segmentation on the text abstract to obtain a plurality of text word segments;
and determining the text word segments with word frequency higher than a preset value as target keywords.
6. The voice recognition error correction method based on the subdivision industry error correction vocabulary of claim 5, wherein the preset convolutional neural network comprises an input layer, a convolution layer and a pooling layer, and the input layer converts the input text into a two-dimensional matrix by using Word Embedding; the convolution layer uses Text-CNN to perform feature extraction on the two-dimensional matrix to obtain a plurality of feature vectors; and the pooling layer pools and splices the plurality of feature vectors to obtain the text abstract.
7. The speech recognition error correction method based on the subdivision industry error correction vocabulary according to claim 5, wherein the determining the text word segments with word frequency higher than the preset value as target keywords comprises:
removing stop words from the text word segments to obtain filtered text word segments;
and determining the text word segments in the filtered text word segments with word frequency higher than the preset value as target keywords.
8. A speech recognition error correction device based on a subdivision industry error correction vocabulary, characterized in that the speech recognition error correction device based on the subdivision industry error correction vocabulary comprises:
the first acquisition unit is used for acquiring the call voice to be recognized;
the voice recognition unit is used for carrying out voice recognition on the call voice to be recognized to obtain a text to be recognized;
the preprocessing unit is used for preprocessing the text to be recognized to obtain a continuous segmented text;
the second acquisition unit is used for acquiring a preset subdivision industry error correction word list, wherein the subdivision industry error correction word list comprises mapping relations of error words and correct words;
and the error correction unit is used for correcting the continuous segmented text according to the subdivision industry error correction word list to obtain corrected text.
9. An electronic device, the electronic device comprising:
one or more processors;
a memory; and
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the processor to implement the subdivision industry error correction vocabulary based speech recognition error correction method of any one of claims 1-7.
10. A computer readable storage medium, having stored thereon a computer program, the computer program being loaded by a processor to perform the steps of the subdivision industry error correction vocabulary based speech recognition error correction method of any one of claims 1 to 7.
CN202211439648.XA 2022-11-17 2022-11-17 Speech recognition error correction method and device based on subdivision industry error correction word list Pending CN116050391A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211439648.XA CN116050391A (en) 2022-11-17 2022-11-17 Speech recognition error correction method and device based on subdivision industry error correction word list

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211439648.XA CN116050391A (en) 2022-11-17 2022-11-17 Speech recognition error correction method and device based on subdivision industry error correction word list

Publications (1)

Publication Number Publication Date
CN116050391A true CN116050391A (en) 2023-05-02

Family

ID=86131985

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211439648.XA Pending CN116050391A (en) 2022-11-17 2022-11-17 Speech recognition error correction method and device based on subdivision industry error correction word list

Country Status (1)

Country Link
CN (1) CN116050391A (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105244029A (en) * 2015-08-28 2016-01-13 科大讯飞股份有限公司 Voice recognition post-processing method and system
CN107045496A (en) * 2017-04-19 2017-08-15 畅捷通信息技术股份有限公司 The error correction method and error correction device of text after speech recognition
CN110135414A (en) * 2019-05-16 2019-08-16 京北方信息技术股份有限公司 Corpus update method, device, storage medium and terminal
CN110738042A (en) * 2019-09-12 2020-01-31 腾讯音乐娱乐科技(深圳)有限公司 Error correction dictionary creating method, device, terminal and computer storage medium
CN111161707A (en) * 2020-02-12 2020-05-15 龙马智芯(珠海横琴)科技有限公司 Method for automatically supplementing quality inspection keyword list, electronic equipment and storage medium
CN111814455A (en) * 2020-06-29 2020-10-23 平安国际智慧城市科技股份有限公司 Search term error correction pair construction method, terminal and storage medium
CN112489655A (en) * 2020-11-18 2021-03-12 元梦人文智能国际有限公司 Method, system and storage medium for correcting error of speech recognition text in specific field
WO2021189803A1 (en) * 2020-09-03 2021-09-30 平安科技(深圳)有限公司 Text error correction method and apparatus, electronic device, and storage medium
CN114580382A (en) * 2022-02-11 2022-06-03 阿里巴巴(中国)有限公司 Text error correction method and device
KR20220075807A (en) * 2020-11-30 2022-06-08 부산대학교 산학협력단 System and Method for correcting Context sensitive spelling error using Generative Adversarial Network
CN114639386A (en) * 2022-02-11 2022-06-17 阿里巴巴(中国)有限公司 Text error correction and text error correction word bank construction method
CN114818668A (en) * 2022-04-26 2022-07-29 北京中科智加科技有限公司 Method and device for correcting personal name of voice transcribed text and computer equipment
CN115186665A (en) * 2022-09-15 2022-10-14 北京智谱华章科技有限公司 Semantic-based unsupervised academic keyword extraction method and equipment


Similar Documents

Publication Publication Date Title
CN110609891B (en) Visual dialog generation method based on context awareness graph neural network
CN107066464B (en) Semantic natural language vector space
CN110580292B (en) Text label generation method, device and computer readable storage medium
CN110321563B (en) Text emotion analysis method based on hybrid supervision model
CN110069709B (en) Intention recognition method, device, computer readable medium and electronic equipment
CN111966812B (en) Automatic question answering method based on dynamic word vector and storage medium
CN110990555B (en) End-to-end retrieval type dialogue method and system and computer equipment
CN111680484B (en) Answer model generation method and system for visual general knowledge reasoning question and answer
CN116304745B (en) Text topic matching method and system based on deep semantic information
US20230298630A1 (en) Apparatuses and methods for selectively inserting text into a video resume
CN115545041A (en) Model construction method and system for enhancing semantic vector representation of medical statement
CN110867225A (en) Character-level clinical concept extraction named entity recognition method and system
CN113705207A (en) Grammar error recognition method and device
CN110705274B (en) Fusion type word meaning embedding method based on real-time learning
CN112307179A (en) Text matching method, device, equipment and storage medium
Ermatita et al. Sentiment Analysis of COVID-19 using Multimodal Fusion Neural Networks.
CN113095072A (en) Text processing method and device
CN115759119A (en) Financial text emotion analysis method, system, medium and equipment
CN116050391A (en) Speech recognition error correction method and device based on subdivision industry error correction word list
WO2023173554A1 (en) Inappropriate agent language identification method and apparatus, electronic device and storage medium
CN115017260A (en) Keyword generation method based on subtopic modeling
CN114595324A (en) Method, device, terminal and non-transitory storage medium for power grid service data domain division
CN116415624A (en) Model training method and device, and content recommendation method and device
CN112270185A (en) Text representation method based on topic model
CN113343666B (en) Method, device, equipment and storage medium for determining confidence of score

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination