CN114639386A

CN114639386A - Text error correction and text error correction word bank construction method

Info

Publication number: CN114639386A
Application number: CN202210129723.6A
Authority: CN
Inventors: 曹迪; 邓憧
Original assignee: Alibaba China Co Ltd
Current assignee: Alibaba China Co Ltd
Priority date: 2022-02-11
Filing date: 2022-02-11
Publication date: 2022-06-17

Abstract

The embodiment of the specification provides a text error correction and text error correction word bank construction method, wherein the text error correction method comprises the following steps: receiving a text editing message of a current conversation sent by a client, analyzing the text editing message, obtaining an error correction word pair generated under the current conversation, obtaining a real-time voice recognition text, correcting the real-time voice recognition text by using the error correction word pair to obtain a corrected text, and sending the corrected text to the client for displaying. Because the error correction word pair is generated when the user edits the text at the client, the error correction word pair conforms to the actual editing requirement of the user when the subsequently received real-time speech recognition text is corrected, customized text error correction is realized, and the user can quickly correct the error in the subsequent speech recognition by one-time text editing, so that the efficiency and the accuracy of the error correction of the speech recognition text are improved.

Description

Text error correction and text error correction word bank construction method

Technical Field

The embodiment of the specification relates to the technical field of computers, in particular to a text error correction and text error correction word bank construction method.

Background

The speech recognition technology takes speech as a research object, and enables a machine to automatically recognize and understand human spoken language through speech signal processing and pattern recognition. Speech recognition technology is a technology that allows a machine to convert speech data into corresponding text through a recognition and understanding process.

However, speech recognition is affected by subjective and objective factors such as accent and speaking habits, so recognition results are often inaccurate. The errors in the speech recognition text therefore need to be corrected to obtain a more accurate text, which makes text error correction a crucial branch of speech recognition technology. A more accurate text error correction scheme is needed.

Disclosure of Invention

In view of this, the embodiments of the present specification provide a text error correction method. One or more embodiments of the present disclosure also relate to a method for constructing a text error correction lexicon, a text error correction apparatus, a text error correction lexicon construction apparatus, a computing device, a computer-readable storage medium, and a computer program, so as to solve technical defects in the prior art.

According to a first aspect of embodiments herein, there is provided a text correction method, including:

receiving a text editing message of a current session sent by a client;

analyzing the text editing message to obtain an error correction word pair generated under the current conversation;

acquiring a real-time voice recognition text, and correcting the real-time voice recognition text by using an error correction word pair to obtain an error correction text;

and sending the error correction text to the client for display.

According to a second aspect of the embodiments of the present specification, there is provided a method for constructing a text correction lexicon, including:

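acquiring a text editing message of a current session;

analyzing the text editing message to obtain an error correction word pair generated under the current session;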

and storing the error correction word pairs into a text error correction word bank of the current conversation, wherein the text error correction word bank is used for correcting errors of the text to be corrected.

According to a third aspect of embodiments herein, there is provided a text correction apparatus including:

the receiving module is configured to receive a text editing message of a current conversation sent by a client;

the first analysis module is configured to analyze the text editing message to obtain an error correction word pair generated under the current conversation;

the error correction module is configured to obtain a real-time voice recognition text, and correct the real-time voice recognition text by using an error correction word pair to obtain a corrected text;

and the sending module is configured to send the corrected text to the client for display.

According to a fourth aspect of the embodiments of the present specification, there is provided a text error correction lexicon constructing apparatus, including:

the acquisition module is configured to acquire a text editing message of a current session;

the second analysis module is configured to analyze the text editing message to obtain an error correction word pair generated under the current conversation;

and the storage module is configured to store the error correction word pairs into a text error correction word bank of the current conversation, wherein the text error correction word bank is used for correcting errors of the text to be corrected.

According to a fifth aspect of embodiments herein, there is provided a computing device comprising:

a memory and a processor;

the memory is used for storing computer-executable instructions, and the processor is used for executing the computer-executable instructions, and the computer-executable instructions are executed by the processor to realize the steps of the text error correction method or the steps of the text error correction word bank construction method.

According to a sixth aspect of embodiments herein, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the text correction method or the steps of the text correction lexicon construction method described above.

According to a seventh aspect of embodiments herein, there is provided a computer program, wherein when the computer program is executed in a computer, the computer program causes the computer to execute the steps of the above-described text error correction method or the steps of the text error correction lexicon construction method.

In an embodiment of the present specification, a text editing message of a current session sent by a client is received, the text editing message is analyzed to obtain an error correction word pair generated under the current session, a real-time speech recognition text is obtained, the error correction word pair is used to correct the real-time speech recognition text to obtain a corrected text, and the corrected text is sent to the client for display. When the user edits the text at the client, the text editing message of the current session sent by the client is received; this message represents the user's editing of erroneous words in the text. Analyzing the message yields the error correction word pair generated under the current session, that is, the pair produced by the user's editing behavior. The pair is then used to correct the real-time speech recognition text obtained next, and the corrected text is sent to the client for display. Because the error correction word pair is generated when the user edits the text at the client, correction of subsequently received real-time speech recognition text matches the user's actual editing needs, achieving customized text error correction. A single text edit lets the user quickly correct the same error in subsequent speech recognition, which improves the efficiency and accuracy of correcting speech recognition text.

Drawings

FIG. 1 is a flow chart of a text correction method provided in one embodiment of the present specification;

FIG. 2 is an architecture diagram of a semantic error correction model for error correction replacement in a text error correction method according to an embodiment of the present disclosure;

FIG. 3 is a flowchart illustrating a processing procedure of a text error correction method according to an embodiment of the present disclosure;

FIG. 4 is a flowchart of a method for constructing a text error correction lexicon according to an embodiment of the present specification;

FIG. 5 is a schematic structural diagram of a text error correction apparatus according to an embodiment of the present specification;

FIG. 6 is a schematic structural diagram of a text error correction lexicon constructing apparatus according to an embodiment of the present specification;

FIG. 7 is a block diagram of a computing device according to an embodiment of the present disclosure.

Detailed Description

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present description. This description may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein, as those skilled in the art will be able to make and use the present disclosure without departing from the spirit and scope of the present disclosure.

The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It will be understood that, although the terms first, second, etc. may be used herein in one or more embodiments to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first can also be referred to as a second and, similarly, a second can also be referred to as a first without departing from the scope of one or more embodiments of the present description. The word "if," as used herein, may be interpreted as "at the time of," "when," or "in response to a determination," depending on the context.

First, the terms involved in one or more embodiments of the present specification are explained.

Speech recognition technology: also known as Automatic Speech Recognition (ASR), it is the ability of smart devices to understand human speech. Its goal is to convert the lexical content of human speech into computer-readable input such as keystrokes, binary codes, or character sequences. Speech recognition is an interdisciplinary field involving digital signal processing, artificial intelligence, linguistics, mathematical statistics, acoustics, the study of emotion, psychology, and the like. The technology supports applications such as automatic customer service, automatic speech translation, command control, and voice verification codes. In recent years, with the rise of artificial intelligence, speech recognition has made breakthroughs in both theory and application, has begun to move from the laboratory to the market, and is gradually entering daily life. Speech recognition is now used in many areas, mainly including speech recognition dictation machines, speech paging and answering platforms, autonomous advertising platforms, and intelligent customer service.

Text error correction technology: a natural language processing technique that detects whether a piece of text contains wrongly written characters and corrects them. It is generally used in the text preprocessing stage and can significantly alleviate inaccurate ASR output in scenarios such as intelligent customer service.

BERT (Bidirectional Encoder Representations from Transformers): a language representation model. Its main structure is a stack of Transformer encoder layers. The model follows a two-stage framework: it is first pre-trained and then fine-tuned on each specific task. The pre-training stage requires a large amount of data and computing resources, so in practice an open-source pre-trained language model is generally obtained and fine-tuned directly.

RoBERTa: an improved BERT model.

Masked Language Model (MLM): essentially a cloze task, in which certain words in the text are randomly masked and the model then predicts the masked words.

A multi-stage decision problem: in real life, there is a kind of activity process, and due to its particularity, the process can be divided into several interrelated stages, and in each of which a decision is needed to be made, so that the whole process can achieve the ideal activity effect. Therefore, the decision selection of each stage cannot be determined at will, and depends on the current state, and influences the later development. When the decision of each stage is determined, a decision sequence is formed, so that an active route of the whole process is determined, the problem is regarded as a multi-stage process with a chain structure in tandem, namely a multi-stage decision process, and the problem is called a multi-stage decision problem.

Dynamic Programming (DP): in a multi-stage decision problem, the decision taken at each stage is generally time-dependent: it depends on the current state and immediately causes a state transition. The decision sequence is thus generated as the state changes, which is the sense in which the process is "dynamic".
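As an illustration of the masked language model task defined above, the following minimal sketch masks tokens at random and records the words a model would be asked to predict. The function name `mask_tokens`, the `[MASK]` placeholder, and the default 15% masking rate are assumptions chosen for illustration; the specification does not state how masking is performed.

```python
import random
from typing import List, Optional, Tuple


def mask_tokens(
    tokens: List[str],
    mask_prob: float = 0.15,      # assumed masking rate, not from the specification
    mask_token: str = "[MASK]",   # assumed placeholder symbol
    seed: Optional[int] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Randomly mask tokens and return (masked tokens, prediction labels)."""
    rng = random.Random(seed)
    masked: List[str] = []
    labels: List[Optional[str]] = []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)  # hide the word from the model
            labels.append(tok)         # the model must predict this word
        else:
            masked.append(tok)
            labels.append(None)        # nothing to predict at this position
    return masked, labels
```

A model trained on such pairs learns to fill each masked position from its surrounding context.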

In the present specification, a text error correction method is provided, and the present specification also relates to a text error correction lexicon construction method, a text error correction apparatus, a text error correction lexicon construction apparatus, a computing device, and a computer-readable storage medium, which are described in detail one by one in the following embodiments.

The execution main body of the text error correction method provided in the embodiments of the present specification may be a computing device providing a text error correction function, or may also be a computing device providing a voice recognition function, such as a real-time conference device, a voice question and answer device, a personal computer, or the like. The text error correction method provided by the embodiments of the present specification may be implemented by at least one of software, hardware circuit, and logic circuit provided in the execution main body.

Referring to FIG. 1, FIG. 1 shows a flowchart of a text error correction method provided according to an embodiment of the present specification, which specifically includes the following steps.

Step 102: receiving a text editing message of the current session sent by the client.

In this specification, the client refers to a front end providing a text editing interface for a user, and may be an application program, a web page, or the like, and the client may also be a front end device providing a voice recognition function, such as a real-time conference device, a recording device, a voice question and answer device, or the like.

In computer technology, and particularly in web applications, a session refers to the time interval between a user interacting with a computer/network, and generally refers to the time elapsed between a user logging into a computer/network and logging out. In this specification, the text editing message of the current session refers to a text editing message input by a user currently logged in the client in real time at the client.

In practical applications, for example in a real-time conference scenario, the client may display, in real time, the speech recognition text generated by performing speech recognition on speech data. A user logged in to the client can view the speech recognition text in real time, and the client provides a text editing function: if the user finds an error in the speech recognition text, the user can manually edit the text to modify the erroneous content. The user can edit the speech recognition text directly. Alternatively, the client may provide a text editing area, such as a text editing box, on the interactive interface. When the user needs to edit text, the user selects the text segment to be edited (which may be a paragraph or a sentence of the speech recognition text), and the segment is displayed in the text editing box so that the user can edit it in a targeted manner. The text editing box may be displayed as a pop-up window or at a fixed position, which is not specifically limited herein.

The text editing message refers to a message generated when a user edits a speech recognition text. It mainly includes information from before and after the text editing action, for example: identification information of the text segment to be edited (for example, the second paragraph, second sentence, or second line of the speech recognition text); identification information of the speech recognition text (for example, the name and creation time of the speech recognition text); an edited word pair (for example, if "hunt-related" is changed to "policy", the message may carry the word pair "hunt-related - policy"); the original text segment before editing; the edited target text segment; and the like.
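The fields listed above could be carried in a structure like the following. This is only a sketch: the class name `TextEditMessage` and every field name are hypothetical, since the specification lists the kinds of information a message may carry but does not define a message format.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TextEditMessage:
    """Hypothetical container for one text editing message."""

    session_id: str   # identifies the current session (assumed field)
    text_id: str      # identifies the speech recognition text
    segment_id: str   # identifies the edited text segment
    original_segment: Optional[str] = None  # segment before editing
    target_segment: Optional[str] = None    # segment after editing
    # Word pairs recorded directly by the client, as (wrong, corrected).
    word_pairs: List[Tuple[str, str]] = field(default_factory=list)

    def carries_word_pairs(self) -> bool:
        """True when the client already extracted the edited word pairs."""
        return bool(self.word_pairs)
```

A receiver could check `carries_word_pairs()` to decide whether the pairs can be read directly or must be derived from the original and target segments.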

When editing text, a user may edit a word character by character, entering the next character only after the previous one. Each input may then be recognized as an independent text editing operation, although together the inputs edit a single word and should belong to one text editing operation.

To handle this situation, the client may monitor the user's text editing operations. If the interval between multiple edits is very short (for example, less than 3 seconds), the edits are recognized as one text editing operation, and their text editing information is collected and sent together. That is, when the client recognizes that a user edits text (the user triggers a text editing operation on the client interface), the client collects the text editing information generated by the user within a preset interval time and then sends it, in the form of a message, to the execution main body of the embodiments of the present specification.

In an implementation manner of the embodiment of the present specification, step 102 may be specifically implemented by: receiving a plurality of text editing messages sent by a client within a preset interval time; and integrating a plurality of text editing messages to obtain the text editing message of the current conversation.

To solve the problem of identifying text editing operations, in another implementation of this specification, the client may instead send each generated text editing message to the execution main body as soon as it recognizes that the user has edited the text. If the execution main body receives a plurality of text editing messages from the client within a preset interval time, it identifies them as generated by one text editing operation and integrates them: all information carried by the plurality of text editing messages is extracted and packaged into one message, which is the text editing message of the current session.

By setting the preset interval time, receiving a plurality of text editing messages sent by the client within the preset interval time and then integrating the plurality of text editing messages, the problem that one text editing operation may be identified as multiple independent text editing operations can be effectively solved, the accuracy of text editing operation identification is ensured, and the efficiency of text error correction is further improved.
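The integration of messages received within the preset interval time can be sketched as follows. The dictionary keys (`timestamp`, `original_segment`, `target_segment`), the function name `merge_edit_messages`, and the choice to keep the earliest original segment and the latest target segment are all assumptions made for illustration; the specification says only that the information carried by the messages is packaged into one message.

```python
from typing import Dict, List


def merge_edit_messages(messages: List[Dict], max_gap_seconds: float = 3.0) -> List[Dict]:
    """Merge edits that arrive less than max_gap_seconds apart into one edit."""
    merged: List[Dict] = []
    for msg in sorted(messages, key=lambda m: m["timestamp"]):
        if merged and msg["timestamp"] - merged[-1]["last_timestamp"] < max_gap_seconds:
            group = merged[-1]
            # Same editing operation: keep the first original text,
            # but advance to the most recent edited text.
            group["target_segment"] = msg["target_segment"]
            group["last_timestamp"] = msg["timestamp"]
        else:
            # Gap is too long: start a new, independent editing operation.
            merged.append({
                "original_segment": msg["original_segment"],
                "target_segment": msg["target_segment"],
                "first_timestamp": msg["timestamp"],
                "last_timestamp": msg["timestamp"],
            })
    return merged
```

Measuring the gap from the most recent message in the group, rather than from the first, lets a long character-by-character edit stay in one group as long as the user keeps typing.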

Step 104: analyzing the text editing message to obtain an error correction word pair generated under the current session.

After the text editing message of the current session sent by the client is received, it can be analyzed to obtain the error correction word pair generated under the current session. In one case, the client directly identifies and records the words edited by the user, and the text editing message directly carries the error correction word pairs. In another case, the client identifies the user's text editing operation and the text segment edited by the user, and the text editing message includes the original text segment before editing and the target text segment after editing.

In one implementation of the embodiments of the present specification, a text editing message may include an original text segment before editing and a target text segment after editing. Accordingly, step 104 may be specifically implemented as follows:

performing text alignment and word segmentation processing on the original text segment and the target text segment to obtain a first word segmentation sequence of the original text segment and a second word segmentation sequence of the target text segment;

and determining error correction word pairs based on the first word segmentation sequence and the second word segmentation sequence.

Where the text editing message includes an original text segment before editing and a target text segment after editing, the two segments can be extracted from the message and then subjected to text alignment and word segmentation. The order of the two operations is not limited. Text alignment may be performed first, followed by word segmentation of the aligned segments, to obtain a first word segmentation sequence of the original text segment and a second word segmentation sequence of the target text segment. Alternatively, word segmentation may be performed first, followed by text alignment of the segmented segments, to obtain the same two sequences. Text alignment converts the two text segments into character sequences and then puts the character elements of the two sequences into one-to-one correspondence. Word segmentation may be performed with a word segmentation tool (for example, a Chinese word segmentation tool) or according to the part of speech, length, and the like of each word.

After the first word segmentation sequence and the second word segmentation sequence are obtained, an error correction word pair can be determined based on the first word segmentation sequence and the second word segmentation sequence, specifically, corresponding words in the first word segmentation sequence and the second word segmentation sequence can be compared, and the error correction word pair is further determined.

Preferably, a DP-based edit distance algorithm may be used to determine the error correction word pairs from the first and second word segmentation sequences. The edit distance between two word segmentation sequences s1 and s2 is the minimum number of single-character editing operations required to convert s1 into s2, where the single-character editing operations may include insertion, deletion, and replacement. For each character in s1, the number of single-character editing operations required to convert it into each character in s2 is counted, and a matrix is generated from these counts: the abscissa of the matrix indexes the characters of s1, the ordinate indexes the characters of s2, and each element holds the number of operations needed to convert between the corresponding characters. A score is calculated for each element of the matrix; the larger the difference between the characters at the corresponding positions, the larger the score. After the matrix is obtained, a backtracking operation that combines the scores of adjacent elements is used to derive the error correction word pairs.

For example, the original text fragment before editing is "a hunt because of a traditional trend like some CTAs", the user edits the text of the original text fragment to obtain the edited target text fragment "a traditional trend strategy like some CTAs", and after performing text alignment and word segmentation on the original text fragment and the target text fragment, a first word segmentation sequence of the original text fragment and a second word segmentation sequence of the target text fragment are obtained:

[ because / traditional / like / some / CTA / one / trend / one / hunt ]

[ because / traditional / like / some / CTA / * / trend / * / strategy ]

Based on the first word segmentation sequence and the second word segmentation sequence, an error correction word pair can be obtained: hunt-strategy.

Performing text alignment and word segmentation on the original and target text segments to obtain the first and second word segmentation sequences, and then determining the error correction word pair from those sequences, allows the error correction word pair to be determined accurately.
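The edit distance and backtracking procedure described above can be sketched as follows. This is a simplified illustration, not the patented implementation: it operates on whole tokens rather than single characters, uses unit costs for insertion, deletion, and replacement, and reports only replaced tokens as error correction word pairs. The function name `correction_pairs` is hypothetical.

```python
from typing import List, Tuple


def correction_pairs(src: List[str], tgt: List[str]) -> List[Tuple[str, str]]:
    """Align two token sequences by edit distance and return replaced tokens."""
    n, m = len(src), len(tgt)
    # dist[i][j] = minimum number of edits to turn src[:i] into tgt[:j].
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i  # delete all i tokens
    for j in range(m + 1):
        dist[0][j] = j  # insert all j tokens
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if src[i - 1] == tgt[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or replacement
            )
    # Backtrack from the bottom-right corner, collecting replacements.
    pairs: List[Tuple[str, str]] = []
    i, j = n, m
    while i > 0 and j > 0:
        cost = 0 if src[i - 1] == tgt[j - 1] else 1
        if dist[i][j] == dist[i - 1][j - 1] + cost:
            if cost:
                pairs.append((src[i - 1], tgt[j - 1]))
            i, j = i - 1, j - 1
        elif dist[i][j] == dist[i - 1][j] + 1:
            i -= 1  # token deleted from src
        else:
            j -= 1  # token inserted into tgt
    pairs.reverse()
    return pairs
```

For two sequences that differ only in their last token, such as one ending in "hunt" and one ending in "strategy", the backtrack reports the single pair ("hunt", "strategy"). Pure insertions and deletions are skipped because they do not map one word to another.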

In one implementation manner of the embodiment of the present specification, before step 104, the text error correction method may further include the following steps:

acquiring a first text length of an original text fragment before editing and a second text length of a target text fragment after editing;

and calculating the text length difference according to the first text length and the second text length.

Accordingly, step 104 may be performed in case the text length difference does not exceed a preset threshold.

When editing a speech recognition text, a user generally modifies a wrong word. A large-scale modification, for example changing one word into a long passage or a long passage into one word, is not an error correction, and the text editing message for it does not need to be analyzed. To avoid inefficient, time-consuming parsing of such large-scale modifications, in an embodiment of the present specification, a first text length of the original text segment before editing and a second text length of the edited target text segment may be obtained, and the difference between the first text length and the second text length is calculated to obtain a text length difference. If the text length difference does not exceed the preset threshold, that is, the modification is small in scale, step 104 is executed to parse the text editing message. If the text length difference exceeds the preset threshold, that is, the modification is large in scale, step 104 and the subsequent text error correction process are not performed.
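The length check can be expressed in a few lines. The function name `should_parse`, the use of character counts as the text length, and the default threshold of 5 are assumptions; the specification does not give a value for the preset threshold.

```python
def should_parse(original_segment: str, target_segment: str,
                 max_length_diff: int = 5) -> bool:
    """Return True when the edit is small enough to be treated as a correction.

    max_length_diff is a hypothetical preset threshold, measured in characters.
    """
    length_diff = abs(len(original_segment) - len(target_segment))
    return length_diff <= max_length_diff
```

Running this check before step 104 skips alignment and word segmentation for edits that rewrite a large span of text.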

Step 106: acquiring a real-time speech recognition text, and correcting the real-time speech recognition text by using the error correction word pair to obtain a corrected text.

After the text editing message is analyzed to obtain the error correction word pair generated under the current session, a real-time speech recognition text can be obtained. The real-time speech recognition text is the speech recognition text generated in real time after the user's text edit. For example, in a conference scenario, if the user edits the speech recognition text generated in the 5th minute of the conference and an error correction word pair is obtained, that pair can be used directly to correct the real-time speech recognition text generated next in the same conference.

In one implementation manner of the embodiment of the present specification, after the step 104, the text error correction method may further include the following steps: and storing the error correction word pairs into a text error correction word bank of the current conversation.

Accordingly, step 106 may be specifically implemented as follows: and reading an error correction word pair from the text error correction word bank, and correcting the real-time voice recognition text by using the error correction word pair to obtain a corrected text.

In a specific implementation, a user often edits a speech recognition text more than once in a given scenario. To accommodate this, a text error correction word bank corresponding to the user, that is, the text error correction word bank of the current session, is created. After the text editing message is analyzed to obtain an error correction word pair generated under the current session, the pair is stored in this word bank, which maintains the error correction word pairs generated under the current session. The text error correction word bank serves as a semi-persistent storage unit and stores only the error correction word pairs generated while the user is logged in. During the login period, error correction word pairs can therefore be read from the text error correction word bank of the current session and used to correct the real-time speech recognition text to obtain a corrected text, achieving a customized error correction effect for a specific user. In addition to the error correction word pairs, the text error correction word bank can store the edited target text segment.
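A session-scoped word bank and its use in correction might look like the sketch below. The class name `SessionLexicon` and its methods are hypothetical, and plain substring replacement is a deliberate simplification chosen for illustration; it is not presented as the replacement mechanism of the specification.

```python
from typing import Dict


class SessionLexicon:
    """Hypothetical in-memory word bank holding one session's word pairs."""

    def __init__(self) -> None:
        self._pairs: Dict[str, str] = {}  # wrong word -> corrected word

    def add(self, wrong: str, corrected: str) -> None:
        """Store or overwrite the correction for a wrong word."""
        self._pairs[wrong] = corrected

    def correct(self, text: str) -> str:
        """Apply every stored word pair to a real-time recognition text."""
        # Replace longer wrong words first so that a short entry
        # cannot break up a longer one.
        for wrong in sorted(self._pairs, key=len, reverse=True):
            text = text.replace(wrong, self._pairs[wrong])
        return text
```

Because the object lives only in memory, its contents disappear when the session ends, which matches the semi-persistent role described above. Substring replacement can also alter words that merely contain a stored entry, so a fuller implementation would need to respect word boundaries.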

In one implementation of an embodiment of the present specification, a text editing message includes a user identifier;

after the step of storing the error correction word pairs in the text error correction word bank of the current conversation, the text error correction method may further include the steps of:

under the condition that a preset trigger condition is reached, extracting a target error correction word pair from a text error correction word bank;

and storing the target error correction word pair into a user database corresponding to the user identifier.

The embodiments of the present specification may further provide persistent storage of error correction word pairs. That is, a user database corresponding to the user may be created. When a preset trigger condition is reached, for example when the current conference ends or the conference duration reaches a certain length, the target error correction word pair is extracted from the text error correction word bank and stored in the corresponding user database. When the user next logs in, the error correction word pairs generated by earlier text editing can be read from the persistent user database, so that efficient, customized text error correction is available in the new voice recognition task. In addition to the error correction word pairs, the user database may also store the edited target text segments.

Specifically, the target error correction word pairs may be extracted from the text error correction word bank in either of two ways. All error correction word pairs generated by the current speech recognition task may be extracted as target error correction word pairs and stored in the user database. Alternatively, only those error correction word pairs generated by the current speech recognition task that meet a preset condition may be extracted as target error correction word pairs and stored in the user database; the preset condition may be, for example, that the number of times the word pair has been used for replacement in the current speech recognition task reaches a certain number.
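The two extraction modes can be sketched as follows. This is an illustrative sketch only: the function name `extract_target_pairs`, the `replacement_counts` mapping, and the `min_replacements` parameter are hypothetical names introduced here.

```python
from typing import Dict, List, Optional, Tuple

Pair = Tuple[str, str]  # (target original word, candidate word)


def extract_target_pairs(replacement_counts: Dict[Pair, int],
                         min_replacements: Optional[int] = None) -> List[Pair]:
    """Select the word pairs to persist in the user database.

    min_replacements=None extracts every pair; otherwise only pairs whose
    replacement count in the current task reaches the threshold are kept.
    """
    if min_replacements is None:
        return list(replacement_counts)
    return [pair for pair, count in replacement_counts.items()
            if count >= min_replacements]
```

Filtering by replacement count keeps one-off edits out of the persistent store, at the cost of delaying when a genuinely useful pair becomes permanent.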

In an implementation manner of the embodiment of the present specification, the text error correction method may further include the following steps:

acquiring historical error correction word pairs from a text error correction word bank or a user database;

carrying out conflict analysis on the error correction word pair generated under the current conversation and the historical error correction word pair;

and if the conflict exists, deleting the historical error correction word pair.

In a specific speech recognition task, the same user may modify the same content differently when it appears in different text segments. The historical error correction word pairs generated by earlier edits are stored in the text error correction word bank or the user database, so they can be obtained from there and subjected to conflict analysis against the error correction word pair generated under the current conversation. The conflict analysis proceeds as follows: for the same target original word, the candidate word is extracted from the error correction word pair generated under the current conversation and from the historical error correction word pair that contain that target original word, and it is determined whether the two candidate words are identical or one contains the other. If so, the two word pairs are considered not to conflict and both can be retained. If the two candidate words are completely different, the word pairs conflict, and the historical error correction word pair is deleted from the text error correction word bank or the user database. Of course, since the two text edits contradict each other, the error correction word pair generated under the current conversation may also be discarded when a conflict exists.
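The conflict rule described above can be illustrated with a short sketch. The function names and the dictionary representation of the historical word pairs (target original word mapped to candidate word) are assumptions made for this example; only the variant that deletes the historical pair is shown.

```python
from typing import Dict


def has_conflict(new_candidate: str, old_candidate: str) -> bool:
    """Candidates for the same original word conflict unless they are
    identical or one contains the other."""
    return not (new_candidate in old_candidate
                or old_candidate in new_candidate)


def resolve_conflicts(history: Dict[str, str],
                      new_original: str,
                      new_candidate: str) -> Dict[str, str]:
    """Return a copy of the history with a conflicting entry removed."""
    kept = dict(history)
    old_candidate = kept.get(new_original)
    if old_candidate is not None and has_conflict(new_candidate, old_candidate):
        del kept[new_original]  # the historical pair is deleted on conflict
    return kept
```

Substring containment also covers the identical case, since every string contains itself.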

Obtaining the real-time speech recognition text in step 106 may specifically be: acquiring real-time voice data, and recognizing the real-time voice data to obtain a real-time voice recognition text. That is, the execution body of the embodiments of the present specification can provide a speech recognition function: it receives real-time speech data and then recognizes the data by using ASR technology to obtain a real-time speech recognition text. Of course, the real-time speech recognition text in the embodiments of the present specification may also be sent by another speech recognition device, which is not specifically limited here.

Correcting the real-time voice recognition text by using the error correction word pair means that, according to the target original word in the error correction word pair, the erroneous original word in the real-time voice recognition text is located by word matching and is then replaced with the candidate word in the error correction word pair. This direct replacement approach keeps the overhead of word replacement low.
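As a rough illustration of direct replacement, the sketch below applies each word pair by plain substring matching. The function name `direct_replace` is hypothetical, and substring matching is an assumption standing in for whatever word matching the implementation actually uses.

```python
from typing import Iterable, Tuple


def direct_replace(text: str, pairs: Iterable[Tuple[str, str]]) -> str:
    """Replace every occurrence of each original word with its candidate."""
    for original, candidate in pairs:
        text = text.replace(original, candidate)
    return text


corrected = direct_replace("a traditional trend hunting",
                           [("hunting", "strategy")])
```

This approach needs no model inference, but it replaces every match regardless of context.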

In one implementation of the embodiments of the present specification, the error correction word pair includes a target original word and a candidate word. The step of correcting the real-time speech recognition text by using the error correction word pair to obtain the corrected text may be specifically implemented as follows:

acquiring an edited target text segment corresponding to the error correction word pair;

performing mask processing on target original words in the real-time voice recognition text according to the error correction word pair to obtain a target processing text;

inputting a target processing text and a target text fragment into a pre-trained semantic error correction model to obtain a first probability that a mask position in the target processing text is a target original word and a second probability that the mask position is a candidate word;

and under the condition that the first probability and the second probability reach a preset replacement condition, replacing the target original word in the real-time voice recognition text with the candidate word to obtain a corrected text.

Direct replacement keeps the overhead of word replacement low, but in some special cases it does not fit the actual context. For example, the user changes "hunt involved" at one position in the speech recognition text to "strategy", but a later speech recognition text contains the segment "employee a is involved widely", in which the same original word is used correctly and should not be corrected. To address this problem, in one implementation of the embodiments of the present specification, text error correction is performed in combination with a pre-trained semantic error correction model.

Specifically, the edited target text segment corresponding to the error correction word pair is obtained (for example, from the text error correction word bank). Then, according to the error correction word pair, mask processing is performed on the target original word in the real-time voice recognition text. Mask processing hides the target original word to produce a target processing text, which turns correction of the real-time voice recognition text into a cloze (fill-in-the-blank) problem.

The target processing text and the target text segment are input into a pre-trained semantic error correction model, which solves the cloze problem on the target processing text and outputs a first probability that the mask position is the target original word and a second probability that the mask position is the candidate word. The pre-trained semantic error correction model may be a masked language model (MLM).

When the first probability and the second probability satisfy a preset replacement condition, error correction is needed and the correction is considered accurate, so the target original word in the real-time voice recognition text is replaced with the candidate word to obtain an accurate corrected text. The preset replacement condition may be that the first probability is smaller than a first preset threshold and the second probability is larger than a second preset threshold; or it may additionally require that the difference between the second probability and the first probability is larger than a third threshold.
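The preset replacement condition can be written as a small predicate. In this sketch the function name, the parameter names, and the default threshold values are hypothetical; the specification does not give concrete thresholds.

```python
from typing import Optional


def should_replace(p_original: float,
                   p_candidate: float,
                   first_threshold: float = 0.3,   # assumed value
                   second_threshold: float = 0.5,  # assumed value
                   third_threshold: Optional[float] = None) -> bool:
    """Return True when the model output satisfies the replacement condition.

    p_original:  first probability (mask position is the target original word)
    p_candidate: second probability (mask position is the candidate word)
    third_threshold: optional minimum margin between the two probabilities
    """
    ok = p_original < first_threshold and p_candidate > second_threshold
    if third_threshold is not None:
        ok = ok and (p_candidate - p_original) > third_threshold
    return ok
```

Leaving `third_threshold` unset gives the two-threshold variant; setting it gives the stricter variant that also requires a margin between the two probabilities.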

Referring to fig. 2, fig. 2 is an architecture diagram illustrating error correction replacement performed by a semantic error correction model in a text error correction method according to an embodiment of the present specification. The semantic error correction model employs a phone-RoBERTa model, whose input includes phone-level features (i.e., pronunciation information of the text). For example, the target text segment input into the semantic error correction model is "with an american option", the target processing text (i.e., the "token" row in fig. 2) is "with an american [MASK]", and the phone-level feature of the target text segment is "you yi mei shi qi q"; position index and block index information are added accordingly. These are input into the semantic error correction model. At the 6th and 7th position indexes of the output layer, the model outputs a first probability that the masked word is the target original word "complete" and a second probability that it is the candidate word "option", and whether to perform the replacement is determined based on the first probability and the second probability.

As an example, take the real-time speech recognition text "this is also to keep saying that we are a strategy of the external propaganda and a delay of the strategy of the actual transaction, but do not have a drift that will not produce some hunting". The error correction word pair is "hunting-strategy", and the edited target text segment is "because of the traditional trend strategy like some CTA". The target processing text input into the semantic error correction model is "this is also to maintain a delay of saying that a strategy of our external propaganda and a strategy of actual transaction do not generate a drift of some [MASK][MASK]", and the target text segment is "because of the traditional trend strategy like some CTA". The semantic error correction model outputs the probabilities that the [MASK][MASK] region is the word "hunting" and the word "strategy", respectively.
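The mask processing in this example can be sketched as below. Emitting one mask token per character of the target original word is an assumption inferred from the two [MASK] tokens shown for a two-character word; the function name `mask_original` is hypothetical, and a real tokenizer may split words differently.

```python
def mask_original(text: str, original: str, mask_token: str = "[MASK]") -> str:
    """Hide each occurrence of the original word behind mask tokens.

    Assumption: one mask token per character of the original word.
    """
    return text.replace(original, mask_token * len(original))


masked = mask_original("abXYcd", "XY")
```

The masked text, together with the edited target text segment, would then form the model input.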

Step 108: sending the corrected text to the client for display.

After the corrected text is obtained, it is sent to the client, and the client displays it. For the user, once a text has been edited, subsequent voice recognition text no longer needs to be edited manually, which improves the efficiency of text error correction. At the same time, the subsequent text error correction is entirely transparent to the user, which improves the user experience.

With this method, the voice recognition result can be improved continuously without the user noticing. For example, in the current conversation, the voice recognition text is "a hunting-related text which is caused by the traditional tendency like some CTAs", and the user edits it to obtain "a tendency strategy which is caused by the traditional tendency like some CTAs". The error correction word pair "hunting-related strategy" can be obtained according to the steps above. If a real-time voice recognition text "the tendency of a large disk and an industry is hunted by a fund theory conference" is then received, it is automatically corrected to "the tendency strategy of the large disk and the industry by the fund theory conference".

With the embodiments of the present specification, the user edits text at the client, and the text editing message of the current conversation sent by the client is correspondingly received. The text editing message represents the user's editing of erroneous words in the text. Parsing it yields the error correction word pair generated under the current conversation, that is, the word pair produced by the user's editing behavior. That word pair is then used to correct the real-time voice recognition text obtained next, and the resulting corrected text is sent to the client for display. Because the error correction word pair is generated when the user edits the text at the client, correcting subsequently received real-time voice recognition text with it matches the user's actual editing needs, which realizes customized text error correction. A single text edit by the user allows the same error to be corrected quickly in subsequent voice recognition, improving both the efficiency and the accuracy of error correction for voice recognition text.

The following will further describe the text error correction method by taking the application of the text error correction method provided in this specification in a real-time conference scenario as an example with reference to fig. 3. Fig. 3 is a flowchart illustrating a processing procedure of a text error correction method according to an embodiment of the present specification, and specifically includes the following steps.

The first step: the front end listens for editing events to monitor the user's text editing behavior, and generates a text editing message.

The second step: the front end sends the text editing message to the server of the conference software.

The third step: the server triggers the user behavior analysis module.

The user behavior analysis module obtains a first text length of the original text segment before editing and a second text length of the target text segment after editing, and calculates the text length difference between them. If the text length difference does not exceed a preset threshold, the module parses the text editing message: it performs text alignment and word segmentation on the original text segment and the target text segment to obtain a first word segmentation sequence of the original text segment and a second word segmentation sequence of the target text segment, determines the error correction word pair generated under the current conversation based on the two sequences, and stores the error correction word pair in the text error correction word bank of the current conversation. The process of parsing the text editing message is described in the embodiment shown in fig. 1 and is not repeated here. The user behavior analysis module also obtains historical error correction word pairs from the text error correction word bank and the user database, performs conflict analysis between the error correction word pair generated under the current conversation and the historical error correction word pairs, and, if a conflict exists, deletes the conflicting historical error correction word pairs from the text error correction word bank and the user database.
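A hedged sketch of this analysis step follows. It uses whitespace splitting as a stand-in for word segmentation and Python's `difflib.SequenceMatcher` as a stand-in for text alignment; both are assumptions of this example, as are the function name `extract_pairs` and the `max_len_diff` default. The actual parsing procedure is the one described for fig. 1.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def extract_pairs(original: str, edited: str,
                  max_len_diff: int = 5) -> List[Tuple[str, str]]:
    """Derive (original word, candidate word) pairs from one edit."""
    # A large length difference suggests a rewrite rather than a correction.
    if abs(len(original) - len(edited)) > max_len_diff:
        return []
    src, dst = original.split(), edited.split()  # stand-in segmentation
    matcher = SequenceMatcher(a=src, b=dst, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":  # aligned spans whose words differ
            pairs.append((" ".join(src[i1:i2]), " ".join(dst[j1:j2])))
    return pairs


pairs = extract_pairs("a traditional trend hunting like CTA",
                      "a traditional trend strategy like CTA")
```

Pure insertions and deletions are ignored here; only aligned replacements yield word pairs.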

The fourth step: when the conference ends or a preset time is reached, the user behavior selection module is triggered; it extracts the target error correction word pair from the text error correction word bank and stores it in the user database.

The fifth step: real-time voice data is acquired and recognized to obtain a real-time voice recognition text, and the real-time error correction agent module is then invoked.

The sixth step: the real-time error correction agent module obtains the error correction word pair generated in the current conversation from the text error correction word bank, and corrects the real-time voice recognition text to obtain a corrected text.

The specific error correction mode may adopt a semantic error correction model in the embodiment shown in fig. 1, and the specific process is shown in the embodiment shown in fig. 1 and is not described here again.

The seventh step: the real-time error correction agent module sends the corrected text to the client, where it is displayed on screen.

With this scheme, the user edits text at the client, and the text editing message of the current conversation sent by the client is correspondingly received. The text editing message represents the user's editing of erroneous words in the text. Parsing it yields the error correction word pair generated under the current conversation, that is, the word pair produced by the user's editing behavior. That word pair is then used to correct the real-time speech recognition text obtained next, and the resulting corrected text is sent to the client for display. Because the error correction word pair is generated when the user edits the text at the client, correcting subsequently received real-time speech recognition text with it matches the user's actual editing needs, which realizes customized text error correction. A single text edit by the user allows the same error to be corrected quickly in subsequent speech recognition, improving both the efficiency and the accuracy of error correction for speech recognition text.

The execution body of the text error correction word bank construction method provided by the embodiments of the present specification may be a data server, a control terminal, or the like. The method may be implemented by at least one of software, a hardware circuit, and a logic circuit provided in the execution body.

Referring to fig. 4, fig. 4 is a flowchart illustrating a text error correction lexicon building method according to an embodiment of the present specification, which specifically includes the following steps.

Step 402: acquiring the text editing message of the current conversation.

Step 402 is similar to step 102 in the embodiment shown in fig. 1, and the text editing message of the current session refers to the text editing message input by the user currently logged in the client in real time. The specific implementation is shown in the embodiment shown in fig. 1, and is not described herein again.

Step 404: parsing the text editing message to obtain an error correction word pair generated under the current conversation.

Step 404 is similar to step 104 in the embodiment shown in fig. 1, and specific implementation details are shown in the embodiment shown in fig. 1 and are not described herein again.

Step 406: storing the error correction word pair in the text error correction word bank of the current conversation, wherein the text error correction word bank is used for correcting errors in a text to be corrected.

A user often edits the speech recognition text more than once. To accommodate this, a text error correction word bank corresponding to the user, that is, the text error correction word bank of the current conversation, is created. After the text editing message is parsed to obtain the error correction word pair generated under the current conversation, the word pair is stored in this word bank, which maintains the error correction word pairs generated under the current conversation. A text to be corrected that is obtained subsequently can then be corrected directly by using the text error correction word bank. The word bank may be a semi-persistent or persistent database, and it enables customized error correction for a specific user.

With the embodiments of the present specification, the user edits text, and the text editing message of the current conversation is obtained accordingly. The text editing message represents the user's editing of erroneous words in the text. Parsing it yields the error correction word pair generated under the current conversation, that is, the word pair produced by the user's editing behavior, which is then stored in the text error correction word bank of the current conversation so that a text to be corrected can be corrected by using the word bank. Because the error correction word pair is generated when the user edits the text, correcting the text to be corrected with it matches the user's actual editing needs, which realizes customized text error correction. A single text edit by the user allows subsequent texts to be corrected quickly, improving both the efficiency and the accuracy of text error correction.

Corresponding to the above method embodiment, the present specification further provides a text error correction apparatus embodiment, and fig. 5 shows a schematic structural diagram of a text error correction apparatus provided in an embodiment of the present specification. As shown in fig. 5, the apparatus includes:

a receiving module 520 configured to receive a text editing message of a current session sent by a client;

a first parsing module 540, configured to parse the text editing message to obtain an error correction word pair generated under the current session;

the error correction module 560 is configured to obtain the real-time speech recognition text, and correct the error of the real-time speech recognition text by using the error correction word pair to obtain a corrected text;

a sending module 580 configured to send the corrected text to the client for display.

Optionally, the text editing message includes an original text segment before editing and a target text segment after editing;

the first parsing module 540 is further configured to perform text alignment and word segmentation on the original text segment and the target text segment, so as to obtain a first word segmentation sequence of the original text segment and a second word segmentation sequence of the target text segment; and determining error correction word pairs based on the first word segmentation sequence and the second word segmentation sequence.

Optionally, the apparatus further comprises:

the text length difference calculation module is configured to obtain a first text length of an original text fragment before editing and a second text length of a target text fragment after editing; calculating a text length difference according to the first text length and the second text length;

the first parsing module 540 is further configured to, in a case that the text length difference does not exceed the preset threshold, parse the text editing message to obtain an error correction word pair generated in the current session.

Optionally, the apparatus further comprises:

a first storage module configured to store the error correction word pairs to a text error correction word bank of a current conversation;

the error correction module 560 is further configured to read an error correction word pair from the text error correction word bank, and perform error correction on the real-time speech recognition text by using the error correction word pair to obtain a corrected text.

Optionally, the text editing message comprises a user identification; the device also includes:

the extraction module is configured to extract a target error correction word pair from the text error correction word bank under the condition that a preset trigger condition is reached;

and the second storage module is configured to store the target error correction word pair into a user database corresponding to the user identifier.

Optionally, the apparatus further comprises:

the conflict analysis module is configured to acquire historical error correction word pairs from a text error correction word bank or a user database; carrying out conflict analysis on the error correction word pair generated under the current conversation and the historical error correction word pair;

a deletion module configured to delete the historical error correction word pair if there is a conflict.

Optionally, the receiving module 520 is further configured to receive a plurality of text editing messages sent by the client within a preset interval time; and integrating a plurality of text editing messages to obtain the text editing message of the current conversation.

Optionally, the error correction word pair includes a target original word and a candidate word;

an error correction module 560, further configured to obtain edited target text segments corresponding to the error correction word pairs; according to the error correction word pair, carrying out mask processing on a target original word in the real-time voice recognition text to obtain a target processing text; inputting a target processing text and a target text fragment into a pre-trained semantic error correction model to obtain a first probability that a mask position in the target processing text is a target original word and a second probability that the mask position is a candidate word; and under the condition that the first probability and the second probability reach a preset replacement condition, replacing the target original word in the real-time voice recognition text with the candidate word to obtain a corrected text.

The above is a schematic scheme of a text error correction apparatus of the present embodiment. It should be noted that the technical solution of the text error correction device and the technical solution of the text error correction method belong to the same concept, and details that are not described in detail in the technical solution of the text error correction device can be referred to the description of the technical solution of the text error correction method.

Corresponding to the above method embodiment, the present specification further provides an embodiment of a text error correction lexicon constructing device, and fig. 6 shows a schematic structural diagram of the text error correction lexicon constructing device provided in an embodiment of the present specification. As shown in fig. 6, the apparatus includes:

an obtaining module 620 configured to obtain a text editing message of a current session;

the second parsing module 640 is configured to parse the text editing message to obtain an error correction word pair generated under the current session;

a storage module 660 configured to store the error correction word pairs to a text error correction word bank of the current conversation, wherein the text error correction word bank is used for error correction of the text to be corrected.

With the embodiments of the present specification, the user edits text, and the text editing message of the current conversation is obtained accordingly. The text editing message represents the user's editing of erroneous words in the text. Parsing it yields the error correction word pair generated under the current conversation, that is, the word pair produced by the user's editing behavior, which is then stored in the text error correction word bank of the current conversation so that a text to be corrected can be corrected by using the word bank. Because the error correction word pair is generated when the user edits the text, correcting the text to be corrected with it matches the user's actual editing needs, which realizes customized text error correction. A single text edit by the user allows subsequently obtained texts to be corrected quickly, improving both the efficiency and the accuracy of text error correction.

The foregoing is a schematic scheme of the text error correction lexicon constructing apparatus according to this embodiment. It should be noted that the technical solution of the text error correction lexicon constructing apparatus and the technical solution of the text error correction lexicon constructing method belong to the same concept, and details of the technical solution of the text error correction lexicon constructing apparatus, which are not described in detail, can be referred to the description of the technical solution of the text error correction lexicon constructing method.

FIG. 7 illustrates a block diagram of a computing device, according to one embodiment of the present description. The components of the computing device 700 include, but are not limited to, memory 710 and a processor 720. Processor 720 is coupled to memory 710 via bus 730, and database 750 is used to store data.

Computing device 700 also includes access device 740, which enables computing device 700 to communicate via one or more networks 760. Examples of such networks include a public switched telephone network (PSTN), a local area network (LAN), a wide area network (WAN), a personal area network (PAN), or a combination of communication networks such as the Internet. The access device 740 may include one or more of any type of wired or wireless network interface (e.g., a network interface card (NIC)), such as an IEEE 802.11 wireless local area network (WLAN) interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a universal serial bus (USB) interface, a cellular network interface, a Bluetooth interface, a near field communication (NFC) interface, and so forth.

In one embodiment of the present description, the above-described components of computing device 700, as well as other components not shown in FIG. 7, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 7 is for purposes of example only and is not limiting as to the scope of the present description. Other components may be added or replaced as desired by those skilled in the art.

Computing device 700 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smartphone), wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 700 may also be a mobile or stationary server.

The processor 720 is configured to execute computer-executable instructions which, when executed by the processor, implement the steps of the text error correction method or the steps of the text error correction word bank construction method described above.
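
By way of a non-limiting illustration, the following minimal Python sketch shows one way such instructions might be organized: an error correction word pair is derived from the text before and after an edit, stored in a word bank for the current conversation, and then applied to later recognized text. The class name `SessionWordBank`, the threshold `MAX_LENGTH_DIFF`, the whitespace tokenization, and the use of `difflib` to align the two word segmentation sequences are all assumptions made for this sketch and are not specified by the embodiments.

```python
import difflib
import re

# Hypothetical threshold: edits that change the text length by more than
# this many characters are treated as rewrites, not error corrections.
MAX_LENGTH_DIFF = 4


def extract_word_pairs(original_tokens, target_tokens):
    """Align two token sequences and return (original, target) pairs
    for every span that was replaced by the edit."""
    matcher = difflib.SequenceMatcher(
        a=original_tokens, b=target_tokens, autojunk=False
    )
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            pairs.append(
                (" ".join(original_tokens[i1:i2]), " ".join(target_tokens[j1:j2]))
            )
    return pairs


class SessionWordBank:
    """Error correction word pairs collected during one conversation."""

    def __init__(self):
        self._pairs = {}

    def learn(self, original_text, target_text):
        """Derive word pairs from one edit and store them; return the pairs."""
        if abs(len(original_text) - len(target_text)) > MAX_LENGTH_DIFF:
            return []
        # Whitespace splitting stands in for a real word segmenter here.
        pairs = extract_word_pairs(original_text.split(), target_text.split())
        for wrong, right in pairs:
            # A newer pair for the same original word replaces the older one.
            self._pairs[wrong] = right
        return pairs

    def correct(self, recognized_text):
        """Apply every stored pair to a newly recognized text."""
        corrected = recognized_text
        for wrong, right in self._pairs.items():
            pattern = r"\b" + re.escape(wrong) + r"\b"
            # A function replacement avoids backslash interpretation in `right`.
            corrected = re.sub(pattern, lambda _m, r=right: r, corrected)
        return corrected


bank = SessionWordBank()
bank.learn("please call ali baba support", "please call Alibaba support")
print(bank.correct("ask ali baba about it"))
```

In this sketch a single edit is enough for later recognized text containing the same misrecognized span to be corrected automatically. A production system would replace the whitespace split with a word segmenter suited to the language of the recognized text.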

The above is an illustrative scheme of a computing device of the present embodiment. It should be noted that the technical solution of the computing device belongs to the same concept as the technical solution of the text error correction method or the text error correction word bank construction method. For details of the computing device that are not described here, refer to the description of the text error correction method or the text error correction word bank construction method.

An embodiment of the present specification further provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of the text error correction method or the steps of the text error correction word bank construction method.

The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the text error correction method or the text error correction word bank construction method. For details of the storage medium that are not described here, refer to the description of the text error correction method or the text error correction word bank construction method.

An embodiment of the present specification further provides a computer program which, when executed in a computer, causes the computer to execute the steps of the text error correction method or the steps of the text error correction word bank construction method.

The above is an illustrative scheme of a computer program of the present embodiment. It should be noted that the technical solution of the computer program belongs to the same concept as the technical solution of the text error correction method or the text error correction word bank construction method. For details of the computer program that are not described here, refer to the description of the text error correction method or the text error correction word bank construction method.

The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

The computer instructions comprise computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so on.

It should be noted that, for the sake of simplicity, the foregoing method embodiments are described as a series of acts, but those skilled in the art should understand that the present embodiment is not limited by the described acts, because some steps may be performed in other sequences or simultaneously according to the present embodiment. Further, those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments and that acts and modules referred to are not necessarily required for an embodiment of the specification.

In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

The preferred embodiments of the present specification disclosed above are intended only to aid in the description of the specification. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the embodiments and the practical application, to thereby enable others skilled in the art to best understand and utilize the embodiments. The specification is limited only by the claims and their full scope and equivalents.

Claims

1. A text error correction method comprising:

receiving a text editing message of a current conversation sent by a client;

parsing the text editing message to obtain an error correction word pair generated under the current conversation;

acquiring a real-time voice recognition text, and correcting the real-time voice recognition text by using the error correction word pair to obtain a corrected text;

and sending the corrected text to the client for display.

2. The method of claim 1, the text editing message comprising an original text segment before editing and a target text segment after editing;

the analyzing the text editing message to obtain the error correction word pair generated under the current conversation includes:

and determining an error correction word pair based on the first word segmentation sequence and the second word segmentation sequence.

3. The method according to claim 1 or 2, before said parsing the text editing message to obtain the error correction word pair generated under the current conversation, further comprising:

calculating a text length difference according to the first text length and the second text length;

and under the condition that the text length difference does not exceed a preset threshold value, analyzing the text editing message to obtain an error correction word pair generated under the current conversation.

4. The method according to claim 1 or 2, after said parsing the text editing message to obtain the error correction word pair generated under the current conversation, further comprising:

storing the error correction word pair to a text error correction word bank of the current conversation;

the correcting the real-time voice recognition text by using the error correction word pair to obtain a corrected text comprises:

and reading the error correction word pair from the text error correction word bank, and correcting the real-time voice recognition text by using the error correction word pair to obtain a corrected text.

5. The method of claim 4, the text editing message comprising a user identification;

after the storing the error correction word pair to the text error correction word bank of the current conversation, the method further comprises:

under the condition that a preset trigger condition is reached, extracting a target error correction word pair from the text error correction word bank;

6. The method of claim 5, further comprising:

acquiring historical error correction word pairs from the text error correction word bank or the user database;

and if the conflict exists, deleting the historical error correction word pair.

7. The method of claim 1, the receiving a text editing message of a current conversation sent by a client comprising:

receiving a plurality of text editing messages sent by a client within a preset interval time;

and integrating the plurality of text editing messages to obtain the text editing message of the current conversation.

8. The method of claim 1, wherein the error correction word pair comprises a target original word and a candidate word;

performing mask processing on the target original word in the real-time voice recognition text according to the error correction word pair to obtain a target processing text;

inputting the target processing text and the target text fragment into a pre-trained semantic error correction model to obtain a first probability that a mask position in the target processing text is the target original word and a second probability that the mask position is the candidate word;

and replacing the target original word in the real-time voice recognition text with the candidate word to obtain a corrected text under the condition that the first probability and the second probability reach a preset replacement condition.

9. A text error correction word bank construction method, comprising:

acquiring a text editing message of a current conversation;

and storing the error correction word pairs into a text error correction word bank of the current conversation, wherein the text error correction word bank is used for correcting errors of texts to be corrected.

10. A text error correction apparatus, comprising:

the error correction module is configured to obtain a real-time voice recognition text, and correct the error of the real-time voice recognition text by using the error correction word pair to obtain a corrected text;

a sending module configured to send the corrected text to the client for display.

11. A text error correction word bank construction apparatus, comprising:

and the storage module is configured to store the error correction word pairs into a text error correction word bank of the current conversation, wherein the text error correction word bank is used for correcting errors of texts to be corrected.

12. A computing device, comprising:

a memory and a processor;

the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions which, when executed by the processor, implement the steps of the text error correction method according to any one of claims 1 to 8, or the steps of the text error correction word bank construction method according to claim 9.

13. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of the text error correction method of any one of claims 1 to 8 or the steps of the text error correction word bank construction method of claim 9.
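
For illustration only, and without limiting the claims, the masking and probability comparison recited in claim 8 might be sketched in Python as follows. The mask token `[MASK]`, the function names, the `score_fn` callback standing in for the pre-trained semantic error correction model, and the margin-based replacement condition are all assumptions of this sketch; the claims do not specify the model interface or the exact replacement condition.

```python
MASK = "[MASK]"


def mask_word(text, word):
    """Replace the first occurrence of `word` in `text` with the mask token."""
    return text.replace(word, MASK, 1)


def correct_with_model(text, original_word, candidate_word, context,
                       score_fn, margin=0.1):
    """Replace `original_word` with `candidate_word` only if the model
    prefers the candidate at the masked position.

    `score_fn(masked_text, context, original_word, candidate_word)` is a
    hypothetical stand-in for the semantic error correction model. It must
    return a tuple (p_original, p_candidate). `context` plays the role of
    the target text fragment supplied to the model.
    """
    if original_word not in text:
        return text
    masked = mask_word(text, original_word)
    p_original, p_candidate = score_fn(
        masked, context, original_word, candidate_word
    )
    # Assumed replacement condition: the candidate must beat the original
    # word by at least `margin`.
    if p_candidate - p_original >= margin:
        return masked.replace(MASK, candidate_word, 1)
    return text


def fake_score_fn(masked_text, context, original_word, candidate_word):
    """Toy scorer used only to exercise the sketch: favors the candidate
    when it also appears in the context fragment."""
    if candidate_word in context:
        return 0.2, 0.8
    return 0.7, 0.3


print(correct_with_model(
    "we met at the see side", "see", "sea",
    context="a walk along the sea", score_fn=fake_score_fn,
))
```

Only the first occurrence of the target original word is handled in this sketch. Comparing the two probabilities, rather than replacing unconditionally, keeps a stored word pair from overwriting a word that is correct in its new context.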