CN111339757A - Error correction method for speech recognition results in a debt collection scenario - Google Patents
- Publication number
- CN111339757A CN111339757A CN202010089898.XA CN202010089898A CN111339757A CN 111339757 A CN111339757 A CN 111339757A CN 202010089898 A CN202010089898 A CN 202010089898A CN 111339757 A CN111339757 A CN 111339757A
- Authority
- CN
- China
- Prior art keywords
- text
- corrected
- field
- word
- collection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses an error correction method for speech recognition results in a debt collection scenario, comprising the following steps: step 1, generating a dictionary specific to the debt collection domain; step 2, training an HMM on the call corpus between collection agents and customers in the debt collection domain: the call corpus is manually labeled and organized to a certain extent, then used as training samples to compute the initial state probabilities, transition probabilities, and emission probabilities; step 3, generating the candidate texts to be corrected; step 4, generating the set of corrected candidate texts; and step 5, error correction, i.e. text screening: each determined candidate text to be corrected is first replaced with its corresponding set of corrected candidates, then the trained HMM is decoded with a decoding algorithm to compute the final corrected text. The invention corrects collection agents' speech recognition results well and facilitates large-scale production use of speech recognition products.
Description
Technical Field
The invention relates to the technical field of speech recognition, and in particular to an error correction method for speech recognition results in a debt collection scenario.
Background
With the spread of deep learning, great breakthroughs have been made in computer vision, speech recognition, natural language processing, and related areas. Taking speech recognition as an example, its accuracy has reached roughly 97%, so its range of applications keeps broadening. In the field of financial debt collection, collection agents and customers conduct a large volume of phone calls. The call audio must be converted into call transcripts by automatic speech recognition (ASR); on the one hand, the transcripts undergo quality-inspection analysis to ensure compliance, and on the other hand, they are analyzed and mined as text, laying a solid foundation for subsequently improving collection effectiveness.
In real voice interaction, the speech recognition error rate is high due to factors such as users' non-standard Mandarin, background noise, and missing domain-specific vocabulary. The prior art focuses on improving recognition accuracy but lacks means of correcting the recognition results, which greatly hinders large-scale production use of speech recognition products.
Disclosure of Invention
The aim of the present invention is to provide an error correction method for speech recognition results in a debt collection scenario, so as to solve the problems raised in the Background section.
To achieve this aim, the invention provides the following technical scheme: an error correction method for speech recognition results in a debt collection scenario, comprising the following steps:
step 1, generate the debt-collection-domain dictionary: gather statistics over the call corpus between collection agents and customers in the debt collection domain, with a certain amount of manual labeling, organizing, and desensitization; the corpus serves as training material and, after processing of the organized corpus text and curation by domain service specialists, forms the debt-collection-domain dictionary;
step 2, train an HMM on the call corpus between collection agents and customers in the debt collection domain: the call corpus is manually labeled and organized to a certain extent, then used as training samples to compute the initial state probabilities, transition probabilities, and emission probabilities;
step 3, generate the candidate texts to be corrected: segment the speech recognition transcript into words, and check whether each word appears in the user dictionary to decide whether it becomes a candidate text to be corrected;
step 4, generate the set of corrected candidate texts: first convert each candidate text to be corrected into pinyin, then use an edit distance algorithm to compute the edit distance between that pinyin and the pinyin of each word in the debt-collection-domain dictionary, in order to decide whether the original candidate text is retained;
and step 5, error correction, i.e. text screening: first replace each determined candidate text to be corrected with its corresponding set of corrected candidates, then decode with the trained HMM combined with a decoding algorithm to compute the final corrected text.
Preferably, the rule for deciding whether a word becomes a candidate text to be corrected is: if the word is in the debt-collection-domain dictionary, no correction is performed; otherwise, the word is taken as a candidate text to be corrected.
Preferably, the rule for deciding whether to retain an original candidate text to be corrected is: if the edit distance is smaller than the threshold, the matched dictionary word is used as a corrected candidate; if the edit distance is larger than the threshold, the original candidate text to be corrected is retained for the time being.
Preferably, processing the organized corpus text includes text word segmentation, part-of-speech tagging, word-frequency statistics, pinyin annotation, and similar-word retrieval.
Preferably, the decoding algorithm used for decoding is the Viterbi algorithm.
Compared with the prior art, the invention has the following beneficial effect:
the invention corrects collection agents' speech recognition results well and facilitates large-scale production use of speech recognition products.
Drawings
FIG. 1 is a flow chart of the invention;
FIG. 2 is a flow chart of constructing the debt-collection-domain dictionary according to the invention;
FIG. 3 is the flow of screening the final corrected text.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIGS. 1-3, the invention provides the following technical scheme: an error correction method for speech recognition results in a debt collection scenario, comprising the following steps:
step 1, generate the debt-collection-domain dictionary: gather statistics over the call corpus between collection agents and customers in the debt collection domain, with a certain amount of manual labeling, organizing, and desensitization; the corpus serves as training material and, after processing of the organized corpus text and curation by domain service specialists, forms the debt-collection-domain dictionary;
step 2, train an HMM (hidden Markov model) on the call corpus between collection agents and customers in the debt collection domain: the call corpus is manually labeled and organized to a certain extent, then used as training samples to compute the initial state probabilities, transition probabilities, and emission probabilities;
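The three parameter sets in step 2 can be estimated by simple frequency counting over the labeled corpus. The following is a minimal Python sketch; the representation of the corpus as (recognized word, intended word) pairs is an illustrative assumption, not the patent's actual annotation format.

```python
from collections import Counter, defaultdict

def train_hmm(labeled_sentences):
    """Estimate HMM parameters by frequency counting.

    labeled_sentences: list of sentences, each a list of
    (observed word, hidden/intended word) pairs -- an assumed format.
    """
    init = Counter()              # counts of sentence-initial states
    trans = defaultdict(Counter)  # counts of state -> next state
    emit = defaultdict(Counter)   # counts of state -> observed word
    for sent in labeled_sentences:
        states = [s for _, s in sent]
        init[states[0]] += 1
        for prev, nxt in zip(states, states[1:]):
            trans[prev][nxt] += 1
        for obs, state in sent:
            emit[state][obs] += 1

    def norm(counter):
        total = sum(counter.values())
        return {k: v / total for k, v in counter.items()}

    return (norm(init),
            {s: norm(c) for s, c in trans.items()},
            {s: norm(c) for s, c in emit.items()})

# Toy corpus: two annotated "calls" (assumed data, for illustration only)
corpus = [[("qian", "money"), ("huan", "repay")],
          [("qian", "money"), ("kuan", "loan")]]
pi, A, B = train_hmm(corpus)
```

In practice the counts would be smoothed (e.g. add-one smoothing) so that transitions unseen in the training corpus do not receive zero probability.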
step 3, generate the candidate texts to be corrected: segment the speech recognition transcript into words, and check whether each word appears in the user dictionary to decide whether it becomes a candidate text to be corrected;
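Step 3 reduces to a membership test after segmentation. A minimal sketch, assuming the transcript has already been segmented into a word list (in practice a Chinese word segmenter would produce it); the dictionary and transcript values are illustrative assumptions:

```python
def find_candidates(words, domain_dict):
    """Words absent from the domain dictionary become
    candidate texts to be corrected."""
    return [w for w in words if w not in domain_dict]

# Illustrative dictionary and segmented transcript (assumed values)
domain_dict = {"overdue", "repayment", "installment"}
segmented = ["overdue", "repaymant", "installment"]
candidates = find_candidates(segmented, domain_dict)
```

Here "repaymant" is the only word outside the dictionary, so it alone becomes a candidate for correction.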
step 4, generate the set of corrected candidate texts: first convert each candidate text to be corrected into pinyin, then use an edit distance algorithm to compute the edit distance between that pinyin and the pinyin of each word in the debt-collection-domain dictionary, in order to decide whether the original candidate text is retained;
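The edit distance in step 4 is the standard Levenshtein distance; a sketch over pinyin strings:

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming (two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]
```

For example, the pinyin strings "huankuan" and "huankuang" differ by a single inserted letter, so their distance is 1.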
and step 5, error correction, i.e. text screening: first replace each determined candidate text to be corrected with its corresponding set of corrected candidates, then decode with the trained HMM (hidden Markov model) combined with a decoding algorithm to compute the final corrected text.
Specifically, the rule for deciding whether a word becomes a candidate text to be corrected is: if the word is in the debt-collection-domain dictionary, no correction is performed; if it is not, the word is taken as a candidate text to be corrected.
Specifically, the rule for deciding whether to retain an original candidate text to be corrected is as follows: if the edit distance is smaller than the threshold, the matched dictionary word is used as a corrected candidate; if it is larger, the original candidate text to be corrected is retained for the time being. Concretely, the edit distances of both the full pinyin and the initial-letter pinyin are computed; a dictionary word whose two distances are each smaller than the respective threshold is taken as a corrected candidate, and otherwise the dictionary words whose edit distances rank in the top three are taken as corrected candidates. After this step, each candidate text to be corrected has one or more corrected candidates, which must be screened further.
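The retention rule above can be sketched as follows; the threshold values, the lexicon layout (word mapped to full pinyin and initial-letter pinyin), and all names are illustrative assumptions rather than the patent's concrete settings:

```python
def levenshtein(a, b):
    """Standard Levenshtein edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def corrected_candidates(cand_full, cand_init, lexicon, t_full=2, t_init=1):
    """lexicon: word -> (full pinyin, initial-letter pinyin).
    Dictionary words within both thresholds join the corrected set;
    otherwise the three closest words are used as a fallback."""
    scored = [(levenshtein(cand_full, full), levenshtein(cand_init, init), w)
              for w, (full, init) in lexicon.items()]
    hits = [w for d_full, d_init, w in scored
            if d_full < t_full and d_init < t_init]
    if hits:
        return hits
    return [w for _, _, w in sorted(scored)[:3]]  # top three by distance

# Illustrative lexicon (assumed words and pinyin)
lexicon = {"huankuan": ("huankuan", "hk"),
           "daikuan": ("daikuan", "dk"),
           "yuqi": ("yuqi", "yq")}
```

With this lexicon, a misrecognized "huankuang" matches "huankuan" (full-pinyin distance 1, initial-letter distance 0), so the corrected candidate set is ["huankuan"].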
Specifically, processing the organized corpus text includes text word segmentation, part-of-speech tagging, word-frequency statistics, pinyin annotation, and similar-word retrieval: the text is segmented into words, stop words are removed, the words are sorted by frequency in descending order, and the part of speech and pinyin of each word are recorded.
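The frequency-sorting part of this pipeline (drop stop words, count frequencies, sort in descending order) can be sketched as; the stop-word list and input are illustrative assumptions:

```python
from collections import Counter

STOP_WORDS = {"the", "a", "of"}  # illustrative stop-word list

def build_domain_dict(tokenized_calls):
    """Drop stop words, count word frequencies over all calls,
    and return the words sorted by descending frequency."""
    freq = Counter(word
                   for call in tokenized_calls
                   for word in call
                   if word not in STOP_WORDS)
    return [word for word, _ in freq.most_common()]
```

Part-of-speech tags and pinyin would then be attached to each surviving word, e.g. with an NLP toolkit's tagger and a pinyin conversion library.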
Specifically, the decoding algorithm used for decoding is the Viterbi algorithm, which searches efficiently.
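A compact sketch of standard Viterbi decoding over HMM parameters of the kind produced in step 2; the tiny model below is assumed purely for illustration:

```python
def viterbi(obs, states, pi, A, B):
    """Return the most likely hidden-state sequence for obs."""
    # Probability of the best path ending in each state, first observation
    V = [{s: pi.get(s, 0.0) * B.get(s, {}).get(obs[0], 0.0) for s in states}]
    path = {s: [s] for s in states}
    for o in obs[1:]:
        V.append({})
        new_path = {}
        for s in states:
            # Best predecessor for state s under observation o
            p, prev = max((V[-2][ps] * A.get(ps, {}).get(s, 0.0)
                           * B.get(s, {}).get(o, 0.0), ps) for ps in states)
            V[-1][s] = p
            new_path[s] = path[prev] + [s]
        path = new_path
    best = max(V[-1], key=V[-1].get)
    return path[best]

# Tiny assumed model: "qian" is emitted by state "money", "huan" by "repay"
states = ["money", "repay"]
pi = {"money": 1.0}
A = {"money": {"repay": 1.0}}
B = {"money": {"qian": 1.0}, "repay": {"huan": 1.0}}
best_path = viterbi(["qian", "huan"], states, pi, A, B)
```

Real implementations work in log space, since products of many small probabilities underflow on long sentences.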
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (5)
1. An error correction method for speech recognition results in a debt collection scenario, characterized by comprising the following steps:
step 1, generate the debt-collection-domain dictionary: gather statistics over the call corpus between collection agents and customers in the debt collection domain, with a certain amount of manual labeling, organizing, and desensitization; the corpus serves as training material and, after processing of the organized corpus text and curation by domain service specialists, forms the debt-collection-domain dictionary;
step 2, train an HMM on the call corpus between collection agents and customers in the debt collection domain: the call corpus is manually labeled and organized to a certain extent, then used as training samples to compute the initial state probabilities, transition probabilities, and emission probabilities;
step 3, generate the candidate texts to be corrected: segment the speech recognition transcript into words, and check whether each word appears in the user dictionary to decide whether it becomes a candidate text to be corrected;
step 4, generate the set of corrected candidate texts: first convert each candidate text to be corrected into pinyin, then use an edit distance algorithm to compute the edit distance between that pinyin and the pinyin of each word in the debt-collection-domain dictionary, in order to decide whether the original candidate text is retained;
and step 5, error correction, i.e. text screening: first replace each determined candidate text to be corrected with its corresponding set of corrected candidates, then decode with the trained HMM combined with a decoding algorithm to compute the final corrected text.
2. The method according to claim 1, characterized in that the rule for deciding whether to add a candidate text to be corrected is: if a word is in the debt-collection-domain dictionary, no correction is performed; if it is not, the word is taken as a candidate text to be corrected.
3. The method according to claim 2, characterized in that: if the edit distance is smaller than the threshold, the matched dictionary word is used as a corrected candidate; if the edit distance is greater than the threshold, the original candidate text to be corrected is retained first.
4. The method according to claim 3, characterized in that processing the organized corpus text includes text word segmentation, part-of-speech tagging, word-frequency statistics, pinyin annotation, and similar-word retrieval.
5. The method according to claim 2, characterized in that the decoding algorithm used for decoding is the Viterbi algorithm.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010089898.XA CN111339757A (en) | 2020-02-13 | 2020-02-13 | Error correction method for voice recognition result in collection scene |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010089898.XA CN111339757A (en) | 2020-02-13 | 2020-02-13 | Error correction method for voice recognition result in collection scene |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111339757A true CN111339757A (en) | 2020-06-26 |
Family
ID=71183440
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010089898.XA Pending CN111339757A (en) | 2020-02-13 | 2020-02-13 | Error correction method for voice recognition result in collection scene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111339757A (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6018708A (en) * | 1997-08-26 | 2000-01-25 | Nortel Networks Corporation | Method and apparatus for performing speech recognition utilizing a supplementary lexicon of frequently used orthographies |
CN101650886A (en) * | 2008-12-26 | 2010-02-17 | 中国科学院声学研究所 | Method for automatically detecting reading errors of language learners |
CN105975625A (en) * | 2016-05-26 | 2016-09-28 | 同方知网数字出版技术股份有限公司 | Chinglish inquiring correcting method and system oriented to English search engine |
CN105976818A (en) * | 2016-04-26 | 2016-09-28 | Tcl集团股份有限公司 | Instruction identification processing method and apparatus thereof |
CN106959977A (en) * | 2016-01-12 | 2017-07-18 | 广州市动景计算机科技有限公司 | Candidate collection computational methods and device, word error correction method and device in word input |
CN107729321A (en) * | 2017-10-23 | 2018-02-23 | 上海百芝龙网络科技有限公司 | A kind of method for correcting error of voice identification result |
CN108304385A (en) * | 2018-02-09 | 2018-07-20 | 叶伟 | A kind of speech recognition text error correction method and device |
CN109670148A (en) * | 2018-09-26 | 2019-04-23 | 平安科技(深圳)有限公司 | Collection householder method, device, equipment and storage medium based on speech recognition |
CN109933778A (en) * | 2017-12-18 | 2019-06-25 | 北京京东尚科信息技术有限公司 | Segmenting method, device and computer readable storage medium |
CN110210029A (en) * | 2019-05-30 | 2019-09-06 | 浙江远传信息技术股份有限公司 | Speech text error correction method, system, equipment and medium based on vertical field |
2020-02-13: application CN202010089898.XA filed in China; publication CN111339757A, status active (Pending)
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111985213A (en) * | 2020-09-07 | 2020-11-24 | 科大讯飞华南人工智能研究院(广州)有限公司 | Method and device for correcting voice customer service text |
CN111985213B (en) * | 2020-09-07 | 2024-05-28 | 科大讯飞华南人工智能研究院(广州)有限公司 | Voice customer service text error correction method and device |
CN112382289A (en) * | 2020-11-13 | 2021-02-19 | 北京百度网讯科技有限公司 | Method and device for processing voice recognition result, electronic equipment and storage medium |
CN112382289B (en) * | 2020-11-13 | 2024-03-22 | 北京百度网讯科技有限公司 | Speech recognition result processing method and device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107315737B (en) | Semantic logic processing method and system | |
CN102760436B (en) | Voice lexicon screening method | |
CN111209401A (en) | System and method for classifying and processing sentiment polarity of online public opinion text information | |
CN110853649A (en) | Label extraction method, system, device and medium based on intelligent voice technology | |
CN111489765A (en) | Telephone traffic service quality inspection method based on intelligent voice technology | |
CN112259083B (en) | Audio processing method and device | |
CN110413998B (en) | Self-adaptive Chinese word segmentation method oriented to power industry, system and medium thereof | |
CN112712349A (en) | Intelligent paperless conference data information processing method based on artificial intelligence and big data analysis | |
CN111339757A (en) | Error correction method for voice recognition result in collection scene | |
CN112002328A (en) | Subtitle generating method and device, computer storage medium and electronic equipment | |
CN113221542A (en) | Chinese text automatic proofreading method based on multi-granularity fusion and Bert screening | |
CN111737424A (en) | Question matching method, device, equipment and storage medium | |
CN112967710B (en) | Low-resource customer dialect point identification method | |
CN114239579A (en) | Electric power searchable document extraction method and device based on regular expression and CRF model | |
CN113037934A (en) | Hot word analysis system based on call recording of call center | |
Behre et al. | Streaming punctuation for long-form dictation with transformers | |
CN112231440A (en) | Voice search method based on artificial intelligence | |
CN117292680A (en) | Voice recognition method for power transmission operation detection based on small sample synthesis | |
CN116665674A (en) | Internet intelligent recruitment publishing method based on voice and pre-training model | |
CN111785236A (en) | Automatic composition method based on motivational extraction model and neural network | |
CN114254628A (en) | Method and device for quickly extracting hot words by combining user text in voice transcription, electronic equipment and storage medium | |
CN114707515A (en) | Method and device for judging dialect, electronic equipment and storage medium | |
CN110858268B (en) | Method and system for detecting unsmooth phenomenon in voice translation system | |
CN112488593A (en) | Auxiliary bid evaluation system and method for bidding | |
CN113011183A (en) | Unstructured text data processing method and system in electric power regulation and control field |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20200626 |