WO2023035525A1 - Speech recognition error correction method and system, and apparatus and storage medium - Google Patents

Speech recognition error correction method and system, and apparatus and storage medium

Info

Publication number
WO2023035525A1
WO2023035525A1 PCT/CN2022/071074 CN2022071074W WO2023035525A1 WO 2023035525 A1 WO2023035525 A1 WO 2023035525A1 CN 2022071074 W CN2022071074 W CN 2022071074W WO 2023035525 A1 WO2023035525 A1 WO 2023035525A1
Authority
WO
WIPO (PCT)
Prior art keywords
word
corrected
fst
detected
text
Prior art date
Application number
PCT/CN2022/071074
Other languages
French (fr)
Chinese (zh)
Inventor
庄子扬
魏韬
马骏
王少军
肖京
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2023035525A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/232 Orthographic correction, e.g. spell checking or vowelisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present application relates to artificial intelligence technology, and in particular to a speech recognition error correction method, system, device and storage medium.
  • the present application aims to solve one of the technical problems in the related art at least to a certain extent.
  • the present application provides a speech recognition error correction method, system, device and storage medium, which can correct speech recognition text and improve the accuracy of speech recognition.
  • an embodiment of the present application provides a speech recognition error correction method, the method including the following steps: performing speech recognition on the speech to be detected to obtain the text to be detected and the corresponding pronunciation sequence to be detected; constructing the FST to be detected according to the pronunciation sequence to be detected; obtaining the keyword FST and the Chinese character confusion set, wherein the keyword FST, the Chinese character confusion set and the FST to be detected belong to the same vertical field; determining, according to the FST to be detected and the keyword FST, several words to be corrected and several sentences to be corrected in the text to be detected, wherein the sentences to be corrected contain the words to be corrected; if the words to be corrected exist in the Chinese character confusion set, determining the replacement word corresponding to each word to be corrected according to the Chinese character confusion set; replacing the words to be corrected in the sentence to be corrected with the replacement words to obtain a replacement sentence; calculating the first logic score of the sentence to be corrected and the second logic score of the replacement sentence; when the first logic score is less than the second logic score
  • the embodiment of the present application also proposes a speech recognition error correction system, the system including a first module, a second module, a third module, a fourth module, a fifth module, a sixth module and a seventh module; the first module is used to perform speech recognition on the speech to be detected to obtain the text to be detected and the corresponding pronunciation sequence to be detected; the second module is used to construct the FST to be detected according to the pronunciation sequence to be detected; the third module is used to obtain the keyword FST and the Chinese character confusion set, wherein the keyword FST, the Chinese character confusion set and the FST to be detected belong to the same vertical field; the fourth module is used to determine, according to the FST to be detected and the keyword FST, several words to be corrected and several sentences to be corrected in the text to be detected, wherein the sentences to be corrected contain the words to be corrected; the fifth module is used to determine the replacement word corresponding to each word to be corrected according to the Chinese character confusion set
  • an embodiment of the present application also proposes a device, the device comprising: at least one processor; and at least one memory for storing at least one program; when the at least one program is executed by the at least one processor, the at least one processor implements a speech recognition error correction method; wherein the speech recognition error correction method includes: performing speech recognition on the speech to be detected to obtain the text to be detected and the corresponding pronunciation sequence to be detected; constructing the FST to be detected according to the pronunciation sequence to be detected; obtaining the keyword FST and the Chinese character confusion set, wherein the keyword FST, the Chinese character confusion set and the FST to be detected belong to the same vertical field; determining, according to the FST to be detected and the keyword FST, several words to be corrected and several sentences to be corrected in the text to be detected, wherein the sentences to be corrected contain the words to be corrected; if the words to be corrected exist in the Chinese character confusion set, according to the Chinese character confusion set,
  • the embodiment of the present application also provides a computer storage medium, which stores a program executable by a processor, and the program implements a speech recognition error correction method when executed by the processor; wherein the speech recognition error correction method includes: performing speech recognition on the speech to be detected to obtain the text to be detected and the corresponding pronunciation sequence to be detected; constructing the FST to be detected according to the pronunciation sequence to be detected; obtaining the keyword FST and the Chinese character confusion set, wherein the keyword FST, the Chinese character confusion set and the FST to be detected belong to the same vertical field; determining, according to the FST to be detected and the keyword FST, several words to be corrected and several sentences to be corrected in the text to be detected, wherein the sentences to be corrected contain the words to be corrected; if the words to be corrected exist in the Chinese character confusion set, determining the replacement word corresponding to each word to be corrected according to the Chinese character confusion set; the word to be corrected in the sentence to be corrected is replaced by the replacement
  • the beneficial effects of the embodiments of the present application are as follows: first, speech recognition is performed on the speech to be detected, and the text to be detected and the corresponding pronunciation sequence to be detected are obtained; the FST to be detected is constructed according to the pronunciation sequence to be detected; according to the FST to be detected and the obtained keyword FST, several words to be corrected in the text to be detected are determined, together with the sentences to be corrected that contain them; if a word to be corrected exists in the obtained Chinese character confusion set, the replacement word corresponding to each word to be corrected is determined, and the word to be corrected in the sentence to be corrected is replaced with the replacement word to generate a replacement sentence; the first logic score of the sentence to be corrected and the second logic score of the replacement sentence are calculated, and when the first logic score is smaller than the second logic score, the sentence to be corrected in the text to be detected is replaced with the replacement sentence, thereby completing the error correction of the speech recognition text.
  • the speech recognition error correction method proposed in the embodiment of the present application determines the words to be corrected that may contain errors according to the pronunciation of the speech recognition text, provides replacement words for them according to the Chinese character confusion set of the corresponding business, and finally decides whether error correction is required according to the logic scores of the corresponding sentence before and after the replacement.
  • the embodiment of the present application can find the recognition errors caused by mispronunciation in the specified business field, thereby increasing the probability of finding the words to be corrected; it can also make effective corrections and reduce the miscorrection rate of speech recognition texts, thereby effectively improving the accuracy of speech recognition texts, so that speech recognition technology can play a greater role in digital medical, smart home and other fields.
  • FIG. 1 is a flow chart of the steps of the speech recognition error correction method provided by the embodiment of the present application;
  • FIG. 2 is a schematic diagram of the FST to be detected provided by the embodiment of the present application;
  • FIG. 3 is a flow chart of the steps of constructing the keyword FST and the Chinese character confusion set provided by the embodiment of the present application;
  • FIG. 4 is a flow chart of the steps of constructing the Chinese character confusion set provided by the embodiment of the present application;
  • FIG. 5 is a flow chart of the steps of constructing the pronunciation confusion set provided by the embodiment of the present application;
  • FIG. 6 is a flow chart of the steps of constructing a keyword table provided by the embodiment of the present application;
  • FIG. 7 is a flow chart of the steps of constructing the keyword FST provided by the embodiment of the present application;
  • FIG. 8 is a schematic diagram of the keyword FST provided by the embodiment of the present application;
  • FIG. 9 is a schematic diagram of the speech recognition error correction system provided by the embodiment of the present application;
  • FIG. 10 is a schematic diagram of the device provided by the embodiment of the present application.
  • FIG. 1 is a flow chart of the steps of the speech recognition error correction method provided by the embodiment of the present application.
  • the method involves the field of artificial intelligence speech recognition error correction.
  • the method includes but is not limited to steps S100-S170:
  • Step S100 performing speech recognition on the speech to be detected to obtain the text to be detected and the corresponding pronunciation sequence to be detected;
  • the voice to be detected in this embodiment of the present application refers to a voice segment generated by a person performing a business in a vertical business field.
  • the voice to be detected can be the recording of a discussion meeting conducted by doctors on a certain case, or the recording of the online communication between the patient and the doctor, or the telephone communication between the patient and the front desk of the hospital.
  • in the medical field, such business recordings contain a large number of medical-related nouns, including but not limited to hospital names, surgical names and drug names.
  • the embodiment of the present application proposes to determine the words to be corrected according to the pronunciation. Therefore, in this step S100, the speech to be detected is firstly recognized, and the text to be detected is generated.
  • the text to be detected is a text sequence corresponding to the speech to be detected. A table lookup is performed for each character in the text to be detected to obtain its pinyin unit, and the pinyin units of all characters are recorded in order, so the pronunciation sequence to be detected is the pinyin sequence corresponding to the text to be detected.
  • for example, the generated pronunciation sequence to be detected may be: "wo hen kuai le".
  • when a character has more than one pronunciation, its common pronunciation is generally selected as the corresponding pinyin unit; for example, the pinyin unit selected for "乐" is "le".
  • the pronunciation sequence to be detected can be generated.
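The table lookup in step S100 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the `PINYIN_TABLE` dictionary, the function name and the Chinese characters chosen for the example sentence are assumptions, and a real system would load a complete pronunciation lexicon.

```python
# Hypothetical lookup table (an assumption for illustration): each character
# maps to its candidate pinyin units, with the most common reading first.
PINYIN_TABLE = {
    "今": ["jin"],
    "天": ["tian"],
    "气": ["qi"],
    "晴": ["qing"],
    "朗": ["lang"],
    "乐": ["le", "yue"],  # more than one reading; the common one comes first
}


def to_pronunciation_sequence(text: str) -> list[str]:
    """Return one pinyin unit per character, choosing the most common reading."""
    sequence = []
    for char in text:
        readings = PINYIN_TABLE.get(char)
        if readings is None:
            raise KeyError(f"no pinyin entry for character {char!r}")
        sequence.append(readings[0])
    return sequence


print(" ".join(to_pronunciation_sequence("今天天气晴朗")))
```

Listing the common reading first keeps the lookup deterministic, which matches the requirement that each character contributes exactly one pinyin unit.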
  • Step S110 constructing an FST to be detected according to the pronunciation sequence to be detected
  • FST refers to a finite state transducer. This structure is similar to a tree diagram and can be used to build a dictionary that expresses different states and the transition paths between them. In the embodiment of the present application, however, the FST to be detected, constructed according to the pronunciation sequence to be detected, can be regarded as a single path expressing the pinyin unit corresponding to each character in the text to be detected.
  • for example, if the text to be detected is "the weather is fine today", the obtained pronunciation sequence to be detected can be expressed as "jin tian tian qi qing lang", and the FST to be detected corresponding to the text to be detected can be constructed from it.
  • the specific construction result is shown in FIG. 2, which is a schematic diagram of the FST to be detected provided by the embodiment of the present application. As shown in FIG. 2, the pinyin units in the pronunciation sequence to be detected are arranged in order to obtain six sub-nodes 210, in the order "jin-tian-tian-qi-qing-lang", and the last sub-node points to the end point 220.
  • in this way, a pronunciation path corresponding to the text to be detected is obtained, that is, the FST to be detected shown in FIG. 2.
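A linear path like the one described for FIG. 2 could be represented as a chain of states joined by labeled arcs. The sketch below is an assumption about one possible representation (the `Arc` type and `build_linear_fst` are hypothetical names, and labels are placed on arcs rather than on nodes); it is not taken from the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    source: int  # state the arc leaves
    label: str   # pinyin unit consumed on this arc
    target: int  # state the arc enters


def build_linear_fst(pinyin_units: list[str]) -> tuple[list[Arc], int]:
    """Chain the units into states 0 -> 1 -> ... -> n; state n is the end point."""
    arcs = [Arc(i, unit, i + 1) for i, unit in enumerate(pinyin_units)]
    return arcs, len(pinyin_units)


arcs, end_state = build_linear_fst("jin tian tian qi qing lang".split())
for arc in arcs:
    print(f"{arc.source} --{arc.label}--> {arc.target}")
print("end state:", end_state)
```

Because the text to be detected has a single reading per character, the structure has no branching: each state has exactly one outgoing arc until the end point.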
  • Step S120 obtaining keyword FST and Chinese character confusion set
  • the keyword FST corresponding to the business records the pinyin of the keywords in the keyword table of this business.
  • the structure of the keyword FST is similar to that of the FST to be detected, and the specific steps for constructing it will be elaborated below.
  • the Chinese character confusion set records characters or words that are easily confused in this business field. The specific construction process of the Chinese character confusion set will be elaborated below.
  • the keyword FST, the Chinese character confusion set and the FST to be detected belong to the same vertical field, so the errors in the text to be detected can be found more accurately by using the keyword FST and the Chinese character confusion set.
  • Step S130 according to the FST to be detected and the keyword FST, determine a number of words to be corrected and a number of sentences to be corrected in the text to be detected; wherein, the sentences to be corrected include words to be corrected;
  • because the keyword FST and the FST to be detected have similar structures, they can conveniently be combined and compared, that is, each pinyin unit in the FST to be detected is compared with the pinyin units in the keyword FST. If the pinyin is the same, the node is recorded as an identical node. If the FST to be detected and the keyword FST share one or more identical nodes, the characters in the text to be detected that correspond to these nodes are taken as a word to be corrected; the number of identical nodes therefore equals the number of characters in the corresponding word to be corrected.
  • the order of the pinyin units must also be respected when combining with the keyword FST. That is to say, if the FST to be detected and the keyword FST contain multiple identical nodes, these nodes should be contiguous.
  • in this way the words to be corrected in the text to be detected can be determined, and each of these words lies within some sentence.
  • relatively mature sentence segmentation schemes already exist in the related technology; the text to be detected can be segmented with them, and each sentence containing one or more words to be corrected is determined as a sentence to be corrected.
  • through step S130, all the words to be corrected in the text to be detected can be determined, together with the sentences to be corrected that contain them. Since the words to be corrected are screened by pronunciation through combination with the keyword FST, this effectively increases the discovery rate of recognition errors caused by mispronunciation in the specified business field, helps reduce the miscorrection rate, and improves the accuracy of speech recognition.
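The pinyin matching in step S130 can be approximated without an FST library. The sketch below swaps FST composition for a plain contiguous-subsequence scan over lists, which keeps the two properties stated above: matching is by pinyin, and matched units must be contiguous and in order. The function name, the keyword list and the Chinese characters are illustrative assumptions.

```python
def find_keyword_spans(
    units: list[str], keywords: list[list[str]]
) -> list[tuple[int, int]]:
    """Return (start, end) index pairs where a keyword's pinyin occurs contiguously."""
    spans = []
    for start in range(len(units)):
        for key in keywords:
            end = start + len(key)
            # A slice that runs past the end is shorter than the key, so it cannot match.
            if key and units[start:end] == key:
                spans.append((start, end))
    return spans


# Assumed characters for the example sentence "the weather is fine today".
text = list("今天天气晴朗")
units = "jin tian tian qi qing lang".split()
keywords = [["tian", "qi"]]  # hypothetical keyword pinyin from the business field

spans = find_keyword_spans(units, keywords)
words_to_correct = ["".join(text[start:end]) for start, end in spans]
print(spans, words_to_correct)
```

Each span covers as many characters as the keyword has pinyin units, mirroring the statement that the number of identical nodes equals the number of characters in the word to be corrected.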
  • Step S140 if the word to be corrected exists in the Chinese character confusion set, determine the replacement word corresponding to each word to be corrected according to the Chinese character confusion set;
  • each word to be corrected determined in step S130 is matched against the Chinese character confusion set of this business field. If the current word to be corrected exists in the Chinese character confusion set, the set contains the words that are easily confused with it, and these words are used as the replacement words for the word to be corrected.
  • Step S150 replacing the word to be corrected in the sentence to be corrected with a replacement word to obtain a replacement sentence
  • the words to be corrected in the sentence to be corrected are replaced with the replacement words determined in step S140, while other parts of the sentence to be corrected are not changed, so as to obtain a new replacement sentence.
  • for example, the sentence to be corrected is "The weather is fine today" and the word to be corrected is "weather". If "weather" exists in the Chinese character confusion set corresponding to this business and its replacement word is "Tianqi", the generated replacement sentence is "It's sunny today in Tianqi".
  • through step S150, the words to be corrected in all the sentences to be corrected in the text to be detected are replaced, and several replacement sentences containing the replacement words are obtained.
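Steps S140 and S150 amount to a dictionary lookup followed by a substitution that leaves the rest of the sentence untouched. A minimal sketch, assuming the sentence has already been segmented into tokens and using a hypothetical `CONFUSION_SET` with English stand-ins for the words in the example:

```python
# Hypothetical confusion set: word to be corrected -> candidate replacement words.
CONFUSION_SET = {"weather": ["Tianqi"]}


def make_replacement_sentences(
    tokens: list[str], index: int, confusion_set: dict[str, list[str]]
) -> list[list[str]]:
    """Replace the token at `index` with each of its replacement words in turn."""
    candidates = confusion_set.get(tokens[index], [])
    return [tokens[:index] + [word] + tokens[index + 1:] for word in candidates]


sentence = ["today", "weather", "sunny"]
print(make_replacement_sentences(sentence, 1, CONFUSION_SET))
```

A word that is absent from the confusion set yields no replacement sentences, which corresponds to the condition "if the word to be corrected exists in the Chinese character confusion set".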
  • Step S160 calculating the first logic score of the sentence to be corrected and the second logic score of the replacement sentence
  • step S160 the first logic score of the sentence to be corrected and the second logic score of the corresponding replacement sentence need to be calculated.
  • the language logic model of the corresponding service can be used to calculate the logic score of the statement.
  • N-gram is an algorithm based on a statistical language model. A gram is a unit of text (here, a word), and N is the number of consecutive units considered. The model estimates the probability of the N-th word from the previous (N-1) words; common cases are the bi-gram (N = 2) and the tri-gram (N = 3). For a whole sentence, the probability of the sentence is obtained from the probability of each word in it, and the probability of each word is calculated from the training corpus of the N-gram model.
  • the embodiment of this application does not specifically limit the training process of the language logic model, nor the way in which the language logic model calculates the logic score. The point is that, from a large amount of text data in the corresponding business field, a language logic model capable of calculating logic scores of sentences in that business field can be trained.
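Since the training procedure is left open, the following is only one possible illustration: a bi-gram model estimated by maximum-likelihood counting, with no smoothing. The function name, the `<s>` and `</s>` boundary markers, the base-10 logarithm and the two-sentence corpus are all assumptions made for the sketch.

```python
import math
from collections import Counter


def train_bigram(corpus: list[list[str]]) -> dict[tuple[str, str], float]:
    """Estimate log10 p(current | previous) by counting adjacent word pairs."""
    context_counts: Counter = Counter()
    bigram_counts: Counter = Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]  # sentence boundary markers
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return {
        pair: math.log10(count / context_counts[pair[0]])
        for pair, count in bigram_counts.items()
    }


corpus = [["I", "love", "reading"], ["I", "love", "music"]]
model = train_bigram(corpus)
print(model[("love", "reading")])
```

A production model would typically add smoothing or back-off so that unseen word pairs do not receive zero probability.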
  • the first logic score can be determined by inputting the sentence to be corrected into the language logic model.
  • for example, the sentence to be corrected is "I love reading". After necessary processing such as word segmentation, using related technologies such as business dictionaries, the sentence can be divided into the words "I", "love" and "reading". In some embodiments, the logic score of the sentence to be corrected can be expressed as follows:

$$\mathrm{score} = \log p(\text{I} \mid \langle s \rangle) + \log p(\text{love} \mid \text{I}) + \log p(\text{reading} \mid \text{love}) + \log p(\langle /s \rangle \mid \text{reading})$$

  • where p represents the probability, and ⟨s⟩ and ⟨/s⟩ mark the start and the end of the sentence.
  • the language logic model determines the value of each term, for example log p(I | ⟨s⟩) = -0.2, log p(love | I) = -0.8, log p(reading | love) = -0.7, and the logic score is the sum of all the terms.
  • the replacement sentence is input into the same language logic model, and the second logic score of the replacement sentence is calculated according to the above steps.
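The scoring of a sentence with a bi-gram model can be sketched as a sum of log-probabilities over adjacent word pairs. The first three values below come from the example in the text; the end-of-sentence value and the floor for unseen pairs are assumptions (the end-of-sentence value is chosen so that the total equals the -2.1 used as the first logic score in the step S170 example), and `logic_score` is a hypothetical name.

```python
LOG_PROBS = {
    ("<s>", "I"): -0.2,
    ("I", "love"): -0.8,
    ("love", "reading"): -0.7,
    ("reading", "</s>"): -0.4,  # assumed value; not stated in the text
}


def logic_score(
    tokens: list[str],
    log_probs: dict[tuple[str, str], float],
    unseen: float = -5.0,  # assumed penalty for word pairs missing from the model
) -> float:
    """Sum the bi-gram log-probabilities over the sentence, including boundaries."""
    padded = ["<s>"] + tokens + ["</s>"]
    return sum(log_probs.get(pair, unseen) for pair in zip(padded, padded[1:]))


first_score = logic_score(["I", "love", "reading"], LOG_PROBS)
print(round(first_score, 6))
```

The same function is applied unchanged to the replacement sentence, so the two scores are directly comparable.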
  • Step S170 when the first logic score of the sentence to be corrected is less than the second logic score of the replacement sentence, replace the sentence to be corrected in the text to be detected with the replacement sentence;
  • the first logic score and the second logic score calculated in step S160 are compared. Assuming that the first logic score is -2.1 and the second logic score is -2.0, the first logic score is smaller than the second logic score; that is to say, for this business field the replacement sentence is more fluent in terms of language logic and is more likely to be the correct sentence. Therefore, the sentence to be corrected is replaced with the corresponding replacement sentence.
  • the speech recognition error correction of the text to be detected can be completed.
  • in step S170, whether the sentence to be corrected is replaced with a replacement sentence depends on the logic scores of the two sentences, and step S130 explains that a sentence to be corrected contains one or more words to be corrected. When the sentence to be corrected contains multiple words to be corrected, the same sentence may generate multiple replacement sentences, and different replacement sentences may obtain different second logic scores.
  • for example, the sentence to be corrected is "the weather is sunny today", and the words to be corrected in it are "weather" and "sunny"; the replacement word corresponding to "weather" is "Tianqi", and the replacement word corresponding to "sunny" is "love man".
  • depending on which words are replaced, different replacement sentences can be generated.
  • in some embodiments, the words to be corrected in the sentence to be corrected are permuted and combined to obtain multiple replacement sentences.
  • in the above example, three replacement sentences can be obtained: the first replacement sentence "It is sunny today in Tianqi" (only "weather" replaced), the second replacement sentence "Today Tian Qiqing Lang" (both words replaced), and the third replacement sentence "Today's weather lover" (only "sunny" replaced). In this embodiment, the logic scores of the sentence to be corrected, the first replacement sentence, the second replacement sentence and the third replacement sentence are calculated respectively, and the sentence with the highest logic score is selected as the final error correction result.
  • this embodiment considers all possible replacements in the entire sentence to be corrected and calculates the logic scores of all possible replacement sentences, which reduces the loss of accuracy of the whole sentence caused by inaccurate replacement of individual words to be corrected.
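The exhaustive variant can be sketched with a Cartesian product over the options for each token, where every token keeps its original form as the first option. The confusion set, the English stand-in tokens and the toy scoring function are assumptions; in practice the score would come from the language logic model.

```python
from itertools import product
from typing import Callable


def all_candidate_sentences(
    tokens: list[str], confusion_set: dict[str, list[str]]
) -> list[list[str]]:
    """Enumerate the original sentence and every combination of replacements."""
    options = [[tok] + confusion_set.get(tok, []) for tok in tokens]
    return [list(choice) for choice in product(*options)]


def best_sentence(
    tokens: list[str],
    confusion_set: dict[str, list[str]],
    score: Callable[[list[str]], float],
) -> list[str]:
    """Pick the candidate with the highest logic score (the original wins ties)."""
    return max(all_candidate_sentences(tokens, confusion_set), key=score)


confusion = {"weather": ["Tianqi"], "sunny": ["love man"]}
sentence = ["today", "weather", "sunny"]


def toy_score(candidate: list[str]) -> float:
    # Toy stand-in for the language logic model: penalize each replaced word.
    return -float(sum(1 for word in candidate if word in {"Tianqi", "love man"}))


candidates = all_candidate_sentences(sentence, confusion)
print(len(candidates), best_sentence(sentence, confusion, toy_score))
```

With two words to be corrected and one replacement each, the enumeration yields the original sentence plus the three replacement sentences described above. The number of candidates grows multiplicatively with the number of words to be corrected, which is the cost this approach trades for accuracy.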
  • alternatively, the words to be corrected can be replaced one by one in the order in which they appear in the sentence to be corrected.
  • taking the sentence to be corrected "the weather is fine today" as an example, the first logic score of the sentence to be corrected is calculated first, and then the second logic score of the replacement sentence "Today Tianqi is sunny" is calculated. If the comparison of the logic scores shows that the original sentence to be corrected "the weather is sunny today" better conforms to language logic, there is no need to replace "weather".
  • because this embodiment replaces and compares one by one, its logic is simpler, and when one word to be corrected corresponds to multiple replacement words, the number of calculations can be reduced and the error correction efficiency improved.
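The one-by-one variant can be sketched as a greedy loop that keeps a replacement only when it raises the logic score of the current sentence. The function name, the confusion set and the toy scoring functions are assumptions for illustration.

```python
from typing import Callable


def correct_one_by_one(
    tokens: list[str],
    confusion_set: dict[str, list[str]],
    score: Callable[[list[str]], float],
) -> list[str]:
    """Visit words in sentence order and keep a replacement only if the score rises."""
    current = list(tokens)
    for i, word in enumerate(tokens):
        for candidate in confusion_set.get(word, []):
            trial = current[:i] + [candidate] + current[i + 1:]
            if score(trial) > score(current):
                current = trial
    return current


confusion = {"weather": ["Tianqi"], "sunny": ["love man"]}
sentence = ["today", "weather", "sunny"]


def prefers_original(candidate: list[str]) -> float:
    # Toy stand-in for the language logic model: penalize each replaced word.
    return -float(sum(1 for word in candidate if word in {"Tianqi", "love man"}))


def prefers_tianqi(candidate: list[str]) -> float:
    # Second toy model: rewards "Tianqi" and penalizes "love man".
    return float("Tianqi" in candidate) - float("love man" in candidate)


print(correct_one_by_one(sentence, confusion, prefers_original))
print(correct_one_by_one(sentence, confusion, prefers_tianqi))
```

Each word contributes at most as many score comparisons as it has replacement words, so the work grows additively rather than multiplicatively.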
  • this application does not specifically limit how a sentence to be corrected that contains multiple words to be corrected is processed. Comparing the logic scores before and after replacement can improve the accuracy of replacing the words to be corrected, thereby reducing the miscorrection rate of speech recognition.
  • in summary, the embodiment of the present application provides a speech recognition error correction method: first, speech recognition is performed on the speech to be detected, and the text to be detected and the corresponding pronunciation sequence to be detected are obtained; the FST to be detected is constructed according to the pronunciation sequence to be detected; according to the FST to be detected and the obtained keyword FST, several words to be corrected in the text to be detected are determined, together with the sentences to be corrected that contain them; if a word to be corrected exists in the obtained Chinese character confusion set, the replacement word corresponding to each word to be corrected is determined, and the word to be corrected in the sentence to be corrected is replaced with the replacement word to generate a replacement sentence; the first logic score of the sentence to be corrected and the second logic score of the replacement sentence are calculated, and when the first logic score is smaller than the second logic score, the sentence to be corrected in the text to be detected is replaced with the replacement sentence, thereby completing the error correction of the speech recognition text.
  • the speech recognition error correction method proposed in the embodiment of the present application determines the words to be corrected that may contain errors according to the pronunciation of the speech recognition text, provides replacement words for them according to the Chinese character confusion set of the corresponding business, and finally decides whether error correction is required according to the logic scores of the corresponding sentence before and after the replacement.
  • the embodiment of the present application can find the recognition errors caused by mispronunciation in the specified business field, thereby increasing the probability of finding the words to be corrected; it can also make effective corrections and reduce the miscorrection rate of speech recognition texts, thereby effectively improving the accuracy of speech recognition texts, so that speech recognition technology can play a greater role in digital medical, smart home and other fields.
  • the speech recognition error correction method proposed by the embodiment of the present application also includes the steps of constructing the keyword FST and constructing the Chinese character confusion set. Referring to FIG. 3, which is a flow chart of the steps of constructing the keyword FST and the Chinese character confusion set, the method includes but is not limited to steps S300-S380:
  • Step S300 acquiring the training voice, the training voice and the voice to be detected belong to the same vertical field
  • multiple training voices are obtained. As explained above, the training voice and the voice to be detected belong to the same vertical field, and voice from that field is used as the training voice.
  • Step S310 performing speech recognition on the training speech to obtain speech recognition text
  • the speech recognition technology in the related art is used to perform speech recognition on the training speech to obtain a speech recognition text, which is a text sequence corresponding to the training speech.
  • Step S320 according to the speech recognition text, determine the corresponding first pronunciation sequence
  • this step S320 can refer to step S100 in FIG. 1 , that is, look up each word in the speech recognition text to determine its corresponding unique pinyin, and record these pinyin in sequence as the first pronunciation sequence.
  • Step S330 performing manual recognition on the training speech to obtain the manual recognition text
  • the training speech that has undergone speech recognition in step S310 is also recognized manually, that is, a person listens to the training speech, converts what is heard into text, and records it as the manually recognized text.
  • the manually recognized text is also a character sequence corresponding to the training speech, and its number of characters and distribution of characters are basically consistent with those of the speech recognition text, so the speech recognition text can be compared with the manually recognized text.
  • Step S340 according to the manually recognized text, determine the corresponding second pronunciation sequence
  • this step S340 can refer to step S100 in FIG. 1 , that is, perform table lookup for each word in the manually recognized text, determine its corresponding unique pinyin, and record these pinyin in sequence as the second pronunciation sequence.
  • Step S350 determine the Chinese character confusion set according to the speech recognition text and the manual recognition text
  • the number of characters and the distribution of characters in the speech recognition text are basically the same as those of the manual recognition text, so the two texts can be compared to generate the Chinese character confusion set. The specific process of generating the Chinese character confusion set is shown in FIG. 4.
  • Fig. 4 is the flow chart of the steps of constructing the Chinese character confusion set provided by the embodiment of the present application, the method includes but not limited to steps S351-S354:
  • Step S351 comparing the first word in the speech recognition text with the second word in the corresponding position in the manual recognition text
  • the number of characters and the distribution of characters in the speech recognition text and the manual recognition text are basically the same, so a character or word in the speech recognition text can be used as the first word, a character or word in the manual recognition text can be used as the second word, and the first word and the second word at corresponding positions are compared. All the first words are put in one-to-one correspondence with all the second words to complete the comparison between the speech recognition text and the manual recognition text.
  • the first word and the second word should correspond in position and have the same number of characters; that is to say, a single character is compared with a single character and a word is compared with a word, and the two words being compared have the same number of characters.
  • this step S351 can perform word segmentation on the speech recognition text and the manual recognition text and then compare them. For example, if the speech recognition text is "I am walking" and the manual recognition text is "Wo walking", then by position "I" is compared with "Wo", and "walking" is compared with "walking".
  • this step S351 can also compare the speech recognition text and the manual recognition text character by character. For example, if the speech recognition text is "I am walking" and the manual recognition text is "Wo walking", "I" is compared with "Wo", "walk" with "walk", and "road" with "road".
  • if there is a difference between the speech recognition text and the manual recognition text, it is processed in the next step S352.
  • Step S352: if there is a difference between the current first word and the current second word, take both as replacement words and store them in a first candidate area.
  • Both the first word and the second word are used as replacement words and stored in the same candidate area. For example, if the speech recognition text is "I am walking" and the manually recognized text is "Wo walking", "I" and "wo" are compared: they correspond in position and contain the same number of characters, but they differ, so the first word "I" and the second word "wo" are stored in the same first candidate area.
  • The first word and the second word at the next position are then compared; if they differ, they are stored in another first candidate area.
  • If step S351 segments the speech recognition text and the manually recognized text before comparing them, step S352 can directly determine whether a replacement word is a single character or a multi-character word. If step S351 compares the two texts character by character, step S352 can use the language context before and after the first word or second word, following the related art, to determine whether the replacement word is a single character or a multi-character word.
  • The embodiment of the present application does not specifically limit the number of characters in a replacement word. The point of this step is that the characters or words that can be substituted for each other are determined from the differences between the speech recognition text and the manually recognized text.
  • Step S353: when the comparison of the first words and second words is complete, merge the first candidate areas that share a word into the same second candidate area.
  • There are as many first candidate areas as there are differences. For example, if the speech recognition text is "Tian Qi qinglang today" and the manually recognized text is "The weather is sunny today", steps S351-S352 determine that one first candidate area contains the replacement words "Tian Qi" and "weather", and another first candidate area contains the replacement words "lover" and "sunny".
  • Because a training speech segment may be relatively long, the same word may appear several times, and speech recognition may produce different results for it. For example, if the next sentence of the speech recognition text is "The weather will be clear tomorrow" and the next sentence of the manually recognized text is "The weather will be sunny tomorrow", a third first candidate area is determined whose replacement words are "clear" and "sunny".
  • First candidate areas that contain the same word can be merged into the same second candidate area. Taking the two sentences together, the speech recognition text is "Tian Qi qinglang today, the weather will be clear tomorrow" and the manually recognized text is "The weather is sunny today, the weather will be sunny tomorrow". The first candidate area containing "lover" and "sunny" and the one containing "clear" and "sunny" share the word "sunny", so they are merged into one second candidate area containing "lover", "sunny" and "clear".
  • Step S140 in FIG. 1 reads: if the words to be corrected exist in the Chinese character confusion set of the corresponding business, determine the replacement word corresponding to each word to be corrected. That is, for each word to be corrected, the words that can replace it are looked up in the corresponding candidate area. For example, if the word to be corrected is "lover", the corresponding replacement words are "sunny" and "clear" (in the original Chinese all three are romanized as "qinglang"). The word to be corrected is replaced in position by each replacement word in turn, and the word that best fits the language logic is taken as the final error correction result.
  • Step S354: determine the Chinese character confusion set, which includes several second candidate areas.
  • After the above steps, the Chinese character confusion set contains several second candidate areas, each containing two or more replacement words; the replacement words are the first words and second words.
  • In summary, the embodiment of the present application provides a method for constructing a Chinese character confusion set: the speech recognition text is compared with the manually recognized text, replacement words are determined from the differences, and first candidate areas that share a replacement word are merged, so that as many different misrecognition results of the same word as possible are discovered.
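  • Steps S351-S354 can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the function name is hypothetical, the inputs are assumed to be already aligned word pairs, and candidate areas are merged as soon as a shared word is seen rather than after all comparisons are finished. The example strings are the English glosses used in the text, standing in for the original Chinese words.

```python
def build_confusion_set(aligned_pairs):
    """Build candidate areas from (first_word, second_word) pairs aligned by position.

    Returns a list of sets; each set plays the role of a second candidate area.
    """
    areas = []
    for first, second in aligned_pairs:
        if first == second:
            continue  # no difference at this position, nothing to record
        merged = {first, second}  # corresponds to a new first candidate area
        remaining = []
        for area in areas:
            if area & merged:
                merged |= area  # shares a word with an existing area: merge them
            else:
                remaining.append(area)
        remaining.append(merged)
        areas = remaining
    return areas


# Aligned words from the two example sentences (English glosses, for illustration only).
pairs = [
    ("today", "today"), ("Tian Qi", "weather"), ("lover", "sunny"),
    ("tomorrow", "tomorrow"), ("weather", "weather"), ("clear", "sunny"),
]
confusion_set = build_confusion_set(pairs)
```

  With these pairs, the areas for "lover"/"sunny" and "clear"/"sunny" end up in one set because they share "sunny", while "Tian Qi"/"weather" stays in its own set.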
  • Step S350 has now been described; step S360 is described below.
  • Step S360: determine the pronunciation confusion set according to the first pronunciation sequence and the second pronunciation sequence.
  • Because the speech recognition text and the manually recognized text basically agree in word count and word distribution, and each character corresponds to one pinyin unit, the first pronunciation sequence and the second pronunciation sequence can also be compared to generate a pronunciation confusion set. The specific process is shown in FIG. 5.
  • FIG. 5 is a flow chart of the steps for constructing a pronunciation confusion set provided by an embodiment of the present application. The method includes but is not limited to steps S361-S364:
  • Step S361: compare each first pinyin unit in the first pronunciation sequence with the second pinyin unit at the corresponding position in the second pronunciation sequence.
  • A pinyin unit is the pinyin of one character, so this comparison in effect compares the pinyin unit of each character in the speech recognition text with the pinyin unit of the corresponding character in the manually recognized text.
  • Step S362: if the current first pinyin unit differs from the current second pinyin unit, store both in a first confusion area.
  • For example, if the first pronunciation sequence is "fen xi" (corresponding text "analysis") and the second pronunciation sequence is "fen qi" (corresponding text "staging"), the pinyin units "xi" and "qi" differ and are stored in the same first confusion area.
  • Step S363: when the comparison of the first and second pinyin units is complete, merge the first confusion areas that share a pinyin unit into the same second confusion area.
  • Step S363 can refer to step S353. For example, if one first confusion area contains the first pinyin unit "xi" and the second pinyin unit "qi", and another first confusion area contains the first pinyin unit "ji" and the second pinyin unit "qi", both areas contain the pinyin unit "qi", so they are merged into one second confusion area containing three pinyin units: "xi", "qi" and "ji".
  • Step S364: determine the pronunciation confusion set, which includes several second confusion areas.
  • After the above steps, the pronunciation confusion set is constructed; it includes several second confusion areas, each containing two or more pinyin units.
  • In summary, the embodiment of the present application provides a method for constructing a pronunciation confusion set: the set is determined from the differences between the first pronunciation sequence and the second pronunciation sequence, and merging confusion areas gathers similar-sounding pronunciations together as far as possible. This helps reduce the impact of a user's non-standard pronunciation on speech recognition error correction and improves its accuracy.
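  • The merging in steps S361-S364 can also be expressed with a union-find (disjoint-set) structure, which groups all pinyin units connected through shared members once every position has been compared. The sketch below is one possible reading, not the method prescribed by the application; the function name and the two toy sequences are assumptions.

```python
def build_pronunciation_confusion_set(first_seq, second_seq):
    """Group pinyin units that differ at aligned positions into confusion areas."""
    parent = {}

    def find(unit):
        # Return the representative of the group containing `unit`.
        parent.setdefault(unit, unit)
        while parent[unit] != unit:
            parent[unit] = parent[parent[unit]]  # path halving keeps lookups short
            unit = parent[unit]
        return unit

    for a, b in zip(first_seq, second_seq):
        if a != b:
            parent[find(a)] = find(b)  # link the groups of the two differing units

    groups = {}
    for unit in list(parent):
        groups.setdefault(find(unit), set()).add(unit)
    return list(groups.values())


# Toy sequences (assumption): "xi" and "ji" were each recognized where "qi" was spoken.
first_sequence = ["fen", "xi", "fen", "ji"]
second_sequence = ["fen", "qi", "fen", "qi"]
pronunciation_confusion_set = build_pronunciation_confusion_set(first_sequence, second_sequence)
```

  Here the two differences "xi"/"qi" and "ji"/"qi" collapse into a single area with the three units "xi", "qi" and "ji", matching the example in step S363.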
  • Step S360 has now been described; step S370 is described below.
  • Step S370: determine the keyword table according to the pronunciation confusion set, the speech recognition text, the manually recognized text, the first pronunciation sequence and the second pronunciation sequence.
  • Step S370 determines the keyword table for the business; the keyword table characterizes the words that are easily misrecognized in speech recognition for that business. The construction of the keyword table is shown in FIG. 6.
  • FIG. 6 is a flow chart of the steps for constructing a keyword table provided by an embodiment of the present application. The method includes but is not limited to steps S371-S372:
  • Step S371: determine the key pinyin according to the pronunciation confusion set.
  • Because the pronunciation confusion set is built from the differences between the first pronunciation sequence and the second pronunciation sequence, it can in turn be used to locate the differing parts of the two sequences and the pinyin units that differ. Related techniques then determine whether a differing pinyin unit belongs to a standalone character or to a character within a multi-character word. If it is a standalone character, the pinyin unit itself is used as key pinyin; depending on whether it lies in the first or the second pronunciation sequence, it is a first key pinyin or a second key pinyin. If it is a character within a word, the group of pinyin units of the whole word, including this unit, is used as the first key pinyin or the second key pinyin. The key pinyin thus comprise several first key pinyin in the first pronunciation sequence and several second key pinyin in the second pronunciation sequence.
  • For example, "xi" and "qi" are contained in one confusion area of the pronunciation confusion set. In the first pronunciation sequence, "xi" is actually one character of the word "fen-xi", so "fen-xi" is taken as the first key pinyin, and correspondingly "fen-qi" in the second pronunciation sequence is taken as the second key pinyin.
  • Step S372: take the words corresponding to the first key pinyin in the speech recognition text and the words corresponding to the second key pinyin in the manually recognized text as keywords, and store the keywords in the keyword table.
  • The first key pinyin are mapped to the speech recognition text and the second key pinyin to the manually recognized text to obtain the keywords. For example, the first key pinyin "fen-xi" corresponds to the keyword "analysis" in the speech recognition text, and the second key pinyin "fen-qi" corresponds to the keyword "staging" in the manually recognized text; both "analysis" and "staging" are stored in the keyword table.
  • In summary, the embodiment of the present application provides a method for constructing a keyword table, which mainly stores the words that differ between the first pronunciation sequence and the second pronunciation sequence in the keyword table.
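  • A rough sketch of steps S371-S372 follows. It assumes, for simplicity, that both texts have already been segmented into aligned words with their pinyin units, which sidesteps the "related techniques" the text relies on for deciding word boundaries. The function name and the data layout are hypothetical, and the strings are the English glosses used in the text.

```python
def build_keyword_table(first_words, second_words, confusion_set):
    """Map each keyword to its key pinyin.

    first_words / second_words: lists of (word, pinyin_units) aligned by position.
    confusion_set: list of sets of pinyin units (the second confusion areas).
    """
    confused_units = set().union(*confusion_set)
    table = {}
    for (word1, units1), (word2, units2) in zip(first_words, second_words):
        differs = any(
            a != b and a in confused_units and b in confused_units
            for a, b in zip(units1, units2)
        )
        if differs:
            table[word1] = "-".join(units1)  # first key pinyin and its keyword
            table[word2] = "-".join(units2)  # second key pinyin and its keyword
    return table


speech_words = [("isolation", ["ge", "li"]), ("analysis", ["fen", "xi"])]
manual_words = [("isolation", ["ge", "li"]), ("staging", ["fen", "qi"])]
keyword_table = build_keyword_table(speech_words, manual_words, [{"xi", "qi", "ji"}])
```

  Only the position where the pinyin differ contributes entries, so "isolation" is not added by this pair of texts.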
  • Step S370 has now been described through steps S371-S372; step S380 is described below.
  • Step S380: determine the keyword FST according to the keyword table.
  • The keyword FST can be constructed from the words in the keyword table and their corresponding pinyin. The construction process is shown in FIG. 7.
  • FIG. 7 is a flow chart of the steps for constructing a keyword FST provided by an embodiment of the present application. The method includes but is not limited to steps S381-S385:
  • Step S381: construct the root node of the keyword FST.
  • FIG. 8 is a schematic diagram of a keyword FST provided in an embodiment of the present application. A root node is first constructed as the starting point of the keyword FST; in FIG. 8 the root node is labeled 800.
  • Step S382: under the root node, construct the first child nodes according to the first pinyin unit of each key pinyin.
  • Any key pinyin corresponding to the keyword table is selected, its first pinyin unit is determined, and a first child node under the root node is constructed from that pinyin unit. In other words, the first child nodes are constructed from the pinyin of the first character of each keyword in the keyword table.
  • For example, if the keywords include "staging", "analysis", "isolation" and "firm", three first child nodes 810 can be determined from the first pinyin units of the key pinyin: "fen", "ge" and "shi".
  • Step S383: under the first child nodes, construct several second child nodes according to the remaining pinyin units of the key pinyin and the order of the pinyin units in the key pinyin.
  • Continuing the example, under the three first child nodes 810, five second child nodes 820 can be constructed from the remaining pinyin units of each keyword: "qi", "xi", "li", "wu" and "suo". Because "firm" has three characters, two pinyin units remain after its first pinyin unit is removed; following the order of the pinyin units, two second child nodes are built downward in sequence, that is, the second child node for "suo" is built under the second child node for "wu".
  • The second child nodes are constructed from all pinyin units of the key pinyin other than the first, so under one first child node there may be several second child nodes on the same pronunciation path.
  • Step S384: under the second child nodes, construct several third child nodes according to the key pinyin and the keyword table.
  • As noted in step S383, the second child nodes already cover all pinyin units that remain after the first pinyin unit of a key pinyin is removed, so a third child node can be added under the second child node at the end of each path. The third child node represents the keyword corresponding to the current key pinyin, which can be looked up in the keyword table. In FIG. 8, the third child nodes 830 are "staging", "analysis", "isolation" and "firm".
  • Step S385: add an arc returning to the root node for each first child node and each second child node.
  • As shown in FIG. 8, an arc 940 returning to the root node is added to each first child node and each second child node, so that a closed search loop is formed. Only two arcs are drawn in FIG. 8; in fact, every first child node and every second child node has an arc returning to the root node.
  • In step S120 the FST to be detected is reorganized with the keyword FST to find the nodes they share. As shown in FIG. 8, for example, if the node "fen" exists in both the FST to be detected and the keyword FST, the child nodes at the next layer of the two FSTs are compared. If the two consecutive nodes in the FST to be detected are "fen-qi" while the pronunciation path entered in the keyword FST is "fen-xi", the results are inconsistent. On finding that the second child node "xi" cannot be matched, the keyword FST returns to the root node and compares again, and next enters the pronunciation path "fen-qi". This yields the two consecutive nodes "fen-qi" that match between the FST to be detected and the keyword FST. The closed search loop helps ensure that, in the reorganization of step S120, all nodes of the keyword FST that match the FST to be detected are found, thereby improving the discovery rate of words to be corrected.
  • In summary, the embodiment of the present application provides a method for constructing a keyword FST: multiple child nodes are constructed from the order of the pinyin units and the keyword table, and the closed search loop improves the discovery rate of words to be corrected.
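  • The structure built in steps S381-S385 resembles a prefix tree over pinyin units with the keyword attached at the end of each path. The sketch below models it that way; it is an approximation rather than a full weighted FST, and the class and function names are assumptions. The "arc returning to the root" is imitated by restarting the match from the root at the next position of the sequence. The key pinyin of "isolation" and "firm" are inferred from the child nodes listed in the example.

```python
class Node:
    """One node of the keyword structure."""

    def __init__(self):
        self.children = {}  # pinyin unit -> Node (first/second child nodes)
        self.keywords = []  # keywords ending at this node (third child nodes)


def build_keyword_fst(keyword_table):
    """Build the tree from a mapping of keyword -> key pinyin such as "fen-xi"."""
    root = Node()
    for keyword, key_pinyin in keyword_table.items():
        node = root
        for unit in key_pinyin.split("-"):
            node = node.children.setdefault(unit, Node())
        node.keywords.append(keyword)
    return root


def find_keywords(root, sequence):
    """Return (start, end, keywords) for every key pinyin found in the sequence."""
    hits = []
    for start in range(len(sequence)):
        node, pos = root, start  # every attempt begins again at the root
        while pos < len(sequence) and sequence[pos] in node.children:
            node = node.children[sequence[pos]]
            pos += 1
            if node.keywords:
                hits.append((start, pos, list(node.keywords)))
    return hits


table = {"staging": "fen-qi", "analysis": "fen-xi",
         "isolation": "ge-li", "firm": "shi-wu-suo"}
keyword_fst = build_keyword_fst(table)
hits = find_keywords(keyword_fst, ["fen", "qi", "shi", "wu", "suo"])
```

  For this table the root has the three children "fen", "ge" and "shi", as in FIG. 8, and the scan reports "staging" at positions 0-2 and "firm" at positions 2-5.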
  • Step S380 has now been described.
  • In summary, the embodiment of the present application provides methods for constructing the keyword FST and the Chinese character confusion set, and both can be applied to the method steps shown in FIG. 1 to complete the speech recognition error correction method proposed in the embodiment of the present application.
  • The embodiment of the present application provides a method that first performs speech recognition on the speech to be detected, generating the text to be detected and the corresponding pronunciation sequence to be detected, and determines the FST to be detected according to the pronunciation sequence to be detected. According to the FST to be detected and the keyword FST of the corresponding business, several words to be corrected in the text to be detected are determined, together with the sentences to be corrected that contain them. If the words to be corrected exist in the Chinese character confusion set, the replacement word corresponding to each word to be corrected is determined, and the word to be corrected in the sentence to be corrected is replaced with the replacement word to generate a replacement sentence. When the first logic score of the sentence to be corrected is smaller than the second logic score of the replacement sentence, the sentence to be corrected in the text to be detected is replaced with the replacement sentence, thereby completing the error correction of the speech recognition text.
  • The embodiment of the present application also provides a specific construction method for the keyword FST and the Chinese character confusion set.
  • The speech recognition error correction method proposed in the embodiment of the present application determines the words to be corrected that may contain errors according to the pronunciation of the speech recognition text, provides replacement words for them according to the Chinese character confusion set of the corresponding business, and finally decides whether correction is required by comparing the logic scores of the sentence before and after the replacement.
  • By constructing the keyword FST, the embodiment of the present application can find recognition errors caused by mispronunciation in the specified business field, thereby improving the probability of discovering words to be corrected. By using the Chinese character confusion set and the logic score comparison, these recognition errors are corrected effectively and the miscorrection rate is reduced, which improves the accuracy of the speech recognition text and allows speech recognition technology to play a greater role in digital medical care, smart home and other fields.
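  • The final replace-and-compare step can be sketched as below. How the logic score is computed is not specified in this passage, so the scorer is passed in as a callable, and `toy_logic_score` with its bigram list is purely a hypothetical stand-in. The function name `correct_sentence` is likewise an assumption, and the strings are the English glosses used in the text.

```python
def correct_sentence(sentence, words_to_correct, confusion_set, logic_score):
    """Try each replacement word and keep a replacement only if it scores higher.

    sentence: list of words; confusion_set: list of sets of replacement words;
    logic_score: callable mapping a list of words to a number (assumed interface).
    """
    best, best_score = list(sentence), logic_score(sentence)
    for index, word in enumerate(sentence):
        if word not in words_to_correct:
            continue
        for area in confusion_set:
            if word not in area:
                continue
            for candidate in sorted(area - {word}):
                replaced = best[:index] + [candidate] + best[index + 1:]
                score = logic_score(replaced)
                if score > best_score:  # first logic score < second logic score
                    best, best_score = replaced, score
    return best


# Hypothetical scorer: counts adjacent word pairs that appear in a small list.
GOOD_BIGRAMS = {("weather", "sunny")}


def toy_logic_score(words):
    return sum((a, b) in GOOD_BIGRAMS for a, b in zip(words, words[1:]))


corrected = correct_sentence(
    ["today", "weather", "lover"], {"lover"},
    [{"lover", "sunny", "clear"}], toy_logic_score,
)
```

  The sentence is left unchanged whenever no replacement scores strictly higher, which mirrors the condition that replacement happens only when the first logic score is less than the second.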
  • FIG. 9 is a schematic diagram of a speech recognition error correction system provided by an embodiment of the present application.
  • The system 900 includes but is not limited to a first module 910, a second module 920, a third module 930, a fourth module 940, a fifth module, a sixth module, a seventh module and an eighth module.
  • The first module is used to perform speech recognition on the speech to be detected and obtain the text to be detected and the corresponding pronunciation sequence to be detected;
  • the second module is used to construct the FST to be detected according to the pronunciation sequence to be detected;
  • the third module is used to obtain the keyword FST and the Chinese character confusion set, wherein the keyword FST, the Chinese character confusion set and the FST to be detected belong to the same vertical field;
  • the fourth module is used to determine, according to the FST to be detected and the keyword FST, several words to be corrected and several sentences to be corrected in the text to be detected, wherein each sentence to be corrected contains a word to be corrected;
  • the fifth module is used to determine, if the words to be corrected exist in the Chinese character confusion set, the replacement word corresponding to each word to be corrected according to the Chinese character confusion set;
  • the sixth module is used to replace the word to be corrected in the sentence to be corrected with the replacement word to obtain a replacement sentence;
  • the seventh module is used to calculate the first logic score of the sentence to be corrected and the second logic score of the replacement sentence;
  • the eighth module is used to replace the sentence to be corrected in the text to be detected with the replacement sentence when the first logic score is less than the second logic score.
  • FIG. 10 is a schematic diagram of a device provided by an embodiment of the present application.
  • The device 1000 includes at least one processor 1010 and at least one memory 1020 for storing at least one program; in FIG. 10, one processor and one memory are taken as an example.
  • The processor and the memory may be connected through a bus or in other ways; connection through a bus is taken as an example in FIG. 10.
  • As a non-transitory computer-readable storage medium, the memory can be used to store non-transitory software programs and non-transitory computer-executable programs.
  • The memory may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device or other non-transitory solid-state storage device.
  • In some embodiments, the memory optionally includes memory located remotely from the processor, and such remote memory may be connected to the device via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • Another embodiment of the present application also provides an apparatus that can be used to execute the method in any of the above embodiments, for example the method steps in FIG. 1 described above.
  • The device embodiments described above are only illustrative. The units described as separate components may or may not be physically separate; they may be located in one place or distributed over multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • The embodiment of the present application also discloses a computer storage medium that stores a program executable by a processor; when executed by the processor, the program implements the speech recognition error correction method proposed by the present application. The computer-readable storage medium can be nonvolatile or volatile.
  • Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cartridges, tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer.
  • In addition, communication media typically embody computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)

Abstract

The present application relates to artificial intelligence technology. Disclosed are a speech recognition error correction method and system, and an apparatus and a storage medium. The method comprises: performing speech recognition on speech to be subjected to detection, so as to obtain text to be subjected to detection and a corresponding pronunciation sequence to be subjected to detection; according to said pronunciation sequence, constructing an FST to be subjected to detection; according to said FST and a keyword FST, determining several words to be subjected to error correction in said text, and determining a sentence to be subjected to error correction that includes said words; if said words are present in a Chinese character confusion set, determining a replacement word corresponding to each of said words; replacing said words in said sentence with the replacement words, so as to generate a replacement sentence; and when a first logic score of said sentence is less than a second logic score of the replacement sentence, replacing said sentence in said text with the replacement sentence, thereby completing error correction. By means of the technical solution in the present application, words which may have an error can be determined according to pronunciation, thereby realizing effective error correction on speech recognition text.

Description

Speech Recognition Error Correction Method, System, Device and Storage Medium
This application claims priority to Chinese patent application No. 202111064048.5, filed with the China Patent Office on September 10, 2021 and entitled "Speech Recognition Error Correction Method, System, Device and Storage Medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to artificial intelligence technology, and in particular to a speech recognition error correction method, system, device and storage medium.
Background
With the continuous development of deep learning technology, major breakthroughs have been made in speech recognition based on deep learning, and the accuracy of automatic speech recognition (ASR) keeps rising. Compared with other forms of human-computer interaction, interaction based on speech recognition is simpler and matches people's everyday habits, so speech recognition technology is gradually spreading into smart home, digital medical care, autonomous driving and other fields.
However, the inventors realized that in practical applications speech recognition technology is still considerably limited: factors such as a user's non-standard pronunciation or loud environmental noise affect recognition accuracy. To improve accuracy, the related art proposes text-based grammatical or syntactic error correction of the speech recognition text, but the accuracy of such schemes is low. Moreover, different vertical fields have different error patterns, and the schemes in the related art have difficulty effectively finding errors in the speech recognition text, which lowers the accuracy of speech recognition technology.
Summary
The present application aims to solve, at least to some extent, one of the technical problems in the related art. To this end, the present application provides a speech recognition error correction method, system, device and storage medium that can correct errors in speech recognition text and improve the accuracy of speech recognition.
To achieve the above purpose, an embodiment of the present application provides a speech recognition error correction method comprising the following steps: performing speech recognition on speech to be detected to obtain text to be detected and a corresponding pronunciation sequence to be detected; constructing an FST to be detected according to the pronunciation sequence to be detected; obtaining a keyword FST and a Chinese character confusion set, wherein the keyword FST, the Chinese character confusion set and the FST to be detected belong to the same vertical field; determining, according to the FST to be detected and the keyword FST, several words to be corrected and several sentences to be corrected in the text to be detected, wherein each sentence to be corrected contains a word to be corrected; if a word to be corrected exists in the Chinese character confusion set, determining the replacement word corresponding to each word to be corrected according to the Chinese character confusion set; replacing the word to be corrected in the sentence to be corrected with the replacement word to obtain a replacement sentence; calculating a first logic score of the sentence to be corrected and a second logic score of the replacement sentence; and when the first logic score is less than the second logic score, replacing the sentence to be corrected in the text to be detected with the replacement sentence.
To achieve the above purpose, an embodiment of the present application further proposes a speech recognition error correction system comprising a first module, a second module, a third module, a fourth module, a fifth module, a sixth module and a seventh module. The first module is used to perform speech recognition on speech to be detected to obtain text to be detected and a corresponding pronunciation sequence to be detected; the second module is used to construct an FST to be detected according to the pronunciation sequence to be detected; the third module is used to obtain a keyword FST and a Chinese character confusion set, wherein the keyword FST, the Chinese character confusion set and the FST to be detected belong to the same vertical field; the fourth module is used to determine, according to the FST to be detected and the keyword FST, several words to be corrected and several sentences to be corrected in the text to be detected, wherein each sentence to be corrected contains a word to be corrected; the fifth module is used to determine, if a word to be corrected exists in the Chinese character confusion set, the replacement word corresponding to each word to be corrected according to the Chinese character confusion set; the sixth module is used to replace the word to be corrected in the sentence to be corrected with the replacement word to obtain a replacement sentence; the seventh module is used to calculate a first logic score of the sentence to be corrected and a second logic score of the replacement sentence; and the eighth module is used to replace the sentence to be corrected in the text to be detected with the replacement sentence when the first logic score is less than the second logic score.
To achieve the above object, an embodiment of the present application further provides an apparatus comprising: at least one processor; and at least one memory for storing at least one program; wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement a speech recognition error correction method. The speech recognition error correction method comprises: performing speech recognition on speech to be detected to obtain text to be detected and a corresponding pronunciation sequence to be detected; constructing an FST to be detected from the pronunciation sequence to be detected; obtaining a keyword FST and a Chinese character confusion set, wherein the keyword FST, the Chinese character confusion set and the FST to be detected belong to the same vertical domain; determining, according to the FST to be detected and the keyword FST, a number of words to be corrected and a number of sentences to be corrected in the text to be detected, wherein each sentence to be corrected contains a word to be corrected; if a word to be corrected is present in the Chinese character confusion set, determining the replacement word corresponding to each word to be corrected according to the Chinese character confusion set; replacing the word to be corrected in the sentence to be corrected with the replacement word to obtain a replacement sentence; calculating a first logic score of the sentence to be corrected and a second logic score of the replacement sentence; and, when the first logic score is less than the second logic score, replacing the sentence to be corrected in the text to be detected with the replacement sentence.
To achieve the above object, an embodiment of the present application further provides a computer storage medium storing a processor-executable program which, when executed by a processor, implements a speech recognition error correction method. The speech recognition error correction method comprises: performing speech recognition on speech to be detected to obtain text to be detected and a corresponding pronunciation sequence to be detected; constructing an FST to be detected from the pronunciation sequence to be detected; obtaining a keyword FST and a Chinese character confusion set, wherein the keyword FST, the Chinese character confusion set and the FST to be detected belong to the same vertical domain; determining, according to the FST to be detected and the keyword FST, a number of words to be corrected and a number of sentences to be corrected in the text to be detected, wherein each sentence to be corrected contains a word to be corrected; if a word to be corrected is present in the Chinese character confusion set, determining the replacement word corresponding to each word to be corrected according to the Chinese character confusion set; replacing the word to be corrected in the sentence to be corrected with the replacement word to obtain a replacement sentence; calculating a first logic score of the sentence to be corrected and a second logic score of the replacement sentence; and, when the first logic score is less than the second logic score, replacing the sentence to be corrected in the text to be detected with the replacement sentence.
The beneficial effects of the embodiments of the present application are as follows. Speech recognition is first performed on the speech to be detected to obtain the text to be detected and the corresponding pronunciation sequence to be detected; an FST to be detected is constructed from the pronunciation sequence to be detected; according to the FST to be detected and the obtained keyword FST, a number of words to be corrected in the text to be detected are determined, together with the sentences to be corrected that contain them; if a word to be corrected is present in the obtained Chinese character confusion set, the replacement word corresponding to each word to be corrected is determined, and the word to be corrected in the sentence to be corrected is replaced with the replacement word to generate a replacement sentence; the first logic score of the sentence to be corrected and the second logic score of the replacement sentence are calculated, and when the first logic score is less than the second logic score, the sentence to be corrected in the text to be detected is replaced with the replacement sentence, thereby completing error correction of the speech recognition text. Compared with related-art solutions that rely on grammar or syntax, the speech recognition error correction method proposed in the embodiments of the present application identifies words that may be wrong from the pronunciation of the speech recognition text, provides replacement words for them from the Chinese character confusion set of the corresponding business, and finally decides whether correction is needed by comparing the logic scores of the sentence before and after the replacement. The embodiments can therefore find recognition errors that occur in a specified business domain because of pronunciation problems, which increases the probability that words needing correction are found. By using the Chinese character confusion set and the logic score comparison, these recognition errors are corrected effectively and the miscorrection rate of the speech recognition text is reduced, which improves the accuracy of the speech recognition text and allows speech recognition technology to play a greater role in fields such as digital healthcare and smart home.
Description of Drawings
The accompanying drawings provide a further understanding of the technical solution of the present application and constitute a part of the specification. Together with the embodiments of the present application, they serve to explain the technical solution of the present application and do not limit it.
FIG. 1 is a flowchart of the steps of the speech recognition error correction method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of the FST to be detected provided by an embodiment of the present application;
FIG. 3 is a flowchart of the steps of constructing the keyword FST and the Chinese character confusion set provided by an embodiment of the present application;
FIG. 4 is a flowchart of the steps of constructing the Chinese character confusion set provided by an embodiment of the present application;
FIG. 5 is a flowchart of the steps of constructing the pronunciation confusion set provided by an embodiment of the present application;
FIG. 6 is a flowchart of the steps of constructing the keyword table provided by an embodiment of the present application;
FIG. 7 is a flowchart of the steps of constructing the keyword FST provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of the keyword FST provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of the speech recognition error correction system provided by an embodiment of the present application;
FIG. 10 is a schematic diagram of the apparatus provided by an embodiment of the present application.
Detailed Description
To make the purpose, technical solution and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present application and are not intended to limit it.
It should be noted that although functional modules are divided in the system schematic diagram and a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed with a module division different from that of the system, or in an order different from that of the flowcharts. The terms "first", "second" and the like in the specification, the claims and the above drawings are used to distinguish similar objects and do not necessarily describe a specific order or sequence.
In the following description, suffixes such as "module", "component" or "unit" used to denote elements serve only to facilitate the description of the present application and have no specific meaning in themselves. "Module", "component" and "unit" may therefore be used interchangeably.
The embodiments of the present application are further described below with reference to the accompanying drawings.
Referring to FIG. 1, FIG. 1 is a flowchart of the steps of the speech recognition error correction method provided by an embodiment of the present application. The method relates to speech recognition error correction in the field of artificial intelligence and includes, but is not limited to, steps S100 to S170:
Step S100: perform speech recognition on the speech to be detected to obtain the text to be detected and the corresponding pronunciation sequence to be detected;
Specifically, the speech to be detected in the embodiments of the present application is a speech segment produced by personnel while carrying out business in a vertical business domain. In the digital healthcare industry, for example, the speech to be detected may be a recording of a meeting in which doctors discuss a case, a recording of an online conversation between a patient and a doctor, or a recording of a patient's telephone call to a hospital front desk. Business recordings in the medical domain contain a large number of medical terms, including but not limited to hospital names, names of surgical procedures and drug names. During speech recognition these terms may be misrecognized because the audio quality of the speech to be detected is low or because the speaker has an accent; for example, "阿司匹林" (aspirin) may be recognized as "阿西匹林". Related-art error correction schemes based on grammar or syntax cannot find and correct such errors, which lowers the accuracy of speech recognition.
For this reason, the embodiments of the present application determine the words to be corrected from their pronunciation. In step S100, speech recognition is first performed on the speech to be detected to generate the text to be detected, which is a character sequence corresponding to the speech to be detected. Each character in the text to be detected is then looked up in a table to obtain its pinyin unit, and the pinyin units of all characters are recorded in the pronunciation sequence to be detected. The pronunciation sequence to be detected is thus a pinyin sequence corresponding to the text to be detected.
For example, if the text to be detected is "我很快乐" ("I am very happy"), the generated pronunciation sequence to be detected is "wo hen kuai le". It should also be noted that, in the embodiments of the present application, when a character in the text to be detected is polyphonic, its most common pronunciation is generally selected; for "乐", for example, the pinyin unit "le" is selected.
In this way, once the pinyin of each character in the text to be detected has been determined, the pronunciation sequence to be detected can be generated.
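The table lookup in step S100 can be sketched as follows. This is a minimal illustration only: the four-entry table `CHAR_TO_PINYIN` and the function name `to_pronunciation_sequence` are assumptions made for the example, and a real system would load a complete lexicon that stores the most common reading of each polyphonic character.

```python
# Hypothetical character-to-pinyin table (assumption for illustration).
# For the polyphonic character "乐" only the common reading "le" is stored.
CHAR_TO_PINYIN = {"我": "wo", "很": "hen", "快": "kuai", "乐": "le"}


def to_pronunciation_sequence(text, table=CHAR_TO_PINYIN):
    """Look up every character and return its pinyin units in text order."""
    units = []
    for ch in text:
        if ch not in table:
            raise KeyError(f"no pinyin entry for {ch!r}")
        units.append(table[ch])
    return units


if __name__ == "__main__":
    print(" ".join(to_pronunciation_sequence("我很快乐")))  # wo hen kuai le
```

Keeping the result as a list of units, rather than one joined string, preserves the one-to-one alignment between characters and pinyin units that the later steps rely on.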
Step S110: construct the FST to be detected from the pronunciation sequence to be detected;
Specifically, an FST to be detected is constructed from the pronunciation sequence to be detected. An FST is a finite state transducer; its structure resembles a tree diagram and can be used to build a dictionary that expresses states and the transition paths between them. In the embodiments of the present application, the FST to be detected constructed from the pronunciation sequence to be detected can be regarded as a path through the pinyin units corresponding to the characters of the text to be detected.
For example, following step S100, if the text to be detected is "今天天气晴朗" ("the weather is sunny today"), the pronunciation sequence to be detected is "jin tian tian qi qing lang", and the FST to be detected for this text can be constructed from the order in which the pinyin units appear in the sequence. The result is shown in FIG. 2, a schematic diagram of the FST to be detected provided by an embodiment of the present application. As shown in FIG. 2, a root node 200 is created first; the pinyin units of the pronunciation sequence to be detected are then arranged in order, giving six child nodes 210 in the order "jin-tian-tian-qi-qing-lang"; the last child node points to the end point 220. In this way a single pronunciation path corresponding to the text to be detected is obtained, namely the FST to be detected shown in FIG. 2.
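The chain structure of FIG. 2 can be sketched as a simple linked structure, assuming a plain node class rather than any particular FST library. The names `Node`, `build_detected_fst` and `path_labels` are hypothetical and introduced only for this illustration; the `is_end` flag stands in for the end point 220.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One state of the chain; `children` maps a pinyin unit to the next node."""
    label: str
    children: dict = field(default_factory=dict)
    is_end: bool = False


def build_detected_fst(units):
    """Build a single path: root -> unit 1 -> unit 2 -> ... -> last unit (end)."""
    root = Node("<root>")
    node = root
    for unit in units:
        child = Node(unit)
        node.children[unit] = child
        node = child
    node.is_end = True  # the last node leads to the end point
    return root


def path_labels(root):
    """Walk the single path and return the pinyin units in order."""
    labels, node = [], root
    while node.children:
        (node,) = node.children.values()  # exactly one child per node in a chain
        labels.append(node.label)
    return labels


if __name__ == "__main__":
    fst = build_detected_fst("jin tian tian qi qing lang".split())
    print("-".join(path_labels(fst)))  # jin-tian-tian-qi-qing-lang
```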
Step S120: obtain the keyword FST and the Chinese character confusion set;
Specifically, the keyword FST of the corresponding business records the pinyin of the entries in the keyword table of that business. Its structure is similar to that of the FST to be detected, and the steps for constructing it are described below. The Chinese character confusion set records characters or words that are easily confused within the business domain; its construction is also described below.
It should be noted that the keyword FST, the Chinese character confusion set and the FST to be detected belong to the same vertical domain, so the keyword FST and the Chinese character confusion set allow errors in the text to be detected to be located more accurately.
Step S130: according to the FST to be detected and the keyword FST, determine a number of words to be corrected and a number of sentences to be corrected in the text to be detected, wherein each sentence to be corrected contains a word to be corrected;
Specifically, because the keyword FST and the FST to be detected have similar structures, they can easily be combined and compared: the pinyin units in the FST to be detected are compared with the pinyin units in the keyword FST, and units with the same pinyin are recorded as identical nodes. If the FST to be detected and the keyword FST share one or more identical nodes, the characters of the text to be detected that correspond to these nodes are taken as a word to be corrected. The number of identical nodes therefore equals the number of characters in the corresponding word to be corrected.
For example, the FST to be detected shown in FIG. 2 contains a node "lang"; if the keyword FST also contains a node "lang", the character "朗" at the corresponding position in the text to be detected is taken as a word to be corrected. FIG. 2 also contains two consecutive child nodes "qing-lang"; if the keyword FST also contains two consecutive nodes "qing-lang", the word "晴朗" at the corresponding position in the text to be detected is taken as a word to be corrected.
It should be noted that, because the FST to be detected is built in the order of the pinyin units of the text to be detected, that order must also be respected when comparing against the keyword FST. In other words, if the FST to be detected and the keyword FST share several identical nodes, these nodes must be consecutive. For example, the FST to be detected shown in FIG. 2 contains two consecutive nodes "tian-qi", while the keyword FST contains three consecutive nodes "tian-ran-qi". Although both FSTs contain the nodes "tian" and "qi", these two nodes are not consecutive in the keyword FST, so "tian-qi" cannot be regarded as identical nodes in the sense of the embodiments of the present application, and "天气" in the text to be detected cannot be taken as a word to be corrected.
In this way the words to be corrected in the text to be detected can be determined, and each of them lies within a sentence. The related art already offers mature sentence segmentation schemes, which can be used to split the text to be detected into sentences; a sentence containing one or more words to be corrected is determined to be a sentence to be corrected.
Through step S130, all words to be corrected in the text to be detected can be determined, together with the sentences to be corrected that contain them. Because the words to be corrected are obtained by comparison with the keyword FST, that is, by screening on pronunciation, this step raises the rate at which recognition errors caused by pronunciation problems in the specified business domain are discovered, which helps to reduce the miscorrection rate and improve the accuracy of speech recognition.
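The contiguous matching described in step S130 can be sketched with a nested-dictionary trie standing in for the keyword FST. This is an assumption-laden illustration: the two keyword entries, the `"$end"` marker and the function names `build_keyword_trie` and `find_candidates` are hypothetical, and the patent does not prescribe this data structure.

```python
def build_keyword_trie(keywords):
    """Build a trie from keyword pinyin sequences (a stand-in for the keyword FST)."""
    root = {}
    for units in keywords:
        node = root
        for unit in units:
            node = node.setdefault(unit, {})
        node["$end"] = True  # marks the end of a complete keyword
    return root


def find_candidates(units, trie):
    """Return (start, end) spans of `units` that match a keyword contiguously."""
    spans = []
    for start in range(len(units)):
        node = trie
        for end in range(start, len(units)):
            node = node.get(units[end])
            if node is None:  # the path breaks, so the match is not contiguous
                break
            if node.get("$end"):
                spans.append((start, end + 1))
    return spans


if __name__ == "__main__":
    text = "今天天气晴朗"
    units = "jin tian tian qi qing lang".split()
    # Hypothetical keyword table: "qing-lang" and "tian-ran-qi".
    trie = build_keyword_trie([("qing", "lang"), ("tian", "ran", "qi")])
    for start, end in find_candidates(units, trie):
        # Characters and pinyin units are aligned one-to-one.
        print(text[start:end], units[start:end])
```

With these keywords only "晴朗" is reported: "tian-qi" in the text does not match "tian-ran-qi", because the matching path breaks at "ran".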
Step S140: if a word to be corrected is present in the Chinese character confusion set, determine the replacement word corresponding to each word to be corrected according to the Chinese character confusion set;
Each word to be corrected determined in step S130 is matched against the Chinese character confusion set of the business domain. If the current word to be corrected is present in the Chinese character confusion set, the confusion set contains the replacement word corresponding to it.
Step S150: replace the word to be corrected in the sentence to be corrected with the replacement word to obtain a replacement sentence;
Specifically, the word to be corrected in the sentence to be corrected is replaced with the replacement word determined in step S140, while the rest of the sentence is left unchanged, yielding a new replacement sentence.
For example, if the sentence to be corrected is "今天天气晴朗" ("the weather is sunny today"), the word to be corrected is "天气" ("weather"), "天气" is present in the Chinese character confusion set of the business, and its replacement word is "田七" (notoginseng, a medicinal herb), then replacing the word to be corrected with the replacement word generates the replacement sentence "今天田七晴朗".
Through step S150, the words to be corrected in all sentences to be corrected in the text to be detected are replaced, and a number of replacement sentences containing the replacement words are determined.
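Steps S140 and S150 can be sketched as a dictionary lookup followed by a span replacement. The confusion set below contains only the "天气" example from the text; the mapping format (word to list of replacement words) and the function name `make_replacements` are assumptions for illustration.

```python
# Hypothetical confusion set for one business domain (assumption): each
# easily confused word maps to its candidate replacement words.
CONFUSION_SET = {"天气": ["田七"]}


def make_replacements(sentence, start, end, confusion_set=CONFUSION_SET):
    """Replace sentence[start:end] with each replacement word, if it has any.

    Returns an empty list when the word is not in the confusion set, in
    which case the sentence is left as it is.
    """
    word = sentence[start:end]
    alternatives = confusion_set.get(word, [])
    return [sentence[:start] + alt + sentence[end:] for alt in alternatives]


if __name__ == "__main__":
    print(make_replacements("今天天气晴朗", 2, 4))  # ['今天田七晴朗']
    print(make_replacements("今天天气晴朗", 4, 6))  # []
```

Replacing by position rather than by string search avoids touching another occurrence of the same characters elsewhere in the sentence.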
Step S160: calculate the first logic score of the sentence to be corrected and the second logic score of the replacement sentence;
Specifically, step S160 calculates the first logic score of the sentence to be corrected and the second logic score of the corresponding replacement sentence.
In some embodiments, a language logic model of the corresponding business can be used to calculate the logic score of a sentence, for example an N-gram model. An N-gram model is a statistical language model in which a "gram" is one unit of the sequence (described in the source as a byte fragment; in the example below, a word) and N is the number of units. The model estimates the probability of the N-th word from the preceding (N-1) words, as in the two-unit bi-gram and the three-unit tri-gram. The probability of a whole sentence is obtained from the probabilities of the individual words in it, and these probabilities are in turn computed from the corpus on which the N-gram model was trained. The embodiments of the present application place no specific limitation on how the language logic model is trained or on how it calculates the logic score. The point is that, from a large amount of text in the corresponding business domain, a language logic model can be trained that is able to calculate the logic scores of sentences in that domain.
The first logic score can therefore be determined by inputting the sentence to be corrected into the language logic model. For example, if the sentence to be corrected is "我爱读书" ("I love reading"), then after the necessary processing such as word segmentation using a business dictionary or similar related-art techniques, the sentence can be divided into the words "我" ("I"), "爱" ("love") and "读书" ("reading"). In some embodiments, the logic score of this sentence can then be expressed as follows:
p(我爱读书) = p(我|<s>) + p(爱|我) + p(读书|爱) + p(</s>|读书)
Here p denotes a probability score on a logarithmic scale, which is why the terms are added and take negative values; "|" denotes conditioning, so that p(爱|我) is the score of "爱" given the preceding word "我"; and <s> and </s> denote the start and the end of the sentence.
The pre-trained language logic model gives the score of each term, for example p(我|<s>) = -0.2, p(爱|我) = -0.8, p(读书|爱) = -0.7 and p(</s>|读书) = -0.4. Substituting into the formula above gives p(我爱读书) = -2.1, that is, the first logic score is -2.1.
Similarly, the replacement sentence is input into the same language logic model, and its second logic score is calculated by the same steps.
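The bi-gram scoring in the example can be sketched as a sum over adjacent word pairs. This is only an illustration under stated assumptions: the table holds the four example values from the text as log-scale scores, the boundary tokens `<s>` and `</s>` and the `unseen` penalty for pairs missing from the table are assumptions, and a trained model would supply real values and a proper smoothing or backoff scheme.

```python
import math

# Example log-scale scores from the text; "<s>" and "</s>" are assumed
# sentence-boundary tokens.
BIGRAM_LOGP = {
    ("<s>", "我"): -0.2,
    ("我", "爱"): -0.8,
    ("爱", "读书"): -0.7,
    ("读书", "</s>"): -0.4,
}


def logic_score(words, table=BIGRAM_LOGP, unseen=-10.0):
    """Sum the log-scale scores of all adjacent pairs, including boundaries.

    `unseen` is a hypothetical penalty for pairs absent from the table.
    """
    tokens = ["<s>", *words, "</s>"]
    return sum(table.get(pair, unseen) for pair in zip(tokens, tokens[1:]))


if __name__ == "__main__":
    score = logic_score(["我", "爱", "读书"])
    print(f"{score:.1f}")  # -2.1
    assert math.isclose(score, -2.1)
```

Because the scores are on a logarithmic scale, a higher (less negative) sum means a more plausible sentence, which is the comparison used in step S170.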
Step S170: when the first logic score of the sentence to be corrected is less than the second logic score of the replacement sentence, replace the sentence to be corrected in the text to be detected with the replacement sentence;
Specifically, the first logic score and the second logic score calculated in step S160 are compared. Suppose the first logic score is -2.1 and the second logic score is -2.0; the first logic score is then less than the second. This means that, for this business domain, the replacement sentence is more fluent and more likely to be the correct sentence, so the sentence to be corrected is replaced with the corresponding replacement sentence.
Similarly, by comparing the logic scores of all sentences to be corrected in the text to be detected with those of their replacement sentences, speech recognition error correction of the text to be detected is completed.
It can be understood that, according to step S170, whether a sentence to be corrected is replaced with a replacement sentence depends on the logic scores of the two sentences. As explained in step S130, a sentence to be corrected contains a number of words to be corrected. When a sentence to be corrected contains several words to be corrected, the same sentence may give rise to several replacement sentences, and different replacement sentences will very likely have different second logic scores.
For example, suppose the sentence to be corrected is "今天天气晴朗", the words to be corrected in it are "天气" and "晴朗", the replacement word for "天气" is "田七", and the replacement word for "晴朗" is "情郎". Different embodiments may generate different replacement sentences.
For example, in some embodiments the words that need correction in a sentence to be corrected are combined in every possible way to obtain several replacement sentences. Taking the sentence to be corrected "今天天气晴朗" as an example, substituting the replacement words for the words to be corrected yields three replacement sentences: the first replacement sentence "今天田七晴朗", the second replacement sentence "今天田七情郎" and the third replacement sentence "今天天气情郎". In this embodiment, the logic scores of the sentence to be corrected and of the first, second and third replacement sentences are calculated, and the sentence with the highest logic score is selected as the final error correction result. This embodiment considers every possible replacement in the whole sentence to be corrected and calculates the logic score of each resulting sentence, which reduces the loss of sentence-level accuracy caused by inaccurate replacement of individual words.
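The exhaustive variant can be sketched with `itertools.product`, which enumerates every combination of "keep the original word" and "use a replacement word". The span format `(start, end, [alternatives])`, the function names and the toy scores are assumptions made for this illustration; in practice the scorer would be the language logic model.

```python
from itertools import product


def candidate_sentences(sentence, replacements):
    """Enumerate all combinations of original and replacement words.

    `replacements` is a list of (start, end, [alternatives]) spans that do
    not overlap and are sorted by start. The first result is the original.
    """
    options = [[sentence[s:e], *alts] for s, e, alts in replacements]
    results = []
    for choice in product(*options):
        parts, pos = [], 0
        for (s, e, _), word in zip(replacements, choice):
            parts.append(sentence[pos:s])
            parts.append(word)
            pos = e
        parts.append(sentence[pos:])
        results.append("".join(parts))
    return results


def pick_best(candidates, score):
    """Return the candidate with the highest logic score."""
    return max(candidates, key=score)


if __name__ == "__main__":
    spans = [(2, 4, ["田七"]), (4, 6, ["情郎"])]
    cands = candidate_sentences("今天天气晴朗", spans)
    # Toy scores (assumption) standing in for the language logic model.
    toy = {"今天天气晴朗": -2.0}
    print(cands)
    print(pick_best(cands, lambda s: toy.get(s, -5.0)))
```

The number of candidates grows multiplicatively with the number of words to be corrected, so this variant trades computation for coverage.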
As another example, in other embodiments, the words to be corrected are replaced one at a time, in the order in which they appear in the sentence to be corrected. Taking the sentence to be corrected "今天天气晴朗" as an example, the word "天气" is replaced first, giving the replacement sentence "今天田七晴朗". The first logic score, for the sentence to be corrected "今天天气晴朗", and the second logic score, for the replacement sentence "今天田七晴朗", are calculated. If the comparison shows that the original sentence "今天天气晴朗" better conforms to language logic, the word "天气" does not need to be replaced. After "天气", the sentence contains one more word to be corrected, "晴朗", whose replacement word is "情郎", so the next replacement sentence generated is "今天天气情郎". The logic scores of the sentence to be corrected and of this replacement sentence are calculated in the same way, and it is finally determined that "晴朗" does not need to be replaced either, which completes the error correction of the sentence. Because this embodiment replaces and compares the words to be corrected one by one in their order of appearance, its logic is simpler, and when one word to be corrected has multiple replacement words, it reduces the number of calculations and improves error correction efficiency.
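The one-by-one strategy can be sketched as below. It is a hedged illustration only: `correct_sequentially`, the token list, and the `score` callable are hypothetical names, and the toy scorer merely stands in for the logic score.

```python
def correct_sequentially(tokens, replacements, score):
    """Try each replacement word in sentence order; keep it only if the score improves."""
    current = list(tokens)
    for i, tok in enumerate(tokens):
        for alt in replacements.get(tok, []):
            trial = current[:i] + [alt] + current[i + 1:]
            # Replace only when the replacement sentence scores strictly higher.
            if score("".join(trial)) > score("".join(current)):
                current = trial
    return "".join(current)

# Toy scorer (an assumption): counts how many expected words the sentence contains.
def toy_score(sentence):
    return sum(word in sentence for word in ("天气", "晴朗"))

recognized = ["今天", "田七", "情郎"]
replacements = {"田七": ["天气"], "情郎": ["晴朗", "清朗"]}
corrected = correct_sequentially(recognized, replacements, toy_score)
```

With k words to be corrected and m replacement words each, this loop scores about k·m replacement sentences, whereas enumerating every combination produces (m+1)^k candidates.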
In view of the above embodiments, the present application does not specifically limit how a sentence to be corrected that contains multiple words to be corrected is processed. What the embodiments of the present application intend to illustrate is that comparing the logic scores of the sentence to be corrected and the replacement sentence improves the accuracy with which words to be corrected are replaced, thereby reducing the miscorrection rate of speech recognition.
Through steps S100-S170, the embodiments of the present application provide a speech recognition error correction method. Speech recognition is first performed on the speech to be detected to obtain the text to be detected and the corresponding pronunciation sequence to be detected. An FST to be detected is constructed from the pronunciation sequence to be detected. Based on the FST to be detected and the acquired keyword FST, several words to be corrected in the text to be detected are determined, together with the sentences to be corrected that contain them. If the words to be corrected exist in the acquired Chinese character confusion set, the replacement word corresponding to each word to be corrected is determined, and the words to be corrected in the sentence to be corrected are replaced with the replacement words to generate a replacement sentence. The first logic score of the sentence to be corrected and the second logic score of the replacement sentence are calculated, and when the first logic score is lower than the second logic score, the sentence to be corrected in the text to be detected is replaced with the replacement sentence, completing the error correction of the speech recognition text.
Compared with solutions in the related art that rely on grammar or syntax, the speech recognition error correction method proposed in the embodiments of the present application identifies words to be corrected that may be erroneous from the pronunciation of the speech recognition text, provides replacement words for them from the Chinese character confusion set of the corresponding business, and finally decides whether correction is needed from the logic scores of the sentence before and after replacement. The embodiments of the present application can therefore detect recognition errors that arise in a specified business domain from inaccurate pronunciation, which increases the probability that words to be corrected are found. By using the Chinese character confusion set and the logic score comparison, these recognition errors are corrected effectively and the miscorrection rate of the speech recognition text is reduced, which improves the accuracy of the speech recognition text and allows speech recognition technology to play a greater role in fields such as digital healthcare and smart homes.
In some embodiments, the speech recognition error correction method proposed in the embodiments of the present application further includes steps of constructing the keyword FST and constructing the Chinese character confusion set. Referring to FIG. 3, FIG. 3 is a flowchart of the steps of constructing the keyword FST and the Chinese character confusion set provided by an embodiment of the present application. The method includes, but is not limited to, steps S300-S380:
Step S300: acquire training speech, where the training speech and the speech to be detected belong to the same vertical domain;
Specifically, multiple pieces of training speech are acquired that belong to the same vertical domain as the speech to be detected, as explained above. For example, in the digital healthcare industry, all recorded segments of online consultations between patients and doctors in a certain region can be used as training speech.
Step S310: perform speech recognition on the training speech to obtain speech recognition text;
Specifically, speech recognition technology from the related art is used to recognize the training speech and obtain the speech recognition text, that is, the character sequence corresponding to the training speech.
Step S320: determine the corresponding first pronunciation sequence from the speech recognition text;
Specifically, step S320 may refer to step S100 in FIG. 1: each character in the speech recognition text is looked up in a table to determine its unique corresponding pinyin, and these pinyin are recorded in order as the first pronunciation sequence.
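The table lookup in steps S320 and S340 amounts to mapping each character to one pinyin unit. A minimal sketch follows; the three-entry table is a made-up excerpt for illustration, and a real system would need a full character-to-pinyin table with one reading fixed for each polyphonic character.

```python
# Hypothetical excerpt of a character-to-pinyin table (tones omitted).
PINYIN_TABLE = {"分": "fen", "析": "xi", "期": "qi"}

def to_pronunciation_sequence(text, table=PINYIN_TABLE):
    """Map each character to its single pinyin unit, preserving order."""
    return [table[ch] for ch in text]

first_sequence = to_pronunciation_sequence("分析")
second_sequence = to_pronunciation_sequence("分期")
```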
Step S330: perform manual recognition on the training speech to obtain manually recognized text;
Specifically, the training speech that underwent speech recognition in step S310 is also recognized manually: a person listens to the training speech, converts what is heard into text, and records it as the manually recognized text. Taking China as an example, although most people use Chinese and can understand, speak, read, and write Mandarin, local dialects persist in many regions, so a considerable number of people speak Mandarin with a non-standard accent, and the training speech consequently contains many non-standard speech segments. Without dedicated training, speech recognition cannot reliably distinguish the characters these accented pronunciations correspond to, whereas a person can recognize the training speech fairly accurately by drawing on everyday experience and context. Therefore, in this step the training speech is recognized manually to generate the manually recognized text. The manually recognized text is also a character sequence corresponding to the training speech, and its character count and word distribution are essentially the same as those of the speech recognition text, so the two texts can be compared.
Step S340: determine the corresponding second pronunciation sequence from the manually recognized text;
Specifically, step S340 may refer to step S100 in FIG. 1: each character in the manually recognized text is looked up in a table to determine its unique corresponding pinyin, and these pinyin are recorded in order as the second pronunciation sequence.
Step S350: determine the Chinese character confusion set from the speech recognition text and the manually recognized text;
Specifically, as noted above, the character count and word distribution of the speech recognition text are essentially the same as those of the manually recognized text, so the two texts can be compared to generate the Chinese character confusion set. The specific process of generating the Chinese character confusion set is shown in FIG. 4.
Referring to FIG. 4, FIG. 4 is a flowchart of the steps of constructing the Chinese character confusion set provided by an embodiment of the present application. The method includes, but is not limited to, steps S351-S354:
Step S351: compare each first word in the speech recognition text with the second word at the corresponding position in the manually recognized text;
Specifically, as noted above, the character count and word distribution of the speech recognition text and the manually recognized text are essentially the same. A character or word in the speech recognition text is therefore taken as a first word, a character or word in the manually recognized text is taken as a second word, and the first and second words at corresponding positions are compared. Matching all first words with all second words one to one completes the comparison of the two texts.
It can be understood that a first word and a second word should occupy corresponding positions and have the same number of characters. That is, a single character is compared with a single character, and a word is compared with a word of the same length.
It should be noted that, in some embodiments, step S351 may segment the speech recognition text and the manually recognized text into words before comparing them. For example, if the speech recognition text is "我走路" and the manually recognized text is "窝走路", then by position "我" is compared with "窝", and "走路" is compared with "走路".
In other embodiments, step S351 may instead compare the speech recognition text and the manually recognized text character by character. For example, if the speech recognition text is "我走路" and the manually recognized text is "窝走路", then by position "我" is compared with "窝", "走" with "走", and "路" with "路". When the two texts are found to differ at the position of "我" and "窝", the difference is handled in the next step S352.
Step S352: if the current first word differs from the current second word, take both the current first word and the current second word as replacement words, and store the replacement words in a first candidate area;
Specifically, when a first word and a second word differ, both are taken as replacement words and stored in one candidate area. For example, if the speech recognition text is "我走路" and the manually recognized text is "窝走路", then by position "我" is compared with "窝". They occupy corresponding positions and have the same number of characters but differ, so the first word "我" and the second word "窝" are stored in the same first candidate area. Similarly, the first and second words at the next position are compared, and if they differ, they are stored in another first candidate area.
As noted for step S351, the speech recognition text and the manually recognized text may be segmented into words before comparison, in which case step S352 can determine directly whether a replacement word is a single character or a multi-character word. If step S351 compares the two texts character by character, step S352 can determine whether the replacement word is a single character or a multi-character word from the linguistic context before and after the first or second word, using techniques from the related art. The embodiments of the present application do not specifically limit the number of characters in a replacement word. What this step intends to illustrate is that the characters or words that can be replaced are determined from the differences between the speech recognition text and the manually recognized text.
Step S353: when the comparison of first words and second words is complete, merge the first candidate areas that share a word into a single second candidate area;
Specifically, once every first word in the speech recognition text has been compared with the corresponding second word in the manually recognized text, there are as many first candidate areas as there are differences. For example, if the speech recognition text is "今天田七情郎" and the manually recognized text is "今天天气晴朗", steps S351-S352 determine one first candidate area containing the replacement words "田七" and "天气" and another first candidate area containing the replacement words "晴朗" and "情郎". It can be understood that a training speech segment may be long, the same word may occur several times, and speech recognition may produce different results for that same word. For example, if the next sentence of the speech recognition text is "明天天气也清朗" and the next sentence of the manually recognized text is "明天天气也晴朗", these two sentences yield a third first candidate area containing the replacement words "清朗" and "晴朗". Because error correction takes the manually recognized text as its basis and should cover as many possible misrecognitions as it can, the first candidate areas that contain the same word are merged into a single second candidate area. In the example of step S353, the speech recognition text is "今天田七情郎明天天气也清朗" and the manually recognized text is "今天天气晴朗明天天气也晴朗". After the first candidate areas are merged, two second candidate areas are obtained: one contains the replacement words "田七" and "天气", and the other contains the replacement words "晴朗", "情郎", and "清朗". In other words, a second candidate area may contain more than two replacement words.
As mentioned above, step S140 in FIG. 1 is: if words to be corrected exist in the Chinese character confusion set of the corresponding business, determine the replacement word corresponding to each word to be corrected. The replacement words for a word to be corrected are the other words in the candidate area that contains it. In the above example, if the word to be corrected is "清朗", the corresponding replacement words are "晴朗" and "情郎". Each replacement word is substituted in turn at the position of the word to be corrected, and the word that best conforms to language logic is taken as the final error correction result.
Step S354: determine the Chinese character confusion set, where the Chinese character confusion set includes several second candidate areas.
Specifically, once all first candidate areas that share a word have been merged, construction of the Chinese character confusion set is complete. The Chinese character confusion set contains several second candidate areas, each of which contains two or more replacement words, and the replacement words are first words and second words.
Through steps S351-S354, the embodiments of the present application provide a method for constructing the Chinese character confusion set: the speech recognition text is compared with the manually recognized text, replacement words are determined from their differences, and first candidate areas that share a replacement word are merged, so that the different misrecognition results of the same word are discovered as fully as possible.
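Steps S351-S354 can be sketched as a position-wise comparison followed by merging of overlapping sets. This is an illustrative sketch under stated assumptions: the two texts are already segmented and aligned token for token, and the function name is hypothetical.

```python
def build_confusion_set(asr_tokens, ref_tokens):
    """Build second candidate areas from two aligned token lists.

    asr_tokens: tokens of the speech recognition text (first words).
    ref_tokens: tokens of the manually recognized text (second words).
    """
    # S351/S352: every differing position yields one first candidate area.
    first_areas = [{a, r} for a, r in zip(asr_tokens, ref_tokens) if a != r]

    # S353: merge first candidate areas that share at least one word.
    merged = []
    for area in first_areas:
        group = set(area)
        rest = []
        for existing in merged:
            if existing & group:
                group |= existing      # absorb the overlapping area
            else:
                rest.append(existing)
        merged = rest + [group]

    # S354: the merged areas form the confusion set.
    return merged

asr = ["今天", "田七", "情郎", "明天", "天气", "也", "清朗"]
ref = ["今天", "天气", "晴朗", "明天", "天气", "也", "晴朗"]
confusion_set = build_confusion_set(asr, ref)
```

For the example sentences from step S353, this produces two areas: one holding "田七" and "天气", the other holding "晴朗", "情郎", and "清朗".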
Steps S351-S354 complete the description of step S350. Step S360 is described next.
Step S360: determine the pronunciation confusion set from the first pronunciation sequence and the second pronunciation sequence;
Specifically, because the character count and word distribution of the speech recognition text and the manually recognized text are essentially the same, and each character corresponds to one pinyin unit, the first pronunciation sequence and the second pronunciation sequence can also be compared to generate the pronunciation confusion set. The specific process of generating the pronunciation confusion set is shown in FIG. 5.
Referring to FIG. 5, FIG. 5 is a flowchart of the steps of constructing the pronunciation confusion set provided by an embodiment of the present application. The method includes, but is not limited to, steps S361-S364:
Step S361: compare each first pinyin unit in the first pronunciation sequence with the second pinyin unit at the corresponding position in the second pronunciation sequence;
Specifically, a pinyin unit is the pinyin of one character. Comparing each first pinyin unit in the first pronunciation sequence with the second pinyin unit at the corresponding position in the second pronunciation sequence is in effect comparing the pinyin unit of each character in the speech recognition text with the pinyin unit of the corresponding character in the manually recognized text.
Step S362: if the current first pinyin unit differs from the current second pinyin unit, store the current first pinyin unit and the current second pinyin unit in a first confusion area;
Specifically, suppose the current first pinyin unit differs from the current second pinyin unit. For example, the first pronunciation sequence is "fen xi" (corresponding to the text "分析") and the second pronunciation sequence is "fen qi" (corresponding to the text "分期"). The comparison shows that "xi" and "qi" occupy corresponding positions but differ, so "xi" and "qi" are stored in a first confusion area.
Step S363: when the comparison of first pinyin units and second pinyin units is complete, merge the first confusion areas that share a pinyin unit into a single second confusion area;
Specifically, step S363 may refer to step S353: first confusion areas that share a pinyin unit are merged into one confusion area. For example, if one first confusion area contains the first pinyin unit "xi" and the second pinyin unit "qi", and another first confusion area contains the first pinyin unit "ji" and the second pinyin unit "qi", both areas contain the pinyin unit "qi" and are therefore merged into a single second confusion area. The new second confusion area contains three pinyin units: "xi", "qi", and "ji".
Merging confusion areas collects as many similar-sounding pronunciations as possible into the same second confusion area, which helps the speech recognition error correction method of the embodiments of the present application detect more homophone errors and thereby improves the accuracy of error correction.
Step S364: determine the pronunciation confusion set, where the pronunciation confusion set includes several second confusion areas;
Specifically, once all confusion areas that share a pinyin unit have been merged, construction of the pronunciation confusion set is complete. The pronunciation confusion set contains several second confusion areas, each of which contains two or more pinyin units.
Through steps S361-S364, the embodiments of the present application provide a method for constructing the pronunciation confusion set: the set is determined from the differences between the first pronunciation sequence and the second pronunciation sequence, and similar-sounding pronunciations are gathered as fully as possible by merging confusion areas. This helps reduce the effect of users' non-standard pronunciation on speech recognition error correction and improves its accuracy.
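The merging in steps S361-S364 is a grouping of pinyin units into connected sets, for which a disjoint-set (union-find) structure is one natural fit. The sketch below is an assumption-laden illustration: the function name is hypothetical, and the pronunciation sequences are assumed to be space-separated strings of equal length.

```python
def build_pinyin_confusion_set(first_sequence, second_sequence):
    """Group differing pinyin units into second confusion areas via union-find."""
    parent = {}

    def find(unit):
        parent.setdefault(unit, unit)
        while parent[unit] != unit:
            parent[unit] = parent[parent[unit]]  # path halving
            unit = parent[unit]
        return unit

    # S361/S362: each differing position links two pinyin units.
    for a, b in zip(first_sequence.split(), second_sequence.split()):
        if a != b:
            parent[find(a)] = find(b)            # S363: merge their areas

    # S364: collect the units of each merged area.
    areas = {}
    for unit in list(parent):
        areas.setdefault(find(unit), set()).add(unit)
    return list(areas.values())

pinyin_confusion = build_pinyin_confusion_set("fen xi fen ji", "fen qi fen qi")
```

In this run, "xi" and "ji" are each confused with "qi", so all three units end up in one second confusion area, matching the example in step S363.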
Steps S361-S364 complete the description of step S360. Step S370 is described next.
Step S370: determine the keyword table from the pronunciation confusion set, the speech recognition text, the manually recognized text, the first pronunciation sequence, and the second pronunciation sequence;
Specifically, step S370 determines the keyword table for the business. The keyword table lists the words that are prone to misrecognition in the speech recognition of this business. The construction of the keyword table is shown in FIG. 6.
Referring to FIG. 6, FIG. 6 is a flowchart of the steps of constructing the keyword table provided by an embodiment of the present application. The method includes, but is not limited to, steps S371-S372:
Step S371: determine the key pinyin from the pronunciation confusion set;
Specifically, because the pronunciation confusion set is built from the differences between the first pronunciation sequence and the second pronunciation sequence, it can also be used to determine which parts of the two sequences differ, and thus which pinyin units differ. Techniques from the related art are used to determine whether a differing pinyin unit belongs to a standalone character or to one character within a word. The key pinyin include several first key pinyin from the first pronunciation sequence and several second key pinyin from the second pronunciation sequence. If the differing pinyin unit belongs to a standalone character, the pinyin unit itself is taken as a key pinyin, and it is a first key pinyin or a second key pinyin depending on whether it lies in the first or the second pronunciation sequence. If it belongs to one character within a word, the pinyin units of the whole word, including the differing unit, are taken together as a first key pinyin or a second key pinyin.
For example, suppose one confusion area of the pronunciation confusion set contains "xi" and "qi". Checking against the first pronunciation sequence shows that "xi" is in fact one character of the word "fen-xi", so "fen-xi" is taken as a first key pinyin, and correspondingly "fen-qi" in the second pronunciation sequence is taken as a second key pinyin.
Step S372: take the words in the speech recognition text that correspond to the first key pinyin and the words in the manually recognized text that correspond to the second key pinyin as keywords, and store the keywords in the keyword table.
Specifically, the first key pinyin are matched to the speech recognition text and the second key pinyin are matched to the manually recognized text to obtain the keywords. In the above example, the first key pinyin "fen-xi" corresponds to the keyword "分析" in the speech recognition text, and the second key pinyin "fen-qi" corresponds to the keyword "分期" in the manually recognized text, so both "分析" and "分期" are stored in the keyword table.
Through steps S371-S372, the embodiments of the present application provide a method for constructing the keyword table, which mainly stores in the keyword table the words at which the first pronunciation sequence and the second pronunciation sequence differ.
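A simplified sketch of steps S371-S372 follows. It assumes the two texts are already segmented into aligned (word, pinyin units) pairs and compares them directly rather than consulting the pronunciation confusion set, so it is a simplification of the described procedure; the function name and data layout are hypothetical.

```python
def build_keyword_table(asr_words, ref_words):
    """Map each keyword to its key pinyin.

    Each argument is a list of (word, [pinyin units]) pairs aligned by position.
    A word enters the table when its pinyin differs from that of its counterpart.
    """
    table = {}
    for (asr_word, asr_pinyin), (ref_word, ref_pinyin) in zip(asr_words, ref_words):
        if asr_pinyin != ref_pinyin:
            table[asr_word] = "-".join(asr_pinyin)   # first key pinyin
            table[ref_word] = "-".join(ref_pinyin)   # second key pinyin
    return table

asr_words = [("我", ["wo"]), ("分析", ["fen", "xi"])]
ref_words = [("我", ["wo"]), ("分期", ["fen", "qi"])]
keyword_table = build_keyword_table(asr_words, ref_words)
```

Because whole words are compared, a difference in a single pinyin unit ("xi" versus "qi") brings the full key pinyin "fen-xi" and "fen-qi" into the table, as in the example above.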
Steps S371-S372 complete the description of step S370. Step S380 is described next.
Step S380: determine the keyword FST from the keyword table;
Specifically, the keyword FST can be constructed from the words in the keyword table and their corresponding pinyin. The construction process of the keyword FST is shown in FIG. 7.
Referring to FIG. 7, FIG. 7 is a flowchart of the steps of constructing the keyword FST provided by an embodiment of the present application. The method includes, but is not limited to, steps S381-S385:
Step S381: construct the root node of the keyword FST;
Specifically, referring to FIG. 8, FIG. 8 is a schematic diagram of the keyword FST provided by an embodiment of the present application. In step S381, a root node is constructed as the starting point of the keyword FST. As shown in FIG. 8, the root node is denoted by reference numeral 800.
Step S382: under the root node, construct first child nodes from the first pinyin unit of each key pinyin;
Specifically, any one of the key pinyin corresponding to the keyword table is selected, its first pinyin unit is determined, and a first child node under the root node is constructed from that pinyin unit. In other words, the first child nodes are in effect constructed from the pinyin of the first character of each keyword in the keyword table.
It can be understood that, because child nodes are constructed from the key pinyin, different keywords may share the same first child node. For example, the key pinyin of "分析" and "分期" both begin with the pinyin unit "fen", so these two keywords share one first child node.
As shown in FIG. 8, if the keywords are, for example, "分期", "分析", "隔离", and "事务所", then three first child nodes 810 can be determined from the first pinyin units of the key pinyin, namely "fen", "ge", and "shi".
Step S383: under the first child nodes, construct several second child nodes from the second pinyin units of the key pinyin and the order of the pinyin units in the key pinyin;
Specifically, after a first child node is determined, all pinyin units of the key pinyin other than the first are taken as second pinyin units, and second child nodes are constructed downward one after another according to the order of those pinyin units. It can be understood that when several pinyin units remain, there are correspondingly several second child nodes.
As shown in FIG. 8, if the keywords are, for example, "分期", "分析", "隔离", and "事务所", then under the three first child nodes 810, five second child nodes 820 can be constructed from the remaining second pinyin units of each keyword, namely "qi", "xi", "li", "wu", and "suo". Because "事务所" has three characters, two second pinyin units remain after the first pinyin unit of its key pinyin is removed, so two second child nodes are constructed downward in pinyin unit order: a second child node for "suo" is constructed under the second child node for "wu".
As explained above, second child nodes are constructed from all pinyin units of a key pinyin other than the first, so for a given first child node there may be multiple second child nodes along the same pronunciation path.
It can be understood that when a keyword is a single character, there is no second child node on its pronunciation path.
Step S384, under the second child nodes, construct several third child nodes according to the key pinyin and the keyword table;
Specifically, as described in step S383, the second child nodes already cover every pinyin unit of the key pinyin other than the first one. A third child node can therefore be added under the last second child node of each path. The third child node represents the keyword corresponding to the current key pinyin, and that keyword can be found in the keyword table.
As shown in FIG. 8, there are four third child nodes 830: "分期", "分析", "隔离" and "事务所".
Step S385, add an arc returning to the root node for each first child node and each second child node;
Specifically, as shown in FIG. 8, an arc 940 returning to the root node is added for each first child node and each second child node. Once the arcs are added, the first child nodes and the root node, and the second child nodes and the root node, form closed search loops. For clarity, FIG. 8 shows only two arcs; in fact every first child node and every second child node has an arc returning to the root node. As described for step S120, the FST to be detected is recombined with the keyword FST to find identical nodes. For example, if the FST to be detected contains the node "fen" and the keyword FST also contains the node "fen", both FSTs move to the next layer of child nodes for the next comparison. If the two consecutive nodes in the FST to be detected are "fen-qi" while the keyword FST has entered the pronunciation path "fen-xi", the comparison fails. With the arcs added, when the keyword FST finds that the second child node "xi" cannot be matched, it returns to the root node and compares again, this time entering the pronunciation path "fen-qi", so that the two consecutive matching nodes "fen-qi" are found in both the FST to be detected and the keyword FST. The closed search loops thus ensure, as far as possible, that the recombination in step S120 finds every node in the keyword FST that may be identical to a node in the FST to be detected, which improves the discovery rate of words to be corrected.
Through steps S381-S385, the embodiment of the present application provides a method for constructing the keyword FST: multiple child nodes are constructed from the order of the pinyin units and the keyword table, and closed search loops improve the discovery rate of words to be corrected.
This completes the description of step S380 through steps S381-S385.
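The construction in steps S381-S385 can be illustrated with a minimal sketch, assuming the keyword FST is approximated by a plain prefix tree over pinyin units. The names `Node`, `build_keyword_tree` and `find_keywords` are hypothetical and do not come from the application; the keywords and pinyin units are those of the FIG. 8 example. The arcs returning to the root node are approximated by restarting the walk from the root whenever a path cannot be continued, which is a simplification rather than a real FST operation.

```python
from typing import Dict, List, Optional, Tuple

# Keywords and their pinyin units, taken from the FIG. 8 example.
KEYWORDS: Dict[str, List[str]] = {
    "分期": ["fen", "qi"],
    "分析": ["fen", "xi"],
    "隔离": ["ge", "li"],
    "事务所": ["shi", "wu", "suo"],
}


class Node:
    """A pinyin-unit node; `words` plays the role of the third child nodes."""

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.words: List[str] = []


def build_keyword_tree(keywords: Dict[str, List[str]]) -> Node:
    """Build the root, first child nodes and second child nodes (S381-S384)."""
    root = Node()
    for word, units in keywords.items():
        node = root
        for unit in units:
            # The first unit creates a first child node, later units second child nodes.
            node = node.children.setdefault(unit, Node())
        node.words.append(word)  # keyword stored at the end of its pronunciation path
    return root


def find_keywords(root: Node, units: List[str]) -> List[Tuple[int, List[str]]]:
    """Return (start index, keywords) for every keyword path found in `units`."""
    hits: List[Tuple[int, List[str]]] = []
    start = 0
    while start < len(units):
        node, pos = root, start
        matched: Optional[Tuple[int, List[str], int]] = None
        while pos < len(units) and units[pos] in node.children:
            node = node.children[units[pos]]
            pos += 1
            if node.words:
                matched = (start, list(node.words), pos)  # keep the longest match
        if matched is not None:
            hits.append((matched[0], matched[1]))
            start = matched[2]
        else:
            start += 1  # emulate the arc back to the root node (S385)
    return hits


if __name__ == "__main__":
    tree = build_keyword_tree(KEYWORDS)
    print(find_keywords(tree, ["wo", "yao", "fen", "qi", "fu", "kuan"]))
```

Storing a list in `words` lets homophonous keywords share one pronunciation path; whether the application's keyword FST handles homophones this way is not stated in the text.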
Through steps S300-S380, the embodiment of the present application provides methods for constructing the keyword FST and the Chinese character confusion set. Both can be applied in the method steps shown in FIG. 1 to carry out the speech recognition error correction method proposed in the embodiment of the present application.
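As an illustration of how a Chinese character confusion set might be assembled, the following minimal sketch follows the procedure recited in the claims: position-aligned words from the speech recognition text and the manually recognized text are compared, differing pairs are stored as first candidate areas, and candidate areas that share a word are merged into one second candidate area. The function name `build_confusion_set` and the sample word pairs are hypothetical, and the sketch assumes the two texts have already been aligned word by word.

```python
from typing import List, Set, Tuple


def build_confusion_set(pairs: List[Tuple[str, str]]) -> List[Set[str]]:
    """Merge differing (recognized word, manual word) pairs into confusion groups.

    `pairs` is assumed to be aligned by position; alignment itself is out of scope.
    """
    groups: List[Set[str]] = []
    for recognized, manual in pairs:
        if recognized == manual:
            continue  # no difference, nothing to record
        candidate = {recognized, manual}  # a first candidate area
        remaining: List[Set[str]] = []
        for group in groups:
            if group & candidate:
                candidate |= group  # shared word: merge into one second candidate area
            else:
                remaining.append(group)
        remaining.append(candidate)
        groups = remaining
    return groups


if __name__ == "__main__":
    # Hypothetical aligned word pairs, for illustration only.
    sample = [("分析", "分期"), ("隔离", "隔离"), ("分歧", "分期"), ("食物所", "事务所")]
    for group in build_confusion_set(sample):
        print(sorted(group))
```

The pronunciation confusion set could be built the same way by passing aligned pinyin units instead of words.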
By combining one or more embodiments, the embodiment of the present application provides a method that first performs speech recognition on the speech to be detected to generate the text to be detected and the corresponding pronunciation sequence to be detected; determines the FST to be detected from the pronunciation sequence to be detected; determines, from the FST to be detected and the keyword FST of the corresponding business, several words to be corrected in the text to be detected, together with the sentences to be corrected that contain them; if a word to be corrected exists in the Chinese character confusion set of the corresponding business, determines the replacement word for each word to be corrected and replaces the word to be corrected in the sentence to be corrected with the replacement word to generate a replacement sentence; and, when the first logic score of the sentence to be corrected is lower than the second logic score of the replacement sentence, replaces the sentence to be corrected in the text to be detected with the replacement sentence, thereby completing the error correction of the speech recognition text. In addition, the embodiment of the present application provides specific methods for constructing the keyword FST and the Chinese character confusion set.
Compared with related-art solutions that rely on grammar or syntax, the speech recognition error correction method proposed in the embodiment of the present application uses the pronunciation of the speech recognition text to determine words to be corrected that may be erroneous, obtains replacement words for them from the Chinese character confusion set of the corresponding business, and finally decides whether to correct by comparing the logic scores of the sentence before and after replacement. The embodiment can thus discover recognition errors in a specified business field that are caused by pronunciation errors, which raises the probability of discovering words to be corrected. By using the Chinese character confusion set and the logic score comparison, it corrects these recognition errors effectively and reduces the miscorrection rate, thereby improving the accuracy of the speech recognition text and allowing speech recognition technology to play a greater role in fields such as digital healthcare and smart home.
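The replace-and-compare decision described above can be sketched as follows. This is a minimal illustration, not the application's implementation: `correct_sentence` and `toy_score` are hypothetical names, and `toy_score` is only a stand-in for whatever model computes the logic score, which this passage does not specify. The replacement is kept only when its score is strictly higher than the score of the original sentence.

```python
from typing import Callable, List, Set


def correct_sentence(
    sentence: str,
    word: str,
    confusion_set: List[Set[str]],
    score: Callable[[str], float],
) -> str:
    """Return the best-scoring sentence among the original and its replacements."""
    best, best_score = sentence, score(sentence)  # first logic score
    for group in confusion_set:
        if word not in group:
            continue  # the word to be corrected is not in this confusion group
        for replacement in sorted(group - {word}):
            # Note: str.replace substitutes every occurrence of `word`.
            candidate = sentence.replace(word, replacement)
            candidate_score = score(candidate)  # second logic score
            if candidate_score > best_score:
                best, best_score = candidate, candidate_score
    return best


def toy_score(sentence: str) -> float:
    """Hypothetical scorer: counts known domain phrases (stand-in for a real model)."""
    known_phrases = ["分期付款"]
    return float(sum(sentence.count(phrase) for phrase in known_phrases))


if __name__ == "__main__":
    confusion = [{"分析", "分期"}]
    print(correct_sentence("我想办理分析付款", "分析", confusion, toy_score))
```

Requiring a strictly higher score means a tie leaves the recognized sentence unchanged, which matches the condition that the first logic score be smaller than the second.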
Referring to FIG. 9, which is a schematic diagram of the speech recognition error correction system provided by an embodiment of the present application, the system 900 includes, but is not limited to, a first module 910, a second module 920, a third module 930, a fourth module 940, a fifth module 950, a sixth module 960, a seventh module 970 and an eighth module 980. The first module is configured to perform speech recognition on the speech to be detected to obtain the text to be detected and the corresponding pronunciation sequence to be detected. The second module is configured to construct the FST to be detected according to the pronunciation sequence to be detected. The third module is configured to obtain the keyword FST and the Chinese character confusion set, wherein the keyword FST, the Chinese character confusion set and the FST to be detected belong to the same vertical field. The fourth module is configured to determine, according to the FST to be detected and the keyword FST, several words to be corrected and several sentences to be corrected in the text to be detected, wherein each sentence to be corrected contains a word to be corrected. The fifth module is configured to determine, if a word to be corrected exists in the Chinese character confusion set, the replacement word corresponding to each word to be corrected according to the Chinese character confusion set. The sixth module is configured to replace the word to be corrected in the sentence to be corrected with the replacement word to obtain a replacement sentence. The seventh module is configured to calculate the first logic score of the sentence to be corrected and the second logic score of the replacement sentence. The eighth module is configured to replace the sentence to be corrected in the text to be detected with the replacement sentence when the first logic score is smaller than the second logic score.
Referring to FIG. 10, which is a schematic diagram of an apparatus provided by an embodiment of the present application, the apparatus 1000 includes at least one processor 1010 and at least one memory 1020 for storing at least one program; FIG. 10 takes one processor and one memory as an example.
The processor and the memory may be connected by a bus or in other ways; FIG. 10 takes connection by a bus as an example.
As a non-transitory computer-readable storage medium, the memory can be used to store non-transitory software programs and non-transitory computer-executable programs. In addition, the memory may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device or other non-transitory solid-state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and such remote memory may be connected to the apparatus via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks and combinations thereof.
Another embodiment of the present application further provides an apparatus that can be used to execute the method in any of the above embodiments, for example, the method steps in FIG. 1 described above.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The embodiment of the present application also discloses a computer storage medium storing a processor-executable program which, when executed by a processor, implements the speech recognition error correction method proposed in the present application. The computer-readable storage medium may be non-volatile or volatile.
Those of ordinary skill in the art will understand that all or some of the steps and systems in the methods disclosed above may be implemented as software, firmware, hardware or an appropriate combination thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information, such as computer-readable instructions, data structures, program modules or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically contain computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
The preferred implementations of the present application have been described in detail above, but the present application is not limited to these implementations. Those skilled in the art may make various equivalent modifications or replacements without departing from the spirit of the present application, and all such equivalent modifications or replacements fall within the scope defined by the claims of the present application.

Claims (20)

  1. A speech recognition error correction method, wherein the method comprises:
    performing speech recognition on the speech to be detected to obtain the text to be detected and the corresponding pronunciation sequence to be detected;
    constructing the FST to be detected according to the pronunciation sequence to be detected;
    obtaining a keyword FST and a Chinese character confusion set; wherein the keyword FST, the Chinese character confusion set and the FST to be detected belong to the same vertical field;
    determining, according to the FST to be detected and the keyword FST, several words to be corrected and several sentences to be corrected in the text to be detected; wherein the sentences to be corrected contain the words to be corrected;
    if a word to be corrected exists in the Chinese character confusion set, determining the replacement word corresponding to each word to be corrected according to the Chinese character confusion set;
    replacing the word to be corrected in the sentence to be corrected with the replacement word to obtain a replacement sentence;
    calculating a first logic score of the sentence to be corrected and a second logic score of the replacement sentence;
    when the first logic score is smaller than the second logic score, replacing the sentence to be corrected in the text to be detected with the replacement sentence.
  2. The error correction method according to claim 1, wherein determining several words to be corrected in the text to be detected according to the FST to be detected and the keyword FST comprises:
    recombining the FST to be detected with the keyword FST;
    if several identical nodes exist in the FST to be detected and the keyword FST, determining the word in the text to be detected that corresponds to the identical nodes as a word to be corrected;
    wherein the number of identical nodes equals the number of characters in the word to be corrected.
  3. The error correction method according to claim 1, wherein obtaining the keyword FST and the Chinese character confusion set comprises:
    obtaining training speech, the training speech and the speech to be detected belonging to the same vertical field;
    performing speech recognition on the training speech to obtain speech recognition text;
    determining a corresponding first pronunciation sequence according to the speech recognition text;
    performing manual recognition on the training speech to obtain manually recognized text;
    determining a corresponding second pronunciation sequence according to the manually recognized text;
    constructing the Chinese character confusion set according to the speech recognition text and the manually recognized text;
    constructing a pronunciation confusion set according to the first pronunciation sequence and the second pronunciation sequence;
    constructing a keyword table according to the pronunciation confusion set, the speech recognition text, the manually recognized text, the first pronunciation sequence and the second pronunciation sequence;
    constructing the keyword FST according to the keyword table.
  4. The error correction method according to claim 3, wherein constructing the Chinese character confusion set according to the speech recognition text and the manually recognized text comprises:
    comparing a first word in the speech recognition text with a second word at the corresponding position in the manually recognized text;
    if the current first word differs from the current second word, taking the current first word and the current second word as the replacement words, and storing the replacement words in a first candidate area;
    when the comparison of the first words and the second words is complete, merging several first candidate areas that contain the same word into one second candidate area;
    constructing the Chinese character confusion set comprising several of the second candidate areas.
  5. The error correction method according to claim 3, wherein constructing the pronunciation confusion set according to the first pronunciation sequence and the second pronunciation sequence comprises:
    comparing a first pinyin unit in the first pronunciation sequence with a second pinyin unit at the corresponding position in the second pronunciation sequence;
    if the current first pinyin unit differs from the current second pinyin unit, storing the current first pinyin unit and the current second pinyin unit in a first confusion area;
    when the comparison of the first pinyin units and the second pinyin units is complete, merging several first confusion areas that contain the same pinyin into one second confusion area;
    constructing the pronunciation confusion set comprising several of the second confusion areas.
  6. The error correction method according to claim 3, wherein constructing the keyword table according to the pronunciation confusion set, the speech recognition text, the manually recognized text, the first pronunciation sequence and the second pronunciation sequence comprises:
    determining key pinyin according to the pronunciation confusion set; wherein the key pinyin comprises several first key pinyin in the first pronunciation sequence and several second key pinyin in the second pronunciation sequence;
    taking the words corresponding to the first key pinyin in the speech recognition text and the words corresponding to the second key pinyin in the manually recognized text as keywords, and storing the keywords in the keyword table.
  7. The error correction method according to claim 6, wherein constructing the keyword FST according to the keyword table comprises:
    constructing a root node of the keyword FST;
    under the root node, constructing first child nodes according to the first pinyin unit in the key pinyin;
    under the first child nodes, constructing several second child nodes according to the second pinyin units in the key pinyin and the order of the pinyin units in the key pinyin; wherein the second pinyin units are all pinyin units in the key pinyin other than the first pinyin unit;
    under the second child nodes, constructing several third child nodes according to the key pinyin and the keyword table; wherein the third child nodes represent the keywords corresponding to the key pinyin;
    adding an arc returning to the root node for each first child node and each second child node to obtain the keyword FST.
  8. A speech recognition error correction system, wherein the system comprises:
    a first module, configured to perform speech recognition on the speech to be detected to obtain the text to be detected and the corresponding pronunciation sequence to be detected;
    a second module, configured to construct the FST to be detected according to the pronunciation sequence to be detected;
    a third module, configured to obtain a keyword FST and a Chinese character confusion set; wherein the keyword FST, the Chinese character confusion set and the FST to be detected belong to the same vertical field;
    a fourth module, configured to determine, according to the FST to be detected and the keyword FST, several words to be corrected and several sentences to be corrected in the text to be detected; wherein the sentences to be corrected contain the words to be corrected;
    a fifth module, configured to determine, if a word to be corrected exists in the Chinese character confusion set, the replacement word corresponding to each word to be corrected according to the Chinese character confusion set;
    a sixth module, configured to replace the word to be corrected in the sentence to be corrected with the replacement word to obtain a replacement sentence;
    a seventh module, configured to calculate a first logic score of the sentence to be corrected and a second logic score of the replacement sentence;
    an eighth module, configured to replace the sentence to be corrected in the text to be detected with the replacement sentence when the first logic score is smaller than the second logic score.
  9. An apparatus, comprising:
    at least one processor;
    at least one memory for storing at least one program;
    wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement a speech recognition error correction method;
    wherein the speech recognition error correction method comprises:
    performing speech recognition on the speech to be detected to obtain the text to be detected and the corresponding pronunciation sequence to be detected;
    constructing the FST to be detected according to the pronunciation sequence to be detected;
    obtaining a keyword FST and a Chinese character confusion set; wherein the keyword FST, the Chinese character confusion set and the FST to be detected belong to the same vertical field;
    determining, according to the FST to be detected and the keyword FST, several words to be corrected and several sentences to be corrected in the text to be detected; wherein the sentences to be corrected contain the words to be corrected;
    if a word to be corrected exists in the Chinese character confusion set, determining the replacement word corresponding to each word to be corrected according to the Chinese character confusion set;
    replacing the word to be corrected in the sentence to be corrected with the replacement word to obtain a replacement sentence;
    calculating a first logic score of the sentence to be corrected and a second logic score of the replacement sentence;
    when the first logic score is smaller than the second logic score, replacing the sentence to be corrected in the text to be detected with the replacement sentence.
  10. The apparatus according to claim 9, wherein determining several words to be corrected in the text to be detected according to the FST to be detected and the keyword FST comprises:
    recombining the FST to be detected with the keyword FST;
    if several identical nodes exist in the FST to be detected and the keyword FST, determining the word in the text to be detected that corresponds to the identical nodes as a word to be corrected;
    wherein the number of identical nodes equals the number of characters in the word to be corrected.
  11. The apparatus according to claim 9, wherein obtaining the keyword FST and the Chinese character confusion set comprises:
    obtaining training speech, the training speech and the speech to be detected belonging to the same vertical field;
    performing speech recognition on the training speech to obtain speech recognition text;
    determining a corresponding first pronunciation sequence according to the speech recognition text;
    performing manual recognition on the training speech to obtain manually recognized text;
    determining a corresponding second pronunciation sequence according to the manually recognized text;
    constructing the Chinese character confusion set according to the speech recognition text and the manually recognized text;
    constructing a pronunciation confusion set according to the first pronunciation sequence and the second pronunciation sequence;
    constructing a keyword table according to the pronunciation confusion set, the speech recognition text, the manually recognized text, the first pronunciation sequence and the second pronunciation sequence;
    constructing the keyword FST according to the keyword table.
  12. The apparatus according to claim 11, wherein constructing the Chinese character confusion set according to the speech recognition text and the manually recognized text comprises:
    comparing a first word in the speech recognition text with a second word at the corresponding position in the manually recognized text;
    if the current first word differs from the current second word, taking the current first word and the current second word as the replacement words, and storing the replacement words in a first candidate area;
    when the comparison of the first words and the second words is complete, merging several first candidate areas that contain the same word into one second candidate area;
    constructing the Chinese character confusion set comprising several of the second candidate areas.
  13. The apparatus according to claim 11, wherein constructing the pronunciation confusion set according to the first pronunciation sequence and the second pronunciation sequence comprises:
    comparing a first pinyin unit in the first pronunciation sequence with a second pinyin unit at the corresponding position in the second pronunciation sequence;
    if the current first pinyin unit differs from the current second pinyin unit, storing the current first pinyin unit and the current second pinyin unit in a first confusion area;
    when the comparison of the first pinyin units and the second pinyin units is complete, merging several first confusion areas that contain the same pinyin into one second confusion area;
    constructing the pronunciation confusion set comprising several of the second confusion areas.
  14. The apparatus according to claim 11, wherein constructing the keyword table according to the pronunciation confusion set, the speech recognition text, the manually recognized text, the first pronunciation sequence and the second pronunciation sequence comprises:
    determining key pinyin according to the pronunciation confusion set, wherein the key pinyin comprises several first key pinyin in the first pronunciation sequence and several second key pinyin in the second pronunciation sequence;
    taking the words corresponding to the first key pinyin in the speech recognition text and the words corresponding to the second key pinyin in the manually recognized text as key words, and storing the key words in the keyword table.
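
As a rough illustration of the keyword table, the sketch below treats a word as a key word when any of its pinyin units appears in the pronunciation confusion set. That selection rule, the `(word, pinyin_units)` input format, the function name `build_keyword_table`, and the sample words are all assumptions made for the example; the claim itself does not fix these details.

```python
def build_keyword_table(confusion_set, asr_words, manual_words):
    """Map each key pinyin (a tuple of pinyin units) to its key words.

    confusion_set: list of sets of confusable pinyin units.
    asr_words / manual_words: lists of (word, [pinyin units]) pairs
    taken from the recognized text and the manually recognized text.
    """
    confusable = set().union(*confusion_set)
    table = {}
    for word, pinyin in list(asr_words) + list(manual_words):
        # Assumed rule: a word is "key" if any of its units is confusable.
        if any(unit in confusable for unit in pinyin):
            table.setdefault(tuple(pinyin), set()).add(word)
    return table


# Made-up data: "hu" and "fu" were confused during recognition.
table = build_keyword_table(
    [{"fu", "hu"}],
    asr_words=[("平安", ["ping", "an"]), ("互保", ["hu", "bao"])],
    manual_words=[("平安", ["ping", "an"]), ("福保", ["fu", "bao"])],
)
```

Keying the table by pinyin rather than by word keeps homophones together, which is convenient when the table is later turned into a pinyin-indexed structure.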
  15. The apparatus according to claim 14, wherein constructing the keyword FST according to the keyword table comprises:
    constructing a root node of the keyword FST;
    under the root node, constructing a first child node according to the first pinyin unit of the key pinyin;
    under the first child node, constructing several second child nodes according to the second pinyin units of the key pinyin and the order of the pinyin units in the key pinyin, wherein the second pinyin units are all pinyin units of the key pinyin other than the first pinyin unit;
    under the second child nodes, constructing several third child nodes according to the key pinyin and the keyword table, wherein the third child nodes represent the key words corresponding to the key pinyin;
    adding, to each first child node and each second child node, an arc returning to the root node, to obtain the keyword FST.
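
The node layout described in this claim resembles a prefix tree over pinyin units. The following sketch models it with plain Python objects rather than a real FST library. The class `Node`, the function `build_keyword_fst`, the `back_to_root` flag standing in for the return arc, and the choice to attach the key words directly to the last pinyin node (instead of creating separate third child nodes) are simplifications assumed for illustration.

```python
class Node:
    """One state of the sketched keyword structure."""

    def __init__(self):
        self.children = {}         # pinyin unit -> Node
        self.words = set()         # key words emitted at this state
        self.back_to_root = False  # stands in for the arc returning to the root


def build_keyword_fst(keyword_table):
    """Build a prefix tree from {tuple of pinyin units: set of key words}."""
    root = Node()
    for pinyin, words in keyword_table.items():
        node = root
        for unit in pinyin:
            # First unit creates a first child node, later units second child nodes.
            node = node.children.setdefault(unit, Node())
            node.back_to_root = True
        # Simplification: key words are stored on the final pinyin node.
        node.words |= set(words)
    return root


# Made-up keyword table with two entries sharing the first pinyin unit.
fst_root = build_keyword_fst({
    ("fu", "bao"): {"福保"},
    ("fu", "li"): {"福利"},
})
```

Because both entries start with "fu", they share a single first child node, and the return flag on every non-root node lets a matcher restart from the root after a partial or complete match.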
  16. A computer storage medium storing a processor-executable program, wherein the processor-executable program, when executed by a processor, implements a speech recognition error correction method;
    wherein the speech recognition error correction method comprises:
    performing speech recognition on speech to be detected to obtain text to be detected and a corresponding pronunciation sequence to be detected;
    constructing an FST to be detected according to the pronunciation sequence to be detected;
    obtaining a keyword FST and a Chinese character confusion set, wherein the keyword FST, the Chinese character confusion set and the FST to be detected belong to the same vertical domain;
    determining, according to the FST to be detected and the keyword FST, several words to be corrected and several sentences to be corrected in the text to be detected, wherein each sentence to be corrected contains a word to be corrected;
    if a word to be corrected exists in the Chinese character confusion set, determining the replacement word corresponding to each word to be corrected according to the Chinese character confusion set;
    replacing the word to be corrected in the sentence to be corrected with the replacement word to obtain a replacement sentence;
    calculating a first logical score of the sentence to be corrected and a second logical score of the replacement sentence;
    when the first logical score is less than the second logical score, replacing the sentence to be corrected in the text to be detected with the replacement sentence.
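
The final replace-and-compare steps can be sketched as follows. The claim does not say how the logical score is computed, so the scoring function is passed in as a parameter; `correct_sentence`, `toy_score`, and the sample sentence are hypothetical. When a confusion group offers several candidates, this sketch keeps the highest-scoring one, which goes slightly beyond what the claim recites.

```python
def correct_sentence(sentence, word, confusion_set, score_fn):
    """Return the sentence, or a replacement sentence that scores higher.

    confusion_set: list of sets of mutually confusable words.
    score_fn: callable returning a numeric "logical score" for a sentence
              (assumed here; for example a language model score).
    """
    best, best_score = sentence, score_fn(sentence)
    for group in confusion_set:
        if word not in group:
            continue
        for candidate in group - {word}:
            # Note: str.replace substitutes every occurrence of the word.
            replaced = sentence.replace(word, candidate)
            score = score_fn(replaced)
            # Replace only when the original scores strictly lower.
            if score > best_score:
                best, best_score = replaced, score
    return best


def toy_score(sentence):
    """Stand-in scorer: prefers sentences containing a known phrase."""
    return 1.0 if "保险" in sentence else 0.0


fixed = correct_sentence("我想买宝险", "宝险", [{"宝险", "保险"}], toy_score)
```

With the toy scorer, the misrecognized sentence is replaced because the candidate scores higher, while a sentence that already contains the preferred word is returned unchanged.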
  17. The computer storage medium according to claim 16, wherein determining several words to be corrected in the text to be detected according to the FST to be detected and the keyword FST comprises:
    recombining the FST to be detected with the keyword FST;
    if several identical nodes exist in the FST to be detected and the keyword FST, determining the words in the text to be detected that correspond to the identical nodes as words to be corrected;
    wherein the number of identical nodes equals the number of characters in the word to be corrected.
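
One simplified way to picture this matching step is to walk the pinyin sequence of the text to be detected through a prefix tree of key pinyin and report every span that follows a complete path. This is a stand-in for the FST recombination named in the claim, not an implementation of it. The sketch assumes one pinyin unit per character so that pinyin positions map directly to text positions; `END`, `build_trie`, and `find_words_to_correct` are hypothetical names.

```python
END = "__end__"  # hypothetical marker: a key pinyin ends at this node


def build_trie(key_pinyins):
    """Build a nested-dict prefix tree from tuples of pinyin units."""
    root = {}
    for pinyin in key_pinyins:
        node = root
        for unit in pinyin:
            node = node.setdefault(unit, {})
        node[END] = True
    return root


def find_words_to_correct(text, pinyin_seq, trie):
    """Return (start, end, substring) for each span matching a key pinyin.

    Assumption: pinyin_seq[i] is the pinyin of text[i].
    """
    hits = []
    for start in range(len(pinyin_seq)):
        node = trie
        for end in range(start, len(pinyin_seq)):
            node = node.get(pinyin_seq[end])
            if node is None:
                break
            if END in node:
                hits.append((start, end + 1, text[start:end + 1]))
    return hits


trie = build_trie([("bao", "xian")])
hits = find_words_to_correct(
    "我想买宝险", ["wo", "xiang", "mai", "bao", "xian"], trie
)
```

In this made-up example two nodes match and the reported word has two characters, consistent with the condition that the number of identical nodes equals the number of characters in the word to be corrected.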
  18. The computer storage medium according to claim 16, wherein obtaining the keyword FST and the Chinese character confusion set comprises:
    obtaining training speech, the training speech and the speech to be detected belonging to the same vertical domain;
    performing speech recognition on the training speech to obtain speech recognition text;
    determining a corresponding first pronunciation sequence according to the speech recognition text;
    performing manual recognition on the training speech to obtain manually recognized text;
    determining a corresponding second pronunciation sequence according to the manually recognized text;
    constructing the Chinese character confusion set according to the speech recognition text and the manually recognized text;
    constructing a pronunciation confusion set according to the first pronunciation sequence and the second pronunciation sequence;
    constructing a keyword table according to the pronunciation confusion set, the speech recognition text, the manually recognized text, the first pronunciation sequence and the second pronunciation sequence;
    constructing the keyword FST according to the keyword table.
  19. The computer storage medium according to claim 18, wherein constructing the Chinese character confusion set according to the speech recognition text and the manually recognized text comprises:
    comparing a first word in the speech recognition text with a second word at the corresponding position in the manually recognized text;
    if the current first word differs from the current second word, taking the current first word and the current second word as replacement words, and storing the replacement words in a first candidate area;
    when the comparison of the first words and the second words is complete, merging the first candidate areas that share a same word into a single second candidate area;
    constructing the Chinese character confusion set comprising the several second candidate areas.
  20. The computer storage medium according to claim 18, wherein constructing the pronunciation confusion set according to the first pronunciation sequence and the second pronunciation sequence comprises:
    comparing a first pinyin unit in the first pronunciation sequence with a second pinyin unit at the corresponding position in the second pronunciation sequence;
    if the current first pinyin unit differs from the current second pinyin unit, storing the current first pinyin unit and the current second pinyin unit in a first confusion area;
    when the comparison of the first pinyin units and the second pinyin units is complete, merging the first confusion areas that share a same pinyin into a single second confusion area;
    constructing the pronunciation confusion set comprising the several second confusion areas.
PCT/CN2022/071074 2021-09-10 2022-01-10 Speech recognition error correction method and system, and apparatus and storage medium WO2023035525A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111064048.5A CN113779972B (en) 2021-09-10 2021-09-10 Speech recognition error correction method, system, device and storage medium
CN202111064048.5 2021-09-10

Publications (1)

Publication Number Publication Date
WO2023035525A1 true WO2023035525A1 (en) 2023-03-16

Family

ID=78842688

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/071074 WO2023035525A1 (en) 2021-09-10 2022-01-10 Speech recognition error correction method and system, and apparatus and storage medium

Country Status (2)

Country Link
CN (1) CN113779972B (en)
WO (1) WO2023035525A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113779972B (en) * 2021-09-10 2023-09-15 平安科技(深圳)有限公司 Speech recognition error correction method, system, device and storage medium
CN114398876B (en) * 2022-03-24 2022-06-14 北京沃丰时代数据科技有限公司 Text error correction method and device based on finite state converter
CN114896965B (en) * 2022-05-17 2023-09-12 马上消费金融股份有限公司 Text correction model training method and device, text correction method and device
CN116246633B (en) * 2023-05-12 2023-07-21 深圳市宏辉智通科技有限公司 Wireless intelligent Internet of things conference system
CN117453578B (en) * 2023-12-25 2024-04-19 杭州云动智能汽车技术有限公司 NMEA sentence detection method and device, electronic equipment and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016044321A1 (en) * 2014-09-16 2016-03-24 Min Tang Integration of domain information into state transitions of a finite state transducer for natural language processing
CN110210029A (en) * 2019-05-30 2019-09-06 浙江远传信息技术股份有限公司 Speech text error correction method, system, equipment and medium based on vertical field
CN110610700A (en) * 2019-10-16 2019-12-24 科大讯飞股份有限公司 Decoding network construction method, voice recognition method, device, equipment and storage medium
CN111369996A (en) * 2020-02-24 2020-07-03 网经科技(苏州)有限公司 Method for correcting text error in speech recognition in specific field
CN111428494A (en) * 2020-03-11 2020-07-17 中国平安人寿保险股份有限公司 Intelligent error correction method, device and equipment for proper nouns and storage medium
US20200251113A1 (en) * 2010-01-05 2020-08-06 Google Llc Word-level correction of speech input
US20210125600A1 (en) * 2019-04-30 2021-04-29 Boe Technology Group Co., Ltd. Voice question and answer method and device, computer readable storage medium and electronic device
CN112882680A (en) * 2021-01-22 2021-06-01 维沃移动通信有限公司 Voice recognition method and device
CN113361266A (en) * 2021-06-25 2021-09-07 达闼机器人有限公司 Text error correction method, electronic device and storage medium
CN113779972A (en) * 2021-09-10 2021-12-10 平安科技(深圳)有限公司 Speech recognition error correction method, system, device and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107451121A (en) * 2017-08-03 2017-12-08 京东方科技集团股份有限公司 A kind of audio recognition method and its device
CN111753529B (en) * 2020-06-03 2021-07-27 杭州云嘉云计算有限公司 Chinese text error correction method based on pinyin identity or similarity
CN112232062A (en) * 2020-12-11 2021-01-15 北京百度网讯科技有限公司 Text error correction method and device, electronic equipment and storage medium
CN112863516A (en) * 2020-12-31 2021-05-28 竹间智能科技(上海)有限公司 Text error correction method and system and electronic equipment
CN113223509B (en) * 2021-04-28 2022-06-10 华南理工大学 Fuzzy statement identification method and system applied to multi-person mixed scene

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117831573A (en) * 2024-03-06 2024-04-05 青岛理工大学 Multi-mode-based language barrier crowd speech recording analysis method and system
CN117831573B (en) * 2024-03-06 2024-05-14 青岛理工大学 Multi-mode-based language barrier crowd speech recording analysis method and system

Also Published As

Publication number Publication date
CN113779972A (en) 2021-12-10
CN113779972B (en) 2023-09-15

Similar Documents

Publication Publication Date Title
WO2023035525A1 (en) Speech recognition error correction method and system, and apparatus and storage medium
US10176804B2 (en) Analyzing textual data
KR100996817B1 (en) Generating large units of graphonemes with mutual information criterion for letter to sound conversion
JP7092953B2 (en) Phoneme-based context analysis for multilingual speech recognition with an end-to-end model
US20090150139A1 (en) Method and apparatus for translating a speech
Pennell et al. Normalization of informal text
CN105404621B (en) A kind of method and system that Chinese character is read for blind person
US11093110B1 (en) Messaging feedback mechanism
Pennell et al. Toward text message normalization: Modeling abbreviation generation
US11257484B2 (en) Data-driven and rule-based speech recognition output enhancement
CN111985234A (en) Voice text error correction method
Sreeram et al. Exploration of end-to-end framework for code-switching speech recognition task: Challenges and enhancements
Duşçu et al. Polarity classification of twitter messages using audio processing
Palmer et al. Robust information extraction from automatically generated speech transcriptions
JP2024512579A (en) Lookup table recurrent language model
Sridhar et al. Enriching machine-mediated speech-to-speech translation using contextual information
Zukerman et al. Improving the understanding of spoken referring expressions through syntactic-semantic and contextual-phonetic error-correction
Dinarelli et al. Concept segmentation and labeling for conversational speech
Petrik et al. Semantic and phonetic automatic reconstruction of medical dictations
US11900072B1 (en) Quick lookup for speech translation
Marin Effective use of cross-domain parsing in automatic speech recognition and error detection
Polyàkova Grapheme-to-phoneme conversion in the era of globalization
Baughman et al. Using language models for improving speech recognition for US Open Tennis Championships
CN114357979A (en) Subtitle making method and device and computer readable storage medium
Lehnen Maximum entropy models for sequences: scaling up from tagging to translation.

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22866005

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE