CN107103902B - Complete speech content recursive recognition method - Google Patents

Complete speech content recursive recognition method Download PDF

Info

Publication number
CN107103902B
Authority
CN
China
Prior art keywords
sub
voice
recognition
voices
segment
Prior art date
Legal status
Active
Application number
CN201710449747.9A
Other languages
Chinese (zh)
Other versions
CN107103902A (en)
Inventor
谢国雄
Current Assignee
Shanghai Enjoy Culture Communication Co Ltd
Original Assignee
Shanghai Enjoy Culture Communication Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Enjoy Culture Communication Co Ltd filed Critical Shanghai Enjoy Culture Communication Co Ltd
Priority to CN201710449747.9A priority Critical patent/CN107103902B/en
Publication of CN107103902A publication Critical patent/CN107103902A/en
Application granted granted Critical
Publication of CN107103902B publication Critical patent/CN107103902B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/04 Segmentation; Word boundary detection
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a complete speech content recursive recognition method, designed to improve the accuracy of speech recognition. The method comprises the following steps: preliminarily recognizing each sub-speech segment; analyzing the semantics of each segment by word segmentation and classification, grammar unit checking, and static semantic checking; and calculating confidence values for the preliminary recognition and semantic analysis of each segment. The recognition versions in each result are then reordered by confidence to correct that segment's result. The sub-speech segments produced in step S2 are also merged pairwise into new merged sub-speeches, which are again preliminarily recognized and semantically analyzed with their own confidence values, and the merging step is repeated until the initial complete sentence is rebuilt. Through recursion in these two directions, cutting and merging, the method finally obtains the recognition result set of the whole main speech and the corresponding semantic understanding result set.

Description

Complete speech content recursive recognition method
Technical Field
The invention relates to a complete speech content recursive recognition method.
Background
In conventional speech recognition systems split between a client and a server, recognition is first performed on the client; when the client's recognition score is judged to be low and its accuracy poor, recognition is performed again on the server and the server's result is used instead.
For speech longer than a single sentence, existing speech recognition technology still recognizes shorter unit segments one by one, and cannot exploit the information contained in the complete utterance to further correct the results and improve the recognition rate.
In view of the above, the present inventor has actively researched and innovated to create a complete speech content recursive recognition method with industrial application value.
Disclosure of Invention
To solve the above technical problems, an object of the present invention is to provide a complete speech content recursive recognition method that uses the complete speech content to improve a computer's speech recognition rate.
The complete speech content recursive recognition method disclosed by the invention comprises the following steps (a code sketch of the overall recursion follows the list):
S1, acquiring a segment of audio as the main speech;
S2, fuzzily cutting the main speech into n sub-speech segments;
S3, preliminarily recognizing each sub-speech segment, analyzing its semantics by word segmentation and classification, grammar unit checking, and static semantic checking, and calculating the confidence of the preliminary recognition and semantic analysis of each segment;
S4, recalculating the confidence of each element of each sub-speech segment by comparing the recognition result patterns and semantics of adjacent sub-speeches, and reordering the recognition versions in the recognition result by confidence to correct that segment's result; a recognition version is one of the different recognition result patterns obtained for the same stretch of speech as it appears in different sub-speeches and merged sub-speeches, each result pattern being one version; for n sub-speech segments in total, adjacent sub-speeches are the pairs [(1,2), (2,3), …, (n-1,n)], n > 1;
S5, taking each sub-speech from step S4 as the main speech of S1, dividing it into a predetermined number of segments, and repeating steps S2 to S5 until every segment is a word, a word being a semantically meaningful unit composed of one or more characters;
S6, taking the audio acquired in S1 as the main speech and fuzzily cutting it into n sub-speech segments, merging adjacent sub-speeches pairwise into new merged sub-speeches, performing preliminary recognition and semantic analysis on each merged sub-speech and calculating their confidences, and repeating the merging step until the merged sub-speeches form the initial complete sentence; through recursion in these two directions, cutting and merging, the recognition result set of the whole main speech and the corresponding semantic understanding result set are finally obtained.
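The two-direction recursion described in steps S1 to S6 can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the patented implementation: `fuzzy_cut`, `recognize`, `rerank`, `merge`, and `is_word` are hypothetical callables standing in for the pause-based segmenter, the preliminary recognizer with semantic analysis, the adjacent-comparison correction, the pairwise merger, and the word-level test, and each segment carries only a single best text and confidence rather than a full set of recognition versions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Segment:
    """One (sub-)speech segment together with its current best recognition."""
    audio: object                 # opaque audio handle, e.g. a numpy slice
    text: str = ""
    confidence: float = 0.0


def adjacent_pairs(n: int) -> List[Tuple[int, int]]:
    """Adjacent sub-speech index pairs [(1,2), (2,3), ..., (n-1,n)] for n > 1 (1-based)."""
    return [(i, i + 1) for i in range(1, n)]


def recognize_downward(main: Segment,
                       fuzzy_cut: Callable[[Segment], List[Segment]],
                       recognize: Callable[[Segment], Tuple[str, float]],
                       rerank: Callable[[Segment, Segment], None],
                       is_word: Callable[[Segment], bool],
                       results: List[Segment]) -> None:
    """Cutting direction (S2-S5): cut at natural pauses, recognize each
    sub-speech, correct it by comparison with its neighbours, then recurse
    on every sub-speech that is not yet a single word."""
    subs = fuzzy_cut(main)                        # S2: fuzzy cut
    for s in subs:                                # S3: preliminary recognition
        s.text, s.confidence = recognize(s)       #     (semantic analysis inside)
    for i, j in adjacent_pairs(len(subs)):        # S4: adjacent comparison reorders
        rerank(subs[i - 1], subs[j - 1])          #     the recognition versions
    results.extend(subs)
    for s in subs:                                # S5: recurse to word level; the patent
        if not is_word(s):                        #     re-cuts into 3-5 segments here
            recognize_downward(s, fuzzy_cut, recognize, rerank, is_word, results)


def recognize_upward(subs: List[Segment],
                     merge: Callable[[Segment, Segment], Segment],
                     recognize: Callable[[Segment], Tuple[str, float]],
                     results: List[Segment]) -> None:
    """Merging direction (S6): merge adjacent sub-speeches pairwise and
    re-recognize each level until the initial complete sentence is rebuilt."""
    level = subs
    while len(level) > 1:
        merged = []
        for i, j in adjacent_pairs(len(level)):   # merge the pairs (1,2), (2,3), ...
            m = merge(level[i - 1], level[j - 1])
            m.text, m.confidence = recognize(m)
            merged.append(m)
        results.extend(merged)
        level = merged                            # each level has one segment fewer
```

In the merging direction each level has one segment fewer than the one before, so starting from n sub-speeches the initial complete sentence is rebuilt after n-1 merge levels, and every intermediate merged sub-speech contributes its own recognition versions to the final result set.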
Further, in S2, natural pauses in the speech are recognized with a pre-trained speech pause model, and the main speech is divided into sub-speech segments at these natural pauses.
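The patent relies on a pre-trained speech pause model for this step. As a rough stand-in only, the sketch below cuts a mono waveform at long low-energy stretches; the frame length, hop, minimum pause duration, and energy threshold are illustrative assumptions, not values from the patent.

```python
import numpy as np


def fuzzy_cut(audio: np.ndarray, sample_rate: int,
              frame_ms: float = 25.0, hop_ms: float = 10.0,
              min_pause_ms: float = 200.0, energy_ratio: float = 0.1) -> list:
    """Split a mono waveform at long low-energy stretches (natural pauses).

    A frame is treated as silent when its RMS energy falls below
    energy_ratio times the median frame energy; a run of silent frames
    longer than min_pause_ms closes the current sub-speech segment.
    """
    frame = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    rms = np.array([np.sqrt(np.mean(audio[i:i + frame] ** 2))
                    for i in range(0, len(audio) - frame, hop)])
    silent = rms < energy_ratio * np.median(rms)

    segments, start, run = [], 0, 0
    min_pause = int(min_pause_ms / hop_ms)
    for k, is_silent in enumerate(silent):
        run = run + 1 if is_silent else 0
        if run == min_pause:                      # pause long enough: close the segment
            end = (k - min_pause + 1) * hop
            if end > start:
                segments.append(audio[start:end])
        if run >= min_pause:                      # keep pushing the next start forward
            start = (k + 1) * hop
    if start < len(audio):
        segments.append(audio[start:])            # trailing sub-speech
    return segments
```

With a 16 kHz mono recording loaded as a float array, `fuzzy_cut(wave, 16000)` returns the list of sub-speech segments that step S3 then recognizes one by one; a trained pause model would simply replace the energy threshold used here.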
Further, the predetermined number of segments in step S5 is 3, 4, or 5.
Further, each sub-speech is preliminarily recognized through a phoneme acoustic model comparison method.
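The patent does not detail the phoneme acoustic model comparison, so the following is only a toy illustration of the idea of scoring acoustic features against per-phoneme models; `phoneme_models` is a hypothetical dictionary of diagonal-Gaussian parameters, and a practical recognizer would use HMM or neural acoustic models together with a decoder and a language model.

```python
import numpy as np


def score_frames(features: np.ndarray, phoneme_models: dict) -> list:
    """Label every feature frame with the phoneme whose diagonal-Gaussian
    model (a (mean, var) pair of vectors) gives the highest log-likelihood.
    Returns a list of (phoneme, log_likelihood) tuples, one per frame."""
    labels = []
    for x in features:                                   # one feature vector per frame
        best, best_ll = None, -np.inf
        for phone, (mean, var) in phoneme_models.items():
            ll = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)
            if ll > best_ll:
                best, best_ll = phone, ll
        labels.append((best, best_ll))
    return labels
```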
The invention also discloses a complete speech content recursive recognition system, which comprises:
an audio acquisition unit, configured to acquire a segment of audio as the main speech and fuzzily cut the main speech into n sub-speech segments;
a preliminary recognition unit, configured to preliminarily recognize each sub-speech segment, analyze its semantics by word segmentation and classification, grammar unit checking, and static semantic checking, and calculate the confidence of the preliminary recognition and semantic analysis of each segment;
a correction unit, configured to recalculate the confidence of each element of each sub-speech segment by comparing the recognition result patterns and semantics of adjacent sub-speeches, and to reorder the recognition versions in the recognition result by confidence to correct that segment's result (a sketch of one way to do this re-ranking follows this list); a recognition version is one of the different recognition result patterns obtained for the same stretch of speech as it appears in different sub-speeches and merged sub-speeches, each result pattern being one version; for n sub-speech segments in total, adjacent sub-speeches are the pairs [(1,2), (2,3), …, (n-1,n)], n > 1;
a segmentation unit, configured to take each sub-speech as the main speech of the audio acquisition unit, divide it into a predetermined number of segments, and rerun the preliminary recognition unit and the correction unit until every segment is a word, a word being a semantically meaningful unit composed of one or more characters;
a merging unit, configured to fuzzily cut the audio acquired by the audio acquisition unit into n sub-speech segments, merge adjacent sub-speeches pairwise into new merged sub-speeches, perform preliminary recognition and semantic analysis on each merged sub-speech and calculate their confidences, and repeat the merging step until the merged sub-speeches form the initial complete sentence;
the recognition result set of the whole main speech and the corresponding semantic understanding result set are finally obtained through recursion in these two directions, cutting and merging.
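The patent does not spell out how the correction unit recomputes confidence from the adjacent comparison. The sketch below is one assumed approach: candidate recognition versions of a sub-speech get a bonus proportional to the character bigrams they share with the neighbouring sub-speech's candidates, and the versions are then reordered by the adjusted score. The function name, the bigram agreement signal, and `overlap_bonus` are all illustrative assumptions.

```python
def rerank_versions(versions_a: list, versions_b: list,
                    overlap_bonus: float = 0.2) -> list:
    """Re-rank the recognition versions of one sub-speech using a neighbour.

    versions_a and versions_b are lists of (text, confidence) candidates for
    two adjacent sub-speeches.  A candidate of A receives a bonus for every
    character bigram it shares with any candidate of B, on the assumption
    that agreement across overlapping segmentations signals a correct
    hypothesis.  Returns versions_a sorted by the adjusted score.
    """
    def bigrams(text: str) -> set:
        return {text[i:i + 2] for i in range(len(text) - 1)}

    neighbour_grams = set()
    for text, _ in versions_b:
        neighbour_grams |= bigrams(text)

    rescored = []
    for text, conf in versions_a:
        shared = len(bigrams(text) & neighbour_grams)
        rescored.append((text, conf + overlap_bonus * shared))
    return sorted(rescored, key=lambda tc: tc[1], reverse=True)
```

Applying the function once per adjacent pair, (1,2), (2,3) and so on, mirrors the pairing defined above; the adjusted scores then decide which recognition version is kept for each segment.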
Compared with the prior art, the complete speech content recursive recognition method of the invention has the following advantages:
Compared with existing recognition of short unit segments, the method can improve recognition accuracy on the basis of both the complete speech content and the most finely divided words, while setting the number of recursions and the sub-speech length provides a means of presetting the recognition speed and estimating the recognition accuracy in advance. The whole process ensures that the computer fully recognizes and understands the entire sentence as well as every word, and yields the recognition result with the highest confidence.
The foregoing description is only an overview of the technical solutions of the present invention, and in order to make the technical solutions of the present invention more clearly understood and to implement them in accordance with the contents of the description, the following detailed description is given with reference to the preferred embodiments of the present invention and the accompanying drawings.
Drawings
FIG. 1 is a flowchart of a recursive recognition method for complete speech content according to the present invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
Example 1
As shown in FIG. 1, the preferred embodiment of the present invention relates to a method for recursive recognition of complete speech content, which comprises:
S1, acquiring a segment of audio as the main speech;
S2, fuzzily cutting the main speech into n sub-speech segments;
S3, preliminarily recognizing each sub-speech segment, analyzing its semantics by word segmentation and classification, grammar unit checking, and static semantic checking, and calculating the confidence of the preliminary recognition and semantic analysis of each segment;
S4, recalculating the confidence of each element of each sub-speech segment by comparing the recognition result patterns and semantics of adjacent sub-speeches, and reordering the recognition versions in the recognition result by confidence to correct that segment's result; a recognition version is one of the different recognition result patterns obtained for the same stretch of speech as it appears in different sub-speeches and merged sub-speeches, each result pattern being one version; for n sub-speech segments in total, adjacent sub-speeches are the pairs [(1,2), (2,3), …, (n-1,n)], n > 1;
S5, taking each sub-speech from step S4 as the main speech of S1, dividing it into a predetermined number of segments, and repeating steps S2 to S5 until every segment is a word, a word being a semantically meaningful unit composed of one or more characters;
S6, taking the audio acquired in S1 as the main speech and fuzzily cutting it into n sub-speech segments, merging adjacent sub-speeches pairwise into new merged sub-speeches, performing preliminary recognition and semantic analysis on each merged sub-speech and calculating their confidences, and repeating the merging step until the merged sub-speeches form the initial complete sentence; through recursion in these two directions, cutting and merging, the recognition result set of the whole main speech and the corresponding semantic understanding result set are finally obtained.
Further, in S2, natural pauses in the speech are recognized with a pre-trained speech pause model, and the main speech is divided into sub-speech segments at these natural pauses.
Further, the predetermined number of segments in step S5 is 3, 4, or 5.
In this embodiment, the information contained in the complete speech is used to further correct the results and improve the recognition rate. Setting the number of recursions and the sub-speech length provides a means of presetting the recognition speed and estimating the recognition accuracy in advance.
Example 2
A preferred embodiment of the complete speech content recursive recognition system of the invention comprises:
an audio acquisition unit, configured to acquire a segment of audio as the main speech and fuzzily cut the main speech into n sub-speech segments;
a preliminary recognition unit, configured to preliminarily recognize each sub-speech segment, analyze its semantics by word segmentation and classification, grammar unit checking, and static semantic checking, and calculate the confidence of the preliminary recognition and semantic analysis of each segment;
a correction unit, configured to recalculate the confidence of each element of each sub-speech segment by comparing the recognition result patterns and semantics of adjacent sub-speeches, and to reorder the recognition versions in the recognition result by confidence to correct that segment's result; a recognition version is one of the different recognition result patterns obtained for the same stretch of speech as it appears in different sub-speeches and merged sub-speeches, each result pattern being one version; for n sub-speech segments in total, adjacent sub-speeches are the pairs [(1,2), (2,3), …, (n-1,n)], n > 1;
a segmentation unit, configured to take each sub-speech as the main speech of the audio acquisition unit, divide it into a predetermined number of segments, and rerun the preliminary recognition unit and the correction unit until every segment is a word, a word being a semantically meaningful unit composed of one or more characters;
a merging unit, configured to fuzzily cut the audio acquired by the audio acquisition unit into n sub-speech segments, merge adjacent sub-speeches pairwise into new merged sub-speeches, perform preliminary recognition and semantic analysis on each merged sub-speech and calculate their confidences, and repeat the merging step until the merged sub-speeches form the initial complete sentence;
the recognition result set of the whole main speech and the corresponding semantic understanding result set are finally obtained through recursion in these two directions, cutting and merging.
In the above embodiments, each sub-speech is preliminarily recognized by the phoneme acoustic model comparison method.
The above description is only a preferred embodiment of the present invention and is not intended to limit it. It should be noted that those skilled in the art can make many modifications and variations without departing from the technical principle of the present invention, and such modifications and variations should also be regarded as falling within the protection scope of the present invention.

Claims (5)

1. A method for recursively recognizing complete speech content, comprising:
S1, acquiring a segment of audio as the main speech;
S2, fuzzily cutting the main speech into n sub-speech segments;
S3, preliminarily recognizing each sub-speech segment, analyzing its semantics by word segmentation and classification, grammar unit checking, and static semantic checking, and calculating the confidence of the preliminary recognition and semantic analysis of each segment;
S4, recalculating the confidence of the preliminary recognition and semantic analysis of each sub-speech segment by comparing the recognition result patterns and semantics of its adjacent sub-speeches, and reordering the recognition versions in the recognition result by confidence to correct that segment's result, wherein a recognition version is one of the different recognition result patterns obtained for the same stretch of speech as it appears in different sub-speeches and merged sub-speeches, each result pattern being one version; for n sub-speech segments in total, adjacent sub-speeches are the pairs [(1,2), (2,3), …, (n-1,n)], n > 1;
S5, taking each sub-speech from step S4 as the main speech of S1, dividing it into a predetermined number of segments, and repeating steps S2 to S5 until every segment is a word, a word being a semantically meaningful unit composed of one or more characters;
S6, taking the audio acquired in S1 as the main speech and fuzzily cutting it into n sub-speech segments, merging adjacent sub-speeches pairwise into new merged sub-speeches, performing preliminary recognition and semantic analysis on each merged sub-speech and calculating their confidences, and repeating the merging step until the merged sub-speeches form the initial complete sentence; and finally obtaining the recognition result set of the whole main speech and the corresponding semantic understanding result set through recursion in the two directions of cutting and merging.
2. The method of claim 1 wherein, in step S2, natural pauses in the speech are recognized with a pre-trained speech pause model, and the main speech is divided into sub-speech segments at these natural pauses.
3. The method for recursively recognizing complete speech content according to claim 1, wherein the predetermined number of segments in step S5 is 3, 4, or 5.
4. The method of claim 1 wherein each sub-speech is initially recognized by a phoneme acoustic model comparison method.
5. A complete speech content recursive recognition system, comprising:
an audio acquisition unit, configured to acquire a segment of audio as the main speech and fuzzily cut the main speech into n sub-speech segments;
a preliminary recognition unit, configured to preliminarily recognize each sub-speech segment, analyze its semantics by word segmentation and classification, grammar unit checking, and static semantic checking, and calculate the confidence of the preliminary recognition and semantic analysis of each segment;
a correction unit, configured to recalculate the confidence of the preliminary recognition and semantic analysis of each sub-speech segment by comparing the recognition result patterns and semantics of its adjacent sub-speeches, and to reorder the recognition versions in the recognition result by confidence to correct that segment's result, wherein a recognition version is one of the different recognition result patterns obtained for the same stretch of speech as it appears in different sub-speeches and merged sub-speeches, each result pattern being one version; for n sub-speech segments in total, adjacent sub-speeches are the pairs [(1,2), (2,3), …, (n-1,n)], n > 1;
a segmentation unit, configured to take each sub-speech as the main speech of the audio acquisition unit, divide it into a predetermined number of segments, and rerun the preliminary recognition unit and the correction unit until every segment is a word, a word being a semantically meaningful unit composed of one or more characters;
a merging unit, configured to fuzzily cut the audio acquired by the audio acquisition unit into n sub-speech segments, merge adjacent sub-speeches pairwise into new merged sub-speeches, perform preliminary recognition and semantic analysis on each merged sub-speech and calculate their confidences, and repeat the merging step until the merged sub-speeches form the initial complete sentence;
wherein the recognition result set of the whole main speech and the corresponding semantic understanding result set are finally obtained through recursion in the two directions of cutting and merging.
CN201710449747.9A 2017-06-14 2017-06-14 Complete speech content recursive recognition method Active CN107103902B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710449747.9A CN107103902B (en) 2017-06-14 2017-06-14 Complete speech content recursive recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710449747.9A CN107103902B (en) 2017-06-14 2017-06-14 Complete speech content recursive recognition method

Publications (2)

Publication Number Publication Date
CN107103902A CN107103902A (en) 2017-08-29
CN107103902B true CN107103902B (en) 2020-02-04

Family

ID=59660290

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710449747.9A Active CN107103902B (en) 2017-06-14 2017-06-14 Complete speech content recursive recognition method

Country Status (1)

Country Link
CN (1) CN107103902B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108573707B (en) * 2017-12-27 2020-11-03 北京金山云网络技术有限公司 Method, device, equipment and medium for processing voice recognition result
CN109257547B (en) * 2018-09-21 2021-04-06 南京邮电大学 Chinese online audio/video subtitle generating method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020152071A1 (en) * 2001-04-12 2002-10-17 David Chaiken Human-augmented, automatic speech recognition engine

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1455357A (en) * 2003-05-23 2003-11-12 郑方 Method for realizing multi-path dialogue for man-machine Chinese colloguial conversational system
CN1831937A (en) * 2005-03-08 2006-09-13 台达电子工业股份有限公司 Method and device for voice identification and language comprehension analysing
CN101201818A (en) * 2006-12-13 2008-06-18 李萍 Method for calculating language structure, executing participle, machine translation and speech recognition using HMM
CN104485106A (en) * 2014-12-08 2015-04-01 畅捷通信息技术股份有限公司 Voice recognition method, voice recognition system and voice recognition equipment
CN106649666A (en) * 2016-11-30 2017-05-10 浪潮电子信息产业股份有限公司 Left-right recursion-based new word discovery method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A Brief Introduction to the Principle and Making of the ZX-2029 Telephone (简述ZX-2029型电话机的原理与制作); Zhu Xuan (朱璇) et al.; Computer Knowledge and Technology (电脑知识与技术); 2013-05-31; pp. 3431-3435 *

Also Published As

Publication number Publication date
CN107103902A (en) 2017-08-29

Similar Documents

Publication Publication Date Title
CN110263322B (en) Audio corpus screening method and device for speech recognition and computer equipment
CN110364171B (en) Voice recognition method, voice recognition system and storage medium
KR102413692B1 (en) Apparatus and method for caculating acoustic score for speech recognition, speech recognition apparatus and method, and electronic device
CN109410914B (en) Method for identifying Jiangxi dialect speech and dialect point
Schuster et al. Japanese and korean voice search
US7813929B2 (en) Automatic editing using probabilistic word substitution models
WO2007097176A1 (en) Speech recognition dictionary making supporting system, speech recognition dictionary making supporting method, and speech recognition dictionary making supporting program
CN107291684B (en) Word segmentation method and system for language text
WO2014187096A1 (en) Method and system for adding punctuation to voice files
US20090265166A1 (en) Boundary estimation apparatus and method
CN110019741B (en) Question-answering system answer matching method, device, equipment and readable storage medium
WO2019100458A1 (en) Method and device for segmenting thai syllables
CN104679735A (en) Pragmatic machine translation method
JP6875819B2 (en) Acoustic model input data normalization device and method, and voice recognition device
CN107103902B (en) Complete speech content recursive recognition method
CN112818680A (en) Corpus processing method and device, electronic equipment and computer-readable storage medium
JP6242963B2 (en) Language model improvement apparatus and method, speech recognition apparatus and method
CN111933113B (en) Voice recognition method, device, equipment and medium
CN111222331B (en) Auxiliary decoding method and device, electronic equipment and readable storage medium
Granell et al. Combining handwriting and speech recognition for transcribing historical handwritten documents
Neubig et al. Improved statistical models for SMT-based speaking style transformation
Kuo et al. Morphological and syntactic features for Arabic speech recognition
CN114254628A (en) Method and device for quickly extracting hot words by combining user text in voice transcription, electronic equipment and storage medium
Milne Improving the accuracy of forced alignment through model selection and dictionary restriction
JP5344396B2 (en) Language learning device, language learning program, and language learning method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant