CN107103902B - Complete speech content recursive recognition method - Google Patents

Complete speech content recursive recognition method Download PDF

Info

Publication number
CN107103902B
Authority
CN
China
Prior art keywords
sub
voice
recognition
voices
segment
Prior art date
Legal status
Active
Application number
CN201710449747.9A
Other languages
Chinese (zh)
Other versions
CN107103902A (en)
Inventor
谢国雄
Current Assignee
Shanghai Enjoy Culture Communication Co Ltd
Original Assignee
Shanghai Enjoy Culture Communication Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Enjoy Culture Communication Co Ltd filed Critical Shanghai Enjoy Culture Communication Co Ltd
Priority to CN201710449747.9A priority Critical patent/CN107103902B/en
Publication of CN107103902A publication Critical patent/CN107103902A/en
Application granted granted Critical
Publication of CN107103902B publication Critical patent/CN107103902B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/04 Segmentation; Word boundary detection
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a complete speech content recursive recognition method, designed to improve the accuracy of speech recognition. The method comprises the following steps: preliminarily recognizing each sub-speech segment; analyzing the semantics of each segment by word segmentation and classification, grammar unit checking, and static semantic checking; and calculating confidence values for the preliminary recognition and semantic analysis of each segment. The recognition versions in each result are then reordered by confidence to correct that segment's result. The sub-speech segments produced in step S2 are also merged pairwise into new merged sub-speeches, which are again preliminarily recognized and semantically analyzed with their own confidence values, and the merging step is repeated until the initial complete sentence is rebuilt. Through recursion in these two directions, cutting and merging, the method finally obtains the recognition result set of the whole main speech and the corresponding semantic understanding result set.

Description

Complete speech content recursive recognition method
Technical Field
The invention relates to a complete speech content recursive recognition method.
Background
In conventional speech recognition systems split between a client and a server, recognition is first performed on the client; when the client's recognition score is judged to be low and its accuracy poor, recognition is performed again on the server and the server's result is used instead.
For speech longer than a single sentence, existing speech recognition technology still recognizes shorter unit segments one by one, and cannot exploit the information contained in the complete utterance to further correct the results and improve the recognition rate.
In view of the above, the present inventor has actively researched and innovated to create a complete speech content recursive recognition method with industrial application value.
Disclosure of Invention
To solve the above technical problems, an object of the present invention is to provide a complete speech content recursive recognition method that uses the complete speech content to improve a computer's speech recognition rate.
The complete speech content recursive recognition method disclosed by the invention comprises the following steps (a code sketch of the overall recursion follows the list):
S1, acquiring a segment of audio as the main speech;
S2, fuzzily cutting the main speech into n sub-speech segments;
S3, preliminarily recognizing each sub-speech segment, analyzing its semantics by word segmentation and classification, grammar unit checking, and static semantic checking, and calculating the confidence of the preliminary recognition and semantic analysis of each segment;
S4, recalculating the confidence of each element of each sub-speech segment by comparing the recognition result patterns and semantics of adjacent sub-speeches, and reordering the recognition versions in the recognition result by confidence to correct that segment's result; a recognition version is one of the different recognition result patterns obtained for the same stretch of speech as it appears in different sub-speeches and merged sub-speeches, each result pattern being one version; for n sub-speech segments in total, adjacent sub-speeches are the pairs [(1,2), (2,3), …, (n-1,n)], n > 1;
S5, taking each sub-speech from step S4 as the main speech of S1, dividing it into a predetermined number of segments, and repeating steps S2 to S5 until every segment is a word, a word being a semantically meaningful unit composed of one or more characters;
S6, taking the audio acquired in S1 as the main speech and fuzzily cutting it into n sub-speech segments, merging adjacent sub-speeches pairwise into new merged sub-speeches, performing preliminary recognition and semantic analysis on each merged sub-speech and calculating their confidences, and repeating the merging step until the merged sub-speeches form the initial complete sentence; through recursion in these two directions, cutting and merging, the recognition result set of the whole main speech and the corresponding semantic understanding result set are finally obtained.
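The two-direction recursion described in steps S1 to S6 can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the patented implementation: `fuzzy_cut`, `recognize`, `rerank`, `merge`, and `is_word` are hypothetical callables standing in for the pause-based segmenter, the preliminary recognizer with semantic analysis, the adjacent-comparison correction, the pairwise merger, and the word-level test, and each segment carries only a single best text and confidence rather than a full set of recognition versions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Segment:
    """One (sub-)speech segment together with its current best recognition."""
    audio: object                 # opaque audio handle, e.g. a numpy slice
    text: str = ""
    confidence: float = 0.0


def adjacent_pairs(n: int) -> List[Tuple[int, int]]:
    """Adjacent sub-speech index pairs [(1,2), (2,3), ..., (n-1,n)] for n > 1 (1-based)."""
    return [(i, i + 1) for i in range(1, n)]


def recognize_downward(main: Segment,
                       fuzzy_cut: Callable[[Segment], List[Segment]],
                       recognize: Callable[[Segment], Tuple[str, float]],
                       rerank: Callable[[Segment, Segment], None],
                       is_word: Callable[[Segment], bool],
                       results: List[Segment]) -> None:
    """Cutting direction (S2-S5): cut at natural pauses, recognize each
    sub-speech, correct it by comparison with its neighbours, then recurse
    on every sub-speech that is not yet a single word."""
    subs = fuzzy_cut(main)                        # S2: fuzzy cut
    for s in subs:                                # S3: preliminary recognition
        s.text, s.confidence = recognize(s)       #     (semantic analysis inside)
    for i, j in adjacent_pairs(len(subs)):        # S4: adjacent comparison reorders
        rerank(subs[i - 1], subs[j - 1])          #     the recognition versions
    results.extend(subs)
    for s in subs:                                # S5: recurse to word level; the patent
        if not is_word(s):                        #     re-cuts into 3-5 segments here
            recognize_downward(s, fuzzy_cut, recognize, rerank, is_word, results)


def recognize_upward(subs: List[Segment],
                     merge: Callable[[Segment, Segment], Segment],
                     recognize: Callable[[Segment], Tuple[str, float]],
                     results: List[Segment]) -> None:
    """Merging direction (S6): merge adjacent sub-speeches pairwise and
    re-recognize each level until the initial complete sentence is rebuilt."""
    level = subs
    while len(level) > 1:
        merged = []
        for i, j in adjacent_pairs(len(level)):   # merge the pairs (1,2), (2,3), ...
            m = merge(level[i - 1], level[j - 1])
            m.text, m.confidence = recognize(m)
            merged.append(m)
        results.extend(merged)
        level = merged                            # each level has one segment fewer
```

In the merging direction each level has one segment fewer than the one before, so starting from n sub-speeches the initial complete sentence is rebuilt after n-1 merge levels, and every intermediate merged sub-speech contributes its own recognition versions to the final result set.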
Further, in S2, natural pauses in the speech are recognized with a pre-trained speech pause model, and the main speech is divided into sub-speech segments at these natural pauses.
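The patent relies on a pre-trained speech pause model for this step. As a rough stand-in only, the sketch below cuts a mono waveform at long low-energy stretches; the frame length, hop, minimum pause duration, and energy threshold are illustrative assumptions, not values from the patent.

```python
import numpy as np


def fuzzy_cut(audio: np.ndarray, sample_rate: int,
              frame_ms: float = 25.0, hop_ms: float = 10.0,
              min_pause_ms: float = 200.0, energy_ratio: float = 0.1) -> list:
    """Split a mono waveform at long low-energy stretches (natural pauses).

    A frame is treated as silent when its RMS energy falls below
    energy_ratio times the median frame energy; a run of silent frames
    longer than min_pause_ms closes the current sub-speech segment.
    """
    frame = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    rms = np.array([np.sqrt(np.mean(audio[i:i + frame] ** 2))
                    for i in range(0, len(audio) - frame, hop)])
    silent = rms < energy_ratio * np.median(rms)

    segments, start, run = [], 0, 0
    min_pause = int(min_pause_ms / hop_ms)
    for k, is_silent in enumerate(silent):
        run = run + 1 if is_silent else 0
        if run == min_pause:                      # pause long enough: close the segment
            end = (k - min_pause + 1) * hop
            if end > start:
                segments.append(audio[start:end])
        if run >= min_pause:                      # keep pushing the next start forward
            start = (k + 1) * hop
    if start < len(audio):
        segments.append(audio[start:])            # trailing sub-speech
    return segments
```

With a 16 kHz mono recording loaded as a float array, `fuzzy_cut(wave, 16000)` returns the list of sub-speech segments that step S3 then recognizes one by one; a trained pause model would simply replace the energy threshold used here.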
Further, the predetermined number of segments in step S5 is 3, 4, or 5.
Further, each sub-speech is preliminarily recognized through a phoneme acoustic model comparison method.
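The patent does not detail the phoneme acoustic model comparison, so the following is only a toy illustration of the idea of scoring acoustic features against per-phoneme models; `phoneme_models` is a hypothetical dictionary of diagonal-Gaussian parameters, and a practical recognizer would use HMM or neural acoustic models together with a decoder and a language model.

```python
import numpy as np


def score_frames(features: np.ndarray, phoneme_models: dict) -> list:
    """Label every feature frame with the phoneme whose diagonal-Gaussian
    model (a (mean, var) pair of vectors) gives the highest log-likelihood.
    Returns a list of (phoneme, log_likelihood) tuples, one per frame."""
    labels = []
    for x in features:                                   # one feature vector per frame
        best, best_ll = None, -np.inf
        for phone, (mean, var) in phoneme_models.items():
            ll = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)
            if ll > best_ll:
                best, best_ll = phone, ll
        labels.append((best, best_ll))
    return labels
```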
The invention also discloses a complete speech content recursive recognition system, which comprises:
an audio acquisition unit, configured to acquire a segment of audio as the main speech and fuzzily cut the main speech into n sub-speech segments;
a preliminary recognition unit, configured to preliminarily recognize each sub-speech segment, analyze its semantics by word segmentation and classification, grammar unit checking, and static semantic checking, and calculate the confidence of the preliminary recognition and semantic analysis of each segment;
a correction unit, configured to recalculate the confidence of each element of each sub-speech segment by comparing the recognition result patterns and semantics of adjacent sub-speeches, and to reorder the recognition versions in the recognition result by confidence to correct that segment's result (a sketch of one way to do this re-ranking follows this list); a recognition version is one of the different recognition result patterns obtained for the same stretch of speech as it appears in different sub-speeches and merged sub-speeches, each result pattern being one version; for n sub-speech segments in total, adjacent sub-speeches are the pairs [(1,2), (2,3), …, (n-1,n)], n > 1;
a segmentation unit, configured to take each sub-speech as the main speech of the audio acquisition unit, divide it into a predetermined number of segments, and rerun the preliminary recognition unit and the correction unit until every segment is a word, a word being a semantically meaningful unit composed of one or more characters;
a merging unit, configured to fuzzily cut the audio acquired by the audio acquisition unit into n sub-speech segments, merge adjacent sub-speeches pairwise into new merged sub-speeches, perform preliminary recognition and semantic analysis on each merged sub-speech and calculate their confidences, and repeat the merging step until the merged sub-speeches form the initial complete sentence;
the recognition result set of the whole main speech and the corresponding semantic understanding result set are finally obtained through recursion in these two directions, cutting and merging.
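The patent does not spell out how the correction unit recomputes confidence from the adjacent comparison. The sketch below is one assumed approach: candidate recognition versions of a sub-speech get a bonus proportional to the character bigrams they share with the neighbouring sub-speech's candidates, and the versions are then reordered by the adjusted score. The function name, the bigram agreement signal, and `overlap_bonus` are all illustrative assumptions.

```python
def rerank_versions(versions_a: list, versions_b: list,
                    overlap_bonus: float = 0.2) -> list:
    """Re-rank the recognition versions of one sub-speech using a neighbour.

    versions_a and versions_b are lists of (text, confidence) candidates for
    two adjacent sub-speeches.  A candidate of A receives a bonus for every
    character bigram it shares with any candidate of B, on the assumption
    that agreement across overlapping segmentations signals a correct
    hypothesis.  Returns versions_a sorted by the adjusted score.
    """
    def bigrams(text: str) -> set:
        return {text[i:i + 2] for i in range(len(text) - 1)}

    neighbour_grams = set()
    for text, _ in versions_b:
        neighbour_grams |= bigrams(text)

    rescored = []
    for text, conf in versions_a:
        shared = len(bigrams(text) & neighbour_grams)
        rescored.append((text, conf + overlap_bonus * shared))
    return sorted(rescored, key=lambda tc: tc[1], reverse=True)
```

Applying the function once per adjacent pair, (1,2), (2,3) and so on, mirrors the pairing defined above; the adjusted scores then decide which recognition version is kept for each segment.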
Compared with the prior art, the complete speech content recursive recognition method of the invention has the following advantages:
Compared with existing recognition of short unit segments, the method can improve recognition accuracy on the basis of both the complete speech content and the most finely divided words, while setting the number of recursions and the sub-speech length provides a means of presetting the recognition speed and estimating the recognition accuracy in advance. The whole process ensures that the computer fully recognizes and understands the entire sentence as well as every word, and yields the recognition result with the highest confidence.
The foregoing description is only an overview of the technical solutions of the present invention, and in order to make the technical solutions of the present invention more clearly understood and to implement them in accordance with the contents of the description, the following detailed description is given with reference to the preferred embodiments of the present invention and the accompanying drawings.
Drawings
FIG. 1 is a flowchart of a recursive recognition method for complete speech content according to the present invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
Example 1
As shown in FIG. 1, the preferred embodiment of the present invention relates to a method for recursive recognition of complete speech content, which comprises:
S1, acquiring a segment of audio as the main speech;
S2, fuzzily cutting the main speech into n sub-speech segments;
S3, preliminarily recognizing each sub-speech segment, analyzing its semantics by word segmentation and classification, grammar unit checking, and static semantic checking, and calculating the confidence of the preliminary recognition and semantic analysis of each segment;
S4, recalculating the confidence of each element of each sub-speech segment by comparing the recognition result patterns and semantics of adjacent sub-speeches, and reordering the recognition versions in the recognition result by confidence to correct that segment's result; a recognition version is one of the different recognition result patterns obtained for the same stretch of speech as it appears in different sub-speeches and merged sub-speeches, each result pattern being one version; for n sub-speech segments in total, adjacent sub-speeches are the pairs [(1,2), (2,3), …, (n-1,n)], n > 1;
S5, taking each sub-speech from step S4 as the main speech of S1, dividing it into a predetermined number of segments, and repeating steps S2 to S5 until every segment is a word, a word being a semantically meaningful unit composed of one or more characters;
S6, taking the audio acquired in S1 as the main speech and fuzzily cutting it into n sub-speech segments, merging adjacent sub-speeches pairwise into new merged sub-speeches, performing preliminary recognition and semantic analysis on each merged sub-speech and calculating their confidences, and repeating the merging step until the merged sub-speeches form the initial complete sentence; through recursion in these two directions, cutting and merging, the recognition result set of the whole main speech and the corresponding semantic understanding result set are finally obtained.
Further, in S2, natural pauses in the speech are recognized with a pre-trained speech pause model, and the main speech is divided into sub-speech segments at these natural pauses.
Further, the predetermined number of segments in step S5 is 3, 4, or 5.
In this embodiment, the information contained in the complete speech is used to further correct the results and improve the recognition rate. Setting the number of recursions and the sub-speech length provides a means of presetting the recognition speed and estimating the recognition accuracy in advance.
Example 2
A preferred embodiment of the complete speech content recursive recognition system of the invention comprises:
an audio acquisition unit, configured to acquire a segment of audio as the main speech and fuzzily cut the main speech into n sub-speech segments;
a preliminary recognition unit, configured to preliminarily recognize each sub-speech segment, analyze its semantics by word segmentation and classification, grammar unit checking, and static semantic checking, and calculate the confidence of the preliminary recognition and semantic analysis of each segment;
a correction unit, configured to recalculate the confidence of each element of each sub-speech segment by comparing the recognition result patterns and semantics of adjacent sub-speeches, and to reorder the recognition versions in the recognition result by confidence to correct that segment's result; a recognition version is one of the different recognition result patterns obtained for the same stretch of speech as it appears in different sub-speeches and merged sub-speeches, each result pattern being one version; for n sub-speech segments in total, adjacent sub-speeches are the pairs [(1,2), (2,3), …, (n-1,n)], n > 1;
a segmentation unit, configured to take each sub-speech as the main speech of the audio acquisition unit, divide it into a predetermined number of segments, and rerun the preliminary recognition unit and the correction unit until every segment is a word, a word being a semantically meaningful unit composed of one or more characters;
a merging unit, configured to fuzzily cut the audio acquired by the audio acquisition unit into n sub-speech segments, merge adjacent sub-speeches pairwise into new merged sub-speeches, perform preliminary recognition and semantic analysis on each merged sub-speech and calculate their confidences, and repeat the merging step until the merged sub-speeches form the initial complete sentence;
the recognition result set of the whole main speech and the corresponding semantic understanding result set are finally obtained through recursion in these two directions, cutting and merging.
In the above embodiments, each sub-speech is preliminarily recognized by the phoneme acoustic model comparison method.
The above description is only a preferred embodiment of the present invention and is not intended to limit it. It should be noted that those skilled in the art can make many modifications and variations without departing from the technical principle of the present invention, and such modifications and variations should also be regarded as falling within the protection scope of the present invention.

Claims (5)

1. A method for recursively recognizing complete speech content, comprising:
S1, acquiring a segment of audio as the main speech;
S2, fuzzily cutting the main speech into n sub-speech segments;
S3, preliminarily recognizing each sub-speech segment, analyzing its semantics by word segmentation and classification, grammar unit checking, and static semantic checking, and calculating the confidence of the preliminary recognition and semantic analysis of each segment;
S4, recalculating the confidence of the preliminary recognition and semantic analysis of each sub-speech segment by comparing the recognition result patterns and semantics of its adjacent sub-speeches, and reordering the recognition versions in the recognition result by confidence to correct that segment's result, wherein a recognition version is one of the different recognition result patterns obtained for the same stretch of speech as it appears in different sub-speeches and merged sub-speeches, each result pattern being one version; for n sub-speech segments in total, adjacent sub-speeches are the pairs [(1,2), (2,3), …, (n-1,n)], n > 1;
S5, taking each sub-speech from step S4 as the main speech of S1, dividing it into a predetermined number of segments, and repeating steps S2 to S5 until every segment is a word, a word being a semantically meaningful unit composed of one or more characters;
S6, taking the audio acquired in S1 as the main speech and fuzzily cutting it into n sub-speech segments, merging adjacent sub-speeches pairwise into new merged sub-speeches, performing preliminary recognition and semantic analysis on each merged sub-speech and calculating their confidences, and repeating the merging step until the merged sub-speeches form the initial complete sentence; and finally obtaining the recognition result set of the whole main speech and the corresponding semantic understanding result set through recursion in the two directions of cutting and merging.
2. The method of claim 1 wherein, in step S2, natural pauses in the speech are recognized with a pre-trained speech pause model, and the main speech is divided into sub-speech segments at these natural pauses.
3. The method for recursively recognizing complete speech content according to claim 1, wherein the predetermined number of segments in step S5 is 3, 4, or 5.
4. The method of claim 1 wherein each sub-speech is initially recognized by a phoneme acoustic model comparison method.
5. A complete speech content recursive recognition system, comprising:
an audio acquisition unit, configured to acquire a segment of audio as the main speech and fuzzily cut the main speech into n sub-speech segments;
a preliminary recognition unit, configured to preliminarily recognize each sub-speech segment, analyze its semantics by word segmentation and classification, grammar unit checking, and static semantic checking, and calculate the confidence of the preliminary recognition and semantic analysis of each segment;
a correction unit, configured to recalculate the confidence of the preliminary recognition and semantic analysis of each sub-speech segment by comparing the recognition result patterns and semantics of its adjacent sub-speeches, and to reorder the recognition versions in the recognition result by confidence to correct that segment's result, wherein a recognition version is one of the different recognition result patterns obtained for the same stretch of speech as it appears in different sub-speeches and merged sub-speeches, each result pattern being one version; for n sub-speech segments in total, adjacent sub-speeches are the pairs [(1,2), (2,3), …, (n-1,n)], n > 1;
a segmentation unit, configured to take each sub-speech as the main speech of the audio acquisition unit, divide it into a predetermined number of segments, and rerun the preliminary recognition unit and the correction unit until every segment is a word, a word being a semantically meaningful unit composed of one or more characters;
a merging unit, configured to fuzzily cut the audio acquired by the audio acquisition unit into n sub-speech segments, merge adjacent sub-speeches pairwise into new merged sub-speeches, perform preliminary recognition and semantic analysis on each merged sub-speech and calculate their confidences, and repeat the merging step until the merged sub-speeches form the initial complete sentence;
wherein the recognition result set of the whole main speech and the corresponding semantic understanding result set are finally obtained through recursion in the two directions of cutting and merging.
CN201710449747.9A 2017-06-14 2017-06-14 Complete speech content recursive recognition method Active CN107103902B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710449747.9A CN107103902B (en) 2017-06-14 2017-06-14 Complete speech content recursive recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710449747.9A CN107103902B (en) 2017-06-14 2017-06-14 Complete speech content recursive recognition method

Publications (2)

Publication Number Publication Date
CN107103902A CN107103902A (en) 2017-08-29
CN107103902B true CN107103902B (en) 2020-02-04

Family

ID=59660290

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710449747.9A Active CN107103902B (en) 2017-06-14 2017-06-14 Complete speech content recursive recognition method

Country Status (1)

Country Link
CN (1) CN107103902B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108573707B (en) * 2017-12-27 2020-11-03 北京金山云网络技术有限公司 Method, device, equipment and medium for processing voice recognition result
CN109257547B (en) * 2018-09-21 2021-04-06 南京邮电大学 Chinese online audio/video subtitle generating method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020152071A1 (en) * 2001-04-12 2002-10-17 David Chaiken Human-augmented, automatic speech recognition engine

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1455357A (en) * 2003-05-23 2003-11-12 郑方 Method for realizing multi-path dialogue for man-machine Chinese colloguial conversational system
CN1831937A (en) * 2005-03-08 2006-09-13 台达电子工业股份有限公司 Method and device for voice identification and language comprehension analysing
CN101201818A (en) * 2006-12-13 2008-06-18 李萍 Method for calculating language structure, executing participle, machine translation and speech recognition using HMM
CN104485106A (en) * 2014-12-08 2015-04-01 畅捷通信息技术股份有限公司 Voice recognition method, voice recognition system and voice recognition equipment
CN106649666A (en) * 2016-11-30 2017-05-10 浪潮电子信息产业股份有限公司 Left-right recursion-based new word discovery method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A Brief Introduction to the Principle and Making of the ZX-2029 Telephone (简述ZX-2029型电话机的原理与制作); Zhu Xuan (朱璇) et al.; Computer Knowledge and Technology (电脑知识与技术); 2013-05-31; pp. 3431-3435 *

Also Published As

Publication number Publication date
CN107103902A (en) 2017-08-29

Similar Documents

Publication Publication Date Title
CN110263322B (en) Audio corpus screening method and device for speech recognition and computer equipment
CN110364171B (en) Voice recognition method, voice recognition system and storage medium
KR102413692B1 (en) Apparatus and method for caculating acoustic score for speech recognition, speech recognition apparatus and method, and electronic device
CN109410914B (en) Method for identifying Jiangxi dialect speech and dialect point
Schuster et al. Japanese and korean voice search
US7813929B2 (en) Automatic editing using probabilistic word substitution models
WO2007097176A1 (en) Speech recognition dictionary making supporting system, speech recognition dictionary making supporting method, and speech recognition dictionary making supporting program
CN107291684B (en) Word segmentation method and system for language text
WO2014187096A1 (en) Method and system for adding punctuation to voice files
US20090265166A1 (en) Boundary estimation apparatus and method
CN110019741B (en) Question-answering system answer matching method, device, equipment and readable storage medium
WO2019100458A1 (en) Method and device for segmenting thai syllables
CN104679735A (en) Pragmatic machine translation method
JP6875819B2 (en) Acoustic model input data normalization device and method, and voice recognition device
CN107103902B (en) Complete speech content recursive recognition method
CN112818680A (en) Corpus processing method and device, electronic equipment and computer-readable storage medium
JP6242963B2 (en) Language model improvement apparatus and method, speech recognition apparatus and method
CN111933113B (en) Voice recognition method, device, equipment and medium
CN111222331B (en) Auxiliary decoding method and device, electronic equipment and readable storage medium
Granell et al. Combining handwriting and speech recognition for transcribing historical handwritten documents
Neubig et al. Improved statistical models for SMT-based speaking style transformation
Kuo et al. Morphological and syntactic features for Arabic speech recognition
CN114254628A (en) Method and device for quickly extracting hot words by combining user text in voice transcription, electronic equipment and storage medium
Milne Improving the accuracy of forced alignment through model selection and dictionary restriction
JP5344396B2 (en) Language learning device, language learning program, and language learning method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant