CN107103902A - Complete speech content recursive recognition method - Google Patents

Complete speech content recursive recognition method

Info

Publication number
CN107103902A
Authority
CN
China
Prior art keywords
speech
sub-speech
segment
recognition
confidence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710449747.9A
Other languages
Chinese (zh)
Other versions
CN107103902B (en)
Inventor
谢国雄 (Xie Guoxiong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Enjoy Culture Communication Co Ltd
Original Assignee
Shanghai Enjoy Culture Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Enjoy Culture Communication Co Ltd
Priority to CN201710449747.9A
Publication of CN107103902A
Application granted
Publication of CN107103902B
Current legal status: Active
Anticipated expiration


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/04 - Segmentation; Word boundary detection
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/28 - Constructional details of speech recognition systems
    • G10L15/30 - Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)

Abstract

The present invention relates to a complete speech content recursive recognition method designed to improve speech recognition accuracy. The method of the present invention includes: preliminarily recognizing each sub-speech segment, analyzing the semantics of each segment using word segmentation and classification, syntactic unit checking and static semantic checking, and computing the confidence of the preliminary recognition and of the semantic analysis of each segment; correcting the recognition result of a segment by re-ranking the recognition versions in the result according to confidence; merging pairs of the sub-speech segments obtained in the initial step S2 into new "merged sub-speeches", performing preliminary speech recognition and semantic analysis on each, computing the confidence of the preliminary recognition and of the semantic analysis of each "merged sub-speech", and repeating this merging step until the originally complete sentence is reassembled. Through recursion in the two directions of cutting and merging, the recognition result set and the corresponding semantic understanding result set of the whole main speech are finally obtained.

Description

Complete speech content recursive recognition method
Technical field
The present invention relates to a complete speech content recursive recognition method.
Background art
Existing speech recognition apparatus that perform recognition on both a client and a server first carry out recognition on the client; when the recognition score of the client-side result is judged to be low and the recognition accuracy is poor, recognition is carried out on the server and the server's recognition result is used.
For long utterances (longer than one word), existing speech recognition technology still recognizes shorter unit segments one by one and then corrects and refines those results; it fails to use the complete information contained in the speech of complete length to improve the recognition rate.
In view of the above, the inventor has actively pursued research and innovation to create a complete speech content recursive recognition method with greater value in industry.
Summary of the invention
To solve the above technical problems, the purpose of the present invention is to provide a complete speech content recursive recognition method that uses the speech content of the complete length to improve the computer's recognition rate for speech.
The complete speech content recursive recognition method of the present invention comprises:
S1: obtain a segment of audio as the main speech;
S2: fuzzily cut the main speech into n sub-speech segments;
S3: preliminarily recognize each sub-speech segment, analyze the semantics of each segment using word segmentation and classification, syntactic unit checking and static semantic checking, and compute the confidence of the preliminary recognition and of the semantic analysis of each segment;
S4: for each sub-speech segment, compare the recognition-result texts and semantics of adjacent sub-speeches to recalculate the confidence of each element, re-rank the recognition versions in the result by confidence, and correct the recognition result of the segment, wherein a recognition version arises because the same stretch of speech appears in different "sub-speeches" and "merged sub-speeches" and therefore has several different recognition-result texts, each such text being one recognition version; for n sub-speech segments (n > 1), every two segments bracketed as [(1, 2), (2, 3), ..., (n-1, n)] are defined as adjacent sub-speeches;
S5: take each sub-speech of step S4 as the main speech of S1, divide it into a predetermined number of segments, and repeat steps S2 to S5 until the speech becomes a single word, where a word means a group of one or more characters that carries a unit of meaning;
S6: take the audio obtained in S1 as the main speech again, fuzzily cut it into n sub-speech segments, merge the sub-speeches pairwise into new "merged sub-speeches", perform preliminary speech recognition and semantic analysis on each, compute the confidence of the preliminary recognition and of the semantic analysis of each "merged sub-speech", and repeat this merging step until the originally complete sentence is reassembled; through recursion in the two directions of cutting and merging, the recognition result set and the corresponding semantic understanding result set of the whole main speech are finally obtained (a schematic sketch of this two-direction recursion follows these steps).
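The two-direction recursion of steps S2 to S6 can be pictured with a short sketch. This is only an illustration under assumptions the specification does not make: recognize, split and merge_pair are hypothetical callables standing in for the preliminary recognizer, the fuzzy cutter and audio concatenation, and the product used to combine the two confidence scores is not a formula prescribed by the description.

```python
# Illustrative sketch of the S2-S6 recursion; `recognize`, `split` and
# `merge_pair` are hypothetical stand-ins, not components named here.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Hypothesis:
    text: str
    asr_conf: float        # confidence of the preliminary recognition (S3)
    semantic_conf: float   # confidence of the semantic analysis (S3)

    @property
    def confidence(self) -> float:
        # The description does not fix a combination formula;
        # a simple product is assumed here.
        return self.asr_conf * self.semantic_conf


def recursive_recognize(audio,
                        recognize: Callable[..., List[Hypothesis]],
                        split: Callable,       # S2/S5: fuzzy cut into segments
                        merge_pair: Callable,  # S6: join two adjacent segments
                        max_depth: int = 3) -> List[Hypothesis]:
    """Collect recognition versions by cutting (S5) and merging (S6)."""
    versions: List[Hypothesis] = list(recognize(audio))  # whole main speech

    def descend(segment, depth):
        if depth > max_depth:
            return
        parts = split(segment)                 # cut into sub-speeches
        if len(parts) <= 1:
            return
        for part in parts:
            versions.extend(recognize(part))   # S3: recognize each sub-speech
            descend(part, depth + 1)           # S5: treat it as a new main speech
        merged = parts
        while len(merged) > 1:                 # S6: merge adjacent pairs until
            merged = [merge_pair(merged[i], merged[i + 1])  # the whole sentence
                      for i in range(len(merged) - 1)]      # is reassembled
            for m in merged:
                versions.extend(recognize(m))

    descend(audio, 1)
    # S4-style output: all collected versions, highest confidence first.
    return sorted(versions, key=lambda h: h.confidence, reverse=True)
```

Choosing max_depth and the number of pieces returned by the splitter is exactly the recursion-count and sub-speech-length setting discussed below as the speed-versus-accuracy control.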
Further, in S2 a speech pause model trained in advance identifies the natural pauses in the speech, and the main speech is divided into several sub-speech segments at those natural pauses.
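As an illustration of where the cut points of S2 might come from, the sketch below cuts at sustained low-energy frames. It deliberately replaces the pre-trained pause model mentioned above with a plain energy threshold, so the frame size, pause length and threshold ratio are arbitrary assumptions, not values from the specification.

```python
import numpy as np

def split_at_pauses(samples: np.ndarray, sr: int,
                    frame_ms: int = 25, min_pause_ms: int = 300,
                    energy_ratio: float = 0.1) -> list:
    """Cut a float waveform (values roughly in [-1, 1]) at natural pauses.

    Stand-in for a trained pause model: a pause is a run of frames whose
    energy stays below a fraction of the loudest frame's energy.
    """
    frame = int(sr * frame_ms / 1000)
    n_frames = len(samples) // frame
    if n_frames == 0:
        return [samples]
    energies = np.array([np.mean(samples[i * frame:(i + 1) * frame] ** 2)
                         for i in range(n_frames)])
    threshold = energy_ratio * energies.max()
    silent = energies < threshold

    segments, start, run = [], 0, 0
    min_run = max(1, min_pause_ms // frame_ms)
    for i, is_silent in enumerate(silent):
        if is_silent:
            run += 1
            if run == min_run:                   # pause confirmed: close segment
                end = (i - min_run + 1) * frame
                if end > start:
                    segments.append(samples[start:end])
            if run >= min_run:
                start = (i + 1) * frame          # next segment starts after pause
        else:
            run = 0
    if start < len(samples):
        segments.append(samples[start:])
    return segments
```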
Further, the predetermined number of segments in step S5 is 3, 4 or 5.
Further, each sub-speech segment is preliminarily recognized by comparison with a phoneme acoustic model.
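The description says only that each sub-speech is preliminarily recognized by comparison against a phoneme acoustic model. The fragment below sketches one way such a comparison could score a candidate transcript; the posterior matrix, the candidate's phoneme sequence and the uniform alignment are all assumptions made for illustration, not details of the patented method.

```python
import numpy as np

def score_candidate(posteriors: np.ndarray, phoneme_ids: list) -> float:
    """Score one candidate pronunciation against frame-level phoneme posteriors.

    `posteriors` is assumed to be a (frames x phonemes) probability matrix
    from some acoustic model; a uniform alignment replaces a proper Viterbi
    alignment to keep the sketch short.
    """
    frames = len(posteriors)
    if frames == 0 or not phoneme_ids:
        return 0.0
    bounds = np.linspace(0, frames, len(phoneme_ids) + 1, dtype=int)
    scores = []
    for k, ph in enumerate(phoneme_ids):
        span = posteriors[bounds[k]:bounds[k + 1], ph]
        if span.size:
            scores.append(span.mean())   # how well these frames match phoneme ph
    return float(np.mean(scores)) if scores else 0.0
```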
The complete speech content recursive recognition system of the present invention comprises:
an audio acquisition unit for obtaining a segment of audio as the main speech and fuzzily cutting the main speech into n sub-speech segments;
a preliminary recognition unit for preliminarily recognizing each sub-speech segment, analyzing the semantics of each segment using word segmentation and classification, syntactic unit checking and static semantic checking, and computing the confidence of the preliminary recognition and of the semantic analysis of each segment;
a correction unit for comparing, for each sub-speech segment, the recognition-result texts and semantics of adjacent sub-speeches, recalculating the confidence of each element, re-ranking the recognition versions in the result by confidence, and correcting the recognition result of the segment, wherein a recognition version arises because the same stretch of speech appears in different "sub-speeches" and "merged sub-speeches" and therefore has several different recognition-result texts, each such text being one recognition version; for n sub-speech segments (n > 1), every two segments bracketed as [(1, 2), (2, 3), ..., (n-1, n)] are defined as adjacent sub-speeches (a schematic sketch of this re-ranking follows these units);
a cutting unit for taking each sub-speech as the main speech of the audio acquisition unit, dividing it into a predetermined number of segments, and re-running the preliminary recognition unit and the correction unit until the speech becomes a single word, where a word means a group of one or more characters that carries a unit of meaning;
a merging unit for taking the audio obtained by the audio acquisition unit as the main speech again, fuzzily cutting it into n sub-speech segments, merging the sub-speeches pairwise into new "merged sub-speeches", performing preliminary speech recognition and semantic analysis on each, computing the confidence of the preliminary recognition and of the semantic analysis of each "merged sub-speech", and repeating this merging step until the originally complete sentence is reassembled;
through recursion in the two directions of cutting and merging, the recognition result set and the corresponding semantic understanding result set of the whole main speech are finally obtained.
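A compact sketch of the adjacent-pair bookkeeping and the confidence-based re-ranking of the correction unit follows. The voting scheme, in which every word accumulates the confidence of each version that contains it, is an assumption made for illustration; the description requires only that element confidences be recalculated from the texts and semantics of overlapping and adjacent versions and that versions be re-ordered by the result.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Span = Tuple[int, int]   # (start, end) of a stretch of the main speech

def adjacent_pairs(n: int) -> List[Tuple[int, int]]:
    """Adjacent sub-speeches for n segments (n > 1): [(1,2), (2,3), ..., (n-1,n)]."""
    return [(i, i + 1) for i in range(1, n)]

def rerank(versions: Dict[Span, List[Tuple[str, float]]]
           ) -> Dict[Span, List[Tuple[str, float]]]:
    """Recalculate element confidences across overlapping versions, then re-rank.

    Each span maps to the recognition versions (text, confidence) produced for
    it by the different sub-speeches and merged sub-speeches.  Texts are assumed
    to be already word-segmented (space separated) for simplicity.
    """
    word_support: Dict[str, float] = defaultdict(float)
    for cands in versions.values():
        for text, conf in cands:
            for word in text.split():
                word_support[word] += conf            # agreement across versions

    reranked: Dict[Span, List[Tuple[str, float]]] = {}
    for span, cands in versions.items():
        rescored = []
        for text, conf in cands:
            words = text.split()
            support = sum(word_support[w] for w in words) / max(len(words), 1)
            rescored.append((text, conf * support))   # corrected confidence
        reranked[span] = sorted(rescored, key=lambda v: v[1], reverse=True)
    return reranked

# Example: two versions of the same span; the one also supported by an
# overlapping merged sub-speech wins after re-ranking.
print(rerank({(0, 2): [("hello there", 0.6), ("yellow hair", 0.5)],
              (0, 4): [("hello there friend", 0.7)]})[(0, 2)][0][0])  # hello there
```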
Compared with the prior art, the complete speech content recursive recognition method of the present invention has the following advantages:
Compared with existing short-segment speech recognition techniques, it improves recognition accuracy on the basis of the complete speech content and the most finely segmented vocabulary; at the same time, by setting the number of recursion passes and the sub-speech length, it creates a means of presetting the recognition speed and estimating the recognition accuracy. The whole workflow of the present invention allows the computer to completely recognize and understand the whole sentence and every word, and to output the recognition result with the highest confidence.
The above is only an overview of the technical solution of the present invention. In order to understand the technical means of the present invention more clearly and to practice it according to the contents of the specification, preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
Fig. 1 is a flow chart of a complete speech content recursive recognition method according to the present invention.
Detailed description of the embodiments
The embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples. The following embodiments are used to illustrate the present invention but do not limit its scope.
Embodiment 1
As shown in Fig. 1, a preferred embodiment of the complete speech content recursive recognition method of the present invention comprises:
S1: obtain a segment of audio as the main speech;
S2: fuzzily cut the main speech into n sub-speech segments;
S3: preliminarily recognize each sub-speech segment, analyze the semantics of each segment using word segmentation and classification, syntactic unit checking and static semantic checking, and compute the confidence of the preliminary recognition and of the semantic analysis of each segment;
S4: for each sub-speech segment, compare the recognition-result texts and semantics of adjacent sub-speeches to recalculate the confidence of each element, re-rank the recognition versions in the result by confidence, and correct the recognition result of the segment, wherein a recognition version arises because the same stretch of speech appears in different "sub-speeches" and "merged sub-speeches" and therefore has several different recognition-result texts, each such text being one recognition version; for n sub-speech segments (n > 1), every two segments bracketed as [(1, 2), (2, 3), ..., (n-1, n)] are defined as adjacent sub-speeches;
S5: take each sub-speech of step S4 as the main speech of S1, divide it into a predetermined number of segments, and repeat steps S2 to S5 until the speech becomes a single word, where a word means a group of one or more characters that carries a unit of meaning;
S6: take the audio obtained in S1 as the main speech again, fuzzily cut it into n sub-speech segments, merge the sub-speeches pairwise into new "merged sub-speeches", perform preliminary speech recognition and semantic analysis on each, compute the confidence of the preliminary recognition and of the semantic analysis of each "merged sub-speech", and repeat this merging step until the originally complete sentence is reassembled; through recursion in the two directions of cutting and merging, the recognition result set and the corresponding semantic understanding result set of the whole main speech are finally obtained.
Further, in S2 a speech pause model trained in advance identifies the natural pauses in the speech, and the main speech is divided into several sub-speech segments at those natural pauses.
Further, the predetermined number of segments in step S5 is 3, 4 or 5.
In this embodiment, the complete information contained in the full-length speech is used to further correct the results and raise the recognition rate. By setting the number of recursion passes and the sub-speech length, a means of presetting the recognition speed and estimating the recognition accuracy is created.
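As a rough illustration of how the recursion count and sub-speech length trade recognition speed against accuracy, the sketch below counts recognizer invocations. It assumes that every cut yields the same number of segments and that only the top-level merging pass is counted; neither assumption comes from the embodiment itself, so the numbers are indicative only.

```python
def recognition_calls(n_segments: int, depth: int) -> int:
    """Rough count of recognizer invocations for a given recursion setting.

    Fewer segments or a smaller depth means fewer calls (faster recognition)
    but fewer versions to vote over (lower expected accuracy).
    """
    calls = 1                          # the whole main speech
    leaves = 1
    for _ in range(depth):
        leaves *= n_segments
        calls += leaves                # recognize each sub-speech at this level
    # merging k segments pairwise until one remains recognizes
    # (k-1) + (k-2) + ... + 1 = k*(k-1)/2 merged segments at the top level
    calls += n_segments * (n_segments - 1) // 2
    return calls

# Example: cutting into 3 segments with recursion depth 2 gives
# 1 + 3 + 9 + 3 = 16 recognizer calls under these assumptions.
print(recognition_calls(3, 2))   # -> 16
```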
Embodiment 2
A preferred embodiment of the complete speech content recursive recognition system of the present invention comprises:
an audio acquisition unit for obtaining a segment of audio as the main speech and fuzzily cutting the main speech into n sub-speech segments;
a preliminary recognition unit for preliminarily recognizing each sub-speech segment, analyzing the semantics of each segment using word segmentation and classification, syntactic unit checking and static semantic checking, and computing the confidence of the preliminary recognition and of the semantic analysis of each segment;
a correction unit for comparing, for each sub-speech segment, the recognition-result texts and semantics of adjacent sub-speeches, recalculating the confidence of each element, re-ranking the recognition versions in the result by confidence, and correcting the recognition result of the segment, wherein a recognition version arises because the same stretch of speech appears in different "sub-speeches" and "merged sub-speeches" and therefore has several different recognition-result texts, each such text being one recognition version; for n sub-speech segments (n > 1), every two segments bracketed as [(1, 2), (2, 3), ..., (n-1, n)] are defined as adjacent sub-speeches;
a cutting unit for taking each sub-speech as the main speech of the audio acquisition unit, dividing it into a predetermined number of segments, and re-running the preliminary recognition unit and the correction unit until the speech becomes a single word, where a word means a group of one or more characters that carries a unit of meaning;
a merging unit for taking the audio obtained by the audio acquisition unit as the main speech again, fuzzily cutting it into n sub-speech segments, merging the sub-speeches pairwise into new "merged sub-speeches", performing preliminary speech recognition and semantic analysis on each, computing the confidence of the preliminary recognition and of the semantic analysis of each "merged sub-speech", and repeating this merging step until the originally complete sentence is reassembled;
through recursion in the two directions of cutting and merging, the recognition result set and the corresponding semantic understanding result set of the whole main speech are finally obtained.
In each of the above embodiments, each sub-speech segment is preliminarily recognized by comparison with a phoneme acoustic model.
The above are only preferred embodiments of the present invention and are not intended to limit the invention. It should be noted that those of ordinary skill in the art can make several improvements and modifications without departing from the technical principles of the invention, and such improvements and modifications shall also fall within the protection scope of the present invention.

Claims (5)

1. A complete speech content recursive recognition method, characterized by comprising:
S1: obtaining a segment of audio as the main speech;
S2: fuzzily cutting the main speech into n sub-speech segments;
S3: preliminarily recognizing each sub-speech segment, analyzing the semantics of each segment using word segmentation and classification, syntactic unit checking and static semantic checking, and computing the confidence of the preliminary recognition and of the semantic analysis of each segment;
S4: for each sub-speech segment, comparing the recognition-result texts and semantics of adjacent sub-speeches to recalculate the confidence of each element, re-ranking the recognition versions in the result by confidence, and correcting the recognition result of the segment, wherein a recognition version arises because the same stretch of speech appears in different "sub-speeches" and "merged sub-speeches" and therefore has several different recognition-result texts, each such text being one recognition version; for n sub-speech segments (n > 1), every two segments bracketed as [(1, 2), (2, 3), ..., (n-1, n)] are defined as adjacent sub-speeches;
S5: taking each sub-speech of step S4 as the main speech of S1, dividing it into a predetermined number of segments, and repeating steps S2 to S5 until the speech becomes a single word, where a word means a group of one or more characters that carries a unit of meaning;
S6: taking the audio obtained in S1 as the main speech again, fuzzily cutting it into n sub-speech segments, merging the sub-speeches pairwise into new "merged sub-speeches", performing preliminary speech recognition and semantic analysis on each, computing the confidence of the preliminary recognition and of the semantic analysis of each "merged sub-speech", and repeating this merging step until the originally complete sentence is reassembled; through recursion in the two directions of cutting and merging, the recognition result set and the corresponding semantic understanding result set of the whole main speech are finally obtained.
2. The complete speech content recursive recognition method according to claim 1, characterized in that in S2 a speech pause model trained in advance identifies the natural pauses in the speech, and the main speech is divided into several sub-speech segments at those natural pauses.
3. The complete speech content recursive recognition method according to claim 1, characterized in that the predetermined number of segments in step S5 is 3, 4 or 5.
4. The complete speech content recursive recognition method according to claim 1, characterized in that each sub-speech segment is preliminarily recognized by comparison with a phoneme acoustic model.
5. A complete speech content recursive recognition system, characterized by comprising:
an audio acquisition unit for obtaining a segment of audio as the main speech and fuzzily cutting the main speech into n sub-speech segments;
a preliminary recognition unit for preliminarily recognizing each sub-speech segment, analyzing the semantics of each segment using word segmentation and classification, syntactic unit checking and static semantic checking, and computing the confidence of the preliminary recognition and of the semantic analysis of each segment;
a correction unit for comparing, for each sub-speech segment, the recognition-result texts and semantics of adjacent sub-speeches, recalculating the confidence of each element, re-ranking the recognition versions in the result by confidence, and correcting the recognition result of the segment, wherein a recognition version arises because the same stretch of speech appears in different "sub-speeches" and "merged sub-speeches" and therefore has several different recognition-result texts, each such text being one recognition version; for n sub-speech segments (n > 1), every two segments bracketed as [(1, 2), (2, 3), ..., (n-1, n)] are defined as adjacent sub-speeches;
a cutting unit for taking each sub-speech as the main speech of the audio acquisition unit, dividing it into a predetermined number of segments, and re-running the preliminary recognition unit and the correction unit until the speech becomes a single word, where a word means a group of one or more characters that carries a unit of meaning;
a merging unit for taking the audio obtained by the audio acquisition unit as the main speech again, fuzzily cutting it into n sub-speech segments, merging the sub-speeches pairwise into new "merged sub-speeches", performing preliminary speech recognition and semantic analysis on each, computing the confidence of the preliminary recognition and of the semantic analysis of each "merged sub-speech", and repeating this merging step until the originally complete sentence is reassembled;
through recursion in the two directions of cutting and merging, the recognition result set and the corresponding semantic understanding result set of the whole main speech are finally obtained.
CN201710449747.9A 2017-06-14 2017-06-14 Complete speech content recursive recognition method Active CN107103902B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710449747.9A CN107103902B (en) 2017-06-14 2017-06-14 Complete speech content recursive recognition method


Publications (2)

Publication Number Publication Date
CN107103902A true CN107103902A (en) 2017-08-29
CN107103902B CN107103902B (en) 2020-02-04

Family

ID=59660290

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710449747.9A Active CN107103902B (en) 2017-06-14 2017-06-14 Complete speech content recursive recognition method

Country Status (1)

Country Link
CN (1) CN107103902B (en)



Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020152071A1 * 2001-04-12 2002-10-17 David Chaiken Human-augmented, automatic speech recognition engine
CN1455357A * 2003-05-23 2003-11-12 Zheng Fang Method for realizing multi-path dialogue for a man-machine Chinese colloquial conversational system
CN1831937A * 2005-03-08 2006-09-13 Delta Electronics Inc Method and device for speech recognition and language comprehension analysis
CN101201818A * 2006-12-13 2008-06-18 Li Ping Method for computing language structure and performing word segmentation, machine translation and speech recognition using HMM
CN104485106A * 2014-12-08 2015-04-01 Chanjet Information Technology Co Ltd Voice recognition method, voice recognition system and voice recognition equipment
CN106649666A * 2016-11-30 2017-05-10 Inspur Electronic Information Industry Co Ltd Left-right recursion-based new word discovery method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhu Xuan et al.: "A brief description of the principle and construction of the ZX-2029 telephone", Computer Knowledge and Technology *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108573707A * 2017-12-27 2018-09-25 Beijing Kingsoft Cloud Network Technology Co Ltd Method, apparatus, device and medium for processing a speech recognition result
CN108573707B * 2017-12-27 2020-11-03 Beijing Kingsoft Cloud Network Technology Co Ltd Method, device, equipment and medium for processing voice recognition result
CN109257547A * 2018-09-21 2019-01-22 Nanjing University of Posts and Telecommunications Subtitle generation method for online Chinese audio and video

Also Published As

Publication number Publication date
CN107103902B (en) 2020-02-04

Similar Documents

Publication Publication Date Title
CN104166462B (en) The input method and system of a kind of word
US20180039859A1 (en) Joint acoustic and visual processing
EP2506252A3 (en) Topic specific models for text formatting and speech recognition
CN110263322A (en) Audio for speech recognition corpus screening technique, device and computer equipment
US7792671B2 (en) Augmentation and calibration of output from non-deterministic text generators by modeling its characteristics in specific environments
CN107492382A (en) Voiceprint extracting method and device based on neutral net
CN105427858A (en) Method and system for achieving automatic voice classification
CN105632501A (en) Deep-learning-technology-based automatic accent classification method and apparatus
CN107274889A (en) A kind of method and device according to speech production business paper
CN108446278B (en) A kind of semantic understanding system and method based on natural language
CN106875943A (en) A kind of speech recognition system for big data analysis
CN103164403A (en) Generation method of video indexing data and system
CN112818680B (en) Corpus processing method and device, electronic equipment and computer readable storage medium
CN106782508A (en) The cutting method of speech audio and the cutting device of speech audio
JP6875819B2 (en) Acoustic model input data normalization device and method, and voice recognition device
CN109192225A (en) The method and device of speech emotion recognition and mark
CN109977398A (en) A kind of speech recognition text error correction method of specific area
CN111933113B (en) Voice recognition method, device, equipment and medium
CN106782517A (en) A kind of speech audio keyword filter method and device
Alghifari et al. On the use of voice activity detection in speech emotion recognition
CN113129927A (en) Voice emotion recognition method, device, equipment and storage medium
CN108831450A (en) A kind of virtual robot man-machine interaction method based on user emotion identification
CN109074809B (en) Information processing apparatus, information processing method, and computer-readable storage medium
CN107103902A (en) Complete speech content recurrence recognition methods
CN113392781A (en) Video emotion semantic analysis method based on graph neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant