CN105913841A - Voice recognition method, voice recognition device and terminal - Google Patents
Voice recognition method, voice recognition device and terminal
- Publication number
- CN105913841A CN201610509372.6A
- Authority
- CN
- China
- Prior art keywords
- voice
- calibration
- letter
- identified
- word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Machine Translation (AREA)
- Telephone Function (AREA)
Abstract
The invention relates to a voice recognition method, a voice recognition device and a terminal. The method comprises the following steps of: acquiring the input to-be-recognized voice; identifying the to-be-recognized voice according to letter calibration voice or text calibration voice, wherein the letter calibration voice replaces system default standard letter voice. According to the scheme in the embodiments of the invention, voice of users can be accurately recognized.
Description
Technical field
The present disclosure relates to the field of mobile communication technology, and in particular to a voice recognition method, device and terminal.
Background
Speech recognition technology is now widely used. Its goal is to convert the lexical content of human speech into computer-readable input, such as key presses, binary codes or character strings. Applications of speech recognition technology include voice dialing, voice navigation, indoor device control, voice document retrieval, simple dictation data entry and so on.
To meet the different needs of users, speech recognition technology has begun to add dialect adaptation, for example for Cantonese and Sichuanese. However, for languages with a standard-pronunciation letter configuration, such as Mandarin and English, speech recognition systems are all provided with default standard letter voices. If the user spells out speech with a local accent, and the accent differs greatly from the standard, the recognition rate can become extremely low and the speech recognition function almost fails.
Summary of the invention
The present disclosure provides a voice recognition method, device and terminal that can recognize a user's voice more accurately.
According to a first aspect of the embodiments of the present disclosure, a voice recognition method is provided, including:
acquiring an input voice to be recognized;
recognizing the voice to be recognized according to a letter calibration voice or a text calibration voice, wherein the letter calibration voice replaces the system default standard letter voice.
Optionally, recognizing the voice to be recognized according to the text calibration voice includes:
composing a new text calibration voice from the letter calibration voice;
recognizing the input voice to be recognized according to the text calibration voice.
Optionally, recognizing the voice to be recognized according to the text calibration voice includes:
acquiring a stored text calibration voice, wherein the stored text calibration voice is a new text calibration voice composed of the recognized voice after a historical voice to be recognized has been recognized according to the letter calibration voice;
recognizing the input voice to be recognized according to the acquired text calibration voice.
Optionally, the letter calibration voice replacing the system default standard letter voice includes:
collecting the letter calibration voice by recording the pronunciation of all letters of the alphabet;
replacing the system default standard letter voice with the collected letter calibration voice.
Optionally, recognizing the input voice to be recognized according to the text calibration voice includes:
acquiring voice feature information of the text calibration voice and of the voice to be recognized;
recognizing the input voice to be recognized according to the matching relationship between the voice feature information of the text calibration voice and that of the voice to be recognized.
Optionally, the voice feature information may include one or more of the following: the timbre, pitch, duration and intensity of the voice.
Optionally, composing a new text calibration voice from the letter calibration voice includes:
obtaining the new text calibration voice by spelling with a single letter calibration voice; or
obtaining the new text calibration voice by combining multiple letter calibration voices and spelling them according to a liaison rule.
Optionally, a fuzzy approximation relation is set between designated letters in the letter calibration voice.
According to a second aspect of the embodiments of the present disclosure, a voice recognition device is provided, including:
an acquisition module, configured to acquire an input voice to be recognized;
a voice recognition module, configured to recognize, according to a letter calibration voice or a text calibration voice, the voice to be recognized acquired by the acquisition module, wherein the letter calibration voice replaces the system default standard letter voice.
Optionally, the voice recognition module includes:
a first recognition submodule, configured to compose a new text calibration voice from the letter calibration voice and to recognize the input voice to be recognized according to the text calibration voice; or
a second recognition submodule, configured to acquire a stored text calibration voice and to recognize the input voice to be recognized according to the acquired text calibration voice, wherein the stored text calibration voice is a new text calibration voice composed of the recognized voice after a historical voice to be recognized has been recognized according to the letter calibration voice.
Optionally, the device further includes:
a letter voice replacement module, configured to collect the letter calibration voice by recording the pronunciation of all letters of the alphabet, and to replace the system default standard letter voice with the collected letter calibration voice.
Optionally, the voice recognition module acquires voice feature information of the text calibration voice and of the voice to be recognized, and recognizes the input voice to be recognized according to the matching relationship between the voice feature information of the text calibration voice and that of the voice to be recognized.
Optionally, the first recognition submodule obtains the new text calibration voice by spelling with a single letter calibration voice, or by combining multiple letter calibration voices and spelling them according to a liaison rule.
Optionally, the device further includes:
a fuzzy setting module, configured to set a fuzzy approximation relation between designated letters in the letter calibration voice.
According to a third aspect of the embodiments of the present disclosure, a mobile terminal is provided, including:
a processor and a memory for storing instructions executable by the processor;
wherein the processor is configured to:
acquire an input voice to be recognized;
recognize the voice to be recognized according to a letter calibration voice or a text calibration voice, wherein the letter calibration voice replaces the system default standard letter voice.
The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects:
After acquiring the input voice to be recognized, the present disclosure can recognize it according to the letter calibration voice or the text calibration voice, wherein the letter calibration voice has replaced the system default standard letter voice. As a result, the user's voice can be recognized accurately even when it carries a local accent, which improves the voice recognition capability.
Further, the present disclosure supports two processing modes. In the first, a new text calibration voice is composed from the letter calibration voice, and the input voice to be recognized is recognized according to that text calibration voice. In the second, a stored text calibration voice is acquired, wherein the stored text calibration voice is a new text calibration voice composed of the recognized voice after a historical voice to be recognized has been recognized according to the letter calibration voice, and the input voice to be recognized is recognized according to the acquired text calibration voice. Recognizing the input voice according to the text calibration voice in either way improves both the voice recognition capability and the recognition efficiency.
Further, the present disclosure can record the pronunciation of all letters of the alphabet and use the recordings as the letter calibration voice.
Further, the present disclosure can recognize the input voice to be recognized according to the matching relationship between the voice feature information of the text calibration voice and that of the voice to be recognized.
Further, the present disclosure can obtain a new text calibration voice by spelling with a single letter calibration voice, or by combining multiple letter calibration voices and spelling them according to a liaison rule.
Further, the present disclosure can set a fuzzy approximation relation between designated letters in the letter calibration voice, which addresses the problem that some letters are pronounced similarly in certain accents.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the present disclosure.
Brief description of the drawings
The accompanying drawings are incorporated into and constitute a part of this specification. They show embodiments consistent with the present disclosure and, together with the specification, serve to explain the principles of the present disclosure.
Fig. 1 is a flowchart of a voice recognition method according to an exemplary embodiment of the present disclosure.
Fig. 2 is a flowchart of another voice recognition method according to an exemplary embodiment of the present disclosure.
Fig. 3 is a block diagram of a voice recognition device according to an exemplary embodiment of the present disclosure.
Fig. 4 is a block diagram of another voice recognition device according to an exemplary embodiment of the present disclosure.
Fig. 5 is a structural block diagram of a mobile terminal according to an exemplary embodiment of the present disclosure.
Fig. 6 is a structural block diagram of a device according to an exemplary embodiment of the present disclosure.
Detailed description
Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of devices and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
The terms used in the present disclosure are for the purpose of describing particular embodiments only and are not intended to limit the present disclosure. The singular forms "a", "said" and "the" used in the present disclosure and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third and so on may be used in the present disclosure to describe various information, the information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present disclosure, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon" or "in response to determining".
The present disclosure provides a voice recognition method, device and terminal that can recognize a user's voice more accurately.
Fig. 1 is a flowchart of a voice recognition method according to an exemplary embodiment of the present disclosure.
The method can be applied in a terminal. As shown in Fig. 1, the method may include the following steps:
In step 101, an input voice to be recognized is acquired.
In step 102, the voice to be recognized is recognized according to a letter calibration voice or a text calibration voice, wherein the letter calibration voice replaces the system default standard letter voice.
In this step, a new text calibration voice may be composed from the letter calibration voice, and the input voice to be recognized may be recognized according to that text calibration voice. Alternatively, a stored text calibration voice may be acquired and the input voice to be recognized may be recognized according to the acquired text calibration voice, wherein the stored text calibration voice is a new text calibration voice composed of the recognized voice after a historical voice to be recognized has been recognized according to the letter calibration voice.
The letter calibration voice may be collected by recording the pronunciation of all letters of the alphabet, and the collected letter calibration voice then replaces the system default standard letter voice.
Composing a new text calibration voice from the letter calibration voice in this step may include: obtaining the new text calibration voice by spelling with a single letter calibration voice; or obtaining the new text calibration voice by combining multiple letter calibration voices and spelling them according to a liaison rule.
In this step, voice feature information of the text calibration voice and of the voice to be recognized may be acquired, and the input voice to be recognized may be recognized according to the matching relationship between the two. The voice feature information may include one or more of the following: the timbre, pitch, duration and intensity of the voice.
As can be seen from this embodiment, the technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects: after acquiring the input voice to be recognized, the present disclosure can recognize it according to the letter calibration voice or the text calibration voice, wherein the letter calibration voice has replaced the system default standard letter voice. As a result, the user's voice can be recognized accurately even when it carries a local accent, which improves the voice recognition capability.
Fig. 2 is a flowchart of another voice recognition method according to an exemplary embodiment of the present disclosure.
The method can be applied in a terminal. Compared with Fig. 1, this embodiment describes the technical solution of the present disclosure in more detail.
The technical solution is described in detail below with reference to Fig. 2. As shown in Fig. 2, the method may include the following steps:
In step 201, a letter calibration voice is obtained by collecting the user's own recordings of the pronunciation of all letters of the alphabet.
In the present disclosure, the pronunciations of all letters recorded by the user are referred to as the letter calibration voice. The present disclosure provides a self-recording function for the standard-pronunciation letters: the user records every pronunciation letter once to obtain the letter calibration voice, which is subsequently used as the standard. This addresses the accent problem in pronunciations built from the standard-pronunciation letter configuration.
The letters mentioned above may be, for example, letters of English, Chinese or other languages.
In step 202, the letter calibration voice recorded by the user replaces the system's original default standard letter voice.
Because the system's default standard letter voice makes it difficult to recognize letter pronunciations with an accent, the present disclosure replaces the system's original default standard letter voice with the letter calibration voice recorded by the user. The letter pronunciation standard set in the system then uses the collected letter calibration voice as the recognition criterion, so letter pronunciations with a local accent are recognized easily.
In step 203, a new text calibration voice is composed from the letter calibration voice.
When people read aloud any character or word, the pronunciation is produced either from a single letter or by combining multiple single letters according to the corresponding liaison rule, and a voice recognition system can learn this liaison behavior as well. Therefore, in this step the present disclosure can obtain a new text calibration voice by spelling with a single letter calibration voice, or by combining multiple letter calibration voices and spelling them according to a liaison rule.
For example, the pinyin of the Chinese word for apple is pingguo. The single letters or letter combinations p, ing, g, u and o can be combined and spelled in the liaison pattern p-ing-g-u-o to obtain a new text standard voice. In other words, after the voice recognition system replaces the default standard letter voices with the letter calibration voices recorded by the user, it recombines multiple letter calibration voices by the same liaison rule, or directly uses a single letter calibration voice (for example, where a single letter forms a word), to obtain a new text calibration voice. The new text calibration voice can then replace the text voice that the system derived from the standard letter voices.
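The pingguo example can be sketched as follows. This is a simplified illustration that assumes each letter calibration voice is stored as a list of audio samples and that spelling is plain concatenation in liaison order; real liaison would also smooth the transitions between units. The function name `compose_text_voice` and the sample values are hypothetical.

```python
from typing import Dict, List, Sequence


def compose_text_voice(
    units: Sequence[str], letter_voices: Dict[str, List[float]]
) -> List[float]:
    """Join the calibration voices of the given units in spelling order."""
    samples: List[float] = []
    for unit in units:
        if unit not in letter_voices:
            raise KeyError(f"no calibration voice recorded for unit {unit!r}")
        samples.extend(letter_voices[unit])
    return samples


# Placeholder sample data standing in for the user's recordings.
letter_voices = {
    "p": [0.1],
    "ing": [0.2, 0.3],
    "g": [0.4],
    "u": [0.5],
    "o": [0.6],
}
pingguo_voice = compose_text_voice(["p", "ing", "g", "u", "o"], letter_voices)
```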
Liaison refers to the following speech phenomenon. In English, for example, when within the same sense group the previous word ends with a consonant phoneme and the next word begins with a vowel phoneme, the two phonemes are naturally joined and read together when speaking or reading a sentence aloud. The syllable formed by liaison is generally not stressed; it is simply passed over lightly and naturally. A liaison rule refers to a liaison convention. In the case of "consonant + vowel" liaison, for example, the rule is that if the previous of two adjacent words ends with a consonant and the next word begins with a vowel, the consonant and the vowel are spelled together and linked.
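The "consonant + vowel" rule described above can be sketched as a check over adjacent words. This sketch uses spelling as a stand-in for phonemes, which is a deliberate simplification (a word such as "take" ends in a written vowel but a spoken consonant); the function name `liaison_pairs` and the vowel set are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

VOWEL_LETTERS = set("aeiou")


def liaison_pairs(words: Sequence[str]) -> List[Tuple[str, str]]:
    """Return adjacent word pairs where a final consonant meets an initial vowel."""
    pairs: List[Tuple[str, str]] = []
    for prev, nxt in zip(words, words[1:]):
        if not prev or not nxt:
            continue
        last, first = prev[-1].lower(), nxt[0].lower()
        ends_with_consonant = last.isalpha() and last not in VOWEL_LETTERS
        starts_with_vowel = first in VOWEL_LETTERS
        if ends_with_consonant and starts_with_vowel:
            pairs.append((prev, nxt))
    return pairs
```

For the phrase "not at all", both word boundaries satisfy the rule, so both pairs are returned as candidates for linking.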
It should be noted that a system generally comes with a text voice library, in which common characters or words are spelled with the default standard letter voices and stored. In the present disclosure, all text voices that come with the system can be spelled again with the letter calibration voices to obtain new text standard voices, which then replace the original text voices.
In step 204, the user's voice to be recognized is recognized according to the text calibration voice.
In this step, voice feature information of the text calibration voice and of the voice to be recognized is acquired, and the input voice to be recognized is recognized according to the matching relationship between the two. The voice feature information may include one or more of the following: the timbre, pitch, duration and intensity of the voice.
It should be noted that the input voice to be recognized may be recognized from the voice feature information using an existing recognition algorithm, which the present disclosure does not limit.
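Since the recognition algorithm is left open, the following is only one possible sketch of the matching in step 204. It assumes each voice is reduced to a four-number feature tuple (timbre, pitch, duration, intensity) and that the closest template under Euclidean distance is taken as the match. The function name `recognize`, the distance metric and the sample values are assumptions, and a practical system would normalize the features first so that no single one dominates.

```python
import math
from typing import Dict, Tuple

# Hypothetical feature tuple: (timbre, pitch, duration, intensity).
Features = Tuple[float, float, float, float]


def recognize(features: Features, templates: Dict[str, Features]) -> str:
    """Return the text whose calibration voice features are closest to the input."""
    if not templates:
        raise ValueError("no text calibration voices available")
    return min(templates, key=lambda text: math.dist(features, templates[text]))


# Placeholder text calibration voices and one input utterance.
templates = {
    "pingguo": (0.2, 220.0, 0.6, 0.5),
    "xiangjiao": (0.8, 180.0, 0.9, 0.4),
}
result = recognize((0.25, 215.0, 0.62, 0.5), templates)
```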
It should also be noted that a local accent may contain pronunciations that are easily confused. The present disclosure can therefore set a fuzzy approximation relation between designated letters in the letter calibration voice, associating pronunciations that are likely to be confused, for example by setting the letter pronunciations s=sh, c=ch and so on.
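One way to realize such a fuzzy approximation relation is to map every unit in a confusable group to a single representative before comparing spellings. The sketch below assumes that approach; the names `FUZZY_GROUPS`, `build_canonical_map` and `same_under_fuzzy` are hypothetical, and only the two example groups from the text (s=sh, c=ch) are included.

```python
from typing import Dict, Iterable, List, Sequence, Set

# Groups of units treated as approximately equal, taken from the examples above.
FUZZY_GROUPS: List[Set[str]] = [{"s", "sh"}, {"c", "ch"}]


def build_canonical_map(groups: Iterable[Set[str]]) -> Dict[str, str]:
    """Map every unit in a group to that group's shortest member."""
    canonical: Dict[str, str] = {}
    for group in groups:
        representative = min(group, key=lambda unit: (len(unit), unit))
        for unit in group:
            canonical[unit] = representative
    return canonical


def same_under_fuzzy(
    a: Sequence[str], b: Sequence[str], canonical: Dict[str, str]
) -> bool:
    """Compare two unit sequences after collapsing confusable units."""
    def normalize(seq: Sequence[str]) -> List[str]:
        return [canonical.get(unit, unit) for unit in seq]

    return normalize(a) == normalize(b)


canonical = build_canonical_map(FUZZY_GROUPS)
```

With this mapping, the spellings sh-i and s-i compare as equal, while s-i and c-i remain distinct because the two groups are kept separate.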
In the solution of the present disclosure, the user sets the letter calibration voice according to his or her own accent: all pronunciation letters are recorded once by the user as the letter calibration voice, which replaces the system's standard letter voice, and the input voice to be recognized is then recognized with a new text calibration voice composed from the letter calibration voice. This addresses the accent problem in pronunciations built from the standard-pronunciation letter configuration and can improve the recognition accuracy of voice input.
It should be noted that, after a historical voice to be recognized has been recognized according to the letter calibration voice, the present disclosure can also compose a new text calibration voice from the recognized voice and store it. The stored text calibration voice can then be acquired directly, and the input voice to be recognized can be recognized according to the acquired text calibration voice.
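Reusing stored text calibration voices can be sketched as a small cache. This is an assumption-laden illustration: the class name `TextVoiceCache`, the fallback to a `compose` function when nothing is stored, and the representation of a voice as a list of samples are all hypothetical and are not specified by the disclosure.

```python
from typing import Callable, Dict, List, Sequence


class TextVoiceCache:
    """Stores text calibration voices so they need not be composed again."""

    def __init__(self, compose: Callable[[Sequence[str]], List[float]]) -> None:
        self._compose = compose  # builds a voice from letter calibration units
        self._stored: Dict[str, List[float]] = {}

    def remember(self, text: str, voice: Sequence[float]) -> None:
        """Store a voice obtained from a previously recognized utterance."""
        self._stored[text] = list(voice)

    def get(self, text: str, units: Sequence[str]) -> List[float]:
        """Return the stored voice for text, composing and storing it if absent."""
        if text not in self._stored:
            self._stored[text] = self._compose(units)
        return self._stored[text]
```

Looking up a stored voice avoids composing it again from the letter calibration voices, which is consistent with the recognition efficiency benefit the disclosure attributes to this mode.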
Corresponding to the foregoing method embodiments, the present disclosure also provides a voice recognition device, a terminal and corresponding embodiments.
Fig. 3 is a block diagram of a voice recognition device according to an exemplary embodiment of the present disclosure.
The device may be provided in a terminal. As shown in Fig. 3, the voice recognition device may include an acquisition module 31 and a voice recognition module 32.
The acquisition module 31 is configured to acquire an input voice to be recognized.
The voice recognition module 32 is configured to recognize, according to a letter calibration voice or a text calibration voice, the voice to be recognized acquired by the acquisition module, wherein the letter calibration voice replaces the system default standard letter voice.
The voice recognition module 32 may compose a new text calibration voice from the letter calibration voice and recognize the input voice to be recognized according to that text calibration voice. Alternatively, it may acquire a stored text calibration voice and recognize the input voice to be recognized according to the acquired text calibration voice, wherein the stored text calibration voice is a new text calibration voice composed of the recognized voice after a historical voice to be recognized has been recognized according to the letter calibration voice.
As can be seen from this embodiment, after acquiring the input voice to be recognized, the present disclosure can recognize it according to the letter calibration voice or the text calibration voice, wherein the letter calibration voice has replaced the system default standard letter voice. As a result, the user's voice can be recognized accurately even when it carries a local accent, which improves the voice recognition capability.
Fig. 4 is a block diagram of another voice recognition device according to an exemplary embodiment of the present disclosure.
The device may be provided in a terminal. As shown in Fig. 4, the voice recognition device may include an acquisition module 31, a voice recognition module 32, a letter voice replacement module 33 and a fuzzy setting module 34.
For the functions of the acquisition module 31 and the voice recognition module 32, see the description of Fig. 3.
The voice recognition module 32 may include a first recognition submodule 321 or a second recognition submodule 322.
The first recognition submodule 321 is configured to compose a new text calibration voice from the letter calibration voice and to recognize the input voice to be recognized according to that text calibration voice.
The second recognition submodule 322 is configured to acquire a stored text calibration voice and to recognize the input voice to be recognized according to the acquired text calibration voice, wherein the stored text calibration voice is a new text calibration voice composed of the recognized voice after a historical voice to be recognized has been recognized according to the letter calibration voice.
Recognizing the input voice to be recognized according to the text calibration voice may include: acquiring voice feature information of the text calibration voice and of the voice to be recognized; and recognizing the input voice to be recognized according to the matching relationship between the two. The voice feature information may include one or more of the following: the timbre, pitch, duration and intensity of the voice.
The device may further include a letter voice replacement module 33.
The letter voice replacement module 33 is configured to collect the letter calibration voice by recording the pronunciation of all letters of the alphabet, and to replace the system default standard letter voice with the collected letter calibration voice. Because the system's default standard letter voice makes it difficult to recognize letter pronunciations with an accent, the present disclosure replaces the system's original default standard letter voice with the letter calibration voice recorded by the user. The letter pronunciation standard set in the system then uses the collected letter calibration voice as the recognition criterion, so letter pronunciations with a local accent are recognized easily.
The voice recognition module 32 acquires voice feature information of the text calibration voice and of the voice to be recognized, and recognizes the input voice to be recognized according to the matching relationship between the voice feature information of the text calibration voice and that of the voice to be recognized.
The first recognition submodule 321 obtains the new word calibration voice by spelling it
out from single letter calibration voices, or by combining multiple letter calibration
voices and spelling them out according to liaison rules.
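The two ways of composing a word calibration voice (letter by letter, or with letters
combined under liaison rules) can be sketched as a segmentation step followed by
concatenation. The disclosure does not define the liaison rules, so this sketch assumes
the simplest possible form: a table `liaison_voices` holding a recording for certain
two-letter groups, which is preferred over the two single letters when present. The
function name and the `bytes` representation are likewise hypothetical.

```python
from typing import Dict, Optional


def compose_word_calibration(
    word: str,
    letter_voices: Dict[str, bytes],
    liaison_voices: Optional[Dict[str, bytes]] = None,
) -> bytes:
    """Concatenate calibration voices for a word, left to right.

    At each position a two-letter liaison group is used if one is defined;
    otherwise the single letter calibration voice is used. A letter with no
    calibration voice raises KeyError.
    """
    liaison_voices = liaison_voices or {}
    segments = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if len(pair) == 2 and pair in liaison_voices:
            segments.append(liaison_voices[pair])
            i += 2
        else:
            segments.append(letter_voices[word[i]])
            i += 1
    return b"".join(segments)
```

For example, with placeholder recordings `b"[s]"`, `b"[h]"`, and `b"[e]"`, the word
"she" composes to `b"[s][h][e]"` without liaison and to `b"[sh][e]"` when a recording
for the group "sh" is supplied.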
The device may further include a fuzzy setting module 34.
The fuzzy setting module 34 is configured to set fuzzy approximation relations between
designated letters in the letter calibration voice. Considering that an accent may cause
some pronunciations to be confused with one another, the disclosure may set fuzzy
approximation relations between designated letters in the letter calibration voice,
associating pronunciations that are likely to be confused, for example by setting the
letter pronunciations s=sh, c=ch, and so on.
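The fuzzy approximation relation (for example s=sh, c=ch) can be sketched as mapping
each pronunciation unit to a canonical representative before comparison. The pair list
mirrors the two examples given in the text; the names `FUZZY_PAIRS`,
`build_canonical_map`, and `fuzzy_equal` are hypothetical, and the sketch handles only
directly listed pairs, not chains of pairs.

```python
from typing import Dict, Iterable, Tuple

# The two example relations from the description: s=sh and c=ch.
FUZZY_PAIRS = [("s", "sh"), ("c", "ch")]


def build_canonical_map(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map both members of each pair to the first member of that pair."""
    canonical: Dict[str, str] = {}
    for first, second in pairs:
        canonical[first] = first
        canonical[second] = first
    return canonical


def fuzzy_equal(unit_a: str, unit_b: str, canonical: Dict[str, str]) -> bool:
    """Treat two pronunciation units as equal if they share a canonical form.

    Units that appear in no pair are compared as-is.
    """
    return canonical.get(unit_a, unit_a) == canonical.get(unit_b, unit_b)
```

With this map, a speaker whose accent merges "s" and "sh" would still match a template
that expects the other member of the pair, while unrelated units such as "s" and "c"
remain distinct.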
Therefore, in the scheme of the disclosure, the user sets the letter calibration voice
according to his or her own accent by recording every pronounced letter once as the
letter calibration voice, which replaces the system's letter standard pronunciation. The
input voice to be identified is then recognized using the new word calibration voice
composed from the letter calibration voice. This addresses the accent problem that
arises when recognition is configured with standard letter pronunciations, and can
improve the recognition rate of voice input.
For the implementation of the functions and roles of the units in the above device,
refer to the implementation of the corresponding steps in the above method; details are
not repeated here.
Since the device embodiment basically corresponds to the method embodiment, refer to the
description of the method embodiment for the relevant parts. The device embodiment
described above is merely illustrative. The units described as separate components may
or may not be physically separate, and the components shown as units may or may not be
physical units; that is, they may be located in one place or distributed across multiple
network units. Some or all of the modules may be selected according to actual needs to
achieve the purpose of the scheme of the disclosure. Those of ordinary skill in the art
can understand and implement it without creative effort.
Fig. 5 is a block diagram according to an exemplary embodiment of the disclosure.
As shown in Fig. 5, it includes: a processor 501 and a memory 502 for storing
processor-executable instructions;
wherein the processor 501 is configured to:
obtain the input voice to be identified;
recognize the voice to be identified according to a letter calibration voice or a word
calibration voice, wherein the letter calibration voice replaces the system default
letter standard pronunciation.
It should be noted that, for the other programs stored in the memory 502, refer to the
description in the preceding method flow; they are not repeated here. The processor 501
is further configured to execute the other programs stored in the memory 502.
Fig. 6 is a structural block diagram of equipment according to an exemplary embodiment
of the disclosure. For example, the equipment 600 may be a mobile phone, a computer, a
digital broadcast terminal, a messaging device, a game console, a tablet device, medical
equipment, fitness equipment, a personal digital assistant, and so on.
Referring to Fig. 6, the equipment 600 may include one or more of the following
components: a processing component 602, a memory 604, a power component 606, a
multimedia component 608, an audio component 610, an input/output (I/O) interface 612,
a sensor component 614, and a communication component 616.
The processing component 602 generally controls the overall operation of the equipment
600, such as operations associated with display, telephone calls, data communication,
camera operation, and recording operation. The processing component 602 may include one
or more processors 620 to execute instructions so as to complete all or some of the
steps of the above method. In addition, the processing component 602 may include one or
more modules that facilitate interaction between the processing component 602 and other
components. For example, the processing component 602 may include a multimedia module
to facilitate interaction between the multimedia component 608 and the processing
component 602.
The memory 604 is configured to store various types of data to support operation of the
equipment 600. Examples of such data include instructions for any application or method
operated on the equipment 600, contact data, phone book data, messages, pictures,
videos, and so on. The memory 604 may be implemented by any type of volatile or
non-volatile storage device or a combination thereof, such as static random access
memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable
programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only
memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
The power component 606 supplies power to the various components of the equipment 600.
The power component 606 may include a power management system, one or more power
supplies, and other components associated with generating, managing, and distributing
power for the equipment 600.
The multimedia component 608 includes a screen that provides an output interface between
the equipment 600 and the user. In some embodiments, the screen may include a liquid
crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the
screen may be implemented as a touch screen to receive input signals from the user. The
touch panel includes one or more touch sensors to sense touches, slides, and gestures on
the touch panel. The touch sensor may sense not only the boundary of a touch or slide
action, but also the duration and pressure associated with the touch or slide operation.
In some embodiments, the multimedia component 608 includes a front camera and/or a rear
camera. When the equipment 600 is in an operating mode, such as a shooting mode or a
video mode, the front camera and/or the rear camera can receive external multimedia
data. Each front camera and rear camera may be a fixed optical lens system or have focal
length and optical zoom capability.
The audio component 610 is configured to output and/or input audio signals. For example,
the audio component 610 includes a microphone (MIC). When the equipment 600 is in an
operating mode, such as a call mode, a recording mode, or a voice recognition mode, the
microphone is configured to receive external audio signals. The received audio signals
may be further stored in the memory 604 or sent via the communication component 616. In
some embodiments, the audio component 610 also includes a speaker for outputting audio
signals.
The I/O interface 612 provides an interface between the processing component 602 and
peripheral interface modules. The peripheral interface modules may be a keyboard, a
click wheel, buttons, and so on. The buttons may include, but are not limited to, a home
button, a volume button, a start button, and a lock button.
The sensor component 614 includes one or more sensors for providing status assessments
of various aspects of the equipment 600. For example, the sensor component 614 can
detect the open/closed state of the equipment 600 and the relative positioning of
components, such as the display and keypad of the equipment 600. The sensor component
614 can also detect a change in position of the equipment 600 or of one of its
components, the presence or absence of user contact with the equipment 600, the
orientation or acceleration/deceleration of the equipment 600, and temperature changes
of the equipment 600. The sensor component 614 may include a proximity sensor configured
to detect the presence of nearby objects without any physical contact. The sensor
component 614 may also include a light sensor, such as a CMOS or CCD image sensor, for
use in imaging applications. In some embodiments, the sensor component 614 may also
include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure
sensor, or a temperature sensor.
The communication component 616 is configured to facilitate wired or wireless
communication between the equipment 600 and other devices. The equipment 600 can access
a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a
combination thereof. In one exemplary embodiment, the communication component 616
receives broadcast signals or broadcast-related information from an external broadcast
management system via a broadcast channel. In one exemplary embodiment, the
communication component 616 also includes a near-field communication (NFC) module to
facilitate short-range communication. For example, the NFC module may be implemented
based on radio frequency identification (RFID) technology, Infrared Data Association
(IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other
technologies.
In an exemplary embodiment, the equipment 600 may be implemented by one or more
application-specific integrated circuits (ASIC), digital signal processors (DSP),
digital signal processing devices (DSPD), programmable logic devices (PLD),
field-programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors,
or other electronic components, for performing the above method.
In an exemplary embodiment, a non-transitory computer-readable storage medium including
instructions is also provided, for example the memory 604 including instructions, which
can be executed by the processor 620 of the equipment 600 to complete the above method.
For example, the non-transitory computer-readable storage medium may be a ROM, a random
access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage
device, and so on.
A non-transitory computer-readable storage medium, wherein when the instructions in the
storage medium are executed by a processor of a terminal device, the terminal is enabled
to perform a voice recognition method, the method including:
obtaining the input voice to be identified;
recognizing the voice to be identified according to a letter calibration voice or a word
calibration voice, wherein the letter calibration voice replaces the system default
letter standard pronunciation.
Other embodiments of the disclosure will readily occur to those skilled in the art after
considering the specification and practicing the invention disclosed herein. The
disclosure is intended to cover any variations, uses, or adaptations of the disclosure
that follow its general principles and include common knowledge or customary technical
means in the art that are not disclosed herein. The specification and embodiments are to
be regarded as exemplary only, with the true scope and spirit of the disclosure being
indicated by the following claims.
It should be understood that the disclosure is not limited to the precise structures
described above and illustrated in the accompanying drawings, and that various
modifications and changes may be made without departing from its scope. The scope of the
disclosure is limited only by the appended claims.
Claims (15)
1. A voice recognition method, characterized by comprising:
obtaining an input voice to be identified;
recognizing the voice to be identified according to a letter calibration voice or a word
calibration voice, wherein the letter calibration voice replaces the system default
letter standard pronunciation.
2. The method according to claim 1, characterized in that recognizing the voice to be
identified according to the word calibration voice comprises:
composing a new word calibration voice using the letter calibration voice;
recognizing the input voice to be identified according to the word calibration voice.
3. The method according to claim 1, characterized in that recognizing the voice to be
identified according to the word calibration voice comprises:
obtaining a stored word calibration voice, wherein the stored word calibration voice is
a new word calibration voice composed of the recognized voice after a historical voice
to be identified has been recognized according to the letter calibration voice;
recognizing the input voice to be identified according to the obtained word calibration
voice.
4. The method according to claim 1, characterized in that the letter calibration voice
replacing the system default letter standard pronunciation comprises:
collecting the letter calibration voice by recording the pronunciation of every letter
of the alphabet;
replacing the system default letter standard pronunciation with the collected letter
calibration voice.
5. The method according to claim 1, characterized in that recognizing the input voice to
be identified according to the word calibration voice comprises:
obtaining the voice feature information of the word calibration voice and of the voice
to be identified;
recognizing the input voice to be identified according to the matching relationship
between the voice feature information of the word calibration voice and that of the
voice to be identified.
6. The method according to claim 5, characterized in that the voice feature information
may include one or more of the following: the timbre, pitch, duration, and loudness of
the voice.
7. The method according to claim 2, characterized in that composing a new word
calibration voice using the letter calibration voice comprises:
obtaining the new word calibration voice by spelling it out from single letter
calibration voices; or,
obtaining the new word calibration voice by combining multiple letter calibration voices
and spelling them out according to liaison rules.
8. The method according to any one of claims 1 to 7, characterized in that:
fuzzy approximation relations are set between designated letters in the letter
calibration voice.
9. A voice recognition device, characterized by comprising:
an acquisition module, configured to obtain an input voice to be identified;
a voice recognition module, configured to recognize the voice to be identified obtained
by the acquisition module according to a letter calibration voice or a word calibration
voice, wherein the letter calibration voice replaces the system default letter standard
pronunciation.
10. The device according to claim 9, characterized in that the voice recognition module
comprises:
a first recognition submodule, configured to compose a new word calibration voice using
the letter calibration voice and to recognize the input voice to be identified according
to the word calibration voice; or,
a second recognition submodule, configured to obtain a stored word calibration voice and
to recognize the input voice to be identified according to the obtained word calibration
voice, wherein the stored word calibration voice is a new word calibration voice
composed of the recognized voice after a historical voice to be identified has been
recognized according to the letter calibration voice.
11. The device according to claim 9, characterized by further comprising:
a letter voice replacement module, configured to collect the letter calibration voice by
recording the pronunciation of every letter of the alphabet, and to replace the system
default letter standard pronunciation with the collected letter calibration voice.
12. The device according to claim 9, characterized in that:
the voice recognition module obtains the voice feature information of the word
calibration voice and of the voice to be identified, and recognizes the input voice to
be identified according to the matching relationship between the voice feature
information of the word calibration voice and that of the voice to be identified.
13. The device according to claim 10, characterized in that:
the first recognition submodule obtains the new word calibration voice by spelling it
out from single letter calibration voices, or by combining multiple letter calibration
voices and spelling them out according to liaison rules.
14. The device according to any one of claims 9 to 13, characterized in that the device
further comprises:
a fuzzy setting module, configured to set fuzzy approximation relations between
designated letters in the letter calibration voice.
15. A mobile terminal, characterized by comprising:
a processor and a memory for storing processor-executable instructions;
wherein the processor is configured to:
obtain an input voice to be identified;
recognize the voice to be identified according to a letter calibration voice or a word
calibration voice, wherein the letter calibration voice replaces the system default
letter standard pronunciation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610509372.6A CN105913841B (en) | 2016-06-30 | 2016-06-30 | Voice recognition method, device and terminal |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105913841A true CN105913841A (en) | 2016-08-31 |
CN105913841B CN105913841B (en) | 2020-04-03 |
Family
ID=56753927
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610509372.6A Active CN105913841B (en) | 2016-06-30 | 2016-06-30 | Voice recognition method, device and terminal |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105913841B (en) |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1465042A (en) * | 2001-05-02 | 2003-12-31 | 索尼公司 | Robot device, character recognizing apparatus and character reading method, and control program and recording medium |
CN101958118A (en) * | 2003-03-31 | 2011-01-26 | 索尼电子有限公司 | Implement the system and method for speech recognition dictionary effectively |
CN101141508A (en) * | 2006-09-05 | 2008-03-12 | 美商富迪科技股份有限公司 | Communication system and voice recognition method |
CN103594085A (en) * | 2012-08-16 | 2014-02-19 | 百度在线网络技术(北京)有限公司 | Method and system providing speech recognition result |
CN104282302A (en) * | 2013-07-04 | 2015-01-14 | 三星电子株式会社 | Apparatus and method for recognizing voice and text |
CN105096945A (en) * | 2015-08-31 | 2015-11-25 | 百度在线网络技术(北京)有限公司 | Voice recognition method and voice recognition device for terminal |
CN105355195A (en) * | 2015-09-25 | 2016-02-24 | 小米科技有限责任公司 | Audio frequency recognition method and audio frequency recognition device |
CN105513594A (en) * | 2015-11-26 | 2016-04-20 | 许传平 | Voice control system |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108710484A (en) * | 2018-03-12 | 2018-10-26 | 西安艾润物联网技术服务有限责任公司 | It is a kind of to pass through the method for speech modification license plate number, storage medium and device |
CN108710484B (en) * | 2018-03-12 | 2021-09-21 | 西安艾润物联网技术服务有限责任公司 | Method, storage medium and device for modifying license plate number through voice |
CN111971744A (en) * | 2018-03-23 | 2020-11-20 | 清晰Xyz有限公司 | Handling speech to text conversion |
CN111540353A (en) * | 2020-04-16 | 2020-08-14 | 重庆农村商业银行股份有限公司 | Semantic understanding method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN105913841B (en) | 2020-04-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106024014B (en) | A kind of phonetics transfer method, device and mobile terminal | |
CN106024009B (en) | Audio processing method and device | |
CN110634483B (en) | Man-machine interaction method and device, electronic equipment and storage medium | |
US8655659B2 (en) | Personalized text-to-speech synthesis and personalized speech feature extraction | |
CN110210310B (en) | Video processing method and device for video processing | |
KR101819458B1 (en) | Voice recognition apparatus and system | |
CN104468959A (en) | Method, device and mobile terminal displaying image in communication process of mobile terminal | |
CN107945806B (en) | User identification method and device based on sound characteristics | |
CN105139848B (en) | Data transfer device and device | |
JP7116088B2 (en) | Speech information processing method, device, program and recording medium | |
CN108073572A (en) | Information processing method and its device, simultaneous interpretation system | |
CN109977426A (en) | A kind of training method of translation model, device and machine readable media | |
CN112037756A (en) | Voice processing method, apparatus and medium | |
CN108650543A (en) | The caption editing method and device of video | |
CN105913841A (en) | Voice recognition method, voice recognition device and terminal | |
CN110930977B (en) | Data processing method and device and electronic equipment | |
KR102279505B1 (en) | Voice diary device | |
KR20200056754A (en) | Apparatus and method for generating personalization lip reading model | |
CN113409765B (en) | Speech synthesis method and device for speech synthesis | |
CN104660819B (en) | Mobile device and the method for accessing file in mobile device | |
CN112837668A (en) | Voice processing method and device for processing voice | |
CN109992121A (en) | A kind of input method, device and the device for input | |
JP6509308B1 (en) | Speech recognition device and system | |
CN117409783A (en) | Training method and device for voice feature recognition model | |
CN113990289A (en) | Data processing method and device and readable medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |