CN105913841A - Voice recognition method, voice recognition device and terminal - Google Patents

Voice recognition method, voice recognition device and terminal

Publication number
CN105913841A
Authority
CN
China
Prior art keywords
voice
calibration
letter
identified
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610509372.6A
Other languages
Chinese (zh)
Other versions
CN105913841B (en)
Inventor
伍亮雄
刘鸣
王乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Mobile Software Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd
Priority to CN201610509372.6A
Publication of CN105913841A
Application granted
Publication of CN105913841B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065 Adaptation
    • G10L15/26 Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Machine Translation (AREA)
  • Telephone Function (AREA)

Abstract

The invention relates to a voice recognition method, a voice recognition device and a terminal. The method comprises the following steps of: acquiring the input to-be-recognized voice; identifying the to-be-recognized voice according to letter calibration voice or text calibration voice, wherein the letter calibration voice replaces system default standard letter voice. According to the scheme in the embodiments of the invention, voice of users can be accurately recognized.

Description

Voice recognition method, device and terminal
Technical field
The present disclosure relates to the field of mobile communication technology, and in particular to a voice recognition method, device and terminal.
Background
At present, speech recognition technology is widely used. Its goal is to convert the lexical content of human speech into computer-readable input, such as key presses, binary codes or character sequences. Applications of speech recognition technology include voice dialing, voice navigation, indoor device control, voice document retrieval, simple dictation data entry, and so on.
To meet the different needs of users, speech recognition technology has begun to add dialect adaptation, for example for Cantonese and Sichuanese. However, for languages that have a standard-pronunciation letter configuration, such as Mandarin and English, speech recognition systems are all provided with a default standard letter voice. If the user speaks by spelling out sounds with a local accent, and the accent differs greatly from the standard, the speech recognition rate may be extremely low and the speech recognition function may almost fail.
Summary of the invention
The present disclosure provides a voice recognition method, device and terminal, which can recognize a user's voice more accurately.
According to a first aspect of the embodiments of the present disclosure, a voice recognition method is provided, including:
acquiring an input voice to be recognized; and
recognizing the voice to be recognized according to a letter calibration voice or a text calibration voice, wherein the letter calibration voice replaces the system default standard letter voice.
Optionally, recognizing the voice to be recognized according to the text calibration voice includes:
composing a new text calibration voice from the letter calibration voice; and
recognizing the input voice to be recognized according to the text calibration voice.
Optionally, recognizing the voice to be recognized according to the text calibration voice includes:
acquiring a stored text calibration voice, wherein the stored text calibration voice is a new text calibration voice composed of the recognized voice after a historical voice to be recognized has been recognized according to the letter calibration voice; and
recognizing the input voice to be recognized according to the acquired text calibration voice.
Optionally, the letter calibration voice replacing the system default standard letter voice includes:
collecting the letter calibration voice by recording the pronunciations of all letters of the alphabet; and
replacing the system default standard letter voice with the collected letter calibration voice.
Optionally, recognizing the input voice to be recognized according to the text calibration voice includes:
acquiring voice feature information of the text calibration voice and of the voice to be recognized; and
recognizing the input voice to be recognized according to the matching relationship between the voice feature information of the text calibration voice and that of the voice to be recognized.
Optionally, the voice feature information may include one or more of the following: the timbre, pitch, duration and intensity of the voice.
Optionally, composing a new text calibration voice from the letter calibration voice includes:
obtaining the new text calibration voice by spelling with a single letter calibration voice; or
obtaining the new text calibration voice by combining multiple letter calibration voices and spelling them according to liaison rules.
Optionally, a fuzzy approximation relationship is set between specified letters in the letter calibration voice.
According to a second aspect of the embodiments of the present disclosure, a voice recognition device is provided, including:
an acquisition module, configured to acquire an input voice to be recognized; and
a voice recognition module, configured to recognize, according to a letter calibration voice or a text calibration voice, the voice to be recognized acquired by the acquisition module, wherein the letter calibration voice replaces the system default standard letter voice.
Optionally, the voice recognition module includes:
a first recognition submodule, configured to compose a new text calibration voice from the letter calibration voice and to recognize the input voice to be recognized according to the text calibration voice; or
a second recognition submodule, configured to acquire a stored text calibration voice, wherein the stored text calibration voice is a new text calibration voice composed of the recognized voice after a historical voice to be recognized has been recognized according to the letter calibration voice, and to recognize the input voice to be recognized according to the acquired text calibration voice.
Optionally, the device further includes:
a letter voice replacement module, configured to collect the letter calibration voice by recording the pronunciations of all letters of the alphabet, and to replace the system default standard letter voice with the collected letter calibration voice.
Optionally, the voice recognition module acquires voice feature information of the text calibration voice and of the voice to be recognized, and recognizes the input voice to be recognized according to the matching relationship between the voice feature information of the text calibration voice and that of the voice to be recognized.
Optionally, the first recognition submodule obtains the new text calibration voice by spelling with a single letter calibration voice, or by combining multiple letter calibration voices and spelling them according to liaison rules.
Optionally, the device further includes:
a fuzzy setting module, configured to set a fuzzy approximation relationship between specified letters in the letter calibration voice.
According to a third aspect of the embodiments of the present disclosure, a mobile terminal is provided, including:
a processor and a memory for storing processor-executable instructions;
wherein the processor is configured to:
acquire an input voice to be recognized; and
recognize the voice to be recognized according to a letter calibration voice or a text calibration voice, wherein the letter calibration voice replaces the system default standard letter voice.
The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects:
After acquiring the input voice to be recognized, the present disclosure can recognize it according to the letter calibration voice or the text calibration voice, wherein the letter calibration voice has replaced the system default standard letter voice. As a result, the user's voice can be recognized accurately even if it carries a local accent, which improves speech recognition capability.
Further, the present disclosure provides two processing modes. In one, a new text calibration voice is composed from the letter calibration voice, and the input voice to be recognized is recognized according to that text calibration voice. In the other, a stored text calibration voice is acquired, the stored text calibration voice being a new text calibration voice composed of the recognized voice after a historical voice to be recognized has been recognized according to the letter calibration voice, and the input voice to be recognized is recognized according to the acquired text calibration voice. Recognizing the input directly from a stored text calibration voice in this way can further improve recognition capability and recognition efficiency.
Further, the present disclosure can use recordings of the pronunciations of all letters of the alphabet as the letter calibration voice.
Further, the present disclosure can recognize the input voice to be recognized according to the matching relationship between the voice feature information of the text calibration voice and that of the voice to be recognized.
Further, the present disclosure can obtain a new text calibration voice by spelling with a single letter calibration voice, or by combining multiple letter calibration voices and spelling them according to liaison rules.
Further, the present disclosure can set a fuzzy approximation relationship between specified letters in the letter calibration voice, which addresses the problem that certain letters are pronounced similarly in some accents.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the present disclosure.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure.
Fig. 1 is a flow chart of a voice recognition method according to an exemplary embodiment of the present disclosure.
Fig. 2 is a flow chart of another voice recognition method according to an exemplary embodiment of the present disclosure.
Fig. 3 is a block diagram of a voice recognition device according to an exemplary embodiment of the present disclosure.
Fig. 4 is a block diagram of another voice recognition device according to an exemplary embodiment of the present disclosure.
Fig. 5 is a structural block diagram of a mobile terminal according to an exemplary embodiment of the present disclosure.
Fig. 6 is a structural block diagram of a device according to an exemplary embodiment of the present disclosure.
Detailed description
Exemplary embodiments will be described in detail here, examples of which are shown in the accompanying drawings. When the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of devices and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
The terms used in the present disclosure are for the purpose of describing particular embodiments only and are not intended to limit the present disclosure. The singular forms "a", "said" and "the" used in the present disclosure and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in the present disclosure to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present disclosure, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon" or "in response to determining".
The present disclosure provides a voice recognition method, device and terminal, which can recognize a user's voice more accurately.
Fig. 1 is a flow chart of a voice recognition method according to an exemplary embodiment of the present disclosure.
The method can be applied in a terminal. As shown in Fig. 1, the method may include the following steps:
In step 101, an input voice to be recognized is acquired.
In step 102, the voice to be recognized is recognized according to a letter calibration voice or a text calibration voice, wherein the letter calibration voice replaces the system default standard letter voice.
In this step, a new text calibration voice may be composed from the letter calibration voice, and the input voice to be recognized is recognized according to that text calibration voice. Alternatively, a stored text calibration voice is acquired, the stored text calibration voice being a new text calibration voice composed of the recognized voice after a historical voice to be recognized has been recognized according to the letter calibration voice, and the input voice to be recognized is recognized according to the acquired text calibration voice.
In this step, the letter calibration voice may be collected by recording the pronunciations of all letters of the alphabet, and the collected letter calibration voice replaces the system default standard letter voice.
In this step, composing a new text calibration voice from the letter calibration voice may include: obtaining the new text calibration voice by spelling with a single letter calibration voice; or obtaining the new text calibration voice by combining multiple letter calibration voices and spelling them according to liaison rules.
In this step, voice feature information of the text calibration voice and of the voice to be recognized may be acquired, and the input voice to be recognized is recognized according to the matching relationship between the voice feature information of the text calibration voice and that of the voice to be recognized. The voice feature information may include one or more of the following: the timbre, pitch, duration and intensity of the voice.
As can be seen from this embodiment, the technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects: after acquiring the input voice to be recognized, the present disclosure can recognize it according to the letter calibration voice or the text calibration voice, wherein the letter calibration voice has replaced the system default standard letter voice. As a result, the user's voice can be recognized accurately even if it carries a local accent, which improves speech recognition capability.
Fig. 2 is a flow chart of another voice recognition method according to an exemplary embodiment of the present disclosure.
The method can be applied in a terminal. Compared with Fig. 1, this embodiment describes the technical solution of the present disclosure in more detail.
The technical solution is described in detail below with reference to Fig. 2. As shown in Fig. 2, the method may include the following steps:
In step 201, a letter calibration voice is obtained by collecting the user's own recordings of the pronunciations of all letters of the alphabet.
The present disclosure refers to all the letter pronunciations recorded by the user as the letter calibration voice. The present disclosure provides a self-entry function for the standard-pronunciation letters: the user records every pronounced letter once to obtain the letter calibration voice, which is subsequently used as the standard. This addresses the accent problem that arises with a standard-pronunciation letter configuration.
The letters may be, for example, letters of the English alphabet, of Chinese, or of other languages.
In step 202, the letter calibration voice recorded by the user replaces the system's original default standard letter voice.
Because the system default standard letter voice makes it difficult to recognize letter pronunciations with a local accent, the present disclosure replaces the system's original default standard letter voice with the letter calibration voice recorded by the user. The letter pronunciation standard set in the system then uses the collected letter calibration voice as the recognition standard, so letter pronunciations with a local accent are recognized easily.
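Steps 201 and 202 can be illustrated with a minimal sketch. It assumes, purely for illustration, that each letter voice is held as a list of numbers and that the letter table is a dictionary keyed by letter; the names `Voice` and `replace_default_letter_voices` are hypothetical and do not come from the disclosure.

```python
from typing import Dict, List

# Hypothetical stand-in for a recorded clip or its extracted features.
Voice = List[float]


def replace_default_letter_voices(
    defaults: Dict[str, Voice], recorded: Dict[str, Voice]
) -> Dict[str, Voice]:
    """Return the letter table used for recognition after calibration."""
    missing = sorted(set(defaults) - set(recorded))
    if missing:
        # The text has the user record every letter once, so an
        # incomplete recording set is rejected in this sketch.
        raise ValueError(f"letters not recorded: {missing}")
    # Every letter the system knows now maps to the user's own recording.
    return {letter: list(recorded[letter]) for letter in defaults}


defaults = {"p": [0.1], "g": [0.2], "u": [0.3]}
recorded = {"p": [0.9], "g": [0.8], "u": [0.7]}
calibrated = replace_default_letter_voices(defaults, recorded)
```

Rejecting an incomplete set is a design choice of this sketch; a real system might instead fall back to the default voice for unrecorded letters.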
In step 203, a new text calibration voice is composed from the letter calibration voice.
When people read aloud any character or word, they either utter the sound of a single letter or combine multiple letters and pronounce them according to the corresponding liaison rules; a speech recognition system can also learn this liaison behavior. Therefore, in this step, a new text calibration voice may be obtained by spelling with a single letter calibration voice, or by combining multiple letter calibration voices and spelling them according to liaison rules.
For example, the pinyin of "apple" is pingguo. The single letters or letter combinations p, ing, g, u, o can be combined and spelled in the liaison manner p-ing-g-u-o to obtain a new text calibration voice. That is, after the speech recognition system replaces the default standard letter voice with the letter calibration voice recorded by the user, it recombines multiple letter calibration voices by the same liaison rules, or directly uses a single letter calibration voice (for example, where a single letter forms a character), to obtain a new text calibration voice. The new text calibration voice can then replace the text voice that the system carries and that was obtained from the standard letter voice.
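The pingguo example can be sketched as follows, under the simplifying assumption that spelling units together amounts to concatenating their recordings in order. Real liaison would also smooth the joins between units, which this sketch does not attempt. The function name and the numeric values are hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a recorded clip or its extracted features.
Voice = List[float]


def compose_text_voice(
    units: Sequence[str], letter_voices: Dict[str, Voice]
) -> Voice:
    """Join letter calibration voices in spelling order."""
    composed: Voice = []
    for unit in units:
        # A KeyError here means the unit was never recorded by the user.
        composed.extend(letter_voices[unit])
    return composed


letter_voices = {
    "p": [1.0],
    "ing": [2.0, 2.5],
    "g": [3.0],
    "u": [4.0],
    "o": [5.0],
}
pingguo_voice = compose_text_voice(["p", "ing", "g", "u", "o"], letter_voices)
```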
Liaison refers, for example in English, to the case where within the same sense group the preceding word ends with a consonant phoneme and the following word begins with a vowel phoneme; when speaking or reading a sentence aloud, the two phonemes are naturally joined and read together. This speech phenomenon is liaison. The syllable formed by liaison is generally not stressed; it is simply passed over lightly and naturally, and must not be read heavily. A liaison rule is a liaison convention. For example, in "consonant + vowel" liaison, the rule is that if, of two adjacent words, the preceding word ends with a consonant and the following word begins with a vowel, the consonant and the vowel are spelled together as a liaison.
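The "consonant + vowel" rule can be sketched as a check over adjacent words. This sketch makes a loud simplification: it looks at spelling (the last and first letters) as a proxy for phonemes, which is only an approximation of English pronunciation. The names `VOWELS` and `liaison_pairs` are hypothetical.

```python
from typing import List, Tuple

# Assumption: vowel letters stand in for vowel phonemes.
VOWELS = set("aeiou")


def liaison_pairs(words: List[str]) -> List[Tuple[str, str]]:
    """Return adjacent word pairs that meet the consonant + vowel rule."""
    pairs = []
    for prev, nxt in zip(words, words[1:]):
        if not prev or not nxt:
            continue
        last, first = prev[-1].lower(), nxt[0].lower()
        ends_with_consonant = last.isalpha() and last not in VOWELS
        starts_with_vowel = first in VOWELS
        if ends_with_consonant and starts_with_vowel:
            pairs.append((prev, nxt))
    return pairs


linked = liaison_pairs(["not", "at", "all"])
```

A phoneme-level implementation would replace the letter test with a lookup in a pronunciation dictionary.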
It should be noted that a system generally carries a text voice library, in which common characters or words are spelled according to the default standard letter voice and stored. The present disclosure can re-spell all of the text voices carried by the system using the letter calibration voice to obtain new text calibration voices, and then replace the original text voices.
In step 204, the user's voice to be recognized is recognized according to the text calibration voice.
In this step, voice feature information of the text calibration voice and of the voice to be recognized is acquired, and the input voice to be recognized is recognized according to the matching relationship between the voice feature information of the text calibration voice and that of the voice to be recognized. The voice feature information may include one or more of the following: the timbre, pitch, duration and intensity of the voice.
It should be noted that recognition of the input voice according to voice feature information may use an existing recognition algorithm; the present disclosure does not limit this.
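Since the disclosure leaves the recognition algorithm open, the following is only one possible sketch of feature matching: each voice is reduced to four numbers (timbre, pitch, duration, intensity) and the input is matched to the nearest text calibration voice by Euclidean distance. The `VoiceFeatures` class, the `recognize` function, the candidate texts and all numeric values are hypothetical and chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class VoiceFeatures:
    """Hypothetical scalar summary of the four features named in the text."""
    timbre: float
    pitch: float
    duration: float
    intensity: float

    def as_tuple(self) -> Tuple[float, float, float, float]:
        return (self.timbre, self.pitch, self.duration, self.intensity)


def recognize(
    input_features: VoiceFeatures,
    text_calibration: Dict[str, VoiceFeatures],
) -> str:
    """Return the text whose calibration features are closest to the input."""
    return min(
        text_calibration,
        key=lambda text: math.dist(
            input_features.as_tuple(), text_calibration[text].as_tuple()
        ),
    )


text_calibration = {
    "pingguo": VoiceFeatures(0.2, 180.0, 0.60, 55.0),
    "xiangjiao": VoiceFeatures(0.7, 220.0, 0.80, 60.0),
}
result = recognize(VoiceFeatures(0.25, 182.0, 0.62, 54.0), text_calibration)
```

In practice the features would be normalized before comparison so that a large-valued feature such as pitch does not dominate the distance.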
It should also be noted that, because a local accent may cause some pronunciations to be confused with one another, the present disclosure can set a fuzzy approximation relationship between specified letters in the letter calibration voice, associating pronunciations that may be confused, for example by setting the letter pronunciations s=sh, c=ch, and so on.
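One way to picture the fuzzy approximation relationship is as a mapping that sends every member of a confusable group to a single representative, so that s and sh (or c and ch) compare as equal. This is a sketch under that assumption; `build_fuzzy_map` and `normalize` are hypothetical names.

```python
from typing import Dict, Iterable, List, Tuple


def build_fuzzy_map(groups: Iterable[Tuple[str, ...]]) -> Dict[str, str]:
    """Map every unit in a group to the group's first unit."""
    mapping: Dict[str, str] = {}
    for group in groups:
        representative = group[0]
        for unit in group:
            mapping[unit] = representative
    return mapping


def normalize(units: List[str], fuzzy: Dict[str, str]) -> List[str]:
    """Replace each unit by its representative; other units pass through."""
    return [fuzzy.get(unit, unit) for unit in units]


# The groups follow the example in the text: s=sh, c=ch.
FUZZY = build_fuzzy_map([("s", "sh"), ("c", "ch")])
same = normalize(["sh", "i"], FUZZY) == normalize(["s", "i"], FUZZY)
```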
In the solution of the present disclosure, the user sets the letter calibration voice according to his or her own accent by recording every pronounced letter once as the letter calibration voice, which replaces the system's standard letter voice. The input voice to be recognized is then recognized using a new text calibration voice composed from the letter calibration voice. This addresses the accent problem that arises with a standard-pronunciation letter configuration and improves the recognition accuracy of voice input.
It should be noted that, after a historical voice to be recognized has been recognized according to the letter calibration voice, the present disclosure may also store the new text calibration voice composed of the recognized voice. The stored text calibration voice can then be acquired directly, and the input voice to be recognized is recognized according to the acquired text calibration voice.
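The stored text calibration voice behaves much like a cache. The sketch below assumes that a text calibration voice is composed once, on first use, and returned from storage afterwards; composition is again simplified to concatenation, and the class name `TextCalibrationStore` and its `compositions` counter are hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a recorded clip or its extracted features.
Voice = List[float]


class TextCalibrationStore:
    """Keep text calibration voices so they are composed only once."""

    def __init__(self, letter_voices: Dict[str, Voice]) -> None:
        self._letter_voices = letter_voices
        self._stored: Dict[str, Voice] = {}
        self.compositions = 0  # counts how often a voice had to be composed

    def get(self, text: str, units: Sequence[str]) -> Voice:
        if text not in self._stored:
            voice: Voice = []
            for unit in units:
                voice.extend(self._letter_voices[unit])
            self._stored[text] = voice
            self.compositions += 1
        return self._stored[text]


store = TextCalibrationStore({"g": [3.0], "u": [4.0], "o": [5.0]})
first = store.get("guo", ["g", "u", "o"])
second = store.get("guo", ["g", "u", "o"])  # served from storage
```

Skipping the second composition is the efficiency gain the text attributes to the stored mode.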
Corresponding to the foregoing method embodiments, the present disclosure further provides a voice recognition device, a terminal and corresponding embodiments.
Fig. 3 is a block diagram of a voice recognition device according to an exemplary embodiment of the present disclosure.
The device may be provided in a terminal. As shown in Fig. 3, the voice recognition device may include: an acquisition module 31 and a voice recognition module 32.
The acquisition module 31 is configured to acquire an input voice to be recognized.
The voice recognition module 32 is configured to recognize, according to a letter calibration voice or a text calibration voice, the voice to be recognized acquired by the acquisition module, wherein the letter calibration voice replaces the system default standard letter voice.
The voice recognition module 32 may compose a new text calibration voice from the letter calibration voice and recognize the input voice to be recognized according to that text calibration voice. Alternatively, it may acquire a stored text calibration voice, the stored text calibration voice being a new text calibration voice composed of the recognized voice after a historical voice to be recognized has been recognized according to the letter calibration voice, and recognize the input voice to be recognized according to the acquired text calibration voice.
As can be seen from this embodiment, after acquiring the input voice to be recognized, the present disclosure can recognize it according to the letter calibration voice or the text calibration voice, wherein the letter calibration voice has replaced the system default standard letter voice. As a result, the user's voice can be recognized accurately even if it carries a local accent, which improves speech recognition capability.
Fig. 4 is a block diagram of another voice recognition device according to an exemplary embodiment of the present disclosure.
The device may be provided in a terminal. As shown in Fig. 4, the voice recognition device may include: an acquisition module 31, a voice recognition module 32, a letter voice replacement module 33 and a fuzzy setting module 34.
For the functions of the acquisition module 31 and the voice recognition module 32, see the description of Fig. 3.
The voice recognition module 32 may include a first recognition submodule 321 or a second recognition submodule 322.
The first recognition submodule 321 is configured to compose a new text calibration voice from the letter calibration voice and to recognize the input voice to be recognized according to the text calibration voice.
The second recognition submodule 322 is configured to acquire a stored text calibration voice, wherein the stored text calibration voice is a new text calibration voice composed of the recognized voice after a historical voice to be recognized has been recognized according to the letter calibration voice, and to recognize the input voice to be recognized according to the acquired text calibration voice.
Recognizing the input voice to be recognized according to the text calibration voice may include: acquiring voice feature information of the text calibration voice and of the voice to be recognized; and recognizing the input voice to be recognized according to the matching relationship between the voice feature information of the text calibration voice and that of the voice to be recognized. The voice feature information may include one or more of the following: the timbre, pitch, duration and intensity of the voice.
The device may further include a letter voice replacement module 33.
The letter voice replacement module 33 is configured to collect the letter calibration voice by recording the pronunciations of all letters of the alphabet, and to replace the system default standard letter voice with the collected letter calibration voice. Because the system default standard letter voice makes it difficult to recognize letter pronunciations with a local accent, the present disclosure replaces the system's original default standard letter voice with the letter calibration voice recorded by the user. The letter pronunciation standard set in the system then uses the collected letter calibration voice as the recognition standard, so letter pronunciations with a local accent are recognized easily.
The voice recognition module 32 acquires voice feature information of the text calibration voice and of the voice to be recognized, and recognizes the input voice to be recognized according to the matching relationship between the voice feature information of the text calibration voice and that of the voice to be recognized.
The first recognition submodule 321 obtains the new text calibration voice by spelling with a single letter calibration voice, or by combining multiple letter calibration voices and spelling them according to liaison rules.
The device may further include a fuzzy setting module 34.
The fuzzy setting module 34 is configured to set a fuzzy approximation relationship between specified letters in the letter calibration voice. Because a local accent may cause some pronunciations to be confused with one another, the present disclosure can set a fuzzy approximation relationship between specified letters in the letter calibration voice, associating pronunciations that may be confused, for example by setting the letter pronunciations s=sh, c=ch, and so on.
Therefore, in the solution of the present disclosure, the user sets the letter calibration voice according to his or her own accent by recording every pronounced letter once as the letter calibration voice, which replaces the system's standard letter voice. The input voice to be recognized is then recognized using a new text calibration voice composed from the letter calibration voice. This addresses the accent problem that arises with a standard-pronunciation letter configuration and improves the recognition accuracy of voice input.
For the implementation of the functions and roles of the units in the above device, refer to the implementation of the corresponding steps in the above method; details are not repeated here.
Since the device embodiments substantially correspond to the method embodiments, refer to the description of the method embodiments for relevant details. The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present disclosure. Those of ordinary skill in the art can understand and implement it without creative effort.
Fig. 5 is a block diagram according to an exemplary embodiment of the present disclosure.
As shown in Fig. 5, the structure includes a processor 501 and a memory 502 for storing processor-executable instructions;
wherein the processor 501 is configured to:
obtain an input voice to be recognized;
recognize the voice to be recognized according to a letter calibration voice or a word calibration voice, wherein the letter calibration voice replaces the system default letter standard voice.
It should be noted that, for the other programs stored in the memory 502, refer to the description in the foregoing method flow, which is not repeated here; the processor 501 is further configured to execute the other programs stored in the memory 502.
Fig. 6 is a structural block diagram of a device according to an exemplary embodiment of the present disclosure.
For example, the device 600 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, fitness equipment, a personal digital assistant, or the like.
Referring to Fig. 6, the device 600 may include one or more of the following components: a processing component 602, a memory 604, a power component 606, a multimedia component 608, an audio component 610, an input/output (I/O) interface 612, a sensor component 614, and a communication component 616.
The processing component 602 generally controls the overall operation of the device 600, such as operations associated with display, telephone calls, data communication, camera operations, and recording operations. The processing component 602 may include one or more processors 620 to execute instructions so as to complete all or some of the steps of the above method. In addition, the processing component 602 may include one or more modules that facilitate interaction between the processing component 602 and other components. For example, the processing component 602 may include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.
The memory 604 is configured to store various types of data to support operation of the device 600. Examples of such data include instructions for any application or method operating on the device 600, contact data, phonebook data, messages, pictures, videos, and the like. The memory 604 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
The power component 606 supplies power to the various components of the device 600. The power component 606 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 600.
The multimedia component 608 includes a screen that provides an output interface between the device 600 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action, but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 608 includes a front camera and/or a rear camera. When the device 600 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or may have focal length and optical zoom capability.
The audio component 610 is configured to output and/or input audio signals. For example, the audio component 610 includes a microphone (MIC), which is configured to receive external audio signals when the device 600 is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signals may be further stored in the memory 604 or sent via the communication component 616. In some embodiments, the audio component 610 also includes a speaker for outputting audio signals.
The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.
The sensor component 614 includes one or more sensors for providing status assessments of various aspects of the device 600. For example, the sensor component 614 can detect the open/closed state of the device 600 and the relative positioning of components, such as the display and keypad of the device 600. The sensor component 614 can also detect a change in position of the device 600 or of a component of the device 600, the presence or absence of user contact with the device 600, the orientation or acceleration/deceleration of the device 600, and temperature changes of the device 600. The sensor component 614 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 614 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 616 is configured to facilitate wired or wireless communication between the device 600 and other devices. The device 600 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 616 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 616 also includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device 600 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as the memory 604 including instructions, where the instructions can be executed by the processor 620 of the device 600 to complete the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
A non-transitory computer-readable storage medium, wherein when instructions in the storage medium are executed by a processor of a terminal device, the terminal is enabled to perform a voice recognition method, the method including:
obtaining an input voice to be recognized;
recognizing the voice to be recognized according to a letter calibration voice or a word calibration voice, wherein the letter calibration voice replaces the system default letter standard voice.
Those skilled in the art will readily conceive of other embodiments of the present disclosure after considering the specification and practicing the invention disclosed herein. The present disclosure is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (15)

1. A voice recognition method, characterized by including:
obtaining an input voice to be recognized;
recognizing the voice to be recognized according to a letter calibration voice or a word calibration voice, wherein the letter calibration voice replaces the system default letter standard voice.
2. The method according to claim 1, characterized in that recognizing the voice to be recognized according to the word calibration voice includes:
composing a new word calibration voice using the letter calibration voice;
recognizing the input voice to be recognized according to the word calibration voice.
3. The method according to claim 1, characterized in that recognizing the voice to be recognized according to the word calibration voice includes:
obtaining a stored word calibration voice, wherein the stored word calibration voice is a new word calibration voice composed of the recognized voice after a historical voice to be recognized has been recognized according to the letter calibration voice;
recognizing the input voice to be recognized according to the obtained word calibration voice.
4. The method according to claim 1, characterized in that the letter calibration voice replacing the system default letter standard voice includes:
collecting the letter calibration voice by recording the pronunciation of every letter of the alphabet;
replacing the system default letter standard voice with the collected letter calibration voice.
5. The method according to claim 1, characterized in that recognizing the input voice to be recognized according to the word calibration voice includes:
obtaining voice feature information of the word calibration voice and of the voice to be recognized;
recognizing the input voice to be recognized according to the matching relationship between the voice feature information of the word calibration voice and that of the voice to be recognized.
6. The method according to claim 5, characterized in that the voice feature information may include one or more of the following: the timbre, pitch, duration, and intensity of the voice.
7. The method according to claim 2, characterized in that composing a new word calibration voice using the letter calibration voice includes:
obtaining the new word calibration voice by spelling out a single letter calibration voice; or
obtaining the new word calibration voice by combining multiple letter calibration voices and spelling them together according to liaison rules.
8. The method according to any one of claims 1 to 7, characterized in that:
a fuzzy approximation relationship is set between designated letters in the letter calibration voice.
9. A voice recognition device, characterized by including:
an acquisition module, configured to obtain an input voice to be recognized;
a voice recognition module, configured to recognize, according to a letter calibration voice or a word calibration voice, the voice to be recognized obtained by the acquisition module, wherein the letter calibration voice replaces the system default letter standard voice.
10. The voice recognition device according to claim 9, characterized in that the voice recognition module includes:
a first recognition submodule, configured to compose a new word calibration voice using the letter calibration voice and to recognize the input voice to be recognized according to the word calibration voice; or
a second recognition submodule, configured to obtain a stored word calibration voice, wherein the stored word calibration voice is a new word calibration voice composed of the recognized voice after a historical voice to be recognized has been recognized according to the letter calibration voice, and to recognize the input voice to be recognized according to the obtained word calibration voice.
11. The device according to claim 9, characterized by further including:
a letter voice replacement module, configured to collect the letter calibration voice by recording the pronunciation of every letter of the alphabet, and to replace the system default letter standard voice with the collected letter calibration voice.
12. The device according to claim 9, characterized in that:
the voice recognition module obtains voice feature information of the word calibration voice and of the voice to be recognized, and recognizes the input voice to be recognized according to the matching relationship between the voice feature information of the word calibration voice and that of the voice to be recognized.
13. The device according to claim 10, characterized in that:
the first recognition submodule obtains the new word calibration voice by spelling out a single letter calibration voice, or by combining multiple letter calibration voices and spelling them together according to liaison rules.
14. The device according to any one of claims 9 to 13, characterized in that the device further includes:
a fuzzy setting module, configured to set a fuzzy approximation relationship between designated letters in the letter calibration voice.
15. A mobile terminal, characterized by including:
a processor and a memory for storing processor-executable instructions;
wherein the processor is configured to:
obtain an input voice to be recognized;
recognize the voice to be recognized according to a letter calibration voice or a word calibration voice, wherein the letter calibration voice replaces the system default letter standard voice.
CN201610509372.6A 2016-06-30 2016-06-30 Voice recognition method, device and terminal Active CN105913841B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610509372.6A CN105913841B (en) 2016-06-30 2016-06-30 Voice recognition method, device and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610509372.6A CN105913841B (en) 2016-06-30 2016-06-30 Voice recognition method, device and terminal

Publications (2)

Publication Number Publication Date
CN105913841A true CN105913841A (en) 2016-08-31
CN105913841B CN105913841B (en) 2020-04-03

Family

ID=56753927

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610509372.6A Active CN105913841B (en) 2016-06-30 2016-06-30 Voice recognition method, device and terminal

Country Status (1)

Country Link
CN (1) CN105913841B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108710484A (en) * 2018-03-12 2018-10-26 西安艾润物联网技术服务有限责任公司 It is a kind of to pass through the method for speech modification license plate number, storage medium and device
CN111540353A (en) * 2020-04-16 2020-08-14 重庆农村商业银行股份有限公司 Semantic understanding method, device, equipment and storage medium
CN111971744A (en) * 2018-03-23 2020-11-20 清晰Xyz有限公司 Handling speech to text conversion

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1465042A (en) * 2001-05-02 2003-12-31 索尼公司 Robot device, character recognizing apparatus and character reading method, and control program and recording medium
CN101141508A (en) * 2006-09-05 2008-03-12 美商富迪科技股份有限公司 Communication system and voice recognition method
CN101958118A (en) * 2003-03-31 2011-01-26 索尼电子有限公司 Implement the system and method for speech recognition dictionary effectively
CN103594085A (en) * 2012-08-16 2014-02-19 百度在线网络技术(北京)有限公司 Method and system providing speech recognition result
CN104282302A (en) * 2013-07-04 2015-01-14 三星电子株式会社 Apparatus and method for recognizing voice and text
CN105096945A (en) * 2015-08-31 2015-11-25 百度在线网络技术(北京)有限公司 Voice recognition method and voice recognition device for terminal
CN105355195A (en) * 2015-09-25 2016-02-24 小米科技有限责任公司 Audio frequency recognition method and audio frequency recognition device
CN105513594A (en) * 2015-11-26 2016-04-20 许传平 Voice control system

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1465042A (en) * 2001-05-02 2003-12-31 索尼公司 Robot device, character recognizing apparatus and character reading method, and control program and recording medium
CN101958118A (en) * 2003-03-31 2011-01-26 索尼电子有限公司 Implement the system and method for speech recognition dictionary effectively
CN101141508A (en) * 2006-09-05 2008-03-12 美商富迪科技股份有限公司 Communication system and voice recognition method
CN103594085A (en) * 2012-08-16 2014-02-19 百度在线网络技术(北京)有限公司 Method and system providing speech recognition result
CN104282302A (en) * 2013-07-04 2015-01-14 三星电子株式会社 Apparatus and method for recognizing voice and text
CN105096945A (en) * 2015-08-31 2015-11-25 百度在线网络技术(北京)有限公司 Voice recognition method and voice recognition device for terminal
CN105355195A (en) * 2015-09-25 2016-02-24 小米科技有限责任公司 Audio frequency recognition method and audio frequency recognition device
CN105513594A (en) * 2015-11-26 2016-04-20 许传平 Voice control system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108710484A (en) * 2018-03-12 2018-10-26 西安艾润物联网技术服务有限责任公司 It is a kind of to pass through the method for speech modification license plate number, storage medium and device
CN108710484B (en) * 2018-03-12 2021-09-21 西安艾润物联网技术服务有限责任公司 Method, storage medium and device for modifying license plate number through voice
CN111971744A (en) * 2018-03-23 2020-11-20 清晰Xyz有限公司 Handling speech to text conversion
CN111540353A (en) * 2020-04-16 2020-08-14 重庆农村商业银行股份有限公司 Semantic understanding method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN105913841B (en) 2020-04-03

Similar Documents

Publication Publication Date Title
CN106024014B (en) A kind of phonetics transfer method, device and mobile terminal
CN106024009B (en) Audio processing method and device
CN110634483B (en) Man-machine interaction method and device, electronic equipment and storage medium
US8655659B2 (en) Personalized text-to-speech synthesis and personalized speech feature extraction
CN110210310B (en) Video processing method and device for video processing
KR101819458B1 (en) Voice recognition apparatus and system
CN104468959A (en) Method, device and mobile terminal displaying image in communication process of mobile terminal
CN107945806B (en) User identification method and device based on sound characteristics
CN105139848B (en) Data transfer device and device
JP7116088B2 (en) Speech information processing method, device, program and recording medium
CN108073572A (en) Information processing method and its device, simultaneous interpretation system
CN109977426A (en) A kind of training method of translation model, device and machine readable media
CN112037756A (en) Voice processing method, apparatus and medium
CN108650543A (en) The caption editing method and device of video
CN105913841A (en) Voice recognition method, voice recognition device and terminal
CN110930977B (en) Data processing method and device and electronic equipment
KR102279505B1 (en) Voice diary device
KR20200056754A (en) Apparatus and method for generating personalization lip reading model
CN113409765B (en) Speech synthesis method and device for speech synthesis
CN104660819B (en) Mobile device and the method for accessing file in mobile device
CN112837668A (en) Voice processing method and device for processing voice
CN109992121A (en) A kind of input method, device and the device for input
JP6509308B1 (en) Speech recognition device and system
CN117409783A (en) Training method and device for voice feature recognition model
CN113990289A (en) Data processing method and device and readable medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant