CN101320561A - Method and module for improving individual speech recognition rate - Google Patents

Publication number
CN101320561A
Authority
CN
China
Prior art keywords
model
cognition
user
phoneme
construction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CNA2007101098914A
Other languages
Chinese (zh)
Inventor
徐志文
高鸿宗
刘进荣
何泰轩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cyberon Corp
Original Assignee
Cyberon Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2007-06-05
Filing date
2007-06-05
Publication date
2008-12-10
Application filed by Cyberon Corp
Priority to CNA2007101098914A
Publication of CN101320561A
Legal status: Pending

Landscapes

  • Telephone Function (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The present invention relates to a method and a module for improving the individual speech recognition rate. The module can be used in a portable electronic device that has a preset recognition model; the recognition model is built from phoneme models and is used to recognize at least one voice instruction uttered by a user. The method comprises the steps of building a specific text database related to the text corresponding to the voice instruction, capturing a plurality of speech data uttered by the user according to the text database to build an adjustment parameter, and integrating the phoneme model and the adjustment parameter to adjust the recognition model. With these steps, the user can adjust the recognition model effectively and thereby improve the individual speech recognition rate.

Description

Method and module for improving individual speech recognition rate
Technical field
The present invention relates to a method and a module for improving the individual speech recognition rate; more specifically, it relates to a module, and a corresponding method, for improving the individual speech recognition rate of a portable electronic device.
Background art
With the arrival of the digital age, interaction between people and portable electronic products is becoming more and more frequent, yet the control interfaces of today's portable electronic products increasingly fail to meet users' needs. The most natural means of communication in daily life is spoken language. If people could issue commands to a portable electronic product directly by speech, its control interface would be more readily accepted by users, the product would be more convenient to operate, and its added value would increase significantly.
For example, a mobile phone with a speech recognition function has a preset recognition model, which is built from phoneme models. Using this recognition model, the mobile phone can recognize at least one voice instruction uttered by a user. The preset recognition model is user-independent, meaning that the user can enjoy the convenience of speech recognition without recording any speech in advance. However, such a recognition model cannot account for the speech characteristics of a specific user; when the user's speech differs considerably from the preset recognition model, the recognition rate drops.
The Hidden Markov Model (hereinafter HMM) is a speech model commonly used in the field of speech recognition and is used to build phoneme models. An HMM speech model treats each input (for example, speech) as the output of a probabilistic generative model. Each index entry (for example, a character or a word) has its own probability distribution in the HMM speech model; to determine which entry a given utterance corresponds to, the model evaluates, for every index entry, the probability that the utterance was generated by that entry. To make speech recognition more accurate, speech data must be used to adjust the HMM speech model so that, through this adaptation, it can recognize the speech signals of different users.
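The scoring idea in the paragraph above can be illustrated with a minimal sketch. It assumes a discrete HMM with toy, made-up probabilities and uses the standard forward algorithm; the function names and the two "word" models are hypothetical and are not taken from the patent.

```python
def forward_likelihood(obs, start, trans, emit):
    """Return P(obs | model) for a discrete HMM using the forward algorithm."""
    n_states = len(start)
    # alpha[s]: probability of the observations so far, ending in state s
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][symbol]
            for s in range(n_states)
        ]
    return sum(alpha)


def recognize(obs, models):
    """Pick the index entry whose HMM gives the observations the highest likelihood."""
    return max(models, key=lambda name: forward_likelihood(obs, *models[name]))


# Two toy 2-state models over a 2-symbol alphabet (all numbers are invented).
# Each entry is (start probabilities, transition matrix, emission matrix).
models = {
    "word_a": ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.9, 0.1], [0.2, 0.8]]),
    "word_b": ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.1, 0.9], [0.8, 0.2]]),
}
best = recognize([0, 0, 1], models)
```

A real recognizer would work with continuous feature vectors and log probabilities; the discrete version is used here only because it keeps the arithmetic easy to follow.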
In addition, every utterance a person produces is composed of phonemes. Taking Chinese as an example, the pronunciation of each character is composed of an initial (the consonant onset) and/or a final (the vowel part), so each distinct initial or final can be regarded as a distinct phoneme. A phoneme model is a model, based on the HMM speech model, that is built for each distinct phoneme.
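As an illustration of the initial/final decomposition described above, the following sketch splits a romanized (pinyin) syllable into its initial and final. It is a simplification under stated assumptions: the patent itself writes phonemes in Zhuyin symbols, the helper name is hypothetical, and "y" and "w" are not treated as initials here.

```python
# Pinyin initials; two-letter initials come first so "zh" is tried before "z".
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s"]


def split_syllable(syllable):
    """Split a pinyin syllable into (initial, final); the initial may be empty."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable  # zero-initial syllable such as "an"


pairs = [split_syllable(s) for s in ["da", "guan", "zhang", "an"]]
```

Each distinct initial or final produced this way would correspond to one phoneme model.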
To issue commands by speech as described above, existing voice instruction recognition methods compose a recognition model for each instruction out of phoneme models. Take "make a phone call to Wang Xiaoming" as an example, in which "make a phone call" can be regarded as an instruction. Because everyone speaks with a different intonation, the user must input speech data for each instruction in order to adjust the corresponding instruction recognition model. This adjustment is gradual, so the user has to provide speech data for "make a phone call" repeatedly until the corresponding instruction recognition model can recognize the user's "make a phone call" instruction.
Such methods of improving the individual speech recognition rate require the user to adjust each instruction recognition model one by one, and may require many speech samples for the same instruction recognition model. This is both highly inconvenient and inefficient for the user.
In summary, how to adjust instruction recognition models more efficiently, so that the user need not adjust each instruction recognition model one by one, thereby saving time and improving the individual speech recognition rate, is a goal that speech recognition vendors are still working toward.
Summary of the invention
One object of the present invention is to provide a method for improving the individual speech recognition rate, for use in a portable electronic device. The method groups the phoneme models related to the speech data according to a predefined rule; thereafter, whenever the user provides speech data, the phoneme models can be adjusted, and the instruction recognition models composed of those phoneme models are adjusted along with them. The present invention thus overcomes the drawback of existing voice instruction recognition methods, which require the user to input corresponding speech data for each instruction recognition model. To achieve this object, the disclosed method captures the speech data provided by the user to build an adjustment parameter, and then integrates the phoneme model and the adjustment parameter to adjust the recognition model. Through these steps, the recognition model in the portable electronic device can be adjusted.
Another object of the present invention is to provide a module for improving the individual speech recognition rate. The module can be used in a portable electronic device and carries out the aforesaid method, overcoming the drawback of existing voice instruction recognition, which requires the user to input corresponding speech data for each instruction recognition model. To achieve this object, the disclosed module comprises a recognition model, an adjustment parameter model and an integration module. The recognition model is composed of phoneme models, the adjustment parameter model is built from the speech data provided by the user, and the integration module integrates the phoneme model and the adjustment parameter to adjust the recognition model. In this way, the present invention uses a user adaptation technique to improve the recognition rate of the recognition model in the portable electronic device for a specific user.
After reviewing the accompanying drawings and the embodiments described below, a person having ordinary skill in the art will understand the other objects of the present invention, as well as its technical means and modes of implementation.
Description of drawings
Fig. 1 is a flowchart of a method embodiment of the present invention;
Fig. 2 is a further flowchart of the method embodiment of the present invention;
Fig. 3 is a schematic diagram of the group framework of the phoneme models of the present invention; and
Fig. 4 is a schematic diagram of a module embodiment of the present invention.
Embodiments
A preferred embodiment of the present invention is a method for improving the individual speech recognition rate, applied to a portable electronic device with a speech recognition function, which is a mobile phone in this embodiment. The mobile phone contains a recognition system that includes a preset recognition model built from at least one phoneme model. The method adjusts this recognition model by integrating the phoneme model with an adjustment parameter. Using the adjusted recognition model, the mobile phone achieves a higher recognition rate for at least one voice instruction uttered by a user. Specifically, the preset recognition model, before any adjustment, performs speech recognition with the same model for every user and can therefore be regarded as built from non-specific (user-independent) phoneme models.
Referring to Fig. 1, step 100 is executed first to build a specific text database. In this preferred embodiment, the specific text database is related to the text corresponding to the voice instructions available to the user, but need not be identical to the instructions. For example, the preset voice instructions for operating the mobile phone are instructions such as "make a phone call" and "power off"; the specific text database is built according to the phonetic features of these instructions and is used to improve the mobile phone's speech recognition rate for a specific user. The specific text database may therefore consist of the above instructions themselves, or of other text related to the phonetic features of those instructions. Phonetic features are discussed below.
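One way to read the requirement that the text database be related to the phonetic features of the instructions is as a coverage check: the prompt text should let the user pronounce the phonemes that the instructions are built from. The sketch below is only an illustration of that reading. The function name and the pinyin-style transcriptions of the instructions and prompt sentences are assumptions, not data from the patent.

```python
def missing_phonemes(commands, prompts):
    """Return the instruction phonemes that no prompt sentence lets the user pronounce."""
    needed = {p for phonemes in commands.values() for p in phonemes}
    covered = {p for phonemes in prompts.values() for p in phonemes}
    return needed - covered


# Hypothetical transcriptions (initials and finals) of two instructions.
COMMANDS = {
    "make a phone call": ["d", "a", "d", "ian", "h", "ua"],
    "power off": ["g", "uan", "j", "i"],
}
# Hypothetical transcriptions of two prompt sentences in the text database.
PROMPTS = {
    "there is a telephone in the room":
        ["f", "ang", "j", "ian", "l", "i", "y", "ou", "d", "ian", "h", "ua"],
    "the weather is very good": ["t", "ian", "q", "i", "h", "en", "h", "ao"],
}

gaps = missing_phonemes(COMMANDS, PROMPTS)
```

With these invented transcriptions, the check reports the phonemes that the prompts leave uncovered, which suggests where another prompt sentence (or the instruction itself) would need to be added to the database.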
Next, step 101 is executed: when the user speaks according to the specific text database, features are extracted from the plurality of speech data uttered by the user in order to build an adjustment parameter. Finally, step 102 is executed to integrate the adjustment parameter and the phoneme model so as to adjust the recognition model.
Referring to Fig. 2, step 101 specifically comprises the following steps. Step 200 extracts feature vectors from the plurality of speech data; the feature vectors may be Mel-scale Frequency Cepstral Coefficients (MFCC), Linear Predictive Cepstral Coefficients (LPCC), cepstrum coefficients, or a combination thereof. Step 201 then uses the extracted feature vectors, together with a group framework of the phoneme models, to build an adjustment parameter. This group framework is established from the preset phoneme models and is independent of the user's language tendency. The group framework is further described below with reference to Fig. 3.
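The Mel-scale cepstral coefficients mentioned in step 200 rest on a warped frequency axis. The sketch below shows only that warping, using the commonly cited conversion formula m = 2595 log10(1 + f/700); it is not a full MFCC extractor, and the function names and the choice of 10 filters up to 4000 Hz are assumptions made for illustration.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to the mel scale (common 2595/700 formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, f_low, f_high):
    """Center frequencies (Hz) of n_filters filters spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


centers = mel_center_frequencies(10, 0.0, 4000.0)
```

Because the filters are evenly spaced in mel, their centers are dense at low frequencies and sparse at high frequencies, which is the property that makes the resulting features roughly follow human pitch perception.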
Specifically, in step 201, after the recognition system captures the speech data, it extracts the feature vectors from the speech data; these feature vectors reflect the user's individual pronunciation habits. The recognition system then uses the feature vectors, together with a group framework of the phoneme models, to build an adjustment parameter. For example, a combination of maximum a posteriori (MAP) estimation, Maximum Likelihood Linear Regression (MLLR) and Vector-Field Smoothing (VFS) can be adopted to achieve the best adaptation for varying amounts of training speech data. The MLLR and VFS algorithms use grouping to overcome the problem of insufficient or missing adaptation data for a probability distribution model (for example, an HMM speech model): when a given probability distribution model has too little data, it can be adjusted by reference to the other probability distribution models in the same group, with which it has a particular association. The group framework is what expresses these particular associations among the probability distribution models. Because a group itself may still have insufficient or missing data, the groups are organized into a tree structure: if a group has too little data, the search moves up the tree and merges it with another group; if the data are still insufficient, it moves up again, until the group used to adjust the recognition model has enough data.
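Two ideas from the paragraph above can be sketched in a few lines: a MAP-style update that interpolates between a prior (preset) mean and the user's data, and the upward walk through the group tree until a group has enough adaptation data. This is a simplified illustration with scalar means. The parameter tau, the frame counts, the threshold and the parent/child links between the numbered groups are all hypothetical, since Fig. 3 is not reproduced here.

```python
def map_mean(prior_mean, frames, tau=10.0):
    """MAP estimate of a Gaussian mean: the prior counts as tau pseudo-observations."""
    n = len(frames)
    return (tau * prior_mean + sum(frames)) / (tau + n)


def backoff_group(node, parent, frame_count, min_frames):
    """Walk up the group tree until a group holds at least min_frames frames."""
    while frame_count[node] < min_frames and node in parent:
        node = parent[node]
    return node


# Hypothetical fragment of a group tree: child -> parent.
parent = {"sub300": "parent305", "sub301": "parent305", "parent305": "root"}
# Hypothetical number of adaptation frames available at each group.
frame_count = {"sub300": 3, "sub301": 4, "parent305": 7, "root": 40}

group_small_need = backoff_group("sub300", parent, frame_count, min_frames=5)
group_large_need = backoff_group("sub300", parent, frame_count, min_frames=20)
adapted = map_mean(prior_mean=0.0, frames=[1.0, 1.0], tau=2.0)
```

With little data the MAP estimate stays close to the preset mean, and with more data it moves toward the user's average, which matches the gradual adjustment the description attributes to adaptation with limited speech.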
Referring to Fig. 3, which is a schematic diagram of the group framework 3, the grouping uses the existing k-means algorithm, which is not described in detail here, to divide the phoneme models of the speech data into five subgroups 300, 301, 302, 303 and 304. A bottom-up approach is then used to strengthen the relations among the subgroups so that each group has enough data to adjust the recognition model: based on the similarity between subgroups (that is, distance or maximum likelihood), the subgroups are combined into parent groups 305, 306, 307 and 308, building the tree structure upward and completing the group framework. The above method may be adjusted according to actual conditions and is not intended to limit the scope of the present invention.
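The bottom-up step described above can be sketched as a simple agglomerative merge: starting from subgroup centroids (assumed here to come from a prior k-means pass, which is not shown), the two closest groups are merged repeatedly until one root remains. The centroid values, the use of Euclidean distance, and the names of the merged nodes are assumptions for illustration; the patent does not specify them.

```python
import math
from itertools import combinations


def build_tree(centroids):
    """Merge the two closest groups repeatedly; return a child -> parent mapping."""
    clusters = {name: (vec, 1) for name, vec in centroids.items()}
    parent = {}
    next_id = 0
    while len(clusters) > 1:
        # Find the pair of current groups whose centroids are closest.
        a, b = min(
            combinations(sorted(clusters), 2),
            key=lambda pair: math.dist(clusters[pair[0]][0], clusters[pair[1]][0]),
        )
        (va, na), (vb, nb) = clusters.pop(a), clusters.pop(b)
        # Size-weighted centroid of the merged group.
        merged = tuple((x * na + y * nb) / (na + nb) for x, y in zip(va, vb))
        name = f"merged{next_id}"
        next_id += 1
        parent[a] = parent[b] = name
        clusters[name] = (merged, na + nb)
    return parent


# Hypothetical 2-D centroids for the five subgroups.
subgroups = {
    "sub300": (0.0, 0.0),
    "sub301": (0.1, 0.0),
    "sub302": (5.0, 5.0),
    "sub303": (5.2, 5.0),
    "sub304": (9.0, 0.0),
}
tree = build_tree(subgroups)
```

Merging five leaves pairwise yields four internal nodes; the resulting child-to-parent mapping is the kind of structure an upward search for a group with enough data would traverse.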
In more detail, suppose that, because of the user's accent (that is, language tendency), the user's pronunciations of "ㄉ" and "ㄍ" (the Zhuyin symbols for the initials d and g) are very similar. In the group framework, the models of "ㄉ" and "ㄍ" can then be treated as two phoneme models in the same subgroup 300, and the phoneme models "ㄉ" and "ㄍ" can be regarded as speech with a particular association. As long as the extracted feature vectors include feature vectors related to "ㄉ" or "ㄍ", those feature vectors can also be used to adjust the other phoneme models in the same group.
This embodiment can therefore integrate the adjustment parameter and the phoneme models according to the group framework described above in order to adjust the preset recognition model; in effect, the adjustment parameter is grouped according to the user's accent. In this preferred embodiment, as long as the preset recognition model contains the instruction recognition models for "power off" and "make a phone call", and the speech uttered by the user contains "ㄉ" or "ㄍ", the phoneme models "ㄉ" and "ㄍ" can be adjusted, and the "power off" and "make a phone call" instruction recognition models, which contain the phoneme models "ㄉ" and "ㄍ", are adjusted along with them. In other words, all recognition models that contain the same phoneme model are adjusted together, and the adjusted recognition model can be regarded as built from specific (user-specific) phoneme models.
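The propagation described above, from a spoken phoneme to its group and then to every instruction that contains an adjusted phoneme, can be sketched as follows. The group assignments, the pinyin-style transcriptions and the third instruction are hypothetical and serve only to show one instruction that is left untouched.

```python
def affected_commands(spoken_phonemes, group_of, command_phonemes):
    """Return (adjusted phoneme models, instructions that contain any of them)."""
    # Groups touched by the phonemes the user actually pronounced.
    touched_groups = {group_of[p] for p in spoken_phonemes if p in group_of}
    # Every phoneme model in a touched group is adjusted, spoken or not.
    adjusted = {p for p, g in group_of.items() if g in touched_groups}
    hit = sorted(
        name for name, phonemes in command_phonemes.items()
        if adjusted & set(phonemes)
    )
    return adjusted, hit


# Hypothetical grouping: "d" and "g" share subgroup 300.
GROUP_OF = {"d": "sub300", "g": "sub300", "h": "sub301", "j": "sub302"}
# Hypothetical transcriptions of three instructions.
COMMAND_PHONEMES = {
    "make a phone call": ["d", "a", "d", "ian", "h", "ua"],
    "power off": ["g", "uan", "j", "i"],
    "play music": ["b", "o", "f", "ang"],
}

adjusted, hit = affected_commands({"d"}, GROUP_OF, COMMAND_PHONEMES)
```

In this toy setup, speech containing only "d" adjusts both "d" and "g", so both "make a phone call" and "power off" are adjusted even though the user never pronounced "g".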
As the above description shows, the present invention can adjust the recognition model with less speech data. By using the group framework of the phoneme models, when the user reads out a given utterance, the phoneme models related to that utterance are adjusted together, and the instruction recognition models are adjusted in turn, so that the user can adjust all recognition models by inputting only a small amount of speech data.
Another preferred embodiment of the present invention is a module 4 for improving the individual speech recognition rate, for use in a portable electronic device (such as a mobile phone). The module 4 comprises a recognition model 400, an adjustment parameter model 401 and an integration module 402, and can use the method of the preferred embodiment described above to improve the speech recognition rate.
The recognition model 400 is built from phoneme models and is used to recognize the voice instructions uttered by a user; the phoneme models are the same as those of the preceding preferred embodiment and are not described again here. The adjustment parameter model 401 is built from the user's speech data and includes a group framework formed according to the particular associations among the phoneme models; this group framework is the same as that of the preceding preferred embodiment and is not described again here. The adjustment parameter model 401 is built by extracting the feature vectors of a plurality of speech data uttered by the user according to a specific text database, with the aid of the group framework. The specific text database is designed so that the user utters speech related to the phoneme models that make up the voice instructions. For example, the specific text may be an instruction, such as "make a phone call" or "power off", or a specific passage, such as "there is a telephone in the room" or "the weather is very good". Different users pronounce the same text differently. The integration module 402 integrates the phoneme models and the adjustment parameter model to adjust the recognition model; the manner of adjustment is as described in the preceding preferred embodiment and is not repeated here.
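The three parts of module 4 can be pictured as plain data structures plus one function. The sketch below is a toy arrangement under stated assumptions: each phoneme model is reduced to a single scalar mean, the adjustment is a per-group shift, and all class, field and function names are hypothetical rather than taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class RecognitionModel:
    """Stands in for recognition model 400."""
    phoneme_means: dict  # phoneme -> toy scalar mean
    commands: dict       # instruction -> list of phonemes


@dataclass
class AdjustmentParameterModel:
    """Stands in for adjustment parameter model 401."""
    group_of: dict                                   # phoneme -> group name
    group_shift: dict = field(default_factory=dict)  # group -> shift from user speech


def integrate(model, params):
    """Stands in for integration module 402: apply each group's shift to its phonemes."""
    new_means = {
        phoneme: mean + params.group_shift.get(params.group_of.get(phoneme), 0.0)
        for phoneme, mean in model.phoneme_means.items()
    }
    return RecognitionModel(new_means, model.commands)


preset = RecognitionModel(
    phoneme_means={"d": 1.0, "g": 2.0, "a": 3.0},
    commands={"make a phone call": ["d", "a"], "power off": ["g"]},
)
params = AdjustmentParameterModel(
    group_of={"d": "sub300", "g": "sub300"},
    group_shift={"sub300": 0.5},
)
adjusted_model = integrate(preset, params)
```

Returning a new model rather than modifying the preset one keeps the user-independent model available, for instance if the device later needs to adapt to a different user.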
In addition to the operations and functions depicted in Fig. 4, the module 4 can also carry out all the steps of the foregoing method embodiment. A person having ordinary skill in the art can readily understand how the module 4 carries out these steps on the basis of the foregoing method embodiment, so the details are not repeated here.
In summary, the present invention classifies the phoneme models to produce a group framework and, according to this group framework, uses the adjustment parameter associated with the user to adjust the phoneme models, whereby the recognition model is adjusted as well. The present invention thus overcomes the drawback of existing voice instruction recognition methods: the recognition model can be adjusted with less input speech, improving the individual speech recognition rate.
The above embodiments merely illustrate modes of implementing the present invention and explain its technical features; they are not intended to limit the scope of the present invention. Any change or equivalent arrangement that a person skilled in the art can readily accomplish falls within the scope claimed by the present invention, and the scope of rights of the present invention shall be determined by the claims of this application.

Claims (8)

1. A method for improving individual speech recognition rate, for use in a portable electronic device, the portable electronic device having a preset recognition model, the recognition model being built from at least one phoneme model and used to recognize at least one voice instruction uttered by a user, the method comprising the following steps:
building a specific text database related to the text corresponding to the voice instruction;
capturing a plurality of speech data uttered by the user according to the text database, to build an adjustment parameter; and
integrating the at least one phoneme model and the adjustment parameter, to adjust the recognition model.
2. The method according to claim 1, characterized in that the step of building an adjustment parameter comprises extracting feature vectors of the plurality of speech data and establishing a group framework for the at least one phoneme model.
3. The method according to claim 2, characterized in that the step of building an adjustment parameter establishes the group framework according to speech having a particular association.
4. The method according to claim 2, characterized in that the step of adjusting the recognition model integrates the at least one phoneme model and the adjustment parameter according to the group framework.
5. The method according to claim 1, characterized in that the recognition model is built from at least one non-specific phoneme model.
6. A module for improving individual speech recognition rate, for use in a portable electronic device, comprising:
a recognition model preset in the portable electronic device, the recognition model being built from at least one phoneme model and used to recognize at least one voice instruction uttered by a user;
an adjustment parameter model comprising a group framework, the group framework being independent of a user's language tendency; and
an integration module that integrates the at least one phoneme model and the adjustment parameter model, to adjust the recognition model.
7. The module according to claim 6, characterized in that the group framework is formed according to a particular association of the at least one phoneme model.
8. The module according to claim 6, characterized in that the recognition model is built from at least one non-specific phoneme model.
CNA2007101098914A 2007-06-05 2007-06-05 Method and module for improving individual speech recognition rate Pending CN101320561A (en)

Publications (1)

Publication Number Publication Date
CN101320561A true CN101320561A (en) 2008-12-10

Family

ID=40180575

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA2007101098914A Pending CN101320561A (en) 2007-06-05 2007-06-05 Method and module for improving individual speech recognition rate

Country Status (1)

Country Link
CN (1) CN101320561A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103414830A (en) * 2013-08-28 2013-11-27 上海斐讯数据通信技术有限公司 Quick power-off method and system on the basis of voice
CN106233374A (en) * 2014-04-17 2016-12-14 高通股份有限公司 Generate for detecting the keyword model of user-defined keyword
CN106233374B (en) * 2014-04-17 2020-01-10 高通股份有限公司 Keyword model generation for detecting user-defined keywords
CN105609101A (en) * 2014-11-14 2016-05-25 现代自动车株式会社 Speech recognition system and speech recognition method
WO2017076222A1 (en) * 2015-11-06 2017-05-11 阿里巴巴集团控股有限公司 Speech recognition method and apparatus
US10741170B2 (en) 2015-11-06 2020-08-11 Alibaba Group Holding Limited Speech recognition method and apparatus
US11664020B2 (en) 2015-11-06 2023-05-30 Alibaba Group Holding Limited Speech recognition method and apparatus
CN108806691A (en) * 2017-05-04 2018-11-13 有爱科技(深圳)有限公司 Audio recognition method and system
CN108806691B (en) * 2017-05-04 2020-10-16 有爱科技(深圳)有限公司 Voice recognition method and system
CN107016996A (en) * 2017-06-06 2017-08-04 广东小天才科技有限公司 A kind of processing method and processing device of voice data
CN114071539A (en) * 2020-08-10 2022-02-18 中国电信股份有限公司 Voice quality evaluation method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Open date: 20081210