CN1735924A

CN1735924A - Standard model creating device and standard model creating method

Info

Publication number: CN1735924A
Application number: CNA200380103867XA
Authority: CN
Inventors: 芳泽伸一
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2002-11-21
Filing date: 2003-11-18
Publication date: 2006-02-15

Abstract

A standard model creating apparatus provides a high-precision standard model used for: pattern recognition such as speech recognition, character recognition, or image recognition using a probability model based on a hidden Markov model, Bayesian theory, or linear discriminant analysis; intention interpretation using a probability model such as a Bayesian network; data mining performed using a probability model; and so forth. The apparatus comprises: a reference model preparing unit (102) operable to prepare at least one reference model; a reference model storing unit (103) operable to store the reference model (121) prepared by the reference model preparing unit (102); and a standard model creating unit (104) operable to create a standard model (122) by calculating statistics of the standard model so as to maximize or locally maximize the probability or likelihood with respect to the at least one reference model stored in the reference model storing unit (103).

Description

Standard model creating device and standard model creating method

Technical field

The present invention relates to a device and a method for creating a standard model used for: pattern recognition, such as speech recognition, character recognition, and image recognition, using a probability model based on a hidden Markov model, Bayesian theory, linear discriminant analysis, or the like; intention interpretation (recognition of intention) using a probability model such as a Bayesian network; data mining (recognition of data characteristics) using a probability model; person detection, fingerprint authentication, face authentication, and iris authentication using a probability model (recognizing an object and judging whether it is a specific object); prediction such as stock price prediction and weather forecasting (recognizing a situation and then making a judgment); synthesis of the voices of a plurality of speakers; synthesis of a plurality of face images (where people recognize and enjoy the synthesized model); and so forth.

Background art

In recent years, with the spread of the Internet and the like, network capacity has grown and communication costs have fallen. A network can therefore be used to collect a large number of recognition models (reference models). For speech recognition, for example, the many speech recognition models distributed by various research institutions (models for children, models for adults, models for the elderly, in-vehicle models, models for mobile phones, and so on) can be downloaded over the Internet. In addition, through connections between networked devices, a speech recognition model used in a car navigation system or the like can be downloaded to a television, a personal computer, or the like. For intention interpretation, recognition models that have learned the experiences of various people in various places can be collected over a network.

Furthermore, as recognition technology has developed, recognition models have come to be used in a wide range of devices whose specifications, such as CPU power and memory capacity, differ from one another: personal computers, television remote controls, mobile phones, car navigation systems, and so on. They are also used in many applications whose requirements differ, such as security systems and other applications that demand high recognition accuracy, and applications such as operation by a television remote control that demand a short time before the recognition result is output.

Recognition technology is also used in many environments in which the recognition target differs. In speech recognition, for example, it is used to recognize the voices of children, adults, and the elderly, and to recognize speech in a car or over a mobile phone.

In view of these changes in the social environment, it is thought that, by making effective use of a large number of recognition models (reference models), a high-precision recognition model (standard model) suited to the specifications of the device or application and to the usage environment can be created in a short time and provided to the user.

In the field of pattern recognition such as speech recognition, methods that use a probability model as the standard model for recognition have attracted attention in recent years; in particular, the hidden Markov model (hereinafter, HMM) and the Gaussian mixture model (hereinafter, GMM) are widely used. In intention interpretation, methods that use a probability model as the standard model representing intention, knowledge, preference, and the like have attracted attention in recent years, and Bayesian networks and the like are widely used. In the field of data mining, methods that use a probability model as the representative model of each category for classifying data have attracted attention in recent years, and GMMs and the like are widely used. In the field of authentication, such as voice authentication, fingerprint authentication, face authentication, and iris authentication, methods that use a probability model as the standard model for authentication have attracted attention, and GMMs and the like are used.

As a learning algorithm for a standard model represented by an HMM, the Baum-Welch re-estimation method is widely used (see, for example, Imai Satoshi, "Speech Recognition", pp. 150-152, Kyoritsu Shuppan Co., Ltd., Japan, published November 25, 1995). As a learning algorithm for a standard model represented by a GMM, the EM (Expectation-Maximization) algorithm is widely used (see, for example, Furui Sadaoki et al., "Speech Information Processing", pp. 100-104, Morikita Publishing Co., Ltd., published June 30, 1998). In the EM algorithm, the standard model is the Gaussian mixture distribution

(Formula 1)

$$\sum_{m=1}^{M_f} \omega_{f(m)}\, f\left(x;\ \mu_{f(m)},\ \sigma_{f(m)}^{2}\right)$$

where

(Formula 2)

$$f\left(x;\ \mu_{f(m)},\ \sigma_{f(m)}^{2}\right) \quad (m = 1, 2, \ldots, M_f)$$

denotes a Gaussian distribution and

(Formula 3)

$$x = \left(x_{(1)}, x_{(2)}, \ldots, x_{(J)}\right) \in R^{J}$$

denotes input data of J (≥ 1) dimensions. The statistics of this model are the mixture weighting coefficients

(Formula 4)

$$\omega_{f(m)} \quad (m = 1, 2, \ldots, M_f)$$

the mean values of J (≥ 1) dimensions

(Formula 5)

$$\mu_{f(m)} = \left(\mu_{f(m,1)}, \mu_{f(m,2)}, \ldots, \mu_{f(m,J)}\right) \in R^{J} \quad (m = 1, 2, \ldots, M_f)$$

and the variance values of J (≥ 1) dimensions (the J diagonal components of the covariance matrix)

(Formula 6)

$$\sigma_{f(m)}^{2} = \left(\sigma_{f(m,1)}^{2}, \sigma_{f(m,2)}^{2}, \ldots, \sigma_{f(m,J)}^{2}\right) \in R^{J} \quad (m = 1, 2, \ldots, M_f)$$

Using the N items of learning data

(Formula 7)

$$x[i] = \left(x_{(1)}[i], x_{(2)}[i], \ldots, x_{(J)}[i]\right) \in R^{J} \quad (i = 1, 2, \ldots, N)$$

these statistics are learned so as to maximize or locally maximize the likelihood with respect to the learning data

(Formula 8)

$$\log P = \sum_{i=1}^{N} \log\left[\sum_{m=1}^{M_f} \omega_{f(m)}\, f\left(x[i];\ \mu_{f(m)},\ \sigma_{f(m)}^{2}\right)\right]$$

by repeating the following calculations one or more times:

(Formula 9)

$$\omega_{f(m)} = \frac{\sum_{i=1}^{N} \gamma(x[i], m)}{\sum_{k=1}^{M_f} \sum_{i=1}^{N} \gamma(x[i], k)} \quad (m = 1, 2, \ldots, M_f)$$

(Formula 10)

$$\mu_{f(m,j)} = \frac{\sum_{i=1}^{N} \gamma(x[i], m)\, x_{(j)}[i]}{\sum_{i=1}^{N} \gamma(x[i], m)} \quad (m = 1, 2, \ldots, M_f;\ j = 1, 2, \ldots, J)$$

(Formula 11)

$$\sigma_{f(m,j)}^{2} = \frac{\sum_{i=1}^{N} \gamma(x[i], m)\left(x_{(j)}[i] - \mu_{f(m,j)}\right)^{2}}{\sum_{i=1}^{N} \gamma(x[i], m)} \quad (m = 1, 2, \ldots, M_f;\ j = 1, 2, \ldots, J)$$

where

(Formula 12)

$$\gamma(x[i], m) = \frac{\omega_{f(m)}\, f\left(x[i];\ \mu_{f(m)},\ \sigma_{f(m)}^{2}\right)}{\sum_{k=1}^{M_f} \omega_{f(k)}\, f\left(x[i];\ \mu_{f(k)},\ \sigma_{f(k)}^{2}\right)} \quad (m = 1, 2, \ldots, M_f)$$
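As an illustration only, one iteration of the EM update described by Formulas 9 to 12 might be sketched in Python as follows. The function names, the list-based data layout, and the variance floor `var_floor` are assumptions introduced for this sketch and are not taken from the cited literature. The sketch also assumes that every component keeps a non-zero occupancy.

```python
import math

def gaussian_diag(x, mean, var):
    """Density of a diagonal-covariance Gaussian at x (Formula 2)."""
    log_p = 0.0
    for xj, mj, vj in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vj) + (xj - mj) ** 2 / vj)
    return math.exp(log_p)

def log_likelihood(data, weights, means, variances):
    """log P of Formula 8 for the current statistics."""
    return sum(
        math.log(sum(w * gaussian_diag(x, mu, var)
                     for w, mu, var in zip(weights, means, variances)))
        for x in data
    )

def em_step(data, weights, means, variances, var_floor=1e-6):
    """One EM iteration: Formula 12, then Formulas 9 to 11."""
    M, J = len(weights), len(data[0])
    # Formula 12: gamma[i][m] is the responsibility of component m for x[i].
    gamma = []
    for x in data:
        joint = [w * gaussian_diag(x, mu, var)
                 for w, mu, var in zip(weights, means, variances)]
        total = sum(joint)
        gamma.append([p / total for p in joint])
    # Occupancy of each component, shared by Formulas 9 to 11.
    occ = [sum(g[m] for g in gamma) for m in range(M)]
    # Formula 9: mixture weighting coefficients.
    new_w = [occ[m] / sum(occ) for m in range(M)]
    # Formula 10: mean values.
    new_mu = [[sum(g[m] * x[j] for g, x in zip(gamma, data)) / occ[m]
               for j in range(J)] for m in range(M)]
    # Formula 11: variance values (floored; the floor is an assumption).
    new_var = [[max(var_floor,
                    sum(g[m] * (x[j] - new_mu[m][j]) ** 2
                        for g, x in zip(gamma, data)) / occ[m])
                for j in range(J)] for m in range(M)]
    return new_w, new_mu, new_var
```

Calling `em_step` repeatedly and monitoring `log_likelihood` shows the behaviour the text describes: the likelihood of Formula 8 does not decrease from one iteration to the next, and it converges to a local maximum that depends on the initial values.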

In addition, methods such as Bayesian estimation have also been proposed (see, for example, Shigemasu Kazuo, "Introduction to Bayesian Statistics", pp. 42-53, University of Tokyo Press, published April 30, 1985). The Baum-Welch re-estimation method, the EM algorithm, Bayesian estimation, and other such learning algorithms all create a standard model by calculating the parameters (statistics) of the standard model so as to maximize or locally maximize the probability (likelihood) of the learning data. These learning methods thus achieve the mathematical optimization of maximizing or locally maximizing the probability (likelihood).

When these learning methods are used to create a standard model for speech recognition, it is desirable to train the standard model with a large amount of speech data in order to cope with variation in acoustic features caused by differences between speakers, noise, and the like. When they are used for intention interpretation, it is desirable to train the standard model with a large amount of data in order to cope with variation between speakers, situations, and the like. When they are used for iris authentication, it is desirable to train the standard model with a large number of iris images in order to cope with variation in sunlight, camera position, rotation, and the like. However, processing such a large amount of data makes learning very time-consuming, so the standard model cannot be provided to the user in a short time. The cost of storing the large amount of data also becomes very high, and when the data is collected over a network, the communication cost becomes very high as well.

On the other hand, methods have been proposed that create a standard model by synthesizing a plurality of models (hereinafter, a model prepared for reference in order to create a standard model is called a 'reference model'). A reference model is a probability distribution model that represents a large amount of learning data by the population parameters (mean, variance, and so on) of a probability distribution, and thereby condenses the features of the learning data into a small number of parameters (population parameters). In the conventional techniques described below, the models are represented by Gaussian distributions.

In the first conventional method, the reference models are represented by GMMs, and a standard model is created by synthesizing the GMMs of a plurality of reference models with weights (for example, the technique disclosed in Japanese Laid-Open Patent Publication No. H4-125599).
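Weighted synthesis of this kind can be pictured with the short Python sketch below. It is not taken from the cited publication: the tuple layout `(weight, mean, variance)`, the function name, and the per-model combination weights are assumptions made for illustration.

```python
def synthesize_by_weighting(reference_models, model_weights):
    """Pool the Gaussians of several reference GMMs into a single GMM.

    reference_models: list of GMMs, each a list of (weight, mean, variance).
    model_weights:    one combination weight per reference model
                      (assumed here to sum to 1).
    """
    standard = []
    for model, alpha in zip(reference_models, model_weights):
        for w, mean, var in model:
            # Each Gaussian is kept as-is; only its weight is rescaled.
            standard.append((alpha * w, mean, var))
    return standard
```

Because every Gaussian of every reference model is retained, the number of mixture components in the result is the sum of the numbers of components of the reference models.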

In the second conventional method, in addition to the first conventional method, the standard model is created by learning the linearly combined mixture weights so as to maximize or locally maximize the probability (likelihood) with respect to learning data (for example, the technique disclosed in Japanese Laid-Open Patent Publication No. H10-268893).

In the third conventional method, the mean values of the standard model are expressed as a linear combination of the mean values of the reference models, and the standard model is created by learning the linear combination coefficients so as to maximize or locally maximize the probability (likelihood) with respect to input data. Here, speech data of a specific speaker is used as the learning data, and the standard model is used as a speaker-adapted model for speech recognition (for example, M.J.F. Gales, "Cluster Adaptive Training for Speech Recognition", 1998, Proceedings of ICSLP98, pp. 1783-1786).

In the fourth conventional method, each reference model is represented by a single Gaussian distribution, and the standard model is created by synthesizing the Gaussian distributions of a plurality of reference models and then merging, by clustering, the Gaussian distributions that belong to the same class (for example, the technique disclosed in Japanese Laid-Open Patent Publication No. H9-81178).

In the fifth conventional method, a plurality of reference models are represented by Gaussian mixture distributions having the same number of mixture components, and serial numbers are assigned to the Gaussian distributions in one-to-one correspondence. The standard model is created by synthesizing the Gaussian distributions that have the same serial number. The reference models to be synthesized are models created from speakers who are acoustically close to the user, and the resulting standard model is a speaker-adapted model (for example, Yoshizawa Shinichi and six others, "Unsupervised Learning Method for Phoneme Models Using Sufficient Statistics and Speaker Distance", March 1, 2002, The Institute of Electronics, Information and Communication Engineers, Vol. J85-D-II, No. 3, pp. 382-389).

However, in the first conventional method, as the number of reference models to be synthesized increases, the number of mixture components of the standard model also increases, so that the storage capacity required for the standard model and the amount of recognition processing become enormous, which is impractical. Moreover, the number of mixture components of the standard model cannot be controlled according to the specifications. This problem is thought to become more pronounced as the number of reference models to be synthesized increases.

In the second conventional method, as the number of reference models to be synthesized increases, the number of mixture components of the standard model also increases, so that the storage capacity required for the standard model and the amount of recognition processing become enormous, which is impractical. Moreover, the number of mixture components of the standard model cannot be controlled according to the specifications. In addition, because the standard model is a simple mixture sum of the reference models and the parameters to be learned are limited to the mixture weights, a high-precision standard model cannot be created. Furthermore, because a large amount of learning data is used to create the standard model, learning takes time. These problems are thought to become more pronounced as the number of reference models to be synthesized increases.

In the third conventional method, because the parameters to be learned are limited to the linear combination coefficients of the mean values of the reference models, a high-precision standard model cannot be created. Furthermore, because a large amount of learning data is used to create the standard model, learning takes time.

In the fourth conventional method, because the clustering is performed heuristically, it is difficult to create a high-precision standard model. In addition, because each reference model is a single Gaussian distribution, its precision is low, and the precision of the standard model obtained by merging such models is also low. This problem of recognition accuracy is thought to become more pronounced as the number of reference models to be synthesized increases.

In the fifth conventional method, the standard model is created by synthesizing Gaussian distributions that have the same serial number; however, in order to create an optimal standard model, the Gaussian distributions to be synthesized are not, in general, limited to a one-to-one correspondence, so recognition accuracy is low. In addition, a standard model cannot be created when the reference models have different numbers of mixture components. Furthermore, serial numbers are not generally assigned to the Gaussian distributions in reference models, and in that case a standard model cannot be created. Moreover, the number of mixture components of the standard model cannot be controlled according to the specifications.

Summary of the invention

The present invention has been made in view of these problems, and an object thereof is to provide a standard model creating device and the like that create a high-precision standard model used for: pattern recognition, such as speech recognition, character recognition, and image recognition, using a probability model based on a hidden Markov model, Bayesian theory, linear discriminant analysis, or the like; intention interpretation (recognition of intention) using a probability model such as a Bayesian network; data mining (recognition of data characteristics) using a probability model; prediction such as stock price prediction and weather forecasting (recognizing a situation and then making a judgment); and so forth.

Another object of the present invention is to provide a standard model creating device and the like that create a standard model easily, without requiring learning data or supervised (teacher) data.

A further object of the present invention is to provide a standard model creating device and the like with high versatility and flexibility that create a standard model suited to the recognition target for which the standard model is used, or a standard model suited to the specifications or environment of the device that performs recognition processing using the standard model.

The term 'recognition' as used in the present invention means not only recognition in the narrow sense, such as speech recognition, but any recognition that uses a standard model expressed by probabilities, including pattern matching, identification, authentication, Bayesian estimation, and prediction.

To achieve these objects, the standard model creating device of the present invention is a device that creates a standard model, which is a recognition model defined by a set of events and by output probabilities of events or of transitions between events, and is characterized by comprising: a reference model storage unit that stores one or more reference models, which are models created in advance for recognizing specific targets; and a standard model creating unit that creates a standard model by calculating statistics of the standard model so as to maximize or locally maximize the probability or likelihood of the standard model with respect to the one or more reference models stored in the reference model storage unit.

For example, a standard model creating device for speech recognition creates a standard model for speech recognition, which represents speech features having a specific attribute, using a probability model that expresses by output probabilities the frequency parameters representing speech features, and is characterized by comprising: a reference model storage unit that stores one or more reference models, which are probability models representing speech features having a certain attribute; and a standard model creating unit that creates a standard model by calculating the statistics of the standard model using the statistics of the one or more reference models stored in the reference model storage unit. The standard model creating unit includes: a standard model structure determining unit that determines the structure of the standard model to be created; an initial standard model creating unit that determines initial values of the statistics that specify the standard model whose structure has been determined; and a statistics estimating unit that estimates and calculates the statistics of the standard model so as to maximize or locally maximize the probability or likelihood, with respect to the reference models, of the standard model whose initial values have been determined.

Thus, because the standard model is created by calculating its statistics so as to maximize or locally maximize its probability or likelihood with respect to the one or more reference models, the standard model can be created easily without learning data such as speech data or supervised data, and a high-precision standard model that comprehensively integrates a plurality of already-created reference models can be produced.
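The idea of calculating statistics from reference models alone, with no learning data, can be illustrated by the following minimal Python sketch. It does not reproduce the estimation formulas of the present invention. It assumes one-dimensional Gaussians, pools the Gaussians of all reference models into one weighted list, and replaces the likelihood over data with the expected log-likelihood under each reference Gaussian, which has a closed form. The names `expected_log_density`, `fit_standard_model`, and `EPS` are hypothetical.

```python
import math

EPS = 1e-12  # guard against empty components (an assumption of this sketch)

def expected_log_density(ref_mean, ref_var, mean, var):
    """Expectation of log N(x; mean, var) for x drawn from N(ref_mean, ref_var)."""
    return -0.5 * (math.log(2.0 * math.pi * var)
                   + (ref_var + (ref_mean - mean) ** 2) / var)

def fit_standard_model(reference, init, iterations=20):
    """Fit a 1-D GMM to pooled reference Gaussians without any training data.

    reference: list of (weight, mean, variance), weights summing to 1.
    init:      initial standard model in the same format; its length fixes
               the number of mixture components of the result.
    """
    model = list(init)
    for _ in range(iterations):
        # Soft-assign each reference Gaussian to the standard components.
        resp = []
        for _, r_mean, r_var in reference:
            scores = [math.log(w) + expected_log_density(r_mean, r_var, m, v)
                      for w, m, v in model]
            top = max(scores)
            unnorm = [math.exp(s - top) for s in scores]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # Re-estimate each standard component by weighted moment matching.
        new_model = []
        for k in range(len(model)):
            share = [r[0] * resp[i][k] for i, r in enumerate(reference)]
            mass = max(sum(share), EPS)
            mean = sum(s * r[1] for s, r in zip(share, reference)) / mass
            var = sum(s * (r[2] + (r[1] - mean) ** 2)
                      for s, r in zip(share, reference)) / mass
            new_model.append((mass, mean, max(var, EPS)))
        model = new_model
    return model
```

In this sketch the number of components of the standard model is chosen through `init`, independently of how many Gaussians the reference models contain, and only the statistics of the reference models are read.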

Here, the standard model creating device preferably further comprises a reference model preparing unit that performs at least one of: obtaining a reference model from outside and storing it in the reference model storage unit; and creating a reference model and storing it in the reference model storage unit. For example, when applied to speech recognition, the standard model creating device creates a standard model for speech recognition, which represents speech features having a specific attribute, using a probability model that expresses by output probabilities the frequency parameters representing speech features, and is characterized by comprising: a reference model storage unit that stores one or more reference models, which are probability models representing speech features having a certain attribute; a reference model preparing unit that performs at least one of obtaining a reference model from outside and storing it in the reference model storage unit, and creating a new reference model and storing it in the reference model storage unit; and a standard model creating unit that creates a standard model by preparing initial values of the statistics of a standard model having a predetermined structure and calculating the statistics of that standard model, using the statistics of the reference models, so as to maximize or locally maximize the probability or likelihood of the standard model with respect to the one or more reference models stored in the reference model storage unit.

Thus, because a new reference model can be obtained from outside the standard model creating device and the standard model can be created from the obtained reference model, a highly versatile standard model creating device that can handle a variety of recognition targets is realized.

The standard model creating device may further comprise: a usage information creating unit that creates usage information, which is information on the recognition target; and a reference model selecting unit that selects one or more reference models from among the reference models stored in the reference model storage unit on the basis of the created usage information. In this case, the standard model creating unit calculates the statistics of the standard model so as to maximize or locally maximize the probability or likelihood of the standard model with respect to the reference models selected by the reference model selecting unit.

Thus, on the basis of usage information such as the user's characteristics, age, and gender and the usage environment, only the reference models suited to the recognition target are selected from the plurality of prepared reference models, and a standard model integrating them is created, so that a high-precision standard model specialized for the recognition target can be created.
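Selection on the basis of usage information might look like the following Python sketch. The attribute names (`age`, `environment`), the `ReferenceModel` class, and the exact-match rule are assumptions chosen for illustration; the text above does not prescribe how usage information is encoded or compared.

```python
from dataclasses import dataclass, field

@dataclass
class ReferenceModel:
    """A stored reference model with descriptive attributes (hypothetical)."""
    name: str
    attributes: dict = field(default_factory=dict)

def select_reference_models(stored, usage_info):
    """Return the stored models whose attributes match every usage-info item."""
    return [m for m in stored
            if all(m.attributes.get(k) == v for k, v in usage_info.items())]
```

For instance, with usage information `{"age": "adult"}`, only the models whose `age` attribute is `"adult"` are passed on to the standard model creating unit.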

Here, the standard model creating device may further comprise a similarity judging unit that calculates the similarity between the usage information and the information on the selected reference model, judges whether the similarity is equal to or greater than a predetermined threshold, and creates a judgment signal.

Thus, when no reference model close to the usage information exists in the reference model storage unit, a request to prepare a reference model can be issued.
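A similarity judgment of this kind can be sketched as follows. The similarity measure used here (the fraction of usage-information items that the model information matches) and the default threshold of 0.5 are assumptions for illustration only; the text above leaves both unspecified.

```python
def similarity(usage_info, model_info):
    """Fraction of usage-info items matched by the model's information."""
    if not usage_info:
        return 1.0
    matches = sum(1 for k, v in usage_info.items() if model_info.get(k) == v)
    return matches / len(usage_info)

def judgment_signal(usage_info, model_info, threshold=0.5):
    """True if the similarity is equal to or greater than the threshold."""
    return similarity(usage_info, model_info) >= threshold
```

A `False` signal would correspond to the case in which no sufficiently close reference model is stored, so that preparation of a new reference model can be requested.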

A terminal device may be connected to the standard model creating device via a communication channel, and the standard model creating device may further comprise: a usage information receiving unit that receives, from the terminal device, usage information, which is information on the recognition target; and a reference model selecting unit that selects one or more reference models from among the reference models stored in the reference model storage unit on the basis of the received usage information. In this case, the standard model creating unit calculates the statistics of the standard model so as to maximize or locally maximize the probability or likelihood of the standard model with respect to the reference models selected by the reference model selecting unit.

Thus, because the standard model is created on the basis of usage information transmitted via the communication channel, the standard model can be generated by remote control, and a recognition system based on a communication system can be constructed.

The standard model creating device may further comprise a specification information creating unit that creates specification information, which is information on the specifications of the standard model to be created. In this case, the standard model creating unit calculates the statistics of the standard model, on the basis of the specification information created by the specification information creating unit, so as to maximize or locally maximize the probability or likelihood of the standard model with respect to the reference models.

Thus, because the standard model is created on the basis of specification information such as the CPU power and memory capacity of the device that uses the standard model, the required recognition accuracy, and the required recognition processing time, a standard model that satisfies specific specification conditions can be generated, that is, a standard model suited to the resource environment, such as the computing engine, required for recognition processing.

Here, the specification information may be, for example, information indicating the specifications that correspond to the kind of application that uses the standard model. The standard model creating device may further comprise a specification information holding unit that holds, as the specification information, an application-specification correspondence database indicating the correspondence between applications that use the standard model and the specifications of the standard model. In this case, the standard model creating unit reads the specifications corresponding to the application to be started from the application-specification correspondence database held in the specification information holding unit and, on the basis of the specifications read, calculates the statistics of the standard model so as to maximize or locally maximize the probability or likelihood of the standard model with respect to the reference models.

Thus, because the standard model is created according to the specifications corresponding to each application, the standard model best suited to each application can be created, improving the recognition accuracy and the like of a recognition system that uses the standard model.
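An application-specification correspondence database can be pictured as a simple lookup table, as in the Python sketch below. The application names, the `num_mixtures` field, the numeric values, and the fallback entry are all hypothetical and serve only to illustrate the lookup.

```python
# Hypothetical correspondence between applications and standard model specs.
APP_SPEC_DB = {
    "tv_remote": {"num_mixtures": 3},        # short response time required
    "car_navigation": {"num_mixtures": 10},
    "security": {"num_mixtures": 20},        # high accuracy required
}

# Fallback used when an application is not registered (an assumption).
DEFAULT_SPEC = {"num_mixtures": 10}

def spec_for_application(app_name, db=APP_SPEC_DB):
    """Read the specification that corresponds to the application being started."""
    return db.get(app_name, DEFAULT_SPEC)
```

The specification read in this way would then determine the structure of the standard model whose statistics are calculated.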

A terminal device may be connected to the standard model creating device via a communication channel, and the standard model creating device may further comprise a specification information receiving unit that receives, from the terminal device, specification information, which is information on the specifications of the standard model to be created. In this case, the standard model creating unit calculates the statistics of the standard model, on the basis of the specification information received by the specification information receiving unit, so as to maximize or locally maximize the probability or likelihood of the standard model with respect to the reference models.

Thus, because the standard model is created on the basis of specification information transmitted via the communication channel, the standard model can be generated by remote control, and a recognition system based on a communication system can be constructed.

For example, the reference models and the standard model may be represented using one or more Gaussian distributions, and the standard model creating unit may determine the number of mixture components (the number of Gaussian distributions) of the standard model on the basis of the specification information.

Thus, the number of Gaussian mixture components contained in the standard model to be created is determined dynamically, so that the structure of the standard model can be controlled according to the environment in which recognition processing is performed, the required specifications, and so on. For example, when the recognition device that uses the standard model has little CPU power or little memory, or when the required recognition processing time is short, the number of mixture components of the standard model can be set small to meet the specifications; conversely, when the required recognition accuracy is high, the number of mixture components can be set large to improve recognition accuracy.
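The rule described above can be written as a small decision function. In the Python sketch below, the parameter names, the numeric thresholds, and the three candidate mixture counts are assumptions for illustration; the text specifies only the direction of the adjustment.

```python
def choose_num_mixtures(cpu_mips, memory_kb, max_latency_ms, high_accuracy,
                        few=3, default=10, many=20):
    """Pick the number of Gaussian mixture components from specification info.

    All thresholds below are hypothetical placeholders.
    """
    constrained = (cpu_mips < 100         # little CPU power
                   or memory_kb < 256     # little memory
                   or max_latency_ms < 50)  # short processing time required
    if constrained:
        return few      # keep the model small so that it meets the specification
    if high_accuracy:
        return many     # spend resources on recognition accuracy
    return default
```

In this sketch a resource constraint takes priority over an accuracy request, which is one possible design choice and not something the text mandates.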

In addition, state in the use under the situation of utilizing information or specification information to come the production standard model, may not need the reference model preparatory unit.This is because for example, according to user's request, or irrelevant with user's request, under the state that is stored in reference to model in advance in the master pattern producing device, the master pattern producing device is dispatched from the factory, use and utilize information or specification information to come the production standard model.

In addition, described reference model and described master pattern use more than one Gaussian distribution to show, the different reference model of the mixed distribution number of described at least one pair of reference model of reference model cell stores (Gaussian distribution number), described master pattern production unit is calculated the statistic of described master pattern, with the probability or the likelihood of maximization or the described master pattern that the maximizes reference model different with respect to the mixed distribution number (Gaussian distribution number) of at least one pair of reference model.

Thus, because come the production standard model,, realize being more suitable in the making of the high precision master pattern of identifying object so can come the production standard model according to the reference model of cut-and-dried various structure according to the different reference model of mixed distribution number.

In addition, described master pattern producing device also possesses the master pattern storage unit, stores the master pattern that described master pattern production unit is made.

Thus, the master pattern that temporary transient buffering is made can be realized the effect as the data server that provides to other device at sending request output at once.

In addition, on described master pattern producing device, through communication path connecting terminal device, described master pattern producing device also possesses the master pattern transmitting element that sends the master pattern of described master pattern production unit making to described end device.

Thus, because the master pattern of making is sent to the external device (ED) that is arranged at separated part on the space,, make engine, or make the master pattern producing device as the server in the communication system as master pattern so can make this master pattern producing device independent.

In addition, a terminal device may be connected to the master pattern producing device via a communication path, the master pattern producing device may further include a reference model receiving unit that receives a reference model sent from the terminal device, and the master pattern production unit may calculate the statistics of the master pattern so as to maximize or locally maximize the probability or likelihood of the master pattern with respect to at least the reference model received by the reference model receiving unit.

Thus, because a reference model that is held by the terminal device and suited to its usage environment can be sent via the communication path and used to make the master pattern, a high-precision master pattern even better suited to the recognition object can be made. As an example, when reference model A, which user A used under environment A, is held in the terminal device and user A wants to use the device under environment B, making the master pattern from reference model A yields a high-precision master pattern that reflects the characteristics of user A.

In addition, the reference model preparation unit may also perform at least one of updating and adding to the reference models stored in the reference model storage unit. For example, a terminal device may be connected to the master pattern producing device via a communication path, the master pattern producing device may further include a reference model receiving unit that receives a reference model sent from the terminal device, and the reference model preparation unit may use the reference model received by the reference model receiving unit to perform at least one of updating and adding to the reference models stored in the reference model storage unit.

Thus, because the prepared reference models can be added to, updated, and so on, models for various recognition objects can be added as reference models or replaced with more precise reference models. The master pattern can then be regenerated from the updated reference models, and feedback-based learning becomes possible, for example by using a generated master pattern as a reference model and making a master pattern again.

In addition, the master pattern production unit may include: a master pattern structure determination portion that determines the structure of the master pattern to be made; a primary standard modelling portion that determines the initial values of the statistics specifying the master pattern whose structure has been determined; and a statistic inference portion that infers and calculates the statistics of the master pattern so as to maximize or locally maximize the probability or likelihood of the master pattern with respect to the reference model. In this case, the primary standard modelling portion may determine the initial values of the statistics specifying the master pattern using one or more of the reference models that the statistic inference portion uses to calculate the statistics of the master pattern. For example, the primary standard modelling portion may determine the initial values according to a classification ID that identifies the kind of master pattern. Specifically, the primary standard modelling portion may hold a correspondence table that associates the classification ID, the initial values, and the reference models, and may determine the initial values according to the correspondence table.

Thus, by providing a classification ID for each kind of recognition object for which the master pattern is used, a primary standard model that shares properties with the finally required master pattern can be used, so a high-precision master pattern can be made.
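The correspondence-table lookup described above can be sketched as follows. This is only an illustration: the container `InitialModel`, the table contents, the classification ID strings `"ID-1"` and `"ID-2"`, and the function name `initial_values_for` are all hypothetical and do not come from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InitialModel:
    """Initial statistics of a master pattern (hypothetical container)."""
    weights: tuple    # mixed weighting coefficients
    means: tuple      # mean values (one dimension only, for brevity)
    variances: tuple  # variance values


# Hypothetical correspondence table: classification ID -> initial values.
CORRESPONDENCE_TABLE = {
    "ID-1": InitialModel((0.5, 0.5), (-1.0, 1.0), (1.0, 1.0)),
    "ID-2": InitialModel((1.0,), (0.0,), (2.0,)),
}


def initial_values_for(classification_id: str) -> InitialModel:
    """Returns the initial values registered for a classification ID."""
    try:
        return CORRESPONDENCE_TABLE[classification_id]
    except KeyError:
        raise ValueError(
            f"no initial values registered for {classification_id!r}"
        ) from None
```

Keeping the table as plain data means that adding a new kind of recognition object only requires a new table entry, not a change to the lookup code.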

As described above, the present invention provides a high-precision master pattern for use in: pattern recognition such as speech recognition, character recognition, and image recognition based on probability models such as hidden Markov models, Bayesian theory, and linear discriminant analysis; intention understanding (recognition of intention) based on probability models such as Bayesian networks; data mining (recognition of data characteristics) based on probability models; person detection, fingerprint authentication, face authentication, and iris authentication based on probability models (recognizing an object and judging whether it is a specific object); and prediction such as stock prediction and weather forecasting (recognizing a situation and then making a judgment). Its practical value is therefore very high.

In addition, the present invention can be realized not only as such a master pattern producing device, but also as a master pattern making method whose steps are the characteristic constituent elements of the master pattern producing device, and as a program that causes a computer to execute those steps. Needless to say, the program can be distributed via recording media such as CD-ROMs or transmission media such as the Internet.

Description of drawings

Fig. 1 is a block diagram showing the overall configuration of the server serving as the master pattern producing device of the 1st embodiment of the present invention.

Fig. 2 is a flowchart showing the operation steps of this server.

Fig. 3 is a diagram showing an example of the reference models stored in the reference model storage part of Fig. 1.

Fig. 4 is a flowchart showing the detailed steps of step S101 (making of the master pattern) in Fig. 2.

Fig. 5 is a diagram explaining the approximate calculation performed by the 1st approximation portion 104e of Fig. 1.

Fig. 6 is a diagram showing screen display examples when reference models are selected.

Fig. 7(a) is a diagram showing a screen display example when the structure (mixed distribution number) of the master pattern to be made is specified, and Fig. 7(b) is a diagram showing a screen display example when specification information is selected.

Fig. 8 is a diagram showing screen display examples of the progress while the master pattern is being made.

Fig. 9 is a block diagram showing the overall configuration of the STB serving as the master pattern producing device of the 2nd embodiment of the present invention.

Figure 10 is a flowchart showing the operation steps of this STB.

Figure 11 is a diagram showing an example of the reference models stored in the reference model storage part of Figure 10.

Figure 12 is a diagram explaining the approximate calculation performed by the 2nd approximation portion of Figure 10.

Figure 13 is a block diagram showing the overall configuration of the PDA serving as the master pattern producing device of the 3rd embodiment of the present invention.

Figure 14 is a flowchart showing the operation steps of this PDA.

Figure 15 is a diagram showing an example of the reference models stored in the reference model storage part of Figure 13.

Figure 16 is a diagram showing an example of the selection screen of this PDA.

Figure 17 is a schematic diagram showing the statistic inference steps of the statistic inference portion in Figure 13.

Figure 18 is a diagram explaining the approximate calculation performed by the 3rd approximation portion in Figure 13.

Figure 19 is a block diagram showing the overall configuration of the server serving as the master pattern producing device of the 4th embodiment of the present invention.

Figure 20 is a flowchart showing the operation steps of this server.

Figure 21 is a diagram showing an example of the reference models and the master pattern used to explain the operation steps of this server.

Figure 22 is a diagram showing a screen display example when personal information is entered as utilization information.

Figure 23 is a block diagram showing the overall configuration of the server serving as the master pattern producing device of the 5th embodiment of the present invention.

Figure 24 is a flowchart showing the operation steps of this server.

Figure 25 is a diagram showing an example of the reference models and the master pattern used to explain the operation steps of this server.

Figure 26 is a block diagram showing the overall configuration of the server serving as the master pattern producing device of the 6th embodiment of the present invention.

Figure 27 is a flowchart showing the operation steps of this server.

Figure 28 is a diagram showing an example of the reference models and the master pattern used to explain the operation steps of this server.

Figure 29 is a block diagram showing the overall configuration of the server serving as the master pattern producing device of the 7th embodiment of the present invention.

Figure 30 is a flowchart showing the operation steps of this server.

Figure 31 is a diagram showing an example of the reference models and the master pattern used to explain the operation steps of this server.

Figure 32 is a block diagram showing the overall configuration of the master pattern producing device of the 8th embodiment of the present invention.

Figure 33 is a flowchart showing the operation steps of pocket telephone 901.

Figure 34 is a diagram showing an example of the reference models stored in the reference model storage part.

Figure 35 is a diagram showing an example of the reference models newly stored in the reference model storage part.

Figure 36 is a diagram showing a screen display example when utilization information is made.

Figure 37 is a diagram showing a screen display example when reference models are prepared.

Figure 38 is a graph showing the results of a recognition experiment using a master pattern made with the 3rd approximation portion.

Figure 39 is a graph showing the results of a recognition experiment using a master pattern made by the 2nd approximation portion of the 3rd embodiment.

Figure 40 is a block diagram showing the overall configuration of the master pattern producing device of the 9th embodiment of the present invention.

Figure 41 is a diagram showing a data example of the application/specification information correspondence database.

Figure 42 is a flowchart showing the operation steps of PDA 1001.

Figure 43 is a diagram showing an example of the reference models stored in the reference model storage part.

Figure 44 is a flowchart showing the clustering-based initial value determination method of the primary standard modelling portion.

Figure 45 is a diagram showing a concrete example of step S1004 of Figure 44.

Figure 46 is a diagram showing a concrete example of step S1005 of Figure 44.

Figure 47 is a diagram showing a concrete example of step S1006 of Figure 44.

Figure 48 is a diagram showing a concrete example of step S1008 of Figure 44.

Figure 49 is a block diagram showing the overall configuration of the server serving as the master pattern producing device of the 10th embodiment of the present invention.

Figure 50 is a flowchart showing the operation steps of this server.

Figure 51 is a diagram showing an example of a system to which the master pattern producing device of the present invention is concretely applied.

Figure 52 is a diagram showing an example of the classification ID / primary standard model / reference model correspondence table.

Figure 53 is a diagram showing an example of reference models 8AA-AZ in the classification ID / primary standard model / reference model correspondence table of Figure 52.

Figure 54 is a diagram showing an example of reference models 64ZA-ZZ in the classification ID / primary standard model / reference model correspondence table of Figure 52.

Figure 55 is a diagram showing an example of primary standard models 8A-64Z in the classification ID / primary standard model / reference model correspondence table of Figure 52.

Figure 56 is a flowchart showing the method of making the classification ID / primary standard model / reference model correspondence table.

Figure 57 is a diagram showing a concrete example of step S1100 of Figure 56.

Figure 58 is a diagram showing a concrete example of step S1102 of Figure 56.

Figure 59 is a diagram showing a concrete example of step S1103 of Figure 56.

Figure 60 is a diagram showing a concrete example of step S1104 of Figure 56.

Figure 61 is a diagram showing the steps of completing the classification ID / primary standard model / reference model correspondence table through communication between a terminal and a server.

Figure 62 is a flowchart showing the method of determining the primary standard model using the classification ID / primary standard model / reference model correspondence table.

Figure 63 is a diagram showing a concrete example of step S1105 in Figure 62.

Figure 64 is a graph showing the results of a recognition experiment using a master pattern made with the 3rd approximation portion.

Figure 65(a)-(j) are diagrams showing examples of the relation between the attributes of the speech recognition object and the structure of the master pattern (the mixed number of Gaussian distributions).

Embodiment

Below, embodiments of the present invention are described in detail with reference to the accompanying drawings. Identical or corresponding parts in the figures are given the same reference signs, and their description is not repeated.

(the 1st embodiment)

Fig. 1 is a block diagram showing the overall configuration of the master pattern producing device of the 1st embodiment of the present invention. Here, an example is shown in which the master pattern producing device of the present invention is built into server 101 of a computer system. In the present embodiment, the case of making a master pattern for speech recognition that represents speech features having a particular attribute is described as an example.

Server 101 is a computer device or the like in a communication system. As a master pattern producing device that makes a master pattern for speech recognition defined by a hidden Markov model, which is expressed by a set of events and the output probabilities of the events or of the transitions between events, it includes reading portion 111, reference model preparation portion 102, reference model storage part 103, master pattern preparing department 104, and write section 112.

Reading portion 111 reads the reference model for children, the reference model for adults, and the reference model for the elderly written on a storage device such as a CD-ROM. Reference model preparation portion 102 sends the read reference models 121 to reference model storage part 103. Reference model storage part 103 stores the 3 reference models 121. Here, a reference model is a previously made model that is referred to when the master pattern is made (here, a model for speech recognition, i.e. a probability model representing speech features having a given attribute).

Master pattern preparing department 104 is a processing portion that makes master pattern 122 so as to maximize or locally maximize the probability or likelihood with respect to the 3 (Ng=3) reference models 121 stored in reference model storage part 103. It includes: master pattern structure determination portion 104a, which determines the structure of the master pattern (the mixed number of Gaussian distributions, etc.); primary standard modelling portion 104b, which makes the primary standard model by determining the initial values of the statistics used to calculate the master pattern; statistic storage part 104c, which stores the determined primary standard model; and statistic inference portion 104d, which, applying the approximate calculation of 1st approximation portion 104e and the like to the primary standard model stored in statistic storage part 104c, calculates the statistics that maximize or locally maximize the probability or likelihood with respect to the 3 (Ng=3) reference models 121 stored in reference model storage part 103 (thereby generating the final master pattern). Here, the statistics are the parameters that specify the master pattern, namely the mixed weighting coefficients, mean values, and variance values.

Write section 112 writes master pattern 122 made by master pattern preparing department 104 to a storage device such as a CD-ROM.

Next, the operation of server 101 configured as above is described.

Fig. 2 is a flowchart showing the operation steps of server 101.

First, before the master pattern is made, the reference models that serve as its basis are prepared (step S100). That is, reading portion 111 reads the reference model for children, the reference model for adults, and the reference model for the elderly written on a storage device such as a CD-ROM, reference model preparation portion 102 sends the read reference models 121 to reference model storage part 103, and reference model storage part 103 stores the 3 reference models 121.

Reference model 121 consists of an HMM for each phoneme. Fig. 3 shows an example of reference models 121. Here, schematic diagrams of the reference model for children, the reference model for adults, and the reference model for the elderly are shown (the schematic diagram of the reference model for the elderly is omitted from the figure). All 3 reference models have 3 states, and in each state the output distribution of the HMM is a Gaussian mixture with a mixed distribution number of 3. As the feature quantity, 12-dimensional (J=12) cepstrum coefficients are used.
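The structure of one such reference model (3 states, 3 Gaussians per state, 12-dimensional features) can be pictured with the following sketch. The class names `GaussianMixture` and `PhonemeHMM`, the helper `placeholder_hmm`, and all numeric values are hypothetical; diagonal variances are assumed because the specification gives one variance value per dimension.

```python
from dataclasses import dataclass
from typing import List

J = 12          # feature dimension (cepstrum coefficients)
NUM_STATES = 3  # states per phoneme HMM
NUM_MIX = 3     # Gaussian distributions per state


@dataclass
class GaussianMixture:
    weights: List[float]          # one weight per Gaussian
    means: List[List[float]]      # NUM_MIX x J
    variances: List[List[float]]  # NUM_MIX x J, diagonal variances

    def validate(self) -> None:
        assert len(self.weights) == len(self.means) == len(self.variances)
        assert abs(sum(self.weights) - 1.0) < 1e-9
        assert all(len(mu) == J for mu in self.means)
        assert all(len(v) == J and min(v) > 0.0 for v in self.variances)


@dataclass
class PhonemeHMM:
    states: List[GaussianMixture]  # output distribution of each state


def placeholder_state() -> GaussianMixture:
    """Builds a structurally valid state from made-up numbers."""
    return GaussianMixture(
        weights=[1.0 / NUM_MIX] * NUM_MIX,
        means=[[float(k)] * J for k in range(NUM_MIX)],
        variances=[[1.0] * J for _ in range(NUM_MIX)],
    )


def placeholder_hmm() -> PhonemeHMM:
    return PhonemeHMM(states=[placeholder_state() for _ in range(NUM_STATES)])
```

A full reference model would hold one `PhonemeHMM` per phoneme, plus the state transition probabilities that are omitted here.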

Then, master pattern preparing department 104 makes master pattern 122 so as to maximize or locally maximize the probability or likelihood with respect to the 3 reference models 121 stored in reference model storage part 103 (step S101).

Finally, write section 112 writes master pattern 122 made by master pattern preparing department 104 to a storage device such as a CD-ROM (step S102). The master pattern written to the storage device is used as a master pattern for speech recognition that takes children, adults, and the elderly into account.

The detailed steps of step S101 (Fig. 4) are as follows. First, master pattern structure determination portion 104a determines the structure of the master pattern (step S102a). Here, the master pattern consists of an HMM for each phoneme with 3 states, and the mixed number of the output distribution of each state is set to 3 (Mf=3).

Then, primary standard modelling portion 104b determines the initial values of the statistics used to calculate the master pattern (step S102b). Here, the model obtained by integrating the 3 reference models stored in reference model storage part 103 into a Gaussian distribution by statistical processing is taken as the initial values of the statistics, and these initial values are stored in statistic storage part 104c as the primary standard model.

Specifically, primary standard modelling portion 104b generates the output distribution shown in formula 13 below for each of the 3 states I (I = 1, 2, 3). Here, M_f in the formula (the mixed number of Gaussian distributions) is 3.

(formula 13)

$$\sum_{m=1}^{M_f} \omega_{f(m)}\, f(x; \mu_{f(m)}, \sigma_{f(m)}^{2})$$

where (formula 14)

$$f(x; \mu_{f(m)}, \sigma_{f(m)}^{2}) \quad (m = 1, 2, \ldots, M_f)$$

denotes a Gaussian distribution,

(formula 15)

$$x = (x_{(1)}, x_{(2)}, \ldots, x_{(J)}) \in R^{J}$$

denotes the 12-dimensional (J=12) LPC cepstral coefficients,

(formula 16)

$$\omega_{f(m)} \quad (m = 1, 2, \ldots, M_f)$$

denotes the mixed weighting coefficient of each Gaussian distribution,

(formula 17)

$$\mu_{f(m)} = (\mu_{f(m,1)}, \mu_{f(m,2)}, \ldots, \mu_{f(m,J)}) \in R^{J} \quad (m = 1, 2, \ldots, M_f)$$

denotes the mean value of each Gaussian distribution, and

(formula 18)

$$\sigma_{f(m)}^{2} = (\sigma_{f(m,1)}^{2}, \sigma_{f(m,2)}^{2}, \ldots, \sigma_{f(m,J)}^{2}) \in R^{J} \quad (m = 1, 2, \ldots, M_f)$$

denotes the variance value of each Gaussian distribution.
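As an illustration of formula 13, the following sketch evaluates the output distribution of one state at a feature vector x. It assumes diagonal variances, as formula 18 suggests by giving one variance value per dimension; the function names `diag_gaussian_pdf` and `mixture_pdf` are hypothetical.

```python
import math


def diag_gaussian_pdf(x, mean, var):
    """Density of a Gaussian with per-dimension variances at vector x."""
    log_p = 0.0
    for xj, mj, vj in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vj) + (xj - mj) ** 2 / vj)
    return math.exp(log_p)


def mixture_pdf(x, weights, means, variances):
    """Formula 13: sum over m of weight_m * f(x; mean_m, variance_m)."""
    return sum(
        w * diag_gaussian_pdf(x, mu, var)
        for w, mu, var in zip(weights, means, variances)
    )
```

The per-dimension terms are accumulated in the log domain before exponentiating, which helps avoid underflow when J is as large as 12.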

Next, statistic inference portion 104d uses the 3 reference models 121 stored in reference model storage part 103 to infer the statistics of the master pattern stored in statistic storage part 104c (step S102c).

Specifically, it infers the statistics of the master pattern (the mixed weighting coefficients shown in formula 16, the mean values shown in formula 17, and the variance values shown in formula 18) that maximize or locally maximize the probability or likelihood (the likelihood log P shown in formula 25 below) of the master pattern with respect to the output distributions in each state (I = 1, 2, 3) of the 3 (Ng=3) reference models 121, i.e. the output distributions shown in formula 19 below.

(formula 19)

$$\sum_{l=1}^{L_{g(i)}} \upsilon_{g(i,l)}\, g(x; \mu_{g(i,l)}, \sigma_{g(i,l)}^{2}) \quad (i = 1, 2, \ldots, N_g)$$

where (formula 20)

$$g(x; \mu_{g(i,l)}, \sigma_{g(i,l)}^{2}) \quad (i = 1, 2, \ldots, N_g,\ l = 1, 2, \ldots, L_{g(i)})$$

denotes a Gaussian distribution,

(formula 21)

$$L_{g(i)} \quad (i = 1, 2, \ldots, N_g)$$

denotes the mixed distribution number of each reference model (3 here),

(formula 22)

$$\upsilon_{g(i,l)} \quad (l = 1, 2, \ldots, L_{g(i)})$$

denotes the mixed weighting coefficient of each Gaussian distribution,

(formula 23)

$$\mu_{g(i,l)} \quad (l = 1, 2, \ldots, L_{g(i)})$$

denotes the mean value of each Gaussian distribution, and

(formula 24)

$$\sigma_{g(i,l)}^{2} \quad (l = 1, 2, \ldots, L_{g(i)})$$

denotes the variance value of each Gaussian distribution.

(formula 25)

$$\log P = \sum_{i=1}^{N_g} \int_{-\infty}^{\infty} \log\left[\sum_{m=1}^{M_f} \omega_{f(m)}\, f(x; \mu_{f(m)}, \sigma_{f(m)}^{2})\right] \left\{\sum_{l=1}^{L_{g(i)}} \upsilon_{g(i,l)}\, g(x; \mu_{g(i,l)}, \sigma_{g(i,l)}^{2})\right\} dx$$
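To make formula 25 concrete, the sketch below evaluates log P numerically for the simplified case J = 1, using the midpoint rule on a finite interval. The one-dimensional restriction, the integration range, the grid size, and the function names are assumptions made for illustration only; the specification itself goes on to use an analytic approximation instead of numerical integration.

```python
import math


def gauss(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mix(x, comps):
    """comps is a list of (weight, mean, variance) triples."""
    return sum(w * gauss(x, mu, var) for w, mu, var in comps)


def log_mix(x, comps):
    """log of the mixture density, computed stably (weights must be > 0)."""
    logs = [
        math.log(w) - 0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
        for w, mu, var in comps
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))


def log_likelihood(standard, references, lo=-20.0, hi=20.0, n=4000):
    """Formula 25 for J = 1: sum over i of integral log(f_mix) * g_mix_i dx."""
    dx = (hi - lo) / n
    total = 0.0
    for ref in references:
        for k in range(n):
            x = lo + (k + 0.5) * dx  # midpoint of the k-th cell
            total += log_mix(x, standard) * mix(x, ref) * dx
    return total
```

A master pattern that matches a reference model scores higher than one that does not: with a single reference N(0, 1), the candidate N(0, 1) gives about -1.419, while the candidate N(3, 1) gives about -5.919.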

The mixed weighting coefficients, mean values, and variance values of the master pattern are calculated according to formula 26, formula 27, and formula 28 below, respectively.

(formula 26)

$$\omega_{f(m)} = \frac{\sum_{i=1}^{N_g} \int_{-\infty}^{\infty} \gamma(x, m) \left\{\sum_{l=1}^{L_{g(i)}} \upsilon_{g(i,l)}\, g(x; \mu_{g(i,l)}, \sigma_{g(i,l)}^{2})\right\} dx}{\sum_{k=1}^{M_f} \sum_{i=1}^{N_g} \int_{-\infty}^{\infty} \gamma(x, k) \left\{\sum_{l=1}^{L_{g(i)}} \upsilon_{g(i,l)}\, g(x; \mu_{g(i,l)}, \sigma_{g(i,l)}^{2})\right\} dx} \quad (m = 1, 2, \ldots, M_f)$$

(formula 27)

$$\mu_{f(m,j)} = \frac{\sum_{i=1}^{N_g} \int_{-\infty}^{\infty} \gamma(x, m)\, x_{(j)} \left\{\sum_{l=1}^{L_{g(i)}} \upsilon_{g(i,l)}\, g(x; \mu_{g(i,l)}, \sigma_{g(i,l)}^{2})\right\} dx}{\sum_{i=1}^{N_g} \int_{-\infty}^{\infty} \gamma(x, m) \left\{\sum_{l=1}^{L_{g(i)}} \upsilon_{g(i,l)}\, g(x; \mu_{g(i,l)}, \sigma_{g(i,l)}^{2})\right\} dx} \quad (m = 1, 2, \ldots, M_f,\ j = 1, 2, \ldots, J)$$

(formula 28)

$$\sigma_{f(m,j)}^{2} = \frac{\sum_{i=1}^{N_g} \int_{-\infty}^{\infty} \gamma(x, m)\, (x_{(j)} - \mu_{f(m,j)})^{2} \left\{\sum_{l=1}^{L_{g(i)}} \upsilon_{g(i,l)}\, g(x; \mu_{g(i,l)}, \sigma_{g(i,l)}^{2})\right\} dx}{\sum_{i=1}^{N_g} \int_{-\infty}^{\infty} \gamma(x, m) \left\{\sum_{l=1}^{L_{g(i)}} \upsilon_{g(i,l)}\, g(x; \mu_{g(i,l)}, \sigma_{g(i,l)}^{2})\right\} dx} \quad (m = 1, 2, \ldots, M_f,\ j = 1, 2, \ldots, J)$$
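One pass of the updates in formulas 26 to 28 can be sketched for the simplified case J = 1 as follows. Here γ(x, m) is taken to be the exact ratio between the m-th weighted Gaussian and the whole mixture of the current master pattern, the integrals are replaced by the midpoint rule, and the variance is computed around the newly updated mean. These choices, the integration range, and the name `update_once` are assumptions for illustration and are not the approximation used in the embodiment.

```python
import math


def gauss(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mix(x, comps):
    """comps is a list of (weight, mean, variance) triples."""
    return sum(w * gauss(x, mu, var) for w, mu, var in comps)


def update_once(standard, references, lo=-20.0, hi=20.0, n=4000):
    """One update of (weight, mean, variance) per formulas 26-28, J = 1."""
    dx = (hi - lo) / n
    count = len(standard)
    s0 = [0.0] * count  # integral of gamma * g
    s1 = [0.0] * count  # integral of gamma * x * g
    s2 = [0.0] * count  # integral of gamma * x^2 * g
    for k in range(n):
        x = lo + (k + 0.5) * dx
        g_total = sum(mix(x, ref) for ref in references)  # sum over i
        parts = [w * gauss(x, mu, var) for w, mu, var in standard]
        denom = sum(parts)
        if denom <= 0.0 or g_total <= 0.0:
            continue  # numerically empty region
        for m in range(count):
            r = parts[m] / denom * g_total * dx  # gamma(x, m) * g * dx
            s0[m] += r
            s1[m] += r * x
            s2[m] += r * x * x
    total = sum(s0)
    updated = []
    for m in range(count):
        mean = s1[m] / s0[m]
        var = s2[m] / s0[m] - mean * mean
        updated.append((s0[m] / total, mean, var))
    return updated
```

Calling `update_once` repeatedly, feeding each result back in as `standard`, corresponds to iterating the inference; the sketch does not guard against a component whose accumulated weight `s0[m]` becomes zero.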

At this point, 1st approximation portion 104e of statistic inference portion 104d uses the approximate expression shown in formula 29 below.

(formula 29)

$$\gamma(x, m) = \frac{\omega_{f(m)}\, f(x; \mu_{f(m)}, \sigma_{f(m)}^{2})}{\sum_{k=1}^{M_f} \omega_{f(k)}\, f(x; \mu_{f(k)}, \sigma_{f(k)}^{2})} \approx \frac{\omega_{f(m)}\, f(x; \mu_{f(m)}, \sigma_{f(m)}^{2})}{u_{h(m)}\, h(x; \mu_{h(m)}, \sigma_{h(m)}^{2})} \quad (m = 1, 2, \ldots, M_f)$$

where

(formula 30)

$$u_{h(m)}\, h(x; \mu_{h(m)}, \sigma_{h(m)}^{2}) \quad (m = 1, 2, \ldots, M_f)$$

denotes a single Gaussian distribution having

(formula 31)

$$u_{h(m)} \quad (m = 1, 2, \ldots, M_f)$$

as its weight,

(formula 32)

$$\mu_{h(m)} = (\mu_{h(m,1)}, \mu_{h(m,2)}, \ldots, \mu_{h(m,J)}) \in R^{J}$$

as its mean value, and

(formula 33)

$$\sigma_{h(m)}^{2} = (\sigma_{h(m,1)}^{2}, \sigma_{h(m,2)}^{2}, \ldots, \sigma_{h(m,J)}^{2}) \in R^{J}$$

as its variance value.

In addition, 1st approximation portion 104e calculates the weight (formula 31), mean value (formula 32), and variance value (formula 33) of the single Gaussian distribution shown in formula 30 according to formula 34, formula 35, and formula 36 below, respectively.

(formula 34)

$$u_{h(m)} = \sum_{p=1}^{M_f} \omega_{f(m,p)} = \sum_{p=1}^{M_f} \omega_{f(p)} = 1.0 \quad (m = 1, 2, \ldots, M_f)$$

(formula 35)

$$\mu_{h(m,j)} = \frac{\sum_{p=1}^{M_f} \omega_{f(m,p)}\, \mu_{f(m,p,j)}}{\sum_{p=1}^{M_f} \omega_{f(m,p)}} = \frac{\sum_{p=1}^{M_f} \omega_{f(p)}\, \mu_{f(p,j)}}{\sum_{p=1}^{M_f} \omega_{f(p)}} \quad (m = 1, 2, \ldots, M_f,\ j = 1, 2, \ldots, J)$$

(formula 36)

$$\begin{aligned}
\sigma_{h(m,j)}^{2} &= \frac{\sum_{p=1}^{M_f} \omega_{f(m,p)} \left(\sigma_{f(m,p,j)}^{2} + \mu_{f(m,p,j)}^{2}\right)}{\sum_{p=1}^{M_f} \omega_{f(m,p)}} - \mu_{h(m,j)}^{2} \\
&= \frac{\sum_{p=1}^{M_f} \omega_{f(p)} \left(\sigma_{f(p,j)}^{2} + \mu_{f(p,j)}^{2}\right)}{\sum_{p=1}^{M_f} \omega_{f(p)}} - \mu_{h(m,j)}^{2}
\end{aligned} \quad (m = 1, 2, \ldots, M_f,\ j = 1, 2, \ldots, J)$$
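Formulas 34 to 36 amount to matching the weight, mean, and variance of the whole mixture with one Gaussian. A minimal sketch for one dimension j is given below; the function name `collapse_to_single_gaussian` is hypothetical.

```python
def collapse_to_single_gaussian(weights, means, variances):
    """Returns (u_h, mu_h, sigma_h^2) for one dimension of a mixture."""
    u = sum(weights)  # formula 34: total weight (1.0 for a full mixture)
    mu = sum(w * m for w, m in zip(weights, means)) / u  # formula 35
    second_moment = sum(
        w * (v + m * m) for w, m, v in zip(weights, means, variances)
    ) / u
    return u, mu, second_moment - mu * mu  # formula 36
```

For two equally weighted Gaussians with means -1 and 1 and unit variances, this gives weight 1.0, mean 0.0, and variance 2.0: the spread between the component means adds to the variance of the single Gaussian.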

Fig. 5 is a diagram explaining the approximate calculation of 1st approximation portion 104e. As shown in the figure, 1st approximation portion 104e uses all of the mixed Gaussian distributions constituting the master pattern to determine the single Gaussian distribution (formula 30) in the approximate expression shown in formula 29.

Taking the approximate expression of 1st approximation portion 104e into account, the calculating formulas of statistic inference portion 104d are as follows. That is, statistic inference portion 104d calculates the mixed weighting coefficients, mean values, and variance values according to formula 37, formula 38, and formula 39 below, respectively, and stores them in statistic storage part 104c. This inference of the statistics and storage in statistic storage part 104c is then repeated R (≥ 1) times. The resulting statistics are output as the statistics of the finally generated master pattern 122.

(formula 37)

$$\omega_{f(m)} = \frac{\sum_{i=1}^{N_g} \prod_{j=1}^{J} \sum_{l=1}^{L_{g(i)}} A_{(m,l,i,j)}}{\sum_{i=1}^{N_g} \sum_{k=1}^{M_f} \prod_{j=1}^{J} \sum_{l=1}^{L_{g(i)}} A_{(k,l,i,j)}} \quad (m = 1, 2, \ldots, M_f)$$

$$\begin{aligned}
A_{(m,l,i,j)} &= \frac{\omega_{f(m)}\, \upsilon_{g(i,l)}\, \sigma_{h(m,j)}^{2}}{\sqrt{\sigma_{f(m,j)}^{2} \sigma_{h(m,j)}^{2} + \sigma_{g(i,l,j)}^{2} \sigma_{h(m,j)}^{2} - \sigma_{f(m,j)}^{2} \sigma_{g(i,l,j)}^{2}}} \\
&\quad \times \exp\left\{\frac{1}{2}\left(\frac{\left(\frac{\sigma_{f(m,j)} \sigma_{g(i,l,j)}}{\sigma_{h(m,j)}} \mu_{h(m,j)} - \frac{\sigma_{f(m,j)} \sigma_{h(m,j)}}{\sigma_{g(i,l,j)}} \mu_{g(i,l,j)} - \frac{\sigma_{g(i,l,j)} \sigma_{h(m,j)}}{\sigma_{f(m,j)}} \mu_{f(m,j)}\right)^{2}}{\sigma_{f(m,j)}^{2} \sigma_{h(m,j)}^{2} + \sigma_{g(i,l,j)}^{2} \sigma_{h(m,j)}^{2} - \sigma_{f(m,j)}^{2} \sigma_{g(i,l,j)}^{2}} + \frac{\mu_{h(m,j)}}{\sigma_{h(m,j)}^{2}} - \frac{\mu_{g(i,l,j)}}{\sigma_{g(i,l,j)}^{2}} - \frac{\mu_{f(m,j)}}{\sigma_{f(m,j)}^{2}}\right)\right\}
\end{aligned}$$

(formula 38)

$$\mu_{f(m,j)} = \frac{\sum_{i=1}^{N_g} \sum_{l=1}^{L_{g(i)}} B_{(m,l,i,j)}}{\sum_{i=1}^{N_g} \prod_{j=1}^{J} \sum_{l=1}^{L_{g(i)}} A_{(m,l,i,j)}} \quad (m = 1, 2, \ldots, M_f,\ j = 1, 2, \ldots, J)$$

$$B_{(m,l,i,j)} = \frac{\sigma_{f(m,j)}^{2} \sigma_{h(m,j)}^{2} \mu_{g(i,l,j)} + \sigma_{g(i,l,j)}^{2} \sigma_{h(m,j)}^{2} \mu_{f(m,j)} - \sigma_{f(m,j)}^{2} \sigma_{g(i,l,j)}^{2} \mu_{h(m,j)}}{\sigma_{f(m,j)}^{2} \sigma_{h(m,j)}^{2} + \sigma_{g(i,l,j)}^{2} \sigma_{h(m,j)}^{2} - \sigma_{f(m,j)}^{2} \sigma_{g(i,l,j)}^{2}} \times A_{(m,l,i,j)}$$

(formula 39)

$$\sigma_{f(m,j)}^{2} = \frac{\sum_{i=1}^{N_g} \sum_{l=1}^{L_{g(i)}} C_{(m,l,i,j)}}{\sum_{i=1}^{N_g} \prod_{j=1}^{J} \sum_{l=1}^{L_{g(i)}} A_{(m,l,i,j)}} \quad (m = 1, 2, \ldots, M_f)$$

$$\begin{aligned}
C_{(m,l,i,j)} &= \Bigg\{\frac{\sigma_{f(m,j)}^{2} \sigma_{g(i,l,j)}^{2} \sigma_{h(m,j)}^{2}}{\sigma_{f(m,j)}^{2} \sigma_{h(m,j)}^{2} + \sigma_{g(i,l,j)}^{2} \sigma_{h(m,j)}^{2} - \sigma_{f(m,j)}^{2} \sigma_{g(i,l,j)}^{2}} \\
&\quad + \left(\mu_{f(m,j)} - \frac{\sigma_{f(m,j)}^{2} \sigma_{h(m,j)}^{2} \mu_{g(i,l,j)} + \sigma_{g(i,l,j)}^{2} \sigma_{h(m,j)}^{2} \mu_{f(m,j)} - \sigma_{f(m,j)}^{2} \sigma_{g(i,l,j)}^{2} \mu_{h(m,j)}}{\sigma_{f(m,j)}^{2} \sigma_{h(m,j)}^{2} + \sigma_{g(i,l,j)}^{2} \sigma_{h(m,j)}^{2} - \sigma_{f(m,j)}^{2} \sigma_{g(i,l,j)}^{2}}\right)^{2}\Bigg\} \times A_{(m,l,i,j)}
\end{aligned}$$

In addition, for the state transition probabilities, the corresponding state transition probabilities of the HMMs are added up over all of reference models 121, and the result is normalized so that the total becomes 1.
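The handling of the state transition probabilities can be sketched as follows. The sketch assumes that "the total becomes 1" applies to each source state, i.e. that each row of the summed transition matrix is renormalized; that reading, the matrix representation, and the name `merge_transitions` are assumptions for illustration.

```python
def merge_transitions(reference_transitions):
    """Adds corresponding transition probabilities, then renormalizes rows.

    reference_transitions: list of square matrices (lists of rows), one per
    reference model, all of the same size.
    """
    size = len(reference_transitions[0])
    merged = [[0.0] * size for _ in range(size)]
    for matrix in reference_transitions:
        for a in range(size):
            for b in range(size):
                merged[a][b] += matrix[a][b]
    for row in merged:
        total = sum(row)
        if total > 0.0:  # leave an all-zero row untouched
            for b in range(size):
                row[b] /= total
    return merged
```

With two reference models whose first rows are (0.6, 0.4) and (0.8, 0.2), the merged first row becomes (0.7, 0.3).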

Next, a concrete example in which the present embodiment is applied to speech recognition on a personal computer is described. Here, a personal computer (PC) is used as server 101 and a CD-ROM drive is used as reading portion 111, and the description centers on how the master pattern is actually used.

First, the user inserts a CD-ROM storing a plurality of sound models serving as reference models into the CD-ROM drive (reading portion 111) of the PC (server 101). The CD-ROM stores, for example, sound models for 'baby', 'children: male', 'children: female', 'adult: male', 'adult: female', 'the elderly: male', and 'the elderly: female'.

Then, as shown in the screen display examples of Fig. 6(a) and (b), the user uses the display connected to the PC (server 101) to select the sound models that match the family composition (the people who will use speech recognition). Fig. 6 shows the sound models stored on the CD-ROM displayed in the frame labelled 'CD-ROM', and the sound models selected from among them copied into the frame labelled 'user'. Here, the user's family is assumed to consist of 3 people, a 10-year-old son, a 50-year-old father, and a 40-year-old mother, and the user (the father) drags the 3 models 'children: male', 'adult: male', and 'adult: female' into the 'user' frame. By this operation, reference model preparation portion 102 prepares the reference models. That is, the 3 reference models are read by reading portion 111 and stored in reference model storage part 103 via reference model preparation portion 102.

Then, as shown in the screen display example of Fig. 7(a), the user specifies the structure (mixed distribution number) of the master pattern to be made. In Fig. 7(a), '3', '10', and '20' are displayed as the 'mixed distribution number', and the user selects the desired number from among them. By this operation, master pattern structure determination portion 104a determines the structure of the master pattern to be made.

The way of determining the mixed distribution number is not limited to such direct specification. For example, as shown in the screen display example of Fig. 7(b), the mixed distribution number may be determined from specification information selected by the user. Fig. 7(b) shows a state in which the equipment to be used is selected from 3 kinds of 'utilizing equipment', namely 'for television', 'for car navigation system', and 'for pocket telephone', as the target equipment that performs speech recognition using the master pattern. In this case, according to a correspondence table stored in advance, the mixed distribution number is set to 3 when 'for television' is selected, to 20 when 'for car navigation system' is selected, and to 10 when 'for pocket telephone' is selected.

The mixed distribution number may also be determined by having the user choose among options for recognition speed or precision, namely 'recognize quickly', 'normal', and 'recognize with high precision', and setting the mixed distribution number to the value corresponding to the chosen option ('recognize quickly' = 3, 'normal' = 10, 'recognize with high precision' = 20).
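The three ways of determining the mixed distribution number described above can be sketched as simple table lookups. The numeric values are the ones given in the text; the dictionary keys, the function name `mixture_count`, and the order in which the three inputs take precedence are hypothetical.

```python
# Values from the text; key spellings are illustrative only.
MIXTURES_BY_EQUIPMENT = {
    "television": 3,
    "car navigation system": 20,
    "pocket telephone": 10,
}
MIXTURES_BY_PREFERENCE = {
    "recognize quickly": 3,
    "normal": 10,
    "recognize with high precision": 20,
}


def mixture_count(explicit=None, equipment=None, preference=None):
    """Returns the mixed distribution number for the given selection."""
    if explicit is not None:    # direct specification, as in Fig. 7(a)
        return explicit
    if equipment is not None:   # specification information, as in Fig. 7(b)
        return MIXTURES_BY_EQUIPMENT[equipment]
    if preference is not None:  # recognition speed or precision option
        return MIXTURES_BY_PREFERENCE[preference]
    raise ValueError("no selection given")
```

For example, `mixture_count(equipment="car navigation system")` returns 20 and `mixture_count(preference="normal")` returns 10.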

When this input operation is finished, primary standard modelling portion 104b makes the primary standard model, after which statistic inference portion 104d performs repeated calculation (learning) to make the master pattern. During this, as shown in the screen display examples of Fig. 8, master pattern structure determination portion 104a displays the progress of the learning. The user can see the progress of the learning, the time at which learning will end, and so on, and can wait with peace of mind until the master pattern is completed. The progress may be displayed, for example, as a bar showing the degree of learning as in Fig. 8(a), as the number of learning iterations as in Fig. 8(b), or on the basis of likelihood. Alternatively, the progress display may show a generic face image before learning and change toward the user's face image as learning nears completion. Similarly, it may show a child's face before learning and a celestial being or the like as learning nears completion.

When the making of the master pattern is finished in this way, the made master pattern is recorded on a memory card (write section 112) by master pattern preparing department 104. The user removes the memory card from the PC (write section 112 of server 101) and inserts it into the memory card slot of the equipment to be used, for example a television. The made master pattern is thereby moved from the PC (server 101) to the equipment to be used (the television). The television uses the master pattern recorded on the inserted memory card to perform speech recognition targeted at its users (here, the family members who use the television). For example, it recognizes speech input to a microphone attached to the television, determines the command for operating the television, and executes the command (for example, switching channels or searching for programs in the EPG). Voice-based television operation using the master pattern made by the master pattern producing device of the present embodiment is thereby realized.

As mentioned above, according to the 1st embodiment of the present invention, because the statistic of basis of calculation model is come the production standard model, make probability or likelihood maximization or maximization to cut-and-dried reference model, so do not need to learn data or teacher's data of usefulness, production standard model simply, and, the high precision master pattern behind a plurality of reference models that integrated survey made can be made.

In addition, the standard model 122 is not limited to an HMM for each phoneme; it may also be composed of context-dependent HMMs.

In addition, the standard model creating unit 104 may create the model only for the event output probabilities in some states of some phonemes.

In addition, the HMMs composing the standard model 122 may have a different number of states for each phoneme, and may be composed of Gaussian mixture distributions with a different number of mixture components for each state.

In addition, the reference models 121 may have different numbers of states among the reference model for children, the reference model for adults, and the reference model for the elderly, and may be composed of Gaussian mixture distributions with different numbers of mixture components.

In addition, speech recognition using the standard model 122 may also be performed in the server 101.

In addition, instead of reading the reference models 121 from a storage device such as a CD-ROM or DVD-RAM, the server 101 may create the reference models 121 from speech data.

In addition, the reference model preparing unit 102 may, as necessary, add a new reference model read from a storage device such as a CD-ROM or DVD-RAM to the reference model storing unit 103, or update the reference models stored there. That is, the reference model preparing unit 102 may not only store a new reference model in the reference model storing unit 103 but also, when a reference model for the same recognition object is already stored in the reference model storing unit 103, update that reference model by replacing it with the new one, or delete unneeded reference models stored in the reference model storing unit 103.

In addition, the reference model preparing unit 102 may, as necessary, add a new reference model to the reference model storing unit 103, or update the reference models stored there, via a communication channel.

In addition, after the standard model is created, further learning may be performed using speech data.

In addition, the standard model structure determining unit may also determine the HMM structure, such as monophone, triphone, or state tying, as well as the number of states.

(the 2nd embodiment)

Fig. 9 is a block diagram showing the overall configuration of the standard model creating apparatus of the 2nd embodiment of the present invention. Here, an example in which the standard model creating apparatus of the present invention is built into a set-top box 201 (hereinafter, STB) is shown. In this embodiment, the case of creating a standard model for speech recognition (a speaker-adapted model) is described as an example. Specifically, the description takes as an example the case where the speech recognition function of the STB is used for EPG search, program switching, recording reservation, and the like on a television.

The STB 201 is a digital broadcast receiver that recognizes the user's utterances and, for example, automatically switches TV programs. As a standard model creating apparatus that creates a standard model for speech recognition defined by a set of events and the output probabilities of events or of transitions between events, it includes a microphone 211, a speech data storing unit 212, a reference model preparing unit 202, a reference model storing unit 203, a usage information creating unit 204, a reference model selecting unit 205, a standard model creating unit 206, and a speech recognition unit 213.

The speech data collected by the microphone 211 is stored in the speech data storing unit 212. The reference model preparing unit 202 uses the speech data stored in the speech data storing unit 212 to create a reference model 221 for each speaker and stores it in the reference model storing unit 203.

The usage information creating unit 204 uses the microphone 211 to collect the user's speech as usage information 224. Here, usage information is information about the object (person or thing) of recognition (recognition in the narrow sense, identification, authentication, and so on); in this case it is the speech of the user who is the object of speech recognition. Based on the usage information 224 created by the usage information creating unit 204, the reference model selecting unit 205 selects, from the reference models 221 stored in the reference model storing unit 203, reference models 223 that are acoustically close to the user's speech represented by the usage information 224.

The standard model creating unit 206 is a processing unit that creates a standard model 222 so as to maximize or locally maximize the probability or likelihood with respect to the speakers' reference models 223 selected by the reference model selecting unit 205. It includes: a standard model structure determining unit 206a that determines the structure of the standard model (such as the number of mixture components of the Gaussian distributions); an initial standard model creating unit 206b that creates an initial standard model by determining the initial values of the statistics used to calculate the standard model; a statistics storing unit 206c that stores the determined initial standard model; and a statistics estimating unit 206d that, using approximate calculation by a general approximating unit 206e and the like, calculates for the initial standard model stored in the statistics storing unit 206c the statistics that maximize or locally maximize the probability or likelihood with respect to the reference models 223 selected by the reference model selecting unit 205 (thereby generating the final standard model).

The speech recognition unit 213 recognizes the user's speech using the standard model 222 created by the standard model creating unit 206.

Next, the operation of the STB 201 configured as above is described.

Fig. 10 is a flowchart showing the operation steps of the STB 201.

First, before the standard model is created, the reference models that serve as its basis are prepared (step S200). That is, speech data of speakers A through Z is collected by the microphone 211 and stored in the speech data storing unit 212. For example, a plurality of microphones installed indoors, a microphone built into the television remote control, a telephone, and the like are connected to the speech data storing unit 212 of the STB 201, and speech data input from these microphones or the telephone is stored in the speech data storing unit 212. For example, the voices of an elder brother, a younger sister, the father, the mother, the grandfather, neighbors, and friends are stored.

The reference model preparing unit 202 uses the speech data stored in the speech data storing unit 212 to create a reference model 221 for each speaker by Baum-Welch re-estimation. This processing is performed before creation of the standard model is requested.

The reference model storing unit 203 stores the reference models 221 created by the reference model preparing unit 202. Each reference model 221 is composed of an HMM for each phoneme. Fig. 11 shows an example of the reference models 221. Here, all the reference models from speaker A to speaker Z have 3 states, and in each state the output distribution of the HMM is a Gaussian mixture distribution with 5 mixture components. Cepstral coefficients of 25 dimensions (J = 25) are used as the feature quantity.

Next, creation of the standard model is requested. For example, the user requests creation of the standard model by pressing a 'user confirmation' button. The 'user confirmation' button may be displayed on the television screen and selected there, or a 'user confirmation' switch may be provided on the television remote control. The button may be pressed, for example, when the television is switched on, or when the user, while operating the television by voice commands, feels the need for a standard model adapted to him or her.

Then, the usage information creating unit 204 collects, through the microphone 211, the user's speech as usage information 224 (step S201). For example, when creation of the standard model is requested, 'Please say your name' is displayed on the screen. The user speaks his or her name (the user's speech) into the microphone built into the television remote control. This speech of the user is the usage information. The speech to be input is not limited to a name. For example, 'Please say "adaptation"' may be displayed and the user may say 'adaptation'.

The reference model selecting unit 205 selects, from the reference models 221 stored in the reference model storing unit 203, the reference models 223 that are acoustically close to the user's speech (step S202). Specifically, the user's speech is input to the reference models of speakers A through Z, and the reference models of the 10 speakers (Ng = 10) with the highest likelihood for the uttered word are selected.
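The selection in step S202 amounts to ranking the stored reference models by the likelihood they assign to the user's utterance and keeping the top Ng. A minimal Python sketch follows; it assumes the per-speaker log-likelihoods have already been computed by some scoring routine, and the function and argument names are hypothetical.

```python
def select_reference_models(log_likelihoods, n_select=10):
    """Return the speaker labels of the n_select best-scoring reference models.

    log_likelihoods: dict mapping a speaker label to the log-likelihood
    that the speaker's reference model assigns to the user's utterance
    (assumed to be computed elsewhere).
    """
    ranked = sorted(log_likelihoods, key=log_likelihoods.get, reverse=True)
    return ranked[:n_select]
```

With 26 stored speakers and `n_select=10`, this yields the 10 reference models that are handed to the standard model creating unit.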

Next, the standard model creating unit 206 creates the standard model 222 so as to maximize or locally maximize the probability or likelihood with respect to the 10 reference models 223 selected by the reference model selecting unit 205 (step S203). At this time, the progress of learning may be displayed as in the 1st embodiment. The user can then see the progress of learning, the expected end time, and so on, and can have the standard model created with peace of mind. A progress hiding unit that hides the progress of learning may also be provided. This function allows the screen to be used effectively, and hiding the progress avoids annoying users who are already accustomed to the procedure.

Finally, the speech recognition unit 213 takes the user's speech sent through the microphone 211 as input and performs speech recognition using the standard model 222 created by the standard model creating unit 206 (step S204). For example, it performs acoustic analysis on the speech uttered by the user, calculates 25-dimensional cepstral coefficients, and inputs them to the standard model 222 of each phoneme, thereby identifying a phoneme sequence with high likelihood. It then compares this phoneme sequence with the program names in electronic program data received in advance and, when a likelihood at or above a set value is detected, performs control to switch automatically to that program.

Next, the detailed procedure of step S203 (creation of the standard model) in Fig. 10 is described. The flow of steps is the same as in the flowchart of Fig. 4. However, details such as the structure of the standard model adopted and the specific approximate calculation differ.

First, the standard model structure determining unit 206a determines the structure of the standard model (step S102a of Fig. 4). Here, the standard model is determined to be composed of an HMM for each phoneme with 3 states, and the number of mixture components of the output distribution in each state is set to 16 (Mf = 16).

Then, the initial standard model creating unit 206b determines the initial values of the statistics used to calculate the standard model (step S102b of Fig. 4). Here, a model obtained by integrating, through statistical processing, the 10 reference models 223 selected by the reference model selecting unit 205 into a Gaussian distribution is used as the initial values of the statistics, and these initial values are stored in the statistics storing unit 206c as the initial standard model. In this way, the reference models learned for each speaker, each with 5 mixture components, are used to create a high-precision standard model (speaker-adapted model) with 16 mixture components.

Specifically, the initial standard model creating unit 206b generates the output distribution given by formula 13 above for each of the three states I (I = 1, 2, 3).

In this embodiment, however, in the output distribution given by formula 13 above,

(formula 40)

x = (x_{(1)}, x_{(2)}, \ldots, x_{(J)}) \in R^{J}

represents cepstral coefficients of 25 dimensions (J = 25).

Then, the statistics estimating unit 206d estimates the statistics of the standard model stored in the statistics storing unit 206c, using the 10 reference models 223 selected by the reference model selecting unit 205 (step S102c of Fig. 4).

That is, it estimates the statistics of the standard model (the mixture weighting coefficients of formula 16 above, the mean values of formula 17 above, and the variance values of formula 18 above) that maximize or locally maximize the probability (here, the likelihood logP of formula 25 above) of the standard model with respect to the output distributions, given by formula 19 above, in each state (I = 1, 2, 3) of the 10 (Ng = 10) reference models 223.

In this embodiment, however, in the output distribution given by formula 19 above,

(formula 41)

L_{g(i)} \quad (i = 1, 2, \ldots, N_{g})

is 5 (the number of mixture components of each reference model).

Specifically, the mixture weighting coefficients, mean values, and variance values of the standard model are calculated according to formula 26, formula 27, and formula 28 above, respectively.

At this time, the general approximating unit 206e of the statistics estimating unit 206d uses the approximate expression given by formula 29 above.

Here, unlike in the 1st embodiment, the general approximating unit 206e takes the output distributions appearing in the denominator of the approximate expression of formula 29 above,

(formula 42)

\omega_{f(k)} f(x; \mu_{f(k)}, \sigma_{f(k)}^{2}) \quad (k = 1, 2, \ldots, M_{f})

and, for the output distribution appearing in the numerator of the approximate expression of formula 29 above,

(formula 43)

\omega_{f(m)} f(x; \mu_{f(m)}, \sigma_{f(m)}^{2})

selects the 3 (Ph(m) = 3) output distributions that are closest to it in distance,

(formula 44)

\omega_{f(m,p)} f(x; \mu_{f(m,p)}, \sigma_{f(m,p)}^{2}) \quad (m = 1, 2, \ldots, M_{f},\ p = 1, 2, \ldots, P_{h(m)})

Using the 3 selected output distributions, it calculates the weight (formula 31), mean value (formula 32), and variance value (formula 33) of the single Gaussian distribution given by formula 30 above according to formula 45, formula 46, and formula 47 below, respectively.

(formula 45)

u_{h(m)} = \sum_{p=1}^{P_{h(m)}} \omega_{f(m,p)} \quad (m = 1, 2, \ldots, M_{f})

(formula 46)

\mu_{h(m,j)} = \frac{\sum_{p=1}^{P_{h(m)}} \omega_{f(m,p)} \mu_{f(m,p,j)}}{\sum_{p=1}^{P_{h(m)}} \omega_{f(m,p)}} \quad (m = 1, 2, \ldots, M_{f},\ j = 1, 2, \ldots, J)

(formula 47)

\sigma_{h(m,j)}^{2} = \frac{\sum_{p=1}^{P_{h(m)}} \omega_{f(m,p)} (\sigma_{f(m,p,j)}^{2} + \mu_{f(m,p,j)}^{2})}{\sum_{p=1}^{P_{h(m)}} \omega_{f(m,p)}} - \mu_{h(m,j)}^{2} \quad (m = 1, 2, \ldots, M_{f},\ j = 1, 2, \ldots, J)

Fig. 12 illustrates the approximate calculation of the general approximating unit 206e. As shown in the figure, out of the Mf Gaussian distributions composing the standard model, the general approximating unit 206e uses only the subset (Ph(m) of them) that is close to the Gaussian distribution being calculated in order to determine the single Gaussian distribution (formula 30) in the approximate expression of formula 29 above. The amount of computation in the approximate calculation is therefore reduced compared with the 1st embodiment, which uses all Mf Gaussian distributions.
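The neighbor selection and the moment matching of formulas 45 to 47 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say which distance is used to pick the Ph(m) nearest components, so squared Euclidean distance between mean vectors is assumed here, and the component m itself is assumed to count as one of its own neighbors. Function names are hypothetical; diagonal covariances are represented as lists of per-dimension variances.

```python
def nearest_components(m, means, p):
    """Indices of the p components whose means are closest to component m.

    Assumption: squared Euclidean distance between mean vectors;
    component m itself (distance 0) is included in the result.
    """
    def dist(k):
        return sum((a - b) ** 2 for a, b in zip(means[k], means[m]))
    return sorted(range(len(means)), key=dist)[:p]


def merge_gaussians(weights, means, variances):
    """Moment-match several diagonal Gaussians into a single one."""
    u = sum(weights)                                    # formula 45
    dims = range(len(means[0]))
    mu = [sum(w * mean[j] for w, mean in zip(weights, means)) / u
          for j in dims]                                # formula 46
    var = [sum(w * (v[j] + mean[j] ** 2)
               for w, mean, v in zip(weights, means, variances)) / u
           - mu[j] ** 2
           for j in dims]                               # formula 47
    return u, mu, var
```

In use, the indices returned by `nearest_components` pick out the weights, means, and variances that are passed to `merge_gaussians`, which returns the weight, mean vector, and variance vector of the single approximating Gaussian.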

Putting together the approximate expressions of the general approximating unit 206e described above, the calculation formulas of the statistics estimating unit 206d are as follows. That is, the statistics estimating unit 206d calculates the mixture weighting coefficients, mean values, and variance values according to formula 48, formula 49, and formula 50 below, respectively, and stores them in the statistics storing unit 206c. This estimation of the statistics and their storage in the statistics storing unit 206c are then repeated R (≥ 1) times. The resulting statistics are output as the statistics of the final standard model 222. In the repeated calculation, the number Ph(m) of output distributions selected in the above approximate calculation is reduced as the iterations proceed, so that the final calculation is performed with Ph(m) = 1.

(formula 48)

\omega_{f(m)} = \frac{\sum_{i=1}^{N_{g}} \sum_{l=1}^{L_{g(i)}} \alpha_{(m,l,i)}}{\sum_{k=1}^{M_{f}} \omega_{f(k)} \left( \sum_{i=1}^{N_{g}} \sum_{l=1}^{L_{g(i)}} \alpha_{(k,l,i)} \right)} \quad (m = 1, 2, \ldots, M_{f})

\alpha_{(m,l,i)} = \upsilon_{g(i,l)} \prod_{j=1}^{J} D_{(m,l,i,j)}

D_{(m,l,i,j)} = \frac{\sigma_{h(m,j)}^{2}}{\sqrt{\sigma_{f(m,j)}^{2} \sigma_{h(m,j)}^{2} + \sigma_{g(i,l,j)}^{2} \sigma_{h(m,j)}^{2} - \sigma_{f(m,j)}^{2} \sigma_{g(i,l,j)}^{2}}}

\times \exp \left\{ \frac{1}{2} \left( \frac{\left( \frac{\sigma_{f(m,j)} \sigma_{g(i,l,j)}}{\sigma_{h(m,j)}} \mu_{h(m,j)} - \frac{\sigma_{f(m,j)} \sigma_{h(m,j)}}{\sigma_{g(i,l,j)}} \mu_{g(i,l,j)} - \frac{\sigma_{g(i,l,j)} \sigma_{h(m,j)}}{\sigma_{f(m,j)}} \mu_{f(m,j)} \right)^{2}}{\sigma_{f(m,j)}^{2} \sigma_{h(m,j)}^{2} + \sigma_{g(i,l,j)}^{2} \sigma_{h(m,j)}^{2} - \sigma_{f(m,j)}^{2} \sigma_{g(i,l,j)}^{2}} + \frac{\mu_{h(m,j)}^{2}}{\sigma_{h(m,j)}^{2}} - \frac{\mu_{g(i,l,j)}^{2}}{\sigma_{g(i,l,j)}^{2}} - \frac{\mu_{f(m,j)}^{2}}{\sigma_{f(m,j)}^{2}} \right) \right\}

(formula 49)

\mu_{f(m,j)} = \frac{\sum_{i=1}^{N_{g}} \sum_{l=1}^{L_{g(i)}} \beta_{(m,l,i,j)} \alpha_{(m,l,i)}}{\sum_{i=1}^{N_{g}} \sum_{l=1}^{L_{g(i)}} \alpha_{(m,l,i)}} \quad (m = 1, 2, \ldots, M_{f},\ j = 1, 2, \ldots, J)

\beta_{(m,l,i,j)} = \frac{\sigma_{f(m,j)}^{2} \sigma_{h(m,j)}^{2} \mu_{g(i,l,j)} + \sigma_{g(i,l,j)}^{2} \sigma_{h(m,j)}^{2} \mu_{f(m,j)} - \sigma_{f(m,j)}^{2} \sigma_{g(i,l,j)}^{2} \mu_{h(m,j)}}{\sigma_{f(m,j)}^{2} \sigma_{h(m,j)}^{2} + \sigma_{g(i,l,j)}^{2} \sigma_{h(m,j)}^{2} - \sigma_{f(m,j)}^{2} \sigma_{g(i,l,j)}^{2}}

(formula 50)

\sigma_{f(m,j)}^{2} = \frac{\sum_{i=1}^{N_{g}} \sum_{l=1}^{L_{g(i)}} \gamma_{(m,l,i,j)} \alpha_{(m,l,i)}}{\sum_{i=1}^{N_{g}} \sum_{l=1}^{L_{g(i)}} \alpha_{(m,l,i)}} \quad (m = 1, 2, \ldots, M_{f},\ j = 1, 2, \ldots, J)

\gamma_{(m,l,i,j)} = \frac{\sigma_{f(m,j)}^{2} \sigma_{g(i,l,j)}^{2} \sigma_{h(m,j)}^{2}}{\sigma_{f(m,j)}^{2} \sigma_{h(m,j)}^{2} + \sigma_{g(i,l,j)}^{2} \sigma_{h(m,j)}^{2} - \sigma_{f(m,j)}^{2} \sigma_{g(i,l,j)}^{2}}

+ \left( \mu_{f(m,j)} - \frac{\sigma_{f(m,j)}^{2} \sigma_{h(m,j)}^{2} \mu_{g(i,l,j)} + \sigma_{g(i,l,j)}^{2} \sigma_{h(m,j)}^{2} \mu_{f(m,j)} - \sigma_{f(m,j)}^{2} \sigma_{g(i,l,j)}^{2} \mu_{h(m,j)}}{\sigma_{f(m,j)}^{2} \sigma_{h(m,j)}^{2} + \sigma_{g(i,l,j)}^{2} \sigma_{h(m,j)}^{2} - \sigma_{f(m,j)}^{2} \sigma_{g(i,l,j)}^{2}} \right)^{2}

In addition, for the state transition probabilities, the corresponding state transition probabilities of the HMMs are summed over all the reference models 223 and normalized so that they total 1.
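One way to read this normalization is sketched below in Python. It assumes that every reference model has the same number of states and that the normalization is applied per source state, so that the outgoing probabilities of each state sum to 1; the text does not spell out either point, and the function name is hypothetical.

```python
def merge_transitions(matrices):
    """Sum corresponding transition probabilities and renormalize each row.

    matrices: list of square transition matrices (lists of rows), one per
    selected reference model, all with the same number of states.
    """
    n = len(matrices[0])
    merged = []
    for i in range(n):
        row = [sum(a[i][j] for a in matrices) for j in range(n)]
        total = sum(row)
        # A state with no outgoing transitions is left as all zeros.
        merged.append([v / total for v in row] if total > 0 else row)
    return merged
```

Under these assumptions the result is simply the element-wise average of the reference models' transition matrices.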

As described above, according to the 2nd embodiment of the present invention, the standard model is created by calculating its statistics so as to maximize or locally maximize the probability or likelihood with respect to a plurality of reference models selected according to usage information. A high-precision standard model suited to the usage situation can therefore be provided.

In addition, the timing at which the standard model is created is not limited to an explicit instruction from the user as in this embodiment; the standard model may be created at other timings. For example, a user change judging unit that automatically judges whether the user has changed may be provided in the STB 201. The user change judging unit uses the speech for recognition input to the television remote control to judge whether the user has changed, that is, whether the present user is the same person as the user recognized until then. When it judges that the user has changed, the standard model is created using that speech as the usage information. The user can thus benefit, without being aware of it, from speech recognition using a standard model suited to him or her.

In addition, the standard model 222 is not limited to an HMM for each phoneme; it may also be composed of context-dependent HMMs.

In addition, the standard model creating unit 206 may create the model only for the event output probabilities in some states of some phonemes.

In addition, the HMMs composing the standard model 222 may have a different number of states for each phoneme, and may be composed of Gaussian mixture distributions with a different number of mixture components for each state.

In addition, the reference models 221 may have a different number of states in each speaker's HMM, and may be composed of Gaussian mixture distributions with different numbers of mixture components.

In addition, the reference models 221 are not limited to an HMM for each speaker; they may also be created for each speaker, noise, and tone of voice.

In addition, the standard model 222 may be recorded on a storage device such as a CD-ROM, hard disk, or DVD-RAM.

In addition, instead of being created, the reference models 221 may be read from a storage device such as a CD-ROM or DVD-RAM.

In addition, the reference model selecting unit 205 may change the number of reference models selected for each user according to the usage information 224.

In addition, the reference model preparing unit 202 may, as necessary, create a new reference model and add it to the reference model storing unit 203 or update the reference models stored there, and may delete unneeded reference models stored in the reference model storing unit 203.

In addition, the reference model preparing unit 202 may, as necessary, add a new reference model to the reference model storing unit 203, or update the reference models stored there, via a communication path.

In addition, the number Ph(m) of output distributions selected in the above approximate calculation may differ depending on the target event or on the output distribution of the standard model, and may be determined based on the distance between distributions.

In addition, after the standard model is created, further learning may be performed using speech data.

In addition, the number of mixture components may be set to a fixed value when the STB of this embodiment is shipped from the factory, or may be determined from specifications such as the CPU power of the networked devices that are expected to be used, the specifications of the application to be started, and so on.

(the 3rd embodiment)

Fig. 13 is a block diagram showing the overall configuration of the standard model creating apparatus of the 3rd embodiment of the present invention. Here, an example in which the standard model creating apparatus of the present invention is built into a PDA (Personal Digital Assistant) 301 is shown. In this embodiment, the case of creating a standard model for noise identification (a noise model) is described as an example.

The PDA 301 is a portable information terminal. As a standard model creating apparatus that creates a standard model for noise identification defined by the output probabilities of events, it includes a reading unit 311, a reference model preparing unit 302, a reference model storing unit 303, a usage information creating unit 304, a reference model selecting unit 305, a standard model creating unit 306, a specification information creating unit 307, a microphone 312, and a noise identifying unit 313.

The reading unit 311 reads reference models of noise written on a storage device such as a CD-ROM, for example the reference model of car A, the reference model of car B, the reference model of bus A, the reference model of light rain, and the reference model of heavy rain. The reference model preparing unit 302 sends the read reference models 321 to the reference model storing unit 303. The reference model storing unit 303 stores the reference models 321.

The usage information creating unit 304 creates the noise type as usage information 324 using the screen and keys of the PDA 301. The reference model selecting unit 305 selects, from the reference models 321 stored in the reference model storing unit 303, the reference models that are acoustically close to the noise type given as the usage information 324. The specification information creating unit 307 creates specification information 325 based on the specifications of the PDA 301. Here, specification information is information about the specifications of the standard model to be created; in this case it is information about the processing power of the CPU of the PDA 301.

The standard model creating unit 306 is a processing unit that, based on the specification information 325 created by the specification information creating unit 307, creates a standard model 322 so as to maximize or locally maximize the probability or likelihood with respect to the noise reference models selected by the reference model selecting unit 305. It includes: a standard model structure determining unit 306a that determines the structure of the standard model (such as the number of mixture components of the Gaussian distributions); an initial standard model creating unit 306b that creates an initial standard model by determining the initial values of the statistics used to calculate the standard model; a statistics storing unit 306c that stores the determined initial standard model; and a statistics estimating unit 306d that, using approximate calculation by a second approximating unit 306e and the like, calculates for the initial standard model stored in the statistics storing unit 306c the statistics that maximize or locally maximize the probability or likelihood with respect to the reference models 323 selected by the reference model selecting unit 305 (thereby generating the final standard model).

The noise identifying unit 313 identifies the type of noise input from the microphone 312, using the standard model 322 created by the standard model creating unit 306.

Next, the operation of the PDA 301 configured as above is described.

Fig. 14 is a flowchart showing the operation steps of the PDA 301.

First, before the standard model is created, the reference models that serve as its basis are prepared (step S300). That is, the reading unit 311 reads the noise reference models written on the storage device, the reference model preparing unit 302 sends the read reference models 321 to the reference model storing unit 303, and the reference model storing unit 303 stores the reference models 321.

The reference models 321 are composed of GMMs. Fig. 15 shows an example of the reference models 321. Here, each noise model is composed of a GMM with 3 mixture components. LPC cepstral coefficients of 5 dimensions (J = 5) are used as the feature quantity.

Then, the usage information creating unit 304 creates usage information 324, namely the type of noise to be identified (step S301). Fig. 16 shows an example of the selection screen of the PDA 301. Here, car noise has been selected. From the reference models 321 stored in the reference model storing unit 303, the reference model selecting unit 305 selects the reference models that are acoustically close to car noise, which is the selected usage information 324, namely the reference model of car A and the reference model of car B (step S302).

After that, the specification information creating unit 307 creates specification information 325 based on the specifications of the PDA 301 (step S303). Here, based on the CPU specifications of the PDA 301, specification information 325 indicating that the CPU power is low is created. Based on the created specification information 325, the standard model creating unit 306 creates the standard model 322 so as to maximize or locally maximize the probability or likelihood with respect to the reference models 323 selected by the reference model selecting unit 305 (step S304).

Finally, the noise identifying unit 313 uses the standard model 322 to identify the noise input by the user through the microphone 312 (step S305).

Next, the detailed procedure of step S304 (creation of the standard model) in Fig. 14 is described. The flow of steps is the same as in the flowchart of Fig. 4. However, details such as the structure of the standard model adopted and the specific approximate calculation differ.

First, the standard model structure determining unit 306a determines the structure of the standard model (step S102a of Fig. 4). Here, based on the specification information 325 indicating that the CPU power is low, the standard model 322 is determined to be composed of a GMM with one mixture component (Mf = 1).

Then, the initial standard model creating unit 306b determines the initial values of the statistics used to calculate the standard model (step S102b of Fig. 4). Here, a model obtained by integrating, through statistical processing, the three-mixture reference model of car A, which is one of the selected reference models 323, into a single Gaussian distribution is used as the initial values of the statistics and stored in the statistics storing unit 306c.

Specifically, the initial standard model creating unit 306b generates the output distribution given by formula 13 above.

In this embodiment, however, in the output distribution given by formula 13 above,

(formula 51)

x = (x_{(1)}, x_{(2)}, \ldots, x_{(J)}) \in R^{J}

represents LPC cepstral coefficients of 5 dimensions (J = 5).

Then, the statistics estimating unit 306d estimates the statistics of the standard model stored in the statistics storing unit 306c, using the 2 reference models 323 selected by the reference model selecting unit 305 (step S102c of Fig. 4).

That is, it estimates the statistics of the standard model (the mixture weighting coefficients of formula 16 above, the mean values of formula 17 above, and the variance values of formula 18 above) that maximize or locally maximize the probability (here, the likelihood logP of formula 25 above) of the standard model with respect to the output distributions, given by formula 19 above, of the 2 (Ng = 2) reference models 323.

Wherein, in the present embodiment, during the output shown in the above-mentioned formula 19 distributes

(formula 52)

L _g(i)(i＝1，2，...，N _g)

Be 3 (the mixed distribution numbers of each reference model).

Here, the second approximating unit 306e of the statistics estimating unit 306d assumes that the Gaussian distributions of the standard model do not affect one another, and uses the following approximation.

(Formula 53)

γ(x, m) \approx \frac{ω_{f(m)} f(x; μ_{f(m)}, σ_{f(m)}^{2})}{u_{h(m)} h(x; μ_{h(m)}, σ_{h(m)}^{2})} \approx 1.0 \quad (m = 1, 2, \ldots, M_{f})

In addition, for each Gaussian distribution of the standard model

(Formula 54)

ω_{f(m,p)} f(x; μ_{f(m,p)}, σ_{f(m,p)}^{2}) \quad (m = 1, 2, \ldots, M_{f}, \; p = 1, 2, \ldots, P_{h(m)})

its neighborhood

(Formula 55)

X

is approximated as the space in which there exist the Q_{g(m,i)} Gaussian distributions of the reference models 323

(Formula 56)

g(x; μ_{g(i,l)}, σ_{g(i,l)}^{2}) \quad (i = 1, 2, \ldots, N_{g}, \; l = 1, 2, \ldots, L_{g(i)})

that are closest to the output distribution of Formula 54 in a distribution distance such as the Euclidean distance between means, the Mahalanobis distance, or the Kullback-Leibler (KL) distance.

Furthermore, the Q_{g(m,i)} (1 ≤ Q_{g(m,i)} ≤ L_{g(i)}) output distributions of the reference models that are closest in distribution distance to

(Formula 57)

ω_{f(m,p)} f(x; μ_{f(m,p)}, σ_{f(m,p)}^{2}) \quad (m = 1, 2, \ldots, M_{f}, \; p = 1, 2, \ldots, P_{h(m)})

are approximated as those Gaussian distributions of the reference models

(Formula 58)

υ_{g(i,l)} g(x; μ_{g(i,l)}, σ_{g(i,l)}^{2}) \quad (i = 1, 2, \ldots, N_{g}, \; l = 1, 2, \ldots, L_{g(i)})

for which the nearest output distribution of the standard model in distribution distance (neighborhood indication parameter G=1) is the output distribution of Formula 57.

Fig. 17 is a schematic diagram showing the statistics estimation procedure of the statistics estimating unit 306d. It shows that, for each Gaussian distribution of each reference model, the statistics are estimated using the Gaussian distribution m of the standard model that is nearest in a distribution distance such as the Euclidean distance between means or the Mahalanobis distance.

Fig. 18 illustrates the approximate calculation of the second approximating unit 306e. As shown, the second approximating unit 306e determines, for each Gaussian distribution of each reference model, the nearest Gaussian distribution m of the standard model, and thereby applies the approximation of Formula 53 above.
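The nearest-distribution assignment described for Fig. 17 and Fig. 18 can be sketched as follows. This is an illustrative sketch only: it uses the squared Euclidean distance between means (one of the distances the text lists) with G=1, and the function name and the sample values are hypothetical.

```python
def assign_to_nearest(standard_means, reference_means):
    """For each reference Gaussian, return the index m of the nearest standard Gaussian.

    Nearness is the squared Euclidean distance between mean vectors;
    with G = 1 each reference Gaussian is assigned to exactly one m.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(standard_means)),
            key=lambda m: sq_dist(standard_means[m], ref))
        for ref in reference_means
    ]


# Hypothetical values: two standard Gaussians, three reference Gaussians.
nearest = assign_to_nearest(
    [[0.0, 0.0], [5.0, 5.0]],
    [[1.0, 0.0], [4.0, 6.0], [0.0, -1.0]],
)
```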

Taking the approximations of the second approximating unit 306e together, the calculation formulas of the statistics estimating unit 306d are as follows. The statistics estimating unit 306d calculates the mixture weights, means, and variances according to Formulas 59, 60, and 61, respectively, and generates the standard model specified by these parameters as the final standard model 322.

(Formula 59)

ω_{f(m)} = \frac{Σ_{i=1}^{N_{g}} Σ_{l=1}^{Q_{g(m,i)}} υ_{g(i,l)}}{Σ_{k=1}^{M_{f}} Σ_{i=1}^{N_{g}} Σ_{l=1}^{Q_{g(k,i)}} υ_{g(i,l)}} \quad (m = 1, 2, \ldots, M_{f})

(Here, the inner sums over l run over those Gaussian distributions of reference model i whose nearest Gaussian distribution of the standard model, in a distribution distance such as the Euclidean distance between means or the Mahalanobis distance, is m in the numerator and k in the denominator.)

(Formula 60)

μ_{f(m,j)} = \frac{Σ_{i=1}^{N_{g}} Σ_{l=1}^{Q_{g(m,i)}} υ_{g(i,l)} μ_{g(i,l,j)}}{Σ_{i=1}^{N_{g}} Σ_{l=1}^{Q_{g(m,i)}} υ_{g(i,l)}} \quad (m = 1, 2, \ldots, M_{f}, \; j = 1, 2, \ldots, J)

(Formula 61)

σ_{f(m,j)}^{2} = \frac{Σ_{i=1}^{N_{g}} Σ_{l=1}^{Q_{g(m,i)}} υ_{g(i,l)} (σ_{g(i,l,j)}^{2} + μ_{g(i,l,j)}^{2})}{Σ_{i=1}^{N_{g}} Σ_{l=1}^{Q_{g(m,i)}} υ_{g(i,l)}} - μ_{f(m,j)}^{2} \quad (m = 1, 2, \ldots, M_{f}, \; j = 1, 2, \ldots, J)
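The update of Formulas 59 to 61 can be sketched as a single pass over the reference Gaussian distributions. This is a minimal sketch under stated assumptions: the Gaussians of all reference models are flattened into one list, covariances are diagonal, and the nearest standard Gaussian of each reference Gaussian has already been determined. The function and argument names are hypothetical.

```python
def reestimate(assignments, ref_weights, ref_means, ref_vars, num_standard):
    """Re-estimate standard-model weights, means and variances (Formulas 59-61).

    assignments[n] is the index m of the standard Gaussian nearest to
    reference Gaussian n. Returns one (weight, mean, variance) tuple per m,
    or None for a standard Gaussian that received no reference Gaussian.
    """
    dims = len(ref_means[0])
    mass = [0.0] * num_standard
    first = [[0.0] * dims for _ in range(num_standard)]
    second = [[0.0] * dims for _ in range(num_standard)]
    for m, w, mu, var in zip(assignments, ref_weights, ref_means, ref_vars):
        mass[m] += w
        for j in range(dims):
            first[m][j] += w * mu[j]
            second[m][j] += w * (var[j] + mu[j] ** 2)
    total = sum(mass)
    result = []
    for m in range(num_standard):
        if mass[m] == 0.0:
            result.append(None)  # the case of Formula 62
            continue
        mean = [first[m][j] / mass[m] for j in range(dims)]
        var = [second[m][j] / mass[m] - mean[j] ** 2 for j in range(dims)]
        result.append((mass[m] / total, mean, var))
    return result
```

Dividing each accumulated mass by the total over all standard Gaussians keeps the new mixture weights summing to one.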

However, when

(Formula 62)

Σ_{i=1}^{N_{g}} Q_{g(m,i)} = 0 \quad (m = 1, 2, \ldots, M_{f})

that is, when no Gaussian distribution of any reference model is assigned to Gaussian distribution m of the standard model, the values of the statistics are determined by one of the following methods.

(Method 1) The mixture weight, mean, and variance are not updated.

(Method 2) The mixture weight is set to zero, and the mean and variance are set to predetermined values.

(Method 3) The mixture weight is set to a predetermined value, and the mean and variance are set to the mean and variance obtained when the output distribution of the standard model is represented by a single distribution.

The method used may differ for each repetition count R, each HMM, and each HMM state. Here, Method 1 is used.
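The three fallback methods for the case of Formula 62 can be written as a small selector. This is a hypothetical sketch: the function name and the tuple layout are assumptions, and the predetermined values are left to the caller because the text does not specify them.

```python
def fallback_statistics(method, previous, preset, single):
    """Choose statistics for a standard Gaussian with no assigned reference Gaussian.

    Each of previous, preset and single is a (weight, mean, variance) tuple:
      previous: the values before this update
      preset:   predetermined values (not specified in the text)
      single:   the standard model's output distribution as one Gaussian
    """
    if method == 1:  # leave weight, mean and variance unchanged
        return previous
    if method == 2:  # zero weight, predetermined mean and variance
        return (0.0, preset[1], preset[2])
    if method == 3:  # predetermined weight, single-distribution mean and variance
        return (preset[0], single[1], single[2])
    raise ValueError("method must be 1, 2 or 3")
```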

The statistics estimating unit 306d stores the statistics of the standard model thus estimated in the statistics storing unit 306c. This estimation of the statistics and their storage in the statistics storing unit 306c are then repeated R (≥1) times. The resulting statistics are output as the statistics of the final standard model 322.

The following describes a concrete example in which the present embodiment is applied to environmental sound recognition on a PDA.

First, the reference model preparing unit 302 reads the reference models needed for environmental sound recognition from a CD-ROM. The user, considering the environment in which recognition will be performed (the usage information), selects on the screen the environmental sounds to be recognized. For example, the user selects 'car' and then 'alarm sound', 'baby's voice', 'train sound', and so on. Based on this selection, the reference model selecting unit 305 selects the corresponding reference models from those stored in the reference model storing unit 303. The standard model creating unit 306 then takes the selected reference models 323 one at a time and creates a standard model for each.

Next, the user starts an application on the PDA 301, such as 'information provision' (providing information by judging the situation from environmental sounds). This application judges the situation from environmental sounds and provides appropriate information to the user. On startup, options such as 'judge accurately' and 'judge quickly' are shown on the display of the PDA 301, and the user selects one of them.

The specification information creating unit 307 then creates specification information according to this selection. For example, when 'judge accurately' is selected, it creates specification information that sets the number of mixture components to 10 in order to improve accuracy. When 'judge quickly' is selected, it creates specification information that sets the number of mixture components to 1 for high-speed processing. When processing is shared among several cooperating PDAs, the currently available CPU power may also be determined and used to create the specification information.
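The mapping from the user's selection to specification information can be sketched as a lookup. The mixture counts 10 and 1 are the ones given in the text; the option keys, the dictionary layout, and the function name are hypothetical.

```python
# Hypothetical option keys; the counts 10 and 1 come from the text.
MIXTURES_BY_SELECTION = {"judge accurately": 10, "judge quickly": 1}


def make_specification(selection):
    """Return specification information for the user's on-screen selection."""
    if selection not in MIXTURES_BY_SELECTION:
        raise ValueError(f"unknown selection: {selection!r}")
    return {"num_mixtures": MIXTURES_BY_SELECTION[selection]}
```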

According to this specification information, single-mixture standard models are created for 'car', 'alarm sound', 'baby's voice', 'train sound', and so on. The PDA 301 then performs environmental sound recognition using the created standard models and displays various information on its screen according to the recognition result. For example, when a 'car' is recognized nearby, it displays a road map, and when a 'baby's voice' is recognized, it displays an advertisement for a toy shop. In this way, information provision based on environmental sound recognition is realized using the standard models created by the standard model creating device of the present embodiment. The complexity of the standard model can also be adjusted to the specifications of the application.

As described above, according to the third embodiment of the present invention, the standard model is created by calculating its statistics so as to maximize or locally maximize the probability or likelihood with respect to the plural reference models selected according to the usage information, so a high-precision standard model suited to the usage situation can be provided.

In addition, because the standard model is created according to the specification information, a standard model suited to the device that uses it can be prepared.

The number of repetitions of the processing by the statistics estimating unit 306d may instead be the number needed for the likelihood of Formula 25 above to reach or exceed a predetermined threshold.
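The two stopping rules, a fixed number of repetitions R and a likelihood threshold, can be combined in one loop. This is a generic sketch: the `update` and `log_likelihood` callables stand in for the statistics estimation and the likelihood of Formula 25 and are hypothetical placeholders, as is the function name.

```python
def iterate_estimation(initial, update, log_likelihood, threshold, max_repeats):
    """Repeat update() until the likelihood reaches threshold or max_repeats is hit.

    Returns the final statistics and the number of repetitions performed.
    """
    stats = initial
    repeats = 0
    while repeats < max_repeats:
        stats = update(stats)
        repeats += 1
        if log_likelihood(stats) >= threshold:
            break
    return stats, repeats
```

Passing a threshold of `float("inf")` reduces this to exactly `max_repeats` repetitions, which corresponds to the fixed-R behavior.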

The GMMs constituting the standard model 322 may be Gaussian mixture distributions with a different number of mixture components for each type of noise.

The recognition model is not limited to a noise model; it may also identify a speaker, an age, and so on.

The standard model 322 may be recorded on a storage device such as a CD-ROM, DVD-RAM, or hard disk.

Instead of reading the reference models 321 from a storage device such as a CD-ROM, the PDA 301 may create the reference models 321 from noise data.

The reference model preparing unit 302 may, as necessary, add new reference models read from a storage device such as a CD-ROM to the reference model storing unit 303 or update those stored there, and may delete unneeded reference models stored in the reference model storing unit 303.

The reference model preparing unit 302 may also, as necessary, add new reference models obtained via a communication channel to the reference model storing unit 303 or update those stored there.

After the standard model is created, further learning may be performed using data.

The standard model structure determining unit may also determine the structure of the model, the number of states, and so on.

The neighborhood indication parameter G may differ depending on the event concerned or on the output distribution of the standard model, and may be changed according to the repetition count R.

(Fourth Embodiment)

Fig. 19 is a block diagram showing the overall configuration of the standard model creating device according to the fourth embodiment of the present invention. Here, an example is shown in which the standard model creating device of the present invention is incorporated in a server 401 of a computer system. The present embodiment is described taking the creation of a standard model for face recognition as an example.

The server 401 is a computer device or the like in a communication system. As a standard model creating device that creates a standard model for face recognition defined by the output probabilities of events, it includes a camera 411, an image data storing unit 412, a reference model preparing unit 402, a reference model storing unit 403, a usage information receiving unit 404, a reference model selecting unit 405, a standard model creating unit 406, and a writing unit 413.

The camera 411 collects face image data, which is stored in the image data storing unit 412. The reference model preparing unit 402 creates a reference model 421 for each speaker using the face image data stored in the image data storing unit 412 and stores it in the reference model storing unit 403.

The usage information receiving unit 404 receives, via the telephone 414, information on the age and sex of the persons that the user wishes to subject to face recognition, as usage information 424. Based on the usage information 424 received by the usage information receiving unit 404, the reference model selecting unit 405 selects, from the reference models 421 stored in the reference model storing unit 403, the reference models 423 of speakers whose age and sex match those indicated by the usage information 424.

The standard model creating unit 406 is a processing unit that creates a standard model 422 so as to maximize or locally maximize the probability or likelihood with respect to the reference models 423 of the speakers' face images selected by the reference model selecting unit 405. It has the same functions as the standard model creating unit 206 of the second embodiment, and in addition has the functions of the first approximating unit 104e of the first embodiment and the second approximating unit 306e of the third embodiment. That is, it performs a calculation that combines the three kinds of approximate calculation shown in the first to third embodiments.

The writing unit 413 writes the standard model 422 created by the standard model creating unit 406 to a storage device such as a CD-ROM.

The operation of the server 401 configured as above is described below.

Fig. 20 is a flowchart showing the operation procedure of the server 401. Fig. 21 shows an example of the reference models and standard models used to explain the operation procedure of the server 401.

First, before the standard model is created, the reference models that serve as its basis are prepared (step S400 in Fig. 20). That is, the camera 411 collects face image data of persons A through Z, which is stored in the image data storing unit 412. The reference model preparing unit 402 creates a reference model 421 for each speaker by the EM algorithm, using the face image data stored in the image data storing unit 412. Here, each reference model 421 is a GMM.

The reference model storing unit 403 stores the reference models 421 created by the reference model preparing unit 402. Here, as shown by the reference models 421 in Fig. 21, all reference models for persons A through Z are GMMs with 5 mixture components. The feature values are 100-dimensional (J=100) pixel intensity values.

Next, the usage information receiving unit 404 receives the age and sex information as usage information 424 via the telephone 414 (step S401 in Fig. 20). Here, the usage information 424 specifies males aged 11 to 15 and females aged 22 to 26. Based on this usage information 424, the reference model selecting unit 405 selects the corresponding reference models 423 from the reference models 421 stored in the reference model storing unit 403 (step S402 in Fig. 20). Specifically, as shown by the 'selected reference models 423' in Fig. 21, the reference models of males aged 11 to 15 and of females aged 22 to 26 are selected.
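Selection by age and sex amounts to filtering the stored reference models once per requested group. The following is a minimal sketch; the `ReferenceModel` record, its fields, and the function name are hypothetical, and the model parameters themselves are omitted.

```python
from dataclasses import dataclass


@dataclass
class ReferenceModel:
    speaker: str
    age: int
    sex: str  # "male" or "female"


def select_reference_models(models, groups):
    """Return one list of matching models per (sex, min_age, max_age) group.

    Age bounds are inclusive, e.g. ("male", 11, 15).
    """
    return [
        [m for m in models if m.sex == sex and low <= m.age <= high]
        for sex, low, high in groups
    ]
```

With the groups of this embodiment, `select_reference_models(models, [("male", 11, 15), ("female", 22, 26)])` yields two lists, one per standard model to be created.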

Next, the standard model creating unit 406 creates standard models 422 so as to maximize or locally maximize the probability or likelihood with respect to the speakers' reference models 423 selected by the reference model selecting unit 405 (step S403 in Fig. 20). Here, as shown by the standard models 422 in Fig. 21, each of the two standard models 422 is a GMM with 3 mixture components.

The standard models 422 are created in basically the same way as in the second embodiment. Specifically, however, the approximate calculation in estimating the statistics of the standard models 422 is performed as follows. Using a built-in storage unit or the like, the standard model creating unit 406 first creates a model by the same approximate calculation as that of the first approximating unit 104e in the first embodiment. With that model as the initial values, it performs the same approximate calculation as that of the general approximating unit 206e in the second embodiment. With that result as the initial values, it then performs the same approximate calculation as that of the second approximating unit 306e in the third embodiment.
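The chaining of the three approximate calculations, each taking the previous result as its initial values, can be expressed as a fold over a list of estimators. This sketch shows only the control flow; the estimator callables are hypothetical placeholders for the three approximations, and the function name is an assumption.

```python
from functools import reduce


def chain_estimators(initial, estimators):
    """Run estimators in order, feeding each result in as the next initial value."""
    return reduce(lambda stats, estimate: estimate(stats), estimators, initial)
```

For this embodiment the list would hold, in order, stand-ins for the first approximation, the general approximation, and the second approximation.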

The writing unit 413 writes the two standard models 422 created by the standard model creating unit 406 to a storage device such as a CD-ROM (step S404 in Fig. 20).

The user receives by mail the storage device on which the standard model for males aged 11 to 15 and the standard model for females aged 22 to 26 have been written.

The following describes a concrete example in which the present embodiment is applied to an information providing system that recommends shops and the like based on behavior prediction. This information providing system consists of a car navigation device and an information providing server device connected via a communication network. The car navigation device has the following function: using a standard model created in advance by the standard model creating device 401 of the present embodiment as a behavior prediction model, it predicts a person's behavior (such as the destination of the car) and provides information related to that behavior (such as information on restaurants and other shops near the destination).

First, the user uses the car navigation device to request the server 401, connected via the telephone line 414, to create a behavior prediction model for the user.

Specifically, the user presses the 'recommendation function' button on the item selection screen displayed by the car navigation device. A screen then appears for entering the user's address (place of use), age, sex, interests, and so on.

Here, the users are a father and a mother. First, the father's personal information is entered interactively on the screen of the car navigation device. The address is obtained by automatic conversion from an entered telephone number. Alternatively, pressing the 'place of use' button while the car navigation device displays the current position enters the current position as the place of use. Here, the address information is address A. For age and sex, 'fifties' and 'male' are selected and entered. For interests, selectable items are displayed in advance, and the user selects from them. Here, the father's interest information is interest information A.

The mother's personal information is then entered in the same way, producing personal information consisting of address B, forties, female, and interest information B. The input result is displayed as in the screen example of Fig. 22.

Finally, the car navigation device sends the personal information thus created to the server 401, which serves as the information providing server device, as usage information over the attached telephone line 414.

The server 401 then creates two behavior prediction models, one for the father and one for the mother, according to the transmitted personal information (usage information). Here, a behavior prediction model is expressed as a probability model whose inputs are the day of the week, the time, the current location, and so on, and whose outputs are the probability of presenting information on shop A, the probability of presenting information on shop B, the probability of presenting information on shop C, the probability of presenting parking lot information, and so on.

The plural reference models stored in the reference model storing unit 403 of the server 401 are behavior prediction models created for each age, sex, representative address, and interest tendency. In advance, the server 401 receives various personal information (the inputs and outputs described above) through the input buttons of a car navigation device or the like, in place of the camera 411, and stores it in the image data storing unit 412. From the personal information stored in the image data storing unit 412, the reference model preparing unit 402 creates reference models 421 for various typical users and stores them in the reference model storing unit 403.

The reference model selecting unit 405 uses the personal information (usage information) to select reference models suited to it. For example, it selects reference models with the same town, age, and sex and with at least 80% of the interest items matching. The standard model creating unit 406 of the server 401 creates a standard model that integrates the selected reference models. The writing unit 413 stores the created standard models on a memory card. Here, the two standard models for the father and the mother are stored. The memory card is delivered to the user by mail.
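The example selection rule, same town, age, and sex plus at least 80% of the interest items matching, can be sketched as a predicate. How the overlap is measured is not stated in the text; this sketch assumes it is the fraction of the user's selected interest items that the reference profile shares. The dictionary keys and the function name are hypothetical.

```python
def matches_profile(user, candidate, min_overlap=0.8):
    """Check whether a reference profile suits the user's personal information.

    user and candidate are dicts with the keys 'town', 'age_group', 'sex'
    and 'interests' (a set of selected interest items).
    """
    for key in ("town", "age_group", "sex"):
        if user[key] != candidate[key]:
            return False
    if not user["interests"]:
        return True  # nothing to compare, so only the exact fields decide
    shared = len(user["interests"] & candidate["interests"])
    return shared / len(user["interests"]) >= min_overlap
```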

The user inserts the received memory card into the car navigation device and sets the user by selecting 'father' or 'mother' on the screen. The car navigation device then uses the standard model stored on the inserted memory card as a behavior prediction model and presents shop information and the like at the necessary timing according to the current day of the week, time, location, and so on. In this way, by using a standard model created by the standard model creating device of the present embodiment as a behavior prediction model, an information providing system is realized that predicts a person's behavior (such as the destination of the car) and provides information related to that behavior.

As described above, according to the fourth embodiment of the present invention, the standard model is created by calculating its statistics so as to maximize or locally maximize the probability or likelihood with respect to the plural reference models selected according to the usage information, so a high-precision standard model suited to the usage situation can be provided.

The GMMs constituting the standard models 422 may be Gaussian mixture distributions with a different number of mixture components for each speaker.

The reference model preparing unit 402 may, as necessary, create new reference models and add them to the reference model storing unit 403 or update those stored there, and may delete unneeded reference models stored in the reference model storing unit 403.

(Fifth Embodiment)

Fig. 23 is a block diagram showing the overall configuration of the standard model creating device according to the fifth embodiment of the present invention. Here, an example is shown in which the standard model creating device of the present invention is incorporated in a server 501 of a computer system. The present embodiment is described taking the creation of a standard model (adapted model) for speech recognition as an example.

The server 501 is a computer device or the like in a communication system. As a standard model creating device that creates a standard model for speech recognition defined by a set of events and the output probabilities of events or of transitions between events, it includes a reading unit 511, a speech data storing unit 512, a reference model preparing unit 502, a reference model storing unit 503, a usage information receiving unit 504, a reference model selecting unit 505, a standard model creating unit 506, a specification information receiving unit 507, and a writing unit 513.

The reading unit 511 reads speech data of children, adults, and elderly people written on a storage device such as a CD-ROM and stores it in the speech data storing unit 512. The reference model preparing unit 502 creates a reference model 521 for each speaker using the speech data stored in the speech data storing unit 512. The reference model storing unit 503 stores the reference models 521 created by the reference model preparing unit 502.

The specification information receiving unit 507 receives specification information 525. The usage information receiving unit 504 receives the user's voice as usage information 524. The reference model selecting unit 505 selects, from the reference models 521 stored in the reference model storing unit 503, the reference models of speakers acoustically close to the user's voice, which is the usage information 524.

The standard model creating unit 506 is a processing unit that creates, according to the specification information 525, a standard model 522 so as to maximize or locally maximize the probability or likelihood with respect to the speakers' reference models 523 selected by the reference model selecting unit 505. It has the same functions as the standard model creating unit 104 of the first embodiment. The writing unit 513 writes the standard model 522 created by the standard model creating unit 506 to a storage device such as a CD-ROM.

The operation of the server 501 configured as above is described below.

Fig. 24 is a flowchart showing the operation procedure of the server 501. Fig. 25 shows an example of the reference models and standard model used to explain the operation procedure of the server 501.

First, before the standard model is created, the reference models that serve as its basis are prepared (step S500 in Fig. 24). That is, the reading unit 511 reads the speech data written on a storage device such as a CD-ROM and stores it in the speech data storing unit 512. The reference model preparing unit 502 creates a reference model 521 for each speaker by the Baum-Welch re-estimation method, using the speech data stored in the speech data storing unit 512. The reference model storing unit 503 stores the reference models 521 created by the reference model preparing unit 502.

The reference models 521 consist of an HMM for each phoneme. Here, as shown by the reference models 521 in Fig. 25, the reference model of each child speaker has 3 states, and the output distribution of the HMM in each state is a Gaussian mixture distribution with 3 mixture components. The reference model of each adult speaker has 3 states with 64 mixture components per state, and the reference model of each elderly speaker has 3 states with 16 mixture components per state. This is because there is less speech data for children and more for adults. The feature values are 25-dimensional (J=25) mel-frequency cepstral coefficients.

Next, the usage information receiving unit 504 receives the user's voice from the terminal device 514 as usage information 524 (step S501 in Fig. 24). The reference model selecting unit 505 selects, from the reference models 521 stored in the reference model storing unit 503, the reference models 523 that are acoustically close to the user's voice, which is the usage information 524 (step S502 in Fig. 24). Specifically, as shown by the 'selected reference models 523' in Fig. 25, the reference models of the 10 closest speakers (Ng=10) are selected.
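The text does not say how acoustic closeness is measured. One common choice, assumed here purely for illustration, is to score the user's feature frames against each speaker's model and keep the highest-scoring speakers. The sketch simplifies each reference model to a single diagonal-covariance GMM rather than per-phoneme HMMs, and all names are hypothetical.

```python
import math


def gmm_log_likelihood(frames, weights, means, variances):
    """Total log-likelihood of feature frames under a diagonal-covariance GMM."""
    total = 0.0
    for x in frames:
        density = 0.0
        for w, mu, var in zip(weights, means, variances):
            log_p = sum(
                -0.5 * (math.log(2 * math.pi * v) + (xj - m) ** 2 / v)
                for xj, m, v in zip(x, mu, var)
            )
            density += w * math.exp(log_p)
        total += math.log(density)
    return total


def select_closest_speakers(frames, speaker_models, count):
    """Return the names of the count speakers whose models score the frames highest.

    speaker_models maps a speaker name to (weights, means, variances).
    """
    ranked = sorted(
        speaker_models,
        key=lambda name: gmm_log_likelihood(frames, *speaker_models[name]),
        reverse=True,
    )
    return ranked[:count]
```

With real 25-dimensional features the densities can underflow, so a practical version would accumulate the mixture in the log domain; this sketch omits that for brevity.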

The specification information receiving unit 507 then receives specification information 525 from the terminal device 514 at the user's request (step S503 in Fig. 24). Here, specification information 525 calling for fast recognition processing is received. According to the specification information 525 received by the specification information receiving unit 507, the standard model creating unit 506 creates a standard model 522 so as to maximize or locally maximize the probability or likelihood with respect to the speakers' reference models 523 selected by the reference model selecting unit 505 (step S504 in Fig. 24). Specifically, as shown by the standard model 522 in Fig. 25, the standard model 522 is a 3-state HMM with 2 mixture components (Mf=2), in accordance with the specification information 525 calling for fast recognition processing. An HMM is created for each phoneme.

The standard model 522 is created in the same way as in the first embodiment.

The writing unit 513 writes the standard model 522 created by the standard model creating unit 506 to a storage device such as a CD-ROM (step S505 in Fig. 24).

The following describes a concrete example in which the present embodiment is applied to a game based on speech recognition over a communication network. Here, the server 501 is assumed to include a speech recognition unit that performs speech recognition using the created standard model.

A PDA is used as the terminal device 514. The server 501 and the PDA are connected via the communication network.

Whenever the server 501 obtains speech data from a CD, DVD, or the like, it prepares reference models by means of the reading unit 511, the speech data storing unit 512, and the reference model preparing unit 502 in turn.

The user starts a game program that uses speech recognition on the PDA (terminal device 514), here an 'action game'. The message 'Please say "action"' is then displayed, so the user says 'action'. This voice is sent as usage information from the PDA (terminal device 514) to the server 501, where the usage information receiving unit 504 and the reference model selecting unit 505 select the reference models that match the user from the plural reference models stored in the reference model storing unit 503.

Because the user wants a quick response, the user sets 'high-speed recognition' on the settings screen of the PDA (terminal device 514). This setting is sent as specification information from the PDA (terminal device 514) to the server 501, and the standard model creating unit 506 of the server 501 creates a 2-mixture standard model according to this specification information and the selected reference models.

In the action game, the user speaks commands such as 'move right' and 'move left' into the microphone of the PDA. The input voice is sent to the server, where speech recognition is performed using the created standard model. The recognition result is sent from the server 501 to the PDA (terminal device 514), and on the PDA (terminal device 514) the character in the action game moves according to the received recognition result. In this way, a voice-operated action game is realized by using the standard model created by the standard model creating device of the present embodiment for speech recognition.

Similarly, the present embodiment can be applied to other applications, for example a translation system that uses a communication network. For example, the user starts an application called 'speech translation' on the PDA (terminal device 514). The message 'Please say "translation"' is then displayed, and the user says 'translation'. This voice is sent as usage information from the PDA (terminal device 514) to the server 501. Because the user wants accurate recognition, the user also indicates 'accurate recognition desired' in the application. This indication is sent as specification information from the PDA (terminal device 514) to the server 501. The server 501 creates, for example, a 100-mixture standard model according to the received usage information and specification information.

The user says 'good morning' into the microphone of the PDA (terminal device 514). The input voice is sent from the PDA (terminal device 514) to the server 501, and after the server 501 recognizes it as 'good morning', the recognition result is returned to the PDA (terminal device 514). The PDA (terminal device 514) translates the recognition result received from the server 501 into English and displays the result, 'GOOD MORNING', on the screen. In this way, a speech-based translation device can be realized by using the standard model created by the standard model creating device of the present embodiment for speech recognition.

As described above, according to the 5th embodiment of the present invention, the master pattern is created by calculating its statistics so as to maximize or locally maximize the probability or likelihood with respect to the plurality of reference models selected according to the utilization information, so a high-precision master pattern suited to the usage situation can be provided.

In addition, because the master pattern is created according to the specification information, a master pattern suited to the device that uses it can be prepared.

In addition, the reference model preparation portion 502 can prepare, for each reference model, a high-precision reference model whose number of mixture distributions suits the amount of data, and the master pattern can be created from these high-precision reference models. A high-precision master pattern can therefore be used.

In addition, the master pattern 522 is not limited to an HMM constructed for each phoneme and may instead be constituted by context-dependent HMMs.

In addition, the HMM constituting the master pattern 522 may be constituted by Gaussian mixture distributions whose number of distributions differs from state to state.

In addition, server 501 also can use master pattern 522 to carry out speech recognition.

In addition, the reference model preparation portion 502 may, where necessary, create new reference models and add them to or update them in the reference model storage part 503, and may delete unneeded reference models stored in the reference model storage part 503.

(the 6th embodiment)

Figure 26 is a block diagram showing the overall configuration of the master pattern producing device of the 6th embodiment of the present invention. Here, an example is shown in which the master pattern producing device of the present invention is incorporated in a server 601 in a computer system. In the present embodiment, the case of creating a master pattern (preference model) for intention understanding is described as an example.

The server 601 is a computer device or the like in a communication system. As a master pattern producing device that creates a master pattern for intention understanding defined by the output probabilities of events, it includes a reading portion 611, a reference model preparation portion 602, a reference model storage part 603, a utilization information receiving portion 604, a reference model selection portion 605, a master pattern preparing department 606, and a specification information preparing department 607.

The reading portion 611 reads the preference models of speakers A to Z of different ages written on a memory device such as a CD-ROM, the reference model preparation portion 602 sends the reference models 621 thus read to the reference model storage part 603, and the reference model storage part 603 stores the reference models 621.

The specification information preparing department 607 creates specification information 625 in accordance with the CPU power of computers in widespread use. The utilization information receiving portion 604 receives utilization information 624 from the end device 614. Based on the utilization information 624 received by the utilization information receiving portion 604, the reference model selection portion 605 selects, from the reference models 621 stored in the reference model storage part 603, the reference models 623 corresponding to the utilization information 624.

The master pattern preparing department 606 is a processing portion that, according to the specification information 625 created by the specification information preparing department 607, creates the master pattern 622 so as to maximize or locally maximize the probability or likelihood with respect to the reference models 623 selected by the reference model selection portion 605. It has the same functions as the master pattern preparing department 206 of the 2nd embodiment and also has the function of the 2nd approximation portion 306e of the 3rd embodiment. That is, it performs its calculation by combining the two kinds of approximate calculation shown in the 2nd and 3rd embodiments.

Next, the operation of the server 601 configured as above is described.

Figure 27 is a flowchart showing the operating procedure of the server 601. Figure 28 shows an example of the reference models and the master pattern used to explain the operating procedure of the server 601.

First, before the master pattern is created, the reference models that serve as its basis are prepared (step S600 of Figure 27). That is, the reading portion 611 reads the preference models of speakers A to Z of different ages written on a memory device such as a CD-ROM, the reference model preparation portion 602 sends the reference models 621 thus read to the reference model storage part 603, and the reference model storage part 603 stores the reference models 621.

Each reference model 621 is constituted by a GMM. Here, as shown by the reference models 621 in Figure 28, each is a GMM with 3 mixture distributions. As learning data, 5-dimensional (J=5) feature quantities that quantify interests, personality, and the like are used. The reference models are prepared before creation of a master pattern is requested.

Next, the utilization information receiving portion 604 receives utilization information 624 indicating the age bands for which a preference model is to be created (step S601 of Figure 27). Here, the utilization information 624 specifies that preference models of the age bands of the twenties, thirties, and forties are to be used. As shown by the 'selected reference models 623' in Figure 28, the reference model selection portion 605 selects, from the reference models 621 stored in the reference model storage part 603, the preference models of the speakers in the age bands indicated by the utilization information 624 received by the utilization information receiving portion 604 (step S602 of Figure 27).
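The selection in step S602 can be pictured with the following minimal Python sketch. It is an illustration under stated assumptions, not the implementation described in this document: the `ReferenceModel` record, its `age` field, and the inclusive `(low, high)` band representation are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReferenceModel:
    speaker: str   # hypothetical label, e.g. "A" .. "Z"
    age: int       # hypothetical metadata stored with each preference model
    gmm: object    # placeholder for the 3-mixture GMM parameters


def select_by_age_band(models, bands):
    """Return the models whose speaker age falls in any (low, high) band, inclusive."""
    return [m for m in models if any(lo <= m.age <= hi for lo, hi in bands)]


stored = [
    ReferenceModel("A", 23, None),
    ReferenceModel("B", 37, None),
    ReferenceModel("C", 61, None),
    ReferenceModel("D", 45, None),
]
# Utilization information: twenties, thirties, and forties.
selected = select_by_age_band(stored, [(20, 29), (30, 39), (40, 49)])
```

In this sketch the utilization information is reduced to a list of age ranges; any other attribute (personality, sex, interests) could be filtered in the same way.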

After that, the specification information preparing department 607 creates specification information 625 based on the CPU power, memory capacity, and the like of computers in widespread use (step S603 of Figure 27). Here, specification information 625 specifying recognition processing at normal speed is created.

According to the specification information 625 created by the specification information preparing department 607, the master pattern preparing department 606 creates the master pattern 622 so as to maximize or locally maximize the probability or likelihood with respect to the speakers' reference models 623 selected by the reference model selection portion 605 (step S604 of Figure 27). Here, as shown by the master pattern 622 in Figure 28, the master pattern 622 is a GMM with 3 mixtures (Mf=3), in accordance with the specification information 625 that specifies recognition processing at normal speed.

The master pattern 622 is created basically in the same way as in the 2nd embodiment. However, the approximate calculation used in estimating the statistics of the master pattern 622 is performed as follows. Using a built-in storage part or the like, the master pattern preparing department 606 first performs the same approximate calculation as that performed by the general approximation portion 206e of the 2nd embodiment and then, using that result as the initial value, performs the same approximate calculation as that of the 2nd approximation portion 306e of the 3rd embodiment.
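The idea of feeding a first estimate into a second, refining calculation can be illustrated with the Python sketch below. It does not reproduce the approximate calculations of the 2nd or 3rd embodiment, whose formulas are not restated in this section. Instead it uses a swapped-in, commonly used technique: the Gaussian components of the selected reference models are pooled, each is hard-assigned to the nearest master pattern mean, and each group is merged by moment matching. The function name `reduce_gmm`, its arguments, and the use of diagonal covariances are assumptions made for illustration.

```python
import numpy as np


def reduce_gmm(weights, means, variances, init_means, n_iter=10):
    """Fit len(init_means) diagonal Gaussians to pooled reference components.

    weights: (M,), means: (M, J), variances: (M, J) of the pooled components.
    init_means: (K, J) initial values supplied by a coarser first stage.
    """
    weights = weights / weights.sum()          # pooled weights must sum to 1
    centers = np.array(init_means, dtype=float)
    new_w = np.zeros(len(centers))
    new_var = np.ones_like(centers)
    for _ in range(n_iter):
        # Hard-assign every reference component to its nearest current mean.
        dist = ((means[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        new_w = np.zeros(len(centers))
        new_var = np.ones_like(centers)
        for k in range(len(centers)):
            mask = assign == k
            if not mask.any():
                continue  # empty group: keep the previous mean, zero weight
            w = weights[mask]
            new_w[k] = w.sum()
            centers[k] = (w[:, None] * means[mask]).sum(axis=0) / new_w[k]
            second = (w[:, None] * (variances[mask] + means[mask] ** 2)).sum(axis=0)
            new_var[k] = second / new_w[k] - centers[k] ** 2
    return new_w, centers, new_var


# Four pooled one-dimensional components reduced to two.
w, mu, var = reduce_gmm(
    np.array([1.0, 1.0, 1.0, 1.0]),
    np.array([[0.0], [0.2], [5.0], [5.2]]),
    np.ones((4, 1)),
    init_means=np.array([[0.0], [5.0]]),
)
```

The moment-matched merge preserves the weight, mean, and variance of each group, which is why it is a reasonable stand-in for a likelihood-maximizing update when each group is represented by a single Gaussian.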

Next, a concrete example is described in which the present embodiment is applied to an information retrieval device. Here, a reference model takes a search keyword as input and outputs the probabilities of using search path A, search path B, and so on. If a different search path is used, the displayed search results differ. The reference models prepared in the reference model storage part 603 of the server 601 are assumed to be models of speakers with representative characteristics.

First, the user inputs utilization information using the remote controller (end device 614) attached to the server 601. The utilization information is age, personality, sex, interests, and the like. It may also be information that identifies a predetermined group such as 'children', 'actors', or 'college students'.

Next, on a selection screen, the user selects the device in which the master pattern will be used from among 'for car navigation device', 'for mobile phone', 'for personal computer', 'for television', and so on. The specification information preparing department 607 of the server 601 creates the specification information according to the CPU power and memory capacity of that device. Here, it is assumed that 'for television' is selected, and specification information 625 indicating small CPU power and small memory capacity is created. According to this specification information 625, the master pattern preparing department 606 creates a 3-mixture master pattern that runs even with small CPU power. The created master pattern is stored on a memory card, and the user inserts this memory card into the television.
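The mapping from a selected device to specification information, and from there to a mixture count, might be sketched in Python as follows. Only the television case (small CPU power and memory, 3 mixtures) comes from the text. The device table, its figures, the thresholds, and the 16-mixture fallback are hypothetical placeholders.

```python
# Hypothetical device profiles; the figures are placeholders, not values from the text.
DEVICE_PROFILES = {
    "car_navigation": {"cpu_mips": 400, "memory_mb": 64},
    "mobile_phone": {"cpu_mips": 200, "memory_mb": 32},
    "personal_computer": {"cpu_mips": 2000, "memory_mb": 512},
    "television": {"cpu_mips": 100, "memory_mb": 16},
}


def make_specification(device):
    """Create specification information from the device's CPU power and memory."""
    profile = DEVICE_PROFILES[device]
    small = profile["cpu_mips"] <= 100 and profile["memory_mb"] <= 16
    return {"device": device, "resources": "small" if small else "normal"}


def mixture_count(spec):
    """Choose the number of mixtures of the master pattern from the specification."""
    # 3 mixtures for a small device follows the television example in the text;
    # 16 for other devices is an assumed placeholder.
    return 3 if spec["resources"] == "small" else 16


tv_spec = make_specification("television")
tv_mixtures = mixture_count(tv_spec)
```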

Using the EPG or the like displayed on the television, the user inputs a search keyword in order to search for recommended programs. The television then uses the master pattern recorded on the memory card to determine the search path that matches the search keyword, searches for programs along this search path, and displays them as programs that match the user's preferences. In this way, a convenient retrieval device is realized using the master pattern made by the master pattern producing device of the present embodiment.

As described above, according to the 6th embodiment of the present invention, the master pattern is created by calculating its statistics so as to maximize or locally maximize the probability or likelihood with respect to the plurality of reference models selected according to the utilization information, so a high-precision master pattern suited to the usage situation can be provided.

In addition, because the master pattern is created according to the specification information, a master pattern suited to the device that uses it can be prepared.

In addition, the GMM constituting the master pattern 622 may be constituted by Gaussian mixture distributions whose number of distributions differs from speaker to speaker.

In addition, the reference model preparation portion 602 may, where necessary, read new reference models from a memory device such as a CD-ROM and add them to or update them in the reference model storage part 603, and may delete unneeded reference models stored in the reference model storage part 603.

In addition, the GMMs of the reference models and of the master pattern may represent part of a Bayesian network.

In addition, the master pattern structure determination portion may determine the HMM structure, such as monophone or triphone units, state sharing, or the number of states.

(the 7th embodiment)

Figure 29 is a block diagram showing the overall configuration of the master pattern producing device of the 7th embodiment of the present invention. Here, an example is shown in which the master pattern producing device of the present invention is incorporated in a server 701 in a computer system. In the present embodiment, the case of creating a master pattern (adaptive model) for speech recognition is described as an example.

The server 701 is a computer device or the like in a communication system. As a master pattern producing device that creates a master pattern for speech recognition defined by a set of events and the output probabilities of events or of transitions between events, it includes a reading portion 711, a reference model preparation portion 702, a reference model storage part 703, a utilization information receiving portion 704, a reference model selection portion 705, a master pattern preparing department 706, a specification information receiving portion 707, a master pattern storage part 708, and a master pattern sending part 709.

The reference model preparation portion 702 sends to the reference model storage part 703 the reference models for speech recognition, classified by speaker, noise, and tone, that the reading portion 711 has read from a memory device such as a CD-ROM, and the reference model storage part 703 stores the reference models 721 thus sent.

The specification information receiving portion 707 receives specification information 725 from the end device 712. The utilization information receiving portion 704 receives from the end device 712 the speech of a user speaking under a certain noise. The reference model selection portion 705 selects, from the reference models 721 stored in the reference model storage part 703, the reference models 723 whose speaker, noise, and tone are acoustically close to the user's speech serving as utilization information 724.

The master pattern preparing department 706 is a processing portion that, according to the specification information 725 received by the specification information receiving portion 707, creates the master pattern 722 so as to maximize or locally maximize the probability or likelihood with respect to the reference models 723 selected by the reference model selection portion 705, and has the same functions as the master pattern preparing department 206 of the 2nd embodiment. The master pattern storage part 708 stores one or more master patterns based on the specification information 725. When the master pattern sending part 709 receives specification information and a request signal for a master pattern from the user's end device 712, it sends a master pattern suited to that specification information to the end device 712.

Next, the operation of the server 701 configured as above is described.

Figure 30 is a flowchart showing the operating procedure of the server 701. Figure 31 shows an example of the reference models and the master pattern used to explain the operating procedure of the server 701.

First, before the master pattern is created, the reference models that serve as its basis are prepared (step S700 of Figure 30). That is, the reference model preparation portion 702 sends to the reference model storage part 703 the reference models for speech recognition, classified by speaker, noise, and tone, that the reading portion 711 has read from a memory device such as a CD-ROM, and the reference model storage part 703 stores the reference models 721 thus sent. Here, for each speaker, noise, and tone, the reference model 721 is constituted by an HMM for each phoneme. As shown by the reference models 721 in Figure 31, each reference model has 3 states, and the output distribution of the HMM in each state is constituted by a Gaussian mixture distribution with 128 mixture distributions. As feature quantities, 25-dimensional (J=25) cepstral coefficients are used.

Next, the utilization information receiving portion 704 receives from the end device 712 the speech of user A under noise as utilization information 724 (step S701 of Figure 30). The reference model selection portion 705 selects, from the reference models 721 stored in the reference model storage part 703, the reference models 723 that are acoustically close to the speech of user A serving as the utilization information 724 (step S702 of Figure 30). Specifically, as shown by the 'selected reference models 723' in Figure 31, the reference models of the 100 closest speakers (Ng=100) are selected here.
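One way to picture 'acoustically close' is to score the user's feature frames under every stored reference model and keep the Ng best-scoring models. The Python sketch below is a simplification made for illustration: each reference model is reduced to a single diagonal-covariance GMM rather than a per-phoneme HMM, and the names `gmm_log_likelihood` and `select_closest` are hypothetical.

```python
import numpy as np


def gmm_log_likelihood(frames, weights, means, variances):
    """Total log-likelihood of frames (T, J) under a diagonal-covariance GMM."""
    diff = frames[:, None, :] - means[None, :, :]                   # (T, M, J)
    log_norm = -0.5 * np.log(2.0 * np.pi * variances).sum(axis=1)   # (M,)
    log_comp = (np.log(weights) + log_norm
                - 0.5 * (diff ** 2 / variances).sum(axis=2))        # (T, M)
    # Log-sum-exp over components, computed stably.
    peak = log_comp.max(axis=1, keepdims=True)
    per_frame = peak[:, 0] + np.log(np.exp(log_comp - peak).sum(axis=1))
    return float(per_frame.sum())


def select_closest(frames, reference_models, n_select):
    """Return the names of the n_select models with the highest likelihood."""
    scores = {name: gmm_log_likelihood(frames, *params)
              for name, params in reference_models.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n_select]


def single_gaussian(center):
    """Helper for the toy data: a one-component GMM with unit variance."""
    return (np.array([1.0]), np.array([[center, center]]), np.ones((1, 2)))


models = {"far": single_gaussian(10.0),
          "near": single_gaussian(0.0),
          "mid": single_gaussian(1.0)}
user_frames = np.array([[0.1, -0.1], [0.0, 0.2]])
closest = select_closest(user_frames, models, n_select=2)
```

With Ng=100 as in the text, `n_select` would simply be 100 and the dictionary would hold every stored speaker, noise, and tone combination.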

After that, the specification information receiving portion 707 receives specification information 725 from the end device 712 in response to a request from user A (step S703 of Figure 30). Here, specification information 725 specifying high recognition accuracy is received. According to the specification information 725, the master pattern preparing department 706 creates the master pattern 722 so as to maximize or locally maximize the probability or likelihood with respect to the reference models 723 selected by the reference model selection portion 705 (step S704 of Figure 30). Specifically, as shown by the master pattern 722 in Figure 31, the master pattern 722 is constituted by HMMs with 64 mixtures (Mf=64) and 3 states, in accordance with the specification information 725 that specifies high recognition accuracy. An HMM is constructed for each phoneme.

The master pattern 722 is created in the same way as in the 2nd embodiment.

The master pattern storage part 708 stores one or more master patterns 722 based on the specification information 725. Here, a previously created master pattern, the 16-mixture HMM of user B, is already stored, and the 64-mixture HMM of user A is newly stored.

From the end device 712, user A sends to the master pattern sending part 709 of the server 701 a request signal for a master pattern together with 'user A' and the noise type as specification information (step S706 of Figure 30). On receiving the specification information and the request signal for a master pattern sent by user A, the master pattern sending part 709 sends a master pattern suited to that specification to the end device 712 (step S707 of Figure 30). Here, the previously created master pattern 722 of user A is sent to the end device 712.
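The store-then-send behaviour of the master pattern storage part 708 and the master pattern sending part 709 can be sketched in Python as a keyed lookup. This is a minimal illustration only: the class name `MasterPatternStore`, the `(user, noise_type)` key, and the noise labels `"office"` and `"car"` are hypothetical.

```python
class MasterPatternStore:
    """In-memory stand-in for the master pattern storage part 708 (illustrative only)."""

    def __init__(self):
        self._patterns = {}

    def store(self, user, noise_type, pattern):
        """Keep a created master pattern under the specification it was made for."""
        self._patterns[(user, noise_type)] = pattern

    def handle_request(self, user, noise_type):
        """Return the stored master pattern for this request, or None if absent."""
        return self._patterns.get((user, noise_type))


store = MasterPatternStore()
store.store("B", "office", {"mixtures": 16, "states": 3})   # made earlier
store.store("A", "car", {"mixtures": 64, "states": 3})      # newly stored
reply = store.handle_request("A", "car")
```

Because the pattern is looked up rather than recomputed, a request can be answered immediately as long as a matching pattern has already been created.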

User A performs speech recognition on the end device 712 using the received master pattern 722 (step S708 of Figure 30).

Next, a concrete example is described in which the present embodiment is applied to a speech recognition system composed of a car navigation device (end device 712) and a server unit (server 701; master pattern producing device) connected via a communication network.

First, on the screen of the car navigation device (end device 712), the user selects the button labeled 'obtain my own speech model'. The message 'Please enter your name' is then displayed, and the user enters his or her name by button operation. Next, the message 'Please say "voice"' is displayed, and the user says 'voice' into the microphone attached to the car navigation device. These items of information (the user's name and the speech under noise) are sent from the car navigation device (end device 712) to the server 701 as utilization information.

Similarly, the user selects the 'high-precision speech recognition' button on the screen of the car navigation device (end device 712). This selection is sent from the car navigation device (end device 712) to the server 701 as specification information.

Based on this utilization information and specification information, the server 701 creates a master pattern for speech recognition suited to the user, associates the created master pattern with the user's name, and stores it in the master pattern storage part 708.

The next time the car navigation device (end device 712) is started, the message 'Please enter your name' is displayed, and the user enters his or her name. The name is sent to the server 701, and the master pattern sending part 709 sends the corresponding stored master pattern 722 from the server 701 to the end device 712. Having downloaded from the server 701 the master pattern corresponding to the name (user), the end device 712 uses this master pattern to recognize the user's speech and performs speech-based destination setting and the like. In this way, the car navigation device can be operated by voice by using, for speech recognition, the master pattern made by the master pattern producing device of the present embodiment.

As described above, according to the 7th embodiment of the present invention, the master pattern is created by calculating its statistics so as to maximize or locally maximize the probability or likelihood with respect to the plurality of reference models selected according to the utilization information, so a high-precision master pattern suited to the usage situation can be provided.

In addition, because the master pattern storage part 708 can store a plurality of master patterns, a master pattern can be provided immediately when needed.

In addition, because the master pattern sending part 709 sends the master pattern to the end device 712, the end device 712 can easily use the master pattern created by the server 701 even when the end device 712 and the server 701 are installed in spatially distant places.

In addition, the master pattern 722 is not limited to an HMM constructed for each phoneme and may instead be constituted by context-dependent HMMs.

In addition, the HMM constituting the master pattern 722 may be constituted by Gaussian mixture distributions whose number of mixtures differs from state to state.

In addition, speech recognition may be performed in the server 701 using the master pattern 722, and the recognition result may be sent to the end device 712.

In addition, the reference model preparation portion 702 may, where necessary, create new reference models and add them to or update them in the reference model storage part 703, and may delete unneeded reference models stored in the reference model storage part 703.

In addition, the reference model preparation portion 702 may, where necessary, add new reference models to or update them in the reference model storage part 703 via a communication path.

(the 8th embodiment)

Figure 32 is a block diagram showing the overall configuration of the master pattern producing device of the 8th embodiment of the present invention. Here, an example is shown in which the master pattern producing device of the present invention is incorporated in a pocket telephone 901. In the present embodiment, the case of creating a master pattern for speech recognition is described as an example.

The pocket telephone 901 is a portable information terminal. As a master pattern producing device that creates a master pattern for speech recognition defined by a hidden Markov model, which is expressed by a set of events and the output probabilities of events or of transitions between events, it includes a reference model receiving portion 909, a reference model preparation portion 902, a reference model storage part 903, a utilization information creating portion 904, a reference model selection portion 905, a similarity information creating portion 908, a master pattern preparing department 906, a specification information preparing department 907, a microphone 912, and a speech recognition portion 913.

The utilization information creating portion 904 creates utilization information 924 using the screen and keys of the pocket telephone 901.

The specification information preparing department 907 creates specification information 925 according to the specifications of the pocket telephone 901. Here, specification information means information about the specifications of the master pattern to be created; in this case it is information about the processing power of the CPU with which the pocket telephone 901 is equipped.

The similarity information creating portion 908 creates similarity information 926 based on the utilization information 924, the specification information 925, and the reference models 921 stored in the reference model storage part 903, and sends it to the reference model preparation portion 902.

The reference model preparation portion 902 determines, based on the similarity information 926, whether to prepare reference models. When it determines that reference models are to be prepared, the reference model preparation portion 902 sends the utilization information 924 and the specification information 925 to the reference model receiving portion 909.

The reference model receiving portion 909 receives from the server unit 910 the reference models corresponding to the utilization information 924 and the specification information 925, and sends them to the reference model preparation portion 902.

The reference model preparation portion 902 stores the reference models sent by the reference model receiving portion 909 in the reference model storage part 903.

The reference model selection portion 905 selects, from the reference models 921 stored in the reference model storage part 903, the reference models 923 corresponding to the utilization information 924.

The master pattern preparing department 906 is a processing portion that, according to the specification information 925 created by the specification information preparing department 907, creates the master pattern 922 so as to maximize or locally maximize the probability or likelihood with respect to the reference models 923 selected by the reference model selection portion 905. It comprises: a master pattern structure determination portion 906a that determines the structure of the master pattern (the number of mixture distributions of the Gaussian distributions and the like); an initial master pattern creating portion 906b that creates an initial master pattern by determining the initial values of the statistics used to calculate the master pattern; a statistic storage part 906c that stores the determined initial master pattern; and a statistic estimation portion 906d that, by applying the approximate calculation of a 3rd approximation portion 906e and the like to the initial master pattern stored in the statistic storage part 906c, calculates the statistics that maximize or locally maximize the probability or likelihood with respect to the reference models 923 selected by the reference model selection portion 905 (that is, generates the final master pattern).

The speech recognition portion 913 recognizes the user's speech input from the microphone 912, using the master pattern 922 created by the master pattern preparing department 906.

Next, the operation of the pocket telephone 901 configured as above is described.

Assume now that a children's model is stored in advance in the reference model storage part 903 as the reference model 921. This reference model 921 is constituted by an HMM for each phoneme. Figure 34 shows an example of the reference model 921; here, a schematic diagram of the children's reference model is shown. These reference models have 3 states, and the output distribution of the HMM in each state is constituted by a Gaussian mixture distribution with 16 distributions. As feature quantities, a total of 25 dimensions (J=25) is used: 12-dimensional mel-cepstral coefficients, 12-dimensional delta mel-cepstral coefficients, and delta power.

First, the utilization information creating portion 904 creates utilization information 924 indicating the category to which the user belongs (step S900). Figure 36 shows examples of how the utilization information 924 is created. Figure 36(a) shows an example of a selection screen of the pocket telephone 901. Here, by pressing the '4: adult' button, the user selects that this pocket telephone 901 is to be used by adult women and adult men. Figure 36(b) shows another example. Here, the user inputs speech while pressing the 'menu' button. The user's speech is converted into feature quantities, and 'user's speech data' is thereby created as the utilization information 924.

Meanwhile, the specification information preparing department 907 creates specification information 925 according to the specifications of the pocket telephone 901 (step S901). Here, according to the memory capacity of the pocket telephone 901, specification information 925 of 'mixture distribution count 16' is created.

Next, the similarity information creating portion 908 creates similarity information 926 based on the utilization information 924, the specification information 925, and the reference model 921 stored in the reference model storage part 903 (step S902), and sends the similarity information 926 to the reference model preparation portion 902. Here, the only reference model 921 present in the reference model storage part 903 is the children's model whose mixture distribution count is 3 (see Figure 34), and the reference model storage part 903 holds no reference model corresponding to 'adult' as the utilization information 924 (corresponding to Figure 36(a)) and 'mixture distribution count 16' as the specification information 925. Similarity information 926 stating 'no similar reference model exists' is therefore created and sent to the reference model preparation portion 902. In another example, the utilization information 924 is 'user's speech data' (corresponding to Figure 36(b)), and the similarity information 926 is created by inputting the user's speech data to the children's model stored in the reference model storage part 903. Here, because the likelihood for the children's model is below a prescribed threshold, similarity information 926 stating 'no similar reference model exists' is created and sent to the reference model preparation portion 902.

Next, the reference model preparation portion 902 determines, based on the similarity information 926, whether to prepare reference models (step S903). Here, because 'no similar reference model exists', the user is prompted to prepare reference models, as in the screen display example of the pocket telephone 901 in Figure 37(a). When the user presses the 'memo' button to request preparation of reference models, the reference model preparation portion 902 determines that reference models are to be prepared and sends the utilization information 924 and the specification information 925 to the reference model receiving portion 909. In another example, because 'no similar reference model exists', the reference model preparation portion 902 automatically determines that reference models are to be prepared and sends the utilization information 924 and the specification information 925 to the reference model receiving portion 909. Figure 37(b) shows an example of the screen of the pocket telephone 901 in this case.
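Steps S902 and S903 amount to a check followed by a decision, which the following Python sketch illustrates for both examples (category selection and user speech data). The function names, the string constants, the `(category, mixture count)` representation of the stored models, and the numeric scores are assumptions made for illustration.

```python
NO_SIMILAR = "no similar reference model exists"
SIMILAR = "similar reference model exists"


def similarity_from_category(stored, category, n_mixtures):
    """stored: list of (category, n_mixtures) pairs held in the storage part."""
    hit = any(c == category and m == n_mixtures for c, m in stored)
    return SIMILAR if hit else NO_SIMILAR


def similarity_from_speech(log_likelihoods, threshold):
    """log_likelihoods: scores of the user's speech under each stored model."""
    return SIMILAR if any(s > threshold for s in log_likelihoods) else NO_SIMILAR


def should_prepare(similarity_info):
    """Decide whether reference models must be fetched (step S903)."""
    return similarity_info == NO_SIMILAR


stored = [("child", 3)]  # only the children's model, as stated in the text
info_a = similarity_from_category(stored, "adult", 16)          # Figure 36(a) case
info_b = similarity_from_speech([-250.0], threshold=-100.0)     # Figure 36(b) case
```

In both cases the result is that reference models must be prepared, after which the utilization information and specification information would be forwarded to the receiving portion.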

In response, the reference model receiving portion 909 receives from the server unit 910 the reference models corresponding to the utilization information 924 and the specification information 925, and sends them to the reference model preparation portion 902. Here, as the reference models corresponding to 'adult' as the utilization information 924 (corresponding to Figure 36(a)) and 'mixture distribution count 16' as the specification information 925, the reference model receiving portion 909 receives from the server unit 910 two reference models: the 'adult female model with mixture distribution count 16' and the 'adult male model with mixture distribution count 16'.

After that, the reference model preparation portion 902 prepares the reference models by storing the reference models sent by the reference model receiving portion 909 in the reference model storage part 903 (step S904). Figure 35 shows an example of these reference models. Here, schematic diagrams of the reference models for adult men, adult women, and children are shown.

Next, the reference model selection portion 905 selects, from the reference models 921 stored in the reference model storage part 903, the two reference models that belong to the same category as 'adult' in the utilization information 924, namely the 'adult female model with mixture distribution count 16' and the 'adult male model with mixture distribution count 16' (step S905). In another example, the reference model selection portion 905 selects, from the reference models 921 stored in the reference model storage part 903, the two reference models that are acoustically close (have a high likelihood) to the 'user's speech data' serving as the utilization information 924, namely the 'adult female model with mixture distribution count 16' and the 'adult male model with mixture distribution count 16'.

Next, according to the created specification information 925, the master pattern preparing department 906 creates the master pattern 922 so as to maximize or locally maximize the probability or likelihood with respect to the reference models 923 selected by the reference model selection portion 905 (step S906).

Finally, the speech recognition unit 913 recognizes the user's speech input from the microphone 912, using the standard model 922 created by the standard model creating unit 906 (step S907).

The detailed procedure of step S906 (creation of the standard model) in Figure 33 is described below. The flow is the same as the flowchart shown in Figure 4. However, details such as the structure of the standard model used and the specific approximate calculation differ.

First, the standard model structure determining unit 906a determines the structure of the standard model (step S102a of Figure 4). Here, in accordance with the specification information 925 'number of mixtures: 16', the standard model is structured as an HMM for each phoneme with 3 states, and the number of mixtures of the output distribution of each state is set to 16 (Mf = 16).

Then, the initial standard model creating unit 906b determines the initial values of the statistics used to calculate the standard model (step S102b of Figure 4). Here, one of the selected reference models 923, the 'adult female model with 16 mixtures', is stored in the statistics storing unit 906c as the initial values of the statistics. In another example, the selected reference model 923 'adult male model with 16 mixtures' is stored in the statistics storing unit 906c as the initial values of the statistics. Specifically, the initial standard model creating unit 906b generates the output distribution shown in formula 13 above.

Next, the statistics estimating unit 906d uses the two reference models 923 selected by the reference model selecting unit 905 to estimate the statistics of the standard model stored in the statistics storing unit 906c (step S102c of Figure 4). That is, it estimates the statistics of the standard model (the mixture weighting coefficients shown in formula 16, the mean values shown in formula 17, and the variance values shown in formula 18) that maximize or locally maximize the probability (here, the likelihood log P shown in formula 25) of the standard model with respect to the output distributions of the two (Ng = 2) reference models 923, that is, the output distributions shown in formula 19. In this embodiment, however, the quantity given by formula 21 in the output distribution of formula 19 is 16 (the number of mixtures of each reference model).

At this time, the third approximating unit 906e of the statistics estimating unit 906d uses the approximate expression of formula 53, on the assumption that the Gaussian distributions of the standard model do not affect one another. In addition, when the repetition count R is 1 (the first repetition), the neighborhood (formula 55) of the standard model Gaussian distribution shown in formula 54 is approximated as the space containing the two Gaussian distributions of the reference models 923 (formula 56) whose inter-distribution distance from the output distribution of formula 54, such as the Mahalanobis distance or the Kullback-Leibler (KL) distance, is smallest and second smallest (neighborhood indication parameter G = 2). On the other hand, when the repetition count R is 2 or more, the neighborhood (formula 55) of the standard model Gaussian distribution shown in formula 54 is approximated as the space containing the single Gaussian distribution of the reference models 923 (formula 56) whose inter-distribution distance from the output distribution of formula 54 is smallest (neighborhood indication parameter G = 1).
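The schedule for the parameter G described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes diagonal-covariance Gaussians, uses the Mahalanobis distance (the text also permits the KL distance), and the function names `neighborhood_parameter` and `nearest_reference_gaussians` are hypothetical.

```python
import numpy as np

def mahalanobis_sq(mean_f, var_f, mean_g):
    """Squared Mahalanobis distance from a diagonal Gaussian (mean_f, var_f)
    of the standard model to the mean of a reference Gaussian."""
    return float(np.sum((mean_g - mean_f) ** 2 / var_f))

def neighborhood_parameter(r):
    """G = 2 on the first repetition (R = 1), G = 1 for R >= 2."""
    if r < 1:
        raise ValueError("repetition count starts at 1")
    return 2 if r == 1 else 1

def nearest_reference_gaussians(mean_f, var_f, ref_means, r):
    """Indices of the G reference Gaussians nearest to one standard-model Gaussian."""
    g = neighborhood_parameter(r)
    dist = np.array([mahalanobis_sq(mean_f, var_f, m) for m in ref_means])
    return np.argsort(dist, kind="stable")[:g]
```

Only the selected reference Gaussians would then contribute to the update of that standard-model Gaussian in the given repetition.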

Taking the above approximations by the third approximating unit 906e together, the calculation formulas of the statistics estimating unit 906d are as follows. That is, the statistics estimating unit 906d calculates the mixture weighting coefficients, mean values, and variance values according to formulas 59, 60, and 61 respectively, and generates the standard model specified by these parameters as the final standard model 922. Here, the second method of the third embodiment is used, in which the value of the mixture weighting coefficient is set to zero, the mean value to zero, and the variance value to 1. In addition, the value of the neighborhood indication parameter G differs according to the repetition count. Furthermore, depending on the value of the neighborhood indication parameter G, the above method may be chosen from among the first to third methods of the third embodiment.

The statistics estimating unit 906d stores the statistics of the standard model estimated in this way in the statistics storing unit 906c. The estimation of the statistics and their storage in the statistics storing unit 906c are then repeated R (≥ 1) times. The resulting statistics are output as the statistics of the finally generated standard model 922.

Figure 38 shows the results of a recognition experiment using the standard model 922 created with the third approximating unit 906e. The vertical axis shows the recognition rate (%) for adults (male and female), and the horizontal axis shows the repetition count R. R = 0 is the result of recognition with the initial model created by the initial standard model creating unit 906b before learning. The neighborhood indication parameter was set to G = 2 for R = 1 and to G = 1 for R = 2 to 5.

The curve 'data' shows the result of learning from speech data, which takes several days. The curves 'female' and 'male' show the results when the initial model is the adult female model and the adult male model, respectively. The learning time of the present invention, which is based on reference models, is on the order of tens of seconds. The experimental results show that a high-precision standard model can be created in a short time.

For reference, Figure 39 shows the recognition rate of the standard model created by the second approximating unit 306e of the third embodiment. The difference from the third approximating unit 906e of this embodiment is that the neighborhood indication parameter is G = 1 regardless of the repetition count R. The experimental results show that a good result is obtained when the adult female model is selected as the initial model, and that precision degrades slightly when the adult male model is selected as the initial model. Taken together with the results of Figure 38, this shows that the standard model based on the third approximating unit 906e achieves high precision regardless of the initial model.

As described above, according to the eighth embodiment of the present invention, reference models are prepared according to similarity information, so reference models suited to the usage information and the specification information can be prepared when needed. In addition, by changing the neighborhood indication parameter G according to the repetition count R, a high-precision standard model can be provided regardless of the initial model.

In addition, the number of repetitions of the processing by the statistics estimating unit 906d may be the number of repetitions needed for the likelihood shown in formula 25 to reach or exceed a prescribed threshold.
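A threshold-based stopping rule of this kind could look like the sketch below. It is an illustration under stated assumptions: `update` stands in for one estimation pass (formulas 59 to 61), `log_likelihood` stands in for formula 25, and both callables as well as the `max_repetitions` safety cap are hypothetical and not taken from the source.

```python
def estimate_with_stopping(initial_stats, update, log_likelihood,
                           threshold, max_repetitions=5):
    """Repeat the estimation until the likelihood reaches the threshold.

    update(stats, r) performs the r-th estimation pass (r starts at 1);
    log_likelihood(stats) evaluates the current statistics.
    Returns the final statistics and the number of repetitions performed.
    """
    stats = initial_stats
    r = 0
    while r < max_repetitions:
        r += 1
        stats = update(stats, r)                 # estimate and store statistics
        if log_likelihood(stats) >= threshold:   # stop once the threshold is reached
            break
    return stats, r
```

Passing the repetition index `r` to `update` lets the caller vary the parameter G per repetition, as this embodiment does.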

In addition, the standard model 922 is not limited to an HMM constructed for each phoneme; it may also be constructed from context-dependent HMMs.

In addition, the standard model creating unit 906 may create the model only for the output probabilities of events in some states of some phonemes.

In addition, the HMMs constituting the standard model 922 may have a different number of states for each phoneme, and may consist of Gaussian mixture distributions with a different number of mixtures for each state.

(the 9th embodiment)

Figure 40 is a block diagram showing the overall configuration of the standard model creating apparatus according to the ninth embodiment of the present invention. Here, an example is shown in which the standard model creating apparatus of the present invention is built into a PDA (Personal Digital Assistant) 1001. In this embodiment, the creation of a standard model for speech recognition is described as an example.

The PDA 1001 is a portable information terminal. As a standard model creating apparatus that creates a standard model for speech recognition defined by a hidden Markov model, which is expressed by the output probability of transitions between a set of events and an event or between events, it includes a reference model storing unit 1003, a standard model creating unit 1006, an application/specification information correspondence database 1014, a microphone 1012, and a speech recognition unit 1013. The standard model creating unit 1006 includes a standard model structure determining unit 1006a, an initial standard model creating unit 1006b, a statistics storing unit 306c, and a statistics estimating unit 306d.

The standard model creating unit 1006 obtains the specification information 1025 from the application/specification information correspondence database 1014 according to the application activation information 1027 sent to it (here, the ID number of the activated application). Figure 41 shows an example of the data in the application/specification information correspondence database 1014. The database registers the specification information (here, the number of mixtures) corresponding to each application (ID number and name).

The standard model creating unit 1006 is a processing unit that, in accordance with the obtained specification information 1025, creates the standard model 1022 so as to maximize or locally maximize the probability or likelihood with respect to the single reference model 1021 stored in the reference model storing unit 1003. It has the function of the second approximating unit 306e of the third embodiment.

The speech recognition unit 1013 recognizes the user's speech input from the microphone 1012, using the standard model 1022 created by the standard model creating unit 1006.

The operation of the PDA 1001 configured as above is described below.

Figure 42 is a flowchart showing the operating procedure of the PDA 1001.

Here, it is assumed that a user model with a large number of mixtures is stored in advance in the reference model storing unit 1003 as the reference model 1021. The reference model 1021 consists of an HMM for each phoneme. Figure 43 shows an example of the reference model 1021. Each HMM of this reference model has 3 states, and the output distribution of each state is a Gaussian mixture distribution with 300 mixtures. The features are 12-dimensional Mel cepstral coefficients, 12-dimensional delta Mel cepstral coefficients, and delta power, for a total of 25 dimensions (J = 25).

First, the user activates an application, for example one called 'stock exchange' (step S1000).

In response, the standard model creating unit 1006 receives the ID '3' of the activated application as the application activation information (step S1001). It then uses the application/specification information correspondence database 1014 to create the standard model 1022 in accordance with the specification information 1025 corresponding to ID '3', namely 'number of mixtures: 126' (step S1002). Specifically, the standard model 1022 is constructed as a context-dependent HMM with 126 mixtures (Mf = 126) and 3 states.
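The lookup from application ID to model structure can be pictured with a small table. This is only a sketch: the entry for ID 3 ('stock exchange', 126 mixtures) comes from the text, while the second entry, the `Specification` class, and the `structure_for` function are hypothetical placeholders.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Specification:
    app_name: str
    num_mixtures: int

# Application ID -> specification information (cf. the database 1014).
APP_SPEC_DB = {
    3: Specification("stock exchange", 126),       # values given in the text
    1: Specification("hypothetical example", 16),  # placeholder, NOT from the source
}

def structure_for(app_id, db=APP_SPEC_DB, num_states=3):
    """Return the standard model structure for an activated application."""
    spec = db[app_id]
    return {"num_states": num_states, "num_mixtures": spec.num_mixtures}
```

For example, `structure_for(3)` yields a 3-state structure with 126 mixtures, matching Mf = 126 above.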

The standard model creating unit 1006 thus receives the specification information 1025 (step S1001) and creates the standard model according to the specification information 1025 (step S1002).

Finally, the speech recognition unit 1013 recognizes the user's speech input from the microphone 1012, using the standard model 1022 created by the standard model creating unit 1006 (step S1003).

The detailed procedure of step S1002 (creation of the standard model) in Figure 42 is described below. The flow is the same as the flowchart shown in Figure 4. However, details such as the structure of the standard model used and the specific approximate calculation differ.

First, on receiving the application ID '3' as the application activation information 1027, the standard model structure determining unit 1006a looks up the corresponding specification information 1025 ('number of mixtures: 126') in the application/specification information correspondence database 1014, and determines the structure of the standard model to be a context-dependent HMM with 126 mixtures (Mf = 126) and 3 states (step S102a of Figure 4).

Then, the initial standard model creating unit 1006b determines the initial values of the statistics used to calculate the standard model, according to the structure determined by the standard model structure determining unit 1006a (step S102b of Figure 4). Here, the values obtained by the clustering described later, which uses the k-means method with the Mahalanobis distance, are stored in the statistics storing unit 306c as the initial values of the statistics.

Next, the statistics estimating unit 306d uses the reference model 1021 stored in the reference model storing unit 1003 to estimate the statistics of the standard model stored in the statistics storing unit 306c (step S102c of Figure 4). The estimation processing by the statistics estimating unit 306d is the same as in the third embodiment.

The method by which the initial standard model creating unit 1006b determines the initial values, namely clustering using the k-means method with the Mahalanobis distance, is described below. Figure 44 shows a flowchart of the clustering, and Figures 45 to 48 show conceptual diagrams of it.

First, in step S1004 of Figure 44, 126 representative points are prepared, equal to the number of mixtures of the standard model (Figure 45). Here, 126 output distributions are selected from the 300 output distributions of the reference model, and the mean values of the selected distributions are used as the representative points.

Next, in step S1005 of Figure 44, for each representative point, the output vectors of the reference model that are near to it in Mahalanobis distance are determined (Figure 46). Then, in step S1006 of Figure 44, the nearby distributions determined in step S1005 are represented by a single Gaussian distribution, and its mean value is used as the new representative point (Figure 47).

Next, in step S1007 of Figure 44, it is determined whether to end the clustering. Here, the clustering ends when the rate of change of the Mahalanobis distance between each representative point and the distributions of the reference vectors (the difference from the distance obtained with the previous representative point) is at or below a threshold. If the stop condition is not satisfied, the procedure returns to step S1005 of Figure 44, the nearby distributions are determined again, and the same operations are repeated.

If the stop condition is satisfied, the procedure advances to step S1008 of Figure 44, where the initial values of the statistics are determined and stored in the statistics storing unit 306c. The initial values are thus determined by clustering.
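Steps S1004 to S1008 can be sketched as a k-means loop over the Gaussians of the reference model. The sketch rests on several assumptions that the text does not specify: diagonal covariances, random selection of the initial representative points, measuring the Mahalanobis distance with the representative's variance, merging the members of a cluster into one Gaussian by moment matching, and stopping on the change in total distance. The function name `cluster_initial_statistics` is hypothetical.

```python
import numpy as np

def cluster_initial_statistics(weights, means, variances, num_mixtures,
                               threshold=1e-6, max_iter=100, seed=0):
    """Reduce L reference Gaussians (L x J arrays) to num_mixtures initial Gaussians."""
    rng = np.random.default_rng(seed)
    # S1004: pick representative points from the reference distributions.
    chosen = rng.choice(len(means), size=num_mixtures, replace=False)
    rep_w = weights[chosen].copy()
    rep_mean = means[chosen].copy()
    rep_var = variances[chosen].copy()
    prev_total = np.inf
    for _ in range(max_iter):
        # S1005: assign each reference Gaussian to its nearest representative.
        diff = means[:, None, :] - rep_mean[None, :, :]
        dist = np.sum(diff ** 2 / rep_var[None, :, :], axis=2)   # shape (L, M)
        assign = np.argmin(dist, axis=1)
        total = float(dist[np.arange(len(means)), assign].sum())
        # S1006: represent each cluster by one Gaussian (moment matching).
        for k in range(num_mixtures):
            members = assign == k
            if not members.any():
                continue  # keep the previous representative for an empty cluster
            w = weights[members]
            mass = w.sum()
            m = (w[:, None] * means[members]).sum(axis=0) / mass
            second = (w[:, None] * (variances[members] + means[members] ** 2)).sum(axis=0) / mass
            rep_w[k], rep_mean[k] = mass, m
            rep_var[k] = np.maximum(second - m ** 2, 1e-12)
        # S1007: stop when the total distance no longer changes appreciably.
        if abs(prev_total - total) <= threshold:
            break
        prev_total = total
    # S1008: the result serves as the initial statistics.
    return rep_w / rep_w.sum(), rep_mean, rep_var
```

In the setting of this embodiment the inputs would be the 300 Gaussians of one state of the reference model and `num_mixtures` would be 126.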

As described above, according to the ninth embodiment of the present invention, a standard model suited to the specification information can be obtained automatically in conjunction with the application.

In addition, the standard model 1022 may also be constructed as an HMM for each phoneme.

In addition, the standard model creating unit 1006 may create the model only for the output probabilities of events in some states of some phonemes.

In addition, the HMMs constituting the standard model 1022 may have a different number of states for each phoneme, and may consist of Gaussian mixture distributions with a different number of mixtures for each state.

(the 10th embodiment)

Figure 49 is a block diagram showing the overall configuration of the standard model creating apparatus according to the tenth embodiment of the present invention. Here, an example is shown in which the standard model creating apparatus of the present invention is built into a server 801 in a computer system. In this embodiment, the creation of a standard model (adaptive model) for speech recognition is described as an example.

The server 801 is a computer device in a communication system or the like. As a standard model creating apparatus that creates a standard model for speech recognition defined by the output probability of transitions between a set of events and an event or between events, it includes a reading unit 711, a reference model preparing unit 702, a reference model storing unit 703, a usage information receiving unit 704, a reference model selecting unit 705, a standard model creating unit 706, a specification information receiving unit 707, a standard model storing unit 708, a standard model transmitting unit 709, and a reference model receiving unit 810.

The reference model preparing unit 702 sends to the reference model storing unit 703 the reference models for speech recognition, classified by speaker, noise, and tone of voice, that the reading unit 711 reads from a storage device such as a CD-ROM on which they are written. The reference model storing unit 703 stores the sent reference models 721. In addition, the reference model preparing unit 702 sends to the reference model storing unit 703 the reference models for speech recognition that the reference model receiving unit 810 receives from the terminal device 712. The reference model storing unit 703 stores the sent reference models 721.

The specification information receiving unit 707 receives the specification information 725 from the terminal device 712. The usage information receiving unit 704 receives from the terminal device 712, as the usage information 724, the speech of the user uttered in noise. From the reference models 721 stored in the reference model storing unit 703, the reference model selecting unit 705 selects the reference models 723 whose speaker, noise, and tone of voice are acoustically close to the user's speech received as the usage information 724 by the usage information receiving unit 704.

The standard model creating unit 706 is a processing unit that, in accordance with the specification information 725, creates the standard model 722 so as to maximize or locally maximize the probability or likelihood with respect to the reference models 723 selected by the reference model selecting unit 705. It has the same function as the standard model creating unit 206 of the second embodiment. The standard model storing unit 708 stores one or more standard models based on the specification information 725. On receiving the specification information 725 and a standard model request signal from the user's terminal device 712, the standard model transmitting unit 709 sends a standard model suited to the specification to the terminal device 712.

The operation of the server 801 configured as above is described below.

Figure 50 is a flowchart showing the operating procedure of the server 801. The examples of reference models and standard models used to explain this procedure are the same as those of Figure 31 in the seventh embodiment.

First, before a standard model is created, the reference models that serve as its basis are prepared (steps S800 and S801 of Figure 50). That is, the reference model preparing unit 702 sends to the reference model storing unit 703 the reference models for speech recognition, classified by speaker, noise, and tone of voice, that the reading unit 711 reads from a storage device such as a CD-ROM, and the reference model storing unit 703 stores the sent reference models 721 (step S800 of Figure 50). Here, for each speaker, noise, and tone of voice, the reference model 721 consists of an HMM for each phoneme. In addition, the reference model preparing unit 702 sends to the reference model storing unit 703 the reference models for speech recognition, suited to the user and the terminal device 712, that are transmitted by the terminal device 712 and received by the reference model receiving unit 810, and the reference model storing unit 703 stores the sent reference models 721 (step S801 of Figure 50). Here, as shown by the reference models 721 in Figure 31, each reference model is an HMM with 3 states whose output distribution in each state is a Gaussian mixture distribution with 128 mixtures. 25-dimensional (J = 25) Mel cepstral coefficients are used as the features.

The subsequent creation of the standard model 722 using these reference models 721 and its transmission to the terminal device 712 (steps S802 to S809) are the same as the procedure in the seventh embodiment (steps S701 to S708 of Figure 30).

In this way, a model held in the terminal device 712 can be uploaded to the server and used as material for creating a standard model. For example, the server 801 can integrate the uploaded reference model with the other reference models it already holds to create a high-precision standard model with a larger number of mixtures, which the terminal device 712 can then download and use. Therefore, a terminal device 712 equipped with only a simple adaptation function can upload its adapted model and have a more precise standard model created.

Figure 51 shows an example of a system to which the standard model creating apparatus of this embodiment is applied. Here, a server 701 and terminal devices 712 (a mobile phone 712a and a car navigation device 712b) that communicate via the Internet, wireless communication, or the like are shown.

For example, the mobile phone 712a uses the user's speech as the usage information, an indication that the model is to be used in a mobile phone (low CPU processing power) as the specification information, and a sample model stored in advance as the reference model, and requests the creation of a standard model by sending this usage information, specification information, and reference model to the server 701. When the server 701 creates a standard model in response to this request, the mobile phone 712a downloads the standard model and uses it to recognize the user's speech. For example, when the user's speech matches a name in the internally held address book, the mobile phone automatically calls the telephone number corresponding to that name.

Similarly, the car navigation device 712b uses the user's speech as the usage information, an indication that the model is to be used in a car navigation device (moderate CPU processing power) as the specification information, and a sample model stored in advance as the reference model, and requests the creation of a standard model by sending this usage information, specification information, and reference model to the server 701. When the server 701 creates a standard model in response to this request, the car navigation device 712b downloads the standard model and uses it to recognize the user's speech. For example, when the user's speech matches a place name held internally, the device automatically displays on the screen a map showing the route from the current location to that place as the destination.

In this way, by asking the server 701 to create a standard model suited to the device, the mobile phone 712a and the car navigation device 712b need not contain the circuits or processing programs required to create a standard model, and can obtain standard models for various recognition targets when needed.

As described above, according to the tenth embodiment of the present invention, the standard model is created using the reference models received by the reference model receiving unit 810, so a high-precision standard model can be provided. That is, as reference models are added by uploads from the terminal devices 712, the variety of reference models held on the server 801 side increases, and a more precise standard model can be provided when other users use the system.

In addition, the reference model receiving unit 810 may also receive reference models from terminal devices other than the terminal device 712.

In addition, the application example shown in Figure 51 is not limited to this embodiment and is also applicable to the other embodiments. That is, by distributing the standard models created in the first to ninth embodiments to various electronic devices via recording media or communication, these devices can perform high-precision speech recognition, image recognition, intention understanding, and the like. Furthermore, by building the standard model creating apparatus of the above embodiments into various electronic devices, stand-alone electronic devices equipped with recognition and authentication functions such as speech recognition, image recognition, and intention understanding can be realized.

The standard model creating apparatus of the present invention has been described above on the basis of the embodiments, but the present invention is not limited to these embodiments.

For example, the approximate calculation of the statistics of the standard model in the first to tenth embodiments is not limited to the approximate calculation described in each embodiment; at least one of the four kinds of approximate calculation in the first to fourth embodiments may be used. That is, any one of the four kinds of approximate calculation may be used, or two or more kinds may be combined.

In addition, in the second embodiment, the general approximating unit 206e of the statistics estimating unit 206d calculates the mixture weighting coefficients, mean values, and variance values of the standard model according to the approximate expressions shown in formulas 45, 46, and 47 respectively, but the approximate expressions shown in formulas 63, 64, and 65 below may be used instead.

(formula 63)

\omega_{f(m)} \approx \frac{\sum_{i=1}^{N_g} \int_{-\infty}^{\infty} \sum_{l=1}^{L_{g(i)}} \gamma(\mu_{g(i,l)}, m) \, \upsilon_{g(i,l)} \, g(x; \mu_{g(i,l)}, \sigma_{g(i,l)}^{2}) \, dx}{\sum_{k=1}^{M_f} \sum_{i=1}^{N_g} \int_{-\infty}^{\infty} \sum_{l=1}^{L_{g(i)}} \gamma(\mu_{g(i,l)}, k) \, \upsilon_{g(i,l)} \, g(x; \mu_{g(i,l)}, \sigma_{g(i,l)}^{2}) \, dx}

(m = 1, 2, ..., M_f)

(formula 64)

\mu_{f(m,j)} \approx \frac{\sum_{i=1}^{N_g} \int_{-\infty}^{\infty} x_{(j)} \sum_{l=1}^{L_{g(i)}} \gamma(\mu_{g(i,l)}, m) \, \upsilon_{g(i,l)} \, g(x; \mu_{g(i,l)}, \sigma_{g(i,l)}^{2}) \, dx}{\sum_{i=1}^{N_g} \int_{-\infty}^{\infty} \sum_{l=1}^{L_{g(i)}} \gamma(\mu_{g(i,l)}, m) \, \upsilon_{g(i,l)} \, g(x; \mu_{g(i,l)}, \sigma_{g(i,l)}^{2}) \, dx}

(m = 1, 2, ..., M_f; j = 1, 2, ..., J)

(formula 65)

\sigma_{f(m,j)}^{2} \approx \frac{\sum_{i=1}^{N_g} \int_{-\infty}^{\infty} (x_{(j)} - \mu_{f(m,j)})^{2} \sum_{l=1}^{L_{g(i)}} \gamma(\mu_{g(i,l)}, m) \, \upsilon_{g(i,l)} \, g(x; \mu_{g(i,l)}, \sigma_{g(i,l)}^{2}) \, dx}{\sum_{i=1}^{N_g} \int_{-\infty}^{\infty} \sum_{l=1}^{L_{g(i)}} \gamma(\mu_{g(i,l)}, m) \, \upsilon_{g(i,l)} \, g(x; \mu_{g(i,l)}, \sigma_{g(i,l)}^{2}) \, dx}

(m = 1, 2, ..., M_f; j = 1, 2, ..., J)
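As written, the factor γ in formulas 63 to 65 depends on the mean of the reference Gaussian and not on x, so each integral reduces to a Gaussian moment: the integral of g is 1, the integral of x_(j) g is the reference mean, and the integral of (x_(j) - μ_f)^2 g is the reference variance plus the squared difference of the means. The sketch below evaluates the three formulas in that closed form. It assumes diagonal covariances, flattens the Gaussians of all reference models into one list of length L, and takes the values of γ as a given L x M array, since their definition lies outside this section; the function name `estimate_statistics` is hypothetical.

```python
import numpy as np

def estimate_statistics(gamma, weights, means, variances):
    """Closed-form evaluation of formulas 63 to 65.

    gamma:     (L, M) values of gamma for each reference Gaussian l and
               each standard-model Gaussian m (assumed given).
    weights:   (L,)   mixture weights of the reference Gaussians.
    means:     (L, J) means of the reference Gaussians.
    variances: (L, J) diagonal variances of the reference Gaussians.
    Assumes every standard-model Gaussian receives non-zero total mass.
    """
    c = gamma * weights[:, None]               # gamma * upsilon, shape (L, M)
    mass = c.sum(axis=0)                       # denominator of formulas 64 and 65
    omega = mass / mass.sum()                  # formula 63
    mu = (c.T @ means) / mass[:, None]         # formula 64, shape (M, J)
    # Second central moment of each reference Gaussian about the new mean.
    spread = variances[:, None, :] + (means[:, None, :] - mu[None, :, :]) ** 2
    sigma2 = np.einsum("lm,lmj->mj", c, spread) / mass[:, None]   # formula 65
    return omega, mu, sigma2
```

Evaluating the formulas this way avoids numerical integration, which is consistent with the short learning times reported for the reference-model-based method.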

The inventors have confirmed that a standard model created using these approximate expressions achieves high recognition performance. For example, when the number of mixtures of both the reference models and the standard model was 16, the recognition rate was 82.2% before adaptation, 85.0% with the method based on sufficient statistics described in Non-Patent Document 2 above, and 85.5% with the method based on the above approximate expressions. That is, higher recognition performance was obtained than with the method based on sufficient statistics. In addition, when the number of mixtures of the reference models was 64 and that of the standard model was 16, the method based on the above approximate expressions achieved a recognition rate of 85.7%.

In addition, for the creation of the initial standard model by the initial standard model creating unit, a category ID / initial standard model / reference model correspondence table such as that shown in Figure 52 may be prepared in advance, and the initial standard model may be determined from this table. The method of determining the initial standard model using this correspondence table is described below. Here, a category ID is an ID that identifies the kind of recognition target for which the standard model is used, and corresponds to the kind of standard model.

The category ID / initial standard model / reference model correspondence table shown in Figure 52 associates, with a group of reference models sharing a given property, one category ID that identifies them and an initial standard model created in advance that shares the same property. In this table, the category ID and initial standard model 8A correspond to reference models 8AA to 8AZ, and the category ID and initial standard model 64Z correspond to reference models 64ZA to 64ZZ. By using an initial standard model with the same properties as the reference models used, the standard model creating unit can generate a high-precision standard model.

Here, in labels such as 8A and 8AA attached to the category IDs, initial standard models, and reference models, the leading number (such as '8') indicates the number of mixtures. The second symbol (such as 'A') indicates a major category, for example the kind of noise environment in the case of speech recognition in noise (A for household noise, B for noise in a train, and so on). The third symbol (such as 'A') indicates a subcategory, for example an attribute of the person whose speech is to be recognized (A for lower-grade elementary school pupils, B for upper-grade elementary school pupils, and so on). Accordingly, in the correspondence table of Figure 52, reference models 8AA to 8AZ are models with 8 mixtures as shown in Figure 53, reference models 64ZA to 64ZZ are models with 64 mixtures as shown in Figure 54, and initial standard models 8A to 64Z are models with 8 to 16 mixtures as shown in Figure 55.
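The labelling scheme just described can be decoded mechanically. The following sketch assumes labels consist of a decimal number followed by one or two upper-case letters, as in the examples 8A, 8AA, and 64ZA; the names `ModelId`, `parse_model_id`, and `initial_model_for` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

class ModelId(NamedTuple):
    num_mixtures: int      # leading number, e.g. 8 or 64
    major: str             # second symbol: major category (e.g. noise environment)
    minor: Optional[str]   # third symbol: subcategory; None for initial standard models

_ID_PATTERN = re.compile(r"(\d+)([A-Z])([A-Z])?")

def parse_model_id(model_id):
    """Split a label such as '8AA' into its three parts."""
    match = _ID_PATTERN.fullmatch(model_id)
    if match is None:
        raise ValueError(f"unrecognized model label: {model_id!r}")
    return ModelId(int(match.group(1)), match.group(2), match.group(3))

def initial_model_for(reference_id):
    """Label of the initial standard model that shares the reference model's
    number of mixtures and major category, e.g. '8AA' -> '8A'."""
    parsed = parse_model_id(reference_id)
    if parsed.minor is None:
        raise ValueError("expected a reference model label with a subcategory")
    return f"{parsed.num_mixtures}{parsed.major}"
```

With this convention, the table of Figure 52 amounts to grouping reference model labels by the value that `initial_model_for` returns.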

The method of creating this category ID / initial standard model / reference model correspondence table is described below. Figure 56 is a flowchart showing the procedure, and Figures 57 to 60 show specific examples of each step. Here, taking speech recognition in a noisy environment as an example, the procedure is described not only for creating the table but also for newly creating the category IDs, initial standard models, and reference models.

First, the speech data is classified into acoustically close groups (step S1100 of Figure 56). For example, as shown in Figure 57, the speech data is classified by noise environment, which serves as the usage information. Environment A (speech data in household noise) contains the speech of lower-grade elementary school pupils, upper-grade elementary school pupils, adult women, and so on recorded in household noise, and environment B (speech data in a train) contains the speech of lower-grade elementary school pupils, upper-grade elementary school pupils, adult women, and so on recorded in a train. The speech data may also be classified by other usage information, such as the speaker's sex or age group, the character of the voice (laughing, angry, and so on), the tone of voice (reading style, conversational style, and so on), or the language (English, Chinese, and so on).

Next, one or more model structures for the reference models to be prepared are determined according to the specification information and the like (step S1101 of Figure 56). For example, 8, 16, 32, and 64 mixtures are chosen as targets. Determining the model structure is not limited to the number of mixture components; the number of HMM states or the type of HMM, such as monophone or triphone, may also be determined.

Next, initial standard models are created (step S1102 of Figure 56). That is, for each class determined in the speech data classification of step S1100 (environment A, environment B, ...), an initial standard model is created for each model structure determined in step S1101. For example, as shown in Figure 58, the initial standard model 8A is an 8-mixture model trained, by the Baum-Welch algorithm or the like, on the speech data recorded under household noise (environment A), which includes the speech of lower-grade pupils, upper-grade pupils, adult men, and adult women.

Next, reference models are created (step S1103 of Figure 56) using the initial standard models created in step S1102. Specifically, each reference model is trained starting from the initial standard model that has the same number of mixture components and was trained in the same noise environment as the speech data used for that reference model. For example, as shown in Figure 59, reference model 8AA is an 8-mixture model trained on the speech of lower-grade elementary school pupils under household noise; its initial values for training are taken from the initial standard model trained on speech data from the same environment, household noise (including the voices of lower-grade pupils, upper-grade pupils, adult women, and adult men). The Baum-Welch algorithm is used as the training method.
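As a rough illustration of training a reference model from initial values taken from an initial standard model, the sketch below runs EM on a one-dimensional Gaussian mixture. It is a deliberate simplification: the text trains HMMs with the Baum-Welch algorithm, whereas this sketch substitutes plain Gaussian-mixture EM, and the function names and the tuple layout `(weights, means, variances)` are assumptions made for the example.

```python
import math


def gaussian(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_step(data, weights, means, variances):
    """One EM iteration for a 1-D Gaussian mixture."""
    k = len(weights)
    # E-step: posterior probability of each component for each sample.
    resp = []
    for x in data:
        p = [weights[j] * gaussian(x, means[j], variances[j]) for j in range(k)]
        total = sum(p)
        resp.append([pj / total for pj in p])
    # M-step: re-estimate weight, mean, and variance of each component.
    new_w, new_m, new_v = [], [], []
    for j in range(k):
        nj = sum(r[j] for r in resp)
        mean = sum(r[j] * x for r, x in zip(resp, data)) / nj
        var = sum(r[j] * (x - mean) ** 2 for r, x in zip(resp, data)) / nj
        new_w.append(nj / len(data))
        new_m.append(mean)
        new_v.append(max(var, 1e-6))  # floor keeps the variance positive
    return new_w, new_m, new_v


def train_reference_model(data, initial_model, iterations=20):
    """Refine the initial model's statistics on subgroup data (assumed workflow)."""
    weights, means, variances = initial_model
    for _ in range(iterations):
        weights, means, variances = em_step(data, weights, means, variances)
    return weights, means, variances
```

In this sketch the initial standard model plays the role of the starting point only; the subgroup data (for example, one speaker group within one noise environment) then pulls the statistics toward that subgroup.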

Finally, classification IDs are assigned (step S1104 of Figure 56). For example, by assigning one classification ID to each noise environment, the classification ID-initial standard model-reference model correspondence table shown in Figure 61 can be created, that is, the "initial standard models with classification IDs" and the "reference models with classification IDs".
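One possible in-memory form of such a correspondence table is sketched below. The layout is an assumption: the sketch takes a classification ID to be the mixture count followed by the environment letter (as in 8A or 16B), and the function name `build_correspondence_table` and the dictionary keys are hypothetical.

```python
from itertools import product


def build_correspondence_table(mixture_counts, environments, speaker_groups):
    """Map each classification ID to its initial standard model and reference models."""
    table = {}
    for mixtures, env in product(mixture_counts, environments):
        class_id = f"{mixtures}{env}"  # e.g. "8A": 8 mixtures, environment A
        table[class_id] = {
            # The initial standard model carries the same label as the class ID.
            "initial_model": class_id,
            # One reference model per speaker group within this environment.
            "reference_models": [f"{class_id}{group}" for group in speaker_groups],
        }
    return table
```

For example, `build_correspondence_table([8, 16], ["A", "B"], ["A", "B", "C"])` yields four classification IDs, and the entry for `"8A"` lists the reference models `8AA`, `8AB`, and `8AC`.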

This classification ID-initial standard model-reference model correspondence table need not be held in the terminal (standard model creation device) in advance as a completed table. As shown in Figure 61, the terminal (standard model creation device) may instead complete the table by communicating with other devices (servers). That is, the standard model creation device (terminal) can obtain the "initial standard models with classification IDs" and the "reference models with classification IDs" through a communication network or the like. However, the terminal does not necessarily have to obtain them in this way; they may also be stored in the terminal before shipment.

As shown in Figure 61, the terminal can obtain the "initial standard models with classification IDs" and the "reference models with classification IDs" by the following methods. In the first method, the terminal already stores the "initial standard models with classification IDs" (for example, initial standard models that conform to a classification ID assignment scheme defined in advance by a standards body or the like). In this case, the terminal downloads the "reference models with classification IDs" (for example, reference models that conform to a classification ID assignment scheme defined in advance by a standards body or the like) from one or more servers. Alternatively, the "reference models with classification IDs" may be stored in the terminal at shipment.

In the second method, the terminal does not store the "initial standard models with classification IDs". In this case, the terminal downloads the "initial standard models with classification IDs" from a server (server 1 of Figure 61), and then downloads the "reference models with classification IDs" from one or more servers (server 2 of Figure 61). With this method, classification ID definitions can be added or changed as needed, and terminal memory can also be saved.

In the third method, the terminal stores a "classification ID-initial standard model-reference model correspondence table" that specifies the correspondence between classification IDs, initial standard models, and reference models. In this case, the terminal uploads the "correspondence table" to a server that does not store it (server 3 of Figure 61). The server prepares the "reference models with classification IDs" according to the received "correspondence table", and the terminal downloads the prepared "reference models with classification IDs".

Next, the method by which the initial standard model creation unit determines the initial standard model using this classification ID-initial standard model-reference model correspondence table is described. Figure 62 is a flowchart of the procedure, and Figures 63 and 64 show concrete examples of each step.

First, classification IDs are extracted from the reference models used to create the standard model (step S1105 of Figure 62). For example, the corresponding classification ID is extracted from each selected reference model according to the table shown in Figure 63. Here, suppose the extracted classification IDs are 8A once, 16A three times, 16B once, and 64B once.

Next, the extracted classification IDs are used to determine the initial standard model for creating the standard model (step S1106 of Figure 62). Specifically, the initial standard model is determined by the following steps.

(1) Consider the classification IDs (16A, 16B) extracted from the reference models whose classification ID (16*) has the same number of mixture components (16 mixtures) as the standard model to be created, and take the initial standard model corresponding to the most frequently extracted of these classification IDs as the final initial standard model. For example, when the standard model has a 16-mixture structure, 16A is extracted three times and 16B once among the 16-mixture classification IDs, so the initial standard model with classification ID 16A is adopted.

(2) Consider the classification ID (8A) extracted from the reference models whose classification ID (8*) has the same number of mixture components (8 mixtures) as the standard model to be created, and take the initial standard model with that classification ID as the final initial standard model. For example, when the standard model has an 8-mixture structure, 8A is extracted once as the only 8-mixture classification ID, so the initial standard model with classification ID 8A is adopted.

(3) Consider the classification IDs extracted from the reference models whose classification ID (32*) has the same number of mixture components (32 mixtures) as the standard model to be created. If none exists, attention turns to the specification information: the initial standard models (8A, 16A) with the most frequently extracted classification ID (*A) are used and, after being converted to 32 mixtures by clustering, serve as the final initial standard model (see Figure 44). For example, when the standard model has a 32-mixture structure, no 32-mixture classification ID has been extracted, so the most frequently extracted classification ID (16A) is used and, after conversion to 32 mixtures by clustering, the result serves as the initial standard model.
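Steps (1) to (3) can be sketched as a frequency count followed by a fallback. This is a minimal illustration, not the device's actual procedure: it assumes that a reference model label is its classification ID plus one trailing subclass letter, it simplifies step (3) to "take the most frequent classification ID overall and flag that clustering is needed", and the function name `choose_initial_model` is hypothetical.

```python
from collections import Counter


def choose_initial_model(reference_labels, target_mixtures):
    """Return (classification ID to start from, whether clustering is still needed)."""
    # Assumed format: dropping the last letter of a reference label gives its class ID.
    counts = Counter(label[:-1] for label in reference_labels)

    # Steps (1) and (2): class IDs whose mixture count matches the target.
    same_size = {cid: n for cid, n in counts.items()
                 if cid[:-1] == str(target_mixtures)}
    if same_size:
        return max(same_size, key=same_size.get), False

    # Step (3), simplified: no matching size, so fall back to the most frequent
    # class ID; the caller must then convert it to the target size by clustering.
    return max(counts, key=counts.get), True
```

With the counts used in the text (8A once, 16A three times, 16B once, 64B once), a 16-mixture target selects 16A directly, while a 32-mixture target selects 16A and signals that it must be converted to 32 mixtures.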

Alternatively, the initial values may be determined by focusing on the usage information (type of noise and the like) rather than on the specification information (number of mixture components and the like) of the standard model to be created.

Figure 64 shows the results of a recognition experiment using a 64-mixture standard model created with the third approximation unit. The vertical axis shows the recognition rate (%) for adults (men and women), and the horizontal axis shows the number of iterations R. R=0 is the result of recognition with the initial model created by the initial standard model creation unit, before any learning. For R=1 to 5, the neighborhood indication parameter G is set to 1.

The curve 'data' shows the result of training on speech data over several days, and the curves 'women' and 'men' show the results when the initial model is that of adult women and adult men, respectively. The learning time of the present invention, which is based on reference models, is on the order of several minutes. These results show that when the adult women's reference model is used as the initial standard model, a standard model more accurate than the one trained on speech data can be created.

This suggests that dividing the speech data, training each part rigorously as a separate reference model, and then integrating the reference models may overcome a problem of training directly on speech data, namely falling into a local solution (as judged by comparing recognition accuracy with training on speech data).

In addition, for children's speech, which is difficult to collect, reference models with a small number of mixture components suited to the amount of data can be trained rigorously, while for adult speech, which can be collected in large quantities, reference models with many mixture components can be trained rigorously. Integrating these reference models to create a standard model according to the present invention can therefore be expected to yield a highly accurate standard model.

In the recognition experiment with a 16-mixture standard model (Figure 39), the method of the present invention did not exceed the recognition rate of the standard model trained on speech data. This is thought to be because information in the speech data is lost when the speech data is converted into 16-mixture reference models. If the reference models are created with 64 mixtures so that they sufficiently retain the features of the speech data, a more accurate standard model can be created. For this reason, the number of mixture components of the reference models is set to a larger value, 300, in the ninth embodiment.

The recognition experiments shown in Figures 39 and 64 also show the effect of the initial standard model on recognition accuracy and underline the importance of how the initial standard model is determined (Figure 64 shows that using the adult women's reference model as the initial standard model yields a more accurate standard model than using the adult men's reference model).

As described above, by using the classification ID-initial standard model-reference model correspondence table to select an initial standard model that shares properties with the reference models, a highly accurate standard model can be created.

The determination of the initial standard model using this classification ID-initial standard model-reference model correspondence table can also be applied to any of Embodiments 1 to 10 described above.

In the embodiments described above, Formula 25 is used as the likelihood of the standard model with respect to the reference models when estimating the statistics of the standard model. However, the present invention is not limited to this likelihood function; for example, the likelihood function shown in Formula 66 below may also be used.

(formula 66)

\log L = \sum_{i=1}^{N} \int_{-\infty}^{\infty} \log \left[ \sum_{m=1}^{M} \omega_{(m)} f\left(x; \mu_{(m)}, \sigma_{(m)}^{2}\right) \right] \alpha(i) \left\{ \sum_{l=1}^{L_{i}} \upsilon_{(l)} g_{i}\left(x; \mu_{(l)}, \sigma_{(l)}^{2}\right) \right\} dx

Here, α(i) is a weight representing the importance of each reference model i being integrated. For example, in speaker adaptation for speech recognition, the importance is determined by how close the user's speech is to the speech from which the reference model being integrated was created. That is, for a reference model close to the user's speech (high importance), α(i) is set to a large value (large weight). The closeness between a reference model and the user's speech is preferably determined by the likelihood obtained when the user's speech is input to that model. Thus, when a plurality of reference models are integrated to create the standard model, reference models closer to the user's speech influence the statistics of the standard model with greater weight, so that a highly accurate standard model that better reflects the user's characteristics can be created.
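To make the role of α(i) concrete, the sketch below evaluates the weighted log-likelihood of Formula 66 numerically for one-dimensional Gaussian mixtures. Several things are assumptions of the sketch rather than part of the described device: the restriction to one dimension, the trapezoidal rule over a finite interval in place of the infinite integral, the `(weight, mean, variance)` tuple layout, and all function names.

```python
import math


def gaussian(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def mixture_pdf(x, components):
    """Mixture density; components is a list of (weight, mean, variance)."""
    return sum(w * gaussian(x, m, v) for w, m, v in components)


def weighted_log_likelihood(standard, references, alphas,
                            lo=-20.0, hi=20.0, steps=4000):
    """Approximate Formula 66 by the trapezoidal rule on [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * dx
        p_std = mixture_pdf(x, standard)
        if p_std <= 0.0:
            continue  # density underflowed; this point is skipped (approximation)
        # Sum over reference models i of alpha(i) times the reference density.
        ref_sum = sum(a * mixture_pdf(x, ref) for a, ref in zip(alphas, references))
        trap = 0.5 if k in (0, steps) else 1.0  # trapezoid end-point weights
        total += trap * math.log(p_std) * ref_sum * dx
    return total
```

In this sketch, raising α(i) for one reference model scales that model's contribution to the sum, so a standard model that fits the heavily weighted reference model scores higher than one that fits a lightly weighted model equally well.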

In each embodiment, the standard model structure determination unit determines the structure of the standard model according to various factors such as the usage information or the specification information, but the present invention is not limited to this. For example, in speech recognition, the structure of the standard model may be determined according to various attributes such as the age and sex of the person to be recognized, the speaker's voice quality, tone depending on emotion or health condition, speech rate, politeness of speech, dialect, type of background noise, level of background noise, signal-to-noise ratio between speech and background noise, microphone characteristics, and complexity of the recognition vocabulary.

Specifically, as shown in Figure 65 (a)-(j), the number of Gaussian distributions (number of mixtures) constituting the standard model may be determined as follows.

(a) The older the person whose speech is to be recognized, the larger the number of mixtures (Figure 65 (a)).

(b) When the person whose speech is to be recognized is male, the number of mixtures is set larger than for a female (Figure 65 (b)).

(c) The number of mixtures is increased as the voice quality of the person moves from 'normal' to 'husky' and then to 'hoarse' (Figure 65 (c)).

(d) The number of mixtures is increased as the emotional tone of the voice moves from 'normal' to 'angry voice' and then to 'crying or laughing voice' (Figure 65 (d)).

(e) The faster or the slower the speech rate of the person, the larger the number of mixtures (Figure 65 (e)).

(f) The number of mixtures is increased as the speaking style of the person moves from 'read-aloud style' to 'speech style' and then to 'conversational style' (Figure 65 (f)).

(g) The number of mixtures is increased as the dialect of the person moves from 'standard language' to 'Osaka dialect' and then to 'Kagoshima dialect' (Figure 65 (g)).

(h) The louder the background noise during speech recognition, the smaller the number of mixtures (Figure 65 (h)).

(i) The higher the performance of the microphone used for speech recognition, the larger the number of mixtures (Figure 65 (i)).

(j) The larger the vocabulary to be recognized, the larger the number of mixtures (Figure 65 (j)).

Most of these examples determine the number of mixtures from the viewpoint that the greater the variation in the speech to be recognized, the more mixtures are needed to ensure accuracy.
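The tendencies of Figure 65 could be turned into a simple rule of thumb such as the sketch below. Only the ordering of the levels is taken from the text; the candidate mixture counts, the scoring scheme, the attribute names, and the function name `choose_mixture_count` are all hypothetical choices made for illustration.

```python
ALLOWED_MIXTURES = [8, 16, 32, 64]  # assumed candidate sizes

# Ordinal levels, lowest mixture demand first, following the ordering in the text.
LEVELS = {
    "voice_quality": ["normal", "husky", "hoarse"],
    "emotion": ["normal", "angry", "crying_or_laughing"],
    "style": ["reading", "speech", "conversation"],
    "dialect": ["standard", "osaka", "kagoshima"],
}


def choose_mixture_count(attributes, noise_level=0):
    """Pick a mixture count that grows with speech variability (assumed scoring)."""
    # Each attribute contributes its level index: 0, 1, or 2.
    score = sum(LEVELS[name].index(value) for name, value in attributes.items())
    # Louder background noise lowers the count, as in Figure 65 (h).
    score -= noise_level
    # Clamp the score to a valid index into the candidate sizes.
    index = max(0, min(len(ALLOWED_MIXTURES) - 1, score))
    return ALLOWED_MIXTURES[index]
```

For instance, under this scoring a standard-dialect speaker with no other listed attributes maps to the smallest size, while a Kagoshima-dialect speaker with an angry voice maps to the largest; a real device would presumably calibrate such thresholds experimentally.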

Industrial Applicability

The standard model creation device of the present invention can be used in devices that recognize objects such as speech, characters, and images using a probability model. For example, it can be used in television receivers that perform various kinds of processing by voice, car navigation devices, translation devices that translate speech into another language, game devices operated by voice, search devices that retrieve information using voice-based search keywords, authentication devices that perform person detection, fingerprint authentication, face authentication, iris authentication, and the like, and information processing devices that perform predictions such as stock price prediction and weather forecasting.

Claims

1. A standard model creation device that uses probability models, each expressing by output probabilities frequency parameters that represent speech features, to create a standard model for speech recognition representing the features of speech having a specific attribute, characterized in that:

the device comprises a reference model storage unit that stores one or more reference models, each being a probability model representing the features of speech having a certain attribute; and

a standard model creation unit that creates the standard model by calculating statistics of the standard model using the statistics of the one or more reference models stored in the reference model storage unit;

the standard model creation unit includes:

a standard model structure determination unit that determines the structure of the standard model to be created;

an initial standard model creation unit that determines initial values of the statistics specifying the standard model whose structure has been determined; and

a statistics estimation unit that estimates and calculates the statistics of the standard model so as to maximize or locally maximize the probability or likelihood, with respect to the reference models, of the standard model for which the initial values have been determined.

2. The standard model creation device according to claim 1, characterized in that:

the standard model creation device further comprises a reference model selection unit that selects one or more reference models from the reference models stored in the reference model storage unit according to usage information, which is information on the attribute of the speech to be recognized;

the standard model creation unit creates the standard model using the statistics of the reference models selected by the reference model selection unit.

3. The standard model creation device according to claim 2, characterized in that:

the standard model creation device further comprises a usage information creation unit that creates the usage information;

the reference model selection unit selects one or more reference models from the reference models stored in the reference model storage unit according to the created usage information.

4. The standard model creation device according to claim 2, characterized in that:

a terminal device is connected to the standard model creation device through a communication path,

the standard model creation device further comprises a usage information receiving unit that receives the usage information from the terminal device;

the reference model selection unit selects one or more reference models from the reference models stored in the reference model storage unit according to the received usage information.

5. The standard model creation device according to claim 1, characterized in that:

the standard model structure determination unit determines the structure of the standard model according to at least one of specification information, which is information on the specifications of the standard model to be created, and usage information, which is information on the attribute of the speech to be recognized.

6. The standard model creation device according to claim 5, characterized in that:

the specification information indicates at least one of the type of application program that uses the standard model and the specifications of the device that uses the standard model.

7. The standard model creation device according to claim 5, characterized in that:

the attribute includes information on at least one of age, sex, the speaker's voice quality, tone depending on emotion or health condition, speech rate, politeness of speech, dialect, type of background noise, level of background noise, signal-to-noise ratio between speech and background noise, microphone characteristics, and complexity of the recognition vocabulary.

8. The standard model creation device according to claim 5, characterized in that:

the standard model creation device further comprises a specification information holding unit that holds, as the specification information, an application-specification correspondence database indicating the correspondence between application programs that use the standard model and specifications of the standard model;

the standard model structure determination unit reads the specifications corresponding to the application program being started from the application-specification correspondence database held in the specification information holding unit, and determines the structure of the standard model according to the read specifications.

9. The standard model creation device according to claim 5, characterized in that:

the standard model creation device further comprises a specification information creation unit that creates the specification information,

the standard model structure determination unit determines the structure of the standard model according to the created specification information.

10. The standard model creation device according to claim 5, characterized in that:

the standard model creation device further comprises a specification information receiving unit that receives the specification information from the terminal device,

the standard model structure determination unit determines the structure of the standard model according to the received specification information.

11. The standard model creation device according to claim 5, characterized in that:

the reference models and the standard model are each expressed using one or more Gaussian distributions;

the standard model structure determination unit determines at least the number of Gaussian mixtures as the structure of the standard model.

12. The standard model creation device according to claim 1, characterized in that:

a terminal device is connected to the standard model creation device through a communication path;

the standard model creation device further comprises a standard model transmitting unit that transmits the standard model created by the standard model creation unit to the terminal device.

13. The standard model creation device according to claim 1, characterized in that:

the reference model storage unit stores at least one pair of reference models that differ in the number of Gaussian mixtures;

the statistics estimation unit calculates the statistics of the standard model so as to maximize or locally maximize the probability or likelihood of the standard model with respect to the pair of reference models.

14. The standard model creation device according to claim 1, characterized in that:

the standard model creation unit further comprises a reference model preparation unit that performs at least one of obtaining a reference model from outside and storing it in the reference model storage unit, and creating a new reference model and storing it in the reference model storage unit.

15. The standard model creation device according to claim 14, characterized in that:

the reference model preparation unit further performs at least one of updating and adding to the reference models stored in the reference model storage unit.

16. The standard model creation device according to claim 15, characterized in that:

the reference model preparation unit performs at least one of updating and adding to the reference models stored in the reference model storage unit according to at least one of usage information, which is information on the recognition target, and specification information, which is information on the specifications of the standard model to be created.

17. The standard model creation device according to claim 15, characterized in that:

the standard model creation device further comprises a similarity information creation unit that creates, from the reference models stored in the reference model storage unit and at least one of specification information, which is information on the specifications of the standard model to be created, and usage information, which is information on the attribute of the speech to be recognized, similarity information indicating the degree of similarity between the reference models and the at least one of the usage information and the specification information;

the reference model preparation unit determines, according to the similarity information created by the similarity information creation unit, whether to perform at least one of updating and adding to the reference models stored in the reference model storage unit.

18. The standard model creation device according to claim 1, characterized in that:

the initial standard model creation unit determines the initial values of the statistics specifying the standard model using the one or more reference models that the statistics estimation unit uses when calculating the statistics of the standard model.

19. The standard model creation device according to claim 1, characterized in that:

the initial standard model creation unit determines the initial values according to a classification ID that identifies the type of the standard model.

20. The standard model creation device according to claim 19, characterized in that:

the initial standard model creation unit specifies the classification ID according to the reference models, and determines, as the initial values, the initial values associated with the specified classification ID.

21. The standard model creation device according to claim 20, characterized in that:

the initial standard model creation unit holds a correspondence table indicating the correspondence among the classification IDs, the initial values, and the reference models, and determines the initial values according to the correspondence table.

22. The standard model creation device according to claim 21, characterized in that:

the initial standard model creation unit generates the correspondence table by creating, or obtaining from outside, initial standard models with classification IDs, which are initial values associated with the classification IDs, or reference models with classification IDs, which are reference models associated with the classification IDs.

23. The standard model creation device according to claim 1, characterized in that:

the reference model storage unit stores a plurality of reference models;

the statistics estimation unit calculates the statistics so as to maximize or locally maximize the probability or likelihood obtained by weighting the plurality of reference models stored in the reference model storage unit.

24. A standard model creation method that uses probability models, each expressing by output probabilities frequency parameters that represent speech features, to create a standard model for speech recognition representing the features of speech having a specific attribute, characterized in that:

the method comprises: a reference model reading step of reading one or more reference models from a reference model storage unit that stores one or more reference models, each being a probability model representing the features of speech having a certain attribute; and

a standard model creation step of creating the standard model by calculating statistics of the standard model using the statistics of the read reference models;

the standard model creation step includes:

a standard model structure determination substep of determining the structure of the standard model to be created;

an initial standard model creation substep of determining initial values of the statistics specifying the standard model whose structure has been determined; and

a statistics estimation substep of estimating and calculating the statistics of the standard model so as to maximize or locally maximize the probability or likelihood, with respect to the reference models, of the standard model for which the initial values have been determined.

25. A program for a device that uses probability models, each expressing by output probabilities frequency parameters that represent speech features, to create a standard model for speech recognition representing the features of speech having a specific attribute, characterized in that:

the standard model creation step includes: