WO2010109725A1 - Speech processing device, speech processing method, and speech processing program - Google Patents

Speech processing device, speech processing method, and speech processing program

Info

Publication number
WO2010109725A1
Authority
WO
WIPO (PCT)
Prior art keywords
distribution
feature
speech
acoustic model
noise
Prior art date
Application number
PCT/JP2009/069580
Other languages
English (en)
Japanese (ja)
Inventor
雄介 篠原
政巳 赤嶺
Original Assignee
株式会社東芝
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社東芝
Publication of WO2010109725A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/20: Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech

Definitions

  • The present invention relates to a speech processing device, a speech processing method, and a speech processing program.
  • Feature enhancement is a technique for estimating clean speech features, i.e., the features of speech free of noise, from noisy speech features extracted from speech on which noise has been superimposed in a noisy environment (hereinafter, noisy speech). By using the clean speech features estimated by feature enhancement, speech recognition performance under noise can be improved.
  • Non-Patent Document 1 discloses a conventional speech recognition apparatus.
  • A conventional speech recognition apparatus includes a feature extraction unit, a first acoustic model storage unit, a probability calculation unit, a distribution storage unit, a mixture distribution generation unit, a feature enhancement unit, a second acoustic model storage unit, and a decoding unit.
  • The feature extraction unit extracts a noisy speech feature from each frame of the input noisy speech.
  • The first acoustic model storage unit stores a first acoustic model representing standard patterns of phonemes in a noisy environment.
  • The probability calculation unit collates the sequence of noisy speech features with the first acoustic model and calculates, for each frame, the probability of staying in each distribution of the first acoustic model (the distribution posterior probability).
  • The distribution storage unit stores a set of basis distributions, each of which is a joint Gaussian distribution of clean speech features and noisy speech features.
  • The mixture distribution generation unit generates, for each frame, a mixture distribution by mixing the basis distributions with the distribution posterior probabilities as weights. This mixture distribution represents the joint distribution of clean speech features and noisy speech features in the frame.
  • The feature enhancement unit estimates, in each frame, the clean speech feature from the noisy speech feature using the mixture distribution.
  • The second acoustic model storage unit stores a second acoustic model representing standard patterns of phonemes in a clean environment.
  • The decoding unit collates the sequence of clean speech features estimated by the feature enhancement unit with the second acoustic model and outputs an optimal word string.
  • However, since the speech recognition apparatus of Non-Patent Document 1 performs feature enhancement using joint Gaussian distributions learned in advance, its speech recognition performance deteriorates under noise that differs from the noise present during learning.
  • In principle, a joint Gaussian distribution of clean speech features and noisy speech features can be dynamically synthesized from a Gaussian distribution of clean speech features each time the noise changes.
  • However, because an ordinary acoustic model has several thousand to several tens of thousands of Gaussian distributions, an enormous amount of computation would be required to dynamically synthesize the joint Gaussian distributions, which is not practical.
  • The present invention has been made in view of the above, and its purpose is to provide a speech processing device, a speech processing method, and a speech processing program capable of achieving high speech recognition performance with a small amount of computation even in an environment where the noise changes.
  • To achieve the above object, the present invention provides a speech processing device comprising: a feature extraction unit that extracts a first speech feature from each frame of first speech on which noise is superimposed in a noisy environment and calculates a sequence of the first speech features; a noise estimation unit that estimates the noise superimposed on the first speech; a first distribution storage unit that stores a set of first basis distributions, each representing a distribution of a second speech feature of second speech in an environment free from noise; a distribution synthesis unit that synthesizes, from each of the first basis distributions and based on the noise, a second basis distribution representing a joint distribution of the first speech feature and the second speech feature; a first acoustic model storage unit that stores a first acoustic model representing standard patterns of phonemes in a noisy environment; a probability calculation unit that collates the sequence of the first speech features with the first acoustic model and calculates, for each frame, a state posterior probability, which is the probability of staying in each state of the first acoustic model; a mixture weight storage unit that stores, for each state of the first acoustic model, mixture weights corresponding to each of the second basis distributions; a mixture weight fusion unit that fuses the mixture weights using the state posterior probabilities and calculates fused mixture weights; a mixture distribution generation unit that, for each frame, mixes the second basis distributions with the fused mixture weights to generate a mixture distribution, which is a joint distribution of the first speech feature and the second speech feature in the frame; and a feature enhancement unit that, for each frame, estimates the second speech feature from the first speech feature using the mixture distribution.
  • FIG. 1 is a block diagram showing the configuration of the speech processing apparatus according to the present embodiment.
  • The speech recognition apparatus includes a feature extraction unit 1, a noise estimation unit 2, a first distribution storage unit 3, a first distribution storage control unit 4, a second distribution storage unit 5, a distribution synthesis unit 6, a first acoustic model storage unit 7, a first acoustic model storage control unit 8, a probability calculation unit 9, a mixture weight storage unit 10, a mixture weight storage control unit 11, a mixture weight fusion unit 12, a mixture distribution generation unit 13, a feature enhancement unit 14, a second acoustic model storage unit 15, a second acoustic model storage control unit 16, and a decoding unit 17.
  • The feature extraction unit 1 extracts a feature from each frame of the input noisy speech and calculates a sequence of noisy speech features.
  • A frame is obtained by cutting out a portion of the input speech signal; frames are cut out sequentially while the cut-out section is gradually shifted.
  • For example, a vector whose elements are mel-frequency cepstrum coefficients (MFCCs) can be used as the feature. Let d be the dimension of the feature.
  • The sequence of noisy speech features is calculated by extracting the feature from each of the sequentially cut-out frames.
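  • As an illustrative, non-limiting sketch of this step, the following fragment computes a framewise MFCC sequence. It assumes the librosa library; the file name, 16 kHz sampling rate, 25 ms window, 10 ms shift, and d = 13 are illustrative choices, not values prescribed by this document.

```python
import librosa

# Sketch of feature extraction (step S1), assuming librosa.
signal, sr = librosa.load("noisy_speech.wav", sr=16000)
features = librosa.feature.mfcc(
    y=signal, sr=sr,
    n_mfcc=13,                    # d = 13 MFCCs per frame (illustrative)
    n_fft=int(0.025 * sr),        # 25 ms analysis window
    hop_length=int(0.010 * sr),   # 10 ms frame shift
)
mfcc_sequence = features.T        # shape: (num_frames, d)
```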
  • The noise estimation unit 2 estimates the noise superimposed on the input noisy speech. For example, a voice activity detector can be used to select a section that contains no speech and consists only of noise, and noise estimation can be performed on this section. More specifically, the above feature is extracted from each frame of the noise-only section, and the mean and covariance are computed from the resulting set of features. This mean and covariance define a Gaussian distribution of noise features.
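  • A minimal sketch of this estimation, assuming the noise-only frames have already been selected by a voice activity detector and their features stacked row-wise in `noise_frames`:

```python
import numpy as np

# Sketch of noise estimation (step S2): fit a Gaussian of noise features
# to the frames of a section judged to contain only noise.
def estimate_noise_gaussian(noise_frames: np.ndarray):
    mu_n = noise_frames.mean(axis=0)              # mean noise feature (d,)
    sigma_n = np.cov(noise_frames, rowvar=False)  # covariance (d, d)
    return mu_n, sigma_n
```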
  • The first distribution storage unit 3 stores a set of first basis distributions.
  • For example, a d-dimensional Gaussian distribution is used as each first basis distribution.
  • Each first basis distribution represents a distribution of clean speech features. A method of calculating the set of first basis distributions will be described in detail later.
  • The first distribution storage control unit 4 performs control such that the first distribution storage unit 3 stores the set of first basis distributions.
  • The second distribution storage unit 5 stores a set of second basis distributions.
  • For example, a 2d-dimensional Gaussian distribution is used as each second basis distribution.
  • Each second basis distribution represents a joint Gaussian distribution of clean speech features and noisy speech features.
  • The distribution synthesis unit 6 synthesizes a second basis distribution from each of the first basis distributions stored in the first distribution storage unit 3, based on the noise estimated by the noise estimation unit 2, and stores it in the second distribution storage unit 5. That is, a joint Gaussian distribution of clean speech features and noisy speech features is synthesized from the Gaussian distribution of noise features and a Gaussian distribution of clean speech features.
  • For this synthesis, the vector Taylor series (VTS) method or the unscented transform, for example, can be used.
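  • The document does not fix a particular formulation for this synthesis. As one hedged sketch, a first-order vector Taylor series in the log-spectral domain uses the mismatch function y = x + log(1 + exp(n - x)) and linearizes it around the means; the diagonal covariances and variable names below are illustrative assumptions.

```python
import numpy as np

def vts_joint_gaussian(mu_x, var_x, mu_n, var_n):
    """Sketch of first-order VTS: synthesize a joint Gaussian of clean (x)
    and noisy (y) features from a Gaussian of clean features and a Gaussian
    of noise, assuming diagonal covariances (an illustrative simplification)."""
    g = np.log1p(np.exp(mu_n - mu_x))        # mismatch at the expansion point
    a = 1.0 / (1.0 + np.exp(mu_n - mu_x))    # dy/dx evaluated at the means
    b = 1.0 - a                              # dy/dn evaluated at the means
    mu_y = mu_x + g
    var_y = a**2 * var_x + b**2 * var_n      # linearized variance of y
    cov_xy = a * var_x                       # cross-covariance of x and y
    mu_joint = np.concatenate([mu_x, mu_y])  # 2d-dimensional mean
    cov_joint = np.block([
        [np.diag(var_x),  np.diag(cov_xy)],
        [np.diag(cov_xy), np.diag(var_y)],
    ])
    return mu_joint, cov_joint
```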
  • The first acoustic model storage unit 7 stores a first acoustic model representing standard patterns of phonemes in a noisy environment. More specifically, the acoustic model is a hidden Markov model, and the output distribution in each state and the transition probabilities between states are stored. The first acoustic model is created in advance from learning data consisting of a set of noisy speech features.
  • The first acoustic model storage control unit 8 performs control such that the first acoustic model storage unit 7 stores the first acoustic model.
  • The probability calculation unit 9 collates the sequence of noisy speech features calculated by the feature extraction unit 1 with the first acoustic model stored in the first acoustic model storage unit 7, and calculates, for each frame, the probability of staying in each state of the first acoustic model (the state posterior probability).
  • The state posterior probability can be calculated using the forward-backward algorithm.
  • Alternatively, the state posterior probability can be calculated from an N-best candidate list. A method for calculating the state posterior probability using an N-best candidate list is described in detail in, for example, Non-Patent Document 1.
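  • A compact sketch of the forward-backward computation of these state posteriors follows; it assumes the framewise log output probabilities `log_b` have already been evaluated against the first acoustic model, and it works in the log domain for numerical stability.

```python
import numpy as np
from scipy.special import logsumexp

def state_posteriors(log_b, log_A, log_pi):
    """Sketch of the forward-backward algorithm (step S4).
    log_b: (T, L) log output probabilities per frame and state (assumed given),
    log_A: (L, L) log transition probabilities, log_pi: (L,) log initial probs.
    Returns gamma (T, L): probability of staying in each state in each frame."""
    T, L = log_b.shape
    log_alpha = np.empty((T, L))
    log_beta = np.zeros((T, L))
    log_alpha[0] = log_pi + log_b[0]
    for t in range(1, T):                              # forward pass
        log_alpha[t] = log_b[t] + logsumexp(
            log_alpha[t - 1][:, None] + log_A, axis=0)
    for t in range(T - 2, -1, -1):                     # backward pass
        log_beta[t] = logsumexp(
            log_A + (log_b[t + 1] + log_beta[t + 1])[None, :], axis=1)
    log_gamma = log_alpha + log_beta
    log_gamma -= logsumexp(log_gamma, axis=1, keepdims=True)
    return np.exp(log_gamma)
```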
  • The mixture weight storage unit 10 stores, for each state of the first acoustic model, the mixture weight corresponding to each of the second basis distributions.
  • When L is the number of states and K is the number of basis distributions, L × K values are stored.
  • The mixture distribution generated by mixing the K basis distributions stored in the first distribution storage unit 3 with the mixture weights corresponding to a state represents the distribution of clean speech features in that state.
  • Likewise, the mixture distribution generated by mixing the K basis distributions stored in the second distribution storage unit 5 with the mixture weights corresponding to a state represents the joint distribution of clean speech features and noisy speech features in that state. The method for calculating the mixture weights will be described in detail later.
  • The mixture weight storage control unit 11 controls the mixture weight storage unit 10 so as to store the mixture weights.
  • The mixture weight fusion unit 12 fuses the mixture weights stored in the mixture weight storage unit 10 using the state posterior probabilities calculated by the probability calculation unit 9, and calculates fused mixture weights. Specifically, the fusion is performed according to Equation (1):

    v(t, k) = Σ_j γ(t, j) w(j, k)   ... (1)

  • Here, γ(t, j) is the state posterior probability of staying in state j in frame t, w(j, k) is the mixture weight of the k-th basis distribution in state j, Σ_j denotes summation over j, and v(t, k) is the fused mixture weight of the k-th basis distribution in frame t.
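  • In matrix form, Equation (1) is a single matrix product. A minimal sketch, assuming `gamma` holds the T × L state posteriors and `W` the L × K mixture weights:

```python
import numpy as np

# Sketch of Equation (1): v(t, k) = sum_j gamma(t, j) * w(j, k).
def fuse_mixture_weights(gamma: np.ndarray, W: np.ndarray) -> np.ndarray:
    return gamma @ W   # (T, K): fused mixture weights, one row per frame
```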
  • For each frame, the mixture distribution generation unit 13 mixes the second basis distributions acquired from the second distribution storage unit 5 with the fused mixture weights calculated by the mixture weight fusion unit 12 to generate a mixture distribution.
  • For example, the mixture distribution is a Gaussian mixture distribution.
  • The generated mixture distribution represents the joint distribution of clean speech features and noisy speech features in the frame.
  • For each frame, the feature enhancement unit 14 estimates the clean speech feature from the noisy speech feature using this mixture distribution. Non-Patent Document 1, for example, discloses details of a feature enhancement method that uses a mixture distribution representing the joint distribution of clean speech features and noisy speech features.
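  • The exact estimator of Non-Patent Document 1 is not reproduced here. As one standard sketch, a minimum mean-square-error (MMSE) estimate under the joint-Gaussian mixture conditions each component on the observed noisy feature; the block layout of each component (`mu_x`, `mu_y`, `S_xy`, `S_yy`) is an assumed convention.

```python
import numpy as np
from scipy.stats import multivariate_normal

def enhance_feature(y, v_t, components):
    """Sketch of MMSE feature enhancement for one frame (step S7).
    y: observed noisy feature (d,); v_t: fused mixture weights (K,);
    components: K dicts holding the blocks of each joint Gaussian."""
    posts, cond_means = [], []
    for v_k, c in zip(v_t, components):
        # Component posterior p(k | y) is proportional to v_k * N(y; mu_y, S_yy).
        posts.append(v_k * multivariate_normal.pdf(y, c["mu_y"], c["S_yy"]))
        # Conditional mean E[x | y, k] of a joint Gaussian component.
        gain = c["S_xy"] @ np.linalg.inv(c["S_yy"])
        cond_means.append(c["mu_x"] + gain @ (y - c["mu_y"]))
    posts = np.asarray(posts) / np.sum(posts)
    return sum(p * m for p, m in zip(posts, cond_means))  # estimated clean x
```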
  • The second acoustic model storage unit 15 stores a second acoustic model representing standard patterns of phonemes in a clean environment. More specifically, the acoustic model is a hidden Markov model, and the output distribution in each state and the transition probabilities between states are stored.
  • The second acoustic model is created in advance using learning data consisting of a set of clean speech features. Preferably, it is created using learning data consisting of a set of clean speech features processed by the feature enhancement unit 14; that is, an acoustic model is created using, as learning data, the set of clean speech features obtained by processing, with the feature enhancement unit 14, the set of noisy speech features used for learning the first acoustic model.
  • The second acoustic model storage control unit 16 controls the second acoustic model storage unit 15 so as to store the second acoustic model.
  • The decoding unit 17 collates the sequence of clean speech features estimated by the feature enhancement unit 14 with the second acoustic model stored in the second acoustic model storage unit 15 and outputs an optimal word string.
  • For example, the Viterbi algorithm is used for this collation.
  • The set of first basis distributions and the mixture weights are calculated using the EM algorithm so as to maximize the likelihood of learning data consisting of a given set of clean speech features.
  • A method of creating the learning data consisting of a set of clean speech features is described first, and the likelihood maximization method using the EM algorithm is described after it.
  • First, each clean speech feature is associated with one of the states of the first acoustic model.
  • For example, by using the Viterbi algorithm, each feature in a sequence can be associated with a state of the acoustic model (a sketch of such an alignment follows this item); alternatively, a soft association may be performed using the forward-backward algorithm.
  • In this way, learning data consisting of a set of clean speech features, each associated with a state of the acoustic model, can be created.
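  • A sketch of such a Viterbi alignment is given below; as before, the framewise log output probabilities `log_b` against the first acoustic model are assumed to be given.

```python
import numpy as np

def viterbi_align(log_b, log_A, log_pi):
    """Sketch of Viterbi alignment: associate each frame with one HMM state.
    log_b: (T, L) log output probabilities (assumed given), log_A: (L, L)
    log transition probabilities, log_pi: (L,) log initial probabilities."""
    T, L = log_b.shape
    delta = np.empty((T, L))
    psi = np.zeros((T, L), dtype=int)
    delta[0] = log_pi + log_b[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A  # score of each (from, to) pair
        psi[t] = scores.argmax(axis=0)          # best predecessor per state
        delta[t] = scores.max(axis=0) + log_b[t]
    states = np.empty(T, dtype=int)
    states[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):              # backtrack the best path
        states[t] = psi[t + 1, states[t + 1]]
    return states
```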
  • Let D be the set of learning data, and let Dj be the subset of the learning data associated with the j-th state.
  • Let x_i be the i-th learning datum (clean speech feature).
  • Let θ_k denote the mean and covariance parameters of the k-th Gaussian distribution, and let θ = {θ_1, ..., θ_K}.
  • Let w_jk be the mixture weight of the k-th basis distribution in state j; collecting over all k gives w_j = {w_j1, ..., w_jK}, and collecting over all j gives w = {w_1, ..., w_L}.
  • Here, K is the number of basis distributions and L is the number of states.
  • The log-likelihood L(θ, w) of the learning data is defined as in Equation (2):

    L(θ, w) = Σ_{j=1}^{L} Σ_{x_i ∈ Dj} log Σ_{k=1}^{K} w_jk N(x_i; θ_k)   ... (2)

  • θ and w are calculated using the EM algorithm so as to maximize this likelihood.
  • In the E-step, the posterior probability that each learning sample belongs to each distribution is calculated based on the current values of θ and w.
  • In the M-step, θ and w are updated so as to maximize the expected value of the complete-data log-likelihood under these posterior probabilities.
  • Initial values of θ and w are required. For example, a Gaussian mixture distribution with K components can be learned on the entire learning data D, and the obtained set of Gaussian distributions and mixture weights (u) can be used as the initial values.
  • The maximum-likelihood learning method using the EM algorithm is described in detail in, for example, L. Rabiner and B.-H. Juang (authors), Sadaoki Furui (translator), Fundamentals of Speech Recognition, NTT Advanced Technology, 1995.
  • The set of first basis distributions and the mixture weights calculated as described above are stored in the first distribution storage unit 3 and the mixture weight storage unit 10, respectively.
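  • As an illustrative, non-limiting sketch of the training described above, the following fragment runs EM for Equation (2): K Gaussian basis distributions shared across states, with state-dependent mixture weights. For brevity it initializes randomly rather than from a GMM trained on all of D, and omits the numerical safeguards a real implementation would need.

```python
import numpy as np
from scipy.stats import multivariate_normal

def train_basis_distributions(data, state_ids, K, L, n_iter=20, seed=0):
    """EM sketch for Equation (2). data: (N, d) clean speech features;
    state_ids: (N,) state associated with each feature (from the alignment)."""
    rng = np.random.default_rng(seed)
    N, d = data.shape
    mu = data[rng.choice(N, size=K, replace=False)]   # initial means
    cov = np.array([np.cov(data, rowvar=False)] * K)  # initial covariances
    w = np.full((L, K), 1.0 / K)                      # per-state weights

    for _ in range(n_iter):
        # E-step: posterior probability that each sample belongs to each
        # basis distribution, given the current theta and w.
        lik = np.stack([multivariate_normal.pdf(data, mu[k], cov[k])
                        for k in range(K)], axis=1)   # (N, K)
        resp = w[state_ids] * lik
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate the shared Gaussians and the per-state weights.
        nk = resp.sum(axis=0)
        mu = (resp.T @ data) / nk[:, None]
        for k in range(K):
            diff = data - mu[k]
            cov[k] = (resp[:, k, None] * diff).T @ diff / nk[k]
        for j in range(L):
            in_j = state_ids == j
            if in_j.any():
                w[j] = resp[in_j].sum(axis=0) / in_j.sum()
    return mu, cov, w
```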
  • FIG. 2 is a flowchart showing the operation of the speech processing apparatus according to this embodiment.
  • First, the feature extraction unit 1 extracts a feature from each frame of the input noisy speech and calculates a sequence of noisy speech features (step S1).
  • Next, the noise estimation unit 2 performs noise estimation from the sequence of noisy speech features calculated by the feature extraction unit 1 (step S2).
  • Next, the distribution synthesis unit 6 synthesizes a second basis distribution from each of the first basis distributions using the noise estimated by the noise estimation unit 2, and stores it in the second distribution storage unit 5 (step S3).
  • Next, steps S4 and S5 are executed. That is, the probability calculation unit 9 collates the sequence of noisy speech features calculated by the feature extraction unit 1 with the first acoustic model, and calculates the probability of staying in each state of the first acoustic model in each frame (the state posterior probability) (step S4).
  • Then, for each frame, the mixture weight fusion unit 12 fuses the mixture weights acquired from the mixture weight storage unit 10 using the state posterior probabilities calculated by the probability calculation unit 9, and calculates the fused mixture weights (step S5).
  • In step S6, the mixture distribution generation unit 13 mixes, for each frame, the second basis distributions stored in the second distribution storage unit 5 with the fused mixture weights to generate a mixture distribution.
  • Next, the feature enhancement unit 14 estimates, for each frame, the clean speech feature from the noisy speech feature using the mixture distribution generated by the mixture distribution generation unit 13 (step S7).
  • Finally, the decoding unit 17 collates the sequence of clean speech features calculated by the feature enhancement unit 14 with the second acoustic model stored in the second acoustic model storage unit 15, determines an optimal word string, and the speech recognition (speech processing) ends (step S8). In this way, the correct speech content is recognized from the noisy speech.
  • In the speech processing apparatus of this embodiment, the joint distribution of clean speech features and noisy speech features is synthesized from only a small number of basis distributions, instead of the large number of distributions used in the prior art. Therefore, the amount of computation required for the synthesis can be greatly reduced, and high speech recognition performance can be maintained with a small amount of computation even in an environment where the noise changes.
  • The speech processing apparatus of this embodiment has a hardware configuration using an ordinary computer, including a control device such as a CPU, a storage device, an external storage device such as a hard disk drive, a display device such as a display, and an input device such as a keyboard and a mouse.
  • The speech processing program executed by the speech processing apparatus of this embodiment is recorded, as a file in an installable or executable format, on a computer-readable recording medium such as a CD-ROM, a flexible disk (FD), a CD-R, or a DVD (Digital Versatile Disk), and is provided as a computer program product.
  • The speech processing program executed by the speech processing apparatus of this embodiment may also be stored on a computer connected to a network such as the Internet and provided by being downloaded via the network.
  • Alternatively, the speech processing program executed by the speech processing apparatus of this embodiment may be provided or distributed via a network such as the Internet.
  • The speech processing program of this embodiment may also be provided by being incorporated in advance in a ROM or the like.
  • The speech processing program executed by the speech processing apparatus has a module configuration including the above-described units (the feature extraction unit, noise estimation unit, first distribution storage control unit, distribution synthesis unit, first acoustic model storage control unit, probability calculation unit, mixture weight storage control unit, mixture weight fusion unit, mixture distribution generation unit, feature enhancement unit, second acoustic model storage control unit, and decoding unit).
  • As actual hardware, the CPU reads the speech processing program from the storage medium and executes it, whereupon the above units are loaded onto the main storage device; the feature extraction unit, noise estimation unit, first distribution storage control unit, distribution synthesis unit, first acoustic model storage control unit, probability calculation unit, mixture weight storage control unit, mixture weight fusion unit, mixture distribution generation unit, feature enhancement unit, second acoustic model storage control unit, and decoding unit are thereby generated on the main storage device.
  • The present invention is not limited to the above-described embodiment as it is; in the implementation stage, the constituent elements can be modified without departing from the scope of the invention.
  • Various inventions can be formed by appropriately combining the constituent elements disclosed in the above embodiment. For example, some constituent elements may be deleted from all the constituent elements shown in the embodiment, and constituent elements of different embodiments may be combined as appropriate.
  • As described above, the speech processing apparatus, speech processing method, and speech processing program according to the present invention are useful when speech recognition is performed under noise.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a speech processing apparatus comprising: a feature extraction unit (1) that extracts first speech features; a noise estimation unit (2) that estimates noise; a first distribution storage unit (3) that stores a set of first basis distributions; a distribution synthesis unit (6) that synthesizes a second basis distribution from each first basis distribution on the basis of the estimated noise; a first acoustic model storage unit (7) that stores a first acoustic model; a probability calculation unit (9) that calculates a state posterior probability by collating a sequence of the first speech features with the first acoustic model; a mixture weight storage unit (10) that stores mixture weights corresponding to the respective second basis distributions; a mixture weight fusion unit (12) that fuses the mixture weights using the state posterior probability to calculate a fused mixture weight; a mixture distribution generation unit (13) that mixes the second basis distributions with the fused mixture weight to generate a mixture distribution; and a feature enhancement unit (14) that estimates second speech features from the first speech features using the mixture distribution.
PCT/JP2009/069580 2009-03-26 2009-11-18 Speech processing device, speech processing method, and speech processing program WO2010109725A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2009-077325 2009-03-26
JP2009077325A JP2010230913A (ja) 2009-03-26 2009-03-26 Speech processing device, speech processing method, and speech processing program

Publications (1)

Publication Number Publication Date
WO2010109725A1 (fr) 2010-09-30

Family

ID=42780427

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2009/069580 WO2010109725A1 (fr) 2009-11-18 Speech processing device, speech processing method, and speech processing program

Country Status (2)

Country Link
JP (1) JP2010230913A (fr)
WO (1) WO2010109725A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105788600B (zh) * 2014-12-26 2019-07-26 联想(北京)有限公司 Voiceprint recognition method and electronic device

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004004509A (ja) * 2001-12-20 2004-01-08 Matsushita Electric Ind Co Ltd Method for creating an acoustic model, apparatus for creating an acoustic model, and computer program for creating an acoustic model
JP2007279349A (ja) * 2006-04-06 2007-10-25 Toshiba Corp Feature correction apparatus, feature correction method, and feature correction program
JP2007279444A (ja) * 2006-04-07 2007-10-25 Toshiba Corp Feature correction apparatus, feature correction method, and feature correction program

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106384587A (zh) * 2015-07-24 2017-02-08 科大讯飞股份有限公司 Speech recognition method and system
CN106384587B (zh) * 2015-07-24 2019-11-15 科大讯飞股份有限公司 Speech recognition method and system
CN108511002A (zh) * 2018-01-23 2018-09-07 努比亚技术有限公司 Dangerous event sound signal recognition method, terminal, and computer-readable storage medium
CN108511002B (zh) * 2018-01-23 2020-12-01 太仓鸿羽智能科技有限公司 Dangerous event sound signal recognition method, terminal, and computer-readable storage medium

Also Published As

Publication number Publication date
JP2010230913A (ja) 2010-10-14

Similar Documents

Publication Publication Date Title
Deng et al. Recursive estimation of nonstationary noise using iterative stochastic approximation for robust speech recognition
JP6293912B2 (ja) Speech synthesis device, speech synthesis method, and program
WO2010119534A1 (fr) Speech synthesis device, method, and program
JP2008203469A (ja) Speech recognition device and method
JP2004310098A (ja) Method of speech recognition using variational inference with switching state space models
WO2012001458A1 (fr) Voice-tag method and apparatus based on confidence score
JP2010055030A (ja) Acoustic processing device and program
JP2004279466A (ja) Noise adaptation system for speech model, noise adaptation method, and speech recognition noise adaptation program
JP2010078650A (ja) Speech recognition device and method thereof
JP2004226982A (ja) Method of speech recognition using hidden trajectory hidden Markov models
JP5351856B2 (ja) Sound source parameter estimation device, sound source separation device, methods thereof, program, and storage medium
Saheer et al. VTLN adaptation for statistical speech synthesis
WO2010109725A1 (fr) Speech processing device, speech processing method, and speech processing program
JP6594251B2 (ja) Acoustic model learning device, speech synthesis device, methods thereof, and program
JP6142401B2 (ja) Speech synthesis model learning device, method, and program
JP4964194B2 (ja) Speech recognition model creation device and method, speech recognition device and method, program, and recording medium therefor
JP6542823B2 (ja) Acoustic model learning device, speech synthesis device, methods thereof, and program
JP2004117624A (ja) Noise adaptation system for speech model, noise adaptation method, and speech recognition noise adaptation program
JP2003005785A (ja) Sound source separation method and separation device
CN111933121B (zh) Acoustic model training method and device
JP4464797B2 (ja) Speech recognition method, apparatus implementing the method, program, and recording medium therefor
JP2007233308A (ja) Speech recognition device
JP4654452B2 (ja) Acoustic model generation device and program
JP2005321660A (ja) Statistical model creation method and device, pattern recognition method and device, programs therefor, and recording medium
JPH0822296A (ja) Pattern recognition method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09842335

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 09842335

Country of ref document: EP

Kind code of ref document: A1