JPH0636156B2 - Voice recognizer - Google Patents
Voice recognizer

Info
- Publication number
- JPH0636156B2 (application JP1057760A / JP5776089A)
- Authority
- JP
- Japan
- Prior art keywords
- word
- label
- vector
- segment
- representative value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/144—Training of HMMs
Landscapes
- Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Description
DETAILED DESCRIPTION OF THE INVENTION

A. Field of the Invention

The present invention relates to a speech recognition apparatus that uses phenonic Markov models, and in particular to one in which the vector-quantization codebook can be adapted both accurately and simply.
B. Description of the Prior Art

Speech recognition based on Markov models approaches the recognition of speech from a probabilistic point of view. In one such method, the speech signal is first frequency-analyzed at a fixed period (called a frame), vector-quantized, and converted into a sequence of labels (symbols). One Markov model is set up for each label. In addition, a sequence of these Markov models (a word baseform) is assigned to each word on the basis of the label sequence of an enrollment utterance. Each Markov model has a number of states and transitions between those states; each transition is assigned a probability of occurrence, and each state or transition is assigned probabilities of outputting the various labels in that state or transition. Unknown input speech is converted into a label sequence, and the probability that each word Markov model defined by a word baseform generates this label sequence is computed from the transition probabilities and label output probabilities (hereafter collectively called the parameters). The word Markov model with the largest label-generation probability is found, and recognition is performed on the basis of this result.
Markov models defined per label in this way are called phenonic Markov models. Models associated with the same label name are treated as a single common model during training and recognition. Phenonic Markov models are described in detail in the following paper:
(1) "Acoustic Markov Models Used in The Tangora Speech Recognition System," Proceedings of ICASSP '88, April 1988, S11-3, L. R. Bahl, P. F. Brown, P. V. de Souza, R. L. Mercer and M. A. Picheny.

Speech recognition using Markov models of this kind requires a large amount of speech data to build the vector-quantization codebook, to estimate the Markov models, and to register the word baseforms, and these operations also take a great deal of time. Moreover, a system built from the speech data of one particular speaker often does not achieve sufficient recognition accuracy for other speakers. Even for the same speaker, if a considerable time elapses between training and recognition so that the acoustic environment changes, recognition accuracy degrades. Degradation of recognition accuracy due to environmental noise is a further problem. In reference (1) the training time is greatly reduced by creating the word baseforms from the utterances of a predetermined speaker, but because the quantization codebook and the Markov-model parameters are re-estimated for each speaker, a large amount of speech data and processing time is still required. To solve this problem, it has recently been proposed to adapt the vector-quantization codebook and Markov models of a predetermined speaker to a new speaker or environment. Methods for adapting the vector-quantization codebook in particular fall into the following two types.
The first type finds the correspondence between the adaptation-training utterances and the utterances of the predetermined speaker by DP (dynamic programming) matching and uses it to adapt the codebook. This is described in:

(2) "Speaker Adaptation by Vector Quantization," Technical Report of the IEICE, SP86-65, pp. 33-40, December 1986, Kiyohiro Shikano.

With this method, however, an accurate correspondence cannot be obtained when the distribution of the feature quantities changes substantially. Furthermore, because the correspondence is distance-based, it does not necessarily agree with evaluation on the Markov models, and because DP matching is required in addition to the Markov models, the method is also inefficient in terms of storage.
The second type does not use correspondences along the time axis: an adapted codebook is created by clustering the training speech while referring to the original codebook. Methods of this kind are described in:

(3) "Unsupervised Speaker Adaptation Method Based on Clustering of Spectral Space," Proceedings of the Spring Meeting of the Acoustical Society of Japan, 2-2-16, March 1988, Sadaoki Furui.

(4) "Speaker Adaptation Method for HMM-Based Speech Recognition," Proceedings of ICASSP '88, April 1988, S5-7, M. Nishimura and K. Sugawara.

These methods require a large amount of computation and storage, and because they ignore the correspondence along the time axis entirely, highly accurate adaptation cannot be expected from them.
Reference (4) also describes a method of adapting the parameters of the Markov models themselves.
C. Problems to Be Solved by the Invention

This invention was made in view of the above circumstances. Its object is to provide a speech recognition apparatus that can adapt to large variations in the feature quantities while preserving the correspondence between labels, and that can carry out this adaptation simply.
D. Means for Solving the Problems

In the present invention, a word utterance for adaptation training is first frequency-analyzed at a fixed period to obtain a sequence of feature vectors. This feature-vector sequence is divided (preferably into equal parts) into N (1 ≤ N) segments on the time axis, and the word baseform obtained in advance from a predetermined speaker is likewise divided (preferably equally) into N segments, giving a correspondence between the respective parts. Because the baseform side can also be regarded as a sequence of feature vectors by referring to the vector-quantization codebook, the difference between the representative values (preferably mean values) of the feature quantities in each pair of corresponding segments — a displacement vector of the feature quantities — is computed from this correspondence. Meanwhile, the strength of the association between each label and each segment is obtained as the probability of the segment conditioned on the label. Then, following Equation (1), the displacement vectors obtained for the individual segments are combined using these conditional probabilities as weights, thereby adapting the code vector corresponding to each label:

F_k′ = F_k + Σ_i Σ_j P(i, j | L_k) (S_ij − B_ij)   ...(1)

Fig. 1 outlines this sequence of operations for the example case of one adaptation-training word, two segments, and two labels. Here i (1 ≤ i ≤ W) is the word number, j (1 ≤ j ≤ N) is the segment number, S_ij is the mean feature vector of segment j of adaptation-training word i, B_ij is the mean feature vector estimated from the word baseform and the quantization codebook, F_k is the code vector corresponding to label number k, and F_k′ is the code vector after adaptation. P(i, j | L_k) is the probability of word i and segment j conditioned on label L_k.

The conditional probability P(i, j | L_k) of each segment given a label is obtained by computing, over the word baseforms, the occurrence frequency P(L_k | i, j) of each label within each segment and transforming it according to Bayes' theorem. As the label occurrence frequency within each segment, the frequency of occurrence of the labels in the word baseform may also be smoothed using the label output probabilities of the phenonic Markov models, as in Equation (2):

P(L_k | i, j) = Σ_l P(L_k | M_l) P(M_l | i, j)   ...(2)

Here M_l is the state (phenon) of the phenonic Markov model associated with label L_l, and P(L_k | M_l) is the label output probability of that model.
E. Embodiment

An embodiment in which this invention is applied to word speech recognition based on phenonic Markov models is described below with reference to the drawings. Fig. 2 shows the embodiment as a whole. In Fig. 2, input speech is supplied through a microphone 1 and an amplifier 2 to an analog-to-digital (A/D) converter 3, where it is converted into digital data. The digitized speech data is supplied to a feature extraction device 4, in which the speech data is first subjected to a discrete Fourier transform and then taken out as the output of a 20-channel critical-band filter bank reflecting the characteristics of human hearing. Every 8 ms this output is sent to the next-stage switching device 5, which passes it to one of the vector-quantization-codebook initial training device 6, the vector-quantization-codebook adaptation device 7, and the labeling device 8. During initial training of the vector-quantization codebook, the switching device 5 is set to the codebook initial training device 6 and supplies it with the critical-band filter output; the initial training device 6 creates, by clustering, a vector-quantization codebook 9 consisting of 128 code vectors. When the codebook is to be adapted, the switching device 5 is set to the adaptation device 7, which takes the initially trained vector-quantization codebook 9 as its starting point and adapts it while referring to the word-baseform table 15 described below. The details of the adaptation device 7 are explained later with reference to Fig. 4. During recognition, word-baseform registration, and initial training or adaptation of the Markov models, the switching device 5 is set to the labeling device 8, which performs labeling successively by referring to the vector-quantization codebook 9. For initial training of the Markov models, however, the codebook from initial training is used.
Labeling is performed, for example, as shown in Fig. 3. In Fig. 3, X is the input feature vector, Y_j is the feature vector (code vector) of the j-th label, M is the number of code vectors (= 128), dist(X, Y_j) is the Euclidean distance between X and Y_j, and m is the smallest value of dist(X, Y_j) seen up to each point; m is initially set to a very large value V. As the figure makes clear, the input feature vector X is compared with each code vector in turn, and the most similar one — that is, the one at the smallest distance — is output as the observed label (label number) L.
Returning to Fig. 2: the label sequence from the labeling device 8 is supplied through a switching device 10 to one of the word-baseform registration device 11, the Markov-model initial training device 12, the Markov-model adaptation device 13, and the recognition device 14. At word-baseform registration time, the switching device 10 is set to the word-baseform registration device 11, which uses the label sequences to create the word-baseform table 15. At initial training of the Markov models, the switching device 10 is set to the initial training device 12, which trains the models using the label sequences and the baseform table 15 and determines the parameter values of the parameter table 16. When adaptation is performed, the switching device 10 is set to the adaptation device 13, which adapts the parameter values in the parameter table 16 using the correspondence between the input label sequence and the individual phenonic Markov models on the word baseforms. At recognition time, the switching device 10 is set to the recognition device 14, which recognizes the input speech on the basis of the input label sequence, the word baseforms, and the parameter table.
The output of the recognition device 14 is supplied to a workstation 17 and, for example, shown on its display. In Fig. 2, all devices other than the microphone 1, the amplifier 2, and the display 17 are realized as software on the workstation. The workstation used was an IBM 5570 processing unit, the operating system was Japanese DOS, and the languages were C and macro assembler. The devices could, of course, also be realized as hardware.
Next, the operation of the vector-quantization-codebook adaptation device 7 is explained with reference to Fig. 4, which shows the codebook adaptation procedure. First, the code vector F_k corresponding to each label L_k is read from the vector-quantization codebook (step 18). Next, the speech data of adaptation-training word i is input (step 20). This speech data is divided into N equal parts on the time axis, and the mean feature vector S_ij of each segment j is estimated (step 21). The baseform of word number i is likewise read (step 22); it too is divided into N equal parts on the time axis, and by referring to the code vectors read in step 18, the mean feature vector B_ij of each segment j is estimated (step 23). Furthermore, the occurrence frequency P(L_k | i, j) of label L_k in each segment j is also estimated from the N-divided word baseform (step 24). After steps 20 through 24 have been carried out for the entire adaptation-training vocabulary, P(L_k | i, j) is transformed into the probability P(i, j | L_k) of word and segment conditioned on the label (step 27). Then, following Equation (1), all code vectors F_k are adapted, and the existing vector-quantization codebook is replaced by the adapted code vectors (step 28).
Finally, an evaluation experiment on this embodiment was carried out with a recognition vocabulary of 150 highly confusable words, such as "alarm" (keihō), "square" (heihō), "straight line" (chokusen), and "immediately before" (chokuzen). In this experiment, the speech data for initial training of the vector-quantization codebook and the Markov models consisted of ten utterances of the 150 words by one male speaker, and the effect of adaptation was examined with eleven other speakers (seven male, four female). Adaptation was performed on subsets of the target vocabulary (10, 25, 50, 100, and 150 words, each word uttered once), and recognition experiments used three utterances of the 150 words per speaker. Fig. 5 shows the recognition results: the horizontal axis is the number of adaptation-training words, and the vertical axis is the average error rate. The white circles show the results when only the Markov models were adapted; the black circles show the results when the present invention was used together with Markov-model adaptation. The solid line at 4% is the recognition result for the speaker on whom initial training was performed. These results show that with the invention, among male speakers, exactly the same recognition accuracy as for the initial-training speaker is obtained from a single utterance of 25 adaptation words. Moreover, for adaptation across sexes — where, because of the large variation in the feature quantities, adaptation of the Markov models alone left an error rate of nearly 10% even after training on 150 words — the invention achieves accuracy nearly equal to that of the initial-training speaker.
Furthermore, the invention requires only a small amount of computation and storage for adaptation, and can easily be realized even on a small-scale processing device.
F. Effects of the Invention

As described above, this invention allows a speech recognition system to be adapted simply, from only a small amount of data, and with little computation and storage.
Fig. 1 is a diagram explaining the invention; Fig. 2 is a block diagram showing one embodiment of the invention; Fig. 3 is a flowchart explaining the labeling device 8 of Fig. 2; Fig. 4 is a flowchart explaining the vector-quantization-codebook adaptation device 7 of Fig. 2; and Fig. 5 is a diagram showing experimental data obtained by applying the invention.

7 ... vector-quantization-codebook adaptation device, 9 ... vector-quantization codebook, 15 ... word-baseform table, 16 ... parameter table.
Claims (6)

1. A speech recognition apparatus comprising: means for frequency-analyzing input speech at a fixed period to obtain feature vectors; means for generating a corresponding sequence of labels using a vector-quantization codebook; and means for matching the label sequence against a plurality of word baseforms each described as a chain of Markov models corresponding to labels, the input speech being recognized on the basis of the matching result; the apparatus further comprising: means for dividing each of a plurality of word input utterances into N segments (N being an integer of 2 or more) and generating a representative value of the feature vectors of each segment of each word input utterance; means for dividing the word baseform corresponding to each word input utterance into N segments and generating a representative value of the feature vectors of each segment of each word baseform on the basis of the prototype vectors of the vector-quantization codebook; means for generating displacement vectors representing the displacement between the representative value of each segment of each word input utterance and the representative value of the corresponding segment of the corresponding word baseform; means for storing the degree of association between each segment of each word input utterance and each label in the label set of the vector-quantization codebook; and prototype adaptation means for modifying the prototype vector of each label in the label set of the vector-quantization codebook by each displacement vector in accordance with the degree of association between that label and that displacement vector.

2. The speech recognition apparatus of claim 1, wherein the representative value of the feature vectors of each segment of each word input utterance is the mean of the feature vectors in that segment.

3. The speech recognition apparatus of claim 1 or claim 2, wherein the representative value of the feature vectors of each segment of each word baseform is the mean of the prototype vectors of the labels in that segment.

4. The speech recognition apparatus of claim 1, 2, or 3, wherein the degree of association between each segment of each word input utterance and each label in the label set of the vector-quantization codebook is obtained on the basis of

P(L_k | i, j) = Σ_l P(L_k | M_l) P(M_l | i, j)

where P(L_k | i, j) is the degree of association between segment j of the input utterance of word i and label L_k in the label set of the vector-quantization codebook, P(L_k | M_l) is the probability of outputting label L_k in Markov model M_l, and P(M_l | i, j) is the probability that Markov model M_l occurs in segment j of word i.

5. The speech recognition apparatus of claim 4, wherein the prototype adaptation means obtains the prototype vector of each label in the label set of the vector-quantization codebook on the basis of

F_k′ = F_k + Σ_i Σ_j P(i, j | L_k) (S_ij − B_ij)

where F_k is the prototype vector of label L_k before modification, F_k′ is the prototype vector of label L_k after modification, S_ij is the representative value of the feature vectors of segment j of the input utterance of word i, and B_ij is the representative value of the feature vectors of segment j of the baseform of word i.

6. A speech recognition apparatus comprising: means for frequency-analyzing input speech at a fixed period to obtain feature vectors; means for generating a corresponding sequence of labels using a vector-quantization codebook; and means for matching the label sequence against a plurality of word baseforms each described as a chain of Markov models corresponding to labels, the input speech being recognized on the basis of the matching result; the apparatus further comprising: means for generating a representative value of the feature vectors of each of a plurality of word input utterances; means for generating a representative value of the feature vectors of each of the word baseforms corresponding to the word input utterances on the basis of the prototype vectors of the vector-quantization codebook; means for generating displacement vectors representing the displacement between the representative value of each word input utterance and the representative value of the corresponding word baseform; means for storing the degree of association between each word input utterance and each label in the label set of the vector-quantization codebook; and prototype adaptation means for modifying the prototype vector of each label in the label set of the vector-quantization codebook by each displacement vector in accordance with the degree of association between that label and that displacement vector.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP1057760A JPH0636156B2 (en) | 1989-03-13 | 1989-03-13 | Voice recognizer |
US07/485,402 US5046099A (en) | 1989-03-13 | 1990-02-27 | Adaptation of acoustic prototype vectors in a speech recognition system |
EP90302404A EP0388067B1 (en) | 1989-03-13 | 1990-03-07 | Speech recognition system |
DE69010722T DE69010722T2 (en) | 1989-03-13 | 1990-03-07 | Speech recognition system. |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP1057760A JPH0636156B2 (en) | 1989-03-13 | 1989-03-13 | Voice recognizer |
Publications (2)
Publication Number | Publication Date |
---|---|
JPH02238496A JPH02238496A (en) | 1990-09-20 |
JPH0636156B2 true JPH0636156B2 (en) | 1994-05-11 |
Family
ID=13064835
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP1057760A Expired - Lifetime JPH0636156B2 (en) | 1989-03-13 | 1989-03-13 | Voice recognizer |
Country Status (4)
Country | Link |
---|---|
US (1) | US5046099A (en) |
EP (1) | EP0388067B1 (en) |
JP (1) | JPH0636156B2 (en) |
DE (1) | DE69010722T2 (en) |
Families Citing this family (154)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5345536A (en) * | 1990-12-21 | 1994-09-06 | Matsushita Electric Industrial Co., Ltd. | Method of speech recognition |
US5182773A (en) * | 1991-03-22 | 1993-01-26 | International Business Machines Corporation | Speaker-independent label coding apparatus |
JP3050934B2 (en) * | 1991-03-22 | 2000-06-12 | 株式会社東芝 | Voice recognition method |
US5487129A (en) * | 1991-08-01 | 1996-01-23 | The Dsp Group | Speech pattern matching in non-white noise |
JP3129778B2 (en) * | 1991-08-30 | 2001-01-31 | 富士通株式会社 | Vector quantizer |
US5222146A (en) * | 1991-10-23 | 1993-06-22 | International Business Machines Corporation | Speech recognition apparatus having a speech coder outputting acoustic prototype ranks |
JPH0776878B2 (en) * | 1991-10-31 | 1995-08-16 | インターナショナル・ビジネス・マシーンズ・コーポレイション | Speech recognition method and device |
DE69232463T2 (en) * | 1991-12-31 | 2002-11-28 | Unisys Pulsepoint Communicatio | VOICE CONTROLLED MESSAGE SYSTEM AND PROCESSING PROCESS |
JP2779886B2 (en) * | 1992-10-05 | 1998-07-23 | 日本電信電話株式会社 | Wideband audio signal restoration method |
US6311157B1 (en) * | 1992-12-31 | 2001-10-30 | Apple Computer, Inc. | Assigning meanings to utterances in a speech recognition system |
US5613036A (en) * | 1992-12-31 | 1997-03-18 | Apple Computer, Inc. | Dynamic categories for a speech recognition system |
US5483579A (en) * | 1993-02-25 | 1996-01-09 | Digital Acoustics, Inc. | Voice recognition dialing system |
US5692100A (en) * | 1994-02-02 | 1997-11-25 | Matsushita Electric Industrial Co., Ltd. | Vector quantizer |
US5615299A (en) * | 1994-06-20 | 1997-03-25 | International Business Machines Corporation | Speech recognition using dynamic features |
AU683783B2 (en) * | 1994-12-02 | 1997-11-20 | Australian National University, The | Method for forming a cohort for use in identification of an individual |
AUPM983094A0 (en) * | 1994-12-02 | 1995-01-05 | Australian National University, The | Method for forming a cohort for use in identification of an individual |
US5751903A (en) * | 1994-12-19 | 1998-05-12 | Hughes Electronics | Low rate multi-mode CELP codec that encodes line SPECTRAL frequencies utilizing an offset |
US5864810A (en) * | 1995-01-20 | 1999-01-26 | Sri International | Method and apparatus for speech recognition adapted to an individual speaker |
JP3008799B2 (en) * | 1995-01-26 | 2000-02-14 | 日本電気株式会社 | Speech adaptation device, word speech recognition device, continuous speech recognition device, and word spotting device |
JP3280825B2 (en) * | 1995-04-26 | 2002-05-13 | 富士通株式会社 | Voice feature analyzer |
JP2738403B2 (en) * | 1995-05-12 | 1998-04-08 | 日本電気株式会社 | Voice recognition device |
JPH0981183A (en) * | 1995-09-14 | 1997-03-28 | Pioneer Electron Corp | Generating method for voice model and voice recognition device using the method |
GB2305288A (en) * | 1995-09-15 | 1997-04-02 | Ibm | Speech recognition system |
US5774841A (en) * | 1995-09-20 | 1998-06-30 | The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration | Real-time reconfigurable adaptive speech recognition command and control apparatus and method |
US6081660A (en) * | 1995-12-01 | 2000-06-27 | The Australian National University | Method for forming a cohort for use in identification of an individual |
US5745872A (en) * | 1996-05-07 | 1998-04-28 | Texas Instruments Incorporated | Method and system for compensating speech signals using vector quantization codebook adaptation |
US5835890A (en) * | 1996-08-02 | 1998-11-10 | Nippon Telegraph And Telephone Corporation | Method for speaker adaptation of speech models recognition scheme using the method and recording medium having the speech recognition method recorded thereon |
US6460017B1 (en) | 1996-09-10 | 2002-10-01 | Siemens Aktiengesellschaft | Adapting a hidden Markov sound model in a speech recognition lexicon |
US6151575A (en) * | 1996-10-28 | 2000-11-21 | Dragon Systems, Inc. | Rapid adaptation of speech models |
US6212498B1 (en) | 1997-03-28 | 2001-04-03 | Dragon Systems, Inc. | Enrollment in speech recognition |
US6003003A (en) * | 1997-06-27 | 1999-12-14 | Advanced Micro Devices, Inc. | Speech recognition system having a quantizer using a single robust codebook designed at multiple signal to noise ratios |
US6044343A (en) * | 1997-06-27 | 2000-03-28 | Advanced Micro Devices, Inc. | Adaptive speech recognition with selective input data to a speech classifier |
US6032116A (en) * | 1997-06-27 | 2000-02-29 | Advanced Micro Devices, Inc. | Distance measure in a speech recognition system for speech recognition using frequency shifting factors to compensate for input signal frequency shifts |
US6070136A (en) * | 1997-10-27 | 2000-05-30 | Advanced Micro Devices, Inc. | Matrix quantization with vector quantization error compensation for robust speech recognition |
US6067515A (en) * | 1997-10-27 | 2000-05-23 | Advanced Micro Devices, Inc. | Split matrix quantization with split vector quantization error compensation and selective enhanced processing for robust speech recognition |
US6263309B1 (en) | 1998-04-30 | 2001-07-17 | Matsushita Electric Industrial Co., Ltd. | Maximum likelihood method for finding an adapted speaker model in eigenvoice space |
US6343267B1 (en) | 1998-04-30 | 2002-01-29 | Matsushita Electric Industrial Co., Ltd. | Dimensionality reduction for speaker normalization and speaker and environment adaptation using eigenvoice techniques |
US6163768A (en) | 1998-06-15 | 2000-12-19 | Dragon Systems, Inc. | Non-interactive enrollment in speech recognition |
US6219642B1 (en) | 1998-10-05 | 2001-04-17 | Legerity, Inc. | Quantization using frequency and mean compensated frequency input data for robust speech recognition |
US6347297B1 (en) | 1998-10-05 | 2002-02-12 | Legerity, Inc. | Matrix quantization with vector quantization error compensation and neural network postprocessing for robust speech recognition |
EP1011094B1 (en) * | 1998-12-17 | 2005-03-02 | Sony International (Europe) GmbH | Semi-supervised speaker adaption |
KR100307623B1 (en) * | 1999-10-21 | 2001-11-02 | 윤종용 | Method and apparatus for discriminative estimation of parameters in MAP speaker adaptation condition and voice recognition method and apparatus including these |
US6526379B1 (en) | 1999-11-29 | 2003-02-25 | Matsushita Electric Industrial Co., Ltd. | Discriminative clustering methods for automatic speech recognition |
US6571208B1 (en) | 1999-11-29 | 2003-05-27 | Matsushita Electric Industrial Co., Ltd. | Context-dependent acoustic models for medium and large vocabulary speech recognition with eigenvoice training |
US8645137B2 (en) | 2000-03-16 | 2014-02-04 | Apple Inc. | Fast, language-independent method for user authentication by voice |
US6895376B2 (en) * | 2001-05-04 | 2005-05-17 | Matsushita Electric Industrial Co., Ltd. | Eigenvoice re-estimation technique of acoustic models for speech recognition, speaker identification and speaker verification |
US8677377B2 (en) | 2005-09-08 | 2014-03-18 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US9318108B2 (en) | 2010-01-18 | 2016-04-19 | Apple Inc. | Intelligent automated assistant |
US8977255B2 (en) | 2007-04-03 | 2015-03-10 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US8010341B2 (en) * | 2007-09-13 | 2011-08-30 | Microsoft Corporation | Adding prototype information into probabilistic models |
US9330720B2 (en) | 2008-01-03 | 2016-05-03 | Apple Inc. | Methods and apparatus for altering audio output signals |
US8996376B2 (en) | 2008-04-05 | 2015-03-31 | Apple Inc. | Intelligent text-to-speech conversion |
US10496753B2 (en) | 2010-01-18 | 2019-12-03 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US20100030549A1 (en) | 2008-07-31 | 2010-02-04 | Lee Michael M | Mobile device having human language translation capability with positional feedback |
WO2010067118A1 (en) | 2008-12-11 | 2010-06-17 | Novauris Technologies Limited | Speech recognition involving a mobile device |
US9858925B2 (en) | 2009-06-05 | 2018-01-02 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
US10241644B2 (en) | 2011-06-03 | 2019-03-26 | Apple Inc. | Actionable reminder entries |
US10706373B2 (en) | 2011-06-03 | 2020-07-07 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US10241752B2 (en) | 2011-09-30 | 2019-03-26 | Apple Inc. | Interface for a virtual digital assistant |
US9431006B2 (en) | 2009-07-02 | 2016-08-30 | Apple Inc. | Methods and apparatuses for automatic speech recognition |
US10553209B2 (en) | 2010-01-18 | 2020-02-04 | Apple Inc. | Systems and methods for hands-free notification summaries |
US10276170B2 (en) | 2010-01-18 | 2019-04-30 | Apple Inc. | Intelligent automated assistant |
US10705794B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US10679605B2 (en) | 2010-01-18 | 2020-06-09 | Apple Inc. | Hands-free list-reading by intelligent automated assistant |
DE202011111062U1 (en) | 2010-01-25 | 2019-02-19 | Newvaluexchange Ltd. | Device and system for a digital conversation management platform |
US8682667B2 (en) | 2010-02-25 | 2014-03-25 | Apple Inc. | User profiling for selecting user specific voice input processing information |
US10762293B2 (en) | 2010-12-22 | 2020-09-01 | Apple Inc. | Using parts-of-speech tagging and named entity recognition for spelling correction |
US9262612B2 (en) | 2011-03-21 | 2016-02-16 | Apple Inc. | Device access using voice authentication |
US10057736B2 (en) | 2011-06-03 | 2018-08-21 | Apple Inc. | Active transport based notifications |
US8994660B2 (en) | 2011-08-29 | 2015-03-31 | Apple Inc. | Text correction processing |
US10134385B2 (en) | 2012-03-02 | 2018-11-20 | Apple Inc. | Systems and methods for name pronunciation |
US9483461B2 (en) | 2012-03-06 | 2016-11-01 | Apple Inc. | Handling speech synthesis of content for multiple languages |
US9280610B2 (en) | 2012-05-14 | 2016-03-08 | Apple Inc. | Crowd sourcing information to fulfill user requests |
US9721563B2 (en) | 2012-06-08 | 2017-08-01 | Apple Inc. | Name recognition system |
US9495129B2 (en) | 2012-06-29 | 2016-11-15 | Apple Inc. | Device, method, and user interface for voice-activated navigation and browsing of a document |
US9576574B2 (en) | 2012-09-10 | 2017-02-21 | Apple Inc. | Context-sensitive handling of interruptions by intelligent digital assistant |
US9547647B2 (en) | 2012-09-19 | 2017-01-17 | Apple Inc. | Voice-based media searching |
DE212014000045U1 (en) | 2013-02-07 | 2015-09-24 | Apple Inc. | Voice trigger for a digital assistant |
US9368114B2 (en) | 2013-03-14 | 2016-06-14 | Apple Inc. | Context-sensitive handling of interruptions |
WO2014144579A1 (en) | 2013-03-15 | 2014-09-18 | Apple Inc. | System and method for updating an adaptive speech recognition model |
KR101759009B1 (en) | 2013-03-15 | 2017-07-17 | 애플 인크. | Training an at least partial voice command system |
WO2014197334A2 (en) | 2013-06-07 | 2014-12-11 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
WO2014197336A1 (en) | 2013-06-07 | 2014-12-11 | Apple Inc. | System and method for detecting errors in interactions with a voice-based digital assistant |
US9582608B2 (en) | 2013-06-07 | 2017-02-28 | Apple Inc. | Unified ranking with entropy-weighted information for phrase-based semantic auto-completion |
WO2014197335A1 (en) | 2013-06-08 | 2014-12-11 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US10176167B2 (en) | 2013-06-09 | 2019-01-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
JP6259911B2 (en) | 2013-06-09 | 2018-01-10 | アップル インコーポレイテッド | Apparatus, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
KR101809808B1 (en) | 2013-06-13 | 2017-12-15 | 애플 인크. | System and method for emergency calls initiated by voice command |
DE112014003653B4 (en) | 2013-08-06 | 2024-04-18 | Apple Inc. | Automatically activate intelligent responses based on activities from remote devices |
US9620105B2 (en) | 2014-05-15 | 2017-04-11 | Apple Inc. | Analyzing audio input for efficient speech and music recognition |
US10592095B2 (en) | 2014-05-23 | 2020-03-17 | Apple Inc. | Instantaneous speaking of content on touch devices |
US9502031B2 (en) | 2014-05-27 | 2016-11-22 | Apple Inc. | Method for supporting dynamic grammars in WFST-based ASR |
US10078631B2 (en) | 2014-05-30 | 2018-09-18 | Apple Inc. | Entropy-guided text prediction using combined word and character n-gram language models |
US10289433B2 (en) | 2014-05-30 | 2019-05-14 | Apple Inc. | Domain specific language for encoding assistant dialog |
US9430463B2 (en) | 2014-05-30 | 2016-08-30 | Apple Inc. | Exemplar-based natural language processing |
US9785630B2 (en) | 2014-05-30 | 2017-10-10 | Apple Inc. | Text prediction using combined word N-gram and unigram language models |
US9734193B2 (en) | 2014-05-30 | 2017-08-15 | Apple Inc. | Determining domain salience ranking from ambiguous words in natural speech |
US9842101B2 (en) | 2014-05-30 | 2017-12-12 | Apple Inc. | Predictive conversion of language input |
US9633004B2 (en) | 2014-05-30 | 2017-04-25 | Apple Inc. | Better resolution when referencing to concepts |
WO2015184186A1 (en) | 2014-05-30 | 2015-12-03 | Apple Inc. | Multi-command single utterance input method |
US9760559B2 (en) | 2014-05-30 | 2017-09-12 | Apple Inc. | Predictive text input |
US10170123B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Intelligent assistant for home automation |
US9715875B2 (en) | 2014-05-30 | 2017-07-25 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US10659851B2 (en) | 2014-06-30 | 2020-05-19 | Apple Inc. | Real-time digital assistant knowledge updates |
US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US10446141B2 (en) | 2014-08-28 | 2019-10-15 | Apple Inc. | Automatic speech recognition based on user feedback |
US9818400B2 (en) | 2014-09-11 | 2017-11-14 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US10789041B2 (en) | 2014-09-12 | 2020-09-29 | Apple Inc. | Dynamic thresholds for always listening speech trigger |
US9668121B2 (en) | 2014-09-30 | 2017-05-30 | Apple Inc. | Social reminders |
US10074360B2 (en) | 2014-09-30 | 2018-09-11 | Apple Inc. | Providing an indication of the suitability of speech recognition |
US9646609B2 (en) | 2014-09-30 | 2017-05-09 | Apple Inc. | Caching apparatus for serving phonetic pronunciations |
US9886432B2 (en) | 2014-09-30 | 2018-02-06 | Apple Inc. | Parsimonious handling of word inflection via categorical stem + suffix N-gram language models |
US10127911B2 (en) | 2014-09-30 | 2018-11-13 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US10552013B2 (en) | 2014-12-02 | 2020-02-04 | Apple Inc. | Data detection |
US9711141B2 (en) | 2014-12-09 | 2017-07-18 | Apple Inc. | Disambiguating heteronyms in speech synthesis |
US9865280B2 (en) | 2015-03-06 | 2018-01-09 | Apple Inc. | Structured dictation using intelligent automated assistants |
US10567477B2 (en) | 2015-03-08 | 2020-02-18 | Apple Inc. | Virtual assistant continuity |
US9886953B2 (en) | 2015-03-08 | 2018-02-06 | Apple Inc. | Virtual assistant activation |
US9721566B2 (en) | 2015-03-08 | 2017-08-01 | Apple Inc. | Competing devices responding to voice triggers |
US9899019B2 (en) | 2015-03-18 | 2018-02-20 | Apple Inc. | Systems and methods for structured stem and suffix language models |
US9842105B2 (en) | 2015-04-16 | 2017-12-12 | Apple Inc. | Parsimonious continuous-space phrase representations for natural language processing |
US10083688B2 (en) | 2015-05-27 | 2018-09-25 | Apple Inc. | Device voice control for selecting a displayed affordance |
US10127220B2 (en) | 2015-06-04 | 2018-11-13 | Apple Inc. | Language identification from short strings |
US10101822B2 (en) | 2015-06-05 | 2018-10-16 | Apple Inc. | Language input correction |
US11025565B2 (en) | 2015-06-07 | 2021-06-01 | Apple Inc. | Personalized prediction of responses for instant messaging |
US10186254B2 (en) | 2015-06-07 | 2019-01-22 | Apple Inc. | Context-based endpoint detection |
US10255907B2 (en) | 2015-06-07 | 2019-04-09 | Apple Inc. | Automatic accent detection using acoustic models |
US10671428B2 (en) | 2015-09-08 | 2020-06-02 | Apple Inc. | Distributed personal assistant |
US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
US9697820B2 (en) | 2015-09-24 | 2017-07-04 | Apple Inc. | Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks |
US10366158B2 (en) | 2015-09-29 | 2019-07-30 | Apple Inc. | Efficient word encoding for recurrent neural network language models |
US11010550B2 (en) | 2015-09-29 | 2021-05-18 | Apple Inc. | Unified language modeling framework for word prediction, auto-completion and auto-correction |
US11587559B2 (en) | 2015-09-30 | 2023-02-21 | Apple Inc. | Intelligent device identification |
US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10049668B2 (en) | 2015-12-02 | 2018-08-14 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US10223066B2 (en) | 2015-12-23 | 2019-03-05 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US10446143B2 (en) | 2016-03-14 | 2019-10-15 | Apple Inc. | Identification of voice inputs providing credentials |
US9934775B2 (en) | 2016-05-26 | 2018-04-03 | Apple Inc. | Unit-selection text-to-speech synthesis based on predicted concatenation parameters |
US9972304B2 (en) | 2016-06-03 | 2018-05-15 | Apple Inc. | Privacy preserving distributed evaluation framework for embedded personalized systems |
US10249300B2 (en) | 2016-06-06 | 2019-04-02 | Apple Inc. | Intelligent list reading |
US10049663B2 (en) | 2016-06-08 | 2018-08-14 | Apple, Inc. | Intelligent automated assistant for media exploration |
DK179309B1 (en) | 2016-06-09 | 2018-04-23 | Apple Inc | Intelligent automated assistant in a home environment |
US10192552B2 (en) | 2016-06-10 | 2019-01-29 | Apple Inc. | Digital assistant providing whispered speech |
US10586535B2 (en) | 2016-06-10 | 2020-03-10 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10509862B2 (en) | 2016-06-10 | 2019-12-17 | Apple Inc. | Dynamic phrase expansion of language input |
US10490187B2 (en) | 2016-06-10 | 2019-11-26 | Apple Inc. | Digital assistant providing automated status report |
US10067938B2 (en) | 2016-06-10 | 2018-09-04 | Apple Inc. | Multilingual word prediction |
DK179343B1 (en) | 2016-06-11 | 2018-05-14 | Apple Inc | Intelligent task discovery |
DK179415B1 (en) | 2016-06-11 | 2018-06-14 | Apple Inc | Intelligent device arbitration and control |
DK179049B1 (en) | 2016-06-11 | 2017-09-18 | Apple Inc | Data driven natural language event detection and classification |
DK201670540A1 (en) | 2016-06-11 | 2018-01-08 | Apple Inc | Application integration with a digital assistant |
US10593346B2 (en) | 2016-12-22 | 2020-03-17 | Apple Inc. | Rank-reduced token representation for automatic speech recognition |
DK179745B1 (en) | 2017-05-12 | 2019-05-01 | Apple Inc. | SYNCHRONIZATION AND TASK DELEGATION OF A DIGITAL ASSISTANT |
DK201770431A1 (en) | 2017-05-15 | 2018-12-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4718094A (en) * | 1984-11-19 | 1988-01-05 | International Business Machines Corp. | Speech recognition system |
US4977599A (en) * | 1985-05-29 | 1990-12-11 | International Business Machines Corporation | Speech recognition employing a set of Markov models that includes Markov models representing transitions to and from silence |
- 1989
- 1989-03-13: JP JP1057760A (patent JPH0636156B2), not active: Expired - Lifetime
- 1990
- 1990-02-27: US US07/485,402 (patent US5046099A), not active: Expired - Fee Related
- 1990-03-07: EP EP90302404A (patent EP0388067B1), not active: Expired - Lifetime
- 1990-03-07: DE DE69010722T (patent DE69010722T2), not active: Expired - Lifetime
Also Published As
Publication number | Publication date |
---|---|
JPH02238496A (en) | 1990-09-20 |
US5046099A (en) | 1991-09-03 |
EP0388067B1 (en) | 1994-07-20 |
DE69010722T2 (en) | 1995-03-16 |
DE69010722D1 (en) | 1994-08-25 |
EP0388067A2 (en) | 1990-09-19 |
EP0388067A3 (en) | 1991-09-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JPH0636156B2 (en) | Voice recognizer | |
JP4141495B2 (en) | Method and apparatus for speech recognition using optimized partial probability mixture sharing | |
Digalakis et al. | Genones: Generalized mixture tying in continuous hidden Markov model-based speech recognizers | |
US5865626A (en) | Multi-dialect speech recognition method and apparatus | |
Hain et al. | New features in the CU-HTK system for transcription of conversational telephone speech | |
US5822728A (en) | Multistage word recognizer based on reliably detected phoneme similarity regions | |
US5930753A (en) | Combining frequency warping and spectral shaping in HMM based speech recognition | |
JP4218982B2 (en) | Audio processing | |
US6026359A (en) | Scheme for model adaptation in pattern recognition based on Taylor expansion | |
AU2013305615B2 (en) | Method and system for selectively biased linear discriminant analysis in automatic speech recognition systems | |
Austin et al. | Speech recognition using segmental neural nets | |
US9280979B2 (en) | Online maximum-likelihood mean and variance normalization for speech recognition | |
KR20050082253A (en) | Speaker clustering method and speaker adaptation method based on model transformation, and apparatus using the same | |
JPH075892A (en) | Voice recognition method | |
US6148284A (en) | Method and apparatus for automatic speech recognition using Markov processes on curves | |
Ranjan et al. | Isolated word recognition using HMM for Maithili dialect | |
CN117043857A (en) | Method, apparatus and computer program product for English pronunciation assessment | |
Ostendorf et al. | The impact of speech recognition on speech synthesis | |
Zavaliagkos et al. | A hybrid continuous speech recognition system using segmental neural nets with hidden Markov models | |
Beulen et al. | Experiments with linear feature extraction in speech recognition. | |
JPH1185186A (en) | Nonspecific speaker acoustic model forming apparatus and speech recognition apparatus | |
Ranjan et al. | Acoustic Feature Extraction and Isolated Word Recognition of Speech Signal Using HMM for Different Dialects | |
Sathiarekha et al. | A survey on the evolution of various voice conversion techniques | |
Debyeche et al. | A new vector quantization approach for discrete HMM speech recognition system | |
JP2003271185A (en) | Device and method for preparing information for voice recognition, device and method for recognizing voice, information preparation program for voice recognition, recording medium recorded with the program, voice recognition program and recording medium recorded with the program |