JPH11282486A

JPH11282486A - Sub word type unspecified speaker voice recognition device and method

Info

Publication number: JPH11282486A
Application number: JP10087069A
Authority: JP
Inventors: Shinichi Tanaka; 信一田中
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1998-03-31
Filing date: 1998-03-31
Publication date: 1999-10-15
Anticipated expiration: 2018-03-31
Also published as: JP3790038B2

Abstract

PROBLEM TO BE SOLVED: To allow words that any speaker can use to be registered as easily as in a speaker-dependent speech recognition system. SOLUTION: In word registration mode, a mode switching unit 11 routes the input speech to a partial word sequence generation unit 12. The partial word sequence generation unit 12 converts the input speech into a sequence of at least one partial word and registers information corresponding to that partial word sequence in a user-registered word dictionary 13. In speech recognition mode, the mode switching unit 11 routes the input speech to a main speech recognition unit 14. The main speech recognition unit 14 obtains, from the information corresponding to each partial word sequence registered in the user-registered word dictionary 13, a word speech model in which partial word speech models are joined together, and recognizes the input speech using that word speech model.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001] [Technical Field of the Invention] The present invention relates to a subword-type speaker-independent speech recognition apparatus and method that allow a user to register additional words easily and appropriately.

[0002] [Prior Art] Speech recognition technology plays an important role in realizing a good man-machine interface.

[0003] In the early days, speaker-dependent speech recognition devices were used. With a speaker-dependent device, before use the user utters each word to be entered by voice several times, and word speech models for matching are registered inside the device on the basis of the user's word utterances. At recognition time, the input speech uttered by the user is matched against the word speech models registered in the device, and the word with the best match is taken as the recognition result (Reference: Masai, Nitta, Uehara, "Development of a speaker-dependent word speech recognition device using the differential-orthogonalization filter method," Proceedings of the Autumn Meeting of the Acoustical Society of Japan, 1988, pp. 65-66, October 1988).

[0004] In such a device, the word speech models held inside the device are specialized to the voice of the user at registration time, so speech from anyone other than the registered user either cannot be recognized or is recognized with markedly lower performance. For a different user to use the device, the laborious speech registration had to be carried out again.

[0005] A speaker-dependent speech recognition device is therefore very inconvenient when several users take turns using it. Moreover, a device such as a vending machine installed on the street cannot register each user's voice, so this kind of speech recognition device cannot be used there.

[0006] For this reason, speaker-independent recognition devices came into use. In early speaker-independent speech recognition devices, word utterances are first collected from a large number of speakers (typically 100 or more) for each word to be entered by voice. Word speech models are generated from these utterances and registered inside the device. At recognition time, the input speech uttered by the user is matched against the registered word speech models, and the word with the best match is taken as the recognition result (Reference: Matsuura, Nitta, "Speaker-independent large-vocabulary word recognition based on the SMQ/HMM method," IEICE Transactions D-II, vol. J76-D-II, no. 12, pp. 2486-2494, December 1993).

[0007] The word speech models held in such a device capture features common to many speakers and do not depend on the voice of any particular speaker. Speech uttered by an arbitrary speaker can therefore be recognized.

[0008] However, early speaker-independent speech recognition devices require speech data uttered by many speakers to be collected for every word, so even adding or changing a few words demands a very large amount of labor.

[0009] When the number of training speakers for a word speech model is relatively small, the generated word speech model may also capture characteristics peculiar to that small group (such as intonation or sound variations that occur only in a particular region or generation). A speech model trained in this way gives degraded recognition performance for arbitrary speakers.

[0010] In early speaker-independent speech recognition devices, speech had to be collected and processed word by word, and the number of training speakers per word could not be made very large. As a result, it was sometimes impossible to generate word speech models that match speech from arbitrary speakers well enough.

[0011] In recent years, therefore, the following method has come into use (hereinafter called the subword-type speaker-independent speech recognition method): the recognition device holds acoustically meaningful partial word speech models as its units (phonemes, syllables, and the like are mainly used as partial words); the word speech model of each word to be recognized is generated by concatenating partial word speech models; and that word speech model is matched against the input speech (Reference: Mark Punsak, Nitta, "Comparison of Context Dependent Sub-word HMMs for Japanese," IEICE Technical Report, vol. 93, no. 364, pp. 63-70, December 1993).

[0012] The configuration of a subword-type speaker-independent speech recognition device is described below with reference to FIG. 44.

[0013] First, the main speech recognition unit 440 comprises an acoustic analysis unit 441, a quantization unit 442, and an HMM recognition unit 443.

[0014] The acoustic analysis unit 441 analyzes the input speech signal, for example by LPC (Linear Predictive Coding) analysis, to obtain feature parameters of the input speech.

[0015] The quantization unit 442 converts the feature parameters obtained by the acoustic analysis unit 441 from the input speech into a label sequence representing speech segments by statistical quantization.
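As a rough illustration of the quantization step, the sketch below maps each feature vector to the index of its nearest codebook entry. The text says only "statistical quantization"; the nearest-neighbour rule, the squared Euclidean distance, and the names `quantize`, `frames`, and `codebook` are assumptions made for illustration, not the method of the cited device.

```python
def quantize(frames, codebook):
    """Map each feature vector to the index (label) of its nearest codebook vector.

    frames   : list of feature vectors (lists of floats), one per analysis frame
    codebook : list of reference vectors; the index of each entry is its label
    """
    labels = []
    for frame in frames:
        # Squared Euclidean distance to every codebook entry (assumed metric).
        dists = [sum((f - c) ** 2 for f, c in zip(frame, code)) for code in codebook]
        labels.append(dists.index(min(dists)))
    return labels


if __name__ == "__main__":
    toy_codebook = [[0.0, 0.0], [1.0, 1.0]]
    toy_frames = [[0.1, 0.0], [0.9, 1.1], [0.2, 0.1]]
    print(quantize(toy_frames, toy_codebook))
```

The resulting list of integers plays the role of the label sequence that is passed on to the HMM recognition unit.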

[0016] The HMM recognition unit (HMM matching unit) 443 calculates, for each word HMM stored in the word HMM dictionary 450, the probability that it generates the label sequence corresponding to the input speech, and outputs as the recognition result the word whose HMM outputs the label sequence with the highest probability.

[0017] The discrete HMM (Hidden Markov Model) used in this example is now described.

[0018] An HMM consists of states and transitions, and one label is output on each transition from one state to another. The probability of transitioning from a state to another state is defined for each state, and the probability of outputting each label is defined for each transition.

[0019] In practice, an HMM is defined by the following six parameters.

[0020]
N^x: number of states of the HMM representing partial word x (states S(1), S(2), ..., S(N))
K: number of labels (labels R = 1, 2, ..., K)
p^x(i,j): transition probability of the HMM representing partial word x (probability of a transition from S(i) to S(j))
q^x(i,j,k): probability that the HMM representing partial word x outputs label k on the transition from S(i) to S(j)
m^x(i): initial state probability of the HMM representing partial word x (probability that S(i) is the initial state)
F^x: set of states that can be the final state of the HMM representing partial word x

The HMM above is subject to transition restrictions that reflect the characteristics of speech. In speech, transitions that loop back from state S(i) to previously visited states S(i-1), S(i-2) are generally not allowed because they would disturb the temporal order. FIG. 45 shows a three-state, two-loop discrete HMM. Here the final state S(N), that is, S(3), does not contribute to matching.
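The six parameters listed above can be held in a small record type. The following is a minimal sketch, assuming dictionaries keyed by 1-based state numbers; the class name `DiscreteHMM`, the helper `three_state_two_loop`, and the probability value 0.5 are hypothetical and chosen only to mirror the three-state, two-loop topology mentioned in the text.

```python
from dataclasses import dataclass


@dataclass
class DiscreteHMM:
    n_states: int   # N^x: number of states S(1)..S(N)
    n_labels: int   # K: number of labels 1..K
    trans: dict     # p^x(i, j): {(i, j): probability}
    emit: dict      # q^x(i, j, k): {(i, j): {k: probability}}
    init: dict      # m^x(i): {i: probability}
    final: set      # F^x: states that may be final

    def is_left_to_right(self):
        """True if no transition with non-zero probability returns to an earlier state."""
        return all(j >= i for (i, j), p in self.trans.items() if p > 0.0)


def three_state_two_loop(emit, n_labels):
    """Build a 3-state model with self-loops on S(1) and S(2) only (values are illustrative)."""
    trans = {(1, 1): 0.5, (1, 2): 0.5, (2, 2): 0.5, (2, 3): 0.5}
    return DiscreteHMM(3, n_labels, trans, emit, {1: 1.0}, {3})
```

Because S(3) has no outgoing transition in this sketch, it emits no label, which matches the remark that the final state does not contribute to matching.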

[0021] As described above, an HMM is matched against a label sequence by calculating the probability, or the logarithm of the probability, that the HMM outputs the label sequence. In actual devices, however, this is often replaced by the value computed by the Viterbi algorithm (the Viterbi score), which can be computed faster.

[0022] The Viterbi score is the logarithm of the probability that the label sequence is output when the state transitions follow the path that outputs the input label sequence with the highest probability.

[0023] When the input label sequence is Y = y(1), y(2), ..., y(L), the Viterbi score can be calculated as follows.

[0024] An array D(T,M) is used to calculate the Viterbi score.

[0025] (1) D(0,1 to N), that is, D(0,1) through D(0,N), are initialized with the logarithms of the initial state probabilities: D(0,1) = ln m(1) through D(0,N) = ln m(N). For an HMM configured as in the figure, D(0,1) = 0 and D(0,2 to N) = −∞. In addition, D(0 to T,0) is initialized to −∞.

[0026] (2) Steps (3) and (4) are repeated while t is incremented by 1 from 1 to T.

[0027] (3) Step (4) is repeated while n is incremented by 1 from 1 to N.

[0028] (4) d1 = D(t-1,n-1) + ln p(n-1,n) + ln q(n-1,n,y(t)) and d2 = D(t-1,n) + ln p(n,n) + ln q(n,n,y(t)) are calculated, and the larger of the two values is assigned to D(t,n).

[0029] (5) The desired Viterbi score is obtained in D(T,N).
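Steps (1) to (5) above can be sketched in a few lines. This is an illustrative reading of the procedure, not a reference implementation: it assumes T equals the length L of the label sequence, that the array has N+1 columns with column 0 as the −∞ sentinel, that the initial state is always S(1), and that missing dictionary entries mean probability zero. The names `viterbi_score`, `trans`, and `emit` are hypothetical.

```python
import math

NEG_INF = float("-inf")


def _log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0.0 else NEG_INF


def viterbi_score(labels, n_states, trans, emit):
    """Viterbi score of a left-to-right discrete HMM for a label sequence.

    trans[(i, j)]    = p(i, j), emit[(i, j)][k] = q(i, j, k); states are 1..n_states.
    """
    T, N = len(labels), n_states
    # Step (1): D(0,1) = 0, everything else (including column 0) = -inf.
    D = [[NEG_INF] * (N + 1) for _ in range(T + 1)]
    D[0][1] = 0.0

    def arc(i, j, k):
        return _log(trans.get((i, j), 0.0)) + _log(emit.get((i, j), {}).get(k, 0.0))

    for t in range(1, T + 1):              # step (2)
        y = labels[t - 1]
        for n in range(1, N + 1):          # step (3)
            d1 = D[t - 1][n - 1] + arc(n - 1, n, y)   # arrive from the previous state
            d2 = D[t - 1][n] + arc(n, n, y)           # stay in the same state (loop)
            D[t][n] = max(d1, d2)          # step (4)
    return D[T][N]                         # step (5)
```

A score of −∞ means that no path through the model can output the given label sequence and end in S(N).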

[0030] A partial word HMM is a discrete HMM created for each partial word. Here, phonemes are used as the partial word unit, and each is modeled by a two-loop, three-state discrete HMM.

[0031] The partial word HMMs are registered in the partial word HMM dictionary 460. FIG. 46 shows an example of the storage format (registration format) of the partial word HMMs registered in the partial word HMM dictionary 460. In this example, each partial word HMM (that is, its parameters) is registered paired with a partial word model name.

[0032] A word HMM can be created by concatenating the partial word HMMs registered in the partial word HMM dictionary 460 according to the reading of the word. In doing so, the final state S(N) of each partial word HMM is overlaid on S(1) of the partial word HMM that immediately follows it. For example, the word "otona" (adult), expressed in partial words, is "o, t, o, n, a", so the corresponding word HMM is as shown in FIG. 47.
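The concatenation rule described above, in which the final state of one partial word HMM is overlaid on the first state of the next, can be sketched by renumbering states with a running offset. The tuple layout `(n_states, trans, emit)` and the function name `concatenate` are assumptions for illustration only.

```python
def concatenate(models):
    """Join partial word HMMs, given in reading order, into one word HMM.

    Each model is (n_states, trans, emit) with 1-based states, where
    trans[(i, j)] is a transition probability and emit[(i, j)] a label distribution.
    Returns (n_states, trans, emit) for the word.
    """
    word_trans, word_emit = {}, {}
    offset = 0
    for n_states, trans, emit in models:
        for (i, j), p in trans.items():
            word_trans[(i + offset, j + offset)] = p
            word_emit[(i + offset, j + offset)] = emit[(i, j)]
        # S(N) of this model coincides with S(1) of the next one.
        offset += n_states - 1
    return offset + 1, word_trans, word_emit
```

With five three-state phoneme models, as in the "o, t, o, n, a" example, the word model has 5 × 2 + 1 = 11 states under this numbering.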

[0033] Word HMMs are registered in the word HMM dictionary 450. FIG. 48 shows an example of the storage format of the word HMMs registered in the word HMM dictionary 450. In this example, the parameters of a word HMM constructed as described above are stored paired with the word name.

[0034] In some device configurations, instead of constructing a word HMM for each word and storing its parameters, the names of the partial word HMMs that make up the word are stored; at matching time the word HMM is constructed by referring to the partial word HMM dictionary 460, and matching is then performed. In still other configurations, a reading written in hiragana or the like is stored; at matching time it is converted into partial word names, the word HMM is constructed by referring to the partial word HMM dictionary 460, and matching is then performed.

[0035] The HMM recognition unit 443 calculates the Viterbi score for the input label sequence for each word, using the parameters of each word HMM registered in the word HMM dictionary 450. It then outputs the word with the highest Viterbi score as the recognition result.
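The selection step performed by the HMM recognition unit amounts to scoring every dictionary entry and keeping the best one. The sketch below assumes the dictionary is a mapping from word name to model and takes the scoring function as a parameter; `recognize` and `score` are hypothetical names, and any function returning a log-probability-like value (such as a Viterbi score) can be supplied.

```python
def recognize(labels, word_dictionary, score):
    """Return (word_name, best_score) for the entry with the highest score.

    word_dictionary : {word_name: model}
    score           : callable(model, labels) -> float, higher is better
    """
    best_word, best_score = None, float("-inf")
    for name, model in word_dictionary.items():
        s = score(model, labels)
        if s > best_score:
            best_word, best_score = name, s
    return best_word, best_score


if __name__ == "__main__":
    # Toy stand-in for a Viterbi score: minus the number of mismatching labels.
    def toy_score(model, labels):
        return -sum(a != b for a, b in zip(model, labels))

    toy_dictionary = {"otona": [1, 1, 2], "kodomo": [3, 3, 2]}
    print(recognize([1, 1, 2], toy_dictionary, toy_score))
```

If every entry scores −∞, the function returns `None` as the word, which a caller could treat as a rejection.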

[0036] With this method, words available for voice input can be added or changed by entering the reading of the word, so the effort is greatly reduced compared with the early speaker-independent speech recognition method.

[0037] In addition, the required partial word speech models can be shared among devices with different recognition vocabularies, so they can be generated from speech data uttered by a very large number of training speakers. Word speech models better suited to recognizing speech from arbitrary speakers can therefore be generated.

[0038] [Problems to Be Solved by the Invention] In the subword-type speaker-independent speech recognition method described above, to register or change a recognition word the user must enter a phoneme sequence or a hiragana string representing the reading of the word. This is difficult without character input means and specialist knowledge of speech.

[0039] In other words, the person registering words must be thoroughly familiar with the correspondence between the reading (manner of utterance) to be registered in the speech recognition device and the symbols that express it. When readings are entered in hiragana the correspondence is relatively easy to grasp, but even then the person still needs to know how the device interprets such things as vowel lengthening (for example, whether the hiragana string "tokei" corresponds to the utterance /tokei/ or to /toke:/).

[0040] Furthermore, with portable devices, in-vehicle devices, and the like, it can be difficult to fit a character input device. Even in such cases a reading can be entered by, for example, displaying all phonemes (or hiragana) on a screen and selecting them one character at a time, or by using character recognition technology. However, this demands cumbersome operations from the user and is not necessarily convenient.

[0041] With the speaker-dependent recognition method, on the other hand, a user can register his or her own words simply by uttering each word several times. The user need not be familiar with the correspondence between phonemic symbols or hiragana and pronunciation, and no character input means is required. Recognition words could therefore be added and changed easily. The drawback, however, was that after obtaining the device the user first had to register every word to be recognized. By contrast, a device using the speaker-independent recognition method can have words that are likely to be used frequently built in beforehand, so the user can start using the device after registering only those words for which he or she wants a special name.

[0042] To overcome the drawbacks of both methods, one conceivable approach is to match the input speech using speaker-independent recognition and speaker-dependent recognition simultaneously, and to take as the final result whichever of the two recognition results has the higher likelihood (recognition confidence). In this approach, words likely to be used frequently are registered in the speaker-independent recognition unit in advance when the device is designed and manufactured, and words that the user adds or changes are registered in the speaker-dependent recognition unit. With this configuration, the user needs to register only the words he or she wants to add or change, and no symbol input is needed at registration.

[0043] However, both the speaker-independent and the speaker-dependent recognition methods must be built into the device, which makes the device complex. Moreover, because different methods are combined, their likelihood scales differ, and a correction is needed before the two can be compared; yet it is difficult to settle on a correction method that works in all cases. In addition, words registered for speaker-dependent recognition depend strongly on the registrant's voice, so the drawback of the speaker-dependent method, namely that the same word uttered by someone other than the registrant cannot be matched correctly, remains.

[0044] Furthermore, the dictionary used in a subword-type speaker-independent speech recognition device generates words by concatenating general partial word models according to reading information expressed in phonemes and the like. If the user's pronunciation of some partial words is atypical, for example because of a regional accent, recognition accuracy falls. A user with an accent or the like is thus forced to keep using a speech recognition device whose recognition accuracy (for that speaker) is low.

[0045] The present invention has been made in view of the circumstances described above, and its object is to provide a subword-type speaker-independent speech recognition apparatus and method with which words can be registered as easily as in the speaker-dependent speech recognition method.

[0046] Another object of the present invention is to provide a subword-type speaker-independent speech recognition apparatus and method that can update the word dictionary according to the user's pronunciation.

[0047] Still another object of the present invention is to provide a subword-type speaker-independent speech recognition apparatus and method that can prevent erroneous entries from being registered when the word dictionary is registered according to the user's pronunciation.

[0048] Still another object of the present invention is to provide a subword-type speaker-independent speech recognition apparatus and method whose configuration can be simplified.

[0049] Still another object of the present invention is to provide a subword-type speaker-independent speech recognition apparatus and method that can adapt successively to the user's speech to improve recognition accuracy.

[0050] [Means for Solving the Problems] A configuration according to a first aspect of the present invention comprises: partial word sequence generation means for converting input speech into a sequence of at least one partial word; a user-registered word dictionary in which information corresponding to the partial word sequence produced by the partial word sequence generation means is registered; word speech model acquisition means for acquiring, from the information corresponding to each partial word sequence registered in the user-registered word dictionary, a word speech model in which partial word speech models are joined together; and main speech recognition means for recognizing speech uttered by a user using the word speech model acquired from the user-registered word dictionary.

[0051] In this configuration, words are registered by voice rather than by character input, yet the input speech is not converted directly into a word speech model and registered. Instead, it is first converted into a sequence of partial words such as phonemes, and information corresponding to that partial word sequence is registered in the dictionary (the user-registered word dictionary). When speech uttered by a user is recognized (in speech recognition mode), a word speech model in which partial word speech models for speaker-independent recognition are joined together is acquired from the information corresponding to the partial word sequence in the dictionary, and recognition is performed using that word speech model. Consequently, although the word was registered by voice, the registered word can be used by arbitrary speakers. In the prior art, by contrast, when word registration by voice is used, a registered word becomes specific to the speaker who registered it, and recognition performance is very poor when another speaker uses it; when word registration by character information is used, character input means are required and the operation is cumbersome.

[0052] Here, the information corresponding to a partial word sequence that is registered in the user-registered word dictionary may be either the partial word sequence itself or a word speech model created in advance by joining together the partial word speech models corresponding to the partial words that make up the sequence.

[0053] In the former case, the means for acquiring the word speech model (the word speech model acquisition means) must have the function of acquiring the word speech model by concatenating the corresponding partial word speech models on the basis of the information corresponding to the partial word sequence (the partial word sequence itself). In the latter case, the word speech model acquisition means can acquire the word speech model merely by having the function of retrieving the information corresponding to the partial word sequence from the user-registered word dictionary. In that case, however, when the information corresponding to the partial word sequence is generated from the partial word sequence converted from the input speech, the word speech model must be created at that point by concatenating the corresponding partial word speech models.
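The two storage options described above trade work at registration time against work at recognition time. The sketch below illustrates that choice with a single flag; the class name `UserWordDictionary`, the `store_models` flag, and the `build_model` callback (which stands in for concatenating partial word speech models) are all hypothetical.

```python
class UserWordDictionary:
    """Stores either partial word sequences or pre-built word speech models."""

    def __init__(self, build_model, store_models=False):
        self._build = build_model          # callable(sequence) -> word speech model
        self._store_models = store_models  # False: former case, True: latter case
        self._entries = {}

    def register(self, word, partial_words):
        seq = tuple(partial_words)
        # Latter case: the model is built once, at registration time.
        self._entries[word] = self._build(seq) if self._store_models else seq

    def word_model(self, word):
        entry = self._entries[word]
        # Former case: the model is built every time it is needed for recognition.
        return entry if self._store_models else self._build(entry)
```

Storing sequences keeps the dictionary small, while storing models avoids rebuilding them for every recognition pass.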

[0054] A configuration according to a second aspect of the present invention adds, to the configuration according to the first aspect, registration condition determination means for determining whether a partial word sequence produced by the partial word sequence generation means satisfies a predetermined registration condition, so that only partial word sequences determined to satisfy the registration condition are registered in the user-registered word dictionary.

[0055] In this configuration, setting the registration condition appropriately makes it possible to prevent a partial word sequence output as the result of a recognition error in the partial word sequence generation means, that is, a partial word sequence that clearly does not correspond to the input speech, from being registered in the user-registered word dictionary and degrading the recognition performance of the main speech recognition means.

[0056] As the registration condition, for example, one can use a condition whose satisfaction is judged by comparison with the likelihood (recognition confidence) of the partial word sequence produced (generated) by the partial word sequence generation means. By examining the likelihood of the partial word sequence in this way and not registering it when the likelihood is at or below a reference value, it is possible to reduce the problem of registering sequences produced when the partial word sequence generation means failed to output a plausible partial word sequence (that is, misrecognized the input).

[0057] Alternatively, an upper limit N may be set on the number of partial word sequences that can be registered per word and used as one of the registration conditions: when the number of partial word sequences is N or fewer, all of them are registered in the user-registered word dictionary regardless of likelihood, and when it exceeds N, the top N in descending order of likelihood are registered. This makes it possible to limit the number of registered partial word sequences in devices with tight memory (storage area) constraints.
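The two registration conditions described above, a likelihood reference value and an upper limit N per word, can each be written as a small filter. The sketch assumes each candidate is a `(sequence, likelihood)` pair; the function names and the numeric values in the example are hypothetical.

```python
def filter_by_likelihood(candidates, reference):
    """Drop candidates whose likelihood is at or below the reference value."""
    return [(seq, lik) for seq, lik in candidates if lik > reference]


def keep_top_n(candidates, n):
    """Keep everything if there are n or fewer candidates, else the n most likely."""
    if len(candidates) <= n:
        return list(candidates)
    return sorted(candidates, key=lambda c: c[1], reverse=True)[:n]


if __name__ == "__main__":
    cands = [("otona", -10.0), ("otoma", -25.0), ("odona", -12.0)]
    print(filter_by_likelihood(cands, -20.0))
    print(keep_top_n(cands, 2))
```

Note that `keep_top_n` ignores likelihood entirely when the candidate count does not exceed N, as the text specifies.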

[0058] A configuration according to a third aspect of the present invention adds, to the configuration according to the first aspect, registration confirmation means that, for every partial word sequence produced by the partial word sequence generation means, presents information representing the partial word sequence to the user, accepts the user's designation of whether it may be registered, and confirms according to that designation whether the corresponding partial word sequence is to be registered, so that only information corresponding to partial word sequences for which the user has given a registration instruction is registered in the user-registered word dictionary.

【００５９】In such a configuration, even when noise or the like causes a recognition error in the partial word sequence generation means and the type of noise happens to yield a high likelihood, the information representing the resulting partial word sequence is presented to the user. The user can therefore notice the recognition error and decide whether to register, which prevents information on an erroneous partial word sequence from being registered in the user registration word dictionary.

【００６０】The configuration according to the fourth aspect of the present invention adds registration editing means to the configuration according to the first aspect. For every partial word sequence converted by the partial word sequence generation means, the registration editing means presents information representing the sequence to the user, accepts the user's editing operations on that information, performs the editing, and reflects the result in the corresponding partial word sequence. It also accepts the user's designation of whether to register the information and confirms registration of the corresponding sequence according to that designation. This allows partial word sequences generated from the input speech to be corrected, and only the information corresponding to partial word sequences that the user has instructed to register is registered in the user registration word dictionary.

【００６１】In such a configuration, when the partial word sequence generation means makes a recognition error, the user can check and correct the sequence before registering it, so the word can be registered without being uttered again.

【００６２】The configuration according to the fifth aspect of the present invention adds, to any of the configurations according to the first to fourth aspects, user word registration dictionary display means that converts the information corresponding to the partial word sequences registered in the user registration word dictionary into character information and presents it to the user.

【００６３】In such a configuration, the user can review the information registered in the user registration word dictionary at a later date.

【００６４】The configuration according to the sixth aspect of the present invention adds, to any of the configurations according to the first to fourth aspects, user word registration dictionary editing means. This means converts the information corresponding to the partial word sequences registered in the user registration word dictionary into character information, presents it to the user, accepts the user's editing operations on that information, performs the editing, and reflects the result in the user registration word dictionary.

【００６５】In such a configuration, the user can review the information registered in the user registration word dictionary and can also correct it if there is a problem.

【００６６】The configuration according to the seventh aspect of the present invention adds, to any of the configurations according to the first to fourth aspects, a character registration word dictionary. In this dictionary, information corresponding to partial word sequences generated from character string information representing the readings of words is registered in the same representation format as in the user registration word dictionary. During recognition by the main speech recognition means, word speech models, each a concatenation of partial word speech models, are obtained both from the information corresponding to each partial word sequence registered in the user registration word dictionary and from the information corresponding to each partial word sequence registered in the character registration word dictionary, and the speech uttered by the user is recognized using all of these word speech models.

【００６７】In such a configuration, although words are registered in the user registration word dictionary and the character registration word dictionary by different methods, the representation format (registration format) of the information corresponding to the partial word sequences is the same. Speech recognition can therefore use both dictionaries at once with a single recognition method, which simplifies the configuration of the device. The character registration word dictionary may be built into the device in advance with the recognition words already registered, or it may be supplied on a removable recording medium. It may also be populated within the same device from word reading information entered through character input means such as a keyboard.

【００６８】In the configuration according to the eighth aspect of the present invention, the partial word sequence generation means in any of the configurations according to the first to fourth aspects recognizes the speech uttered by the user and generates a sequence of at least one partial word not only in the word registration mode but also in the speech recognition mode. In addition, in-use word registration determination means is newly provided. In the speech recognition mode, this means determines whether the partial word sequence generated by the partial word sequence generation means may be registered, based on at least one of the likelihood of that partial word sequence, the recognition result of the main speech recognition means, and the likelihood of that recognition result, and additionally registers the information of the partial word sequence in the user registration word dictionary according to the determination.

【００６９】In such a configuration, even when the user's unusual pronunciation (for example, a strong accent) lowers the recognition accuracy of the main speech recognition means and tends to lower the likelihood of its recognition results, the recognition accuracy can be improved. The partial word sequence generation means runs its own recognition in parallel with the main speech recognition means, the in-use word registration determination means decides whether to register the resulting partial word sequence information in the user registration word dictionary, and from the next utterance onward the registered information is also used for recognition.

【００７０】Several methods can be used to determine whether a partial word sequence may be registered. In a first method, only partial word sequences whose likelihood is greater than a reference value may be registered. In a second method, the likelihood of the partial word sequence is compared with the likelihood of the recognition result, and only partial word sequences whose likelihood exceeds that of the recognition result by at least a reference value may be registered. In a third method, when the likelihood of the recognition result is smaller than a reference value, either all partial word sequences or the highest-likelihood sequences up to a fixed number may be registered. In a fourth method, when the word uttered by the user is known and does not match the recognition result, the partial word sequences generated by the partial word sequence generation means may be registered. A fifth method adds the likelihood of the partial word sequence to the condition of the fourth method, for example allowing registration when the word uttered by the user does not match the recognition result and the likelihood of the partial word sequence is greater than a reference value. To make the fourth or fifth method applicable, a special mode (adaptation mode) is provided, together with an interface function that, in that mode, presents a word from the device to the user (the word may be one specified by the user) and has the user utter it.
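The five determination methods above can be compared side by side in a short sketch. Everything here is illustrative: the function name, the argument names, and the choice of a single `reference` parameter for every method are assumptions, and the "up to a fixed number" variant of the third method is left out for brevity.

```python
from typing import Optional


def may_register(method: int, seq_lik: float, result_lik: float, reference: float,
                 spoken: Optional[str] = None, recognized: Optional[str] = None) -> bool:
    """Decide whether one generated partial word sequence may be registered.

    seq_lik    : likelihood of the generated partial word sequence
    result_lik : likelihood of the main recognizer's result
    spoken     : the word the user was asked to utter (adaptation mode only)
    recognized : the word returned by the main recognizer
    """
    mismatch = spoken is not None and spoken != recognized
    if method == 1:   # sequence likelihood above the reference value
        return seq_lik > reference
    if method == 2:   # sequence beats the recognition result by at least the reference
        return seq_lik - result_lik >= reference
    if method == 3:   # recognition result itself looks unreliable
        return result_lik < reference
    if method == 4:   # known word was misrecognized
        return mismatch
    if method == 5:   # misrecognized and the sequence is plausible
        return mismatch and seq_lik > reference
    raise ValueError("method must be 1..5")
```

Methods 4 and 5 return `False` whenever `spoken` is `None`, which mirrors the text's point that they only apply in a mode where the uttered word is known.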

【００７１】This makes it possible to realize a speech recognition device that adapts progressively to the user's speech.

【００７２】In the configuration according to the ninth aspect of the present invention, the partial word sequence generation means in any of the configurations according to the first to fourth aspects recognizes the speech uttered by the user and generates a sequence of at least one partial word not only in the word registration mode but also in the speech recognition mode. In addition, in-use word registration confirmation means is newly provided. When the main speech recognition means outputs a recognition result, this means accepts a partial word sequence registration instruction from the user and, on receiving such an instruction, additionally registers in the user registration word dictionary the information corresponding to the partial word sequence that was generated by the partial word sequence generation means and designated for registration.

【００７３】In such a configuration, even when the user's unusual pronunciation (for example, a strong accent) lowers the recognition accuracy of the main speech recognition means, lowers the likelihood of the recognition result, and produces a wrong result, the recognition accuracy can be improved. The partial word sequence generation means runs its own recognition in parallel with the main speech recognition means, the resulting partial word sequence information is registered in the user registration word dictionary in accordance with a registration instruction that the user gives after seeing the recognition result, and from the next utterance onward the registered information is also used for recognition. This makes it possible to realize a speech recognition device that adapts progressively to the user's speech.

【００７４】The configuration according to the eighth or ninth aspect can also be modified as follows. In place of the partial word sequence generation means described there, a partial word sequence generation means is used that, in the speech recognition mode, performs recognition on the input speech and generates a partial word sequence only when a predetermined condition is satisfied. Input speech storage means is newly provided to temporarily store the speech uttered by the user in the speech recognition mode. In place of the in-use word registration determination means described above, an in-use word registration determination means is provided that, in the speech recognition mode, determines whether the condition is satisfied based on the likelihood of the recognition result of the main speech recognition means. When it determines that the condition is satisfied, it feeds the speech held in the input speech storage means to the partial word sequence generation means and activates that means. It then determines whether the generated partial word sequence may be registered, based on at least one of the likelihood of the generated partial word sequence, the recognition result of the main speech recognition means, and the likelihood of that recognition result, and additionally registers the information of the partial word sequence in the user registration word dictionary according to the determination.

【００７５】In such a configuration, when the likelihood of the recognition result of the main speech recognition means does not satisfy the condition, for example when that likelihood is high, the partial word sequence generation means performs no recognition. In other words, it is not activated for the user's ordinary utterances, so this configuration is suitable for implementing the partial word sequence generation means and related functions on a computer that is not fast.
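The conditional activation described above can be sketched as follows. This is only an illustration under several assumptions: `recognize` and `generate` are hypothetical callables standing in for the main speech recognition means and the partial word sequence generation means, the condition is taken to be "result likelihood below a floor", and generated sequences are registered under the recognized word.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Sequence_ = List[str]


def recognize_with_lazy_adaptation(
    speech: str,
    recognize: Callable[[str], Tuple[str, float]],
    generate: Callable[[str], Iterable[Tuple[Sequence_, float]]],
    result_floor: float,
    seq_floor: float,
    dictionary: Dict[str, List[Sequence_]],
) -> str:
    """Run the main recognizer; run the costly generator only for doubtful results."""
    buffered = speech                          # plays the role of the input speech storage
    word, result_lik = recognize(buffered)
    if result_lik < result_floor:              # assumed form of the predetermined condition
        for seq, seq_lik in generate(buffered):
            if seq_lik > seq_floor:            # one possible registration test
                dictionary.setdefault(word, []).append(seq)
    return word
```

For a confident result the generator is never called, which is the property the text relies on for slow hardware.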

【００７６】[0076]

DESCRIPTION OF THE PREFERRED EMBODIMENTS Embodiments of the present invention are described below with reference to the drawings.

【００７７】[First Embodiment] FIG. 1 is a block diagram of a subword-type speaker-independent speech recognition apparatus according to a first embodiment of the present invention.

【００７８】In the apparatus of FIG. 1, the user can select between two modes: word registration and speech recognition (recognition processing). During word registration, the mode switching unit 11 routes the input speech to the partial word sequence generation unit 12, which converts it into a sequence of at least one partial word, each corresponding to a phoneme or similar phonological unit, and information corresponding to that partial word sequence is registered in the user registration word dictionary 13. During recognition processing, the mode switching unit 11 routes the input speech to the main speech recognition unit 14, which recognizes it using word speech models obtained from the contents of the user registration word dictionary 13. Each word speech model is a concatenation of partial word speech models, for example word HMM parameters formed by concatenating partial word HMM parameters.
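The routing performed by the mode switching unit 11 can be pictured with a small dispatcher. This is a sketch only: `Mode`, `route`, and the injected `generate` and `recognize` callables are hypothetical names standing in for units 11, 12 and 14, and the dictionary is modeled as a plain mapping from word name to registered sequences.

```python
from enum import Enum
from typing import Callable, Dict, List, Optional


class Mode(Enum):
    REGISTER = "register"
    RECOGNIZE = "recognize"


def route(mode: Mode, speech: str, word_name: Optional[str],
          dictionary: Dict[str, List[List[str]]],
          generate: Callable[[str], List[List[str]]],
          recognize: Callable[[str, Dict[str, List[List[str]]]], Optional[str]]) -> Optional[str]:
    """Send the input either to the sequence generator or to the main recognizer."""
    if mode is Mode.REGISTER:
        # Word registration: store every generated sequence under the word name.
        dictionary.setdefault(word_name, []).extend(generate(speech))
        return None
    # Recognition: the recognizer consults the same dictionary.
    return recognize(speech, dictionary)
```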

【００７９】As described above, during word registration the input speech is first input to the partial word sequence generation unit 12, which converts it into a sequence of partial words.

【００８０】FIG. 2 shows an example of the internal configuration of the partial word sequence generation unit 12.

【００８１】Here, the partial word sequence generation unit 12 comprises an acoustic analysis unit 121, a quantization unit 122, a partial word connection table 123, a partial word HMM recognition unit 124, a partial word HMM dictionary 125, and a word HMM generation unit 126. The acoustic analysis unit 121 and the quantization unit 122 are the same as the acoustic analysis unit 441 and the quantization unit 442 used in the main speech recognition unit 440 of the conventional subword-type speaker-independent speech recognition apparatus shown in FIG. 44. The partial word HMM dictionary 125 corresponds to the partial word HMM dictionary 460 in FIG. 44.

【００８２】The partial word connection table 123 registers the combinations of phonemes that can be directly connected. This table 123 is used to improve recognition accuracy by exploiting constraints of spoken Japanese such as "a consonant is never followed directly by another consonant" and "the geminate consonant (sokuon) and the moraic nasal (hatsuon) never occur at the beginning of a word". FIG. 3 shows an example of the partial word connection table 123. In this example, the phonemes that can follow are registered for each preceding partial word. In FIG. 3, the partial word "#" is the symbol for a virtual phoneme representing the beginning of a word, and "&" is the symbol for a virtual phoneme representing the end of a word.
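A connection table of this kind can be held as a mapping from each preceding partial word to the set of partial words allowed to follow it. The sketch below uses a small made-up table, not the contents of FIG. 3; the phoneme inventory, the symbol `Q` for the geminate consonant, and the function name are assumptions for illustration.

```python
from typing import Dict, List, Set

# Hypothetical toy table: "#" marks the word start, "&" the word end.
CONNECTION_TABLE: Dict[str, Set[str]] = {
    "#": {"a", "i", "u", "k", "y"},              # no "N" or "Q" at the word start
    "a": {"i", "u", "k", "y", "N", "Q", "&"},
    "i": {"a", "u", "k", "N", "&"},
    "u": {"a", "i", "k", "N", "&"},
    "k": {"a", "i", "u"},                        # a consonant is followed by a vowel
    "y": {"a", "u"},
    "N": {"a", "i", "u", "k", "y", "&"},
    "Q": {"k"},
}


def is_valid_sequence(seq: List[str], table: Dict[str, Set[str]]) -> bool:
    """Check every adjacent pair, including the virtual start and end symbols."""
    path = ["#", *seq, "&"]
    return all(nxt in table.get(prev, set()) for prev, nxt in zip(path, path[1:]))
```

With this toy table, `["y", "a", "k", "u", "i", "N"]` is accepted, while `["k", "y", "a"]` (consonant after consonant) and `["N", "a"]` (moraic nasal at the word start) are rejected.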

【００８３】The partial word HMM recognition unit 124 uses the partial word HMM dictionary 125 to recognize the input speech in units of partial words (here, phonemes) and outputs, as the recognition result, at least one partial word sequence whose partial words are connected in accordance with the partial word connection table 123. The partial word HMM recognition unit 124 is described in detail below with reference to the flowcharts of FIGS. 4 to 6.

【００８４】The number of sequences that can be formed by connecting partial words according to the partial word connection table 123 is unbounded. The partial word HMM recognition unit 124 therefore generates these sequences dynamically while searching for the partial word model sequences that output the label sequence corresponding to the input speech, together with their Viterbi scores.

【００８５】The computation proceeds as follows.

【００８６】Let a partial word sequence be written $X = [x(1), x(2), \ldots, x(J(X))]$, where $J(X)$ is the length of the sequence. The array $D$ used in the Viterbi algorithm is held separately for each partial word sequence used in matching and is denoted $D^{X}$. The size of $D^{X}$ along the state dimension is held in $H^{X}$, which equals $N^{x(J(X))}$, the number of states of the partial word HMM corresponding to the last partial word $x(J(X))$ of $X$. The virtual phoneme "#" is assigned one state, so $H^{[\#]} = 1$. The collection $Z$ holds the arrays $D^{X}$.

【００８７】First, the array $D^{[\#]}$ is created, initialized so that $D^{[\#]}(1,1) = 0$, and added to $Z$ (steps S1 to S3).

【００８８】Next, steps S6 to S20 are repeated while $t$ is increased from 1 to $T$ in steps of 1 (steps S4, S5, S21).

【００８９】In steps S6 to S20, the arrays $D^{X}$ are taken from $Z$ one at a time in ascending order of sequence length $J(X)$ (steps S6, S7), and steps S8 to S20 are repeated for each.

【００９０】In step S8, it is checked whether $X = [\#]$. If so, $D^{X}(t,1)$ is set to $0$ when $t = 1$ and to $-\infty$ when $t \neq 1$ (step S9), and processing continues from step S18.

【００９１】If, on the other hand, $X \neq [\#]$, the first state of the partial word is processed as described below (steps S10, S11, S12, or steps S10, S11, S13).
S11, S12 or S10, S11, S13) are performed.

【００９２】First, the history obtained by removing the last partial word from the history $X = [x(1), x(2), \ldots, x(J(X))]$ of the $D^{X}$ currently under consideration is taken as $X_1 = [x(1), x(2), \ldots, x(J(X)-1)]$ (step S10).

【００９３】Next, it is checked whether $D^{X_1}$ and $H^{X_1}$ exist in $Z$ (step S11). If they do, they are retrieved and step S12 is performed; if they do not, step S13 is performed.

【００９４】In step S12, the larger of $d_1 = D^{X_1}(t, H^{X_1})$ and $d_2 = D^{X}(t-1,1) + \ln p(1,1) + \ln q(1,1,y(t))$ is assigned to $D^{X}(t,1)$.

【００９５】In step S13, by contrast, $d = D^{X}(t-1,1) + \ln p(1,1) + \ln q(1,1,y(t))$ is computed and assigned to $D^{X}(t,1)$.

【００９６】After step S12 or S13 has been executed, the second and subsequent states of the partial word (the second state through the $N^{x(J(X))}$-th state) are processed as described below (steps S14 to S17).

【００９７】Here, step S16 is repeated while $n$ is increased from 2 to $N^{x(J(X))}$ in steps of 1 (steps S15, S17). In step S16, $d_1 = D^{X}(t-1,n-1) + \ln p(n-1,n) + \ln q(n-1,n,y(t))$ and $d_2 = D^{X}(t-1,n) + \ln p(n,n) + \ln q(n,n,y(t))$ are computed, and the larger of $d_1$ and $d_2$ is assigned to $D^{X}(t,n)$.
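The updates of steps S12, S13 and S16 can be collected into a single recurrence, restated below for reference. Reading $p(i,j)$ as the probability of the transition from state $i$ to state $j$ and $q(i,j,y)$ as the probability of emitting label $y$ on that transition is an assumption based on the usual discrete-HMM Viterbi formulation; both belong to the partial word HMM of the last partial word $x(J(X))$, and $y(t)$ is the $t$-th input label.

```latex
\begin{align*}
D^{X}(t,1) &= \max\bigl\{\, D^{X_1}(t, H^{X_1}),\;
              D^{X}(t-1,1) + \ln p(1,1) + \ln q(1,1,y(t)) \,\bigr\}
              && \text{(S12)} \\
D^{X}(t,1) &= D^{X}(t-1,1) + \ln p(1,1) + \ln q(1,1,y(t))
              && \text{(S13, } D^{X_1} \text{ not in } Z\text{)} \\
D^{X}(t,n) &= \max\bigl\{\, D^{X}(t-1,n-1) + \ln p(n-1,n) + \ln q(n-1,n,y(t)),\;
              D^{X}(t-1,n) + \ln p(n,n) + \ln q(n,n,y(t)) \,\bigr\},
              \quad n = 2, \ldots, N^{x(J(X))}
              && \text{(S16)}
\end{align*}
```

The first term of the S12 maximum is what links one partial word to the next: the first state of the new partial word inherits, at the same time $t$, the score of the final state of the preceding sequence $X_1$.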

【００９８】Next, in step S18, it is checked whether $D^{X}(t, H^{X})$ is $-\infty$. If it is, nothing further is done and processing returns to step S6. If it is not, the generation of new $D$ arrays described below (steps S19, S20) is executed.

【００９９】Here, all partial words $x'_1, x'_2, \ldots$ that can follow the last partial word $x(J(X))$ of $X$ are looked up in the partial word connection table 123, and each is appended to $X$ to form the new sequences $X'_1, X'_2, \ldots$ (step S19).

【０１００】That is, $X'_1 = [x(1), x(2), \ldots, x(J(X)), x'_1]$, $X'_2 = [x(1), x(2), \ldots, x(J(X)), x'_2]$, and so on.

【０１０１】Next, for each of the sequences $X'_1, X'_2, \ldots$ generated in step S19, it is checked whether the corresponding $D$ already exists in $Z$; if it does not, a new $D$ and $H$ are created and added to $Z$ (step S20). Every newly created $D$ is initialized to $D(0 \ldots T, 1 \ldots N) = -\infty$.

【０１０２】With the execution of step S20, processing of the $D^{X}$ currently under consideration ends, and processing returns to step S6.

【０１０３】The above operations are repeated while $t$ is increased from 1 to $T$ in steps of 1 and, for each $t$, the arrays $D^{X}$ are taken from $Z$ one at a time in ascending order of sequence length $J(X)$. Once $t$ exceeds $T$, each $D^{X}(T, H^{X})$ in $Z$ holds the Viterbi score for its sequence $X$. The partial word connection table 123 is then consulted to select the partial word sequences whose last partial word can connect to "&", and sorting these in descending order yields the pairs of partial word sequence $X$ and Viterbi score, ordered from the highest Viterbi score.
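The search of steps S1 to S21 can be sketched in Python as below. This is an illustrative reading of the text, not the patent's implementation, and it makes several labeled assumptions: `SubwordHMM` is a hypothetical container with 0-based states; time index 0 is used as the start point so that label $y(t)$ is consumed on the move from $t-1$ to $t$ (the text starts at $t = 1$); every real partial word HMM has at least two states so that each partial word consumes at least one label; "&" is never expanded; sequences whose final score is $-\infty$ are dropped from the result; and no pruning is applied, so it is only practical for very short inputs.

```python
import heapq
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

NEG_INF = -math.inf


@dataclass
class SubwordHMM:
    """Hypothetical left-to-right HMM with states 0 .. n_states-1."""
    n_states: int
    log_p: Dict[Tuple[int, int], float]        # ln p(i, j)
    log_q: Dict[Tuple[int, int, str], float]   # ln q(i, j, y)

    def step(self, d_prev: float, i: int, j: int, y: str) -> float:
        return d_prev + self.log_p.get((i, j), NEG_INF) + self.log_q.get((i, j, y), NEG_INF)


def search(labels: Sequence[str], hmms: Dict[str, SubwordHMM],
           table: Dict[str, Sequence[str]]) -> List[Tuple[Tuple[str, ...], float]]:
    if any(h.n_states < 2 for h in hmms.values()):
        raise ValueError("each partial word HMM needs at least two states")
    T = len(labels)
    root = ("#",)
    # Z maps a sequence X to its array D^X, stored as D[t][state].
    Z: Dict[Tuple[str, ...], List[List[float]]] = {root: [[NEG_INF] for _ in range(T + 1)]}
    for t in range(T + 1):
        heap = [(len(X), X) for X in Z]          # ascending sequence length (S6, S7)
        heapq.heapify(heap)
        while heap:
            _, X = heapq.heappop(heap)
            D = Z[X]
            if X == root:                        # S8, S9
                D[t][0] = 0.0 if t == 0 else NEG_INF
            else:
                hmm = hmms[X[-1]]
                y = labels[t - 1] if t > 0 else ""
                before = D[t - 1] if t > 0 else [NEG_INF] * hmm.n_states
                parent = Z.get(X[:-1])           # S10, S11
                enter = parent[t][-1] if parent is not None else NEG_INF
                D[t][0] = max(enter, hmm.step(before[0], 0, 0, y))       # S12 / S13
                for n in range(1, hmm.n_states):                         # S14 to S17
                    D[t][n] = max(hmm.step(before[n - 1], n - 1, n, y),
                                  hmm.step(before[n], n, n, y))
            if D[t][-1] == NEG_INF:              # S18
                continue
            for nxt in table.get(X[-1], ()):     # S19
                X2 = X + (nxt,)
                if nxt != "&" and X2 not in Z:   # S20
                    Z[X2] = [[NEG_INF] * hmms[nxt].n_states for _ in range(T + 1)]
                    heapq.heappush(heap, (len(X2), X2))
    results = [(X[1:], D[T][-1]) for X, D in Z.items()
               if X != root and "&" in table.get(X[-1], ()) and D[T][-1] != NEG_INF]
    return sorted(results, key=lambda r: r[1], reverse=True)
```

Pushing a newly created sequence onto the same heap lets it be processed within the current time step, after its shorter parent, so its first state can pick up the parent's final-state score at that same $t$.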

【０１０４】In practice, the number of $D$ arrays held in $Z$ grows explosively as $T$ increases, so the generation and computation of $D$ are often restricted to certain conditions in order to speed up processing.

【０１０５】As a simple measure, step S18 is changed to compare against $f(t) = \alpha t$ (where $\alpha$ is a constant) instead of $-\infty$, which restricts the generation of new $D$ arrays and keeps down the number of $D$ in $Z$. In addition, when returning to step S6, if all of the values $D^{X}(t, 1 \ldots N)$ of the $D^{X}$ under consideration are smaller than $g(t) = \beta t$ (where $\beta$ is a constant), that $D^{X}$ may be deleted from $Z$, which further reduces the number of $D$ in $Z$.
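The two pruning tests can be written as small predicates. This is a sketch under assumptions: the function names are made up, the text does not say whether the comparisons are strict (strict comparisons are used here), and $\alpha$ and $\beta$ are taken to be negative constants because the scores are sums of log probabilities.

```python
import math
from typing import Sequence


def should_expand(final_score: float, t: int, alpha: float) -> bool:
    """Replacement for the step S18 test: expand X only if D^X(t, H^X) beats f(t) = alpha * t."""
    return final_score > alpha * t


def should_drop(row: Sequence[float], t: int, beta: float) -> bool:
    """Delete D^X from Z when every state score D^X(t, 1..N) is below g(t) = beta * t."""
    return all(score < beta * t for score in row)
```

Because both thresholds scale with $t$, they track the natural decline of log scores as more labels are consumed; a sequence still at $-\infty$ is never expanded and is always eligible for deletion.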

【０１０６】If no speed-up technique is used at all, step S13 is unnecessary. However, when a speed-up such as the one using $g(t)$ is applied, the $D^{X_1}$ referred to in step S11 may have been deleted, so step S13 is required.

【０１０７】Various other computation and speed-up methods exist for obtaining partial word sequences, and the present invention does not depend on which recognition method is used for this part.

【０１０８】If there are partial word sequences whose Viterbi score for the input label sequence, obtained as described above, exceeds $g(T) = \gamma T$, a function of the input label sequence length $T$, the partial word HMM recognition unit 124 selects and outputs all of them; if there are none, it outputs the partial word sequence with the highest Viterbi score.

【０１０９】Other methods of selecting the partial word sequences to output are also possible, such as outputting only the sequence with the highest Viterbi score or outputting a predetermined number of the top-ranked sequences.
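The default selection rule, "all sequences above $\gamma T$, otherwise the single best", can be sketched as follows. The function name and the `(sequence, score)` tuple layout are assumptions for illustration; the input is taken to be the scored list produced by the search.

```python
from typing import List, Tuple

Scored = Tuple[Tuple[str, ...], float]  # (partial word sequence, Viterbi score)


def select_outputs(scored: List[Scored], T: int, gamma: float) -> List[Scored]:
    """Return every sequence scoring above gamma * T, best first;
    if none qualifies, fall back to the single highest-scoring sequence."""
    above = [s for s in scored if s[1] > gamma * T]
    if above:
        return sorted(above, key=lambda s: s[1], reverse=True)
    return [max(scored, key=lambda s: s[1])] if scored else []
```

The alternatives mentioned in the text correspond to always taking `max(...)` or to slicing the sorted list to a fixed length.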

【０１１０】The one or more partial word sequences output by the partial word HMM recognition unit 124 are passed to the word HMM generation unit 126. The word HMM generation unit 126 looks up the partial word HMM dictionary 125 using the partial words that make up each partial word sequence output from the partial word HMM recognition unit 124. The registered contents of the partial word HMM dictionary 125 are the same as those of the partial word HMM dictionary 460 of the conventional subword-type speaker-independent speech recognition apparatus shown in FIG. 44 (see FIG. 46): the parameters of the partial word HMMs, which serve as the various partial word speech models, are registered in pairs with partial word names (partial word model names).

【０１１１】The word HMM generation unit 126 then concatenates the partial word HMMs (their parameters) registered in the partial word HMM dictionary 125, in the order given by the partial word sequence output from the partial word HMM recognition unit 124, and thereby generates a word HMM (its parameters) as the word speech model of the word uttered by the user. As the information on that partial word sequence, the word HMM generation unit 126 registers the pair consisting of the word name of the uttered word and the corresponding word HMM (its parameters) in the user registered word dictionary 13.
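The concatenation step can be illustrated with a small sketch. The representation is an assumption made for illustration only: each partial word HMM is modelled as a plain list of state identifiers, whereas the specification only says that HMM parameters are stored per partial word name. The names `build_word_hmm` and `register_word` are hypothetical.

```python
def build_word_hmm(sequence, partial_word_hmms):
    """Concatenate the partial word HMMs named by `sequence`, in order.

    partial_word_hmms: dict mapping a partial word name to its HMM,
    modelled here (as an assumption) as a list of state identifiers.
    """
    word_hmm = []
    for name in sequence:
        word_hmm.extend(partial_word_hmms[name])
    return word_hmm


def register_word(dictionary, word_name, sequences, partial_word_hmms):
    """Store one word HMM per partial word sequence under `word_name`."""
    entries = dictionary.setdefault(word_name, [])
    for sequence in sequences:
        entries.append(build_word_hmm(sequence, partial_word_hmms))
    return dictionary
```

A word for which the recognizer produced two partial word sequences ends up with two word HMMs under the same word name, which matches the multiple entries per word described for the user registered word dictionary 13.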

【０１１２】FIG. 7 shows an example of the user registered word dictionary 13. It shows the registered pairs of word name and word HMM for the following case at word registration time (word registration mode): the user utters "shain" for the word "employee" (社員) and the partial word HMM recognition unit 124 in the partial word sequence generation unit 12 outputs the single partial word sequence "y, a, i, N"; and the user utters "yakuin" for the word "officer" (役員) and the partial word HMM recognition unit 124 outputs the two partial word sequences "y, a, k, u, i, N" and "y, a, p, u, i, N".

【０１１３】At recognition time (speech recognition mode), on the other hand, the input speech is routed by the mode switching unit 11 to the main speech recognition unit 14.

【０１１４】In exactly the same manner as the conventional subword-type speaker-independent speech recognition apparatus shown in FIG. 44, the main speech recognition unit 14 uses the parameters of each word HMM registered in the user registered word dictionary 13 (which corresponds to the word HMM dictionary 450 in FIG. 44) to compute, word by word, the Viterbi score for the input label sequence. The main speech recognition unit 14 then outputs the word with the maximum Viterbi score as the recognition result.

【０１１５】The configuration of the main speech recognition unit 14 is the same as in the conventional subword-type speaker-independent speech recognition apparatus. As shown in FIG. 8, it has an acoustic analysis unit 141, a quantization unit 142, and an HMM recognition unit 143 (corresponding to the acoustic analysis unit 441, quantization unit 442, and HMM recognition unit 443 in FIG. 44). The acoustic analysis unit 141 and quantization unit 142 in the main speech recognition unit 14 need not be provided separately from the acoustic analysis unit 121 and quantization unit 122 in the partial word sequence generation unit 12; sharing one set makes the other unnecessary.

【０１１６】With the example user registered word dictionary 13 of FIG. 7, when the user says "shain" at recognition time, the main speech recognition unit 14 computes, for the label sequence generated from this speech, the Viterbi scores of the word HMM for "employee" and of the word HMMs for "officer" (of which there are two).

【０１１７】If the Viterbi score for "employee" is -40 and the Viterbi scores for "officer" are -80 and -100, the recognition result of the main speech recognition unit 14 is the word "employee".

【０１１８】Likewise, when the user says "yakuin", Viterbi scores are computed in the same way for the label sequence generated from this speech. If the Viterbi score for "employee" is -50 and the Viterbi scores for "officer" are -30 and -40, the recognition result is the word "officer".
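The two worked examples above reduce to taking, over all registered words, the word whose best word HMM score is largest. A minimal sketch, assuming the Viterbi scores have already been computed and are passed in as a dictionary (the function name `recognize` and this data layout are illustrative, not from the specification):

```python
def recognize(scores_by_word):
    """Return the word whose best word HMM has the maximum Viterbi score.

    scores_by_word: dict mapping a word name to the list of Viterbi
    scores of all word HMMs registered for that word (assumed layout).
    """
    return max(scores_by_word, key=lambda word: max(scores_by_word[word]))


# Scores from the two examples in the text.
utterance_shain = {"employee": [-40], "officer": [-80, -100]}
utterance_yakuin = {"employee": [-50], "officer": [-30, -40]}
```

Here `recognize(utterance_shain)` selects "employee" and `recognize(utterance_yakuin)` selects "officer", in line with the results stated in the text.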

【０１１９】In the above example, the parameters of the word HMMs are registered directly in the user registered word dictionary 13. Alternatively, the partial word sequences output from the partial word HMM recognition unit 124 in the partial word sequence generation unit 12 may themselves be registered in the user registered word dictionary 13, as shown in FIG. 9.

【０１２０】When the format of FIG. 9 (rather than FIG. 7) is used as the dictionary registration format (dictionary structure) of the user registered word dictionary 13, the partial word sequence generation unit 12 only has to output partial word sequences and register them in the user registered word dictionary 13. It does not need to generate word HMMs, so, unlike in FIG. 2, the word HMM generation unit 126 need not be provided in the partial word sequence generation unit 12. FIG. 10 shows the configuration of the partial word sequence generation unit 12 in this case.

【０１２１】In the main speech recognition unit 14, on the other hand, a partial word HMM dictionary 145 and a word HMM generation unit 146 (corresponding to the partial word HMM dictionary 125 and the word HMM generation unit 126 in FIG. 2) must be added, as shown in FIG. 11 and unlike in FIG. 8. The word HMM generation unit 146 refers to the user registered word dictionary 13 to obtain the partial word sequence of each word, looks up the partial word HMM dictionary 145 with the partial words making up that sequence to obtain the partial word HMM (its parameters) of each partial word, and concatenates them to generate the word HMM of each word.

【０１２２】The HMM recognition unit 143 computes the Viterbi score of the word HMM of each word generated by the word HMM generation unit 146, and outputs the word with the maximum Viterbi score as the recognition result.

【０１２３】When the registration format of the user registered word dictionary 13 is that of FIG. 9 and the configurations of FIGS. 10 and 11 are applied, if the partial word HMM dictionary (125) used by the partial word sequence generation unit 12 of FIG. 10 and the partial word HMM dictionary (145) used by the main speech recognition unit 14 of FIG. 11 have identical contents, one of them may be shared and the other omitted. In that case, the shared partial word HMM dictionary may be provided outside the partial word sequence generation unit 12 and the main speech recognition unit 14. The word HMM generation unit (126 or 146) may likewise be provided outside, rather than as a part of, the partial word sequence generation unit 12 or the main speech recognition unit 14.

【０１２４】Also, when the registration format of the user registered word dictionary 13 is that of FIG. 9, the partial word HMMs used by the main speech recognition unit 14 of FIG. 11 may differ from the partial word HMMs used by the partial word sequence generation unit 12 of FIG. 10. For example, the partial word sequence generation unit 12 may use five-state partial word HMMs for high-accuracy matching when generating partial word sequences, while the main speech recognition unit 14 uses three-state partial word HMMs in order to match against a large number of words at high speed. When the partial word HMMs used by the main speech recognition unit 14 differ in this way from those used by the partial word sequence generation unit 12, separate partial word HMM dictionaries (125, 145) may be prepared, as in the examples of FIGS. 10 and 11.

【０１２５】The partial word inventory used by the main speech recognition unit 14 may also differ from the one used by the partial word sequence generation unit 12. For example, the main speech recognition unit 14 may use, as partial words, phonemes distinguished by the immediately preceding phoneme. That is, the "a" following the phoneme "k" is written "(k)a" and the "a" following the phoneme "s" is written "(s)a", so that the two are treated as different partial words.

【０１２６】To support this, as shown in FIG. 12, a partial word HMM dictionary 15 that follows the partial word inventory used by the main speech recognition unit 14, and a word HMM generation unit 16, must be provided.

【０１２７】Apart from using phonemes distinguished by the immediately preceding phoneme as partial words, the registration format of the partial word HMM dictionary 15 is the same as that of the partial word HMM dictionaries 125 and 145 described so far, and of the partial word HMM dictionary 460 (see FIG. 44). FIG. 13 shows an example of the partial word HMM dictionary 15. The symbol "#" in FIG. 13 denotes a virtual phoneme representing the beginning of a word.

【０１２８】In the configuration of FIG. 12, the word HMM generation unit 16 converts the partial word sequence (phoneme sequence) generated by the partial word sequence generation unit 12 into the partial word inventory used by the main speech recognition unit 14. The word HMM generation unit 16 then selects the partial word HMMs registered in the partial word HMM dictionary 15 according to the converted partial word sequence, connects them to create a word HMM, and registers it in the user registered word dictionary 13.

【０１２９】For example, consider the case where the user registers the word "employee" by uttering "shain" and the partial word sequence generation unit 12 outputs the sequence "y, a, i, N".

【０１３０】In this case, the word HMM generation unit 16 converts each phoneme of the partial word sequence "y, a, i, N", taking the immediately preceding phoneme into account, into the partial word sequence "(#)y, (y)a, (a)i, (i)N". It then retrieves the corresponding partial word HMMs from the partial word HMM dictionary 15 in the order "(#)y, (y)a, (a)i, (i)N" and concatenates them to generate the word HMM.
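The conversion into preceding-phoneme-dependent partial words can be sketched as below. The function name `to_context_dependent` is hypothetical; the "(prev)phoneme" notation and the word-initial marker "#" follow the text.

```python
def to_context_dependent(sequence, start="#"):
    """Rewrite each phoneme as "(prev)phoneme", using `start` before the first."""
    converted = []
    previous = start
    for phoneme in sequence:
        converted.append(f"({previous}){phoneme}")
        previous = phoneme
    return converted


# ["y", "a", "i", "N"] becomes ["(#)y", "(y)a", "(a)i", "(i)N"]
context_sequence = to_context_dependent(["y", "a", "i", "N"])
```

The converted names would then serve as lookup keys into the partial word HMM dictionary 15 before concatenation.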

【０１３１】The operations of the partial word sequence generation unit 12 and the main speech recognition unit 14 are the same as in the configuration of FIG. 1.

【０１３２】[Second Embodiment] Next, a second embodiment of the present invention will be described.

【０１３３】First, consider the case where the user registers the word "employee" by uttering "shain". If noise is picked up immediately before the speech during this registration, a partial word sequence that also covers the noise portion is generated, for example "p, a, h, u, sy, a, i, N". Here the portion "p, a, h, u" is a partial word sequence erroneously generated for the noise.

【０１３４】If this sequence is registered as is in the user registered word dictionary (13), the Viterbi score of the word HMM for the word "employee" ("p, a, h, u, sy, a, i, N") against the speech "shain" uttered by the user at recognition time becomes small. "Employee" is therefore less likely to be chosen as the recognition result, and recognition performance degrades.

【０１３５】The second embodiment provides a mechanism that automatically prevents such erroneous partial word sequences from being registered.

【０１３６】FIG. 14 is a block diagram of a subword-type speaker-independent speech recognition apparatus according to the second embodiment of the present invention. Parts identical to those in FIG. 1 are denoted by the same reference numerals.

【０１３７】In the configuration of FIG. 14, the partial word sequence generation unit 12 converts the input speech into one or more partial word sequences (at least one partial word sequence) and outputs them. Unlike the configuration of FIG. 10 in the first embodiment, the partial word sequence generation unit 12 also outputs the Viterbi score of each sequence together with the sequence.

【０１３８】The configuration of FIG. 14 is characterized by the addition of a registration condition determination unit 21 to the configuration of FIG. 1. The partial word sequences output from the partial word sequence generation unit 12, and their Viterbi scores, are sent to this registration condition determination unit 21.

【０１３９】The registration condition determination unit 21 compares the Viterbi score, sent from the partial word sequence generation unit 12 in a pair with each partial word sequence, against the registration decision function γ(T) = RT (R is a constant), which is a function of the label sequence length T. Only when the score is larger than the value of γ(T) = RT does it register the corresponding partial word sequence in the user registered word dictionary 13.
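The registration decision is a single comparison per candidate. A minimal sketch follows; the names `passes_registration` and `filter_for_registration`, the `(sequence, score)` layout, and the sample value of R are assumptions for illustration only.

```python
def passes_registration(score, T, R):
    """True when the Viterbi score exceeds the decision function R * T."""
    return score > R * T


def filter_for_registration(scored, T, R):
    """Keep only the (sequence, score) pairs that satisfy the condition."""
    return [(seq, score) for seq, score in scored if passes_registration(score, T, R)]
```

For instance, with T = 25 and R = -3.0 the threshold is -75, so a noise-contaminated sequence scoring -120 would be rejected while a clean sequence scoring -40 would be registered. Unlike the selection rule of the first embodiment, there is no fallback here: if every candidate fails, nothing is registered.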

【０１４０】Because partial word HMMs are built so that the Viterbi score for speech of the corresponding partial word is large, the Viterbi score for non-speech, of which noise is typical, is usually small. In the example above, therefore, the Viterbi score for the partial word sequence "p, a, h, u, sy, a, i, N" is smaller than the Viterbi score expected when a normal speech segment is converted into a partial word sequence.

【０１４１】Accordingly, by having the registration condition determination unit 21 decide, on the basis of the Viterbi score of each partial word sequence output from the partial word sequence generation unit 12, whether that sequence should be registered in the user registered word dictionary 13, erroneous partial word sequences with poor scores are automatically prevented from being registered in the user registered word dictionary 13.

【０１４２】Various methods are conceivable for the registration decision in the registration condition determination unit 21, that is, for deciding whether a partial word sequence generated by the partial word sequence generation unit 12 is to be registered in the user registered word dictionary 13. The present invention does not restrict the decision method; for example, registration can also be limited by the number of partial word sequences, as described below.

【０１４３】In some cases many partial word sequences with relatively large Viterbi scores appear. The partial word sequence generation unit 12 of the first embodiment outputs all partial word sequences whose Viterbi score is larger than g(T).

【０１４４】In an apparatus with tight memory (storage area) constraints, however, it is desirable to limit the number of partial word sequences per word so that each word occupies as little space as possible in the user registered word dictionary 13.

【０１４５】In such a case, the registration decision in the registration condition determination unit 21 is made as follows, where N is the maximum number of partial word sequences per word.

【０１４６】If the number of partial word sequences is N or fewer, all of them are registered in the user registered word dictionary 13. If the number of partial word sequences is more than N, the sequences are sorted in descending order of Viterbi score and the top N are registered in the user registered word dictionary 13.

【０１４７】In this way, the number of partial word sequences registered in the user registered word dictionary 13 can be limited.
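The count limit described above amounts to a sort and truncation. A minimal sketch, assuming candidates arrive as `(sequence, score)` pairs (the function name `limit_to_top_n` and the data layout are illustrative, not from the specification):

```python
def limit_to_top_n(scored, n):
    """Keep at most n partial word sequences, preferring larger Viterbi scores.

    scored: list of (partial_word_sequence, viterbi_score) pairs.
    n:      maximum number of partial word sequences per word (N in the text).
    """
    if len(scored) <= n:
        return list(scored)
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return ranked[:n]
```

Python's `sorted` is stable, so candidates with equal scores keep their original relative order; the specification does not say how ties should be broken.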

【０１４８】[Third Embodiment] As described for the second embodiment, when a non-speech segment is included in the conversion into a partial word sequence because of noise or the like, the recognition performance of the main speech recognition unit (14) at recognition time degrades.

【０１４９】In most cases partial word HMMs do not match noise well, so the Viterbi score of such a partial word sequence is small. Depending on the type of noise, however, the noise may happen to match the partial word HMMs and the Viterbi score of the partial word sequence may become large. In that case, the registration decision by the registration condition determination unit 21 of the second embodiment cannot prevent the erroneous partial word sequence from being registered.

【０１５０】However, a partial word sequence generally corresponds to the speech uttered by the user, so such errors can be found by having the user check the partial word sequence.

【０１５１】The third embodiment provides a mechanism in which the user's confirmation prevents such erroneous partial word sequences from being registered.

【０１５２】FIG. 15 is a block diagram of a subword-type speaker-independent speech recognition apparatus according to the third embodiment of the present invention. Parts identical to those in FIG. 14 are denoted by the same reference numerals.

【０１５３】The configuration of FIG. 15 is characterized by using a registration confirmation unit 31 in place of the registration condition determination unit 21 in the configuration of FIG. 14; in other words, a registration confirmation unit 31 is added to the configuration of FIG. 1. The partial word sequences output from the partial word sequence generation unit 12 are sent to this registration confirmation unit 31.

【０１５４】As shown in FIG. 16, the registration confirmation unit 31 comprises a partial word sequence character string conversion unit 311, a partial word sequence display character string correspondence table 312, a user operation unit 313, a character string display processing unit 314, a display 315, and a user operation determination unit 316.

【０１５５】The partial word sequence character string conversion unit 311 converts a partial word sequence output by the partial word sequence generation unit 12 into a character string that is easy for the user to understand. This embodiment describes an example in which the sequence is converted to hiragana for display so that the user can check it easily. As shown in FIG. 17, the partial word sequence character string conversion unit 311 comprises a partial word sequence hiragana conversion unit 311a and a partial word sequence hiragana correspondence table 311b.

【０１５６】FIG. 18 shows an example of the partial word sequence hiragana correspondence table 311b. In the example of FIG. 18, the table registers, as pairs, each partial word sequence that can be converted to hiragana and the corresponding hiragana (its character code).

【０１５７】The partial word sequence hiragana conversion unit 311a operates as follows, in accordance with the flowchart of FIG. 19.

【０１５８】First, the hiragana conversion unit 311a receives the partial word sequence sent from the partial word sequence generation unit 12 (step S31). Let this partial word sequence be X = [x(1), x(2), ..., x(J(X))], and let S be the character string obtained by the conversion (the result string).

【０１５９】Next, the hiragana conversion unit 311a empties the conversion buffer a and the result string buffer S, and initializes to 1 the pointer i that points to a partial word within the partial word sequence X (steps S31, S32).

【０１６０】Next, the hiragana conversion unit 311a repeats steps S35 to S40, described below, for i = 1 through i = J(X), and ends the processing once i exceeds J(X) (step S34).

【０１６１】That is, when i is J(X) or less (step S34), the hiragana conversion unit 311a first appends the i-th partial word x(i) of the partial word sequence X to the conversion buffer a (step S35).

【０１６２】Next, the hiragana conversion unit 311a increments i by 1.

【０１６３】Next, the character string conversion unit 311 searches the partial word sequence hiragana correspondence table 311b for a partial word sequence (or partial word) equal to the partial word sequence (or partial word) in the conversion buffer a (step S37). If one is found (step S38), processing proceeds to step S39; if none is found, processing returns to step S34.

【０１６４】In step S39, the hiragana string corresponding to the partial word sequence in the conversion buffer a is obtained from the partial word sequence hiragana correspondence table 311b and appended to the result string buffer S. The contents of the buffer a are then cleared (step S40), and processing returns to step S34.

【０１６５】By repeating the above operation for i = 1 through i = J(X), the conversion of the partial word sequence X into a hiragana string is completed, and the conversion result (the hiragana string) is obtained in the result string buffer S.
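The loop of FIG. 19 as described above can be sketched as follows. The table contents are hypothetical stand-ins for the partial word sequence hiragana correspondence table 311b (the actual entries of FIG. 18 are not reproduced in the text), and the code uses 0-based indexing where the text uses i = 1 to J(X).

```python
# Hypothetical excerpt of the correspondence table 311b (illustrative only).
HIRAGANA_TABLE = {
    ("g", "a"): "が", ("b", "a"): "ば", ("d", "a"): "だ",
    ("sy", "a"): "しゃ", ("g", "u"): "ぐ",
    ("i",): "い", ("N",): "ん",
}


def to_hiragana(X, table=HIRAGANA_TABLE):
    """Convert a partial word sequence X into a hiragana string."""
    S = ""        # result string buffer S
    a = []        # conversion buffer a
    i = 0         # pointer into X (0-based here)
    while i < len(X):          # step S34: stop once i passes the end of X
        a.append(X[i])         # step S35: append x(i) to the buffer
        i += 1                 # increment i
        key = tuple(a)
        if key in table:       # steps S37/S38: is the buffer in the table?
            S += table[key]    # step S39: append the matching hiragana
            a = []             # step S40: clear the buffer
    return S
```

Two properties follow from the flow as described and may be worth noting: the buffer is emitted as soon as it matches a table entry (shortest match first), and anything left unmatched in the buffer when X ends is silently dropped. Whether the actual table of FIG. 18 is arranged so that these cases never arise is not stated in the text.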

【０１６６】The hiragana string converted from a partial word sequence by the partial word sequence character string conversion unit 311 (by the hiragana conversion unit 311a within it) is registered, paired with that partial word sequence, in the partial word sequence display character string correspondence table 312, and is at the same time sent to the character string display processing unit 314. FIG. 20 shows an example of the entries in this partial word sequence display character string correspondence table 312.

【０１６７】The user operation unit 313 in FIG. 16 has four key switches (none shown): one meaning "cursor up", one meaning "cursor down", one meaning "affirm" (here, "register"), and one meaning "deny" (here, "do not register"). When any key switch is pressed, the operation information is output to the character string display processing unit 314.

【０１６８】At word registration time, the character string display processing unit 314 displays on the display 315 a word registration confirmation screen for the word specified by the user, and shows on that screen the character strings (hiragana strings) output by the partial word sequence character string conversion unit 311.

【０１６９】FIG. 21 shows a display example of the word registration confirmation screen. On this screen, for each display field (character string display field) 211 showing a character string output from the partial word sequence character string conversion unit 311, there is a registration instruction field 212 for entering whether or not that character string is to be registered. There is also one further registration instruction field (decision field) 213 for finalizing the entries in the registration instruction fields 212. The registration instruction fields 212 and 213 together form a registration instruction input field 214. The word registration confirmation screen also displays a cursor 215 that can be moved up and down within the registration instruction input field 214.

【０１７０】The character string display processing unit 314 receives the user's operation information from the user operation unit 313 and changes the display accordingly, as follows.

【０１７１】For "cursor up": if there is a registration instruction field 212 above the row where the cursor 215 currently is, the cursor 215 is moved to the registration instruction field 212 one row above.

【０１７２】For "cursor down": if there is a registration instruction field 212 or 213 below the row where the cursor 215 currently is, the cursor 215 is moved to the registration instruction field 212 or 213 one row below. When the destination is the registration instruction field 213, that is, the decision field 213, the hiragana strings marked "register" are output to the user operation determination unit 316 and the operation ends.

【０１７３】For "affirm": the "register" mark (here, ○) is placed in the registration instruction field 212 where the cursor 215 currently is.

【０１７４】For "deny": the "do not register" mark (here, ×) is placed in the registration instruction field 212 where the cursor 215 currently is.
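The four key operations above form a small state machine, sketched below. The class name `ConfirmationScreen`, the operation names, and the choice to start the cursor on the first row are assumptions; the specification does not state the initial cursor position or how unmarked rows are treated (here they are simply not registered).

```python
class ConfirmationScreen:
    """Toy model of the word registration confirmation screen (FIG. 21)."""

    REGISTER = "○"
    SKIP = "×"

    def __init__(self, strings):
        if not strings:
            raise ValueError("at least one character string is required")
        self.strings = list(strings)
        self.marks = [None] * len(self.strings)  # one mark per field 212
        self.cursor = 0          # assumption: cursor starts on the first row
        self.confirmed = None    # set once the decision field 213 is reached

    def handle(self, op):
        """Apply one key operation; return the confirmed strings once decided."""
        if self.confirmed is not None:
            return self.confirmed
        if op == "up":
            if self.cursor > 0:
                self.cursor -= 1
        elif op == "down":
            self.cursor += 1
            if self.cursor == len(self.strings):  # moved into decision field
                self.confirmed = [
                    s for s, m in zip(self.strings, self.marks)
                    if m == self.REGISTER
                ]
        elif op == "yes":
            self.marks[self.cursor] = self.REGISTER
        elif op == "no":
            self.marks[self.cursor] = self.SKIP
        else:
            raise ValueError(f"unknown operation: {op}")
        return self.confirmed
```

`handle` returns `None` while the user is still marking rows and returns the list of strings marked for registration once the cursor moves into the decision field, which corresponds to the output passed to the user operation determination unit 316.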

【０１７５】The display example of FIG. 21 shows the word registration confirmation screen for the case where the user requests registration of the word "employee" (社員) and utters "shain" for it, and, because of noise, the partial word sequence generation unit 12 outputs three partial word sequences: "g, a, b, a, sy, a, i, N", "g, a, d, a, sy, a, i, N", and "g, a, b, a, sy, a, i, g, u". Each partial word sequence is converted into a hiragana string by the partial word sequence character string conversion unit 311 in the registration confirmation unit 31, and "gabashain" (がばしゃいん), "gadashain" (がだしゃいん), and "gabashaigu" (がばしゃいぐ) are displayed in the character string display fields 211. Here, a ○ mark meaning "register" is shown in the registration instruction field 212 corresponding to the character string display field 211 showing "gabashain", and the cursor 215 has been moved to the registration instruction field 212 corresponding to the character string display field 211 showing "gadashain" so that "register" or "do not register" can be selected.

[0176] When the cursor 215 enters the decision column 213, the user operation determination unit 316 takes the character strings output from the character string display processing unit 314, that is, the character strings the user has designated for registration, converts them into partial word sequences using the partial word sequence display character string correspondence table 312 (such as the one shown in FIG. 20), and registers those partial word sequences in the user registration word dictionary 13.

[0177] In the display example of FIG. 21, the displayed character strings fit within the display width of the screen. If a displayed character string is longer than the display width, a mechanism for scrolling left and right may be provided, or the string may be wrapped onto multiple lines. If the number of displayed character strings exceeds the number of lines on the screen, a mechanism for scrolling up and down may be provided.

[0178] As described above, in this embodiment a word that the user inputs by voice is converted into a partial word sequence before it is registered in the dictionary. Before that registration, the partial word sequence is converted into a character string that is easy for the user to understand (here, a hiragana string) and presented to the user. The user can thereby check in advance, at the level of the character string rather than at the level of the partial word sequence (which is hard for the user to understand), whether the content about to be registered by voice, that is, the partial word sequence output from the partial word sequence generation unit 12, contains a recognition error, and can prevent an erroneous partial word sequence from being registered.

[0179] When a partial word sequence is converted into a character string and presented to the user, the character string need not only be displayed; speech representing the character string may also be output by rule-based synthesis or the like and presented to the user.

[0180] [Fourth Embodiment] Next, a fourth embodiment of the present invention will be described. The fourth embodiment realizes a mechanism by which, even when an erroneous partial word sequence is output, the user can correct (at least part of) that sequence into the correct partial word sequence by a simple editing operation, so that the correct partial word sequence can be registered.

[0181] FIG. 22 is a block diagram of a subword-type speaker-independent speech recognition apparatus according to the fourth embodiment of the present invention. Parts identical to those in FIG. 15 are denoted by the same reference numerals.

[0182] The configuration of FIG. 22 is characterized in that a registration editing unit 41 is used in place of the registration confirmation unit 31 in the configuration of FIG. 15; in other words, the registration editing unit 41 is added to the configuration of FIG. 1. The partial word sequence output from the partial word sequence generation unit 12 is sent to this registration editing unit 41.

[0183] As shown in FIG. 23, the registration editing unit 41 comprises a partial word sequence character string conversion unit 411 (having the same configuration as the partial word sequence character string conversion unit 311 of FIG. 17, which is a component of the registration confirmation unit 31 in the third embodiment), a user operation unit 413, a character string display processing unit 414, a display 415, and a character string partial word sequence conversion unit 416.

[0184] The user operation unit 413 has a key switch meaning "cursor up", a key switch meaning "cursor down", a key switch meaning "cursor left", a key switch meaning "cursor right", a key switch meaning "field switch", a key switch meaning "affirm" ("register"), a key switch meaning "deny" ("do not register"), a key switch meaning "delete" (of a character), and key switches corresponding to the individual hiragana characters (none of them shown). When any key switch is pressed, the corresponding operation information is output to the character string display processing unit 414.

[0185] When the registration of a word specified by the user is to be confirmed, the character string display processing unit 414 displays a word registration editing screen on the display 415 and shows on that screen the character strings (hiragana strings) corresponding to the partial word sequences output from the partial word sequence generation unit 12. These character strings are output by the partial word sequence character string conversion unit 411, which performs the same conversion as the partial word sequence character string conversion unit 311 of FIG. 17. For the conversion from a partial word sequence into a character string (hiragana string), the conversion unit 411 uses a partial word sequence hiragana correspondence table (not shown) whose contents are identical to those of the partial word sequence hiragana correspondence table 311b in the conversion unit 311 (see FIG. 18).

[0186] FIG. 24 shows a display example of the word registration editing screen. The screen has a character string editing field 241 for displaying and editing each character string output from the partial word sequence character string conversion unit 411, and a registration instruction input field 242 for entering an instruction as to whether each character string in the character string editing field 241 is to be registered. The registration instruction input field 242 consists of registration instruction columns 243, one for each character string in the character string editing field 241, and one further registration instruction column (decision column) 244 for confirming the entries made in the registration instruction columns 243. The screen also displays a cursor 245 that can move within the character string editing field 241 and the registration instruction input field 242.

[0187] The character string display processing unit 414 receives the user's operation information from the user operation unit 413 and changes the display accordingly.

[0188] First, the operation when the cursor 245 is in the registration instruction input field 242 is as follows.

[0189] In the case of "cursor up", if there is a registration instruction column 243 above the row in which the cursor 245 is currently located, the cursor 245 is moved to the registration instruction column 243 one row up.

[0190] In the case of "cursor down", if there is a registration instruction column 243 or 244 below the row in which the cursor 245 is currently located, the cursor 245 is moved to the registration instruction column 243 or 244 one row down. If the destination is the registration instruction column 244, that is, the decision column 244, the hiragana strings marked "register" are output to the character string partial word sequence conversion unit 416, and the operation ends.

[0191] In the case of "affirm", a "register" mark (here, a ○ mark) is placed in the registration instruction column 243 in which the cursor 245 is currently located.

[0192] In the case of "deny", a "do not register" mark (here, a × mark) is placed in the registration instruction column 243 in which the cursor 245 is currently located.

[0193] In the case of "field switch", the cursor 245 is moved to the head of the corresponding character string in the character string editing field 241.

[0194] Any other operation is ignored.

[0195] The operation when the cursor 245 is in the character string editing field 241, on the other hand, is as follows.

[0196] In the case of "cursor right", if there is a character immediately to the right of the character at which the cursor 245 is currently located, the cursor 245 is moved one character to the right.

[0197] In the case of "cursor left", if there is a character immediately to the left of the character at which the cursor 245 is currently located, the cursor 245 is moved one character to the left.

[0198] In the case of "delete", the character at which the cursor 245 is currently located is erased, and all characters to its right are shifted one position to the left.

[0199] In the case of any of the hiragana character keys, the character at which the cursor 245 is currently located and all characters to its right are shifted one position to the right, and the hiragana character is displayed in the vacated position (the position where the cursor 245 originally was).

[0200] In the case of "field switch", the cursor 245 is moved to the corresponding registration instruction column 243 in the registration instruction input field 242.

[0201] Any other operation is ignored.
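The editing-field operations described above (cursor left and right, delete, and hiragana insertion) can be illustrated with a short function. This is a minimal sketch under stated assumptions: the function name `edit_string`, the key names, and the zero-based cursor index are hypothetical, and the cursor is left where it was after an insertion or deletion because the description does not say how it moves. "Field switch" is not modeled here.

```python
def edit_string(text, cursor, key):
    """Apply one key to a string in the editing field.

    text   -- the hiragana string being edited
    cursor -- zero-based index of the character under the cursor
    key    -- "left", "right", "delete", or a single hiragana character
    Returns the pair (new_text, new_cursor).
    """
    if key == "right":
        if cursor + 1 < len(text):          # a character exists to the right
            cursor += 1
    elif key == "left":
        if cursor > 0:                      # a character exists to the left
            cursor -= 1
    elif key == "delete":
        if cursor < len(text):              # erase and close the gap leftwards
            text = text[:cursor] + text[cursor + 1:]
    elif len(key) == 1:                     # treated as a hiragana character key
        # shift the cursor character and everything after it one place right
        text = text[:cursor] + key + text[cursor:]
    # any other key is ignored
    return text, cursor
```

For example, with the cursor on the second character of "がだしゃいん", one "delete" gives "がしゃいん"; moving left and deleting again gives "しゃいん".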

[0202] As in the third embodiment, the display example of FIG. 24 shows the word registration editing screen for the case where the user requests registration of the word "社員" (employee) and utters "しゃいん" (shain) for that word, and, owing to noise, the partial word sequence generation unit 12 outputs three partial word sequences: "g, a, b, a, sy, a, i, N", "g, a, d, a, sy, a, i, N", and "g, a, b, a, sy, a, i, g, u". Each partial word sequence is converted into a hiragana string by the partial word sequence character string conversion unit 411 in the registration editing unit 41, and "がばしゃいん" (gabashain), "がだしゃいん" (gadashain), and "がばしゃいぐ" (gabashaigu) are displayed in the character string editing field 241. Here, a × mark meaning "do not register" is displayed in the registration instruction column 243 corresponding to the displayed string "がばしゃいん", and the cursor 245 has been moved to the position of "だ" in the displayed string "がだしゃいん".

[0203] In this state, when the user operates the user operation unit 413 and presses the "delete" key switch, the character string display processing unit 414 deletes "だ" from the character string "がだしゃいん", which thereby becomes "がしゃいん". When the user then moves the cursor 245 to the position of "が" in "がしゃいん" and presses the "delete" key switch, the character string display processing unit 414 deletes "が" from "がしゃいん". In this way, the character string "がだしゃいん" is edited in the character string editing field and, as shown in FIG. 25, corrected to "しゃいん", the correct hiragana string for the input speech of the word "社員".

[0204] In this state, when the user presses the "field switch" key switch, the cursor 245 is moved to the registration instruction column 243 corresponding to the character string "しゃいん". When the user then presses the "affirm" key switch, a "register" mark (○) is displayed in the registration instruction column 243 corresponding to the character string "しゃいん", as shown in FIG. 25.

[0205] Likewise, when the user moves the cursor 245 to the registration instruction column 243 corresponding to the character string "がばしゃいぐ" in the character string editing field 241 and presses the "deny" key switch, a "do not register" mark (×) is displayed in that registration instruction column 243, as shown in FIG. 25.

[0206] In this state, when the user moves the cursor 245 to the decision column 244, the character string display processing unit 414 outputs the hiragana string "しゃいん", which carries the "register" mark (○), to the character string partial word sequence conversion unit 416. The character string partial word sequence conversion unit 416 converts the hiragana string output from the character string display processing unit 414 into a partial word sequence by the reverse of the operation of the partial word sequence character string conversion unit 411 (which operates in the same way as the conversion unit 311 of the third embodiment), and registers that partial word sequence in the user registration word dictionary 13.

[0207] The character string partial word sequence conversion unit 416 will now be described in detail.

[0208] As shown in FIG. 26, the character string partial word sequence conversion unit 416 comprises a hiragana partial word sequence conversion unit 416a and a partial word sequence hiragana correspondence table 416b. The contents of the table 416b are identical to those of the partial word sequence hiragana correspondence table (not shown) in the partial word sequence character string conversion unit 411, that is, to those of the partial word sequence hiragana correspondence table 311b in the partial word sequence character string conversion unit 311 of FIG. 17 (see FIG. 18). The partial word sequence character string conversion unit 411 and the character string partial word sequence conversion unit 416 can therefore share a single partial word sequence hiragana correspondence table.

[0209] The hiragana partial word sequence conversion unit 416a in the character string partial word sequence conversion unit 416 converts a hiragana string into a partial word sequence as follows.

[0210] Let S be the hiragana string, J(S) its length, and S(i) its i-th hiragana character. Let X be (the buffer for) the partial word sequence obtained as the result of the conversion.

[0211] (1) Empty X.

[0212] (2) Repeat step (3) while incrementing i by 1 from 1 to J(S).

[0213] (3) Find the partial word sequence corresponding to the hiragana S(i) in the partial word sequence hiragana correspondence table 416b, and append that partial word sequence to X.
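Steps (1) to (3) above can be sketched in a few lines. The table excerpt below is an assumption: its entries are inferred from the example strings and partial word sequences given in this description (for instance "しゃいん" and "sy, a, i, N"), not copied from FIG. 18. The sketch also assumes that a contracted sound such as "しゃ" counts as one hiragana unit S(i), so the helper `split_units` attaches a small ゃ, ゅ, or ょ to the preceding character; the description does not say how such units are delimited.

```python
# Hypothetical excerpt of correspondence table 416b (hiragana unit -> partial words).
KANA_TO_SUBWORDS = {
    "が": ["g", "a"], "ば": ["b", "a"], "だ": ["d", "a"],
    "しゃ": ["sy", "a"], "い": ["i"], "ん": ["N"], "ぐ": ["g", "u"],
}

SMALL_KANA = "ゃゅょ"


def split_units(s):
    """Split a hiragana string into units, joining small kana to the previous one."""
    units = []
    for ch in s:
        if ch in SMALL_KANA and units:
            units[-1] += ch
        else:
            units.append(ch)
    return units


def kana_to_subwords(s, table=KANA_TO_SUBWORDS):
    """Convert hiragana string S into partial word sequence X."""
    x = []                              # (1) empty X
    for unit in split_units(s):         # (2) i = 1 .. J(S)
        x.extend(table[unit])           # (3) look up S(i) and append to X
    return x


print(kana_to_subwords("しゃいん"))
```

A unit missing from the table raises `KeyError` in this sketch; how the apparatus handles such a case is not stated in the description.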

[0214] As described above, in this embodiment a word that the user inputs by voice is converted into a partial word sequence before it is registered in the dictionary. Before that registration, the partial word sequence is converted into a character string that is easy for the user to understand (here, a hiragana string), presented to the user, and made available for editing by the user. Even if the content about to be registered by voice, that is, the partial word sequence output from the partial word sequence generation unit 12, contains a recognition error, the user can check it in advance and correct it at the level of the character string rather than at the level of the partial word sequence (which is hard for the user to understand). Moreover, the corrected character string is automatically converted into a partial word sequence and registered. In this embodiment, therefore, content registered by voice can be edited by editing a character string.

[0215] [Fifth Embodiment] Next, a fifth embodiment of the present invention will be described.

[0216] In the embodiments described above, for example the first embodiment, few problems arise as long as the user registers each word with the ordinary reading of its kanji notation. If, however, a word is registered with an idiosyncratic reading or an abbreviation, the user may forget how it was registered, which causes a problem. In addition, when several users share one speech recognition apparatus, a user cannot tell what other users have registered. Allowing the user to check the contents registered by voice is therefore very useful for maintaining and managing the recognition apparatus. Furthermore, if another speech recognition apparatus that supports registration by characters is available, re-registering the confirmed (displayed) character strings in that apparatus makes it easy to copy contents registered by voice to the other apparatus.

[0217] The fifth embodiment realizes a mechanism that makes it possible to present the registered contents of the user registration word dictionary 13 to the user in an easily understandable form.

[0218] FIG. 27 is a block diagram of a subword-type speaker-independent speech recognition apparatus according to the fifth embodiment of the present invention. Parts identical to those in FIG. 1 are denoted by the same reference numerals.

[0219] The configuration of FIG. 27 is characterized in that a user word registration dictionary display unit 51 is added to the configuration of FIG. 1. The user registration word dictionary 13 in FIG. 27 is assumed to use the registration format shown in FIG. 9, that is, a format in which pairs of a word name and the partial word sequence constituting the corresponding word are registered.

[0220] The user word registration dictionary display unit 51 converts the partial word sequence information registered in the user registration word dictionary 13 into character information that is easy for the user to understand, for example a hiragana string, and presents it to the user. As shown in FIG. 28, it comprises a partial word sequence character string conversion unit 511 (having the same configuration as the partial word sequence character string conversion unit 311 in the third embodiment), a character string display processing unit 514, and a display 515.

[0221] The partial word sequence character string conversion unit 511 reads pairs of a word name and a partial word sequence from the user registration word dictionary 13, applies to each partial word sequence the same conversion as the partial word sequence character string conversion unit 311, and outputs to the character string display processing unit 514 pairs consisting of the word name read from the user registration word dictionary 13 and the character string obtained by converting the corresponding partial word sequence.

[0222] Accordingly, if the contents of the user registration word dictionary 13 are as shown in FIG. 9, the pair of the word "社員" (employee) and the character string "やいん", the pair of the word "役員" (officer) and the character string "やくいん", and the pair of the word "役員" and the character string "やぷいん" are output to the character string display processing unit 514.
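The listing step performed by the conversion unit 511 can be sketched as follows. Everything specific here is an assumption made for illustration: the greedy longest-match lookup stands in for the conversion of unit 311, whose actual procedure is described elsewhere in this document; the table excerpt is hypothetical; and the partial word sequences in the usage example are inferred from the hiragana strings quoted above rather than taken from FIG. 9.

```python
# Hypothetical excerpt of a partial-word-sequence -> hiragana table.
SUBWORDS_TO_KANA = {
    ("y", "a"): "や", ("k", "u"): "く", ("p", "u"): "ぷ",
    ("i",): "い", ("N",): "ん",
}


def subwords_to_kana(seq, table=SUBWORDS_TO_KANA):
    """Convert a partial word sequence to hiragana by greedy longest match."""
    out, i = [], 0
    max_len = max(len(k) for k in table)
    while i < len(seq):
        # try the longest candidate first, then shorter ones
        for n in range(min(max_len, len(seq) - i), 0, -1):
            key = tuple(seq[i:i + n])
            if key in table:
                out.append(table[key])
                i += n
                break
        else:
            raise ValueError(f"no table entry at position {i}: {seq[i]!r}")
    return "".join(out)


def list_dictionary(entries, table=SUBWORDS_TO_KANA):
    """Turn (word name, partial word sequence) pairs into (word name, hiragana) pairs."""
    return [(name, subwords_to_kana(seq, table)) for name, seq in entries]


entries = [
    ("社員", ["y", "a", "i", "N"]),            # assumed sequences for illustration
    ("役員", ["y", "a", "k", "u", "i", "N"]),
    ("役員", ["y", "a", "p", "u", "i", "N"]),
]
for name, kana in list_dictionary(entries):
    print(name, kana)
```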

[0223] The character string display processing unit 514 lists the pairs of word name and character string output from the partial word sequence character string conversion unit 511 on the display 515. The user can thereby easily check the registered contents of the user registration word dictionary 13. FIG. 29 shows an example of this display.

[0224] Although the fifth embodiment has been described for the case where the user word registration dictionary display unit 51 is added to the configuration of FIG. 1, the user word registration dictionary display unit 51 can also be added to the configuration of FIG. 2, FIG. 14, FIG. 15, or FIG. 22.

[0225] [Sixth Embodiment] Next, a sixth embodiment of the present invention will be described.

[0226] In the fifth embodiment, the contents of the user registration word dictionary 13 are presented in a form that is easy for the user to understand, so the user can easily check the registered contents. However, the fifth embodiment has no function for editing (changing or deleting) an erroneous partial word sequence that has been generated by the partial word sequence generation unit 12 and registered in the user registration word dictionary 13, so such an entry may adversely affect the recognition performance of the main speech recognition unit 14.

[0227] The sixth embodiment therefore realizes a mechanism by which the registered contents of the user registration word dictionary 13 can be not only checked but also edited.

[0228] FIG. 30 is a block diagram of a subword-type speaker-independent speech recognition apparatus according to the sixth embodiment of the present invention. Parts identical to those in FIG. 27 are denoted by the same reference numerals.

[0229] The configuration of FIG. 30 is characterized in that a user word registration dictionary editing unit 61 is used in place of the user word registration dictionary display unit 51 in the configuration of FIG. 27; in other words, the user word registration dictionary editing unit 61 is added to the configuration of FIG. 1.

[0230] As shown in FIG. 31, the user word registration dictionary editing unit 61 comprises a partial word sequence character string conversion unit 611 (having the same configuration as the partial word sequence character string conversion unit 311 in the third embodiment), a user operation unit 613, a character string display processing unit 614, a display 615, and a dictionary operation unit 616.

[0231] The user registration word dictionary 13 in FIG. 30 used in this embodiment employs the registration format shown in FIG. 9, that is, a format in which pairs of a word name and the partial word sequence constituting the corresponding word are registered. In addition, as shown in FIG. 32, each registered entry is assigned a unique number (hereinafter referred to as a word number).

[0232] The partial word sequence character string conversion unit 611 in the user word registration dictionary editing unit 61 reads pairs of a word name and a partial word sequence from the user registration word dictionary 13 having the structure shown in FIG. 32, applies to each partial word sequence the same conversion as the partial word sequence character string conversion unit 311, and outputs to the character string display processing unit 614 sets consisting of the word number assigned to the word name, the word name, and the character string obtained by converting the corresponding partial word sequence.

[0233] The user operation unit 613 has a key switch meaning "cursor up", a key switch meaning "cursor down", a key switch meaning "cursor left", a key switch meaning "cursor right", a key switch meaning "field switch", a key switch meaning "delete item", a key switch meaning "change item", a key switch meaning "delete" (of a character), and key switches corresponding to the individual hiragana characters (none of them shown). When any key switch is pressed, the corresponding operation information is output to the character string display processing unit 614.

[0234] When the user registration word dictionary 13 is edited, the character string display processing unit 614 displays a user registration word dictionary editing screen on the display 615 and lists on that screen the word names and character strings output from the partial word sequence character string conversion unit 611.

[0235] FIG. 33 shows a display example of the user registration word dictionary editing screen. The screen has a word name display field 331 for displaying word names; a character string editing field 332 for displaying and editing the character strings that are output from the partial word sequence character string conversion unit 611 together with the word names in the word name display field 331; and an edit instruction input field 333 for entering instructions such as whether a character string in the character string editing field 332 is to be edited (here, changed or deleted). The edit instruction input field 333 consists of edit instruction columns 334, one for each character string in the character string editing field 332, for entering the edit instruction (change or delete), and one further edit instruction column (decision column) 335 for starting the dictionary operation that the dictionary operation unit 616 performs in accordance with the entries in the edit instruction columns 334. The screen also displays a cursor 336 that can move within the character string editing field 332 and the edit instruction input field 333.

【０２３６】The character string display processing unit 614 receives the user's operation information from the user operation unit 613 and changes the display accordingly.

【０２３７】First, the operation when the cursor 336 is in the edit instruction input field 333 is as follows.

【０２３８】For "cursor up": if there is an edit instruction column 334 above the row where the cursor 336 currently is, the cursor 336 is moved up by one to that edit instruction column 334.

【０２３９】For "cursor down": if there is an edit instruction column 334 or 335 below the row where the cursor 336 currently is, the cursor 336 is moved down by one to that edit instruction column 334 or 335. When the destination is the edit instruction column 335, that is, the decision column 335, then for every item marked "delete" or "change", four values (the operation indicated by the mark, the word number, the word name, and the character string) are output as a set to the dictionary operation unit 616, and the operation ends.

【０２４０】For "change item": a "change" mark (here, ○) is placed in the edit instruction column 334 where the cursor 336 currently is.

【０２４１】For "delete item": a "delete" mark (here, ×) is placed in the edit instruction column 334 where the cursor 336 currently is, and the cursor 336 is moved to the first character of the corresponding character string in the character string editing field 332.

【０２４２】Any other operation is ignored.

【０２４３】On the other hand, the operation when the cursor 336 is in the character string editing field 332 is as follows.

【０２４４】For "cursor right": if there is a character immediately to the right of the character at the cursor 336, the cursor 336 is moved one character to the right.

【０２４５】For "cursor left": if there is a character immediately to the left of the character at the cursor 336, the cursor 336 is moved one character to the left.

【０２４６】For "delete": the character at the cursor 336 is erased, and all characters to its right are shifted one position to the left.

【０２４７】For any hiragana character: the character at the cursor 336 and everything to its right are shifted one position to the right, and the hiragana character is displayed in the vacated position (the position where the cursor 336 originally was).

【０２４８】For "field switching": the cursor 336 is moved to the corresponding edit instruction column 334 in the edit instruction input field 333.

【０２４９】Any other operation is ignored.
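The key handling described for the character string editing field 332 can be sketched as a small state holder. This is a minimal illustration, not the implementation of the character string display processing unit 614: the class name `EditField`, the key names, and the Unicode-range hiragana check are all hypothetical. The text does not say whether the cursor advances after a hiragana character is inserted; the sketch assumes that it does, since characters typed in sequence would otherwise end up in reverse order. Field switching is outside the scope of the sketch and is treated like any other unhandled key.

```python
from dataclasses import dataclass


def is_hiragana(ch: str) -> bool:
    """True for a single character in the Unicode hiragana letter range."""
    return len(ch) == 1 and "\u3041" <= ch <= "\u3096"


@dataclass
class EditField:
    """Hypothetical model of one row of the character string editing field."""
    text: str
    cursor: int = 0  # index of the character the cursor is on

    def press(self, key: str) -> None:
        if key == "cursor_right":
            # Move only if a character exists to the right of the cursor.
            if self.cursor + 1 < len(self.text):
                self.cursor += 1
        elif key == "cursor_left":
            # Move only if a character exists to the left of the cursor.
            if self.cursor > 0:
                self.cursor -= 1
        elif key == "delete":
            # Erase the character at the cursor; the rest shifts left.
            if self.cursor < len(self.text):
                self.text = self.text[:self.cursor] + self.text[self.cursor + 1:]
        elif is_hiragana(key):
            # Shift the cursor character and everything after it to the right,
            # then place the new character in the vacated position.
            self.text = self.text[:self.cursor] + key + self.text[self.cursor:]
            self.cursor += 1  # ASSUMPTION: cursor advances after insertion
        # Any other key is ignored.


field = EditField("やいん")
for key in ["delete", "し", "ゃ"]:
    field.press(key)
print(field.text)
```

Under the stated assumption, deleting the first character of "やいん" and then typing "し" and "ゃ" leaves the field holding "しゃいん".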

【０２５０】The display example of FIG. 33 shows the user-registered word dictionary editing screen for the contents of the user-registered word dictionary 13 shown in FIG. 32. That dictionary holds the partial word sequence "y, a, i, N" for word number 1 with the word name "employee" (社員), the partial word sequence "y, a, k, u, i, N" for word number 2 with the word name "officer" (役員), and the partial word sequence "y, a, p, u, i, N" for word number 3 with the word name "officer" (役員). The screen is the one obtained when the corresponding character strings (hiragana strings) "yain" (やいん), "yakuin" (やくいん), and "yapuin" (やぷいん) are output from the partial word sequence-to-character string conversion unit 611 together with the corresponding word numbers and word names.

【０２５１】In this state, by performing appropriate editing operations with the key switches of the user operation unit 613, the user can obtain a user-registered word dictionary editing screen such as the one in FIG. 34.

【０２５２】The screen of FIG. 34 is obtained as follows.

【０２５３】First, as shown in FIG. 33, a "change" mark (○) is displayed in the decision column 335 corresponding to the first-row character string "yain" (やいん) in the character string editing field 332. The cursor 336 then moves to the position of the first character "ya" (や) of that first-row string. In this state, the "delete" key switch is pressed to delete "ya" (や), and the hiragana key switches are then used to enter "shi" (し) and small "ya" (ゃ), which corrects the string "yain" (やいん) to "shain" (しゃいん) as in FIG. 34. Next, the cursor 336 is moved to the decision column 335 corresponding to the third-row character string "yapuin" (やぷいん) in the character string editing field 332, and a "delete" mark (×) is displayed in that decision column 335. FIG. 34 shows the user-registered word dictionary editing screen at this point.

【０２５４】When the cursor is moved to the decision column 335 in this state, the character string display processing unit 614 outputs two sets to the dictionary operation unit 616: the information of the row marked ○, namely the set "change, word number 1, employee (社員), shain (しゃいん)", and the information of the row marked ×, namely the set "delete, word number 3, officer (役員), yapuin (やぷいん)".

【０２５５】The dictionary operation unit 616 receives from the character string display processing unit 614 an information set consisting of the operation, the word number, the word name, and the character string, and operates on the user-registered word dictionary 13 accordingly, as follows.

【０２５６】First, when the operation in the received information set is "change", the dictionary operation unit 616 looks up in the user-registered word dictionary 13 (see FIG. 32) the entry that has the word number given in the set, and replaces the partial word sequence of that entry with the partial word sequence obtained by converting the character string in the set. Thus, when the information set is "change, word number 1, employee (社員), shain (しゃいん)" as above, the partial word sequence "y, a, i, N" in the entry for word number 1 is, as is clear from FIG. 32, replaced with "sy, a, i, N". The conversion from character string to partial word sequence in the dictionary operation unit 616 can be performed in the same way as in the character string-to-partial word sequence conversion unit 416 (specifically, its hiragana-to-partial word sequence conversion unit 416a) of the fourth embodiment.

【０２５７】Next, when the operation in the received information set is "delete", the dictionary operation unit 616 looks up in the user-registered word dictionary 13 the entry that has the word number given in the set, and deletes that entry from the user-registered word dictionary 13. Thus, when the information set is "delete, word number 3, officer (役員), yapuin (やぷいん)" as above, the entry for word number 3 is, as is clear from FIG. 32, deleted.

【０２５８】As a result, once the above operations in the dictionary operation unit 616 are finished, the registered contents of the user-registered word dictionary 13 have changed from the state of FIG. 32 to the state of FIG. 35.
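The "change" and "delete" operations of the dictionary operation unit 616 can be illustrated with a short sketch. Everything named here is an assumption made for illustration: the dictionary layout, the function names, and in particular the table `KANA_TO_SUBWORDS`, which is a tiny hypothetical stand-in for the hiragana correspondence table used by the conversion unit 416a and covers only the strings in this example.

```python
# Hypothetical, deliberately tiny hiragana -> partial word table.
KANA_TO_SUBWORDS = {
    "しゃ": ["sy", "a"],
    "や": ["y", "a"],
    "く": ["k", "u"],
    "ぷ": ["p", "u"],
    "い": ["i"],
    "ん": ["N"],
}


def kana_to_subwords(text):
    """Greedy longest-match conversion of a hiragana string."""
    result = []
    i = 0
    while i < len(text):
        for width in (2, 1):  # try two-character units (e.g. しゃ) first
            chunk = text[i:i + width]
            if chunk in KANA_TO_SUBWORDS:
                result.extend(KANA_TO_SUBWORDS[chunk])
                i += len(chunk)
                break
        else:
            raise ValueError(f"no table entry for {text[i]!r}")
    return result


def apply_operations(dictionary, operations):
    """Apply (operation, word number, word name, string) sets in order."""
    for operation, number, _name, text in operations:
        if operation == "change":
            dictionary[number]["subwords"] = kana_to_subwords(text)
        elif operation == "delete":
            del dictionary[number]


# Contents described for FIG. 32.
user_dictionary = {
    1: {"name": "社員", "subwords": ["y", "a", "i", "N"]},
    2: {"name": "役員", "subwords": ["y", "a", "k", "u", "i", "N"]},
    3: {"name": "役員", "subwords": ["y", "a", "p", "u", "i", "N"]},
}

apply_operations(user_dictionary, [
    ("change", 1, "社員", "しゃいん"),
    ("delete", 3, "役員", "やぷいん"),
])
print(user_dictionary)
```

After the two sets are applied, word number 1 holds "sy, a, i, N" and word number 3 is gone, which is the change the text describes between FIG. 32 and FIG. 35.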

【０２５９】The user word registration dictionary editing unit 61 applied in the sixth embodiment described above is equally applicable to the configuration of FIG. 2, FIG. 14, FIG. 15, or FIG. 22.

【０２６０】[Seventh Embodiment] Next, a seventh embodiment of the present invention will be described.

【０２６１】In conventional subword-type speaker-independent speech recognition devices, words were registered by entering their readings as characters. In particular, words registered at system design time (such as the readings of the common names of operation commands) are often entered and registered as character strings by the system designer.

【０２６２】Meanwhile, the user-registered word dictionary 13 produced by voice-based word registration, as applied in the embodiments described above (for example the first embodiment), is likewise expressed in partial words.

【０２６３】Therefore, by unifying the representation format of the two kinds of word registration information, dictionaries registered by different means (voice and characters) can be used for recognition without distinction. In other words, words registered at system design time and words registered by the user can be used for recognition without distinguishing between them.

【０２６４】The seventh embodiment simplifies the configuration of the main speech recognition unit by making it possible to use words registered at system design time and words registered by the user without distinction.

【０２６５】FIG. 36 is a block diagram of a subword-type speaker-independent speech recognition device according to the seventh embodiment of the present invention. Parts identical to those in FIG. 1 bear the same reference numerals.

【０２６６】The configuration of FIG. 36 has two distinguishing features relative to FIG. 1. First, a character-registered word dictionary 73 with the same representation format (registration format) as the user-registered word dictionary 13 is added. Second, the main speech recognition unit 14 of FIG. 1 is replaced by a main speech recognition unit 74 that recognizes input speech using both word dictionaries, the user-registered word dictionary 13 and the character-registered word dictionary 73.

【０２６７】The character-registered word dictionary 73 is created using a character string-to-partial word sequence conversion unit 75, for example as shown in FIG. 37.

【０２６８】The character string-to-partial word sequence conversion unit 75 has the same conversion function as the character string-to-partial word sequence conversion unit 416 in the configuration of FIG. 26 of the fourth embodiment. It converts a character string (here, a hiragana string) entered from a keyboard or the like into a partial word sequence, based on a partial word sequence/hiragana correspondence table (not shown) with the same contents as the partial word sequence/hiragana correspondence table 311b shown in FIG. 18.

【０２６９】The character string-to-partial word sequence conversion unit 75 also has a word HMM generation function like that of the word HMM generation unit 126 in the first embodiment. Using a partial word HMM dictionary like the one shown in FIG. 46, it concatenates the partial word HMMs (their parameters) registered in that dictionary in the order given by the partial word sequence converted from the input character string, and thereby generates a word HMM (its parameters) that serves as the word speech model of the word formed by the input character string. The conversion unit 75 registers the word HMM (its parameters) generated in this way in the character-registered word dictionary 73, paired with the word name of the word formed by the input character string.
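The concatenation step can be pictured with a deliberately simplified sketch. It is not the word HMM generation of the patent: real partial word HMMs carry transition and output probabilities, whereas here each one is reduced to an ordered list of placeholder states. The names `State`, `SUBWORD_HMMS`, and `build_word_hmm`, the choice of three states per partial word, and the partial word sequences assumed for "しゃがい" and "しゃない" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """Placeholder for the parameters of one HMM state."""
    subword: str
    index: int  # position of the state inside its partial word HMM


def make_subword_hmm(name, n_states=3):
    # ASSUMPTION: every partial word HMM has the same number of states.
    return [State(name, i) for i in range(n_states)]


# Hypothetical partial word HMM dictionary.
SUBWORD_HMMS = {name: make_subword_hmm(name) for name in ["sy", "a", "g", "n", "i"]}


def build_word_hmm(subwords):
    """Concatenate partial word HMMs in sequence order to form a word HMM."""
    states = []
    for subword in subwords:
        states.extend(SUBWORD_HMMS[subword])
    return states


# Word name paired with its word HMM, as in the registration format described.
character_dictionary = {
    "社外": build_word_hmm(["sy", "a", "g", "a", "i"]),  # assumed for しゃがい
    "社内": build_word_hmm(["sy", "a", "n", "a", "i"]),  # assumed for しゃない
}
print(len(character_dictionary["社外"]))
```

The point of the sketch is only the ordering: the word model is the partial word models laid end to end, so the same partial word HMM can be reused at several positions in one word.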

【０２７０】FIG. 38 shows an example of the character-registered word dictionary 73. It shows a registration example in which the user entered the character string "shagai" (しゃがい) to register the word "outside the company" (社外), and the character string "shanai" (しゃない) to register the word "in-house" (社内).

【０２７１】The part that creates the character-registered word dictionary 73 (the character string-to-partial word sequence conversion unit 75) may or may not be built into the speech recognition device of FIG. 36. In the latter case, the contents of the created character-registered word dictionary 73 can be recorded on a removable recording medium such as a floppy disk or CD-ROM and mounted in the speech recognition device, or loaded into a storage device inside the speech recognition device via a communication line or the like.

【０２７２】In this embodiment, the registration format shown in FIG. 7 is applied to the user-registered word dictionary 13. The registration format of the user-registered word dictionary 13 shown in FIG. 7 and that of the character-registered word dictionary 73 shown in FIG. 38 are identical: each registers pairs of a word name and a word HMM (its parameters).

【０２７３】For this reason, the main speech recognition unit 74 can use the character-registered word dictionary 73 in recognition processing in the same way as the user-registered word dictionary 13. Accordingly, even though it uses both the user-registered word dictionary 13 and the character-registered word dictionary 73, the main speech recognition unit 74 can adopt a configuration similar to, for example, that of the main speech recognition unit 14 shown in FIG. 8. The difference is that in the main speech recognition unit 74 the HMM recognition unit (not shown; it corresponds to the HMM recognition unit 143 in FIG. 8) refers to both dictionaries and obtains Viterbi scores for all words contained in both.

【０２７４】Next, a concrete example of recognition processing in the main speech recognition unit 74 using both the user-registered word dictionary 13 and the character-registered word dictionary 73 will be described.

【０２７５】Suppose that, during recognition processing, the user utters "shain" (しゃいん). For the label sequence generated from this utterance, the Viterbi scores of the word HMM for "employee" (社員) and of the word HMMs for "officer" (役員; there are two) are calculated by referring to the user-registered word dictionary 13 with the registered contents of FIG. 7, and the Viterbi scores of the word HMM for "in-house" (社内) and the word HMM for "outside the company" (社外) are calculated by referring to the character-registered word dictionary 73 with the registered contents of FIG. 38. If the Viterbi score of "employee" is -40, the Viterbi scores of "officer" are -80 and -100, the Viterbi score of "outside the company" is -70, and the Viterbi score of "in-house" is -75, the recognition result is the word "employee".

【０２７６】Next, suppose that the user utters "shanai" (しゃない). In this case too, Viterbi scores are calculated in the same way for the label sequence generated from the utterance. If the Viterbi score of "employee" (社員) is -90, the Viterbi scores of "officer" (役員) are -75 and -70, the Viterbi score of "outside the company" (社外) is -55, and the Viterbi score of "in-house" (社内) is -35, the recognition result is the word "in-house".
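The selection rule in these two examples, taking the highest Viterbi score across all entries of both dictionaries, can be sketched as follows. The function name and the list-of-pairs layout are assumptions for illustration; the scores are the ones given in the text, and the computation of the scores themselves is not modeled.

```python
def recognize(user_scores, character_scores):
    """Return the word name with the highest Viterbi score across both dictionaries.

    Each argument is a list of (word name, Viterbi score) pairs, one pair per
    registered word HMM, so a word name may appear more than once.
    """
    candidates = list(user_scores) + list(character_scores)
    best_name, _best_score = max(candidates, key=lambda pair: pair[1])
    return best_name


# Utterance "しゃいん": scores from the first example.
first = recognize(
    [("社員", -40), ("役員", -80), ("役員", -100)],  # user-registered dictionary
    [("社外", -70), ("社内", -75)],                  # character-registered dictionary
)

# Utterance "しゃない": scores from the second example.
second = recognize(
    [("社員", -90), ("役員", -75), ("役員", -70)],
    [("社外", -55), ("社内", -35)],
)
print(first, second)
```

Because the two dictionaries share one registration format, the recognizer only needs to concatenate their entries before ranking; no per-dictionary branch is required.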

【０２７７】In the above example, the user-registered word dictionary 13 and the character-registered word dictionary 73 are held completely separately. When there is no need to distinguish words registered by characters from words registered by voice, however, both may be held in the same area.

【０２７８】For example, when the registered contents of the user-registered word dictionary 13 shown in FIG. 7 and those of the character-registered word dictionary 73 shown in FIG. 38 are held in the area of a common dictionary (hereinafter called a character/voice-registered word dictionary), the character/voice-registered word dictionary is as shown in FIG. 39(a).

【０２７９】Alternatively, as shown in FIG. 39(b), the character/voice-registered word dictionary can hold, for each registered entry, an attribute indicating the means by which it was registered, for example a flag indicating whether it was registered by characters or by voice. This allows both kinds of entry to be mixed in the same area even when they need to be handled separately. In the example of FIG. 39(b) only the registration means is represented by a flag, but other information, such as the date and time of registration, can also be registered as attributes.
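A shared dictionary with a per-entry registration attribute might look like the sketch below. The field names, the string values "voice" and "character", and the optional timestamp are illustrative assumptions; the word HMM parameters are represented by an opaque placeholder string.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    word_name: str
    word_hmm: str                        # placeholder for the word HMM parameters
    registered_by: str                   # "voice" or "character" (the flag)
    registered_at: Optional[str] = None  # optional extra attribute


# Both kinds of entry mixed in one area.
shared_dictionary = [
    Entry("社員", "hmm-1", "voice"),
    Entry("役員", "hmm-2", "voice"),
    Entry("役員", "hmm-3", "voice"),
    Entry("社外", "hmm-4", "character"),
    Entry("社内", "hmm-5", "character"),
]


def entries_registered_by(dictionary, means):
    """Select entries by registration means when they must be handled separately."""
    return [entry for entry in dictionary if entry.registered_by == means]


print([e.word_name for e in entries_registered_by(shared_dictionary, "character")])
```

Recognition can iterate over `shared_dictionary` directly and ignore the flag, while maintenance tasks (for example, clearing only user-registered words) can filter on it.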

【０２８０】The configuration applied in the seventh embodiment described above, in which the user-registered word dictionary 13 and the character-registered word dictionary 73 are used together, is equally applicable to the configuration of FIG. 2, FIG. 14, FIG. 15, FIG. 22, FIG. 27, or FIG. 30.

【０２８１】[Eighth Embodiment] Next, an eighth embodiment of the present invention will be described.

【０２８２】In the embodiments described above, when the user speaks in an unusual way (for example, with a strong accent), the recognition accuracy of the main speech recognition unit (14) may drop. In such cases, the likelihood (Viterbi score) obtained for each word in the recognition processing of the main speech recognition unit (14) tends to fall. When this tendency is present, registering the recognition result of the partial word sequence generation unit (12) in the user-registered word dictionary (13) and using it for recognition from then on makes it possible to raise the recognition accuracy of the main speech recognition unit (14).

【０２８３】In the eighth embodiment, recognition processing is performed on the input speech not only by the main speech recognition unit but also by the partial word sequence generation unit. Based on the results, the device decides whether the partial word sequence should be registered and, if so, registers it automatically in the user-registered word dictionary, which makes it possible to raise the recognition accuracy of the main speech recognition unit.

【０２８４】FIG. 40 is a block diagram of a subword-type speaker-independent speech recognition device according to the eighth embodiment of the present invention. Parts identical to those in FIG. 1 bear the same reference numerals.

【０２８５】The configuration of FIG. 40 has two distinguishing features. First, in speech recognition mode the input speech is input not only to the main speech recognition unit 14 but also to the partial word sequence generation unit 12. Second, an in-use word registration determination unit 81 is newly provided; based on the recognition results of both the partial word sequence generation unit 12 and the main speech recognition unit 14, it decides whether a partial word sequence should be registered and registers it in the user-registered word dictionary 13.

【０２８６】The main speech recognition unit 14 of this embodiment also differs from the preceding embodiments in that it outputs the recognition result together with its likelihood (Viterbi score). The partial word sequence generation unit 12, as in the second embodiment, outputs the likelihood (Viterbi score) of each sequence in addition to the partial word sequence itself.

【０２８７】In FIG. 40, the mode switching unit corresponding to the mode switching unit 11 in FIG. 1 is omitted. In word registration mode, this mode switching unit inputs the input speech to the partial word sequence generation unit 12, just like the mode switching unit 11 in FIG. 1. In speech recognition mode, by contrast, it inputs the input speech to both the main speech recognition unit 14 and the partial word sequence generation unit 12.

【０２８８】In the configuration of FIG. 40, the input speech is input to both the main speech recognition unit 14 and the partial word sequence generation unit 12. The main speech recognition unit 14 performs recognition processing on the input speech using the user-registered word dictionary 13 in the same way as in the first embodiment, and outputs the recognition result and its likelihood (Viterbi score). The partial word sequence generation unit 12, in the same way as in word registration mode in the first embodiment, converts the input speech into partial word sequences and outputs each partial word sequence and its likelihood (Viterbi score). It is assumed here that words have already been registered in the user-registered word dictionary 13 by word registration processing in word registration mode.

【０２８９】The in-use word registration determination unit 81 compares the likelihood of the recognition result output from the main speech recognition unit 14 with the likelihood of a partial word sequence output from the partial word sequence generation unit 12. When the latter is larger and the difference exceeds a predetermined reference value (threshold) Z, it registers that partial word sequence in the user-registered word dictionary 13 as a partial word sequence corresponding to the recognition result of the main speech recognition unit 14.

【０２９０】The operation of the in-use word registration determination unit 81 is described in detail below, taking as an example the case where the contents of the user-registered word dictionary 13 are as shown in FIG. 41(a).

【０２９１】Suppose that user A, intending to input "employee" (社員), utters "shain" (しゃいん). As a result, the output of the main speech recognition unit 14 is the word "employee" with a Viterbi score of -25, and the output of the partial word sequence generation unit 12 is the partial word sequence "sy, a, i" with a Viterbi score of -20 and the partial word sequence "sy, a, i, N" with a Viterbi score of -25.

【０２９２】The in-use word registration determination unit 81 first compares the Viterbi score of the word "employee" (社員), -25, with the Viterbi score of the partial word sequence "sy, a, i", -20. Since the Viterbi score of the partial word sequence is larger, the determination unit 81 computes the difference and compares it with the reference value Z. Assuming that Z is set to 20, the difference of 5 is smaller than Z, so the determination unit 81 does not register "sy, a, i" in the user-registered word dictionary 13.

【０２９３】Next, the in-use word registration determination unit 81 compares the Viterbi score of the word "employee" (社員), -25, with the Viterbi score of the partial word sequence "sy, a, i, N", -25. Since the Viterbi score of the partial word sequence is not larger, no registration is performed.

【０２９４】In other words, the utterance "shain" (しゃいん) by user A is very close to the speech expected from the partial word sequence "sy, a, i, N" of the originally registered word "employee" (社員). This can be judged from two facts: "sy, a, i, N" is included in the output of the partial word sequence generation unit 12, and the Viterbi score of the optimal partial word sequence "sy, a, i" is relatively close to that of "sy, a, i, N". In this case, therefore, there is no need to add a new entry to the user-registered word dictionary 13.

【０２９５】Next, suppose that another user B, intending to input "employee" (社員), utters "shain" (しゃいん), and that the output of the main speech recognition unit 14 is the word "employee" with a Viterbi score of -55, while the output of the partial word sequence generation unit 12 is the partial word sequence "sy, e, i, N" with a Viterbi score of -20 and the partial word sequence "j, e, i, N" with a Viterbi score of -45.

【０２９６】The in-use word registration determination unit 81 first compares the Viterbi score of the word "employee" (社員), -55, with the Viterbi score of the partial word sequence "sy, e, i, N", -20. Since the Viterbi score of the partial word sequence is larger, the determination unit 81 computes the difference and compares it with the reference value Z (= 20). The difference of 35 is larger than Z, so the determination unit 81 newly registers "sy, e, i, N" in the user-registered word dictionary 13 as a partial word sequence corresponding to the word "employee".

【０２９７】Next, the in-use word registration determination unit 81 compares the Viterbi score of the word "employee" (社員), -55, with the Viterbi score of the partial word sequence "j, e, i, N", -45. Since the Viterbi score of the partial word sequence is larger, it computes the difference and compares it with the constant Z (= 20). The difference of 10 is smaller than Z, so "j, e, i, N" is not registered.

【０２９８】In other words, the utterance "shain" (しゃいん) by user B differs from the speech expected from the partial word sequence "sy, a, i, N" of the originally registered word "employee" (社員). This can be judged from the fact that the Viterbi score of the optimal partial word sequence "sy, e, i, N" output by the partial word sequence generation unit 12 greatly exceeds the Viterbi score of "sy, a, i, N". In this case, therefore, it is reasonable to additionally register the new partial word sequence "sy, e, i, N" for the word "employee" in the user-registered word dictionary 13.

[0299] As a result, the user registration word dictionary 13 with the contents of FIG. 41(a) becomes as shown in FIG. 41(b). In the user registration word dictionary 13 shown in FIG. 41(b), a new entry has been added for the word "employee" in accordance with the utterance tendency of user B.
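
As an illustration of the change from FIG. 41(a) to FIG. 41(b), the dictionary can be modelled as a mapping from each word to a list of partial word sequences. The sketch below is hypothetical: the actual layout of FIG. 41 is not reproduced here, the helper name `register_sequence` is an assumption, and skipping duplicate entries is a design choice of this sketch rather than something the description states.

```python
from typing import Dict, List, Tuple

PartialWordSeq = Tuple[str, ...]
UserDictionary = Dict[str, List[PartialWordSeq]]


def register_sequence(dictionary: UserDictionary, word: str,
                      seq: PartialWordSeq) -> bool:
    """Add seq as an additional entry for word.

    Returns True if the dictionary changed, False if seq was already there.
    """
    entries = dictionary.setdefault(word, [])
    if seq in entries:
        return False
    entries.append(seq)
    return True


# Only the entry discussed in the text is shown; any other words
# contained in FIG. 41(a) are omitted from this sketch.
user_dictionary: UserDictionary = {"employee": [("sy", "a", "i", "N")]}

changed = register_sequence(user_dictionary, "employee", ("sy", "e", "i", "N"))
print(changed, user_dictionary["employee"])
```

Keeping several sequences per word lets the original entry continue to serve other speakers while the new entry covers user B's pronunciation.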

[0300] As described above, the speech recognition apparatus of this embodiment, which can automatically register recognition results in the user registration word dictionary 13, works extremely effectively when the word uttered by the user is known.

[0301] A modification of the configuration of the speech recognition apparatus of FIG. 40 for the case where the word uttered by the user is known will therefore be described, again referring to FIG. 40 for convenience. Here, a new mode called the adaptive mode is provided, together with a user interface (not shown) that, in the adaptive mode, presents a word to the user and instructs the user to utter that word.

[0302] In the adaptive mode, the user utters the word presented by the apparatus (by the user interface within it).

[0303] Speech uttered by the user in the adaptive mode is input to both the main speech recognition unit 14 and the partial word sequence generation unit 12. Using the user registration word dictionary 13, the main speech recognition unit 14 obtains and outputs the likelihood (Viterbi score) for the word whose utterance the apparatus (the user interface within it) instructed. The partial word sequence generation unit 12, on the other hand, converts the input speech into a partial word sequence and outputs that partial word sequence and its likelihood (Viterbi score).

[0304] The in-use word registration determination unit 81 compares the likelihood of the recognition result of the main speech recognition unit 14, that is, the likelihood of the word whose utterance was instructed, with the likelihood of the partial word sequence output from the partial word sequence generation unit 12. If the latter is larger and the difference is larger than the reference value Z, the unit pairs the partial word sequence with the word whose utterance was instructed and registers the pair in the user registration word dictionary 13.

[0305] Next, another modification for the case where the word uttered by the user is known will be described, again referring to FIG. 40 for convenience.

[0306] Speech uttered in the adaptive mode is input to both the main speech recognition unit 14 and the partial word sequence generation unit 12. Unlike in the preceding modification, the main speech recognition unit 14 performs ordinary recognition using the user registration word dictionary 13, irrespective of the word whose utterance was instructed. The partial word sequence generation unit 12 converts the input speech into a partial word sequence and outputs it. Here, the likelihood of the partial word sequence need not be output.

[0307] The in-use word registration determination unit 81 determines whether or not the recognition result of the main speech recognition unit 14 is identical to the word whose utterance was instructed. If they differ, the unit pairs the partial word sequence output from the partial word sequence generation unit 12 with the word whose utterance was instructed and registers the pair in the user registration word dictionary 13. It is also possible to have the partial word sequence generation unit 12 output the likelihood together with the partial word sequence, and to add the likelihood of the partial word sequence to the conditions evaluated by the in-use word registration determination unit 81 (that is, a combination with the function of the registration condition determination unit 21 in the second embodiment).
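
The decision rule of this second modification can be sketched as below. This is a hypothetical illustration: the function name `register_on_mismatch` is not from the description, and modelling the optional likelihood condition as a simple minimum score (`min_seq_score`) is an assumption of this sketch, since the exact condition used by the registration condition determination unit 21 is not restated in this passage.

```python
from typing import Optional


def register_on_mismatch(prompted_word: str,
                         recognized_word: str,
                         seq_score: Optional[float] = None,
                         min_seq_score: Optional[float] = None) -> bool:
    """Decide whether to register the generated partial word sequence
    for the prompted word in the adaptive mode."""
    # Ordinary recognition already produced the prompted word:
    # nothing needs to be added to the dictionary.
    if recognized_word == prompted_word:
        return False
    # Optional extra condition (assumed form): require the sequence
    # likelihood to reach a minimum value before registering.
    if seq_score is not None and min_seq_score is not None:
        return seq_score >= min_seq_score
    return True


print(register_on_mismatch("employee", "employee"))           # recognized correctly
print(register_on_mismatch("employee", "some_other_word"))    # misrecognized
```

Because only the identity of the recognized word is checked, this variant needs no likelihood from the partial word sequence generation unit unless the optional condition is enabled.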

[0308] The above description concerned the case where the speech recognition apparatus of FIG. 40 (the in-use word registration determination unit 81 within it) decides whether or not to register a new partial word sequence on the basis of the outputs of both the main speech recognition unit 14 and the partial word sequence generation unit 12, but the invention is not limited to this. For example, a first determination may be made using only the likelihood of the recognition result of the main speech recognition unit 14; depending on the result of that determination, the partial word sequence generation unit 12 is made to perform its recognition processing, and whether or not to register a new partial word sequence is then determined by comparison with the likelihood of the partial word sequence output from the partial word sequence generation unit 12. This modification of the configuration of FIG. 40 will be described with reference to the block diagram of FIG. 42.

[0309] In the speech recognition apparatus configured as in FIG. 42, the input speech is first input to the main speech recognition unit 14, and the recognition result and its likelihood (Viterbi score) are computed. At the same time, the input speech is temporarily stored in the input speech buffer 83.

[0310] The in-use word registration determination unit 82 (which corresponds to the in-use word registration determination unit 81 in FIG. 40) compares the likelihood (Viterbi score) of the recognition result of the main speech recognition unit 14 with a predetermined reference value Z1. If the former is larger, the unit determines that no new partial word sequence is to be registered.

[0311] If, on the other hand, the latter is larger, the in-use word registration determination unit 82 controls the input speech buffer 83 so that the input speech temporarily stored in the buffer 83 is output to the partial word sequence generation unit 12. The partial word sequence generation unit 12 then converts the input speech into a partial word sequence and outputs that partial word sequence together with its likelihood (Viterbi score). The subsequent operation of the in-use word registration determination unit 82 is the same as that of the in-use word registration determination unit 81 described above.

[0312] That is, the in-use word registration determination unit 82 compares the likelihood of the recognition result previously output by the main speech recognition unit 14 with the likelihood of the partial word sequence now output from the partial word sequence generation unit 12. If the latter is larger and the difference is larger than the reference value Z, the unit registers the partial word sequence in the user registration word dictionary 13 as a partial word sequence corresponding to the recognition result of the main speech recognition unit 14.

[0313] In the speech recognition apparatus configured as in FIG. 42, if Z1 = -40, for example, the partial word sequence generation unit 12 does not operate for the speech "shain" uttered by the aforementioned speaker A. For the speech "shain" uttered by speaker B, on the other hand, the partial word sequence generation unit 12 operates, and a new word is additionally registered in the user registration word dictionary 13.
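
The two-stage flow of FIG. 42 can be sketched as follows. This is a minimal sketch under stated assumptions, not the apparatus itself: `two_stage_decision` and `fake_generator` are hypothetical names, the stand-in generator simply returns the sequence and score from the speaker B example, the score of -30 used for a confidently recognized utterance is an illustrative value rather than one given in this passage, and treating a main score exactly equal to Z1 as "not confident" is a choice made here because the description does not cover that case.

```python
from typing import Callable, List, Optional, Tuple

PartialWordSeq = Tuple[str, ...]
Generator = Callable[[bytes], Tuple[PartialWordSeq, float]]


def two_stage_decision(buffered_speech: bytes, main_score: float,
                       z1: float, z: float,
                       generate: Generator) -> Optional[PartialWordSeq]:
    """Return the sequence to register, or None if nothing is registered."""
    # Stage 1: the main recognizer is confident enough, so the
    # partial word sequence generator is never invoked.
    if main_score > z1:
        return None
    # Stage 2: replay the buffered speech through the generator and
    # apply the same margin test as in the FIG. 40 configuration.
    seq, seq_score = generate(buffered_speech)
    if seq_score > main_score and (seq_score - main_score) > z:
        return seq
    return None


calls: List[bytes] = []  # records how often the generator actually runs


def fake_generator(speech: bytes) -> Tuple[PartialWordSeq, float]:
    """Stand-in for the partial word sequence generation unit 12."""
    calls.append(speech)
    return ("sy", "e", "i", "N"), -20.0


# Hypothetical confident utterance (score above Z1 = -40): generator skipped.
result_confident = two_stage_decision(b"", -30.0, -40.0, 20.0, fake_generator)
# Speaker B (score -55, below Z1): generator runs and the sequence is registered.
result_b = two_stage_decision(b"", -55.0, -40.0, 20.0, fake_generator)
print(result_confident, result_b, len(calls))
```

The call counter shows the intended saving: the comparatively expensive generator runs once for the two utterances instead of twice.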

[0314] With this configuration, the processing of the partial word sequence generation unit 12 can be skipped for average utterances. When the main functions of the speech recognition apparatus (the functions of the partial word sequence generation unit 12, the main speech recognition unit 14, and so on) are realized by a computer, the load on that computer is therefore reduced. The configuration is thus suited to realizing the main functions of the speech recognition apparatus on a computer that is not fast.

[0315] As the user registration word dictionary 13 in the configurations of FIGS. 40 and 42, it is possible to use not only a dictionary generated by setting the word registration mode and registering words within the same speech recognition apparatus, but also a dictionary generated in another apparatus by the word registration method applied in any of the first to seventh embodiments. In that case, the speech recognition apparatus configured as in FIGS. 40 and 42 does not necessarily require the word registration mode and the speech recognition mode; in normal use as a speech recognition apparatus, words can be automatically registered in the user registration word dictionary 13 by using the main speech recognition unit 14 and the partial word sequence generation unit 12 together.

[0316] In the speech recognition apparatus of this embodiment, the character registration word dictionary (73) described in the seventh embodiment can also be used together with the user registration word dictionary 13. In that case, as in the seventh embodiment, the main speech recognition unit 14 performs recognition using both the character registration word dictionary (73) and the user registration word dictionary 13. Depending on the likelihood of the recognition result in the main speech recognition unit 14, the partial word sequence from the partial word sequence generation unit 12 is registered in the user registration word dictionary 13 by the in-use word registration determination unit 81.

[0317] [Ninth Embodiment] Next, a ninth embodiment of the present invention will be described.

[0318] As described in the eighth embodiment, when the user speaks in an unusual way (for example, with a strong accent), the recognition accuracy of the main speech recognition unit (14) may drop. In such cases, the likelihood from the main speech recognition unit (14) tends to decrease. If, when the output of the main speech recognition unit (14) is wrong, a partial word sequence is registered in the user registration word dictionary (13) in accordance with the user's instruction and is also used for recognition from the next time onward, the recognition accuracy of the main speech recognition unit (14) can be improved, as in the eighth embodiment.

[0319] In the ninth embodiment, recognition processing is performed on the input speech not only by the main speech recognition unit but also by the partial word sequence generation unit, and the user can instruct, on the basis of the recognition result from the main speech recognition unit, whether or not to register the partial word sequence output from the partial word sequence generation unit. This makes it possible to improve the recognition accuracy of the main speech recognition unit.

[0320] FIG. 43 is a block diagram of a subword-type speaker-independent speech recognition apparatus according to the ninth embodiment of the present invention. Parts identical to those in FIG. 40 are denoted by the same reference numerals.

[0321] The configuration of FIG. 43 is characterized in that an in-use word registration confirmation unit 91 is used in place of the in-use word registration determination unit 81 of the configuration of FIG. 40.

[0322] In the configuration of FIG. 43, the input speech is input to both the main speech recognition unit 14 and the partial word sequence generation unit 12. The main speech recognition unit 14 performs recognition processing on the input speech using the user registration word dictionary 13 in the same manner as in the first embodiment, and outputs the recognition result. The partial word sequence generation unit 12, on the other hand, converts the input speech into a partial word sequence in the same manner as in the word registration mode of the first embodiment, and outputs that partial word sequence. It is assumed here that words have already been registered in the user registration word dictionary 13 by word registration processing in the word registration mode.

[0323] The in-use word registration confirmation unit 91 has an input unit (user operation unit) that the user can operate, and receives through this input unit an instruction from the user indicating whether or not to register the partial word sequence in the user registration word dictionary 13. Upon receiving an instruction to register, the in-use word registration confirmation unit 91 registers the partial word sequence output from the partial word sequence generation unit 12 in the user registration word dictionary 13 as a partial word sequence corresponding to the recognition result of the main speech recognition unit 14.

[0324] The operation of the in-use word registration confirmation unit 91 will be described in detail, taking as an example the case where the contents of the user registration word dictionary 13 are as shown in FIG. 41(a), as in the eighth embodiment.

[0325] Suppose that the user, intending to input "employee", utters "shain", and that as a result the output of the main speech recognition unit 14 is the word "employee" and the output of the partial word sequence generation unit 12 is the partial word sequence "sy, e, i, N".

[0326] If the user, in regular use of the speech recognition apparatus of FIG. 43, feels that the word "employee" is difficult to get recognized, the user operates the input unit of the in-use word registration confirmation unit 91 to instruct that the partial word sequence be registered in the user registration word dictionary 13.

[0327] The in-use word registration confirmation unit 91 then additionally registers, in the user registration word dictionary 13, the pair consisting of the word "employee" output by the main speech recognition unit 14 and the partial word sequence "sy, e, i, N" output by the partial word sequence generation unit 12. As a result of this additional registration, the user registration word dictionary 13 with the contents of FIG. 41(a) becomes as shown in FIG. 41(b).

[0328] As described above, in this embodiment, when the in-use word registration determination unit 81 receives an instruction to register a partial word sequence, the partial word sequence is registered in association with the recognition result of the main speech recognition unit 14. With this scheme, however, if the recognition result of the main speech recognition unit 14 is wrong, an incorrect combination of word name and partial word sequence is registered in the user registration word dictionary 13.

[0329] To eliminate this problem, the in-use word registration determination unit 81 in FIG. 43 may be configured so that, in addition to the instruction to register a partial word sequence, it can also receive information specifying which word the partial word sequence is to be associated with. This modification of the ninth embodiment will be described, as above, taking as an example the case where the contents of the user registration word dictionary 13 are as shown in FIG. 41(a).

[0330] Suppose that the user, intending to input "employee", utters "shain", and that as a result the output of the main speech recognition unit 14 is the word "employee" and the output of the partial word sequence generation unit 12 is the partial word sequence "sy, e, i, N".

[0331] The user operates the input unit of the in-use word registration determination unit 81 to instruct registration of the partial word sequence and, at the same time, inputs the fact that the current utterance was of the word "employee".

[0332] On receiving the instruction to register the partial word sequence and the information on the word "employee", the in-use word registration confirmation unit 91 additionally registers, in the user registration word dictionary 13, the pair consisting of the received word "employee", that is, the word "employee" specified by the user, and the partial word sequence "sy, e, i, N" output by the partial word sequence generation unit 12.

[0333] Consequently, even when the recognition result of the main speech recognition unit 14 is wrong, a correct combination of word name and partial word sequence is additionally registered in the user registration word dictionary 13. Immediately after this additional registration, the user registration word dictionary 13 is as shown in FIG. 41(b).
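
The user-confirmed registration of the ninth embodiment and its modification can be sketched together as below. This is a hypothetical illustration: the function name `confirm_registration`, its parameters, and the misrecognized word "photograph" are assumptions made for the sketch and do not appear in the description. When the user supplies a word, it overrides the recognition result; otherwise the recognition result is used, as in the basic ninth embodiment.

```python
from typing import Dict, List, Optional, Tuple

PartialWordSeq = Tuple[str, ...]
UserDictionary = Dict[str, List[PartialWordSeq]]


def confirm_registration(dictionary: UserDictionary,
                         recognized_word: str,
                         seq: PartialWordSeq,
                         register: bool,
                         user_word: Optional[str] = None) -> Optional[str]:
    """Register seq if the user instructed registration.

    Returns the word the sequence was registered under, or None.
    """
    if not register:
        return None
    # Prefer the word named by the user; fall back to the recognizer output.
    word = user_word if user_word is not None else recognized_word
    dictionary.setdefault(word, []).append(seq)
    return word


user_dictionary: UserDictionary = {"employee": [("sy", "a", "i", "N")]}

# Hypothetical misrecognition: the recognizer output "photograph", but the
# user states that the utterance was the word "employee".
target = confirm_registration(user_dictionary, "photograph",
                              ("sy", "e", "i", "N"),
                              register=True, user_word="employee")
print(target, user_dictionary)
```

Taking the word from the user keeps a misrecognition from attaching the new sequence to the wrong dictionary entry.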

[0334] In the configuration of FIG. 43, as in the configuration of FIG. 42 in the eighth embodiment, an input speech buffer may be provided on the input side of the partial word sequence generation unit 12, so that the partial word sequence generation unit 12 processes the input speech only when the in-use word registration determination unit 81 receives an instruction to register a partial word sequence. In this way, as described in the eighth embodiment, the load on the computer can be reduced when the main functions of the speech recognition apparatus are realized by a computer.

[0335] The main functions of the speech recognition apparatus applied in the embodiments described above, such as the partial word sequence generation processing by the partial word sequence generation unit 12 and the recognition processing by the main speech recognition unit 14, can also be realized by loading a recording medium, such as a CD-ROM, floppy disk, or memory card, on which a program for causing a computer to execute the processing is recorded, into a computer capable of reading programs, and having the computer read and execute the program recorded on that medium. Since the recognition processing by the main speech recognition unit 14 can be executed using existing speech recognition software, the program recorded on the recording medium may be one that causes the computer to execute the processing excluding that recognition processing. The contents of the recording medium on which the program is recorded may also be downloaded to the computer via a communication line or the like.

【０３３６】[0336]

[Effects of the Invention] As described in detail above, according to the present invention, word registration usable by unspecified speakers can be performed as easily as in a speaker-dependent speech recognition system.

[0337] Further, according to the present invention, the word dictionary can be updated in accordance with the user's pronunciation. Further, according to the present invention, by determining whether or not the word sequence to be registered satisfies the registration conditions, erroneous registration can be prevented when the word dictionary is updated in accordance with the user's pronunciation, and degradation of recognition performance can thereby be avoided.

[0338] Further, according to the present invention, registration can be performed after the user has checked whether a recognition error occurred when the partial word sequence to be registered was generated, so that information on an erroneous partial word sequence is prevented from being registered.

[0339] Further, according to the present invention, the user can check the partial word sequence to be registered for errors and correct it before registration, so that the information on the partial word sequence can be registered without the user having to utter the word again.

[0340] Further, according to the present invention, the contents of the user registration word dictionary are converted into character information and presented to the user, so that the user can check at a later date the information registered in the user registration word dictionary.

[0341] Further, according to the present invention, the contents of the user registration word dictionary are not only converted into character information and presented to the user; the presented contents are also made available for editing by the user, and the result of the editing is reflected in the user registration word dictionary. When the user finds a problem, the problem can therefore be corrected.

[0342] Further, according to the present invention, the representation format of the registered information in the character registration word dictionary, in which information corresponding to partial word sequences generated from character string information representing the readings of words is registered, is made to match the representation format of the registered information in the user registration word dictionary. As a result, although words are registered in the user registration word dictionary and in the character registration word dictionary by different methods, speech recognition can be performed using both at the same time with a single recognition scheme, and the configuration of the apparatus can be simplified.

[0343] Further, according to the present invention, partial word sequences are generated for the input speech not only in the word registration mode but also in the speech recognition mode, and additional registration in the user registration word dictionary is performed according to how well the user's utterances are being recognized. Recognition accuracy can thereby be improved by adapting successively to the user's speech.

[Brief description of the drawings]

FIG. 1 is a block diagram of a subword-type speaker-independent speech recognition apparatus according to a first embodiment of the present invention.

FIG. 2 is a block diagram showing the configuration of the partial word sequence generation unit 12 in FIG. 1.

FIG. 3 is a diagram showing an example of the partial word connection table 123 in FIG. 2.

FIG. 4 is a diagram showing a part of a flowchart for explaining the operation of the partial word HMM recognition unit 124 in FIG. 2.

FIG. 5 is a diagram showing another part of the flowchart for explaining the operation of the partial word HMM recognition unit 124 in FIG. 2.

FIG. 6 is a diagram showing the remainder of the flowchart for explaining the operation of the partial word HMM recognition unit 124 in FIG. 2.

FIG. 7 is a diagram showing an example of the user registration word dictionary 13 in FIG. 1.

FIG. 8 is a block diagram showing the configuration of the main speech recognition unit 14 in FIG. 1.

FIG. 9 is a diagram showing another example of the user registration word dictionary 13 in FIG. 1.

FIG. 10 is a block diagram showing the configuration of the partial word sequence generation unit 12 when the user registration word dictionary 13 in the format of FIG. 9 is used.

FIG. 11 is a block diagram showing the configuration of the main speech recognition unit 14 when the user registration word dictionary 13 in the format of FIG. 9 is used.

FIG. 12 is a block diagram showing a modification of the configuration of FIG. 1 for the case where the partial word system used by the main speech recognition unit 14 differs from the partial word system used by the partial word sequence generation unit 12.

FIG. 13 is a diagram showing an example of the partial word HMM dictionary 15 in FIG. 12.

FIG. 14 is a block diagram of a subword-type speaker-independent speech recognition apparatus according to a second embodiment of the present invention.

FIG. 15 is a block diagram of a subword-type speaker-independent speech recognition apparatus according to a third embodiment of the present invention.

FIG. 16 is a block diagram showing the configuration of the registration confirmation unit 31 in FIG. 15.

FIG. 17 is a block diagram showing the configuration of the partial word sequence character string conversion unit 311 in FIG. 16.

FIG. 18 is a diagram showing an example of the partial word sequence hiragana correspondence table 311b in FIG. 17.

FIG. 19 is a flowchart for explaining the operation of the partial word sequence hiragana conversion unit 311a in FIG. 17.

FIG. 20 is a diagram showing an example of entries in the partial word sequence display character string correspondence table 312 in FIG. 16.

FIG. 21 is a diagram showing an example of the word registration confirmation screen displayed by the character string display processing unit 314 in FIG. 16.

FIG. 22 is a block diagram of a subword-type speaker-independent speech recognition apparatus according to a fourth embodiment of the present invention.

FIG. 23 is a block diagram showing the configuration of the registration editing unit 41 in FIG. 22.

FIG. 24 is a diagram showing an example of the word registration editing screen displayed by the character string display processing unit 414 in FIG. 23.

FIG. 25 is a diagram showing an example of the state after character string editing on the word registration editing screen of FIG. 24.

FIG. 26 is a block diagram showing the configuration of the character string partial word sequence conversion unit 416 in FIG. 23.

【図２７】本発明の第５の実施形態を示すサブワード型
不特定話者音声認識装置のブロック構成図。FIG. 27 is a block diagram showing a subword-type speaker-independent speech recognition apparatus according to a fifth embodiment of the present invention.

【図２８】図２７中の使用者単語登録辞書表示部５１の
構成を示すブロック図。FIG. 28 is a block diagram showing a configuration of a user word registration dictionary display unit 51 in FIG. 27.

【図２９】図２８中の文字列表示処理部５１４による使
用者登録単語辞書内容表示例を示す図。FIG. 29 is a view showing an example of the user registered word dictionary contents displayed by the character string display processing unit 514 in FIG. 28.

【図３０】本発明の第６の実施形態を示すサブワード型
不特定話者音声認識装置のブロック構成図。FIG. 30 is a block diagram of a sub-word type speaker-independent speech recognition apparatus according to a sixth embodiment of the present invention.

【図３１】図３０中の使用者単語登録辞書編集部６１の
構成を示すブロック図。FIG. 31 is a block diagram showing a configuration of a user word registration dictionary editing unit 61 in FIG. 30.

【図３２】単語番号が付された使用者登録単語辞書１３
の登録形式を示す図。FIG. 32 is a diagram showing the registration format of the user registered word dictionary 13 to which word numbers are assigned.

【図３３】図３１中の文字列表示処理部６１４により表
示される使用者登録単語辞書編集画面の一例を示す図。FIG. 33 is a view showing an example of a user registration word dictionary editing screen displayed by the character string display processing unit 614 in FIG. 31.

【図３４】図３３の使用者登録単語辞書編集画面上での
文字列編集処理後の状態例を示す図。FIG. 34 is a diagram showing an example of a state after a character string editing process on the user registered word dictionary editing screen of FIG. 33.

【図３５】図３３の使用者登録単語辞書編集画面上での
文字列編集処理の結果に従う辞書操作によって図３２の
状態から変化した使用者登録単語辞書１３の内容例を示
す図。FIG. 35 is a diagram showing an example of the contents of the user registered word dictionary 13 after it has been changed from the state of FIG. 32 by a dictionary operation that follows the result of the character string editing process on the user registered word dictionary editing screen of FIG. 33.

【図３６】本発明の第７の実施形態を示すサブワード型
不特定話者音声認識装置のブロック構成図。FIG. 36 is a block diagram showing a sub-word type speaker-independent speech recognition apparatus according to a seventh embodiment of the present invention.

【図３７】図３６中の文字登録単語辞書７３の作成手法
を説明するための図。FIG. 37 is a view for explaining a method of creating the character registered word dictionary 73 in FIG. 36.

【図３８】図３６中の文字登録単語辞書７３の一例を示
す図。FIG. 38 is a view showing an example of the character registered word dictionary 73 in FIG. 36.

【図３９】図３６中の使用者登録単語辞書１３及び文字
登録単語辞書７３の内容を共通の領域に保持した文字・
音声登録単語辞書の一例を示す図。FIG. 39 is a diagram showing an example of a character/voice registered word dictionary in which the contents of the user registered word dictionary 13 and the character registered word dictionary 73 in FIG. 36 are held in a common area.

【図４０】本発明の第８の実施形態を示すサブワード型
不特定話者音声認識装置のブロック構成図。FIG. 40 is a block diagram showing a subword-type speaker-independent speech recognition apparatus according to an eighth embodiment of the present invention.

【図４１】図４０の構成における使用時単語登録判定部
８１による単語登録前後の使用者登録単語辞書１３の内
容例を示す図。FIG. 41 is a diagram showing an example of the contents of the user registered word dictionary 13 before and after word registration by the in-use word registration determination unit 81 in the configuration of FIG. 40.

【図４２】図４０の構成の変形例を示すブロック図。FIG. 42 is a block diagram showing a modification of the configuration in FIG. 40.

【図４３】本発明の第９の実施形態を示すサブワード型
不特定話者音声認識装置のブロック構成図。FIG. 43 is a block diagram showing a subword-type speaker-independent speech recognition apparatus according to a ninth embodiment of the present invention.

【図４４】従来のサブワード型不特定話者音声認識装置
のブロック構成図。FIG. 44 is a block diagram of a conventional sub-word type unspecified speaker speech recognition apparatus.

【図４５】３状態２ループの離散ＨＭＭを示す図。FIG. 45 is a diagram showing a three-state two-loop discrete HMM.

【図４６】図４４中の部分単語ＨＭＭ辞書４６０に登録
される部分単語ＨＭＭの記憶形式の一例を示す図。FIG. 46 is a diagram showing an example of a storage format of a partial word HMM registered in the partial word HMM dictionary 460 in FIG. 44.

【図４７】単語「おとな」を表す部分単語系列「ｏ，
ｔ，ｏ，ｎ，ａ」に相当する単語ＨＭＭを示す図。FIG. 47 is a diagram showing the word HMM corresponding to the partial word sequence "o, t, o, n, a", which represents the word "otona" (おとな).

【図４８】図４４中の単語ＨＭＭ辞書４５０に登録され
る単語ＨＭＭの記憶形式の一例を示す図。FIG. 48 is a diagram showing an example of a storage format of a word HMM registered in the word HMM dictionary 450 in FIG. 44.

[Explanation of symbols]

１１…モード切替部１２…部分単語系列生成部１３…使用者登録単語辞書１４，７４…主音声認識部（単語音声モデル取得手段）１５，１２５，１４５…部分単語ＨＭＭ辞書１６…単語ＨＭＭ生成部（部分単語体系変換手段）２１…登録条件判定部３１…登録確認部４１…登録編集部５１…使用者単語登録辞書表示部６１…使用者単語登録辞書編集部７３…文字登録単語辞書８１，８２…使用時単語登録判定部８３…入力音声バッファ９１…使用時単語登録確認部１２１，１４１…音響分析部１２２，１４２…量子化部１２３…部分単語接続表１２４…部分単語ＨＭＭ認識部１２６，１４６…単語ＨＭＭ生成部１４３…ＨＭＭ認識部３１１，４１１，５１１，６１１…部分単語系列文字列変換部
DESCRIPTION OF SYMBOLS
11 ... Mode switching unit
12 ... Partial word sequence generation unit
13 ... User registered word dictionary
14, 74 ... Main speech recognition unit (word speech model acquisition means)
15, 125, 145 ... Partial word HMM dictionary
16 ... Word HMM generation unit (partial word system conversion means)
21 ... Registration condition determination unit
31 ... Registration confirmation unit
41 ... Registration editing unit
51 ... User word registration dictionary display unit
61 ... User word registration dictionary editing unit
73 ... Character registered word dictionary
81, 82 ... In-use word registration determination unit
83 ... Input speech buffer
91 ... In-use word registration confirmation unit
121, 141 ... Acoustic analysis unit
122, 142 ... Quantization unit
123 ... Partial word connection table
124 ... Partial word HMM recognition unit
126, 146 ... Word HMM generation unit
143 ... HMM recognition unit
311, 411, 511, 611 ... Partial word sequence character string conversion unit

Claims

[Claims]

1. A subword-type speaker-independent speech recognition apparatus comprising: partial word sequence generating means for converting input speech into a sequence of at least one partial word; a user registered word dictionary in which information corresponding to the partial word sequence converted by the partial word sequence generating means is registered; word speech model acquisition means for acquiring a word speech model, in which partial word speech models are joined, from the information corresponding to each partial word sequence registered in the user registered word dictionary; and main speech recognition means for recognizing speech uttered by a user using the word speech model acquired from the user registered word dictionary.

2. A subword-type speaker-independent speech recognition apparatus comprising: partial word sequence generating means for converting input speech into a sequence of at least one partial word; registration condition determination means for determining whether or not the partial word sequence converted by the partial word sequence generating means satisfies a predetermined registration condition; a user registered word dictionary in which information corresponding to the partial word sequence determined by the registration condition determination means to satisfy the registration condition is registered; word speech model acquisition means for acquiring a word speech model, in which partial word speech models are joined, from the information corresponding to each partial word sequence registered in the user registered word dictionary; and main speech recognition means for recognizing speech uttered by a user using the word speech model acquired from the user registered word dictionary.

3. A subword-type speaker-independent speech recognition apparatus comprising: partial word sequence generating means for converting input speech into a sequence of at least one partial word; registration confirmation means for presenting to the user, for all partial word sequences converted by the partial word sequence generating means, information representing each partial word sequence, accepting a designation from the user as to whether registration is permitted, and confirming whether the corresponding partial word sequence may be registered in accordance with the accepted designation; a user registered word dictionary in which information corresponding to the partial word sequence whose registration has been confirmed by the registration confirmation means is registered; word speech model acquisition means for acquiring a word speech model, in which partial word speech models are joined, from the information corresponding to each partial word sequence registered in the user registered word dictionary; and main speech recognition means for recognizing speech uttered by the user using the word speech model acquired from the user registered word dictionary.

4. A subword-type speaker-independent speech recognition apparatus comprising: partial word sequence generating means for converting input speech into a sequence of at least one partial word; registration editing means for presenting to the user, for all partial word sequences converted by the partial word sequence generating means, information representing each partial word sequence, accepting a user's editing operation on the information, performing editing processing on the information, reflecting the result of the editing processing in the corresponding partial word sequence, accepting a designation from the user as to whether registration is permitted, and confirming whether the corresponding partial word sequence may be registered in accordance with the accepted designation; a user registered word dictionary in which information corresponding to the partial word sequence whose registration has been confirmed by the registration editing means is registered; word speech model acquisition means for acquiring a word speech model, in which partial word speech models are joined, from the information corresponding to each partial word sequence registered in the user registered word dictionary; and main speech recognition means for recognizing speech uttered by the user using the word speech model acquired from the user registered word dictionary.

5. The subword-type speaker-independent speech recognition apparatus according to any one of claims 1 to 4, further comprising user word registration dictionary display means for converting information corresponding to a partial word sequence registered in the user registered word dictionary into character information and presenting it to a user.

6. The subword-type speaker-independent speech recognition apparatus according to claim 1, further comprising user word registration dictionary editing means for converting information corresponding to a partial word sequence registered in the user registered word dictionary into character information and presenting it to a user, accepting a user's editing operation on the information, performing editing processing on the information, and reflecting the result of the editing processing in the user registered word dictionary.

7. The subword-type speaker-independent speech recognition apparatus according to any one of claims 1 to 4, further comprising a character registered word dictionary in which information corresponding to a partial word sequence generated from character string information representing the reading of a word is registered in the same expression form as in the user registered word dictionary, wherein the word speech model acquisition means acquires a word speech model, in which partial word speech models are joined, from the information corresponding to each partial word sequence registered in the user registered word dictionary, and likewise acquires a word speech model, in which partial word speech models are joined, from the information corresponding to each partial word sequence registered in the character registered word dictionary, and wherein the main speech recognition means recognizes speech uttered by the user using the word speech models acquired from the user registered word dictionary and the character registered word dictionary.

8. A subword-type speaker-independent speech recognition apparatus comprising: partial word sequence generating means for recognizing speech uttered by a user in a word registration mode and in a speech recognition mode to generate a sequence of at least one partial word; a user registered word dictionary in which information corresponding to the partial word sequence generated by the partial word sequence generating means is registered; word speech model acquisition means for acquiring a word speech model, in which partial word speech models are joined, from the information corresponding to each partial word sequence registered in the user registered word dictionary; main speech recognition means for recognizing speech uttered by the user in the speech recognition mode using the word speech model acquired from the user registered word dictionary; and in-use word registration determination means for determining, in the speech recognition mode, whether the partial word sequence generated by the partial word sequence generating means may be registered, based on at least one of the likelihood of that partial word sequence, the recognition result of the main speech recognition means, and the likelihood of the recognition result, and for additionally registering information on the partial word sequence in the user registered word dictionary in accordance with the determination result.

9. A subword-type speaker-independent speech recognition apparatus comprising: partial word sequence generating means for recognizing speech uttered by a user in a word registration mode and in a speech recognition mode to generate a sequence of at least one partial word; a user registered word dictionary in which information corresponding to the partial word sequence generated by the partial word sequence generating means is registered; word speech model acquisition means for acquiring a word speech model, in which partial word speech models are joined, from the information corresponding to each partial word sequence registered in the user registered word dictionary; main speech recognition means for recognizing speech uttered by the user in the speech recognition mode using the word speech model acquired from the user registered word dictionary and outputting a recognition result; and in-use word registration confirmation means for accepting a partial word sequence registration instruction from the user when the main speech recognition means outputs the recognition result and, upon accepting the registration instruction, additionally registering in the user registered word dictionary information corresponding to the partial word sequence that was generated by the partial word sequence generating means and designated for registration.

10. The subword-type speaker-independent speech recognition apparatus according to claim 1, further comprising partial word system conversion means for converting the partial word sequence converted by the partial word sequence generating means into another partial word sequence whose expression system differs from that of the partial word sequence, wherein information corresponding to the partial word sequence converted by the partial word system conversion means is registered in the user registered word dictionary.

11. The subword-type speaker-independent speech recognition apparatus according to claim 2, wherein the registration condition determination means determines whether the partial word sequence satisfies the registration condition based on at least one of the likelihood of the partial word sequence and the number of partial word sequences converted by the partial word sequence generating means.

12. A subword-type speaker-independent speech recognition apparatus comprising: partial word sequence generating means for generating a sequence of at least one partial word by recognizing speech uttered by a user in a word registration mode and, when a predetermined condition is satisfied, in a speech recognition mode; a user registered word dictionary in which information corresponding to the partial word sequence generated by the partial word sequence generating means is registered; word speech model acquisition means for acquiring a word speech model, in which partial word speech models are joined, from the information corresponding to each partial word sequence registered in the user registered word dictionary; main speech recognition means for recognizing speech uttered by the user in the speech recognition mode using the word speech model acquired from the user registered word dictionary; input speech storage means for temporarily storing speech uttered by the user in the speech recognition mode; and in-use word registration determination means for determining whether the condition is satisfied based on the likelihood of the recognition result of the main speech recognition means, operating the partial word sequence generating means, when the condition is determined to be satisfied, by inputting to it the speech stored in the input speech storage means, determining whether the partial word sequence generated by the partial word sequence generating means may be registered based on at least one of the likelihood of that partial word sequence, the recognition result of the main speech recognition means, and the likelihood of the recognition result, and additionally registering information on the partial word sequence in the user registered word dictionary in accordance with the determination result.

13. A word dictionary creating apparatus for subword-type speaker-independent speech recognition, comprising: partial word sequence generating means for converting input speech into a sequence of at least one partial word; and dictionary registration means for registering information corresponding to the converted partial word sequence in a user registered word dictionary that is referred to during speech recognition, so that a word speech model in which partial word speech models are joined can be acquired from the information corresponding to the partial word sequence and recognition processing can be performed using the word speech model.

14. A method for creating a word dictionary for subword-type speaker-independent speech recognition, comprising: converting speech uttered by a user for word registration into a sequence of at least one partial word; and registering information corresponding to the converted partial word sequence in a user registered word dictionary that is referred to during speech recognition, so that a word speech model in which partial word speech models are joined can be acquired from the information corresponding to the partial word sequence and recognition processing can be performed using the word speech model.

15. A method for creating a word dictionary for subword-type speaker-independent speech recognition, comprising: converting speech uttered by a user for word registration into a sequence of at least one partial word; determining whether or not the converted partial word sequence satisfies a predetermined registration condition; and registering information corresponding to the partial word sequence determined to satisfy the registration condition in a user registered word dictionary that is referred to during speech recognition, so that a word speech model in which partial word speech models are joined can be acquired from the information corresponding to the partial word sequence and recognition processing can be performed using the word speech model.

16. A method for creating a word dictionary for subword-type speaker-independent speech recognition, comprising: converting speech uttered by a user for word registration into a sequence of at least one partial word; presenting to the user, for each of the converted partial word sequences, information representing the partial word sequence; accepting a designation from the user as to whether registration is permitted, and confirming whether the corresponding partial word sequence may be registered in accordance with the accepted designation; and registering information corresponding to the partial word sequence whose registration has been confirmed in a user registered word dictionary that is referred to during speech recognition, so that a word speech model in which partial word speech models are joined can be acquired from the information corresponding to the partial word sequence and recognition processing can be performed using the word speech model.

17. A method for creating a word dictionary for subword-type speaker-independent speech recognition, comprising: converting speech uttered by a user for word registration into a sequence of at least one partial word; presenting to the user, for each of the converted partial word sequences, information representing the partial word sequence; accepting a user's editing operation on the information, performing editing processing on the information, and reflecting the result of the editing processing in the corresponding partial word sequence; accepting a designation from the user as to whether registration is permitted, and confirming whether the corresponding partial word sequence may be registered in accordance with the accepted designation; and registering information corresponding to the partial word sequence whose registration has been confirmed in a user registered word dictionary that is referred to during speech recognition, so that a word speech model in which partial word speech models are joined can be acquired from the information corresponding to the partial word sequence and recognition processing can be performed using the word speech model.

18. A speech recognition method comprising: in a word registration mode, converting speech uttered by a user into a sequence of at least one partial word and registering information on the converted partial word sequence in a user registered word dictionary; and in a speech recognition mode, acquiring a word speech model, in which partial word speech models are joined, from the information corresponding to each partial word sequence registered in the user registered word dictionary, and recognizing speech uttered by the user using the acquired word speech model.

19. A speech recognition method comprising: in a word registration mode, converting speech uttered by a user into a sequence of at least one partial word, determining whether or not the converted partial word sequence satisfies a predetermined registration condition, and, when the registration condition is determined to be satisfied, registering information on the converted partial word sequence in a user registered word dictionary; and in a speech recognition mode, acquiring a word speech model, in which partial word speech models are joined, from the information corresponding to each partial word sequence registered in the user registered word dictionary, and recognizing speech uttered by the user using the acquired word speech model.

20. A speech recognition method comprising: in a word registration mode, converting speech uttered by a user into a sequence of at least one partial word, presenting to the user, for all of the converted partial word sequences, information representing each partial word sequence, accepting a designation from the user as to whether registration is permitted, confirming whether the corresponding partial word sequence may be registered in accordance with the accepted designation, and registering information corresponding to the partial word sequence whose registration has been confirmed in a user registered word dictionary; and in a speech recognition mode, acquiring a word speech model, in which partial word speech models are joined, from the information corresponding to each partial word sequence registered in the user registered word dictionary, and recognizing speech uttered by the user using the acquired word speech model.

21. A speech recognition method comprising: in a word registration mode, converting speech uttered by a user into a sequence of at least one partial word, presenting to the user, for all of the converted partial word sequences, information representing each partial word sequence, accepting a user's editing operation on the information, performing editing processing on the information, reflecting the result of the editing processing in the corresponding partial word sequence, accepting a designation from the user as to whether registration is permitted, confirming whether the corresponding partial word sequence may be registered in accordance with the accepted designation, and registering information corresponding to the partial word sequence whose registration has been confirmed in a user registered word dictionary; and in a speech recognition mode, acquiring a word speech model, in which partial word speech models are joined, from the information corresponding to each partial word sequence registered in the user registered word dictionary, and recognizing speech uttered by the user using the acquired word speech model.

22. A speech recognition method comprising: in a word registration mode, recognizing speech uttered by a user to generate a sequence of at least one partial word, and registering information on the generated partial word sequence in a user registered word dictionary; in a speech recognition mode, acquiring a word speech model, in which partial word speech models are joined, from the information corresponding to each partial word sequence registered in the user registered word dictionary, recognizing the speech uttered by the user using the acquired word speech model and outputting a recognition result and its likelihood, and also recognizing the speech uttered by the user to generate a sequence of at least one partial word and the likelihood of that sequence; determining, based on at least one of the likelihood of the partial word sequence generated in the speech recognition mode, the recognition result, and the likelihood of the recognition result, whether the generated partial word sequence may be registered; and additionally registering information on the partial word sequence in the user registered word dictionary in accordance with the determination result.

23. A speech recognition method comprising: in a word registration mode, recognizing speech uttered by a user to generate a sequence of at least one partial word, and registering information on the generated partial word sequence in a user registered word dictionary; in a speech recognition mode, acquiring a word speech model, in which partial word speech models are joined, from the information corresponding to each partial word sequence registered in the user registered word dictionary, recognizing the speech uttered by the user using the acquired word speech model and outputting a recognition result, and also recognizing the speech uttered by the user to generate a sequence of at least one partial word; accepting a registration instruction from the user when the recognition result is output; and, upon accepting the registration instruction, additionally registering in the user registered word dictionary information corresponding to the generated partial word sequence designated for registration.

24. A computer-readable recording medium on which is recorded a program for causing a computer to execute: a process of converting speech uttered by a user for word registration into a sequence of at least one partial word; and a process of registering information corresponding to the converted partial word sequence in a user registered word dictionary that is referred to during speech recognition, so that a word speech model in which partial word speech models are joined can be acquired from the information corresponding to the partial word sequence and recognition processing can be performed using the word speech model.

25. A computer-readable recording medium on which is recorded a program for causing a computer to execute: a process of converting speech uttered by a user in a word registration mode into a sequence of at least one partial word; a process of registering information on the converted partial word sequence in a user registered word dictionary; and a process of, in a speech recognition mode, acquiring a word speech model, in which partial word speech models are joined, from the information corresponding to each partial word sequence registered in the user registered word dictionary, and recognizing speech uttered by the user using the acquired word speech model.
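
The two-mode processing recited in the claims (registering a partial word sequence obtained from an utterance, then recognizing later speech against word models built by joining partial word models) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the specification: speech is represented as a list of already vector-quantized frame codes, each partial word is reduced to a single state with a made-up emission table, state transitions are treated as equally likely, and the names `SUBWORD_MODELS`, `generate_subword_sequence`, `word_log_likelihood`, and `Recognizer` are hypothetical.

```python
import math

CODES = range(4)  # hypothetical vector-quantization codes for the frames


def make_model(own_code):
    """Toy one-state partial word model: emission probability per frame code."""
    return {c: (0.7 if c == own_code else 0.1) for c in CODES}


# Hypothetical partial word (subword) models, standing in for a partial word HMM dictionary.
SUBWORD_MODELS = {"o": make_model(0), "t": make_model(1),
                  "n": make_model(2), "a": make_model(3)}


def generate_subword_sequence(frames):
    """Convert input frames into a partial word sequence.

    Picks the best-scoring partial word per frame and merges consecutive repeats.
    """
    seq = []
    for code in frames:
        best = max(SUBWORD_MODELS, key=lambda s: SUBWORD_MODELS[s][code])
        if not seq or seq[-1] != best:
            seq.append(best)
    return tuple(seq)


def word_log_likelihood(subwords, frames):
    """Viterbi score of non-empty frames against the left-to-right join of partial word models.

    Transition probabilities are ignored (treated as uniform) in this sketch.
    """
    states = [SUBWORD_MODELS[s] for s in subwords]
    neg_inf = -math.inf
    score = [neg_inf] * len(states)
    score[0] = math.log(states[0][frames[0]])  # a path must start in the first state
    for code in frames[1:]:
        new = [neg_inf] * len(states)
        for i, state in enumerate(states):
            # Either stay in state i or advance from state i - 1.
            prev = max(score[i], score[i - 1] if i > 0 else neg_inf)
            if prev > neg_inf:
                new[i] = prev + math.log(state[code])
        score = new
    return score[-1]  # a path must end in the last state


class Recognizer:
    def __init__(self):
        self.user_dictionary = []  # registered partial word sequences

    def register(self, frames):
        """Word registration mode: store the partial word sequence of the utterance."""
        seq = generate_subword_sequence(frames)
        if seq not in self.user_dictionary:
            self.user_dictionary.append(seq)
        return seq

    def recognize(self, frames):
        """Speech recognition mode: return the registered sequence with the best score."""
        scored = [(word_log_likelihood(seq, frames), seq)
                  for seq in self.user_dictionary]
        return max(scored)[1] if scored else None


recognizer = Recognizer()
recognizer.register([0, 0, 1, 0, 2, 3])             # stored as ("o", "t", "o", "n", "a")
recognizer.register([2, 2, 3, 3])                   # stored as ("n", "a")
print(recognizer.recognize([0, 1, 1, 0, 2, 2, 3]))  # same word, different timing
```

Because only the partial word sequence is stored, and not the registering speaker's acoustic template, a later utterance with different timing is scored against models assembled from shared partial word models. A registration condition such as a likelihood threshold could be checked inside `register` before the sequence is appended.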