JP2001083981A

JP2001083981A - Speech recognition system and method and recording medium readable by computer having recorded voice recognition program therein

Info

Publication number: JP2001083981A
Application number: JP25428699A
Authority: JP
Inventors: Tomohiro Iwasaki; 知弘岩▲さき▼
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1999-09-08
Filing date: 1999-09-08
Publication date: 2001-03-30
Anticipated expiration: 2019-09-08
Also published as: JP3999913B2

Abstract

PROBLEM TO BE SOLVED: To provide a speech recognition system and method, and a computer-readable recording medium storing a speech recognition program, that keep costs low while shortening the recognition response time by storing a word dictionary across a plurality of recording media. SOLUTION: The speech recognition system comprises acoustic analysis means 1, standard model storage means 2, dictionary storage means for storing a plurality of partial dictionaries obtained by dividing a word dictionary, collation data storage means 4, and model matching means 5. The model matching means 5 performs a matching process on the feature vectors while referring to the standard models and the word dictionary, and outputs a recognition result. The dictionary storage means consists of two parts. First dictionary storage means 6 can be referenced at high speed and stores the frequently used partial dictionaries. Second dictionary storage means 7 cannot be referenced at high speed and stores the remaining, less frequently used partial dictionaries. This keeps the cost low while shortening the recognition response time.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a speech recognition system and method for performing large-vocabulary recognition, such as address search by voice. It also relates to a computer-readable recording medium storing a speech recognition program.

[0002]

2. Description of the Related Art

When a large vocabulary is recognized, the amount of computation is generally reduced by techniques such as beam search. Here, a conventional speech recognition apparatus is described using the apparatus disclosed in Japanese Patent Application Laid-Open No. 10-254479 as an example.

[0003] FIG. 15 is a block diagram showing the configuration of the model matching processing unit of a conventional speech recognition apparatus, such as the one disclosed in Japanese Patent Application Laid-Open No. 10-254479. The following description makes three assumptions. An HMM (Hidden Markov Model) is used as the speech recognition method. Addresses are the recognition target. Each node, the unit of recognition, is a place name.

[0004] In FIG. 15, reference numeral 1 denotes acoustic analysis means, which receives a speech signal, performs acoustic analysis, and converts the signal into a time series of feature vectors. Reference numeral 2 denotes standard model storage means, which stores the standard models of the recognition targets. Reference numeral 3 denotes dictionary storage means, which stores a dictionary representing addresses. Reference numeral 4 denotes collation data storage means, which stores collation data as a work area for the matching process. Reference numeral 5 denotes model matching means, which performs the matching process on the feature vectors from the acoustic analysis means 1 while referring to the standard models and the dictionary, and outputs a recognition result.

[0005] Next, the operation of the conventional speech recognition apparatus will be described with reference to the drawings.

[0006] In this description, the addresses shown in FIG. 16 are the recognition target. Before recognition begins, the standard models are assumed to be stored in the standard model storage means 2, and a dictionary representing the addresses is assumed to be stored in the dictionary storage means 3.

[0007] FIG. 17 shows the contents of the dictionary storage means 3 representing the addresses of FIG. 16. In FIG. 17, the place names of the addresses to be recognized are shown in boxes, and their connections are shown by arrows. The words are divided into partial dictionaries, shown as rounded frames, and stored in that form. Each partial dictionary is the unit that is loaded into the model matching means 5 and used in the matching calculation. Triangles indicate entry points into partial dictionaries. For example, "Enoshima" in partial dictionary net5 is connected to each of "1-chome", "2-chome", and "3-chome" in partial dictionary net9.

[0008] The structure of a partial dictionary is described using partial dictionary net2 of FIG. 18 as an example. Partial dictionary net2 has one entry point (entry0) and contains two nodes: "Kanagawa Prefecture" with node number node2, and "Kagawa Prefecture" with node number node3.

[0009] "Kanagawa Prefecture" is followed by entry point entry0 of partial dictionary net3, and "Kagawa Prefecture" is followed by entry point entry0 of partial dictionary net4. Model matching against the "Kanagawa Prefecture" node uses the HMM parameters stored under node number node2 in the standard model storage means 2.
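The partial dictionary of FIG. 18 can be pictured as a small graph fragment. Each entry point leads to a set of nodes. Each node names the standard-model parameters it uses and the entry point of the partial dictionary that follows it. The Python sketch below models net2 this way. The class and field names are hypothetical, because the patent does not specify a storage format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    node_id: str                 # key into the standard model storage means 2
    label: str                   # place name matched by this node
    next_dict: str               # partial dictionary that follows this node
    next_entry: str = "entry0"   # entry point within next_dict


@dataclass
class PartialDictionary:
    name: str
    # entry point -> nodes reachable from that entry point
    entries: dict = field(default_factory=dict)

    def successor(self, node_id):
        """Return (partial dictionary, entry point) that follows node_id."""
        for nodes in self.entries.values():
            for node in nodes:
                if node.node_id == node_id:
                    return node.next_dict, node.next_entry
        raise KeyError(node_id)


# net2 as described for FIG. 18: one entry point, two nodes
net2 = PartialDictionary(
    name="net2",
    entries={"entry0": [
        Node("node2", "Kanagawa Prefecture", "net3"),
        Node("node3", "Kagawa Prefecture", "net4"),
    ]},
)

print(net2.successor("node2"))  # ('net3', 'entry0')
```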

[0010] FIG. 19 shows the contents of the standard model storage means 2, which stores the HMM parameters for each node. These parameters are assumed to include, in advance, everything the HMM matching calculation needs, such as the number of states and the transition probabilities between states.

[0011] When recognition starts, partial dictionary net1 is read first from the dictionary storage means 3. This dictionary contains the first node, which models silence. The necessary work area is then allocated in the collation data storage means 4, as shown in FIG. 20. As recognition proceeds, the following nodes need to be matched. At that point the partial dictionary net2, which follows net1, is read from the dictionary storage means 3, and the necessary work area is allocated in the collation data storage means 4, as shown in FIG. 21. In this way, each partial dictionary is read from the dictionary storage means 3 as recognition reaches it, and work areas are allocated in the collation data storage means 4.
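The on-demand loading in [0011] can be sketched as a store that reads a partial dictionary the first time matching reaches it and allocates one work area per node. This is a minimal illustration under stated assumptions. The dictionary contents and node names are hypothetical. The three-state node layout with a pseudo-state 0 follows the description of node n below. Initialising scores to the lowest value follows the later description of newly read nodes.

```python
NEG_INF = float("-inf")


class CollationStore:
    """Hypothetical model of the collation data storage means 4."""

    def __init__(self, dictionary_storage, num_states=3):
        self.dictionary_storage = dictionary_storage  # name -> node ids
        self.num_states = num_states
        self.work_areas = {}   # node id -> scores for states 0..num_states
        self.loaded = []       # partial dictionaries in the order they were read

    def ensure_loaded(self, dict_name):
        """Read dict_name on first use and allocate work areas for its nodes.

        Returns True if a read from the dictionary storage was needed.
        """
        if dict_name in self.loaded:
            return False
        for node_id in self.dictionary_storage[dict_name]:
            # state 0 is the pseudo-state; all scores start at the lowest value
            self.work_areas[node_id] = [NEG_INF] * (self.num_states + 1)
        self.loaded.append(dict_name)
        return True


storage = {"net1": ["node1"], "net2": ["node2", "node3"]}
store = CollationStore(storage)
store.ensure_loaded("net1")          # recognition starts with net1 (silence)
store.ensure_loaded("net2")          # matching reaches the nodes of net2
print(store.ensure_loaded("net2"))   # False: already resident
print(store.loaded)                  # ['net1', 'net2']
```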

[0012] Next, the model matching operation will be described.

[0013] When a speech signal is input to the acoustic analysis means 1 shown in FIG. 15, acoustic analysis is performed at fixed time intervals, and the results are converted into feature vectors and output. While the speech signal is being input, feature vectors are sent repeatedly from the acoustic analysis means 1 to the model matching means 5.
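The patent does not say which acoustic features are used. To illustrate analysis "at fixed time intervals", the sketch below cuts a sampled signal into overlapping frames every 160 samples (10 ms at an assumed 16 kHz) and computes a toy two-element feature per frame. Practical systems typically use richer spectral features. The frame sizes and the feature are assumptions chosen only to show the streaming structure.

```python
import math


def frames(signal, frame_len=400, hop=160):
    """Yield analysis frames taken every `hop` samples (fixed interval)."""
    for start in range(0, len(signal) - frame_len + 1, hop):
        yield signal[start:start + frame_len]


def toy_feature(frame):
    """Toy 2-D feature: (log energy, zero-crossing count)."""
    energy = sum(x * x for x in frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return (math.log(energy + 1e-10), crossings)


def acoustic_analysis(signal):
    """Stream one feature vector per frame to the model matching stage."""
    for frame in frames(signal):
        yield toy_feature(frame)


one_second = [0.0] * 16000              # silence at an assumed 16 kHz
features = list(acoustic_analysis(one_second))
print(len(features))                    # 98 frames at a 10 ms interval
```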

[0014] In the model matching means 5, the processing shown in the flowchart of FIG. 22 is repeated each time a feature vector is input. The flowchart has four steps:
Step 501, intra-node calculation, performs the matching calculation for the states within each node.
Step 502, evaluation value determination, determines the evaluation values for the beam search.
Step 503, beam search calculation, performs the beam search.
Step 504, inter-node calculation, performs the calculation between nodes.

[0015] FIG. 23 shows the details of the data structure for node n in the collation data storage means 4. FIGS. 20 and 21 show the contents of the collation data storage means 4 node by node, whereas FIG. 23 shows the contents within a single node.

[0016] Node n is assumed to consist of three states: S_n(1), S_n(2), and S_n(3). These three states represent the model itself. The leftmost state of node n, S_n(0), is a pseudo-state used for the inter-node calculation.

[0017] a_n(i, j) denotes a penalty based on the transition probability from state i to state j, and b_n(i) denotes a penalty based on the output probability of state i. The smaller the probability, the larger the penalty. Penalties based on transition and output probabilities are standard parameters in HMM-based recognition, so they are not described in detail here. These parameters are stored in advance in the standard model storage means 2, as shown in FIG. 19. When a partial dictionary containing node n is read from the dictionary storage means 3, the parameters for node n are read from the standard model storage means 2, and a work area is allocated in the collation data storage means 4 as shown in FIG. 23.

[0018] In the intra-node calculation (step 501), each time a feature vector is input, the model matching calculation of Eq. 1 below is performed using the output and transition probabilities, and S_n(i) is updated. I_n denotes the number of states of node n. The output probability is obtained by evaluating the input feature vector against the distribution that represents the acoustic features of each state.

[0019]

$$S_n(i) = \max\{\, S_n(i) + a_n(i,i),\; S_n(i-1) + a_n(i-1,i) \,\} + b_n(i), \qquad i = 1, \dots, I_n \qquad \text{(Eq. 1)}$$
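Eq. 1 is a Viterbi-style update. Each real state either stays in place (self-loop) or is entered from its left neighbour, and the output score for the current frame is then added. The sketch below applies the update to one node. It assumes that a_n and b_n are log-domain scores where larger is better, which is consistent with the max in Eq. 1 and with the ">" comparisons in the beam conditions. It evaluates the right-hand side on the previous frame's scores, so S_n(i-1) is never read after it has been overwritten. The numeric values are made up.

```python
NEG_INF = float("-inf")


def intra_node_update(S, a, b):
    """Apply Eq. 1 once to a single node.

    S: previous-frame scores [S_n(0), S_n(1), ..., S_n(I_n)]; S[0] is the pseudo-state
    a: a[i][j], transition score from state i to state j
    b: b[i], output score of state i for the current feature vector (b[0] unused)
    """
    I = len(S) - 1
    new = list(S)                       # S[0] is left for the inter-node step
    for i in range(1, I + 1):
        stay = S[i] + a[i][i]           # self-loop
        enter = S[i - 1] + a[i - 1][i]  # transition from the left neighbour
        new[i] = max(stay, enter) + b[i]
    return new


# 3-state left-to-right node; made-up scores for illustration
a = [[NEG_INF] * 4 for _ in range(4)]
for i in range(1, 4):
    a[i][i] = -1.0
for i in range(3):
    a[i][i + 1] = -0.5
b = [0.0, -2.0, -3.0, -4.0]

S = [0.0, NEG_INF, NEG_INF, NEG_INF]    # path enters through the pseudo-state
S = intra_node_update(S, a, b)
print(S)   # [0.0, -2.5, -inf, -inf]
S = intra_node_update(S, a, b)
print(S)   # [0.0, -2.5, -6.0, -inf]
```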

[0020] The evaluation value determination (step 502) computes three quantities using Eq. 2 below: the beam-search evaluation value Enode(n), the evaluation value Earc(n) used for transitions between nodes, and the beam-search reference value Ebestnode. Ebestnode is the best beam-search evaluation value among all nodes.

[0021]

$$E_{\mathrm{node}}(n) = \max_{1 \le i \le I_n} S_n(i), \qquad E_{\mathrm{arc}}(n) = S_n(I_n), \qquad E_{\mathrm{bestnode}} = \max_{1 \le n \le N} E_{\mathrm{node}}(n) \qquad \text{(Eq. 2)}$$
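Eq. 2 reduces each node's state scores to two numbers: the best real state and the last state. The overall best score across nodes then serves as the beam reference. The sketch below assumes that only the active nodes are passed in and that scores are log-domain (higher is better). The node names and values are illustrative.

```python
NEG_INF = float("-inf")


def evaluation_values(scores):
    """Eq. 2 over the nodes passed in (assumed: the currently active ones).

    scores: node id -> [S_n(0), S_n(1), ..., S_n(I_n)]
    """
    e_node = {n: max(S[1:]) for n, S in scores.items()}  # best real state
    e_arc = {n: S[-1] for n, S in scores.items()}         # last state S_n(I_n)
    e_best = max(e_node.values())
    return e_node, e_arc, e_best


scores = {
    "node2": [NEG_INF, -2.5, -6.0, -9.0],
    "node3": [NEG_INF, -4.0, -3.5, -5.0],
}
e_node, e_arc, e_best = evaluation_values(scores)
print(e_node)  # {'node2': -2.5, 'node3': -3.5}
print(e_arc)   # {'node2': -9.0, 'node3': -5.0}
print(e_best)  # -2.5
```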

[0022] In the beam search calculation (step 503), any node that does not satisfy the following condition has its matching calculation deactivated, which reduces the amount of computation. No intra-node calculation is performed for a deactivated node. Tnode is the beam-search threshold and is set to a predetermined value.

[0023]

$$E_{\mathrm{node}}(n) > E_{\mathrm{bestnode}} - T_{\mathrm{node}} \qquad \text{(Eq. 3)}$$
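Eq. 3 keeps a node active only if its best score lies within Tnode of the best node in the current frame. The minimal sketch below splits nodes into kept and deactivated sets. The threshold and scores are arbitrary example values.

```python
def beam_prune(e_node, e_best, t_node):
    """Split nodes into those kept active by Eq. 3 and those deactivated."""
    active = {n for n, e in e_node.items() if e > e_best - t_node}
    inactive = set(e_node) - active
    return active, inactive


e_node = {"node2": -2.5, "node3": -3.5, "node7": -20.0}   # made-up values
active, inactive = beam_prune(e_node, e_best=-2.5, t_node=5.0)
print(sorted(active))    # ['node2', 'node3']
print(sorted(inactive))  # ['node7']
```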

[0024] Next comes the inter-node calculation (step 504). Suppose the condition of Eq. 4 below is satisfied and the collation data storage means 4 has no work area for the following node. In that case, the partial dictionary containing the new node is read from the dictionary storage means 3, and a work area is allocated for the new node. The newly read node is then activated, and the inter-node matching calculation is performed.

[0025]

$$E_{\mathrm{arc}}(n) > E_{\mathrm{bestnode}} - T_{\mathrm{arc}} \qquad \text{(Eq. 4)}$$

[0026] FIG. 24 shows the work area allocated for the following node, denoted node n+1. Immediately after the node is read, the score of each state of node n+1 is initialized to the lowest value. The inter-node matching between node n and node n+1 is then performed, as expressed by Eq. 5 below.

[0027]

$$S_{n+1}(0) = S_n(I_n) \qquad \text{(Eq. 5)}$$

[0028] If the following node n+1 already has a work area in the collation data storage means 4 but has been deactivated, it is reactivated, and the inter-node matching calculation of Eq. 5 is performed.
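Paragraphs [0024] to [0028] combine three actions: the transition test of Eq. 4, lazy reading of the successor's partial dictionary, and the score hand-off of Eq. 5. The sketch below puts them in one function. The read_dictionary callback is hypothetical and stands in for a read from the dictionary storage means 3. Only the successor node's work area is allocated, for brevity. Eq. 5 is implemented literally as an assignment, as the text gives it.

```python
NEG_INF = float("-inf")


def inter_node_step(n, succ, scores, active, e_arc, e_best, t_arc,
                    read_dictionary, num_states=3):
    """Eqs. 4 and 5 for the arc from node n to its successor succ.

    scores: node id -> state scores (work areas in storage means 4)
    active: set of currently active node ids
    read_dictionary: hypothetical callback that reads the partial
        dictionary containing succ from the dictionary storage means 3
    Returns True if the transition was taken.
    """
    if not e_arc[n] > e_best - t_arc:       # Eq. 4 not satisfied
        return False
    if succ not in scores:                  # no work area yet: read and allocate
        read_dictionary(succ)
        scores[succ] = [NEG_INF] * (num_states + 1)   # lowest initial scores
    active.add(succ)                        # activate, or reactivate ([0028])
    scores[succ][0] = scores[n][-1]         # Eq. 5: S_{n+1}(0) = S_n(I_n)
    return True


reads = []
scores = {"node2": [NEG_INF, -2.5, -3.0, -4.0]}
active = {"node2"}
inter_node_step("node2", "node5", scores, active, {"node2": -4.0},
                e_best=-2.5, t_arc=3.0, read_dictionary=reads.append)
active.discard("node5")                     # pruned later by the beam
inter_node_step("node2", "node5", scores, active, {"node2": -4.0},
                e_best=-2.5, t_arc=3.0, read_dictionary=reads.append)
print(reads)             # ['node5']: the dictionary was read only once
print(scores["node5"])   # [-4.0, -inf, -inf, -inf]
```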

[0029] The model matching means 5 repeats the matching process described above each time a feature vector is input. When all feature vectors of the speech signal have been processed, it outputs the node sequence with the highest score as the recognition result.

[0030]

PROBLEMS TO BE SOLVED BY THE INVENTION

In the conventional speech recognition apparatus described above, recognizing a large vocabulary, such as addresses throughout Japan, makes the dictionary storage means 3 grow to as much as 40 megabytes. Storing that much data on an expensive recording medium such as an EPROM raises the cost of the speech recognition apparatus.

[0031] Conversely, the dictionary may be stored on a recording medium with a slow read speed, such as a CD-ROM or DVD-ROM. In that case reading the dictionary takes a long time, so the recognition response time until a recognition result is obtained becomes long.

[0032] The present invention has been made to solve the above problems. Its object is to provide a speech recognition system and method, and a computer-readable recording medium storing a speech recognition program, that shorten the recognition response time while keeping costs low by storing the word dictionary across a plurality of recording media.

[0033]

MEANS FOR SOLVING THE PROBLEMS

A speech recognition system according to claim 1 of the present invention comprises five means. Acoustic analysis means receives a speech signal, performs acoustic analysis, and converts the signal into a time series of feature vectors for output. Standard model storage means stores the standard models of the recognition targets. Dictionary storage means stores a plurality of partial dictionaries obtained by dividing a word dictionary. Collation data storage means stores collation data as a work area for the matching process. Model matching means performs the matching process on the feature vectors from the acoustic analysis means while referring to the standard models and the word dictionary, and outputs a recognition result. The dictionary storage means consists of first dictionary storage means and second dictionary storage means. The first can be referenced at high speed and stores the frequently used partial dictionaries. The second cannot be referenced at high speed and stores the remaining, less frequently used partial dictionaries.

[0034] In a speech recognition system according to claim 2 of the present invention, the dictionary storage means further includes recording medium storage means. This means stores recording medium information indicating whether each partial dictionary is stored in the first dictionary storage means or in the second dictionary storage means.

[0035] In a speech recognition system according to claim 3 of the present invention, the first and second dictionary storage means store partial dictionaries that include recording medium information for each node. This information indicates whether the partial dictionary connected next is stored in the first dictionary storage means or in the second dictionary storage means.

[0036] In a speech recognition system according to claim 4 of the present invention, the first and second dictionary storage means each store strongly interdependent partial dictionaries on the same recording medium together as a single group. When the model matching means refers to a partial dictionary, it reads out the whole group containing that partial dictionary at once and then performs the matching process.

[0037] In a speech recognition system according to claim 5 of the present invention, the dictionary storage means further includes group storage means. When a partial dictionary is referenced, the whole group containing it is transferred, and the group storage means stores that group. The model matching means performs the matching process by referring individually to the partial dictionaries of the group held in the group storage means.
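Claims 4 and 5 describe reading a whole group of related partial dictionaries in one access and then using its members individually from a group store. The sketch below illustrates this. The claims do not say how "strongly interdependent" dictionaries are chosen, so the group assignment here is written by hand. The class, the group names, and the read_group callback are hypothetical. The point is that four lookups cost only two accesses to the slow medium.

```python
class GroupedDictionaryReader:
    """Sketch of claims 4 and 5: read whole groups, then use members individually."""

    def __init__(self, groups, read_group):
        # groups: group id -> {partial dictionary name: contents} (hypothetical)
        self.groups = groups
        self.group_of = {name: g for g, members in groups.items() for name in members}
        self.read_group = read_group    # one medium access per call
        self.group_storage = {}         # plays the role of the group storage means

    def get(self, dict_name):
        g = self.group_of[dict_name]
        if g not in self.group_storage:
            # the whole group containing dict_name is transferred at once
            self.group_storage[g] = self.read_group(g, self.groups[g])
        # members are then referred to one by one from the group storage
        return self.group_storage[g][dict_name]


accesses = []


def slow_medium_read(group_id, members):
    accesses.append(group_id)           # count accesses to the slow medium
    return dict(members)


groups = {
    "groupA": {"net3": "partial dictionary net3", "net5": "partial dictionary net5"},
    "groupB": {"net4": "partial dictionary net4"},
}
reader = GroupedDictionaryReader(groups, slow_medium_read)
for name in ["net3", "net5", "net4", "net3"]:
    reader.get(name)
print(accesses)   # ['groupA', 'groupB']: two accesses for four lookups
```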

[0038] In a speech recognition system according to claim 6 of the present invention, the dictionary storage means further includes transfer amount measurement means. This means measures the amount of partial dictionary data read from the second dictionary storage means into the collation data storage means. The model matching means compares the measured amount with a specified amount and controls reading so that partial dictionaries beyond the specified amount are not read.
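One way to read the control in claim 6 is as a per-utterance transfer budget for the slow medium. The sketch below refuses any read that would push the running total above the specified amount. Three points are assumptions, not stated in the claim: this interpretation of "beyond the specified amount", measuring in bytes, and resetting per utterance. The claim also does not say what the matcher does with a node whose dictionary is refused.

```python
class TransferLimiter:
    """Sketch of the transfer amount measurement means of claim 6."""

    def __init__(self, limit_bytes):
        self.limit = limit_bytes     # the specified amount (assumed in bytes)
        self.transferred = 0         # amount read from the slow medium so far

    def allow(self, size_bytes):
        """Return True and record the read if it stays within the limit."""
        if self.transferred + size_bytes > self.limit:
            return False             # refuse: would exceed the specified amount
        self.transferred += size_bytes
        return True

    def reset(self):
        """Assumed to be called at the start of each utterance."""
        self.transferred = 0


limiter = TransferLimiter(100_000)
print(limiter.allow(60_000))   # True
print(limiter.allow(50_000))   # False: 110 000 bytes would exceed the limit
print(limiter.allow(40_000))   # True: exactly at the limit
print(limiter.transferred)     # 100000
```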

[0039] A speech recognition system according to claim 7 of the present invention splits a group that is to be stored in the second dictionary storage means into two new groups. The first dictionary storage means stores, as one new group, a partial dictionary consisting only of the leading portion of that group. The second dictionary storage means stores, as the other new group, a partial dictionary consisting of the remainder without the leading portion. When the model matching means refers to a partial dictionary, it reads out the whole group containing that partial dictionary at once and then performs the matching process.
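Claim 7 moves the leading portion of a slow-medium group onto the fast medium as its own group. The sketch below assumes that a group is an ordered list of partial dictionaries and that the "leading portion" is its first head_count members. The claim itself does not fix the granularity. The function and dictionary names are hypothetical. The result is a placement table that a reader such as the recording medium lookup could consult.

```python
def split_group(group_id, members, head_count):
    """Sketch of claim 7: split one slow-medium group into two new groups.

    members: partial dictionary names in the order matching reaches them
             (an assumption; the claim only speaks of the "leading portion").
    Returns a placement table: name -> (storage means, new group id).
    """
    placement = {}
    for k, name in enumerate(members):
        if k < head_count:
            placement[name] = ("first", f"{group_id}-head")   # fast medium
        else:
            placement[name] = ("second", f"{group_id}-rest")  # slow medium
    return placement


table = split_group("groupA", ["dict_a", "dict_b", "dict_c"], head_count=1)
print(table["dict_a"])   # ('first', 'groupA-head')
print(table["dict_c"])   # ('second', 'groupA-rest')
```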

[0040] A speech recognition method according to claim 8 of the present invention proceeds as follows. It receives a speech signal, performs acoustic analysis, and converts the signal into a time series of feature vectors. It then performs a matching process on the feature vectors while referring to the standard models of the recognition targets and to a plurality of partial dictionaries obtained by dividing a word dictionary. Finally, it outputs a recognition result. The matching process includes two reading steps. In the first partial dictionary reading step, partial dictionaries are read first from first dictionary storage means, which can be referenced at high speed and stores the frequently used partial dictionaries. In the second partial dictionary reading step, partial dictionaries are then read either from the first dictionary storage means or from second dictionary storage means, which cannot be referenced at high speed and stores the remaining, less frequently used partial dictionaries.

[0041] In a speech recognition method according to claim 9 of the present invention, the second partial dictionary reading step relies on recording medium information. This information indicates whether the partial dictionary to be connected next is stored in the first dictionary storage means or in the second dictionary storage means, and the step reads the partial dictionary from the indicated means.

[0042] In a speech recognition method according to claim 10 of the present invention, the second dictionary storage means stores strongly interdependent partial dictionaries together as a group. When a partial dictionary is referenced during the second partial dictionary reading step, the whole group containing it is read out at once from the second dictionary storage means.

[0043] In a speech recognition method according to claim 11 of the present invention, when a partial dictionary is referenced during the second partial dictionary reading step, the whole group containing it is read out at once from the second dictionary storage means and stored in group storage means.

[0044] A computer-readable recording medium according to claim 12 of the present invention stores a speech recognition program with five components. An acoustic analysis procedure receives a speech signal, performs acoustic analysis, and converts the signal into a time series of feature vectors for output. A standard model storage area stores the standard models of the recognition targets. A dictionary storage area stores a plurality of partial dictionaries obtained by dividing a word dictionary. A collation data storage area stores collation data as a work area for the matching process. A model matching procedure performs the matching process on the feature vectors from the acoustic analysis procedure while referring to the standard models and the word dictionary, and outputs a recognition result. The dictionary storage area consists of a first dictionary storage area and a second dictionary storage area. The first can be referenced at high speed and stores the frequently used partial dictionaries. The second cannot be referenced at high speed and stores the remaining, less frequently used partial dictionaries.

[0045] In a computer-readable recording medium storing a speech recognition program according to claim 13 of the present invention, the dictionary storage area further includes a recording medium storage area. This area stores recording medium information indicating whether each partial dictionary is stored in the first dictionary storage area or in the second dictionary storage area.

[0046] In a computer-readable recording medium storing a speech recognition program according to claim 14 of the present invention, the first and second dictionary storage areas store partial dictionaries that include recording medium information for each node. This information indicates whether the partial dictionary connected next is stored in the first dictionary storage area or in the second dictionary storage area.

[0047] In a computer-readable recording medium storing a speech recognition program according to claim 15 of the present invention, the first and second dictionary storage areas each store strongly interdependent partial dictionaries on the same recording medium together as a single group. When the model matching procedure refers to a partial dictionary, it reads out the whole group containing that partial dictionary at once and then performs the matching process.

[0048] In a computer-readable recording medium storing a speech recognition program according to claim 16 of the present invention, the dictionary storage area further includes a group storage area. When a partial dictionary is referenced, the whole group containing it is transferred, and the group storage area stores that group. The model matching procedure performs the matching process by referring individually to the partial dictionaries of the group held in the group storage area.

[0049] In a computer-readable recording medium storing a speech recognition program according to claim 17 of the present invention, the program further includes a transfer amount measurement procedure. This procedure measures the amount of partial dictionary data read from the second dictionary storage area into the collation data storage area. The model matching procedure compares the measured amount with a specified amount and controls reading so that partial dictionaries beyond the specified amount are not read.

[0050] A computer-readable recording medium storing a speech recognition program according to claim 18 of the present invention splits a group that is to be stored in the second dictionary storage area into two new groups. The first dictionary storage area stores, as one new group, a partial dictionary consisting only of the leading portion of that group. The second dictionary storage area stores, as the other new group, a partial dictionary consisting of the remainder without the leading portion. When the model matching procedure refers to a partial dictionary, it reads out the whole group containing that partial dictionary at once and then performs the matching process.

[0051]

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Embodiment 1.

A speech recognition system according to Embodiment 1 of the present invention will be described with reference to the drawings. FIG. 1 is a block diagram showing the configuration of the speech recognition system according to Embodiment 1. In the drawings, the same reference numerals denote the same or corresponding parts.

[0052] In FIG. 1, reference numeral 1 denotes acoustic analysis means, which receives a speech signal, performs acoustic analysis, and converts the signal into a time series of feature vectors. Reference numeral 2 denotes standard model storage means, which stores the standard models of the recognition targets. Reference numeral 4 denotes collation data storage means, which stores collation data as a work area for the matching process. Reference numeral 5 denotes model matching means, which performs the matching process on the feature vectors from the acoustic analysis means 1 while referring to the standard models and the dictionary, and outputs a recognition result.

[0053] Also in FIG. 1, reference numeral 6 denotes first dictionary storage means, which uses a recording medium that can be referenced at high speed. Reference numeral 7 denotes second dictionary storage means, which uses a recording medium that cannot be referenced at high speed. Reference numeral 8 denotes recording medium storage means, which stores the recording medium (storage means) on which each partial dictionary is recorded.

[0054] In the following, the first dictionary storage means 6 is an EEPROM (Electrically Erasable Programmable Read-Only Memory), and the second dictionary storage means 7 is a CD-ROM (Compact Disc Read-Only Memory). An EEPROM can be referenced at high speed but is very expensive. A CD-ROM offers very large capacity at low cost but reads slowly.

【００５５】住所などの大語彙の認識を行う場合には辞
書の大きさが膨大となり、ＥＥＰＲＯＭに全部記憶して
おくことはコストが大きくなる。さらに、すべてをＣＤ
−ＲＯＭに記憶した場合には、ＣＤ−ＲＯＭの参照回数
が大きくなり、参照によるオーバーヘッドにより認識す
る時間が非常に長くなり、音声認識システムとしての認
識応答性に問題が生じる。そのため、辞書を記憶する記
憶媒体を２種類用い、使用頻度の高い部分辞書を選択し
て選択された部分辞書のみＥＥＰＲＯＭに記憶し、使用
頻度の低い部分辞書はコストの安いＣＤ−ＲＯＭに記憶
するものとする。When a large vocabulary such as addresses is recognized, the dictionary becomes enormous, and storing all of it in EEPROM is costly. Conversely, if everything is stored on CD-ROM, the CD-ROM is referenced many times, and the overhead of those references makes recognition very slow, which harms the responsiveness of the speech recognition system. Therefore, two types of storage media are used for the dictionary: the frequently used partial dictionaries are selected and only those are stored in EEPROM, while the infrequently used partial dictionaries are stored on the low-cost CD-ROM.

【００５６】図２は、この実施の形態１に係る音声認識
システムの単語辞書の構成を示す図である。図２におい
て、実線で囲ってある部分辞書net1、net2、net9、net1
2が第一辞書記憶手段６（ＥＥＰＲＯＭ）に記憶され、
残りの点線で囲まれている部分辞書は第二辞書記憶手段
７（ＣＤ−ＲＯＭ）に記憶するものとする。FIG. 2 shows the configuration of the word dictionary of the speech recognition system according to Embodiment 1. In FIG. 2, the partial dictionaries net1, net2, net9, and net12, enclosed by solid lines, are stored in the first dictionary storage means 6 (EEPROM), and the remaining partial dictionaries, enclosed by dotted lines, are stored in the second dictionary storage means 7 (CD-ROM).

【００５７】部分辞書net1、net2、net12は認識する場
合に必ず必要となるため、高速に参照可能な記録媒体で
ある第一辞書記憶手段６に記録する。また、部分辞書ne
t9は多くの部分辞書から接続されているため参照される
可能性が高いものとして同じくに記録する。図２におい
て、破線で囲まれた四角で示されている残りの部分辞書
は第二辞書手段７に記憶しておき、必要に応じて読み込
む構成とする。The partial dictionaries net1, net2, and net12 are always needed for recognition, so they are recorded in the first dictionary storage means 6, which is a recording medium that can be referenced at high speed. The partial dictionary net9 is recorded there as well, because it is connected from many partial dictionaries and is therefore likely to be referenced. The remaining partial dictionaries, shown in FIG. 2 as boxes enclosed by broken lines, are stored in the second dictionary storage means 7 and read in as needed.

【００５８】音声認識処理の基本動作は従来例の説明と
同様であるため、ここでは説明を省略する。上記のよう
に構成することにより、神奈川県の発声であれば神奈川
県のノードが式４の条件を満たし、続く部分辞書net3を
読み込むことができる。また、香川県の発声であれば香
川県のノードが式４の条件を満たし続く部分辞書net4を
読み込むことができる。The basic operation of the speech recognition process is the same as in the conventional example, so its description is omitted here. With the above configuration, if the utterance is Kanagawa Prefecture, the Kanagawa Prefecture node satisfies the condition of Expression 4 and the following partial dictionary net3 can be read in. Likewise, if the utterance is Kagawa Prefecture, the Kagawa Prefecture node satisfies the condition of Expression 4 and the following partial dictionary net4 can be read in.

【００５９】これらの部分辞書がどの記録媒体に入って
いるかは記録媒体記憶手段８を参照して決定する。図３
は、記録媒体記憶手段８の内容を示したものであり、認
識に先立ち記録媒体記憶手段８に記憶されているものと
する。Which recording medium holds each of these partial dictionaries is determined by referring to the recording medium storage means 8. FIG. 3 shows the contents of the recording medium storage means 8; these contents are assumed to be stored in the recording medium storage means 8 before recognition begins.
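
A minimal Python sketch of the lookup described in paragraphs [0056] to [0059] follows, assuming plain in-memory dicts stand in for the EEPROM and the CD-ROM. The class name, method name, and word lists are hypothetical; only the dictionary names and their placement follow the description of FIG. 2, and the actual contents of net3 and net4 are not given in this section.

```python
from __future__ import annotations

from dataclasses import dataclass

FAST = "EEPROM"  # first dictionary storage means 6
SLOW = "CD-ROM"  # second dictionary storage means 7


@dataclass
class TwoTierDictionaryStore:
    medium_table: dict[str, str]  # role of recording medium storage means 8
    fast: dict[str, list[str]]    # partial dictionaries on the fast medium
    slow: dict[str, list[str]]    # partial dictionaries on the slow medium
    slow_reads: int = 0           # number of slow-medium accesses so far

    def load(self, name: str) -> list[str]:
        """Fetch a partial dictionary after looking up which medium holds it."""
        if self.medium_table[name] == FAST:
            return self.fast[name]
        self.slow_reads += 1      # on a real CD-ROM each access costs latency
        return self.slow[name]


# Placement per FIG. 2: net1, net2, net9, net12 on the fast medium; net3 and
# net4 on the slow one. All word lists are hypothetical placeholders.
store = TwoTierDictionaryStore(
    medium_table={"net1": FAST, "net2": FAST, "net9": FAST, "net12": FAST,
                  "net3": SLOW, "net4": SLOW},
    fast={"net1": [], "net2": ["Kanagawa-ken", "Kagawa-ken"],
          "net9": [], "net12": []},
    slow={"net3": ["Fujisawa-shi", "Kamakura-shi"],
          "net4": ["Takamatsu-shi", "Okawa-gun"]},
)

store.load("net2")          # served from the fast medium, no slow access
print(store.load("net4"))   # a Kagawa utterance pulls net4 from the slow medium
print(store.slow_reads)
```

Only the slow-medium path increments the counter, which is the quantity Embodiment 1 tries to keep small by placing the frequently used dictionaries on the fast medium.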

【００６０】なお、別の単語辞書の例についても説明す
る。図４は、この実施の形態１に係る音声認識システム
の別な単語辞書の例を示す図である。これは神奈川県に
在住している使用者がこの音声認識装置を使う場合、神
奈川県の住所を検索する頻度が高くなるため、神奈川県
の地名を含む部分辞書net3、net5、net6を第一辞書記憶
手段６（ＥＥＰＲＯＭ）に置く構成としたものである。Another example of the word dictionary is also described. FIG. 4 shows another example of the word dictionary of the speech recognition system according to Embodiment 1. A user living in Kanagawa Prefecture will search for Kanagawa addresses more often, so in this configuration the partial dictionaries net3, net5, and net6, which contain Kanagawa place names, are placed in the first dictionary storage means 6 (EEPROM).

【００６１】また、音声認識システムを使用中に各部分
辞書の使用頻度を学習して、第一辞書記憶手段６（ＥＥ
ＰＲＯＭ）の部分辞書が参照頻度の高いものとなるよう
内容を書き換えることも有効である。これらの機能を有
する音声認識システムも本発明の範疇とする。It is also effective to learn the usage frequency of each partial dictionary while the speech recognition system is in use, and to rewrite the contents of the first dictionary storage means 6 (EEPROM) so that it holds the most frequently referenced partial dictionaries. A speech recognition system having these functions is also within the scope of the present invention.
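
The frequency learning in [0061] could be approximated as below. This is a sketch under stated assumptions, not the patented method: the function name is hypothetical, usage is counted per partial dictionary with `collections.Counter`, capacity is counted in dictionaries rather than bytes, and the always-needed dictionaries from FIG. 2 are pinned (an assumption) so they are never evicted. The usage log imitates the Kanagawa resident of FIG. 4.

```python
from __future__ import annotations

from collections import Counter


def plan_fast_contents(use_counts: Counter[str], pinned: set[str],
                       capacity: int) -> set[str]:
    """Pick the partial dictionaries the fast medium (EEPROM) should hold.

    pinned:   dictionaries kept regardless of their counts (assumption).
    capacity: hypothetical limit, counted in dictionaries rather than bytes.
    """
    chosen = set(pinned)
    # most_common() lists the most used first; ties keep first-seen order.
    for name, _count in use_counts.most_common():
        if len(chosen) >= capacity:
            break
        chosen.add(name)
    return chosen


# Hypothetical usage log of a user who mostly searches Kanagawa addresses.
usage_log = ["net3", "net5", "net3", "net6", "net3", "net5", "net4"]
plan = plan_fast_contents(Counter(usage_log),
                          pinned={"net1", "net2", "net9", "net12"},
                          capacity=7)
print(sorted(plan))
```

Rewriting the EEPROM would then amount to copying in the dictionaries of `plan` that it does not yet hold and dropping the ones outside `plan`.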

【００６２】上記の説明ではＥＥＰＲＯＭとＣＤ−ＲＯ
Ｍを記憶媒体として用いる場合を一例として説明した
が、ＥＥＰＲＯＭの代わりにフラッシュＲＯＭ、ＥＰＲ
ＯＭ、ＲＯＭ、ＲＡＭなど、また、ＣＤ−ＲＯＭの代わ
りにＣＤ−ＲＷ（書き換え可能ＣＤ）、ＤＶＤ−ＲＯ
Ｍ、ＤＶＤ−ＲＡＭ、ＤＶＤ−ＲＷ（書き換え可能ＤＶ
Ｄ）、ハードディスクなどであっても良く、同様に効果
を奏する。In the above description, EEPROM and CD-ROM were used as examples of the storage media; however, flash ROM, EPROM, ROM, RAM, or the like may be used instead of the EEPROM, and CD-RW (rewritable CD), DVD-ROM, DVD-RAM, DVD-RW (rewritable DVD), a hard disk, or the like may be used instead of the CD-ROM, with the same effect.

【００６３】また、上記の説明で用いた部分辞書の分類
は一例を示したものであり、他の分類の方法でも良い。The classification of the partial dictionaries used in the above description is merely an example, and other classification methods may be used.

【００６４】また、上記の説明では２種類の記憶媒体を
用いる方法について説明したが、３種類以上の記憶媒体
を用いてもよく、同様に効果を奏する。In the above description, a method using two types of storage media has been described. However, three or more types of storage media may be used, and the same effect can be obtained.

【００６５】さらに、上記の説明ではノードを単語とし
て説明したが、ノードは音素片、音素、半音節、音節、
形態素などの単位でも良く、同様に効果を奏する。Further, although nodes have been described as words above, a node may also be a unit such as a phoneme segment, phoneme, demisyllable, syllable, or morpheme, with the same effect.

【００６６】また、音声認識方式もＨＭＭとして説明し
たが、これはＤＰ（Dynamic Programming）マッチング
やニューラルネットを用いる音声認識方式でも良く同様
に効果を奏する。Although the speech recognition method has been described as HMM-based, a method using DP (Dynamic Programming) matching or a neural network may also be used, with the same effect.

【００６７】実施の形態２．この発明の実施の形態２に
係る音声認識システムについて図面を参照しながら説明
する。図５は、この発明の実施の形態２に係る音声認識
システムの構成を示すブロック図である。Embodiment 2. A speech recognition system according to Embodiment 2 of the present invention is described with reference to the drawings. FIG. 5 is a block diagram showing the configuration of the speech recognition system according to Embodiment 2 of the present invention.

【００６８】上記の実施の形態１に係る音声認識システ
ムでは、部分辞書の個数が多くなった場合に記録媒体記
憶手段８のテーブルサイズが大きくなるという欠点があ
る。ここでは、これを解決する方法として次に続く記録
媒体を部分辞書の内部に記憶した音声認識システムにつ
いて説明する。The speech recognition system according to Embodiment 1 has the drawback that the table in the recording medium storage means 8 grows as the number of partial dictionaries increases. As a solution, a speech recognition system is described here in which each partial dictionary itself stores the recording medium of the partial dictionaries that follow it.

【００６９】図５において、６は高速に参照可能な記録
媒体を用いる第一辞書記憶手段、７は高速では参照不可
能な記録媒体を用いる第二辞書記憶手段である。In FIG. 5, reference numeral 6 denotes a first dictionary storage unit that uses a recording medium that can be referenced at high speed, and 7 denotes a second dictionary storage unit that uses a recording medium that cannot be referenced at high speed.

【００７０】その他の部分は従来の音声認識装置と同一
のため、説明を省略する。以下、上記の実施の形態１と
同様に、第一辞書記憶手段６をＥＥＰＲＯＭ、第二辞書
記憶手段７をＣＤ−ＲＯＭとして説明を行う。ただし、
実施の形態１とは異なり、部分辞書の中に次に接続され
る部分辞書の記録媒体の情報を記録する。The other parts are the same as in the conventional speech recognition apparatus, so their description is omitted. As in Embodiment 1, the first dictionary storage means 6 is described below as an EEPROM and the second dictionary storage means 7 as a CD-ROM. Unlike Embodiment 1, however, each partial dictionary records the recording medium of the partial dictionary connected next.

【００７１】本実施の形態２における部分辞書の一例を
net2を例として説明をする。図６に部分辞書net2の構造
を示す。図１８に示す従来の音声認識装置の部分辞書の
構造に比べ、図６には次に続く部分辞書の記憶されてい
る記録媒体の情報がＣＤ−ＲＯＭと追加されている。こ
の情報により、続く部分辞書がどの記憶媒体にあるのか
判定することが可能である。An example of a partial dictionary in Embodiment 2 is described using net2. FIG. 6 shows the structure of the partial dictionary net2. Compared with the structure of the partial dictionary of the conventional speech recognition apparatus shown in FIG. 18, FIG. 6 adds the entry "CD-ROM", which identifies the recording medium on which the following partial dictionary is stored. From this information, it can be determined on which storage medium the following partial dictionary resides.

【００７２】このような構成にすることにより、上記の
実施の形態１では必要であった記録媒体記憶手段８が不
要となる。By adopting such a configuration, the recording medium storage means 8, which is required in the first embodiment, becomes unnecessary.
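
Embodiment 2's change can be pictured as moving the medium information into each node, as in this hypothetical sketch. The class and field names are illustrative, and the "CD-ROM" entries mirror FIG. 6 as described above; resolving a successor then needs no separate table like the recording medium storage means 8.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    word: str
    next_dict: str | None = None    # partial dictionary that follows this node
    next_medium: str | None = None  # medium holding it (the FIG. 6 addition)


@dataclass
class PartialDictionary:
    name: str
    nodes: list[Node] = field(default_factory=list)

    def successor(self, word: str) -> tuple[str, str | None] | None:
        """Return (next dictionary, its medium) for `word`, or None."""
        for node in self.nodes:
            if node.word == word and node.next_dict is not None:
                return node.next_dict, node.next_medium
        return None


# Hypothetical contents of net2: each prefecture node names its successor
# and the medium it lives on, so no global medium table is consulted.
net2 = PartialDictionary("net2", [
    Node("Kanagawa-ken", "net3", "CD-ROM"),
    Node("Kagawa-ken", "net4", "CD-ROM"),
])

print(net2.successor("Kagawa-ken"))
```

The trade-off is a few bytes per node in exchange for dropping a table whose size grows with the number of partial dictionaries.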

【００７３】上記の説明ではＥＥＰＲＯＭとＣＤ−ＲＯ
Ｍを記憶媒体として用いる場合を一例として説明した
が、ＥＥＰＲＯＭの代わりにフラッシュＲＯＭ、ＥＰＲ
ＯＭ、ＲＯＭ、ＲＡＭなど、また、ＣＤ−ＲＯＭの代わ
りにＣＤ−ＲＷ（書き換え可能ＣＤ）、ＤＶＤ−ＲＯ
Ｍ、ＤＶＤ−ＲＡＭ、ＤＶＤ−ＲＷ（書き換え可能ＤＶ
Ｄ）、ハードディスクなどであっても良く、同様に効果
を奏する。In the above description, EEPROM and CD-ROM were used as examples of the storage media; however, flash ROM, EPROM, ROM, RAM, or the like may be used instead of the EEPROM, and CD-RW (rewritable CD), DVD-ROM, DVD-RAM, DVD-RW (rewritable DVD), a hard disk, or the like may be used instead of the CD-ROM, with the same effect.

【００７４】また、上記の説明で用いた部分辞書の分類
は一例を示したものであり、他の分類の方法でも良い。The classification of the partial dictionaries used in the above description is merely an example, and other classification methods may be used.

【００７５】また、上記の説明では２種類の記憶媒体を
用いる方法について説明したが、３種類以上の記憶媒体
を用いてもよく同様に効果を奏する。In the above description, a method using two types of storage media has been described. However, three or more types of storage media may be used, and the same effect can be obtained.

【００７６】さらに、上記の説明ではノードを単語とし
て説明したが、ノードは音素片、音素、半音節、音節、
形態素などの単位でも良く、同様に効果を奏する。Further, although nodes have been described as words above, a node may also be a unit such as a phoneme segment, phoneme, demisyllable, syllable, or morpheme, with the same effect.

【００７７】また、音声認識方式もＨＭＭとして説明し
たが、これはＤＰ（Dynamic Programming）マッチング
やニューラルネットを用いる音声認識方式でも良く、同
様に効果を奏する。Although the speech recognition method has been described as HMM-based, a method using DP (Dynamic Programming) matching or a neural network may also be used, with the same effect.

【００７８】実施の形態３．この実施の形態３に係る音
声認識システムの構成は、実施の形態２に係る音声認識
システムの構成と同様のため、ここでは説明を省略す
る。Embodiment 3. The configuration of the speech recognition system according to Embodiment 3 is the same as that of the speech recognition system according to Embodiment 2, so its description is omitted here.

【００７９】ＣＤ−ＲＯＭは、１回の参照に時間がかか
るという問題がある。上記の実施の形態１及び２ではＣ
Ｄ−ＲＯＭ上の部分辞書の個数が多くなり、ＣＤ−ＲＯ
Ｍの参照回数が多くなるという問題があった。本実施の
形態３はこの問題を回避するために、同じ記録媒体に記
憶してある依存関係の強い部分辞書をひとまとめのグル
ープとして記憶して、参照する場合にまとめて読み出す
ことを特徴とする。A CD-ROM has the problem that each reference takes time. In Embodiments 1 and 2, as the number of partial dictionaries on the CD-ROM grows, so does the number of CD-ROM references. To avoid this, Embodiment 3 stores partial dictionaries that reside on the same recording medium and strongly depend on one another as a single group, and reads the whole group at once when it is referenced.

【００８０】住所認識の場合は地名部分が木構造になる
ため、部分辞書の依存関係が明確である。例えば、ＣＤ
−ＲＯＭに記憶されている同一の県の下の地名をまとめ
てひとつのグループとした辞書を図７に示す。図中、一
点鎖線で囲まれた部分辞書群をグループとすることを表
している。In address recognition, the place-name portion forms a tree, so the dependencies between partial dictionaries are clear. For example, FIG. 7 shows a dictionary in which the place names under the same prefecture stored on the CD-ROM are gathered into one group. In the figure, each set of partial dictionaries enclosed by a dash-dot line forms one group.

【００８１】図８は、部分辞書net2の構造を示す図であ
る。図６に比べ、grp3、grp4というグループ番号が付加
されている。これにより、続く部分辞書のグループ番号
を知ることが可能である。実施の形態１の記録媒体記憶
手段８のように独立したグループ番号を記憶するテーブ
ルを作成しても、同様の効果がある。FIG. 8 shows the structure of the partial dictionary net2. Compared with FIG. 6, the group numbers grp3 and grp4 are added, which makes it possible to know the group number of the following partial dictionary. The same effect can also be obtained by creating a separate table that stores the group numbers, like the recording medium storage means 8 of Embodiment 1.

【００８２】次に、動作について説明する。照合データ
記憶手段４上に部分辞書net1とnet2が取り込まれている
ものとする。発声が「香川県大川郡長尾町西」であった
場合、ノード「香川県」が式４の条件に当てはまったと
きは、次につながるグループgrp4を参照する。Next, the operation is described. Assume that the partial dictionaries net1 and net2 have been loaded into the collation data storage means 4. When the utterance is "Kagawa-ken Okawa-gun Nagao-cho Nishi" and the node "Kagawa-ken" satisfies the condition of Expression 4, the group grp4 connected next is referenced.

【００８３】この時点での照合データ記憶手段４の内容
を図９に示す。図中、実線で囲まれているノードは活性
化されているものとし、破線で囲まれているノードは非
活性化されていることを表す。グループgrp4には部分辞
書net7、net8、net10、net11が含まれているため、まと
めて読み出さて、照合データ記憶手段４に作業領域が取
られる。ただし、グループgrp4内のノードであっても
「志度町」、「長尾町」以外のノードは式４の条件に合
わないため、ノード間転送が生じないため、この時点で
は活性化されない。FIG. 9 shows the contents of the collation data storage means 4 at this point. In the figure, nodes enclosed by solid lines are active and nodes enclosed by broken lines are inactive. Because the group grp4 contains the partial dictionaries net7, net8, net10, and net11, they are read out together and work areas are allocated for them in the collation data storage means 4. However, among the nodes in grp4, those other than "Shido-cho" and "Nagao-cho" do not satisfy the condition of Expression 4; no inter-node transfer reaches them, so they are not activated at this point.

【００８４】認識処理がさらに進み、大川郡のノードが
式４の条件に合致した場合、部分辞書net8に対する照合
処理が必要となるが、既に作業領域に取りこまれている
ため、改めて第二辞書記憶手段７（ＣＤ−ＲＯＭ）を参
照する必要はない。As recognition proceeds and the Okawa-gun node satisfies the condition of Expression 4, collation against the partial dictionary net8 becomes necessary. Because net8 has already been loaded into the work area, the second dictionary storage means 7 (CD-ROM) does not need to be referenced again.

【００８５】上記のように構成することにより、照合処
理で将来必要となる部分辞書をまとめて読み出す効果が
あり、第二辞書記憶手段７（ＣＤ−ＲＯＭ）の参照回数
を少なく抑えられる。With the above configuration, there is an effect that the partial dictionaries which will be required in the collation processing in the future are read out collectively, and the number of times of reference to the second dictionary storage means 7 (CD-ROM) can be reduced.
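
The effect of grouping in [0082] to [0085] can be sketched by counting slow-medium accesses. The classes below are hypothetical stand-ins for the second dictionary storage means 7 and the collation data storage means 4; the membership of grp4 follows the text, and the word lists are left empty because they are not given.

```python
from __future__ import annotations


class SlowMedium:
    """Stand-in for second dictionary storage means 7 (CD-ROM)."""

    def __init__(self, groups: dict[str, dict[str, list[str]]]):
        self._groups = groups
        self.accesses = 0

    def read_group(self, group_id: str) -> dict[str, list[str]]:
        self.accesses += 1  # one access fetches every dictionary in the group
        return dict(self._groups[group_id])


class CollationWorkArea:
    """Stand-in for collation data storage means 4."""

    def __init__(self, medium: SlowMedium):
        self._medium = medium
        self._loaded_groups: set[str] = set()
        self.dictionaries: dict[str, list[str]] = {}

    def enter_group(self, group_id: str) -> None:
        """Called when a node whose successors lie in `group_id` fires."""
        if group_id not in self._loaded_groups:
            self.dictionaries.update(self._medium.read_group(group_id))
            self._loaded_groups.add(group_id)


cd_rom = SlowMedium({"grp4": {"net7": [], "net8": [], "net10": [], "net11": []}})
work = CollationWorkArea(cd_rom)

work.enter_group("grp4")  # "Kagawa-ken" fires: the whole group is read
work.enter_group("grp4")  # "Okawa-gun" later needs net8: already present
print("net8" in work.dictionaries, cd_rom.accesses)
```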

【００８６】上記の説明ではＣＤ−ＲＯＭを記憶媒体と
して用いる場合を一例として説明したが、ＣＤ−ＲＯＭ
の代わりにＣＤ−ＲＷ（書き換え可能ＣＤ）、ＤＶＤ−
ＲＯＭ、ＤＶＤ−ＲＡＭ、ＤＶＤ−ＲＷ（書き換え可能
ＤＶＤ）、ハードディスクなどであっても良く同様に効
果を奏する。In the above description, a CD-ROM was used as an example of the storage medium; however, CD-RW (rewritable CD), DVD-ROM, DVD-RAM, DVD-RW (rewritable DVD), a hard disk, or the like may be used instead of the CD-ROM, with the same effect.

【００８７】また、上記の説明で用いた部分辞書の分類
は一例を示したものであり、他の分類の方法でも良い。The classification of the partial dictionaries used in the above description is merely an example, and other classification methods may be used.

【００８８】また、上記の説明では２種類の記憶媒体を
用いる方法について説明したが、３種類以上の記憶媒体
を用いてもよく同様に効果を奏する。In the above description, a method using two types of storage media has been described. However, three or more types of storage media may be used, and the same effect can be obtained.

【００８９】さらに、上記の説明ではノードを単語とし
て説明したが、ノードは音素片、音素、半音節、音節、
形態素などの単位でも良く、同様に効果を奏する。Further, although nodes have been described as words above, a node may also be a unit such as a phoneme segment, phoneme, demisyllable, syllable, or morpheme, with the same effect.

【００９０】また、音声認識方式もＨＭＭとして説明し
たが、これはＤＰ（Dynamic Programming）マッチング
やニューラルネットを用いる音声認識方式でも良く同様
に効果を奏する。Although the speech recognition method has been described as HMM-based, a method using DP (Dynamic Programming) matching or a neural network may also be used, with the same effect.

【００９１】実施の形態４．上記の実施の形態３では部
分辞書をグループ化することにより、第二辞書記憶手段
７（ＣＤ−ＲＯＭ）の参照回数を減少できたが、照合デ
ータ記憶手段４にまとめて部分辞書の作業領域を取るこ
とが必要なため、照合データ記憶手段４のメモリ量が多
いという欠点があった。部分辞書のデータ形式より、照
合データ記憶手段４の作業領域のデータ形式の方がかな
り大きいためである。本実施の形態４では高速に参照す
ることが可能な中間的な記憶領域を設けることにより、
第二辞書記憶手段７のグループの部分辞書を転送した
後、照合処理で参照された部分辞書のみ照合データ記憶
手段４に作業領域を取ることを特徴とする。Embodiment 4. In Embodiment 3, grouping the partial dictionaries reduced the number of references to the second dictionary storage means 7 (CD-ROM), but work areas for a whole group of partial dictionaries had to be allocated in the collation data storage means 4 at once, so the collation data storage means 4 required a large amount of memory. This is because the data format of a work area in the collation data storage means 4 is considerably larger than the data format of a partial dictionary. Embodiment 4 is characterized by an intermediate storage area that can be referenced at high speed: after the partial dictionaries of a group are transferred there from the second dictionary storage means 7, work areas in the collation data storage means 4 are allocated only for the partial dictionaries actually referenced in the collation processing.

【００９２】図１０は、この発明の実施の形態４に係る
音声認識システムの構成を示す図である。FIG. 10 is a diagram showing a configuration of a speech recognition system according to Embodiment 4 of the present invention.

【００９３】同図において、６はグループにまとめられ
ている部分辞書を記憶する高速に参照可能な記録媒体を
用いる第一辞書記憶手段、７はグループにまとめられて
いる部分辞書を記憶する高速では参照不可能な記録媒体
を用いる第二辞書記憶手段、９は第二辞書記憶手段７よ
りグループ単位で部分辞書郡を転送し、記憶する高速に
参照可能なグループ記憶手段である。なお、５はグルー
プ記憶手段９の部分辞書を個別に参照して照合処理を行
うモデル照合手段である。In the figure, 6 denotes a first dictionary storage means that uses a recording medium referenceable at high speed to store partial dictionaries organized into groups; 7 denotes a second dictionary storage means that uses a recording medium not referenceable at high speed to store partial dictionaries organized into groups; and 9 denotes a group storage means, referenceable at high speed, that receives and stores the partial dictionaries transferred group by group from the second dictionary storage means 7. Further, 5 denotes a model collation means that performs collation processing by referencing the partial dictionaries in the group storage means 9 individually.

【００９４】以下、第二辞書記憶手段７をＣＤ−ＲＯ
Ｍ、グループ記憶手段９をＲＡＭとして説明を行う。In the following, the second dictionary storage means 7 is described as a CD-ROM and the group storage means 9 as a RAM.

【００９５】次に、動作について説明する。認識処理の
途中で香川県のノードが式４の条件を満たし、グループ
grp4が参照されたとする。この段階のグループ記憶手段
９と照合データ記憶手段４の内容を図１１に示す。同図
において、上段の（ａ）にはグループ記憶手段９の内
容、下段の（ｂ）には照合データ記憶手段４の内容がそ
れぞれ示されている。Next, the operation is described. Suppose that during recognition the Kagawa Prefecture node satisfies the condition of Expression 4 and the group grp4 is referenced. FIG. 11 shows the contents of the group storage means 9 and the collation data storage means 4 at this stage: part (a) at the top shows the contents of the group storage means 9, and part (b) at the bottom shows the contents of the collation data storage means 4.

【００９６】まず、グループgrp4の部分辞書郡が第二辞
書記憶手段７（ＣＤ−ＲＯＭ）よりグループ記憶手段９
（ＲＡＭ）に転送される。続いてグループ記憶手段９
（ＲＡＭ）上の必要な部分辞書net4のみが参照され照合
データ記憶手段４に作業領域をとられる。First, the partial dictionaries of group grp4 are transferred from the second dictionary storage means 7 (CD-ROM) to the group storage means 9 (RAM). Then only the required partial dictionary net4 in the group storage means 9 (RAM) is referenced, and a work area is allocated for it in the collation data storage means 4.

【００９７】このように構成することにより、第二辞書
記憶手段７（ＣＤ−ＲＯＭ）の参照回数を減少させたま
まで、照合データ記憶手段４のメモリ量を抑制すること
ができる。With such a configuration, the amount of memory of the collation data storage unit 4 can be suppressed while the number of times of reference to the second dictionary storage unit 7 (CD-ROM) is reduced.
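
A sketch of the Embodiment 4 split between the RAM group store and the collation work area follows. It assumes the work area expands each word into a larger per-node record; the score and back-pointer fields are hypothetical, since the source says only that the work-area format is considerably larger. Group membership reuses grp4 from Embodiment 3, and the word strings are placeholders.

```python
from __future__ import annotations


class SlowMedium:
    """Stand-in for second dictionary storage means 7 (CD-ROM)."""

    def __init__(self, groups: dict[str, dict[str, list[str]]]):
        self._groups = groups
        self.accesses = 0

    def read_group(self, group_id: str) -> dict[str, list[str]]:
        self.accesses += 1
        return dict(self._groups[group_id])


class GroupStore:
    """Stand-in for group storage means 9 (RAM): compact group copies."""

    def __init__(self, medium: SlowMedium):
        self._medium = medium
        self.groups: dict[str, dict[str, list[str]]] = {}

    def fetch(self, group_id: str) -> dict[str, list[str]]:
        if group_id not in self.groups:  # one CD-ROM access per group
            self.groups[group_id] = self._medium.read_group(group_id)
        return self.groups[group_id]


class CollationWorkArea:
    """Stand-in for collation data storage means 4: expanded per-node state."""

    def __init__(self, store: GroupStore):
        self._store = store
        self.active: dict[str, list[dict]] = {}

    def activate(self, group_id: str, name: str) -> list[dict]:
        """Expand only the partial dictionary that collation actually needs."""
        if name not in self.active:
            words = self._store.fetch(group_id)[name]
            # Hypothetical working record: a score and a back-pointer per node.
            self.active[name] = [
                {"word": w, "score": float("-inf"), "backptr": None}
                for w in words
            ]
        return self.active[name]


cd_rom = SlowMedium({"grp4": {"net7": ["w1", "w2"], "net8": ["w3"],
                              "net10": [], "net11": []}})
ram = GroupStore(cd_rom)
work = CollationWorkArea(ram)

work.activate("grp4", "net8")  # whole group lands in RAM; only net8 expands
print(sorted(ram.groups["grp4"]), list(work.active), cd_rom.accesses)
```

Compared with the Embodiment 3 sketch, the CD-ROM is still read once per group, but the large work-area records exist only for dictionaries the search has reached.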

【００９８】上記の説明ではＲＡＭ、ＥＥＰＲＯＭとＣ
Ｄ−ＲＯＭを記憶媒体として用いる場合を一例として説
明したが、ＲＡＭやＥＥＰＲＯＭの代わりにフラッシュ
ＲＯＭ、ＥＰＲＯＭ、ＲＯＭ、ＲＡＭなど、また、ＣＤ
−ＲＯＭの代わりにＣＤ−ＲＷ（書き換え可能ＣＤ）、
ＤＶＤ−ＲＯＭ、ＤＶＤ−ＲＡＭ、ＤＶＤ−ＲＷ（書き
換え可能ＤＶＤ）、ハードディスクなどであっても良く
同様に効果を奏する。In the above description, RAM, EEPROM, and CD-ROM were used as examples of the storage media; however, flash ROM, EPROM, ROM, RAM, or the like may be used instead of the RAM or EEPROM, and CD-RW (rewritable CD), DVD-ROM, DVD-RAM, DVD-RW (rewritable DVD), a hard disk, or the like may be used instead of the CD-ROM, with the same effect.

【００９９】また、上記の説明で用いた部分辞書の分類
は一例を示したものであり、他の分類の方法でも良い。The classification of the partial dictionaries used in the above description is an example, and other classification methods may be used.

【０１００】また、上記の説明では３種類の記憶媒体を
用いる方法について説明したが、４種類以上の記憶媒体
を用いてもよく同様に効果を奏する。In the above description, a method using three types of storage media has been described. However, four or more types of storage media may be used, and the same effect can be obtained.

【０１０１】さらに、上記の説明ではノードを単語とし
て説明したが、ノードは音素片、音素、半音節、音節、
形態素などの単位でも良く、同様に効果を奏する。Further, although nodes have been described as words above, a node may also be a unit such as a phoneme segment, phoneme, demisyllable, syllable, or morpheme, with the same effect.

【０１０２】また、音声認識方式もＨＭＭとして説明し
たが、これはＤＰ（Dynamic Programming）マッチング
やニューラルネットを用いる音声認識方式でも良く同様
に効果を奏する。Although the speech recognition method has been described as HMM-based, a method using DP (Dynamic Programming) matching or a neural network may also be used, with the same effect.

【０１０３】実施の形態５．認識の時にＣＤ−ＲＯＭの
参照回数や、転送量が多くなりすぎると、認識処理のオ
ーバーヘッドが大きくなり、認識結果がえられるまでの
反応時間が長くなる場合がある。このため、この実施の
形態５では、ＣＤ−ＲＯＭの参照回数、あるいは転送量
の上限を設け、規定の反応時間を確保するものである。Embodiment 5. If the number of CD-ROM references or the transfer amount becomes too large during recognition, the overhead of the recognition process grows and the response time until a recognition result is obtained may become long. For this reason, Embodiment 5 places an upper limit on the number of CD-ROM references or on the transfer amount so that a specified response time is guaranteed.

【０１０４】ここでは、グループの参照回数を計測し、
規定の値に達すると参照を禁止する音声認識システムを
一例として説明を行う。また、ＣＤ−ＲＯＭの参照回数
の上限値を規定する場合を一例として説明を行う。As an example, a speech recognition system is described here that counts the number of group references and prohibits further references once a prescribed value is reached; that is, the case in which an upper limit is set on the number of CD-ROM references.

【０１０５】図１２は、この発明の実施の形態５に係る
音声認識システムの構成を示すブロック図である。同図
において、１０は第二辞書記憶手段７の参照回数を計測
する転送量計測手段である。なお、他の手段は実施の形
態３と同様のため、ここでは説明を省略する。FIG. 12 is a block diagram showing the configuration of a speech recognition system according to Embodiment 5 of the present invention. In the figure, reference numeral 10 denotes a transfer amount measuring means that measures the number of references to the second dictionary storage means 7. The other means are the same as in Embodiment 3, so their description is omitted here.

【０１０６】ノード間の照合演算を式４の条件で行って
いたが、新たに第二辞書記憶手段７（ＣＤ−ＲＯＭ）に
グループの参照を必要とする場合には以下の式６で示す
条件で行うものとする。Inter-node collation has so far been governed by the condition of Expression 4; when a new reference to a group in the second dictionary storage means 7 (CD-ROM) is required, the condition of Expression 6 below is used instead.

【０１０７】 Earc(n)＞Ebestnode−Ｔgrp ・・・式６ Earc(n) > Ebestnode − Tgrp (Expression 6)

【０１０８】閾値Ｔgrpは、認識の最初では以下の式７
の条件である。At the start of recognition, the threshold Tgrp is set as in Expression 7 below.

【０１０９】Ｔgrp＝Ｔarc ・・・式７ Tgrp = Tarc (Expression 7)

【０１１０】ただし、第二辞書記憶手段７（ＣＤ−ＲＯ
Ｍ）の参照回数が規定値を超えた場合には、以下の式８
のように設定する。However, once the number of references to the second dictionary storage means 7 (CD-ROM) exceeds the prescribed value, Tgrp is set as in Expression 8 below.

【０１１１】Ｔgrp＝∞ ・・・式８ Tgrp = ∞ (Expression 8)

【０１１２】上記の式８のように設定された後は、式６
の条件を満たさないようになるため、続くノードへの参
照要求が生じないため第二辞書記憶手段７（ＣＤ−ＲＯ
Ｍ）の参照回数を規定の値に制限することができる。Once Tgrp has been set as in Expression 8, the condition of Expression 6 is no longer satisfied, so no reference requests to subsequent nodes arise, and the number of references to the second dictionary storage means 7 (CD-ROM) can be limited to the prescribed value.

【０１１３】第二辞書記憶手段７（ＣＤ−ＲＯＭ）への
参照を制限する条件としては第二辞書記憶手段７（ＣＤ
−ＲＯＭ）の参照回数のほか、部分辞書の転送量あるい
は、参照回数と部分辞書の転送量の両方で制限を加える
こともでき、同様に効果を奏する。As the condition for restricting references to the second dictionary storage means 7 (CD-ROM), the limit may be placed not only on the number of references to the second dictionary storage means 7 (CD-ROM) but also on the transfer amount of partial dictionaries, or on both the number of references and the transfer amount, with the same effect.
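
The reference budget of Embodiment 5 might look like the sketch below. Assumptions: scores are higher-is-better (log-likelihood style), the class and method names are hypothetical, and the optional byte limit models the transfer-amount condition of [0113]. Rather than evaluating Expression 6 with an infinite threshold, the sketch refuses new group references outright once the limit is reached, which is the outcome [0112] describes; before that it applies Expression 6 with Tgrp = Tarc (Expression 7).

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class SlowMediumBudget:
    """Limits new group references to the CD-ROM by count and, optionally, bytes."""

    t_arc: float                 # Tgrp starts equal to this (Expression 7)
    max_refs: int                # prescribed upper limit on group references
    max_bytes: float = math.inf  # optional transfer-amount limit ([0113])
    refs: int = 0
    bytes_read: int = 0

    def exhausted(self) -> bool:
        return self.refs >= self.max_refs or self.bytes_read >= self.max_bytes

    def may_reference_group(self, e_arc: float, e_best: float) -> bool:
        """Decide whether a node may trigger a new group read."""
        if self.exhausted():
            return False               # limit reached: no new requests ([0112])
        t_grp = self.t_arc             # Expression 7
        return e_arc > e_best - t_grp  # Expression 6

    def record_transfer(self, nbytes: int) -> None:
        self.refs += 1
        self.bytes_read += nbytes


budget = SlowMediumBudget(t_arc=10.0, max_refs=2)
print(budget.may_reference_group(-5.0, 0.0))   # within Tgrp of the best score
print(budget.may_reference_group(-15.0, 0.0))  # too far below the best score
budget.record_transfer(4096)
budget.record_transfer(4096)
print(budget.may_reference_group(-1.0, 0.0))   # reference limit reached
```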

【０１１４】上記の説明ではＥＥＰＲＯＭとＣＤ−ＲＯ
Ｍを記憶媒体として用いる場合を一例として説明した
が、ＥＥＰＲＯＭの代わりにフラッシュＲＯＭ、ＥＰＲ
ＯＭ、ＲＯＭ、ＲＡＭなど、また、ＣＤ−ＲＯＭの代わ
りにＣＤ−ＲＷ（書き換え可能ＣＤ）、ＤＶＤ−ＲＯ
Ｍ、ＤＶＤ−ＲＡＭ、ＤＶＤ−ＲＷ（書き換え可能ＤＶ
Ｄ）、ハードディスクなどであっても良く同様に効果を
奏する。In the above description, EEPROM and CD-ROM were used as examples of the storage media; however, flash ROM, EPROM, ROM, RAM, or the like may be used instead of the EEPROM, and CD-RW (rewritable CD), DVD-ROM, DVD-RAM, DVD-RW (rewritable DVD), a hard disk, or the like may be used instead of the CD-ROM, with the same effect.

【０１１５】また、上記の説明で用いた部分辞書の分類
は一例を示したものであり、他の分類の方法でも良い。The classification of the partial dictionaries used in the above description is merely an example, and other classification methods may be used.

【０１１６】また、上記の説明では２種類の記憶媒体を
用いる方法について説明したが、３種類以上の記憶媒体
を用いてもよく同様に効果を奏する。In the above description, a method using two types of storage media has been described. However, three or more types of storage media may be used, and the same effect can be obtained.

【０１１７】さらに、上記の説明ではノードを単語とし
て説明したが、ノードは音素片、音素、半音節、音節、
形態素などの単位でも良く、同様に効果を奏する。Further, although nodes have been described as words above, a node may also be a unit such as a phoneme segment, phoneme, demisyllable, syllable, or morpheme, with the same effect.

【０１１８】また、音声認識方式もＨＭＭとして説明し
たが、これはＤＰ（Dynamic Programming）マッチング
やニューラルネットを用いる音声認識方式でも良く同様
に効果を奏する。Although the speech recognition method has been described as HMM-based, a method using DP (Dynamic Programming) matching or a neural network may also be used, with the same effect.

【０１１９】実施の形態６．上記の実施の形態５のよう
に第二辞書記憶手段７（ＣＤ−ＲＯＭ）の参照回数の上
限を設けた時に、グループの選択を誤った場合、認識不
能となる可能性が高い。本発明の例として示した「神奈
川県」、「香川県」の場合、一音節しか違いがなく似て
いるため、特に誤りやすい。そのため、この実施の形態
６では、県名に加え市町村名の一部の照合処理を行った
後、第二辞書記憶手段７（ＣＤ−ＲＯＭ）に記憶してあ
るグループを選択することにより、グループの選択の精
度を高めることを特徴とする。県名だけでなく続く市町
村名の一部を別の部分辞書として第一辞書記憶手段６
（ＥＥＰＲＯＭ）に記憶する。Embodiment 6. When an upper limit is placed on the number of references to the second dictionary storage means 7 (CD-ROM) as in Embodiment 5, selecting the wrong group is likely to make recognition impossible. "Kanagawa-ken" and "Kagawa-ken", used as examples in this description, differ by only one syllable and sound alike, so they are especially easy to confuse. Embodiment 6 therefore improves the accuracy of group selection by collating part of the municipality name in addition to the prefecture name before selecting a group stored in the second dictionary storage means 7 (CD-ROM). To this end, part of each following municipality name, not just the prefecture name, is stored as a separate partial dictionary in the first dictionary storage means 6 (EEPROM).

【０１２０】本実施の形態６に係る音声認識システムの
構成は、上記の実施の形態３に示したものと同一のた
め、ここでは説明を省略する。第一及び第二辞書記憶手
段６、７の記憶方式に本実施の形態６の特徴があるた
め、以下説明を行う。The configuration of the speech recognition system according to Embodiment 6 is the same as that described in Embodiment 3, so its description is omitted here. What characterizes Embodiment 6 is how the first and second dictionary storage means 6 and 7 store the dictionaries, which is described below.

【０１２１】図１３は、本実施の形態６における辞書の
構成の一例を示すものである。第二辞書記憶手段７（Ｃ
Ｄ−ＲＯＭ）に記憶しているグループgrp3、grp4の最初
の先頭の２音節を別の部分辞書net3hおよびnet4hとて第
一辞書記憶手段６（ＥＥＰＲＯＭ）に記憶するものとす
る。FIG. 13 shows an example of the dictionary configuration in Embodiment 6. The first two syllables at the head of the groups grp3 and grp4, which are stored in the second dictionary storage means 7 (CD-ROM), are stored in the first dictionary storage means 6 (EEPROM) as the separate partial dictionaries net3h and net4h.

【０１２２】図１３では「藤沢市」を「フジ」と「サワ
シ」に分割している。同様に「鎌倉市」、「高松市」お
よび「大川郡」も最初の２音節を分離している。図１３
では、第一辞書記憶手段６（ＥＥＰＲＯＭ）に記憶する
部分辞書は実線で囲っており、第二辞書記憶手段７（Ｃ
Ｄ−ＲＯＭ）に記憶する部分辞書は破線で囲って示して
いる。また、部分辞書net3hはグループgrp3hに属し、部
分辞書net4hはグループgrp4hに属すものとしている。In FIG. 13, "Fujisawa City" is split into "Fuji" and "Sawashi". Likewise, the first two syllables of "Kamakura City", "Takamatsu City", and "Okawa-gun" are separated. In FIG. 13, the partial dictionaries stored in the first dictionary storage means 6 (EEPROM) are enclosed by solid lines, and those stored in the second dictionary storage means 7 (CD-ROM) are enclosed by broken lines. The partial dictionary net3h belongs to group grp3h, and net4h belongs to group grp4h.

【０１２３】このような構成とすることにより、「香川
県大川郡長尾町西」の発声に対し、「カガワケンオオ」
までの情報を用いて「香川県」の市町村名のグループの
選択をできるため、「神奈川県」の市町村名のグループ
であるgrp3を誤って選択する可能性を減少させることが
できる。その結果として認識率を向上できるという効果
がある。With this configuration, for the utterance "Kagawa-ken Okawa-gun Nagao-cho Nishi", the group of municipality names under "Kagawa-ken" can be selected using the information up to "カガワケンオオ" (Kagawa-ken Oo), which reduces the possibility of erroneously selecting grp3, the group of municipality names under "Kanagawa-ken". As a result, the recognition rate can be improved.
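
The head-dictionary idea of [0121] to [0123] can be sketched as a lookup keyed by prefecture and the first two syllables. This is illustrative only: exact string matching stands in for acoustic collation, the function names are hypothetical, and the kana spellings are written out for the example; group membership follows the description of FIG. 13.

```python
from __future__ import annotations

HEAD_LEN = 2  # the example uses two syllables; [0124] says this is not fixed

# Municipality names per group as syllable lists (illustrative spellings).
GROUPS: dict[str, tuple[str, list[list[str]]]] = {
    "grp3": ("カナガワケン", [["フ", "ジ", "サ", "ワ", "シ"],
                              ["カ", "マ", "ク", "ラ", "シ"]]),
    "grp4": ("カガワケン", [["タ", "カ", "マ", "ツ", "シ"],
                            ["オ", "オ", "カ", "ワ", "グ", "ン"]]),
}


def build_head_dictionaries(groups: dict[str, tuple[str, list[list[str]]]],
                            head_len: int = HEAD_LEN) -> dict[tuple[str, str], str]:
    """Map (prefecture, head syllables) to a group; kept on the fast medium."""
    heads: dict[tuple[str, str], str] = {}
    for group_id, (prefecture, names) in groups.items():
        for syllables in names:
            heads[(prefecture, "".join(syllables[:head_len]))] = group_id
    return heads


def candidate_groups(heads: dict[tuple[str, str], str],
                     prefecture_hyps: set[str], head_hyp: str) -> set[str]:
    """Groups consistent with any prefecture hypothesis plus the decoded head."""
    return {group_id for (pref, head), group_id in heads.items()
            if pref in prefecture_hyps and head == head_hyp}


heads = build_head_dictionaries(GROUPS)
# The prefecture alone is ambiguous (Kanagawa vs. Kagawa), but only the Kagawa
# group has a municipality starting with オオ, so only grp4 needs a CD-ROM read.
print(candidate_groups(heads, {"カナガワケン", "カガワケン"}, "オオ"))
```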

【０１２４】上記の説明はグループの先頭の２音節を第
一辞書記憶手段６（ＥＥＰＲＯＭ）に記憶する構成を例
として説明したが、本発明は２音節に限るものではな
い。In the above description, the first two syllables at the head of each group are stored in the first dictionary storage means 6 (EEPROM), but the present invention is not limited to two syllables.

【０１２５】また、図１４のように、「藤沢市」、「鎌
倉市」、「高松市」および「大川郡」全体を分離して第
一辞書記憶手段６（ＥＥＰＲＯＭ）に記憶しても良く、
同様に効果を奏する。Alternatively, as shown in FIG. 14, "Fujisawa City", "Kamakura City", "Takamatsu City", and "Okawa-gun" may be separated in their entirety and stored in the first dictionary storage means 6 (EEPROM), with the same effect.

【０１２６】上記の説明ではＥＥＰＲＯＭとＣＤ−ＲＯ
Ｍを記憶媒体として用いる場合を一例として説明した
が、ＥＥＰＲＯＭの代わりにフラッシュＲＯＭ、ＥＰＲ
ＯＭ、ＲＯＭ、ＲＡＭなど、また、ＣＤ−ＲＯＭの代わ
りにＣＤ−ＲＷ（書き換え可能ＣＤ）、ＤＶＤ−ＲＯ
Ｍ、ＤＶＤ−ＲＡＭ、ＤＶＤ−ＲＷ（書き換え可能ＤＶ
Ｄ）、ハードディスクなどであっても良く、同様に効果
を奏する。In the above description, EEPROM and CD-ROM were used as examples of the storage media; however, flash ROM, EPROM, ROM, RAM, or the like may be used instead of the EEPROM, and CD-RW (rewritable CD), DVD-ROM, DVD-RAM, DVD-RW (rewritable DVD), a hard disk, or the like may be used instead of the CD-ROM, with the same effect.

【０１２７】また、上記の説明で用いた部分辞書の分類
は一例を示したものであり、他の分類の方法でも良い。The classification of the partial dictionary used in the above description is an example, and other classification methods may be used.

【０１２８】また、上記の説明では２種類の記憶媒体を
用いる方法について説明したが、３種類以上の記憶媒体
を用いてもよく同様に効果を奏する。In the above description, a method using two types of storage media has been described. However, three or more types of storage media may be used, and the same effect can be obtained.

【０１２９】さらに、上記の説明ではノードを単語とし
て説明したが、ノードは音素片、音素、半音節、音節、
形態素などの単位でも良く、同様に効果を奏する。Further, although nodes have been described as words above, a node may also be a unit such as a phoneme segment, phoneme, demisyllable, syllable, or morpheme, with the same effect.

【０１３０】また、音声認識方式もＨＭＭとして説明し
たが、これはＤＰ（Dynamic Programming）マッチング
やニューラルネットを用いる音声認識方式でも良く同様
に効果を奏する。Although the speech recognition method has been described as HMM-based, a method using DP (Dynamic Programming) matching or a neural network may also be used, with the same effect.

[0131]

[Effects of the Invention]

As described above, the speech recognition system according to claim 1 of the present invention comprises: acoustic analysis means that receives a speech signal, performs acoustic analysis, and converts the signal into a time series of feature vectors for output; standard model storage means that stores standard models of the recognition targets; dictionary storage means that stores a plurality of partial dictionaries obtained by dividing a word dictionary; collation data storage means that stores collation data as a work area for the collation process; and model collation means that performs the collation process on the feature vectors from the acoustic analysis means while referring to the standard models and the word dictionary, and outputs a recognition result. Because the dictionary storage means is composed of first dictionary storage means, which can be referenced at high speed and stores the frequently used partial dictionaries, and second dictionary storage means, which cannot be referenced at high speed and stores the remaining, less frequently used partial dictionaries, the recognition response time can be shortened while keeping cost low.
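The patent does not say how partial dictionaries are assigned to the two media. A minimal Python sketch, assuming per-dictionary usage counts and a fixed byte budget for the fast medium are available, is a greedy frequency-first split; `PartialDictionary`, `assign_tiers`, and the dictionary names and sizes are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartialDictionary:
    name: str
    size_bytes: int  # footprint on the storage medium
    use_count: int   # assumed usage statistic (not specified in the patent)


def assign_tiers(parts, fast_capacity_bytes):
    """Greedy first-fit: the most frequently used partial dictionaries go to
    the fast medium while capacity remains; all others go to the slow medium."""
    fast, slow = [], []
    remaining = fast_capacity_bytes
    # Highest use count first; ties broken by smaller size so more entries fit.
    for part in sorted(parts, key=lambda p: (-p.use_count, p.size_bytes)):
        if part.size_bytes <= remaining:
            fast.append(part.name)
            remaining -= part.size_bytes
        else:
            slow.append(part.name)
    return fast, slow


parts = [
    PartialDictionary("prefectures", 2_000, 1_000),
    PartialDictionary("kanagawa_cities", 4_000, 300),
    PartialDictionary("fujisawa_towns", 9_000, 40),
    PartialDictionary("kamakura_towns", 8_000, 35),
]
fast, slow = assign_tiers(parts, fast_capacity_bytes=8_000)
```

A greedy split is only one possible policy; any rule that keeps the most-referenced partial dictionaries on the fast medium fits the claim.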

[0132] As described above, in the speech recognition system according to claim 2 of the present invention, the dictionary storage means further includes recording medium storage means that stores recording medium information indicating whether each partial dictionary is stored in the first dictionary storage means or the second dictionary storage means. As a result, the recognition response time can be shortened while keeping cost low.
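A minimal Python sketch of the recording medium information of claim 2, assuming the two media can be modeled as in-memory mappings: the table sends each read straight to the correct medium without probing the slow one. `Medium`, `TieredDictionaryStore`, and the sample entries are hypothetical.

```python
from enum import Enum


class Medium(Enum):
    FAST = "EEPROM"  # first dictionary storage means: fast reference
    SLOW = "CD-ROM"  # second dictionary storage means: slow reference


class TieredDictionaryStore:
    def __init__(self, fast, slow):
        self._media = {Medium.FAST: fast, Medium.SLOW: slow}
        # Recording medium information: partial dictionary name -> medium.
        self.medium_table = {name: Medium.FAST for name in fast}
        self.medium_table.update({name: Medium.SLOW for name in slow})
        self.reads = {Medium.FAST: 0, Medium.SLOW: 0}  # accesses per medium

    def read(self, name):
        medium = self.medium_table[name]  # single table lookup, no probing
        self.reads[medium] += 1
        return self._media[medium][name]


store = TieredDictionaryStore(
    fast={"prefectures": ["Kanagawa", "Kagawa"]},
    slow={"kagawa_cities": ["Takamatsu", "Marugame"]},
)
prefectures = store.read("prefectures")
cities = store.read("kagawa_cities")
```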

[0133] As described above, in the speech recognition system according to claim 3 of the present invention, the first dictionary storage means and the second dictionary storage means store partial dictionaries that include, for each node, recording medium information indicating whether the partial dictionary to be connected next is stored in the first dictionary storage means or the second dictionary storage means. As a result, the recognition response time can be shortened while keeping cost low.
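Claim 3 places the medium flag inside each node, so the location of the successor partial dictionary is known as soon as the node is reached. A minimal Python sketch follows; `Node`, `expand`, and the string flags `"fast"`/`"slow"` standing in for the two media are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    word: str
    next_dict: Optional[str] = None    # partial dictionary connected next (None: leaf)
    next_medium: Optional[str] = None  # "fast" or "slow": where next_dict is stored


def expand(node, fast, slow, log: List[Tuple[str, str]]):
    """Return the successor partial dictionary using the node's own medium flag."""
    if node.next_dict is None:
        return []
    source = fast if node.next_medium == "fast" else slow
    log.append((node.next_medium, node.next_dict))  # record which medium was touched
    return source[node.next_dict]


fast = {"kanagawa": [Node("Fujisawa", "fujisawa_towns", "slow"),
                     Node("Kamakura", "kamakura_towns", "slow")]}
slow = {"fujisawa_towns": [Node("Kugenuma")], "kamakura_towns": [Node("Yukinoshita")]}
log = []
towns = expand(fast["kanagawa"][0], fast, slow, log)
```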

[0134] As described above, in the speech recognition system according to claim 4 of the present invention, the first dictionary storage means and the second dictionary storage means each store, as a single group, strongly dependent partial dictionaries located on the same recording medium, and when the model collation means refers to a partial dictionary, it reads out the entire group containing that partial dictionary at once and performs the collation process. As a result, the recognition response time can be shortened while keeping cost low, and the number of accesses to each storage means can be reduced.
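The grouping of claim 4 can be sketched in Python, assuming a group is simply a mapping of its member partial dictionaries and that each group read costs one access to the slow medium. `GroupedSlowStore` and the sample group are hypothetical.

```python
class GroupedSlowStore:
    """Slow medium on which strongly dependent partial dictionaries are
    stored together as one group, so one read returns the whole group."""

    def __init__(self, groups):
        self._groups = groups  # group id -> {partial dictionary name: entries}
        self._group_of = {name: gid for gid, members in groups.items() for name in members}
        self.accesses = 0

    def read_group(self, name):
        self.accesses += 1  # one slow-medium access per group read
        return self._groups[self._group_of[name]]


store = GroupedSlowStore({
    "kanagawa": {"fujisawa_towns": ["Kugenuma", "Katase"],
                 "kamakura_towns": ["Yukinoshita"]},
})
group = store.read_group("fujisawa_towns")
```

The sibling `kamakura_towns` arrives in the same read, which is where the reduction in access count comes from.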

[0135] As described above, in the speech recognition system according to claim 5 of the present invention, the dictionary storage means further has group storage means that stores the group, containing a given partial dictionary, that was transferred as a whole when that partial dictionary was referred to, and the model collation means performs the collation process by referring individually to the partial dictionaries within the group held in the group storage means. As a result, the recognition response time can be shortened while keeping cost low, and the number of accesses to each storage means can be reduced.
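Claim 5 adds a holding area so that later references to members of the same group need no further transfer. A minimal Python sketch, assuming a single-slot group storage means that holds only the most recently transferred group; `GroupCache` and the sample data are hypothetical.

```python
class GroupCache:
    """Group storage means: keeps the last transferred group so its member
    partial dictionaries can be referenced individually."""

    def __init__(self, groups):
        self._groups = groups
        self._group_of = {name: gid for gid, members in groups.items() for name in members}
        self._held_id = None  # id of the group currently held
        self._held = {}       # member partial dictionaries of that group
        self.slow_accesses = 0

    def lookup(self, name):
        gid = self._group_of[name]
        if gid != self._held_id:  # miss: transfer the whole group once
            self.slow_accesses += 1
            self._held_id, self._held = gid, dict(self._groups[gid])
        return self._held[name]   # individual reference within the held group


groups = {
    "kanagawa": {"fujisawa_towns": ["Kugenuma"], "kamakura_towns": ["Yukinoshita"]},
    "kagawa": {"okawa_towns": ["Shido"]},
}
cache = GroupCache(groups)
first = cache.lookup("fujisawa_towns")   # miss: transfers the kanagawa group
second = cache.lookup("kamakura_towns")  # hit: served from the held group
third = cache.lookup("okawa_towns")      # miss: transfers the kagawa group
```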

[0136] As described above, in the speech recognition system according to claim 6 of the present invention, the dictionary storage means further has transfer amount measurement means that measures the amount of partial dictionary data read from the second dictionary storage means into the collation data storage means, and the model collation means compares the measured amount with a prescribed amount and performs control so that partial dictionaries are not read in once the prescribed amount is reached. As a result, the recognition response time can be shortened while keeping cost low.
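A minimal Python sketch of the transfer limit in claim 6. It assumes the prescribed amount is a byte budget per utterance and that any load which would push the total past the budget is refused. The patent does not specify either point, nor how refused dictionaries are handled. `TransferMeter` and the sizes are hypothetical.

```python
class TransferMeter:
    """Transfer amount measurement: bytes moved from the slow medium into
    the collation work area during one utterance (assumed scope)."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes  # prescribed amount (hypothetical unit: bytes)
        self.transferred = 0            # measured amount so far

    def try_load(self, size_bytes):
        """Count the transfer and return True if it stays within the limit;
        otherwise return False and leave the measured amount unchanged."""
        if self.transferred + size_bytes > self.limit_bytes:
            return False
        self.transferred += size_bytes
        return True


meter = TransferMeter(limit_bytes=20_000)
requests = [("fujisawa_towns", 9_000), ("kamakura_towns", 8_000), ("takamatsu_towns", 7_000)]
loaded = [name for name, size in requests if meter.try_load(size)]
print(loaded, meter.transferred)  # ['fujisawa_towns', 'kamakura_towns'] 17000
```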

[0137] As described above, in the speech recognition system according to claim 7 of the present invention, the first dictionary storage means stores, as a new group, a partial dictionary consisting only of the leading portion of a group that would otherwise be stored in the second dictionary storage means; the second dictionary storage means stores, as a new group, a partial dictionary consisting of the remainder with that leading portion removed; and when the model collation means refers to a partial dictionary, it reads out the group containing that partial dictionary at once and performs the collation process. As a result, the recognition rate can be improved.
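The split in claim 7 can be sketched in Python. The sketch assumes words are tuples of syllables and the leading portion is the first two syllables (`head_len=2`). The head list would go to the fast medium and the keyed remainder to the slow one. `split_group`, `rejoin`, and the sample words are hypothetical.

```python
def split_group(words, head_len=2):
    """Split syllable tuples into a leading-portion list (fast medium) and a
    remainder dictionary keyed by that leading portion (slow medium)."""
    head = sorted({w[:head_len] for w in words})
    tail = {}
    for w in words:
        tail.setdefault(w[:head_len], []).append(w[head_len:])
    return head, tail


def rejoin(prefix, tail):
    """Reattach the stored remainders to a recognized leading portion."""
    return [prefix + rest for rest in tail.get(prefix, [])]


words = [("ku", "ge", "nu", "ma"), ("ka", "ta", "se"), ("ka", "ta", "se", "ka", "i", "ga", "n")]
head, tail = split_group(words)
```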

[0138] As described above, the speech recognition method according to claim 8 of the present invention receives a speech signal, performs acoustic analysis to convert it into a time series of feature vectors, performs a collation process on the feature vectors while referring to standard models of the recognition targets and a plurality of partial dictionaries obtained by dividing a word dictionary, and outputs a recognition result. The collation process includes a first partial dictionary reading step, which first reads a partial dictionary from first dictionary storage means that can be referenced at high speed and stores the frequently used partial dictionaries. It then includes a second partial dictionary reading step, which reads a partial dictionary either from the first dictionary storage means or from second dictionary storage means that cannot be referenced at high speed and stores the remaining, less frequently used partial dictionaries. As a result, the recognition response time can be shortened while keeping cost low.

[0139] As described above, in the speech recognition method according to claim 9 of the present invention, the second partial dictionary reading step reads a partial dictionary from the first dictionary storage means or the second dictionary storage means based on recording medium information indicating which of the two stores the partial dictionary to be connected next. As a result, the recognition response time can be shortened while keeping cost low.

[0140] As described above, in the speech recognition method according to claim 10 of the present invention, when a partial dictionary is referred to in the second partial dictionary reading step, the group containing that partial dictionary is read out as a whole from the second dictionary storage means, which stores strongly dependent partial dictionaries as groups. As a result, the recognition response time can be shortened while keeping cost low, and the number of accesses to the storage means can be reduced.

[0141] As described above, in the speech recognition method according to claim 11 of the present invention, when a partial dictionary is referred to in the second partial dictionary reading step, the group containing that partial dictionary is read out as a whole from the second dictionary storage means and stored in group storage means. As a result, the recognition response time can be shortened while keeping cost low, and the number of accesses to the storage means can be reduced.

[0142] As described above, the computer-readable recording medium according to claim 12 of the present invention records a speech recognition program including: an acoustic analysis procedure that receives a speech signal, performs acoustic analysis, and converts the signal into a time series of feature vectors for output; a standard model storage area that stores standard models of the recognition targets; a dictionary storage area that stores a plurality of partial dictionaries obtained by dividing a word dictionary; a collation data storage area that stores collation data as a work area for the collation process; and a model collation procedure that performs the collation process on the feature vectors from the acoustic analysis procedure while referring to the standard models and the word dictionary, and outputs a recognition result. Because the dictionary storage area is composed of a first dictionary storage area, which can be referenced at high speed and stores the frequently used partial dictionaries, and a second dictionary storage area, which cannot be referenced at high speed and stores the remaining, less frequently used partial dictionaries, the recognition response time can be shortened while keeping cost low.

[0143] As described above, in the computer-readable recording medium storing a speech recognition program according to claim 13 of the present invention, the dictionary storage area further includes a recording medium storage area that stores recording medium information indicating whether each partial dictionary is stored in the first dictionary storage area or the second dictionary storage area. As a result, the recognition response time can be shortened while keeping cost low.

[0144] As described above, in the computer-readable recording medium storing a speech recognition program according to claim 14 of the present invention, the first dictionary storage area and the second dictionary storage area store partial dictionaries that include, for each node, recording medium information indicating whether the partial dictionary to be connected next is stored in the first dictionary storage area or the second dictionary storage area. As a result, the recognition response time can be shortened while keeping cost low.

[0145] As described above, in the computer-readable recording medium storing a speech recognition program according to claim 15 of the present invention, the first dictionary storage area and the second dictionary storage area each store, as a single group, strongly dependent partial dictionaries located on the same recording medium, and when the model collation procedure refers to a partial dictionary, it reads out the group containing that partial dictionary at once and performs the collation process. As a result, the recognition response time can be shortened while keeping cost low, and the number of accesses to each storage area can be reduced.

[0146] As described above, in the computer-readable recording medium storing a speech recognition program according to claim 16 of the present invention, the dictionary storage area further includes a group storage area that stores the group, containing a given partial dictionary, that was transferred as a whole when that partial dictionary was referred to, and the model collation procedure performs the collation process by referring individually to the partial dictionaries within the group held in the group storage area. As a result, the recognition response time can be shortened while keeping cost low, and the number of accesses to each storage area can be reduced.

[0147] As described above, in the computer-readable recording medium storing a speech recognition program according to claim 17 of the present invention, the program further includes a transfer amount measurement procedure that measures the amount of partial dictionary data read from the second dictionary storage area into the collation data storage area, and the model collation procedure compares the measured amount with a prescribed amount and performs control so that partial dictionaries are not read in once the prescribed amount is reached. As a result, the recognition response time can be shortened while keeping cost low.

[0148] As described above, in the computer-readable recording medium storing a speech recognition program according to claim 18 of the present invention, the first dictionary storage area stores, as a new group, a partial dictionary consisting only of the leading portion of a group that would otherwise be stored in the second dictionary storage area; the second dictionary storage area stores, as a new group, a partial dictionary consisting of the remainder with that leading portion removed; and when the model collation procedure refers to a partial dictionary, it reads out the group containing that partial dictionary at once and performs the collation process. As a result, the recognition rate can be improved.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a speech recognition system according to Embodiment 1 of the present invention.

FIG. 2 is a diagram showing the configuration of the word dictionary of the speech recognition system according to Embodiment 1 of the present invention.

FIG. 3 is a diagram showing the contents of the recording medium storage means of the speech recognition system according to Embodiment 1 of the present invention.

FIG. 4 is a diagram showing another configuration of the word dictionary of the speech recognition system according to Embodiment 1 of the present invention.

FIG. 5 is a block diagram showing the configuration of a speech recognition system according to Embodiment 2 of the present invention.

FIG. 6 is a diagram showing the structure of a partial dictionary of the speech recognition system according to Embodiment 2 of the present invention.

FIG. 7 is a diagram showing the configuration of the word dictionary of a speech recognition system according to Embodiment 3 of the present invention.

FIG. 8 is a diagram showing the structure of a partial dictionary of the speech recognition system according to Embodiment 3 of the present invention.

FIG. 9 is a diagram showing the contents of the collation data storage means of the speech recognition system according to Embodiment 3 of the present invention.

FIG. 10 is a block diagram showing the configuration of a speech recognition system according to Embodiment 4 of the present invention.

FIG. 11 is a diagram showing the contents of the group storage means and the collation data storage means of the speech recognition system according to Embodiment 4 of the present invention.

FIG. 12 is a block diagram showing the configuration of a speech recognition system according to Embodiment 5 of the present invention.

FIG. 13 is a diagram showing the configuration of the word dictionary of a speech recognition system according to Embodiment 6 of the present invention.

FIG. 14 is a diagram showing another configuration of the word dictionary of the speech recognition system according to Embodiment 6 of the present invention.

FIG. 15 is a block diagram showing the configuration of a conventional speech recognition system.

FIG. 16 is a diagram showing the addresses to be recognized in a conventional speech recognition system.

FIG. 17 is a diagram showing the configuration of the word dictionary of a conventional speech recognition system.

FIG. 18 is a diagram showing the structure of a partial dictionary of a conventional speech recognition system.

FIG. 19 is a diagram showing the contents of the standard model storage means of a conventional speech recognition system.

FIG. 20 is a diagram showing the contents of the collation data storage means of a conventional speech recognition system.

FIG. 21 is a diagram showing the contents of the collation data storage means of a conventional speech recognition system.

FIG. 22 is a flowchart showing the operation of the model collation means of a conventional speech recognition system.

FIG. 23 is a diagram showing the contents of the HMMs in the collation data storage means of a conventional speech recognition system.

FIG. 24 is a diagram showing the contents of the HMMs in the collation data storage means of a conventional speech recognition system.

[Explanation of symbols]

1 acoustic analysis means, 2 standard model storage means, 4 collation data storage means, 5 model collation means, 6 first dictionary storage means, 7 second dictionary storage means, 8 recording medium storage means, 9 group storage means, 10 transfer amount measurement means.

Claims

1. A speech recognition system comprising: acoustic analysis means for receiving a speech signal, performing acoustic analysis, converting the speech signal into a time series of feature vectors, and outputting the time series; standard model storage means for storing standard models of recognition targets; dictionary storage means for storing a plurality of partial dictionaries obtained by dividing a word dictionary; collation data storage means for storing collation data as a work area for a collation process; and model collation means for performing the collation process on the feature vectors from the acoustic analysis means while referring to the standard models and the word dictionary, and outputting a recognition result; wherein the dictionary storage means comprises first dictionary storage means that can be referenced at high speed and stores frequently used partial dictionaries, and second dictionary storage means that cannot be referenced at high speed and stores the remaining, less frequently used partial dictionaries.

2. The speech recognition system according to claim 1, wherein the dictionary storage means further includes recording medium storage means for storing recording medium information indicating whether each partial dictionary is stored in the first dictionary storage means or the second dictionary storage means.

3. The speech recognition system according to claim 1, wherein the first dictionary storage means and the second dictionary storage means store partial dictionaries that include, for each node, recording medium information indicating whether the partial dictionary to be connected next is stored in the first dictionary storage means or the second dictionary storage means.

4. The speech recognition system according to claim 3, wherein the first dictionary storage means and the second dictionary storage means each store, as a single group, strongly dependent partial dictionaries located on the same recording medium, and the model collation means, when referring to a partial dictionary, reads out the group containing that partial dictionary at once and performs the collation process.

5. The speech recognition system according to claim 4, wherein the dictionary storage means further includes group storage means for storing the group, containing a partial dictionary, that is transferred as a whole when that partial dictionary is referred to, and the model collation means performs the collation process by referring individually to the partial dictionaries in the group stored in the group storage means.

6. The speech recognition system according to claim 4, wherein the dictionary storage means further includes transfer amount measurement means for measuring the amount of partial dictionary data read from the second dictionary storage means into the collation data storage means, and the model collation means compares the amount measured by the transfer amount measurement means with a prescribed amount and performs control so that partial dictionaries are not read in once the prescribed amount is reached.

7. The speech recognition system according to claim 4, wherein the first dictionary storage means stores, as a new group, a partial dictionary composed only of the leading portion of a group to be stored in the second dictionary storage means; the second dictionary storage means stores, as a new group, a partial dictionary composed of the remainder from which the leading portion has been removed; and the model collation means, when referring to a partial dictionary, reads out the group containing that partial dictionary at once and performs the collation process.

8. A speech recognition method that receives a speech signal, performs acoustic analysis to convert the speech signal into a time series of feature vectors, performs a collation process on the feature vectors while referring to standard models of recognition targets and a plurality of partial dictionaries obtained by dividing a word dictionary, and outputs a recognition result. The collation process comprises: a first partial dictionary reading step, which first reads a partial dictionary from first dictionary storage means that can be referenced at high speed and stores frequently used partial dictionaries; and a second partial dictionary reading step, which then reads a partial dictionary either from the first dictionary storage means or from second dictionary storage means that cannot be referenced at high speed and stores the remaining, less frequently used partial dictionaries.

9. The speech recognition method according to claim 8, wherein the second partial dictionary reading step reads a partial dictionary from the first dictionary storage means or the second dictionary storage means based on recording medium information indicating which of the first dictionary storage means and the second dictionary storage means stores the partial dictionary to be connected next.

10. The speech recognition method according to claim 9, wherein, when a partial dictionary is referred to, the second partial dictionary reading step reads out at once the group containing that partial dictionary from second dictionary storage means that stores strongly dependent partial dictionaries as groups.

11. The speech recognition method according to claim 10, wherein, when a partial dictionary is referred to, the second partial dictionary reading step reads out at once the group containing that partial dictionary from the second dictionary storage means and stores the group in group storage means.

12. A computer-readable recording medium storing a speech recognition program including: an acoustic analysis procedure for receiving a speech signal, performing acoustic analysis, converting the speech signal into a time series of feature vectors, and outputting the time series; a standard model storage area for storing standard models of recognition targets; a dictionary storage area for storing a plurality of partial dictionaries obtained by dividing a word dictionary; a collation data storage area for storing collation data as a work area for a collation process; and a model collation procedure for performing the collation process on the feature vectors from the acoustic analysis procedure while referring to the standard models and the word dictionary, and outputting a recognition result; wherein the dictionary storage area comprises a first dictionary storage area that can be referenced at high speed and stores frequently used partial dictionaries, and a second dictionary storage area that cannot be referenced at high speed and stores the remaining, less frequently used partial dictionaries.

13. The computer-readable recording medium storing a speech recognition program according to claim 12, wherein the dictionary storage area further includes a recording medium storage area for storing recording medium information indicating whether each partial dictionary is stored in the first dictionary storage area or the second dictionary storage area.

14. The computer-readable recording medium storing a speech recognition program according to claim 12, wherein the first dictionary storage area and the second dictionary storage area store partial dictionaries that include, for each node, recording medium information indicating whether the partial dictionary to be connected next is stored in the first dictionary storage area or the second dictionary storage area.

15. The computer-readable recording medium storing a speech recognition program according to claim 14, wherein the first dictionary storage area and the second dictionary storage area each store, as a single group, strongly dependent partial dictionaries located on the same recording medium, and the model collation procedure, when referring to a partial dictionary, reads out the group containing that partial dictionary at once and performs the collation process.

16. The computer-readable recording medium storing a speech recognition program according to claim 15, wherein the dictionary storage area further includes a group storage area for storing the group, containing a partial dictionary, that is transferred as a whole when that partial dictionary is referred to, and the model collation procedure performs the collation process by referring individually to the partial dictionaries in the group stored in the group storage area.

17. The computer-readable recording medium storing a speech recognition program according to claim 15, wherein the speech recognition program further includes a transfer amount measurement procedure for measuring the amount of partial dictionary data read from the second dictionary storage area into the collation data storage area, and the model collation procedure compares the amount measured by the transfer amount measurement procedure with a prescribed amount and performs control so that partial dictionaries are not read in once the prescribed amount is reached.

18. The computer-readable recording medium storing a speech recognition program according to claim 15, wherein the first dictionary storage area stores, as a new group, a partial dictionary composed only of the leading portion of a group to be stored in the second dictionary storage area; the second dictionary storage area stores, as a new group, a partial dictionary composed of the remainder from which the leading portion has been removed; and the model collation procedure, when referring to a partial dictionary, reads out the group containing that partial dictionary at once and performs the collation process.