JP5161174B2

JP5161174B2 - Route search device, speech recognition device, method and program thereof

Info

Publication number: JP5161174B2
Application number: JP2009198147A
Authority: JP
Inventors: 貴明堀; 晋治渡部; 篤中村
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2009-08-28
Filing date: 2009-08-28
Publication date: 2013-03-13
Anticipated expiration: 2029-08-28
Also published as: JP2011048255A

Description

The present invention relates to a technique for finding, given a directed graph and an input sequence, a path that matches the input sequence among the plurality of paths from the start node to the end node of the directed graph. The invention also relates to a speech recognition technique that uses this path-finding technique.

One way to determine whether a given input sequence fits a model of sequence data expressed as a directed graph is to search the various paths in the graph for the path whose cumulative weight is minimum (or maximum) and to use that cumulative weight as a measure of fit. The cumulative weight is the value obtained by accumulating, in order, the weights assigned to the nodes and arcs on the path together with the distances (or similarities) between the labels assigned to the arcs on the path and the elements of the input sequence. The label sequence along the path found can also be used to infer what kind of sequence the input is. Such methods are used to model the characteristics of sequence data belonging to a certain category and automatically detect, among a large amount of sequence data, the data having the characteristics of that category, or to determine which label sequence a given piece of sequence data consists of. In this invention, a path denotes a transition process from one node to another and may pass through the same arc any number of times.

In speech recognition, for example, rules such as the permitted orderings of phonemes and words are represented as a directed graph. The (logarithms of the) chain probabilities of phonemes and words are attached to the arcs as weights, and a label representing the acoustic pattern of a phoneme or word, or its distribution function (for example, a label pointing to the distribution function of the short-time spectral pattern of the phoneme /a/), is assigned to each corresponding arc. The similarity between a label and a feature vector representing the short-time spectral pattern of the input speech is the value obtained by evaluating the distribution function corresponding to the label at that feature vector. A start node and an end node are also set according to the ordering rules for phonemes and words. By finding, in a directed graph prepared in this way, the path from the start node to the end node whose cumulative weight is minimum or maximum for the time series of short-time spectral patterns of the input speech, the label sequence (phoneme sequence) on that path can be output as the speech recognition result.

However, a large directed graph contains a great many paths, and finding those with large or small cumulative weight is not easy. If, for example, every possible path is enumerated and its cumulative weight computed, the number of enumerated paths grows exponentially with the path length. Consider a simple directed graph such as the one shown in FIG. 7, where circles represent nodes, arrows represent directed arcs, and the letter placed near each arc is the label of that arc. Given an input sequence of length 10, the total number of paths is the number of arcs, 5, raised to the 10th power (5^10 = 9,765,625). Even for such a simple directed graph, the number of distinct paths is enormous.

For this reason, beam search is often used: the elements of the input sequence are read in order, the paths are extended from the start node one arc at a time as each element is read, and at each step the paths that have become unpromising are discarded on the basis of their cumulative weights. An unpromising path is one whose cumulative weight is large compared with the other paths when paths of small cumulative weight are sought, and one whose cumulative weight is small compared with the other paths when paths of large cumulative weight are sought.

Beam search for general graph search problems is described in, for example, Non-Patent Document 1. Beam search over a directed graph given an input sequence is described in, for example, Non-Patent Document 2.

The conventional beam search procedure is described below with reference to FIG. 8. The description assumes that paths of small cumulative weight are sought. Searching for paths of large cumulative weight requires only simple changes: inverting the sign of the weights, inverting the sign of the distance between an arc label and an input element (or using their similarity instead), and replacing minimum computations with maximum computations. In the description, a path under exploration is called a hypothesis, and a hypothesis that has reached the end node from the start node is called a complete hypothesis.

As shown in FIG. 8, it is first determined whether there is an input sequence to read (step S1). If there is, a set of hypotheses containing only the length-0 hypothesis that has reached the start node is prepared as the current set of hypotheses (step S2). Next, the input sequence is read (step S3), and it is determined whether an element can still be read from it (step S4). If an element is available, the hypotheses are updated (step S5) and pruned (step S6).

In step S5, each hypothesis in the current set is extended by each arc that can be taken from the last node the hypothesis has reached, together with that arc's destination node, to form a new hypothesis. The cumulative weight of the new hypothesis is the sum of the cumulative weight of the original hypothesis, the weight of the arc, and the distance between the current input element and the arc's label. The hypotheses are updated by replacing the current set with the set of new hypotheses generated in this way.

In step S6, hypotheses with large cumulative weight are removed from the updated set, reducing the number of hypotheses.

If there is no element left to read, that is, if the input sequence has been read to its end, complete hypotheses are selected and output (step S7): one or more complete hypotheses are chosen from the current set in ascending order of cumulative weight and output as candidate paths from the start node to the end node. The procedure then returns to step S1 and terminates if there is no further input sequence to read.
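The conventional procedure of steps S2 to S7 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `Hypothesis` class, the `(label, weight, destination)` arc tuples, the `distance` callback and the function name `beam_search` are all hypothetical, and pruning here uses only the cumulative-weight beam width.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Set, Tuple

# Illustrative type (not from the patent): an arc is (label, weight, destination node).
Arc = Tuple[str, float, int]


@dataclass(frozen=True)
class Hypothesis:
    labels: Tuple[str, ...]  # labels of the arcs traversed so far
    node: int                # last node reached
    weight: float            # cumulative weight


def beam_search(
    graph: Dict[int, List[Arc]],
    start: int,
    finals: Set[int],
    inputs: Sequence[str],
    distance: Callable[[str, str], float],
    beam_width: float,
) -> List[Hypothesis]:
    """Return the surviving complete hypotheses, smallest cumulative weight first."""
    # Step S2: the only initial hypothesis has length 0 and sits at the start node.
    hyps = [Hypothesis((), start, 0.0)]
    for x in inputs:  # steps S3/S4: read the input elements in order
        # Step S5: extend every hypothesis by every arc leaving its last node.
        extended = [
            Hypothesis(h.labels + (label,), dst, h.weight + w + distance(x, label))
            for h in hyps
            for (label, w, dst) in graph.get(h.node, [])
        ]
        if not extended:
            return []
        # Step S6: discard hypotheses whose weight exceeds the best weight plus the beam width.
        best = min(h.weight for h in extended)
        hyps = [h for h in extended if h.weight <= best + beam_width]
    # Step S7: keep hypotheses that ended on an end node, in ascending order of weight.
    complete = [h for h in hyps if h.node in finals]
    return sorted(complete, key=lambda h: h.weight)
```

With a narrow beam the hypothesis set stays small at every step; with a wide beam the same call returns more complete hypotheses at a higher cost.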

Hypothesis pruning (step S6) is generally done in one of two ways. In the first, a hypothesis is discarded if its cumulative weight exceeds the smallest cumulative weight in the current set of hypotheses plus a positive constant (called the beam width). In the second, a hypothesis is discarded if its rank by cumulative weight within the current set exceeds a positive constant (called the count beam width, to distinguish it from the beam width on the cumulative weight).
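The two conventional criteria can be sketched as a single filter. The function name `prune_by_beam` and its parameters are hypothetical; ranks are taken to be 1-based with the smallest cumulative weight ranked first, which is an assumption about a detail the text leaves open.

```python
from typing import List, Optional


def prune_by_beam(
    weights: List[float],
    beam_width: Optional[float] = None,
    max_rank: Optional[int] = None,
) -> List[int]:
    """Return the indices of the hypotheses that survive pruning."""
    if not weights:
        return []
    best = min(weights)
    # 1-based rank of each hypothesis by ascending cumulative weight.
    order = sorted(range(len(weights)), key=lambda i: weights[i])
    rank = {i: r + 1 for r, i in enumerate(order)}
    keep = []
    for i, w in enumerate(weights):
        # Beam width: drop if the weight exceeds the best weight plus the constant.
        if beam_width is not None and w > best + beam_width:
            continue
        # Count beam width: drop if the rank exceeds the constant.
        if max_rank is not None and rank[i] > max_rank:
            continue
        keep.append(i)
    return keep
```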

Discarding, during the search, hypotheses that are unpromising, that is, hypotheses that are unlikely to become complete hypotheses of small cumulative weight even if extended further, is generally called hypothesis pruning.

When searching for complete hypotheses of small cumulative weight by beam search, the more hypotheses are pruned from the set, the lower the computational cost. This is because hypothesis updating, which accounts for most of the computation, generates for each hypothesis in the current set as many new hypotheses as there are arcs leaving the node that hypothesis has reached; the fewer hypotheses the set contains, the less computation is required.

Non-Patent Document 1: Xuedong Huang, Alex Acero, Hsiao-Wuen Hon, "Spoken Language Processing", Prentice Hall, Upper Saddle River, NJ 07458, Chapter 12, Basic Search Algorithms, Section 12.1.3.2, pp. 606-608
Non-Patent Document 2: Takaaki Hori, Chiori Hori, Yasuhiro Minami, Atsushi Nakamura, "Efficient WFST-based one-pass decoding with on-the-fly hypothesis rescoring in extremely large vocabulary continuous speech recognition", IEEE Transactions on Audio, Speech, and Language Processing, Vol. 15, No. 14, pp. 1352-1365, 2007, Section IV-B (pp. 1357-1359)

The number of hypotheses can be reduced by narrowing the beam width or the count beam width so that hypotheses are pruned more readily. Doing so, however, may prune hypotheses that would have become complete hypotheses of small cumulative weight had they been extended, causing a search error in which the desired complete hypothesis is not obtained. In short, narrowing the beam width reduces the computation required for the search but increases the risk of search errors.

An object of the present invention is to provide a route search device, a speech recognition device, and methods and programs therefor that have a lower risk of search errors than the prior art.

When hypotheses are pruned, features are extracted for each hypothesis. The extracted features are input to a discriminant function, which takes features as input and outputs an indicator of whether the hypothesis should be discarded, and the output is taken as the discriminant value of that hypothesis. The hypothesis is removed from the set of hypotheses if its discriminant value is larger than a predetermined threshold or, alternatively, if its discriminant value is smaller than a predetermined threshold.

Using features other than the cumulative weight of a hypothesis, such as which node it has reached or which arc it has traversed, allows more accurate pruning decisions and makes the risk of search errors smaller than in the prior art.

FIG. 1 is a functional block diagram of an example of the route search device.
FIG. 2 is a functional block diagram of an example of the speech recognition device.
FIG. 3 is a flowchart of an example of the route search method.
FIG. 4 is a flowchart of an example of the speech recognition method.
FIG. 5 is a diagram showing an example of a directed graph for speech recognition.
FIG. 6 is a diagram showing the results of a comparative evaluation of recognition processing time for the conventional method and this invention.
FIG. 7 is a diagram illustrating a directed graph.
FIG. 8 is a flowchart of an example of the conventional route search method.

[Route search device and method]
FIG. 1 shows a functional block diagram of an example of the route search device, and FIG. 3 shows a flowchart of an example of the route search method.
The route search device and method of this invention differ from the prior art mainly in the processing of the hypothesis pruning unit 6 and in including, as needed, the hypothesis pruning discriminant function update unit 8. The description below concentrates on the parts that differ from the prior art; redundant explanation of the parts that are the same may be omitted.
The route search device 100 includes, for example, a storage unit 1, a control unit 2, an initial hypothesis generation unit 3, an input sequence reading unit 4, a hypothesis update unit 5, a hypothesis pruning unit 6, a complete hypothesis output unit 7, and a hypothesis pruning discriminant function update unit 8.

The control unit 2 determines whether there is an input sequence to read (step S1).
If there is, the initial hypothesis generation unit 3 stores in the storage unit 1, as the current set of hypotheses, a set containing only the length-0 hypothesis that has reached the start node (step S2). The cumulative weight of this length-0 hypothesis is set to 0, for example.
The input sequence reading unit 4 reads the input sequence (step S3) and determines whether an element can still be read from it (step S4).
If an element is available, the hypothesis update unit 5 updates the hypotheses (step S5) and the hypothesis pruning unit 6 prunes them (step S6).

The hypothesis update unit 5 extends each hypothesis in the current set read from the storage unit 1 by each arc that can be taken from the last node the hypothesis has reached, together with that arc's destination node, to form a new hypothesis. The cumulative weight of the new hypothesis is the sum of the cumulative weight of the original hypothesis, the weight of the arc, and the distance between the current input element and the arc's label. The hypotheses are updated by replacing the current set with the set of new hypotheses generated in this way, and the newly generated set is stored in the storage unit 1.

The hypothesis pruning unit 6 reduces the number of hypotheses by removing hypotheses with large cumulative weight from the updated set.

If there is no element left to read, that is, if the input sequence has been read to its end, the complete hypothesis output unit 7 selects and outputs complete hypotheses (step S7): one or more complete hypotheses are chosen from the current set in ascending order of cumulative weight and output as candidate paths from the start node to the end node. The procedure then returns to step S1 and terminates if there is no further input sequence to read.

The hypothesis pruning unit 6 is described in detail below. As illustrated in FIG. 2, the hypothesis pruning unit 6 includes: a hypothesis feature extraction unit 61 that extracts features for each hypothesis; a hypothesis pruning discriminant value calculation unit 62 that inputs the extracted features to a discriminant function, which takes features as input and outputs an indicator of whether the hypothesis should be discarded, and takes the output as the discriminant value of that hypothesis; and a hypothesis pruning determination unit 63 that determines whether the discriminant value is larger (or smaller) than a predetermined threshold and, if so, removes the corresponding hypothesis from the set of hypotheses.

The hypothesis feature extraction unit 61 extracts features for each hypothesis in the current set of hypotheses (step S61). Let h denote a hypothesis, H the current set of hypotheses, w[h] the cumulative weight of h, and rank(h, H) the rank of the cumulative weight of h within H. A feature vector φ(h, H) such as that of Equation (1) can then be extracted from h.
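The equation itself did not survive in this text. Judging from the element-by-element description in the paragraphs that follow, the feature vector of Equation (1) presumably has the form below; the column layout and the ordering of the node and arc indicator blocks are assumptions.

```latex
\phi(h, H) =
\begin{bmatrix}
  w[h] - \left( \min_{g \in H} w[g] + B \right) \\
  \mathrm{rank}(h, H) - R \\
  f_{\mathrm{node}[1]}(n[h]) \\
  \vdots \\
  f_{\mathrm{node}[N]}(n[h]) \\
  f_{\mathrm{arc}[1]}(a[h]) \\
  \vdots \\
  f_{\mathrm{arc}[A]}(a[h])
\end{bmatrix}
\tag{1}
```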

The first element of the feature vector φ(h, H), w[h] − (min_{g∈H} w[g] + B), is the cumulative weight w[h] of hypothesis h minus the sum of the minimum cumulative weight of the hypotheses in H and a positive constant B (corresponding to the beam width). It serves as a discriminant value: if it is positive, h is pruned according to the conventional beam width B; if it is negative, h is not pruned.

The second element, rank(h, H) − R, is the rank of the cumulative weight of hypothesis h within H minus a positive constant R (corresponding to the count beam width). It serves as a discriminant value: if it is positive, h is pruned according to the count beam width R; if it is negative, h is not pruned.

These two features are also used in conventional beam search, but this invention additionally uses further features such as the node n[h] and the arc a[h] that hypothesis h reached last. In the example feature vector φ(h, H) above, f_node[i](n[h]) is a function that returns 1 if node n[h] is the i-th node of the directed graph and 0 otherwise; likewise, f_arc[j](a[h]) is a function that returns 1 if a[h] is the j-th arc of the directed graph and 0 otherwise. Each arc and node of the directed graph is assumed to have a unique number, with N the total number of nodes and A the total number of arcs, so that 1 ≤ i ≤ N and 1 ≤ j ≤ A. Other features, such as the length of hypothesis h (the number of arcs on its path) or the label of arc a[h], can be introduced as similar functions.
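As a rough sketch of how such a feature vector might be assembled, the following builds the two conventional features followed by one-hot node and arc indicators. The function name `feature_vector`, its argument names, and the choice to give tied weights the same rank are illustrative assumptions, not details from the patent.

```python
from typing import List, Sequence


def feature_vector(
    w_h: float,
    weights_in_H: Sequence[float],
    node_index: int,   # 1-based number of the last node n[h]
    arc_index: int,    # 1-based number of the last arc a[h]
    num_nodes: int,    # N
    num_arcs: int,     # A
    B: float,          # beam width
    R: int,            # count beam width
) -> List[float]:
    """Assemble phi(h, H): beam feature, rank feature, node one-hot, arc one-hot."""
    best = min(weights_in_H)
    # 1-based rank by ascending cumulative weight (ties share a rank: an assumption).
    rank = 1 + sum(1 for w in weights_in_H if w < w_h)
    node_onehot = [0.0] * num_nodes
    node_onehot[node_index - 1] = 1.0
    arc_onehot = [0.0] * num_arcs
    arc_onehot[arc_index - 1] = 1.0
    return [w_h - (best + B), float(rank - R)] + node_onehot + arc_onehot
```

Because the node and arc blocks are one-hot, a practical implementation would likely store only the two active indices rather than the full vector of length 2 + N + A.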

Next, the hypothesis pruning discriminant value calculation unit 62 computes, from the feature vector φ(h, H) extracted by the hypothesis feature extraction unit 61, the value of a discriminant function used to decide whether to prune the hypothesis (step S62). The discriminant function has a parameter vector Λ, and the decision criterion can be varied by changing Λ. With a linear discriminant function, for example, Λ is the vector of coefficients for the dimensions of φ(h, H) (hereinafter called the feature weight coefficient vector), and the discriminant function is the inner product Λ·φ(h, H). An example of Λ is given by Equation (2).

Here λ_weight, λ_rank, λ_node[1], λ_node[2], ..., λ_node[N], λ_arc[1], λ_arc[2], ..., λ_arc[A] are the feature weight coefficients for the respective dimensions of the feature vector φ(h, H) of Equation (1). The parameters of the discriminant function can be obtained in advance from training data.
For the examples of Equations (1) and (2), the inner product is computed as in Equation (3).
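Equations (2) and (3) are missing from this text. From the list of coefficients and the later reference to λ_n[h] and λ_a[h], they can plausibly be reconstructed as follows, where λ_{n[h]} is shorthand for λ_node[i] with i the number of node n[h], and likewise for λ_{a[h]}; this shorthand and the exact typesetting are assumptions.

```latex
\Lambda = \left[ \lambda_{\mathrm{weight}},\ \lambda_{\mathrm{rank}},\
  \lambda_{\mathrm{node}[1]}, \ldots, \lambda_{\mathrm{node}[N]},\
  \lambda_{\mathrm{arc}[1]}, \ldots, \lambda_{\mathrm{arc}[A]} \right]^{\top}
\tag{2}

\Lambda \cdot \phi(h, H) =
  \lambda_{\mathrm{weight}} \left( w[h] - \min_{g \in H} w[g] - B \right)
  + \lambda_{\mathrm{rank}} \left( \mathrm{rank}(h, H) - R \right)
  + \lambda_{n[h]} + \lambda_{a[h]}
\tag{3}
```

Only one node indicator and one arc indicator are nonzero for a given hypothesis, which is why the two sums over nodes and arcs collapse to the single terms λ_{n[h]} and λ_{a[h]}.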

Next, the hypothesis pruning determination unit 63 decides to prune the hypothesis if the value of the discriminant function is larger than a threshold (for example, 0) and not to prune it if the value is smaller. In the example of Equation (3), in addition to the conventional pruning criteria based on the beam width and the count beam width, the values of λ_n[h] and λ_a[h], which depend on the node n[h] and the arc a[h] that the hypothesis reached last, influence the decision. Because pruning is decided by criteria that differ according to the node and arc a hypothesis has reached, more accurate pruning decisions become possible.
The linear discriminant function shown in the examples of Equations (2) and (3) is a first-degree expression, but any function that yields a discriminant value may be used, including a higher-order polynomial or some nonlinear function.
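A minimal sketch of the linear discriminant and the threshold test, assuming the node and arc weights are kept in dictionaries keyed by node and arc number (missing keys count as 0). The names `linear_discriminant` and `should_prune` are hypothetical.

```python
from typing import Dict


def linear_discriminant(
    w_h: float,
    best_w: float,
    rank: int,
    node: int,
    arc: int,
    B: float,
    R: int,
    lam_weight: float,
    lam_rank: float,
    lam_node: Dict[int, float],
    lam_arc: Dict[int, float],
) -> float:
    """Inner product of the feature weights with the sparse feature vector."""
    return (
        lam_weight * (w_h - (best_w + B))
        + lam_rank * (rank - R)
        + lam_node.get(node, 0.0)   # weight of the last node n[h]
        + lam_arc.get(arc, 0.0)     # weight of the last arc a[h]
    )


def should_prune(d: float, threshold: float = 0.0) -> bool:
    """Prune when the discriminant value exceeds the threshold."""
    return d > threshold
```

With λ_weight = 1 and all other weights 0, the test reduces to the conventional beam-width criterion; a negative node weight makes hypotheses ending at that node harder to prune.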

Next, the hypothesis pruning discriminant function update unit 8, which updates the parameters of the discriminant function, is described. Let d(φ(h, H); Λ) denote the discriminant function applied to the feature vector φ(h, H) of hypothesis h. For a linear discriminant function, d(φ(h, H); Λ) = Λ·φ(h, H).
A search error risk function risk(d(φ(h, H); Λ)) is also defined, representing the likelihood that a search error occurs as a function of the discriminant value. It takes a value close to 0 when d(φ(h, H); Λ) is small, since h is then unlikely to be pruned, and a value greater than 0 when d(φ(h, H); Λ) is large, since h is then pruned and the risk of a search error rises.

As the search error risk function, the sigmoid function of Equation (4) or the hinge function of Equation (5), for example, can be used.
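Equations (4) and (5) are not reproduced in this text, so the exact role of the constant c is unknown. The sketch below uses common forms that satisfy the stated behavior (near 0 for small x, greater than 0 for large x): a sigmoid with slope c and a hinge shifted by c. Both forms are assumptions.

```python
import math


def sigmoid_risk(x: float, c: float = 1.0) -> float:
    """Assumed form 1 / (1 + exp(-c * x)), computed in a numerically stable way."""
    z = c * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def hinge_risk(x: float, c: float = 1.0) -> float:
    """Assumed form max(0, x + c): zero for small x, growing linearly for large x."""
    return max(0.0, x + c)
```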

Here x = d(φ(h, H); Λ) and c is a constant. Both functions approach 0 when x is small and take a value greater than 0 when x is large.
To obtain parameters with a lower search error risk, consider the sum L(Λ, O) of Equation (6): the total search error risk over the features of the hypotheses lying on the paths of complete hypotheses with small cumulative weight.

The parameters are updated so as to reduce this sum. Here K is the set of complete hypotheses obtained by reading the input sequence O to its end, and G(k | K, O) is the importance of complete hypothesis k in K, a value of 0 or more that is larger the smaller the cumulative weight of k. For example, when the arc weights of the directed graph and the distances between arc labels and the elements of the input sequence are computed as negative log probabilities, G(k | K, O) may be taken to be the posterior probability of k.

Meanwhile, H_t is the current set of hypotheses after updating with the t-th element of O, and the condition "h is part of k" means that the path of hypothesis h coincides with the path of k from the start node up to the node reached by h. In other words, the sum is taken only over the hypotheses that, during the search, later went on to become complete hypotheses.
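The objective and the posterior-based importance are missing from this text. Based on the definitions of K, G(k | K, O), H_t and the condition "h is part of k", one consistent reading is the following; the posterior formula additionally assumes that w[k] is the negative log joint probability of complete hypothesis k.

```latex
L(\Lambda, O) = \sum_{k \in K} G(k \mid K, O)
  \sum_{t} \sum_{\substack{h \in H_t \\ h \text{ is part of } k}}
  \mathrm{risk}\bigl( d(\phi(h, H_t); \Lambda) \bigr)
\tag{6}

G(k \mid K, O) = \frac{\exp(-w[k])}{\sum_{k' \in K} \exp(-w[k'])}
```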

A Λ that reduces L(Λ, O) can be found, for example, by partially differentiating L(Λ, O) with respect to each element of Λ, multiplying the derivative by a learning coefficient (a positive constant), and subtracting the result from that element. This is the principle of stochastic gradient descent.

Accordingly, Equation (6) is partially differentiated with respect to an element λ_x of the parameter vector Λ (x denotes an arbitrary feature), and λ_x is updated using that derivative:

λ_x ← λ_x − η ∂L(Λ, O)/∂λ_x

Here η is a positive constant representing the learning coefficient.

For example, when the linear discriminant function of Equation (3) is used with the sigmoid function of Equation (4) as the search error risk function, the partial derivatives of L(Λ, O) with respect to the feature weights λ_weight and λ_rank, the feature weight λ_n of node n, and the feature weight λ_a of arc a can each be computed explicitly.

Training data O = {O_1, O_2, ..., O_Y} consisting of Y input sequences prepared in advance can also be used: to reduce the sum of the pruning error risk over the training data, each element λ_x of the parameter vector can be updated in an analogous manner, using the derivative of that sum.
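One gradient step of this kind can be sketched as follows, assuming a linear discriminant and a sigmoid risk of the form 1 / (1 + exp(−c·d)), whose derivative with respect to d is c·s·(1 − s). Each sample pairs the importance G of a complete hypothesis with the feature vector of a hypothesis on its path; the names `total_risk` and `sgd_step` and this sample layout are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[float, Sequence[float]]  # (importance G, feature vector phi)


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def total_risk(lam: Sequence[float], samples: Sequence[Sample], c: float = 1.0) -> float:
    """Importance-weighted sum of sigmoid risks over the sampled hypotheses."""
    return sum(g * _sigmoid(c * sum(l * f for l, f in zip(lam, phi))) for g, phi in samples)


def sgd_step(lam: Sequence[float], samples: Sequence[Sample], eta: float, c: float = 1.0) -> List[float]:
    """Return lam - eta * gradient of total_risk with respect to lam."""
    grad = [0.0] * len(lam)
    for g, phi in samples:
        d = sum(l * f for l, f in zip(lam, phi))
        s = _sigmoid(c * d)
        coeff = g * c * s * (1.0 - s)  # chain rule: d(risk)/d(d) for the sigmoid
        for i, f in enumerate(phi):
            grad[i] += coeff * f       # d(d)/d(lam_i) = phi_i for a linear discriminant
    return [l - eta * gr for l, gr in zip(lam, grad)]
```

Repeating the step with freshly collected samples corresponds to alternating search and update over the training data.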

The step of running the search with the parameter vector Λ and the learning data O to obtain the sum of the derivatives in equation (8), and the step of updating Λ by equation (9), may also be repeated. Repeating these steps reduces L(Λ, O) further.
However, as L(Λ, O) is repeatedly reduced, the pruning decisions become more lenient overall, so the number of hypotheses that escape pruning grows and the amount of computation gradually increases. A penalty term that prevents L(Λ, O) from decreasing monotonically over the iterations may therefore be introduced.
For example, a penalty term can be introduced into equation (6), giving an expression such as the following.
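Going by the description of the two terms given next (a risk term plus γ times the squared norm of Λ), an expression of this kind can be sketched as follows, together with its partial derivative. The form is reconstructed from the prose, not copied from the publication; the index x running over all features and the notation \hat{L} for the symbol written L^ are assumptions.

```latex
\hat{L}(\Lambda, O) = L(\Lambda, O) + \gamma \sum_{x} \lambda_x^{2},
\qquad
\frac{\partial \hat{L}(\Lambda, O)}{\partial \lambda_x}
  = \frac{\partial L(\Lambda, O)}{\partial \lambda_x} + 2 \gamma \lambda_x
```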

The penalty term (the second term on the right-hand side) is the squared norm of Λ, that is, the sum over all elements of the square of each element of Λ. This penalty term is smallest when every element of Λ is 0, so the derivative becomes 0, and L^(Λ, O) takes its minimum, when the decrease in the first term caused by updating Λ equals the increase in the second term caused by elements of Λ taking nonzero values. The total search error risk can thus be reduced while the growth in computation is kept in check. Here, γ is the weighting coefficient of the penalty term, a positive constant that balances the first and second terms.
A penalty term that reflects the values of the discriminant function may also be used, as in the following equation.

Equation (13) is a penalty that drives the sum of the discriminant function values toward 0. In other words, the values are kept from being biased positive or negative overall, which can be expected to hold the amount of computation constant.
For a total pruning error risk with such a penalty term, the parameters can be updated in the same way as in equation (9), as follows.

The derivative of equation (11) is as follows.

When the discriminant function is a linear discriminant function, the derivative of equation (13) is as follows.

Here, φ_x(h, H_t) denotes the value of element x of the feature vector.
When learning data O = {O_1, O_2, ..., O_Y} consisting of Y input sequences is used, the parameters are likewise updated, as in equation (11), with the following.

[Speech recognition device and method]
The route search device and method can be applied to a speech recognition device and method.
FIG. 2 is a functional block diagram of an example speech recognition device, and FIG. 4 is a flowchart of an example speech recognition method.
The speech recognition device includes, for example, an acoustic model storage unit 11, a directed graph storage unit 12, a speech signal input unit 13, a speech feature vector extraction unit 14, a route search device 100, a recognition result output unit 15, and learning data 16.

The control unit 2 of the route search device 100 reads, from the acoustic model storage unit 11, an acoustic model for calculating the distance between an element of the input sequence and the label of an arc of the directed graph (step A1). Here, an acoustic model is, for example, something that compares an input feature vector against the standard speech feature vector, or its distribution, of a fixed speech unit (for example, a phoneme) and returns a numerical value, such as a distance or a probability, indicating how plausible the feature vector is as that fixed speech unit.

The mainstream acoustic model for representing the sets of speech feature vectors of various fixed speech units (for example, phonemes) is the hidden Markov model (hereinafter HMM) method, which models the set of speech feature vector time series of a fixed speech unit on the basis of probability and statistical theory. Details of the HMM method are disclosed in, for example, Seiichi Nakagawa, "Speech Recognition by Probabilistic Models" (確率モデルによる音声認識), edited by the Institute of Electronics, Information and Communication Engineers.

Next, the control unit 2 of the route search device 100 reads a directed graph for speech recognition from the directed graph storage unit 12 (step A2).

A directed graph for speech recognition can be constructed, for example, by describing a word pronunciation dictionary and a language model as weighted finite-state transducers (WFSTs) and composing them into a single weighted finite-state transducer. This method is disclosed in, for example, M. Mohri, F. Pereira, and M. Riley, "Weighted finite-state transducers in speech recognition", Computer Speech and Language, Vol. 16, No. 1, pp. 69-88 (2002). A weighted finite-state transducer is a type of directed graph.

Each arc of a directed graph for speech recognition is generally given a label indicating one state of an HMM. An HMM state is a fixed speech unit corresponding to a rough position within a phoneme (for example, the first part, the middle, or the last part), and each state has a probability density distribution of speech feature vectors (for example, a multidimensional Gaussian mixture distribution). In this embodiment, the distance between a speech feature vector and an arc label, which is added to the cumulative weight of a hypothesis, is calculated as the sign-inverted logarithm of the probability density of the speech feature vector, evaluated with the probability density distribution of the state corresponding to that label.
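A minimal sketch of such a distance is shown below: the negative log of a Gaussian mixture density evaluated at one feature vector. The diagonal covariance, the log-sum-exp evaluation, and the function name `neg_log_density` are assumptions for illustration; the text only states that the distance is the sign-inverted log probability density under the state's distribution.

```python
import math

def neg_log_density(x, weights, means, variances):
    """Negative log density of x under a diagonal-covariance Gaussian mixture.

    weights:   mixture weights, one per component (assumed to sum to 1)
    means:     one mean vector per component
    variances: one vector of per-dimension variances per component
    """
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        # Log of the Gaussian normalisation constant for this component.
        log_norm = -0.5 * sum(math.log(2.0 * math.pi * v) for v in var)
        # Log of the exponential part (scaled squared distance to the mean).
        log_exp = -0.5 * sum((xi - m) ** 2 / v for xi, m, v in zip(x, mu, var))
        log_terms.append(math.log(w) + log_norm + log_exp)
    # Log-sum-exp over components for numerical stability.
    top = max(log_terms)
    log_density = top + math.log(sum(math.exp(t - top) for t in log_terms))
    return -log_density

# Hypothetical one-component, one-dimensional state: mean 0, variance 1.
near = neg_log_density([0.0], [1.0], [[0.0]], [[1.0]])
far = neg_log_density([2.0], [1.0], [[0.0]], [[1.0]])
```

A feature vector close to the state's mean gives a smaller distance than one far from it, so a well-matching arc adds less to the cumulative weight of a hypothesis.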

A directed graph for speech recognition is configured, for example, as shown in FIG. 5. The start node is drawn as a filled circle, the end node as a double circle, and all other nodes as plain circles. The arcs connecting the nodes carry labels written so as to indicate a phoneme name and a position within that phoneme. For example, a2 refers to the probability density distribution of the second state of the HMM for phoneme a.

When two labels are written separated by ":", as in "a1:red" (a1:赤), the distance between the label and the input vector is calculated from the part before the ":", here a1, and the part after the ":", here "red", is the label that becomes the recognition result when a complete hypothesis passing through this arc is selected as the search result. In the example of FIG. 5 there is an upper path and a lower path: when a complete hypothesis along the upper path is selected, the recognition result is "it is red" (赤です), and when a complete hypothesis along the lower path is selected, the recognition result is "it is blue" (青です).
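The two-part label convention can be sketched as follows. The path contents are hypothetical (FIG. 5 is not reproduced here), and the helper names `split_label` and `output_labels` are assumptions for illustration.

```python
def split_label(label):
    """Split 'a1:red' into ('a1', 'red'); a label without ':' has no output part."""
    input_label, sep, output_label = label.partition(":")
    return input_label, (output_label if sep else None)

def output_labels(path_labels):
    """Collect the output parts of the arc labels along one path, in order."""
    outputs = []
    for label in path_labels:
        _, out = split_label(label)
        if out is not None:
            outputs.append(out)
    return outputs

# Hypothetical arc labels along one path: only the first arc carries an output.
upper_path = ["a1:red", "a2", "a3", "k1", "k2", "k3"]
result = output_labels(upper_path)
```

The input part of each label selects the distribution used for the distance calculation, while the output parts collected along the winning path form the recognition result.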

The speech signal input unit 13 receives speech, and the speech feature vector extraction unit 14 extracts a time series of feature vectors of the short-time spectral pattern of the speech signal sent from the speech signal input unit 13 (step A3). Feature vectors used for speech recognition include mel-frequency cepstral coefficients (MFCC), delta MFCC, LPC cepstrum, and log power, obtained by analyzing the speech signal at short intervals (for example, every 10 milliseconds).

The route search device 100 uses the time series of feature vectors obtained from the speech feature vector extraction unit 14 as its input sequence. Using the acoustic model read from the acoustic model storage unit 11 and the directed graph for speech recognition read from the directed graph storage unit 12, the route search device 100 reads the time series of feature vectors sent from the speech feature vector extraction unit 14, finds the complete hypothesis with the minimum cumulative weight, and sends it to the recognition result output unit 15.

The functional configuration and processing (steps S1 to S8) of the route search device 100 are the same as those described in the section [Route search device and method] above, so the description is not repeated here. The labels are labels corresponding to acoustic patterns such as phonemes or hidden Markov model states, and the distance between a label and an element of the input sequence is either the Euclidean distance from the average feature vector of the phoneme corresponding to the label, or the log-likelihood calculated from the probability density function assigned to the hidden Markov model state.
The recognition result output unit 15 outputs the labels corresponding to the complete hypothesis as the speech recognition result (step A3).

[Experimental results]
A speech recognition device was built in the form shown in FIG. 2. For the acoustic model, HMMs were prepared for 43 phonemes. Each phoneme has three states, and each state is assigned one of 3064 probability density distributions according to the context of the phoneme (which phoneme precedes it and which phoneme follows it).

The directed graph was a weighted finite-state transducer constructed by the method shown in Document 1 from a pronunciation dictionary of 20,000 words and a trigram language model that gives the probabilities of three-word chains. The directed graph had 72469 nodes and 102073 arcs.

The speech feature vector time series is extracted as an input sequence whose elements are 39-dimensional vectors that combine the following, obtained by analyzing the speech signal every 10 milliseconds: 12-dimensional MFCC; 12-dimensional delta MFCC, which are the first-order regression coefficients of each MFCC dimension taken along the time axis over the two preceding and two following frames; 12-dimensional delta-delta MFCC, which are the first-order regression coefficients of each dimension taken along the time axis over the two preceding and two following frames; and log power.
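A first-order regression coefficient over the two preceding and two following frames can be computed as in the sketch below. The handling of the first and last frames (repeating the edge frame) and the function name `delta_features` are assumptions; the text does not say how the sequence boundaries were treated.

```python
def delta_features(frames, window=2):
    """Slope of a least-squares line fitted over frames t-window .. t+window.

    frames: list of feature vectors (lists of floats), one per time step.
    Returns one delta vector per frame. Frames beyond either end of the
    sequence are replaced by the nearest edge frame (an assumption).
    """
    denom = 2.0 * sum(k * k for k in range(1, window + 1))
    last = len(frames) - 1
    deltas = []
    for t in range(len(frames)):
        row = []
        for dim in range(len(frames[t])):
            num = 0.0
            for k in range(1, window + 1):
                ahead = frames[min(t + k, last)][dim]
                behind = frames[max(t - k, 0)][dim]
                num += k * (ahead - behind)
            row.append(num / denom)
        deltas.append(row)
    return deltas

# Hypothetical one-dimensional sequence rising by 3 per frame.
frames = [[0.0], [3.0], [6.0], [9.0], [12.0]]
deltas = delta_features(frames)
```

Applying the same function to the delta sequence gives delta-delta coefficients under the same assumptions.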

In addition, using 33,820 sentences read aloud in advance by male speakers as learning data, the hypothesis pruning discriminant function update unit 8 (FIG. 1) updated the parameters of the hypothesis pruning discriminant function 100 times. The following was used as the feature vector of a hypothesis h.

As described above, the feature in the first dimension, w[h] − (min_{g∈H} w[g] + B), is the value obtained by subtracting from the cumulative weight w[h] of h the sum of the minimum cumulative weight among the hypotheses in the set H and a positive constant B (corresponding to the beam width). It yields a discriminant value under which h is pruned according to the conventional beam width B if the feature is positive and is not pruned if it is negative. The functions f_{arc[j]}(a[h]) in the second and subsequent dimensions return 1 if a[h] is the j-th arc of the directed graph and 0 otherwise.
The following were used as the feature weights, which are the parameters of the hypothesis pruning discriminant function.

The hypothesis pruning discriminant function was the linear discriminant function Λ·φ(h, H), and the sigmoid function of equation (4) was used as the risk function. The parameters were updated on the basis of the total search error risk of equation (13), using the derivative in equation (15) and the update rule in equation (16).
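The feature vector and linear discriminant described here can be sketched as follows. Hypotheses are represented as hypothetical (cumulative weight, last arc index) pairs, arcs are indexed from 0, a hypothesis is pruned when its discriminant value exceeds a threshold of 0, and the function names are assumptions for illustration rather than the implementation used in the experiment.

```python
def feature_vector(weight, arc, all_weights, beam, num_arcs):
    """First element: w[h] - (min_g w[g] + B); then a one-hot code of the last arc."""
    phi = [0.0] * (1 + num_arcs)
    phi[0] = weight - (min(all_weights) + beam)
    phi[1 + arc] = 1.0
    return phi

def discriminant(params, phi):
    """Linear discriminant: inner product of parameter and feature vectors."""
    return sum(lam * p for lam, p in zip(params, phi))

def prune(hypotheses, params, beam, num_arcs, threshold=0.0):
    """Keep the hypotheses whose discriminant value does not exceed the threshold."""
    all_weights = [w for w, _ in hypotheses]
    kept = []
    for w, arc in hypotheses:
        phi = feature_vector(w, arc, all_weights, beam, num_arcs)
        if discriminant(params, phi) <= threshold:
            kept.append((w, arc))
    return kept

# Hypothetical hypotheses: (cumulative weight, index of the last arc).
hyps = [(1.0, 0), (3.0, 1), (6.0, 2)]
# Weight 1 on the first feature and 0 elsewhere reduces to a plain beam of width B.
beam_only = prune(hyps, [1.0, 0.0, 0.0, 0.0], beam=4.0, num_arcs=3)
# A negative weight on arc 2 lets a hypothesis outside the beam survive.
learned = prune(hyps, [1.0, 0.0, 0.0, -2.0], beam=4.0, num_arcs=3)
```

The second call shows how per-arc weights can override the beam criterion for hypotheses ending on particular arcs, which is the effect the learned parameters are meant to have.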

FIG. 6 shows the processing time required for speech recognition when this speech recognition device was given speech of subjects reading aloud 100 sentences from newspaper articles. The processing time is expressed as the real-time factor, that is, the time required for speech recognition divided by the duration of the actual utterance. The word error rate is the proportion of errors per word when the correct sentence is compared with the recognition result, and is usually calculated as follows.

The number of substitution errors is the number of times a word of the correct sentence was recognized as a different word in the recognition result, the number of insertion errors is the number of times a word not present in the correct sentence was inserted into the recognition result, and the number of deletion errors is the number of times a word of the correct sentence was missing from the recognition result. A smaller word error rate means higher recognition accuracy.
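The usual calculation divides the total number of substitution, insertion, and deletion errors by the number of words in the correct sentence. The sketch below obtains the minimum total error count with a word-level edit distance and returns the rate as a fraction; whether the publication's formula multiplies by 100 is not known from the text, and the function name `word_error_rate` and whitespace tokenization are assumptions.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + insertions + deletions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j]: minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dist[i - 1][j] + 1
            insertion = dist[i][j - 1] + 1
            dist[i][j] = min(substitution, deletion, insertion)
    return dist[len(ref)][len(hyp)] / len(ref)

# One substitution (b -> x) and one deletion (d) against four reference words.
rate = word_error_rate("a b c d", "a x c")
```

Because insertions count as errors, the rate can exceed 1 when the recognition result is much longer than the correct sentence.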

The conventional beam search method makes its pruning decision using only the aforementioned w[h] − (min_{g∈H} w[g] + B).

According to the results in FIG. 6, the conventional beam search method and this experimental example had equivalent real-time factors (processing times), while the word error rate was lower in this experimental example. This shows that speech recognition according to this experimental example reduced search errors. The real-time factors are the same because the total search error risk of equation (13) is used: the penalty term keeps the discriminant function values from being biased positive or negative overall, which holds the amount of computation constant.

[Modifications, etc.]
In FIG. 1 and FIG. 2, data is exchanged directly between the units, but data may instead be passed through a storage unit (not shown). That is, data generated or received by each unit may be stored in the storage unit, and each unit may read that data from the storage unit.

Each of the route search device and the speech recognition device can be realized by a computer. In that case, the processing of the functions each device should have is described by a program, and executing this program on a computer realizes the processing functions of these devices on the computer.

The program describing this processing can be recorded on a computer-readable recording medium. In this embodiment the devices are configured by executing a predetermined program on a computer, but at least part of the processing may instead be realized in hardware.

The present invention is not limited to the embodiments described above and can be modified as appropriate without departing from the spirit of the invention. For example, the various processes described above need not be executed in time series in the order described; they may be executed in parallel or individually, depending on the processing capability of the device executing them or as required.

DESCRIPTION OF SYMBOLS
1 Storage unit
2 Control unit
3 Initial hypothesis generation unit
4 Input sequence reading unit
5 Hypothesis update unit
6 Hypothesis pruning unit
61 Hypothesis feature extraction unit
62 Hypothesis pruning discriminant value calculation unit
63 Hypothesis pruning determination unit
7 Complete hypothesis output unit
8 Hypothesis pruning discriminant function update unit
11 Acoustic model storage unit
12 Directed graph storage unit
13 Speech signal input unit
14 Speech feature vector extraction unit
15 Recognition result output unit
16 Learning data
100 Route search device

Claims

A route search device that finds, among a plurality of paths from a start node to an end node of a directed graph composed of a plurality of nodes, including the start node and the end node, and of arcs that connect the nodes and carry weights and labels, a path that matches an input sequence of finite length, the device comprising:
an initial hypothesis generation unit that, with a hypothesis defined as one path from the start node to some node, takes as the set of hypotheses the hypothesis that has initially reached the start node;
an input sequence reading unit that sequentially reads the elements of the input sequence;
a hypothesis update unit that generates a plurality of new hypotheses from each hypothesis in the set of hypotheses by appending to that hypothesis each arc that can be traversed from the last node the hypothesis has reached and the destination node of that arc, sets the cumulative weight of each new hypothesis to the sum of the cumulative weight of the original hypothesis, the weight of the arc, and the distance between the label of the arc and the read element of the input sequence, and replaces the set of hypotheses with the set of new hypotheses;
a hypothesis pruning unit that removes, from the updated set of hypotheses, hypotheses whose cumulative weight is large or small; and
a complete hypothesis output unit that, with a hypothesis that has reached the end node defined as a complete hypothesis, outputs, after no elements of the input sequence remain to be read, one or more complete hypotheses having a small or large cumulative weight among the complete hypotheses in the set of hypotheses,
wherein the hypothesis pruning unit includes: a hypothesis feature extraction unit that extracts features for each hypothesis; a hypothesis pruning discriminant value calculation unit that calculates, as a discriminant value for each hypothesis, the output value obtained when the extracted features are input to a discriminant function that takes features as input and outputs an index of whether the hypothesis is to be removed; and a hypothesis pruning determination unit that determines whether the discriminant value is greater than a predetermined threshold and, if so, removes the hypothesis corresponding to the discriminant value from the set of hypotheses, or determines whether the discriminant value is smaller than a predetermined threshold and, if so, removes the hypothesis corresponding to the discriminant value from the set of hypotheses,
the features of the hypothesis are a feature vector containing at least one of
(a) the number of the last node the hypothesis has reached,
(b) the number of the last arc the hypothesis has reached,
(c) the length of the hypothesis, and
(d) the label of the last arc the hypothesis has reached,
and
the discriminant function is a function that computes the inner product of the feature vector and a parameter vector having the same dimension as the feature vector.

The route search device according to claim 1,
further comprising a hypothesis pruning discriminant function update unit that updates the discriminant function so that the discriminant value for a hypothesis on the path of a complete hypothesis becomes smaller the smaller the cumulative weight of that complete hypothesis, or so that the discriminant value for a hypothesis on the path of a complete hypothesis becomes larger the larger the cumulative weight of that complete hypothesis.

A speech recognition device comprising the route search device according to claim 1 or 2, wherein
the elements of the input sequence are feature vectors obtained by short-time spectral analysis of a speech signal,
the labels are labels corresponding to acoustic patterns such as phonemes or hidden Markov model states,
the distance between a label and an element of the input sequence is the Euclidean distance from the average feature vector of the phoneme corresponding to the label, or the log-likelihood calculated from the probability density function assigned to the hidden Markov model state, and
the speech recognition device searches the directed graph, which expresses the phoneme connections permitted in the language, for the path that best matches the input speech and takes the label sequence on the resulting path as the speech recognition result.

A route search method for finding, among a plurality of paths from a start node to an end node of a directed graph composed of a plurality of nodes, including the start node and the end node, and of arcs that connect the nodes and carry weights and labels, a path that matches an input sequence of finite length, the method comprising:
an initial hypothesis generation step in which an initial hypothesis generation unit, with a hypothesis defined as one path from the start node to some node, takes as the set of hypotheses the hypothesis that has initially reached the start node;
an input sequence reading step in which an input sequence reading unit sequentially reads the elements of the input sequence;
a hypothesis update step in which a hypothesis update unit generates a plurality of new hypotheses from each hypothesis in the set of hypotheses by appending to that hypothesis each arc that can be traversed from the last node the hypothesis has reached and the destination node of that arc, sets the cumulative weight of each new hypothesis to the sum of the cumulative weight of the original hypothesis, the weight of the arc, and the distance between the label of the arc and the read element of the input sequence, and replaces the set of hypotheses with the set of new hypotheses;
a hypothesis pruning step in which a hypothesis pruning unit removes, from the updated set of hypotheses, hypotheses whose cumulative weight is large or small; and
a complete hypothesis output step in which a complete hypothesis output unit, with a hypothesis that has reached the end node defined as a complete hypothesis, outputs, after no elements of the input sequence remain to be read, one or more complete hypotheses having a small or large cumulative weight among the complete hypotheses in the set of hypotheses,
wherein the hypothesis pruning step includes: a hypothesis feature extraction step of extracting features for each hypothesis; a hypothesis pruning discriminant value calculation step of calculating, as a discriminant value for each hypothesis, the output value obtained when the extracted features are input to a discriminant function that takes features as input and outputs an index of whether the hypothesis is to be removed; and a hypothesis pruning determination step of determining whether the discriminant value is greater than a predetermined threshold and, if so, removing the hypothesis corresponding to the discriminant value from the set of hypotheses, or determining whether the discriminant value is smaller than a predetermined threshold and, if so, removing the hypothesis corresponding to the discriminant value from the set of hypotheses,
the features of the hypothesis are a feature vector containing at least one of
(a) the number of the last node the hypothesis has reached,
(b) the number of the last arc the hypothesis has reached,
(c) the length of the hypothesis, and
(d) the label of the last arc the hypothesis has reached,
and
the discriminant function is a function that computes the inner product of the feature vector and a parameter vector having the same dimension as the feature vector.

The route search method according to claim 4,
further comprising a hypothesis pruning discriminant function update step of updating the discriminant function so that the discriminant value for a hypothesis on the path of a complete hypothesis becomes smaller the smaller the cumulative weight of that complete hypothesis, or so that the discriminant value for a hypothesis on the path of a complete hypothesis becomes larger the larger the cumulative weight of that complete hypothesis.

A speech recognition method including the route search method according to claim 4 or 5, wherein
the elements of the input sequence are feature vectors obtained by short-time spectral analysis of a speech signal,
the labels are labels corresponding to acoustic patterns such as phonemes or hidden Markov model states,
the distance between a label and an element of the input sequence is the Euclidean distance from the average feature vector of the phoneme corresponding to the label, or the log-likelihood calculated from the probability density function assigned to the hidden Markov model state, and
the directed graph, which expresses the phoneme connections permitted in the language, is searched for the path that best matches the input speech, and the label sequence on the resulting path is taken as the speech recognition result.

A program for causing a computer to function as the route search device according to claim 1 or 2 or the speech recognition device according to claim 3.