JP2014016731A

JP2014016731A - Data update processing device, method, and program

Info

Publication number: JP2014016731A
Application number: JP2012152857A
Authority: JP
Inventors: Yasushi Sakurai; 保志櫻井; Yasuko Matsubara; 靖子松原; Masatoshi Yoshikawa; 正俊吉川
Original assignee: Nippon Telegraph and Telephone Corp; Kyoto University NUC
Current assignee: Nippon Telegraph and Telephone Corp; Kyoto University NUC
Priority date: 2012-07-06
Filing date: 2012-07-06
Publication date: 2014-01-30

Abstract

PROBLEM TO BE SOLVED: To update a hierarchically structured hidden Markov model (HMM) when a new sequence is input, while reducing computation time and memory usage. SOLUTION: For an input sequence, HMMs are selected from the top layer down to the bottom layer, the input sequence is inserted into the selected bottom-layer HMM, and its weight is determined (S103). When a selected HMM has no space for the new sequence, it is divided into two HMMs, and the HMMs of the next lower layer belonging to the selected HMM, or the sequences belonging to it, are allocated to the two divided HMMs (S106). The parameters of each of the two divided HMMs are estimated (S109). When the two divided HMMs are in the bottom layer and a value based on the likelihood of a sequence with respect to its HMM is determined to be inappropriate (S111), that sequence is re-inserted (S113).

Description

The present invention relates to a data update processing device, method, and program, and more particularly to a data update processing device, method, and program that update a hidden Markov model based on an input sequence.

The hidden Markov model (HMM) is an effective statistical method for modeling uncertain time-series data. HMMs are robust to noise and are used in many applications, such as speaker recognition, natural language processing, and gene sequence analysis, including proteins and DNA. An HMM consists of initial state probabilities π = {π_i}, state transition probabilities A = {a_ij}, and symbol output probabilities B = {b_i(v)}.

Given a model Θ and a sequence x, the likelihood L(x | Θ) can be calculated by the Viterbi algorithm as in the following equation (1).

Here, δ_t(i) denotes the maximum probability of state i at time t.
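The likelihood computation can be sketched as follows. This is a minimal illustration that assumes equation (1) takes the standard Viterbi form, δ_1(i) = π_i b_i(x_1), δ_t(i) = max_j [δ_{t-1}(j) a_ji] b_i(x_t), and L(x | Θ) = max_i δ_m(i) for a sequence of length m. The function name `viterbi_likelihood` and the toy parameter values are hypothetical and do not come from the specification.

```python
from typing import Dict, List, Sequence


def viterbi_likelihood(x: Sequence[str],
                       pi: List[float],
                       A: List[List[float]],
                       B: List[Dict[str, float]]) -> float:
    """Return L(x | Theta) = max_i delta_m(i) for a discrete HMM.

    Assumes x is non-empty; pi, A and B follow the definitions in the text.
    """
    k = len(pi)  # number of hidden states
    # Initialisation: delta_1(i) = pi_i * b_i(x_1)
    delta = [pi[i] * B[i].get(x[0], 0.0) for i in range(k)]
    # Recursion: delta_t(i) = max_j [delta_{t-1}(j) * a_ji] * b_i(x_t)
    for symbol in x[1:]:
        delta = [
            max(delta[j] * A[j][i] for j in range(k)) * B[i].get(symbol, 0.0)
            for i in range(k)
        ]
    return max(delta)


# Toy two-state model (illustrative numbers only).
pi = [0.6, 0.4]
A = [[0.7, 0.3],
     [0.4, 0.6]]
B = [{"a": 0.9, "b": 0.1},
     {"a": 0.2, "b": 0.8}]
likelihood = viterbi_likelihood("aab", pi, A, B)
```

For long sequences the products underflow, so a practical implementation would typically work with log-probabilities instead.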

The EM algorithm estimates, by maximum likelihood, the parameters of a probabilistic model with unobservable latent variables from data that has missing values. It is an iterative method in which the computation proceeds by alternating between the E step (Expectation) and the M step (Maximization).

E step: For HMMs, the Baum-Welch algorithm provides an efficient way to compute the expected values in the E step. It computes the expected values from the HMM parameters Θ = {π, A, B}. γ_t(i) denotes the probability of being in state i at time t given the sequence, and is calculated as in the following equation (2).

Here, α_t(i) = P(x_1, ..., x_t, q_t = i | Θ) and β_t(i) = P(x_{t+1}, ..., x_m | q_t = i, Θ), where q_t denotes the optimal state at time t. ξ_t(i, j) is the probability of a transition from state i at time t to state j at time t+1, and is given by the following equation (3).

M step: Based on the expected values γ_t(i) and ξ_t(i, j) and the sequence x, the algorithm computes new model parameters ~Θ that maximize the likelihood. The model parameters are updated by the following equations (4) to (6).

Here, x_t denotes the symbol of x at time t.
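One EM iteration can be sketched as below. The sketch assumes that equations (2) to (6) take their standard Baum-Welch forms: γ_t(i) = α_t(i) β_t(i) / P(x | Θ), ξ_t(i, j) = α_t(i) a_ij b_j(x_{t+1}) β_{t+1}(j) / P(x | Θ), and the usual ratio-of-expected-counts updates for π, A, and B. It handles a single unweighted sequence of length m ≥ 2; the function name `baum_welch_step` is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Model = Tuple[List[float], List[List[float]], List[Dict[str, float]]]


def baum_welch_step(x: Sequence[str], model: Model) -> Model:
    """One EM iteration for a discrete HMM on a single sequence x."""
    pi, A, B = model
    k, m = len(pi), len(x)

    def b(i: int, t: int) -> float:
        # Output probability b_i(x_t), zero for symbols unknown to state i
        return B[i].get(x[t], 0.0)

    # E step, forward pass: alpha_t(i) = P(x_1..x_t, q_t = i | Theta)
    alpha = [[pi[i] * b(i, 0) for i in range(k)]]
    for t in range(1, m):
        alpha.append([sum(alpha[t - 1][j] * A[j][i] for j in range(k)) * b(i, t)
                      for i in range(k)])
    # E step, backward pass: beta_t(i) = P(x_{t+1}..x_m | q_t = i, Theta)
    beta = [[1.0] * k for _ in range(m)]
    for t in range(m - 2, -1, -1):
        beta[t] = [sum(A[i][j] * b(j, t + 1) * beta[t + 1][j] for j in range(k))
                   for i in range(k)]
    px = sum(alpha[m - 1])  # P(x | Theta), used as the normaliser

    # gamma_t(i): probability of state i at time t given x
    gamma = [[alpha[t][i] * beta[t][i] / px for i in range(k)] for t in range(m)]
    # xi_t(i, j): probability of the transition i -> j between t and t+1
    xi = [[[alpha[t][i] * A[i][j] * b(j, t + 1) * beta[t + 1][j] / px
            for j in range(k)] for i in range(k)] for t in range(m - 1)]

    # M step: re-estimate pi, A and B from the expected counts
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(m - 1)) /
              sum(gamma[t][i] for t in range(m - 1))
              for j in range(k)] for i in range(k)]
    symbols = sorted(set(x))
    new_B = [{v: sum(gamma[t][i] for t in range(m) if x[t] == v) /
                 sum(gamma[t][i] for t in range(m))
              for v in symbols} for i in range(k)]
    return new_pi, new_A, new_B


model = ([0.6, 0.4],
         [[0.7, 0.3], [0.4, 0.6]],
         [{"a": 0.9, "b": 0.1}, {"a": 0.2, "b": 0.8}])
new_pi, new_A, new_B = baum_welch_step("aabab", model)
```

The sketch omits scaling and smoothing, so it is only suitable for short sequences and models in which every state keeps a non-zero posterior probability.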

The hierarchical hidden Markov model extends the HMM to represent sequences that have a hierarchical structure (Non-Patent Document 1). Its hidden states are organized hierarchically, that is, in a tree structure, and the model parameters of each layer are estimated with the EM algorithm.

Fine, S., Singer, Y. and Tishby, N., "The Hierarchical Hidden Markov Model: Analysis and Applications", Machine Learning, Vol. 32, No. 1, pp. 41-62 (1998).

Whereas the problem addressed by the present invention is to determine an arbitrary number of models from a large set of sequences, the hierarchical HMM described above divides an entire sequence into segments of various lengths and builds a hierarchical model structure. A hierarchical HMM can be applied to the problem of the present invention, but model estimation then requires a very large computational cost. In particular, whenever new sequence data is given, the model must be estimated using all sequences accumulated in the past.

The present invention has been made in view of the above circumstances, and its object is to provide a data update processing device, method, and program that can update a hierarchically structured hidden Markov model when a new sequence is input, while reducing computation time and memory usage.

To achieve the above object, a data update processing device according to the present invention includes: storage means that stores the parameters of each of a plurality of hidden Markov models (HMMs) arranged in a hierarchical structure, and a plurality of sequences, each of which belongs to at least one bottom-layer HMM and carries a weight for each bottom-layer HMM to which it belongs; sequence insertion means that, for an input sequence, selects HMMs from the top layer down to the bottom layer according to the hierarchical structure, based on the likelihood of the sequence with respect to each HMM, thereby selecting at least one bottom-layer HMM to which the input sequence belongs, inserts the input sequence as a sequence belonging to each selected bottom-layer HMM, and determines the weight of the sequence for each selected bottom-layer HMM; update means that, for each HMM from the top layer to the bottom layer selected by the sequence insertion means, updates the parameters of the selected HMM based on the input sequence when the HMM has space for a new sequence; division means that, for each HMM from the top layer to the bottom layer selected by the sequence insertion means, divides the selected HMM into two HMMs when it has no space for a new sequence, and allocates the HMMs of the next lower layer belonging to the selected HMM, or the sequences belonging to the selected HMM, to the two divided HMMs; estimation means that estimates the parameters of each of the two HMMs divided by the division means, based on the sequences under the HMMs allocated to that HMM or on the sequences allocated to that HMM; and sequence re-insertion means that, when the two HMMs divided by the division means are bottom-layer HMMs and a value based on the likelihood, with respect to its HMM, of a sequence belonging to one of those bottom-layer HMMs is determined to be inappropriate in comparison with a threshold, deletes the sequence from the sequences belonging to that bottom-layer HMM, updates the parameters of each HMM from that bottom-layer HMM up to the top-layer HMM corresponding to the sequence, selects HMMs for the sequence from the top layer down to the bottom layer according to the hierarchical structure, based on the likelihood with respect to each HMM, thereby selecting at least one bottom-layer HMM to which the sequence belongs, re-inserts the sequence as a sequence belonging to each selected bottom-layer HMM, and determines the weight of the sequence for each selected bottom-layer HMM.

A data update processing method according to the present invention is performed in a data update processing device that includes storage means, sequence insertion means, update means, division means, estimation means, and sequence re-insertion means, the storage means storing the parameters of each of a plurality of hidden Markov models (HMMs) arranged in a hierarchical structure, and a plurality of sequences, each of which belongs to at least one bottom-layer HMM and carries a weight for each bottom-layer HMM to which it belongs. In the method: the sequence insertion means, for an input sequence, selects HMMs from the top layer down to the bottom layer according to the hierarchical structure, based on the likelihood of the sequence with respect to each HMM, thereby selecting at least one bottom-layer HMM to which the input sequence belongs, inserts the input sequence as a sequence belonging to each selected bottom-layer HMM, and determines the weight of the sequence for each selected bottom-layer HMM; the update means, for each HMM from the top layer to the bottom layer selected by the sequence insertion means, updates the parameters of the selected HMM based on the input sequence when the HMM has space for a new sequence; the division means, for each HMM from the top layer to the bottom layer selected by the sequence insertion means, divides the selected HMM into two HMMs when it has no space for a new sequence, and allocates the HMMs of the next lower layer belonging to the selected HMM, or the sequences belonging to the selected HMM, to the two divided HMMs; the estimation means estimates the parameters of each of the two HMMs divided by the division means, based on the sequences under the HMMs allocated to that HMM or on the sequences allocated to that HMM; and the sequence re-insertion means, when the two HMMs divided by the division means are bottom-layer HMMs and a value based on the likelihood, with respect to its HMM, of a sequence belonging to one of those bottom-layer HMMs is determined to be inappropriate in comparison with a threshold, deletes the sequence from the sequences belonging to that bottom-layer HMM, updates the parameters of each HMM from that bottom-layer HMM up to the top-layer HMM corresponding to the sequence, selects HMMs for the sequence from the top layer down to the bottom layer according to the hierarchical structure, based on the likelihood with respect to each HMM, thereby selecting at least one bottom-layer HMM to which the sequence belongs, re-inserts the sequence as a sequence belonging to each selected bottom-layer HMM, and determines the weight of the sequence for each selected bottom-layer HMM.

A program according to the present invention is a program for causing a computer to function as each of the means of the data update processing device described above.

As described above, according to the data update processing device, method, and program of the present invention, for a newly input sequence, HMMs are selected from the top layer down to the bottom layer among a plurality of hidden Markov models (HMMs) arranged in a hierarchical structure, and the input sequence is inserted into a bottom-layer HMM; when an HMM has no space for a new sequence, the HMM is divided and the lower-layer HMMs or sequences are allocated to the divided HMMs; and when the division results in an inappropriate allocation of a sequence, HMMs are again selected for that sequence from the top layer down to the bottom layer and the sequence is re-inserted into a bottom-layer HMM. This provides the effect that, when a new sequence is input, the hierarchically structured hidden Markov model can be updated with reduced computation time and memory usage.

Schematic diagram showing the configuration of the data update processing device according to the first embodiment of the present invention.
Diagram illustrating the hierarchical model structure.
Diagram illustrating an update of the hierarchical model structure.
Flowchart showing the hierarchical model update processing routine in the data update processing device according to the first embodiment of the present invention.
Schematic diagram showing the configuration of the approximate data update processing device according to the second embodiment of the present invention.
Flowchart showing the hierarchical sampling processing routine in the approximate data update processing device according to the second embodiment of the present invention.
Diagram showing an example of sequences representing the kinetic energy of the left and right legs in motion capture data.
Diagram showing the results of classifying sequences by a comparison method and by the method of the present invention.
Diagram showing experimental results on computational cost.

Embodiments of the present invention will now be described in detail with reference to the drawings.

[First Embodiment]
<Outline of the Invention>
Let X be a set of n time-series sequences, that is, X = {x_1, ..., x_n}. The object of the present invention is to determine c HMMs that statistically represent the characteristics of X. Every sequence x_i (i = 1, ..., n) is regarded as belonging to one or more of the c models with probability w (that is, with weight 0 ≤ w ≤ 1). The problem solved by the present invention is therefore as follows.

(Problem 1): Given a time-series data set X = {x_1, ..., x_n} consisting of n sequences that arrive at regular intervals, find the models Θ_j (j = 1, ..., c) that maximize the total likelihood Σ_{i=1}^{n} Σ_{j=1}^{c} w_{i,j} L(x_i | Θ_j).
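The objective of Problem 1 can be written down directly. The sketch below only evaluates the sum Σ_i Σ_j w_{i,j} L(x_i | Θ_j) for given weights and models; it does not search for the maximizing models. The function name `total_weighted_likelihood` is hypothetical, and the toy likelihood used in the usage lines (an independent-symbol model, not an HMM) is a stand-in chosen only to keep the example short.

```python
from typing import Callable, Sequence, TypeVar

Seq = TypeVar("Seq")
Model = TypeVar("Model")


def total_weighted_likelihood(
    sequences: Sequence[Seq],
    models: Sequence[Model],
    weights: Sequence[Sequence[float]],
    likelihood: Callable[[Seq, Model], float],
) -> float:
    """Sum over i and j of weights[i][j] * L(x_i | Theta_j)."""
    return sum(
        weights[i][j] * likelihood(x, theta)
        for i, x in enumerate(sequences)
        for j, theta in enumerate(models)
    )


def toy_likelihood(x: str, p_a: float) -> float:
    # Stand-in model: each symbol is "a" with probability p_a, else "b".
    result = 1.0
    for symbol in x:
        result *= p_a if symbol == "a" else 1.0 - p_a
    return result


# Two sequences, two models; x_2 is shared between both models.
total = total_weighted_likelihood(
    ["aa", "bb"], [0.9, 0.1], [[1.0, 0.0], [0.2, 0.8]], toy_likelihood
)
```

In the setting of the specification, `likelihood` would be the Viterbi likelihood L(x | Θ) of an HMM.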

In a dynamic setting in which the training data set for model estimation grows over time, that is, the number of sequences n increases, solving the above problem requires some ingenuity. What is needed is a method that learns the input sequences continuously and can present any number c of models whenever the user requests them. Specifically, an ideal solution must satisfy the following three requirements.

Applicability: Input sequences are automatically discriminated and grouped, and the HMM parameters are estimated. No prior knowledge about the groups is assumed.

Scalability: The processing for updating the models achieves a computational cost that is sub-linear in the number of sequences.

Immediacy: The number of models c is not given in advance when sequences are learned; without such prior knowledge, c models are presented to the user immediately when the user requests them.

In existing data learning, the mainstream approach is to estimate the models using the entire original set of sequences. However, in a dynamic setting where data continues to be generated intermittently, using all of the large volume of archived past data for model estimation makes the computational cost enormous. The present invention therefore describes a data update processing device for a hierarchical model that greatly reduces computation time and memory usage and can estimate models at an arbitrary granularity.

<System Configuration>
As shown in FIG. 1, the data update processing device 100 according to the first embodiment of the present invention is implemented as a computer that includes a CPU, a RAM, and a ROM storing a program for executing the hierarchical model update processing routine described later. Functionally, the data update processing device 100 includes an input unit 10, a processing unit 20, and an output unit 30.

The input unit 10 accepts time-series data received via a network. The input unit 10 also accepts, from an input device such as a keyboard, an arbitrary number of models as query data.

The processing unit 20 consists of a time-series data storage unit 21, a hierarchical model data storage unit 22, and a time-series data processing unit 23. The time-series data storage unit 21 and the hierarchical model data storage unit 22 are an example of the storage means, and the time-series data processing unit 23 is an example of the sequence insertion means, update means, division means, estimation means, sequence re-insertion means, and presentation means.

The time-series data storage unit 21 stores the time-series data accepted by the input unit 10. The hierarchical model data storage unit 22 stores the model parameters of the plurality of hierarchically structured HMMs. When the input unit 10 accepts time-series data, the time-series data processing unit 23 updates the model parameters stored in the hierarchical model data storage unit 22. In doing so, it updates the model parameters without accessing all of the sequences accumulated in the time-series data storage unit 21. The model parameter update processing performed by the time-series data processing unit 23 is characterized by weighted model estimation, node division, and re-insertion processing. The input time-series data is stored in the disk space of the time-series data storage unit 21 but is discarded from the memory space, which saves memory in the data update processing device 100.

When the number of models c is accepted as query data input by the user, the time-series data processing unit 23 outputs c models that represent the characteristics of the time-series data to the user through the output unit 30.

Next, the principle of the data update processing performed by the time-series data processing unit 23 will be described.

First, the structure of the hierarchical model will be described.

The present invention estimates models using a hierarchical structure. With a hierarchical structure, when a new sequence is inserted, the model parameters can be updated at minimal cost without accessing all of the past sequences accumulated up to that point. This is the first advantage of using a hierarchical structure. The second advantage is that sequence grouping can be performed incrementally, so the full set of sequences does not need to be analyzed in advance. Thanks to these properties, the models can be updated at a cost that is sub-linear in the number of sequences. A third, very important advantage is that a set of HMM parameters can be obtained at an arbitrary granularity. That is, when the user requests c models, the present invention can quickly present c sets of model parameters, which allows flexible responses to user requests.

<Model Structure>
FIG. 2 shows an example of the model structure of the present invention, with two layers (that is, h = 2). The l-th layer of the model structure contains b^{h-l} elements (here b = 4), and each element contains b sub-elements in the layer below it. The j-th element of the l-th layer holds summary information about the sub-elements under it and is defined as in the following equation (7).

Here, n_{l,j} and Θ_{l,j} denote, respectively, the number of all sequences belonging to the sub-elements under element N_{l,j} and the model parameters representing those sequences. Unlike the elements above, however, an element in the bottom layer (that is, l = 1) holds a list of sequences in addition to n_{l,j} and Θ_{l,j}. The j-th element at l = 1 is defined as follows.

Here, the sequence list D_j is a set of entries, each represented as a pair (x_i, w_{i,j}): a pointer to the sequence x_i and the probability (that is, the weight) with which x_i belongs to N_{1,j}. The weight w_{i,j} is calculated when the input sequence x_i is inserted. The sum of the weights in each sequence list is at most b (that is, b ≥ Σ_i w_{i,j}). The hierarchical model structure is effective for presenting any number of models requested by the user, and it is also very effective for grouping and summarizing the input sequences. Handling and managing summary information about the sequences (that is, the model parameters) within the hierarchy is far more efficient than handling all of the information about the sequences. Moreover, when the hierarchy is updated, sequences can be grouped immediately by calculating the likelihood between the input sequence and the model managed by each element. An example of the hierarchical structure is given below with reference to FIG. 2.

(Example 1): Let {x_1, ..., x_6} be the set of sequences stored in the hierarchical structure. The elements N_{2,1}, N_{1,1}, and N_{1,2} hold the model parameters Θ_{2,1}, Θ_{1,1}, and Θ_{1,2}, respectively. The model of each element represents the characteristics of the set of sequences managed under it. Specifically, Θ_{1,1} is the parameter set of the model trained on the sequence set {x_1, x_2, x_6} with weights w_{1,1}, w_{2,1}, w_{6,1}. Similarly, Θ_{1,2} is the model trained on {x_3, ..., x_6} with weights w_{3,2}, w_{4,2}, w_{5,2}, w_{6,2}. Θ_{2,1} represents the characteristics of the set of all sequences {x_1, ..., x_6}.
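The element definitions and Example 1 can be sketched as a small data structure. The sketch assumes that an upper-layer element is the pair (n_{l,j}, Θ_{l,j}) and that a bottom-layer element is the triple (n_{1,j}, Θ_{1,j}, D_j), as the surrounding text describes. The class names, the use of integer sequence ids in place of pointers, and the concrete weight values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class Element:
    """Summary held by every element N_{l,j}."""
    n: int       # n_{l,j}: number of sequences under this element
    theta: Any   # Theta_{l,j}: HMM parameters (pi, A, B); left opaque here


@dataclass
class InnerElement(Element):
    """Element in a layer l > 1; it has at most b sub-elements."""
    children: List[Element] = field(default_factory=list)


@dataclass
class LeafElement(Element):
    """Bottom-layer element N_{1,j}; it also keeps the sequence list D_j."""
    # D_j entries: (sequence id standing in for a pointer to x_i, weight w_{i,j})
    entries: List[Tuple[int, float]] = field(default_factory=list)

    def total_weight(self) -> float:
        return sum(w for _, w in self.entries)

    def has_space(self, b: float) -> bool:
        # An element is full once the summed weight reaches the capacity b
        return self.total_weight() < b


# Example 1 with made-up weights: x_6 is shared by both bottom-layer elements.
n_1_1 = LeafElement(n=3, theta=None, entries=[(1, 1.0), (2, 1.0), (6, 0.5)])
n_1_2 = LeafElement(n=4, theta=None,
                    entries=[(3, 1.0), (4, 1.0), (5, 1.0), (6, 0.5)])
n_2_1 = InnerElement(n=6, theta=None, children=[n_1_1, n_1_2])
```

Keeping only ids and weights in D_j reflects the point made in the text that the sequences themselves stay on disk while the in-memory structure holds summaries.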

<Presentation of Models>
With the hierarchical structure, any number of models can be presented at any time. Consider the case where the user requests c representative models. First, the algorithm extracts a sufficient number (≥ c) of elements from the structure. Next, it merges those elements into c groups of sequences using the likelihood. Finally, it re-estimates the model parameters for the c groups of sequences and presents them to the user.
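The first of these steps, extracting at least c elements, might look like the sketch below. The text does not state how the elements are chosen, so the layer-by-layer descent used here is only one possible strategy and is an assumption. The function name `collect_elements` and the dictionary-based tree are hypothetical, and the merging and re-estimation steps are not shown.

```python
from typing import Any, Dict, List


def collect_elements(root: Dict[str, Any], c: int) -> List[Dict[str, Any]]:
    """Descend layer by layer until a layer holds at least c elements."""
    layer = [root]
    while len(layer) < c:
        below = [child for node in layer for child in node.get("children", [])]
        if not below:
            # Bottom layer reached: the structure holds fewer than c elements
            break
        layer = below
    return layer


# Toy structure with one top element, two middle elements and four leaves.
tree = {
    "name": "N_3,1",
    "children": [
        {"name": "N_2,1", "children": [{"name": "N_1,1"}, {"name": "N_1,2"}]},
        {"name": "N_2,2", "children": [{"name": "N_1,3"}, {"name": "N_1,4"}]},
    ],
}
candidates = collect_elements(tree, 3)
```

With c = 3 the middle layer has only two elements, so the sketch returns the four bottom-layer elements, which would then be merged into three groups.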

<Insertion>
Next, the method for inserting an input sequence into the hierarchical structure will be described.

Unlike conventional data structures, the hierarchical structure proposed in the present invention takes a probability (weight) into account for each stored entry (sequence). Conventional insertion methods therefore cannot be applied to it, and the present invention proposes a new algorithm to solve this problem. In the proposed insertion algorithm, the input data is inserted into multiple elements according to the weight values. That is, unlike conventional insertion methods, the present invention must handle multiple insertion paths. The insertion algorithm is designed around a depth-first search that performs the insertion processing for all paths.

Given an input sequence x_i, the algorithm performs the insertion in the following order, (1) to (3).

(1) Starting from the top-layer HMM, the algorithm descends to the bottom layer, selecting an appropriate element (HMM) at each layer according to the likelihood of the input sequence with respect to that HMM. The likelihoods calculated here are also used to calculate the weight of the input sequence. This weight represents the probability between the input sequence and the models of all elements on the insertion path (the HMMs selected from the top layer to the bottom layer) (see equation (9)). After the algorithm reaches an element N_{1,j} in the bottom layer (l = 1), a new entry (x_i, w_{i,j}), that is, a pointer to x_i and the weight of x_i for N_{1,j} (the weight with which x_i is inserted into N_{1,j}), is added to the sequence list D_j in N_{1,j}.

(2) After x_i has been inserted into element N_{1,j}, the algorithm revises the information of each element on the insertion path. If no split occurs at an element for the new entry (x_i, w_{i,j}), the model parameters of that element on the insertion path are simply updated using x_i. If an element has no space for the new insertion (that is, Σ_i w_{i,j} ≥ b), the element is split and the model parameters are re-estimated for the resulting elements.

(3) When a split occurs and leaves inappropriate entries in the bottom-layer elements, the algorithm re-inserts those entries into appropriate elements. Reinsertion performs the same processing as insertion; by moving sequences managed in the hierarchy under more appropriate elements, it yields a model structure that represents the sequence set more accurately.

One important point should be noted here. In theory, the weights of the i-th sequence x_i sum to one: Σ_j w_{i,j} = 1. However, the number of elements managed in the structure becomes very large, particularly for large-scale data, while the weights on paths other than the main insertion paths are negligibly small. The present invention therefore inserts only into elements that carry large weights and skips elements with small weights. Specifically, insertion is repeated until the sum of the weights exceeds a threshold w_th (for example, w_th = 0.9), at which point it stops.
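The stopping rule above can be sketched as follows. This is a minimal illustration, not the patented routine: it assumes the weights w_{i,j} for the candidate bottom-layer elements are already available in a dictionary, whereas the description finds the elements one at a time by descending the hierarchy. The function name `insert_with_threshold` and the leaf labels are hypothetical.

```python
def insert_with_threshold(candidates, w_th=0.9):
    """Pick bottom-layer elements in descending weight order until the
    cumulative weight exceeds w_th.

    candidates: dict mapping a leaf label to the weight w_{i,j} of the
                input sequence for that leaf (assumed precomputed here).
    Returns the list of (leaf, weight) entries that would be inserted.
    """
    inserted = []
    w_sum = 0.0  # running sum of weights, starts at 0
    for leaf, w in sorted(candidates.items(), key=lambda kv: kv[1], reverse=True):
        inserted.append((leaf, w))
        w_sum += w
        if w_sum > w_th:  # stop once the threshold is exceeded
            break
    return inserted


# Hypothetical weights for one input sequence over four leaves.
entries = insert_with_threshold({"N11": 0.70, "N12": 0.25, "N13": 0.04, "N14": 0.01})
```

With these illustrative weights only the two heaviest leaves receive an entry, because 0.70 + 0.25 already exceeds 0.9.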

Next, the calculation of weights in the insertion operation is described. The weight w_{i,j} of x_i in the j-th bottom-layer element N_{1,j} is calculated from the likelihood of x_i according to the following equation (9).

Here h denotes the height of the hierarchical structure, and b_l (b_l ≤ b) is the number of sibling elements on the insertion path in the l-th layer.

<Weighted model estimation>
The present invention keeps a weight for each input sequence and uses that weight when estimating the model of the sequences. To this end, the existing EM algorithm is modified to support weighted model estimation. Given an input sequence x and its weight w, the weighted expected value γ_t(i) of x is calculated by the following equations (10) and (11).

In the M step, the model parameters are updated using the weighted expected values, based on equations (4) to (6) above.
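Equations (10) and (11) are not reproduced in this text, so the following is only a sketch of one plausible reading: the standard forward-backward state posterior for a discrete HMM, scaled by the sequence weight w. That scaling, the function name `weighted_gamma`, and the unscaled (short-sequence) arithmetic are all assumptions made for illustration.

```python
def weighted_gamma(pi, A, B, x, w):
    """Return gamma[t][i] = w * P(state i at time t | x) for a discrete HMM.

    pi: initial state probabilities, A: transition matrix,
    B: emission matrix B[i][v], x: list of symbol indices, w: sequence weight.
    No scaling is applied, so this is only suitable for short sequences.
    """
    k, m = len(pi), len(x)
    alpha = [[0.0] * k for _ in range(m)]
    beta = [[1.0] * k for _ in range(m)]
    # Forward pass.
    for i in range(k):
        alpha[0][i] = pi[i] * B[i][x[0]]
    for t in range(1, m):
        for j in range(k):
            alpha[t][j] = sum(alpha[t - 1][i] * A[i][j] for i in range(k)) * B[j][x[t]]
    # Backward pass.
    for t in range(m - 2, -1, -1):
        for i in range(k):
            beta[t][i] = sum(A[i][j] * B[j][x[t + 1]] * beta[t + 1][j] for j in range(k))
    likelihood = sum(alpha[m - 1])  # P(x | model)
    # Posterior state probabilities, scaled by the sequence weight (assumed form).
    return [[w * alpha[t][i] * beta[t][i] / likelihood for i in range(k)] for t in range(m)]


gamma = weighted_gamma(
    pi=[0.6, 0.4],
    A=[[0.7, 0.3], [0.4, 0.6]],
    B=[[0.9, 0.1], [0.2, 0.8]],
    x=[0, 1, 0],
    w=0.5,
)
```

Under this reading, the weighted expected values at each time step sum to w rather than to 1, so a sequence with a small weight contributes proportionally less to the M step.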

(Example 2): Consider inserting a new sequence x_6, as shown in FIG. 2. For the first insertion, the algorithm starts from the top and finds the most appropriate models Θ_{2,1} and Θ_{1,1} in each layer, then calculates the weight w_{6,1} using equation (9). Because the element N_{1,1} has space for the new entry, no split occurs and the entry is added to N_{1,1} as it is. After adding the entry, the algorithm updates the models Θ_{2,1} and Θ_{1,1} on the insertion path using the entry (x_6, w_{6,1}). Next, it finds the insertion path Θ_{2,1}, Θ_{1,2} as the second-closest model and adds x_6 to N_{1,2}; again no split occurs. It then calculates the weight w_{6,2}. After these two insertions, if the sum of the weights of x_6 exceeds the threshold (that is, w_{6,1} + w_{6,2} > w_th), the algorithm ends the insertion operation.

<Split operation>
Next, the split operation is described in detail. Given an element N, the split is carried out in the following steps (1) to (5).

(1) Two subelements are selected at random from all the subelements belonging to the element N, and their models Θ_1 and Θ_2 are obtained.

(2) Each subelement belonging to N is assigned to the group of whichever model, Θ_1 or Θ_2, is closer. The similarity measure used for this assignment is the total likelihood of the sequences x_i belonging to the subelement under each model, Σ_i L(x_i | Θ_1) or Σ_i L(x_i | Θ_2).

(3) For each group to which subelements have been assigned, one new model is selected at random, giving new Θ_1 and Θ_2.

(4) Steps (2) and (3) above are repeated until no subelement moves between the groups.

(5) For the two newly formed groups, the model parameters Θ_1 and Θ_2 are estimated using all the sequences belonging to each group. Finally, the current element N is deleted, and the newly created elements N_1 and N_2 are added together with their model parameters Θ_1 and Θ_2. The subelements assigned to each group become the subelements of the corresponding new element N_1 or N_2.

When the element N is a bottom-layer element, the same steps (1) to (5) are applied with the sequences belonging to N taking the place of subelements. In addition, for each sequence, the weight for whichever of the two new elements it was assigned to is calculated.
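Steps (1) to (4) amount to a two-way clustering with randomly chosen representatives, which can be sketched as below. The sketch is generic: `similarity(item, seed)` stands in for the likelihood-based measure of step (2), and the numeric toy data, the `max_iter` cap, and the function name `split_in_two` are assumptions for illustration. Step (5), the re-estimation of Θ_1 and Θ_2 from each group, is left out.

```python
import random


def split_in_two(items, similarity, rng=None, max_iter=100):
    """Partition items into two groups around randomly chosen representatives."""
    rng = rng or random.Random()
    seed1, seed2 = rng.sample(items, 2)  # step (1): two random representatives
    labels = None
    for _ in range(max_iter):  # cap added here because random reseeding may not settle
        # Step (2): assign each item to the closer representative.
        new_labels = [0 if similarity(it, seed1) >= similarity(it, seed2) else 1
                      for it in items]
        if new_labels == labels:  # step (4): stop when nothing moved
            break
        labels = new_labels
        group1 = [it for it, lab in zip(items, labels) if lab == 0]
        group2 = [it for it, lab in zip(items, labels) if lab == 1]
        if group1 and group2:
            # Step (3): pick a new random representative from each group.
            seed1, seed2 = rng.choice(group1), rng.choice(group2)
    group1 = [it for it, lab in zip(items, labels) if lab == 0]
    group2 = [it for it, lab in zip(items, labels) if lab == 1]
    return group1, group2


# Toy usage: numbers stand in for subelements, negative distance for likelihood.
values = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
g1, g2 = split_in_two(values, lambda a, b: -abs(a - b), rng=random.Random(0))
```

Because the representatives are re-drawn at random in step (3), the resulting partition depends on the random seed; the sketch only guarantees that every item ends up in exactly one of the two groups.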

(Example 3): Consider inserting x_7, as shown in FIG. 3. The insertion starts from the top layer and finds the closest models Θ_{2,1} and Θ_{1,2} in each layer. The algorithm then tries to insert x_7 into the bottom-layer element N_{1,2}, but there is no space for a new insertion (that is, Σ_i w_{i,2} ≥ b). It therefore splits N_{1,2} into two elements, N_{1,2} and N_{1,3}, and, for the currently managed entry set {x_3, x_4, x_6}, assigns {x_5, x_7} to N_{1,2} and {x_5, x_7} to N_{1,3}. After the split, Θ_{1,2} is re-estimated from N_{1,2} and Θ_{1,3} from N_{1,3}. Θ_{2,1} on the insertion path is updated in the same way.

<Reinsertion operation>
The algorithm performs the insertion operation only once for each input sequence. As the model structure is updated, the weights of previously inserted sequences may therefore become inappropriate. This matters most when a split changes the model parameters substantially and the weights of the sequences under the affected elements no longer fit. A reasonable remedy is, whenever a split occurs in a bottom-layer element, to delete the sequences that have become inappropriate in the resulting elements and insert them again, thereby moving them to suitable elements. Such sequences can be detected with the likelihood ratio test shown in the following equation (12).

Here Θ_i denotes the model parameters estimated from x_i, and Θ_{1,j} denotes the model parameters held in N_{1,j}. If the likelihood ratio ρ_i is less than or equal to a given threshold ρ_th (for example, ρ_th = 0.8), the algorithm deletes the current entry (x_i, w_{i,j}) from N_{1,j} and inserts it anew. This reinsertion moves the sequence to a position better suited to the model structure.
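The detection step can be sketched as a simple filter. Equation (12) is not reproduced in this text, so the form of the ratio used here, the likelihood of x_i under the leaf model Θ_{1,j} divided by its likelihood under its own model Θ_i, is an assumption, as are the function name `find_entries_to_reinsert` and the input layout.

```python
def find_entries_to_reinsert(entries, rho_th=0.8):
    """Return the ids of entries whose likelihood ratio is at or below rho_th.

    entries: list of (seq_id, lik_leaf, lik_own) where
        lik_leaf = L(x_i | Theta_{1,j})  (model held in the leaf) and
        lik_own  = L(x_i | Theta_i)      (model estimated from x_i itself).
    The ratio lik_leaf / lik_own is an assumed stand-in for equation (12).
    """
    stale = []
    for seq_id, lik_leaf, lik_own in entries:
        rho = lik_leaf / lik_own
        if rho <= rho_th:  # weight judged inappropriate: delete and re-insert
            stale.append(seq_id)
    return stale


# Hypothetical likelihood values for two stored sequences.
to_move = find_entries_to_reinsert([("x3", 0.9e-5, 1.0e-5), ("x4", 0.5e-5, 1.0e-5)])
```

Each id returned would then be removed from its leaf and passed through the normal insertion procedure again.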

<Operation of the data update processing device>
Next, the specific processing of the data update processing device 100 is described. When a sequence set is input to the data update processing device 100, the device stores it in the time-series data storage unit 21. The device then builds a hierarchical HMM from the sequence set and stores it in the hierarchical model data storage unit 22.

When a new sequence is input to the data update processing device 100, the device executes the hierarchical model update processing routine shown in FIG. 4.

First, in step S101, the input unit 10 accepts the input sequence x_i. In the next step S102, the time-series data processing unit 23 initializes the sum w_sum of the weights assigned to the sequence x_i to 0.

The time-series data processing unit 23 then repeats steps S103 to S113 until the sum of weights w_sum exceeds the predetermined threshold w_th.

In step S103, the time-series data processing unit 23 descends from the top layer to the bottom layer, selecting elements based on the likelihood for each element, and thereby selects an appropriate bottom-layer element. It inserts the input sequence x_i as a sequence belonging to the selected element and calculates the weight w for that element. On the second and subsequent executions of step S103, the bottom-layer elements already selected in earlier executions are excluded from the selection.

The time-series data processing unit 23 then repeats steps S104 to S113 for every element N on the insertion path, which consists of the elements selected in step S103 from the top layer to the bottom layer.

In step S104, the time-series data processing unit 23 determines whether the element N has room for a new entry. If the sum of the weights of the sequences belonging to the subelements under the element is less than the threshold b, the element N is judged to have room. In that case, in step S105, the time-series data processing unit 23 updates the model parameters Θ of the HMM of the element N using the input sequence x_i, according to equations (10) and (11) and equations (4) to (6) above, and the process then returns to step S103 or step S104, or the processing routine ends.

If, on the other hand, the sum of the weights of the sequences belonging to the subelements under the element is greater than or equal to the threshold b, the element N is judged to have no room for a new entry, and in step S106 the time-series data processing unit 23 splits the element N into N_1 and N_2 by the split operation described above. In step S107, the time-series data processing unit 23 sets X_1 to the sequence set assigned to N_1, and in step S108 it sets X_2 to the sequence set assigned to N_2.

In the next step S109, the time-series data processing unit 23 updates the model parameters Θ_1 of the HMM of the element N_1 using the sequence set X_1, and the model parameters Θ_2 of the HMM of the element N_2 using the sequence set X_2.

In step S110, the time-series data processing unit 23 determines whether the elements N_1 and N_2 are bottom-layer elements. If they are not, the process returns to step S103 or step S104, or the processing routine ends.

If the elements N_1 and N_2 are bottom-layer elements, the time-series data processing unit 23 repeats steps S111 to S113 for every sequence x_j belonging to N_1 or N_2.

In step S111, the time-series data processing unit 23 calculates the likelihood ratio ρ_j for the sequence x_j according to equation (12) above and determines whether ρ_j is less than or equal to the threshold ρ_th. If ρ_j is greater than ρ_th, the weight of the sequence x_j is judged to be appropriate, and the process returns to step S103, step S104, or step S111, or the processing routine ends. If ρ_j is less than or equal to ρ_th, the weight of the sequence x_j is judged to be inappropriate. In that case, in step S112, the time-series data processing unit 23 removes x_j from the sequences belonging to the element N_1 or N_2 and updates the HMM model parameters Θ of each element on the path for x_j, from that element N_1 or N_2 up to the top-layer element, based on the sequence set with x_j removed.

In step S113, as in step S103, the time-series data processing unit 23 descends from the top layer to the bottom layer, selects an appropriate bottom-layer element based on the likelihood for each element, re-inserts the sequence x_j as a sequence belonging to the selected element, and calculates the weight w for that element. Processing equivalent to steps S104 to S113 is also performed.

The process then returns to step S103, step S104, or step S111, or the processing routine ends.

As described above, for a newly input sequence, the data update processing device according to the first embodiment of the present invention selects HMMs from the top layer to the bottom layer among a plurality of hidden Markov models (HMMs) arranged in a hierarchical structure, and inserts the input sequence into the selected bottom-layer HMM. When an HMM has no room for a new sequence, the HMM is split and the HMMs of the next lower layer, or the sequences, are allocated to the resulting HMMs. When the split leaves a sequence inappropriately allocated, that sequence is re-inserted by again selecting HMMs from the top layer to the bottom layer. As a result, when a new sequence is input, the hierarchical hidden Markov model can be updated without accessing all the sequences, with reduced computation time and memory usage.

The computation time and memory usage required for updating the hierarchical model can thus be reduced substantially, and the model can be estimated at an arbitrary granularity in a dynamic environment.

[Second Embodiment]
Next, a second embodiment is described. Parts having the same configuration as in the first embodiment are given the same reference numerals, and their description is omitted.

The second embodiment differs from the first in two respects: model parameters are updated using a sample of the sequences, and model parameters are updated using only the newly added sequence.

<Outline of the invention>
In the data update processing device 100 described in the first embodiment, when a split occurs within the hierarchical model structure and the model must be re-estimated, all the sequences belonging to the element being split have to be used. The approximate data update processing device 200 of the second embodiment, shown in FIG. 5, improves on the data update processing device 100 of the first embodiment: the approximate time-series data processing unit 223 performs hierarchical sampling, which saves computation time for model re-estimation.

<System configuration>
As shown in FIG. 5, the approximate data update processing device 200 according to the second embodiment is a computer comprising a CPU, a RAM, and a ROM that stores a program for executing the processing routines described later. Functionally, it is configured as follows: the approximate data update processing device 200 includes an input unit 10, a processing unit 220, and an output unit 30.

The processing unit 220 consists of a time-series data storage unit 21, a hierarchical model data storage unit 22, and an approximate time-series data processing unit 223. The approximate time-series data processing unit 223 is an example of sequence insertion means, update means, splitting means, estimation means, sequence reinsertion means, presentation means, and sampling means.

The principle of the update processing performed by the approximate time-series data processing unit 223 in the second embodiment is described below.

<Speed-up processing>
The simplest way to estimate HMM parameters is to use all the past sequences managed in the hierarchical model structure. Using every sequence, however, becomes a bottleneck in a streaming setting. This embodiment therefore uses the following two ideas, (1) and (2).

(1) When a split occurs in the hierarchy and a model must be re-estimated, a sampling approach is used so that only part of the sequence set is needed, instead of all the sequences belonging to the element being split.

(2) To reduce the cost of model estimation, the model is estimated incrementally. The EM algorithm is extended so that the model parameters are updated using only the new input data.

In the hierarchical model update processing routine described in the first embodiment, the insertion operation is executed each time an input sequence x_i is received. If no split occurs, the model is updated using x_i. If a split occurs, hierarchical sampling is executed before steps S107 and S108 to extract a sample set of sequences. Then, in step S109, the model parameters Θ_1 and Θ_2 of the new elements are estimated from this sample set.

<Hierarchical sampling>
This embodiment proposes a sampling algorithm that uses a stack. First, the element N and the number of sequences requested for sampling, ε (for example, ε = b × b), are pushed onto the stack. At each step, the algorithm pops a pair (N_p, ε_p) from the stack, determines the local number of samples ε_c for each subelement N_c belonging to the popped element N_p, and pushes the new pair (N_c, ε_c) onto the stack. Here, using the number n_p of all sequences belonging to N_p and the number n_c of sequences belonging to each subelement N_c, whether the subelement N_c is selected as a sample is decided with probability n_c/n_p, and the number of samples ε_c is determined from the selection results for the subelements.

This process continues until the bottom layer of the hierarchy is reached, yielding the final sample set of sequences. If a subelement N_c is not selected as a sample (that is, ε_c = 0), all the subelements under N_c are pruned.
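The stack-based procedure can be sketched as follows. The description does not spell out exactly how ε_c is derived from the probability n_c/n_p; this sketch assumes that each of the ε_p requested samples is assigned independently to a child with probability n_c/n_p. The `Node` class and the function name `hierarchical_sample` are hypothetical.

```python
import random


class Node:
    """Tree element: internal nodes hold children, leaves hold sequences."""

    def __init__(self, children=None, sequences=None):
        self.children = children or []
        self.sequences = sequences or []

    def count(self):
        """Number of sequences stored under this node."""
        if not self.children:
            return len(self.sequences)
        return sum(child.count() for child in self.children)


def hierarchical_sample(root, eps, rng=None):
    """Draw up to eps sequences from the subtree under root using a stack."""
    rng = rng or random.Random()
    samples = []
    stack = [(root, eps)]
    while stack:
        node, eps_p = stack.pop()
        if not node.children:
            # Leaf: pick eps_p sequences at random (or all, if fewer exist).
            samples.extend(rng.sample(node.sequences, min(eps_p, len(node.sequences))))
            continue
        counts = [child.count() for child in node.children]
        if sum(counts) == 0:
            continue  # nothing stored below this node
        # Assumed allocation: each requested sample goes to child c
        # with probability n_c / n_p.
        quota = [0] * len(node.children)
        for _ in range(eps_p):
            idx = rng.choices(range(len(node.children)), weights=counts)[0]
            quota[idx] += 1
        for child, eps_c in zip(node.children, quota):
            if eps_c > 0:  # children with eps_c == 0 are pruned
                stack.append((child, eps_c))
    return samples


leaf_a = Node(sequences=["a1", "a2", "a3"])
leaf_b = Node(sequences=["b1"])
tree = Node(children=[Node(children=[leaf_a, leaf_b])])
picked = hierarchical_sample(tree, eps=3, rng=random.Random(1))
```

A leaf that receives a larger quota than it has sequences simply returns all of them, so the result can contain fewer than ε sequences.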

<Incremental model estimation>
The weighted model estimation described in the first embodiment (see equations (10) and (11) above) is an important building block, but it needs all the input sequences to compute the weighted expected values, so on its own it is not sufficient in a streaming setting. This embodiment therefore introduces a new incremental model estimation method. Given model parameters Θ representing a sequence set {x_1, ..., x_n} and a newly added input sequence x_{n+1}, the new model parameters ~Θ can be computed incrementally as follows. In the E step, instead of computing expected values for the whole data set, expected values are computed only for the new input sequence x_{n+1}. In the M step, the model parameters are then updated efficiently by the following equation (13).

Here, N_{γ1} = Σ_{r=1}^{n} Σ_{j=1}^{k} γ_1^{(r)}(j). ~a_ij and ~b_i(v) can be updated in the same way using the new expected values. This incremental model estimation approaches convergence faster than existing methods. More importantly, its computational cost does not depend on the total number of sequences.
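Equation (13) is not reproduced in this text. As a hedged illustration, the sketch below shows one update of the initial state probabilities that is consistent with the definition of N_{γ1}: the previous estimate is treated as a normalized sum of past expected values, and the expected values for the new sequence are folded in. The function name `incremental_pi_update` and the exact form are assumptions, not the patent's formula.

```python
def incremental_pi_update(pi, n_gamma1, gamma1_new):
    """Fold the expected values of one new sequence into pi.

    pi:         current initial state probabilities (length k).
    n_gamma1:   running total N_{gamma1} of past expected values at t = 1.
    gamma1_new: expected values gamma_1(i) for the new sequence x_{n+1}.
    Returns the updated probabilities and the updated running total.
    """
    n_updated = n_gamma1 + sum(gamma1_new)
    # n_gamma1 * pi[i] recovers the accumulated past expectation for state i,
    # so no past sequence needs to be revisited.
    pi_new = [(n_gamma1 * p + g) / n_updated for p, g in zip(pi, gamma1_new)]
    return pi_new, n_updated


# Two past sequences split evenly between two states, then one new sequence
# whose expected value is entirely on state 0.
pi_new, n_new = incremental_pi_update([0.5, 0.5], 2.0, [1.0, 0.0])
```

Only the running total and the current parameters are stored, which is why the cost of such an update is independent of how many sequences have been seen.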

<Operation of the approximate data update processing device>
Next, the operation of the approximate data update processing device 200 according to the second embodiment is described. When a sequence set is input to the approximate data update processing device 200, the device stores it in the time-series data storage unit 21. The device then builds a hierarchical HMM from the sequence set and stores it in the hierarchical model data storage unit 22.

When a new sequence is input to the approximate data update processing device 200, the device executes a processing routine similar to the hierarchical model update processing routine shown in FIG. 4.

At this point, after the approximate time-series data processing unit 223 has performed the split operation in step S106, the hierarchical sampling processing routine shown in FIG. 6 is executed.

First, in step S201, the approximate time-series data processing unit 223 pushes onto the stack the pair (N_1, ε) consisting of the element N_1 created by the split in step S106 and the number ε of sequences to be sampled (the number of all sequences belonging to the subelements under the element N_1).

The approximate time-series data processing unit 223 then repeats steps S202 to S207 until the stack is empty.

In step S202, the approximate time-series data processing unit 223 pops a pair (N_p, ε_p) from the stack. In the next step S203, it determines whether N_p is a bottom-layer element; if it is not, steps S204 to S206 are repeated for every subelement N_c under N_p.

In step S204, the approximate time-series data processing unit 223 determines the local number of samples ε_c for the subelement N_c using the probability n_c/n_p. In step S205, it determines whether the number of samples ε_c determined in step S204 is greater than 0. If so, in step S206 the pair (N_c, ε_c) is pushed onto the stack, and the process returns to step S204 or step S202, or proceeds to step S208 described later.

If, on the other hand, the number of samples ε_c is 0 or less, the subelement N_c is judged not to have been selected as a sample, and the process returns to step S204 or step S202, or proceeds to step S208 described later.

If N_p is determined in step S203 to be a bottom-layer element, then in step S207 the approximate time-series data processing unit 223 randomly selects ε_p samples from the set of sequences belonging to N_p and adds them to X_s, the set that collects the samples drawn from all sequences under N_1. The process then returns to step S202 or proceeds to step S208 described later.

In step S208, X_s is output as the sampling result for the set of all sequences under N_1.

The approximate time-series data processing unit 223 also executes the hierarchical sampling processing routine in the same way for the element N_2 created by the split in step S106.

Then, in step S107, the approximate time-series data processing unit 223 sets X_1 to the sampling result for the set of all sequences under N_1, and in step S108 it sets X_2 to the sampling result for the set of all sequences under N_2. The approximate time-series data processing unit 223 executes the processing from step S109 onward as before.

In step S105, the approximate time-series data processing unit 223 computes expected values only for the input sequence x_i according to equations (10) and (11) above, and updates the model parameters Θ of the HMM of the element N according to equation (13) and to equations (4) and (5) modified in the same manner as equation (13).

As described above, the approximate data update processing device according to the second embodiment updates the model parameters of the split elements using a sample set of sequences, which further reduces computation time and memory usage.

In addition, for each element on the insertion path, the model parameters are updated using only the newly input sequence, which further reduces the calculation time and the memory usage.

<Example>
To show the effectiveness of the present invention, experiments were conducted using real data. The experiments were run on a Linux (registered trademark) machine equipped with 4 GB of memory and an Intel(R) Core 2 Duo 1.86 GHz CPU. The HMMs to be estimated were set to k = 20. Mocap is a motion capture dataset in which human movement is measured at 120 frames per second (http://mocap.cs.cmu.edu/). FIG. 7 shows, as examples, sequences of the kinetic energy of the left and right feet in the motion capture data; each sequence represents a different motion (walking, squatting, jumping, balancing). Consider using the set of sequences to estimate HMM parameters for each of the four motions. Information about these four groups is not given in advance: the algorithm discovers similar patterns among the sequences on its own, groups them, and estimates the parameters for each group individually. For the experiments, the two-dimensional data was divided into 100 grid cells and treated as discrete data. The four types of motion above (walking, squatting, jumping, balancing) were used as the training data.
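The conversion of the two-dimensional data into discrete symbols can be illustrated with the following sketch. The text states only that the data was divided into 100 grid cells; the uniform 10 × 10 layout, the value range `[lo, hi]`, and the function names are assumptions introduced here for illustration.

```python
def to_symbol(left, right, lo=0.0, hi=1.0, bins=10):
    """Map a 2D value (left, right) to one of bins * bins discrete symbols.

    Assumption: a uniform grid over [lo, hi] on both axes (10 x 10 = 100 cells).
    """
    def cell(value):
        # Clamp the value into [lo, hi], then find its bin index on one axis.
        ratio = (min(max(value, lo), hi) - lo) / (hi - lo)
        return min(int(ratio * bins), bins - 1)

    return cell(left) * bins + cell(right)


def discretize(sequence, lo=0.0, hi=1.0, bins=10):
    """Convert a sequence of (left, right) pairs into a sequence of symbols."""
    return [to_symbol(left, right, lo, hi, bins) for left, right in sequence]
```

With the default settings, every pair of left-foot and right-foot values maps to an integer symbol from 0 to 99, which can then serve as the observation alphabet of a discrete HMM.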

FIG. 8 shows the classification results on MoCap. The test set consisted of ten sequences in total: 2 × 4 sequences (#1-#8) belonging to the four types of motion above, and two sequences (#9 and #10) belonging to the running motion, which is not included in the training data. For the first eight test sequences (#1-#8), the present invention found the appropriate models. Interestingly, the present invention also indicated that the squatting and jumping motions are similar. Indeed, as shown in FIG. 7, both sequences are movements in which both legs are bent at the same time, and they can be said to be very similar. Likewise, the running motion (#9 and #10) was shown to be similar to the walking motion.

To evaluate the computation time of the learning process, the present invention was compared with HHMM, the conventional technique. Two versions of HHMM were used for comparison: one using a hierarchical structure based on a complete binary tree (see HHMM (b=2) in FIG. 9) and one using a hierarchical structure based on a complete quadtree (see HHMM (b=4) in FIG. 9). Two versions of the present invention were also compared: the learning method based on streaming processing (see HEART in FIG. 9) and the learning method based on the basic idea of the present invention, in which the sampling method is not used (see HEART (offline) in FIG. 9). In this experiment, the computation time of the present invention includes the time spent on disk access. FIG. 9 shows the computation cost of HHMM and of the present invention. The size of the dataset (that is, the total length of all sequences) was varied from 1,000 to 200,000, and the sequence length was fixed at m = 100. As shown in the figure, the present invention is significantly faster than the conventional technique.
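Assigning a test sequence to the model under which it is most likely can be sketched with the standard scaled forward algorithm for a discrete HMM with initial state probabilities π, state transition probabilities A, and symbol output probabilities B. This is a generic textbook computation shown for illustration, not the procedure used in the experiment; the function names and the dictionary of models are hypothetical.

```python
import math


def log_likelihood(seq, pi, A, B):
    """Scaled forward algorithm: log P(seq | pi, A, B) for a discrete HMM.

    seq is a non-empty list of symbol indices; pi[i], A[i][j], and B[i][v]
    are the initial, transition, and output probabilities.
    """
    k = len(pi)
    alpha = [pi[i] * B[i][seq[0]] for i in range(k)]
    log_p = 0.0
    for t in range(len(seq)):
        if t > 0:
            # Propagate one step through A, then weight by the output probability.
            alpha = [sum(alpha[i] * A[i][j] for i in range(k)) * B[j][seq[t]]
                     for j in range(k)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the sequence is impossible under this model
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_p += math.log(scale)
    return log_p


def best_model(seq, models):
    """Return the name of the model with the highest log-likelihood for seq."""
    return max(models, key=lambda name: log_likelihood(seq, *models[name]))
```

Here `models` is assumed to map a label to a tuple `(pi, A, B)`. Comparing the log-likelihoods across labels is also one way to observe that two motions are similar, since a sequence of one motion then scores nearly as well under the model of the other.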

The present invention is not limited to the embodiments described above, and various modifications and applications are possible without departing from the gist of the invention.

For example, time-series data arises in a variety of fields, such as video and music distribution, bioinformatics, and robotics. The present invention is applicable to all of these fields.

In this specification, the embodiments have been described on the assumption that the program is installed in advance; however, the program may also be provided stored on a computer-readable recording medium.

DESCRIPTION OF SYMBOLS
10 Input unit
20, 220 Processing unit
21 Time-series data storage unit
22 Hierarchical model data storage unit
23 Time-series data processing unit
30 Output unit
100 Data update processing device
200 Approximate data update processing device
223 Approximate time-series data processing unit

Claims

A data update processing device comprising:
storage means for storing a plurality of hidden Markov models (HMMs) defined in a hierarchical structure, and a plurality of sequences each belonging to at least one lowest-layer HMM and weighted with respect to each such lowest-layer HMM;
sequence insertion means for selecting, for an input sequence and in accordance with the hierarchical structure, HMMs from the top-layer HMM down to the lowest-layer HMM based on the likelihood with respect to each HMM, thereby selecting at least one lowest-layer HMM to which the input sequence belongs, inserting the input sequence as a sequence belonging to each selected lowest-layer HMM, and obtaining a weight of the sequence for each selected lowest-layer HMM;
updating means for updating, for each HMM from the top-layer HMM to the lowest-layer HMM selected by the sequence insertion means, the parameters of the selected HMM based on the input sequence when there is room for a new sequence;
dividing means for dividing, for each HMM from the top-layer HMM to the lowest-layer HMM selected by the sequence insertion means, the selected HMM into two HMMs when there is no room for a new sequence, and allocating the plurality of HMMs one layer below that belong to the selected HMM, or the plurality of sequences belonging to the selected HMM, to the two divided HMMs;
estimating means for estimating the parameters of each of the two HMMs divided by the dividing means, based on the sequences under the HMMs allocated to that HMM or on the sequences allocated to that HMM; and
sequence reinsertion means for, when the two HMMs divided by the dividing means are lowest-layer HMMs and a value based on the likelihood, with respect to that HMM, of a sequence belonging to the lowest-layer HMM is determined to be inappropriate in comparison with a threshold, deleting the sequence from the sequences belonging to the lowest-layer HMM, updating the parameters of each HMM from the lowest-layer HMM to the top-layer HMM corresponding to the sequence, selecting for the sequence, in accordance with the hierarchical structure, HMMs from the top-layer HMM down to the lowest-layer HMM based on the likelihood with respect to each HMM, thereby selecting at least one lowest-layer HMM to which the sequence belongs, reinserting the sequence as a sequence belonging to each selected lowest-layer HMM, and obtaining the weight of the sequence for each selected lowest-layer HMM.

The data update processing device according to claim 1, further comprising sampling means for, when the selected HMM is divided into two HMMs by the dividing means, selecting by sampling, for each of the two divided HMMs and in accordance with the hierarchical structure, the HMMs one layer below or the sequences belonging to the HMM, from the divided HMM down to the lowest-layer HMM,
wherein the estimating means estimates the parameters of each of the two HMMs divided by the dividing means based on the sequences under the HMM selected by the sampling means.

The data update processing device according to claim 1 or 2, wherein the updating means updates the parameters of the selected HMM based only on the input sequence when, for each HMM selected by the sequence insertion means, there is room for a new sequence.

The data update processing device according to any one of claims 1 to 3, wherein the sequence insertion means repeats, for the input sequence, selecting HMMs from the top-layer HMM down to the lowest-layer HMM to select a lowest-layer HMM to which the input sequence belongs, and inserting the input sequence as a sequence belonging to each selected lowest-layer HMM, until the total of the weights of the sequence for the selected lowest-layer HMMs exceeds a threshold.

The data update processing device according to any one of claims 1 to 4, wherein it is determined that there is no room for a new sequence when, as a result of inserting the input sequence, the total of the weights assigned to all sequences under the HMM is equal to or greater than a threshold.

The data update processing device according to claim 1, further comprising presentation means for presenting, when a predetermined number of HMMs is requested, the parameters of each of the predetermined number of HMMs based on the parameters of each of the plurality of HMMs defined in the hierarchical structure and stored in the storage means.

A data update processing method in a data update processing device that includes storage means for storing a plurality of hidden Markov models (HMMs) defined in a hierarchical structure and a plurality of sequences each belonging to at least one lowest-layer HMM and weighted with respect to each such lowest-layer HMM, sequence insertion means, updating means, dividing means, estimating means, and sequence reinsertion means, the method comprising:
by the sequence insertion means, selecting, for an input sequence and in accordance with the hierarchical structure, HMMs from the top-layer HMM down to the lowest-layer HMM based on the likelihood with respect to each HMM, thereby selecting at least one lowest-layer HMM to which the input sequence belongs, inserting the input sequence as a sequence belonging to each selected lowest-layer HMM, and obtaining a weight of the sequence for each selected lowest-layer HMM;
by the updating means, updating, for each HMM from the top-layer HMM to the lowest-layer HMM selected by the sequence insertion means, the parameters of the selected HMM based on the input sequence when there is room for a new sequence;
by the dividing means, dividing, for each HMM from the top-layer HMM to the lowest-layer HMM selected by the sequence insertion means, the selected HMM into two HMMs when there is no room for a new sequence, and allocating the plurality of HMMs one layer below that belong to the selected HMM, or the plurality of sequences belonging to the selected HMM, to the two divided HMMs;
by the estimating means, estimating the parameters of each of the two HMMs divided by the dividing means, based on the sequences under the HMMs allocated to that HMM or on the sequences allocated to that HMM; and
by the sequence reinsertion means, when the two HMMs divided by the dividing means are lowest-layer HMMs and a value based on the likelihood, with respect to that HMM, of a sequence belonging to the lowest-layer HMM is determined to be inappropriate in comparison with a threshold, deleting the sequence from the sequences belonging to the lowest-layer HMM, updating the parameters of each HMM from the lowest-layer HMM to the top-layer HMM corresponding to the sequence, selecting for the sequence, in accordance with the hierarchical structure, HMMs from the top-layer HMM down to the lowest-layer HMM based on the likelihood with respect to each HMM, thereby selecting at least one lowest-layer HMM to which the sequence belongs, reinserting the sequence as a sequence belonging to each selected lowest-layer HMM, and obtaining the weight of the sequence for each selected lowest-layer HMM.

A program for causing a computer to function as each means of the data update processing device according to any one of claims 1 to 6.