JP4180137B2 - Online handwritten character recognition method

Info

Publication number: JP4180137B2
Application number: JP33067797A
Authority: JP
Inventors: 隆松本; 賢一郎高橋
Original assignee: Waseda University
Current assignee: Waseda University
Priority date: 1996-11-29
Filing date: 1997-12-01
Publication date: 2008-11-12
Anticipated expiration: 2017-12-01
Also published as: JPH11242717A

Description

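[0001]
[Technical field of the invention]
The present invention relates to an online handwritten character recognition method, and in particular to a handwritten character recognition method that accurately recognizes not only block-style (kaisho) characters but also cursive (connected-stroke) and simplified (broken-form) characters.
[0002]
[Prior art]
Many methods have already been reported for online handwritten character recognition, which plays an important role in pen-input electronic organizers, word processors, computers, and the like. For example, Japanese Laid-Open Patent Publication No. 8-101889 discloses the Reparametrized Angle Variations method, which is robust to cursive and simplified characters. Other references on this method include Mitsuru Kobayashi et al., "On-line handwritten character recognition using Reparametrized Angle Variations", IEICE Technical Report, PRU94-121, pp. 23-30 (1995), and Osamu Miyamoto et al., "On a hardware board that executes the on-line character recognition algorithm Reparametrized Angle Variations at high speed", IEICE Technical Report, PRU94-136, pp. 49-56 (1995). These methods include a distinctive data compression procedure and achieved considerably higher accuracy than earlier techniques on cursive and simplified characters, but they matched the compressed data against dictionary data by dynamic programming (DP).
[0003]
A survey is given in Tappert, C.C., et al., IEEE Trans. Patt. Anal. Machine Intell., vol. 12, No. 8, August, pp. 787-808 (1990). More recent papers include Akira Tsuruta et al., "Online handwritten character recognition system", Sharp Technical Journal 57, pp. 5-8 (1993); Katsuhiko Akiyama et al., "Online handwritten character recognition tolerant of connected strokes", Meeting on Image Recognition and Understanding (1994); and Zhao Peng et al., "Creation of a general-purpose dictionary for online recognition of hastily written characters", Information Processing Vol. 34, No. 3, pp. 418-425.
As portable electronic devices become more widespread, there is a demand for a method that recognizes sloppy, hard-to-recognize handwriting with higher accuracy.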
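[0004]
[Problems to be solved by the invention]
An object of the present invention is to provide a highly accurate handwritten character recognition method that recognizes and learns characters from a viewpoint entirely different from the prior art described above. The method can be implemented as a program executable on the current standard hardware of portable electronic devices such as electronic organizers.
[0005]
[Means for solving the problems]
The present invention proposes applying a paradigm called the fast hidden Markov model to online handwritten character recognition.
[0006]
The present invention provides an online handwritten character recognition method that uses hidden Markov models and is based on time-series data representing the movement of the handwriting coordinates and the up or down state of the input pen relative to the surface of the input device. The method comprises: a feature point extraction and data compression step of extracting feature points from the given time-series data, based on angle information and distance information between the points contained in that data, and thereby compressing the time-series data; a quantization step of quantizing the compressed time-series data according to the angle formed by adjacent lines connecting the feature points, the length of those lines, and the up or down state of the pen; an association step of inserting delimiters into the quantized time-series data based on changes between the pen-up and pen-down states, or based on those changes together with a predetermined condition on the angle, and associating each group of data between delimiters with one state of a hidden Markov model; and a probability calculation step of calculating, for the quantized and delimited time-series data, the probability that the data is obtained under each of a plurality of hidden Markov models determined in advance for the characters to be recognized. The character corresponding to the hidden Markov model that maximizes this probability is taken as the most probable character.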
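[0007]
In one embodiment of the invention, the feature point extraction and data compression step of the above method comprises: a selection step of selecting three or more adjacent data points from among the data points of the given time-series data that are consecutively in the pen-down state; an angle and distance calculation step of obtaining the angle between the line segment connecting the first and last of the selected data points and each line segment connecting two adjacent points among the selected data points, together with the length of each line segment connecting two adjacent points; a determination step of judging whether a predetermined relationship holds between the angle and a predetermined threshold angle, or between the segment length and a predetermined threshold length; a feature point extraction step of, when the predetermined relationship is judged to hold, extracting predetermined ones of the selected data points as feature points and discarding the other data points; and a step of repeating these steps so that the selection, angle and distance calculation, determination, and feature point extraction steps are carried out over the data points that are consecutively in the pen-down state. No data compression is performed on data points indicating pen-up.
[0008]
In another embodiment, among the N states of the hidden Markov model, the transition probability a_ij from state j to state i is constrained to zero except when i = j or i = j + 1, a_NN is constrained to 1, and the initial state is fixed to q_1.
[0009]
In yet another embodiment, in the learning phase, the transition probabilities a_ij between states of the hidden Markov model, other than a_NN which is constrained to 1, are obtained from the number of quantized-data symbols in each group of data between the delimiters.
[0010]
In yet another embodiment, in the learning phase, the output probability from each state of the hidden Markov model is obtained from the number of quantized-data symbols in each group of data between the delimiters.
[0011]
In another embodiment, in the learning phase, smoothing is applied to the transition probabilities and output probabilities to avoid the excessive overfitting that arises from training on a limited amount of data.
[0012]
In yet another embodiment, in the recognition phase, to prevent misrecognition caused by partial matches between characters, the probability of a given time series under a hidden Markov model is calculated without complete marginalization over all states, by constraining the state at the final time to the last state of the hidden Markov model.
[0013]
In another embodiment, when several sets of training data exist for the same character in the learning phase: the probability, under the first hidden Markov model created from the first data set, of the data of the first data set is calculated under the above final-state constraint, and the logarithm of that probability is divided by the final time index of the data sequence to give a first quotient; the probability under the first hidden Markov model of the data of a second data set is calculated under the same constraint, and its logarithm is divided by the final time index of that data sequence to give a second quotient; and if the value obtained by dividing the first quotient by the second quotient is greater than a predetermined positive threshold, a second hidden Markov model is created from the data of the second data set.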
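[0014]
In the present invention, "character" means two-dimensional, and in some cases three-dimensional, information that can be handwritten and that produces a signal input to a computer: numerals, the Roman alphabet, hiragana, katakana, kanji, parts of kanji such as left-hand radicals (hen) and right-hand components (tsukuri), phonetic symbols, editing marks, symbols for instructing editing operations, characters of languages other than Japanese such as Hangul and Arabic, figures, codes, icons, and so on. Note that, without being bound by the conventional definition of "character", this specification calls every symbol that represents some meaning or sound a "character", not only those that represent words. A pen-input tablet is one possible input means, but the invention is not limited to it; for example, input based on body movement, obtained by analyzing camera images, is also possible.
[0015]
The method of the invention can be provided as an executable computer program stored on a CD-ROM, floppy disk, hard disk, memory chip, or other suitable storage medium, or incorporated into a device such as a pen-input electronic organizer, word processor, computer, or pen-input tablet.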
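[0016]
[Embodiments of the invention]
[Raw data]
Raw data from an input device such as a tablet normally contains two-dimensional position information x(t_i) = (x₁(t_i), x₂(t_i)) and information p(t_i) indicating whether the point is the end of a stroke (pen-up or pen-down information). Here t_i denotes a point in time: if t₀ is the time at which the first data point of the handwriting of a character, a radical, or another suitable part of a character is obtained, and sampling then proceeds at a fixed rate, the (i+1)-th point is sampled at time t_i. The interval Δt = t_{i+1} − t_i is normally constant, but need not be. If Δt is sufficiently small, it suffices to keep Δt constant and sample at every Δt whether the pen is up or down. If Δt is relatively large, however, samples can also be taken at times off the Δt grid, for example by capturing the moments at which the pen touches and leaves the input tablet at the start and end of a stroke.
[0017]
A set of such raw data is written as
[Expression 1]

where M is the number of raw data points captured, roughly 10 to 20 for a one-stroke kanji and 80 to 100 for a 15-stroke kanji. R² denotes the two-dimensional real space, and {0, 1} indicates that p(t_i), the information on whether a point is the end of a stroke, takes one of the two values 0 and 1. The choice of these values is arbitrary; 0 and 1 are used here merely as an example. The end point of a stroke is treated as "pen-up", with for example p(t_i) = 0, and every other point on a stroke as "pen-down", with p(t_i) = 1. For the experiments described later, character data supplied in databases and the like must first be converted into this format.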
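[0018]
[Data compression including feature extraction]
From raw data in the above format, features suited to the method of the invention are extracted and the data is compressed as far as possible. The procedure adopted here follows the flow described below, but the invention is not limited to this feature extraction and data compression method.
[0019]
Incidentally, as shown in FIG. 4, raw data often has unwanted short vectors (called "whiskers") attached before and after a handwritten stroke; because they hinder learning and recognition, they must be removed.
[0020]
First, set
[Expression 2]

and apply different processing depending on whether the point is a pen-up point or a pen-down point.
[0021]
[Flow for feature extraction and data compression]
When the pen is down
Suppose that from t₀ to t_{D-1} the pen was in the pen-down state, that is, at points within a stroke during which the pen touches the tablet. Then
p(t₀) = p(t₁) = p(t₂) = p(t₃) = ... = p(t_{D-1}) = 1
and, since t_D is the end point of the stroke, the pen is in the pen-up state by the above definition, so p(t_D) = 0. In this case, if the number D of data points in the stroke is less than 3 (D < 3), no compression is performed. If D is 3 or more, the following processing is performed.
[0022]
In this processing, θ* is the angle threshold for vector compression: if the angle of a vector examined below is smaller than this value, the vectors are merged. l* is the length threshold for whisker removal: a vector shorter than this value is regarded as a whisker. The values of θ* and l* can be chosen empirically; different values could be used in each of the expressions below, but a common value is simpler.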
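Step 1
Select three data points x(t₀), x(t₁), x(t₂), which are consecutive points of the captured handwriting. If a stroke that is continuously pen-down does not contain three or more consecutive data points, no compression is performed. From the coordinates of the three points shown in FIG. 1, the angles Δθ_i are obtained according to
[Expression 3]

In the case of FIG. 1, two angles, Δθ₁ and Δθ₂, are obtained. That is, as FIG. 1 shows, Δθ_i is the angle between x(t₂) − x(t₀) and x(t_i) − x(t_{i−1}), taking x(t₂) − x(t₀) as the reference.
[0023]
If, for i = 1 or 2,
[Expression 4]

holds, go to Step 2. Here θ* and l* are positive thresholds that can be chosen empirically. Otherwise, the angle difference and the segment length are judged to be relatively large, so this part is regarded as distinctive: x(t₀) and x(t₁) are made feature points, x(t₀) := x(t₁) is set (that is, the three points to be processed are advanced by one point), and the procedure returns to Step 1. The threshold θ* can be chosen in the range from a few degrees to about 90 degrees, as a trade-off between compression ratio and recognition rate. As a guide, about 45 degrees gives a sufficient recognition rate for kanji, whereas a smaller value, for example somewhat over 10 degrees, has been found preferable for hiragana and the like. The value of l* is not particularly limited, but assuming that one character is written in a frame of 240 × 240 elements, it can normally be chosen between 4 and about 15.
[0024]
Step 2
For the four adjacent data points x(t₀), ..., x(t₃), set
[Expression 5]

As in Expression 3, this is the angle between x(t₃) − x(t₀) and x(t_i) − x(t_{i−1}), with x(t₃) − x(t₀) as the reference (see FIG. 2). If, for all of i = 1, 2, 3,
[Expression 6]

holds, go to Step 3; otherwise define the compressed data as (x(t₀), x(t₂)) and discard x(t₁). Then set x(t₀) := x(t₂) and go to Step 1.
When the procedure continues to further steps, this operation is repeated with one more point each time. In general, Step k performs the following operation.
[0025]
Step k
For the k+2 data points x(t_{k+1}), ..., x(t₀), define
[Expression 7]

The angle is measured with x(t_{k+1}) − x(t₀) as the reference. If, for all of i = 1, ..., k+1,
[Expression 8]

holds, go to Step k+1; otherwise take (x(t₀), x(t_{k+1})) as the compressed data. That is, x(t₁), ..., x(t_k) are all discarded.
[0026]
For example, suppose that in Step 1 a short, angled x(t₂) − x(t₁), of the kind that forms a whisker, follows an x(t₁) − x(t₀) of relatively large magnitude. Then |Δθ₂| > θ*, |x(t₂) − x(t₁)| < l*, |Δθ₁| < θ*, and |x(t₁) − x(t₀)| > l*, so the procedure advances to Step 2. The same holds when |Δθ₂| is small. As a result, whatever the subsequent processing, at least the data point x(t₁) is discarded. If, on the other hand, x(t₂) − x(t₁) is relatively long, x(t₁) − x(t₀) is also long, and the relative angle is large, then |Δθ₂| > θ*, |x(t₂) − x(t₁)| > l*, |Δθ₁| > θ*, and |x(t₁) − x(t₀)| > l*, so no compression takes place and Step 1 moves on to the next set of points.
[0027]
These operations are repeated until the number D of coordinate points in the input stroke is reached, which completes the data compression and feature extraction.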
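The window-growing procedure of Steps 1 to k can be sketched roughly as follows. This is only an illustration under stated assumptions: Expressions 4, 6, and 8 are not reproduced in the text, so the merge test used here (every segment in the window is either shorter than l* or within θ* of the chord) is a guess consistent with the worked examples in [0026]. When a window fails, the sketch keeps the last point of the previous window as a feature point, following the Step 2 description. The function names are hypothetical.

```python
import math

def _signed_angle(u, v):
    """Signed angle from vector u to vector v, in (-pi, pi]."""
    return math.atan2(u[0] * v[1] - u[1] * v[0], u[0] * v[0] + u[1] * v[1])

def _window_mergeable(pts, start, end, theta_star, l_star):
    """Assumed test: each segment in pts[start..end] is short or nearly parallel to the chord."""
    chord = (pts[end][0] - pts[start][0], pts[end][1] - pts[start][1])
    for i in range(start + 1, end + 1):
        seg = (pts[i][0] - pts[i - 1][0], pts[i][1] - pts[i - 1][1])
        if math.hypot(*seg) < l_star:
            continue  # whisker-length segment: does not block merging
        if abs(_signed_angle(chord, seg)) >= theta_star:
            return False
    return True

def compress_stroke(pts, theta_star=math.radians(45), l_star=8.0):
    """Return the feature points of one pen-down stroke given as a list of (x, y)."""
    if len(pts) < 3:
        return list(pts)  # D < 3: no compression
    features, start, end = [pts[0]], 0, 2
    while end < len(pts):
        if _window_mergeable(pts, start, end, theta_star, l_star):
            end += 1                      # go to the next step with one more point
        else:
            start = end - 1               # last point of the previous window
            features.append(pts[start])   # becomes a feature point
            end = start + 2               # restart from Step 1
    features.append(pts[-1])
    return features
```

With the default thresholds (θ* = 45 degrees and l* = 8, the values reported for the experiments), a straight run of points collapses to its two end points and an L-shaped stroke keeps its corner.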
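[0028]
When the pen is up
When the pen is up, the information p(t) from the input tablet or the like is such that, for example, the pen is up (p(t) = 0) during t_{i−1} ≤ t ≤ t_i and down (p(t) = 1) for t < t_{i−1} and t_i < t. In this case no pen position is obtained during t_{i−1} < t < t_i; only x(t_{i−1}) and x(t_i) are obtained. Data compression is therefore not performed in this case.
[0029]
The points that remain after the above processing are called "feature points". As noted, for kanji the data can be compressed without sacrificing recognition rate if θ* is about 45°. The resulting feature points sometimes give a neater character than the raw data. An example for the kanji 「木」 (tree) is shown in FIG. 4(a) and FIG. 5: FIG. 4(a) is the handwritten data entered on the tablet, and FIG. 5 is the data after compression.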
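[0030]
[Quantization]
The data compressed in the preprocessing is rewritten as
[Expression 9]

and its pen-up/pen-down, angle, and length information is quantized as follows. The time t_i is replaced by the time index t (1, 2, ..., M). First, according to the pen-up or pen-down state,
[Expression 10]

is defined. Next, the angle information obtained from x(t),
[Expression 11]

is quantized, or symbolized, according to which of L_M equally divided angle ranges (see FIG. 6) it falls into, giving
[Expression 12]

In FIG. 6 and in the numerical experiments described later L_M = 16, but L_M is not limited to any particular value. The length information of the data can be expressed by repeating the quantized pair (v_1k, v_2l). With l the length of the vector and l₀ a reference constant, the number of repetitions is given by (1/l₀) + l with the fractional part discarded. With a suitable l₀, the character 「木」 of FIG. 5, compressed by the above flow, is thereby converted into the following symbol string.
[Expression 13]

The index t here differs from the t above, but it is still a time-series index, so the same symbol is used; t is an integer from 1 to T. It has been found that the constant l₀ can be chosen empirically from a wide range, about 40 to 120, when a character input area of 240 × 240 elements is used.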
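A minimal sketch of the quantization of a single segment between two feature points is given below. Several details are assumptions, because Expressions 10 to 12 and FIG. 6 are not available: pen-down is coded as 1 and pen-up as 2 (suggested by the pairs quoted in [0070]), direction bin 1 is centred on the positive x axis with bins numbered counter-clockwise, and the repetition count is taken as floor(l / l₀) + 1, which is one plausible reading of the length rule. `quantize_segment` is a hypothetical name.

```python
import math

def quantize_segment(p, q, pen_down, l0=60.0, n_dirs=16):
    """Turn the vector p -> q into a repeated symbol pair (v1, v2)."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    angle = math.atan2(dy, dx) % (2 * math.pi)        # direction in [0, 2*pi)
    width = 2 * math.pi / n_dirs                      # width of one direction bin
    v2 = int((angle + width / 2) // width) % n_dirs + 1   # assumed bin numbering 1..n_dirs
    v1 = 1 if pen_down else 2                         # assumed coding of the pen state
    repeats = int(math.hypot(dx, dy) // l0) + 1       # assumed reading of the length rule
    return [(v1, v2)] * repeats
```

For instance, a pen-down segment of length 120 along the x axis with l₀ = 60 yields three copies of the pair (1, 1) under these assumptions, so longer segments occupy more time steps in the symbol string.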
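[0031]
[Hidden Markov model]
The hidden Markov model (HMM) used for modeling is a paradigm known as a doubly stochastic model. To avoid confusion, it is outlined here and the notation is defined (for hidden Markov models, see Rabiner, L.R., Proceedings of the IEEE, Vol. 77, No. 2, February 1989). A precise description is also needed later, when the peculiarities of online handwriting are taken into account in the recognition phase.
[0032]
A hidden Markov model is characterized by the following elements.
First, let N be the number of states in a model. The states are in general hidden, but, as explained below, they can be given a specific physical meaning. The individual states are written q₁, q₂, ..., q_N, and the state Q(t) at a given time is one of q₁, q₂, ..., q_N.
[0033]
The number of observable values (also called symbols) per state is written M. These observable values represent the physical output of the system being modeled. The individual values can be written v₁, v₂, ..., v_M, and their set is V = {v₁, v₂, ..., v_M}. In the classical coin-toss example from statistics, the observable outcomes "heads" and "tails" correspond to the values v₁ and v₂.
[0034]
Next consider the transition probabilities a_ij between states. If every state can be reached from every state, a_ij is positive for all i and j; as explained below, however, depending on the modeling required, setting the transition probability to zero for particular i and j often causes no practical problem.
[0035]
The output probability from a state q_i to an observable value v_h, in other words the probability distribution of the observable values in that state, is written b_ih.
[0036]
Finally consider the initial state distribution π, the probability distribution over the states q₁, q₂, ..., q_N at the starting time (π = Q(1) = {π₁, π₂, ..., π_N}).
[0037]
Given N, M, a_ij, b_ih, and π, a hidden Markov model can be used as a generator of an observable symbol sequence O = O₁O₂O₃...O_T (where each O_i (i = 1, 2, ..., T) is one of the values in V and T is the number of observations). Conversely, N, M, a_ij, b_ih, and π can be said to constitute the HMM.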
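[0038]
To apply such an HMM to a real problem, three basic problems generally have to be solved.
[0039]
[Problem 1]
Given an observed symbol sequence O = O₁O₂O₃...O_T and an HMM, how can the probability of obtaining that sequence be computed efficiently?
[0040]
[Problem 2]
Given an observed symbol sequence O = O₁O₂O₃...O_T and an HMM, how should the corresponding state sequence q₁q₂q₃...q_T be chosen so that it is optimal in some meaningful sense; in other words, which state sequence best explains the observations?
[0041]
[Problem 3]
How should the model parameters a_ij, b_ih, and π be adjusted to maximize the probability of obtaining the above symbol sequence?
[0042]
Problem 1 is the evaluation problem: given a model and an observed result, how to compute the probability that the model generates that result. Seen another way, it scores how well a given model matches a given observation, so that the best-fitting model can be selected for that observation. In character recognition this is the recognition phase.
[0043]
Problem 2 is the attempt to uncover the hidden part of the model, that is, the "correct" state sequence. In general there is no "correct" state sequence; in practice, one can only solve the problem as well as possible under some optimality criterion, and that criterion depends on the structure of what is being modeled (here, handwritten character input).
[0044]
Problem 3 is the problem of optimizing the model parameters so that they describe well how the given observations arose. This is the "learning" problem, also called the learning phase. Solving it yields the model that best fits the real phenomenon, that is, the model parameters fitted optimally to the observed training data.
[0045]
To obtain a handwritten character recognition method, an HMM must be created for each character. First, the parameter values of each character model must be estimated, which is Problem 3. Second, with the physical meaning of the model states in mind, the handwriting entered for training must be segmented and associated with a state sequence. This amounts to solving Problem 2 and tuning the number of states, the preprocessing applied before the HMM is built, and other modeling details to improve the model. Finally, characters are recognized, that is, the most likely model is identified, by solving Problem 1 with the models created for the individual characters.
[0046]
The following shows how the HMM concepts outlined above are applied in the handwritten character recognition method of the invention.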
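[0047]
The pairs (v_1k, v_2l), k = 1, 2, l = 1, 2, ..., L_M, defined above can be regarded as vector quantities corresponding to the observable symbols of the general description. The hidden Markov model of a character, of which these are the observations,
[Expression 14]

is defined by the following quantities.
[0048]
The states Q(t)
[Expression 15]

and their transition probabilities
[Expression 16]

the initial probability distribution of the states
[Expression 17]

and the output probabilities
[Expression 18]

[0049]
What distinguishes a hidden Markov model (HMM) from a plain Markov model is the presence of these hidden (unobserved) states Q(t) and of the output probabilities. The hidden state Q(t) at time t is one of q₁, q₂, ..., q_N. These states can be chosen to suit the system represented by the HMM (here, a character, a part of a character, or a figure). They are described more concretely later; for now they are treated as generic states.
[0050]
When Q(t) is in a state q_i, b¹_ik is the probability that the state q_i outputs the value v_1k of the observable V₁(t). The transition probability a_ij, by contrast, is the probability of a transition from state j to state i. It is defined here on hidden states, but its meaning is the same as that of a transition probability in an ordinary Markov model.
[0051]
For a numerical sequence representing an actually entered character or part of a character (corresponding to the symbol sequence above),
[Expression 19]

suppose a stored template character
[Expression 20]

is given. The joint probability of
[Expression 21]

for this character is then given by
[Expression 22]

This expression can be said to define the hidden Markov model, and it is the starting point for the learning and recognition of characters in the invention.
[0052]
Learning is the determination, from the input data, of parameters such as
[Expression 23]

and of N. Recognition is the determination, by a suitable criterion and using HMMs built from the learned parameters, of which character the input, preprocessed data O(t) came from.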
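[0053]
A paradigm rarely works well on a concrete problem as it stands. The present case is no exception: good results depend on devising constraints specific to the problem. As an example, the following constraint is imposed on the hidden Markov model itself; the invention is not limited to it, and various other constraints are possible. The constraint used here is
[Expression 24]

which restricts the matrix of a_ij to the form
[Expression 25]

In addition,
π = (1, 0, ..., 0),
that is, the initial state Q(1) is always q₁. Constraining the transition probabilities in this way sets to zero the probability of a transition from any state whose index is smaller by two or more, or larger, and assumes that the probability of moving to the state with the next higher index is nonzero. This assumption preserves the causality and continuity in time of the preprocessed data O(t). A slightly different but similar constraint would set a_ij = 0 for i < j and let a_ij be positive otherwise. The condition a_ij = 0 for i < j and i > j + Δ, where Δ is a positive integer, is also possible. Why the initial state is constrained to q₁ will become clear from the "association" between states and observed sequences described below for learning.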
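The constraint can be illustrated with a small sketch that builds such a matrix and checks it. The convention a[i][j] = probability of moving from state j to state i follows the text (with 0-based indices in the code); the helper names and the idea of parameterizing the matrix by per-state "stay" probabilities are assumptions made for the example.

```python
def left_to_right_matrix(stay):
    """Build a[i][j] = P(j -> i) with only i == j and i == j + 1 allowed.

    stay[j] is the self-transition probability of state j; one extra final
    state with a_NN = 1 is appended.
    """
    n = len(stay) + 1
    a = [[0.0] * n for _ in range(n)]
    for j, s in enumerate(stay):
        a[j][j] = s            # remain in state j
        a[j + 1][j] = 1.0 - s  # advance to state j + 1
    a[n - 1][n - 1] = 1.0      # final state is absorbing
    return a

def satisfies_constraint(a, tol=1e-9):
    """Check the band structure, column normalization, and a_NN = 1."""
    n = len(a)
    for j in range(n):
        for i in range(n):
            if i not in (j, j + 1) and a[i][j] != 0.0:
                return False
        if abs(sum(a[i][j] for i in range(n)) - 1.0) > tol:
            return False
    return abs(a[n - 1][n - 1] - 1.0) <= tol
```

Because each column has at most two nonzero entries, the forward recursion for such a model needs only two products per state and time step.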
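[0054]
[Character recognition]
The recognition and learning methods, which take into account the peculiarities of online handwritten character recognition, are described below. Recognition and learning in a hidden Markov model are two sides of the same coin; recognition is described first.
[0055]
First, through the learning described later, at least one hidden Markov model (HMM)
[Expression 26]

is prepared for each character to be recognized.
[0056]
Then, for each HMM, the probability (likelihood) of the given observations O(t), after some preliminary processing (described below), is computed, and the HMM giving the highest probability is judged to correspond to the most probable character. This is basic Problem 1 above: given an observation sequence O = O(1)O(2)O(3)...O(T) and an HMM, compute efficiently the probability of obtaining that sequence.
[0057]
For efficient computation, first define α_i(t) as follows.
[Expression 27]

[0058]
This is the probability, given an HMM, that the partial observation sequence O(1), O(2), ..., O(t) up to time t is obtained and that the state at time t is Q(t) = q_i. α_i(t) can be computed as follows.
[0059]
(1) Initialization step
α_i(1) = π_i b_i(O(1)), i = 1, ..., N
(2) Induction step
[Expression 28]

(3) Termination step
[Expression 29]

[0060]
In the initialization step, the forward probability is initialized as the joint probability of state q_i and the initial observation O(1), where b_i(O(1)) is the output probability of O(1). The induction step considers how state q_j can be reached at time t+1 from the N possible states q_i (1 ≤ i ≤ N) at time t. Since α_i(t) is the joint probability that O(1), O(2), ..., O(t) are observed and the state at time t is q_i, the joint probability that these observations are obtained and state q_j is reached at time t+1 via state q_i at time t is α_i(t) a_ji. Summing this product over all N possible states q_i at time t gives the probability of q_j at time t+1 together with all the preceding partial observations. Once this is known, α_j(t+1) is obtained by taking the observation O(t+1) in state q_j into account, that is, by multiplying the sum by the output probability b_j(O(t+1)). The induction step is carried out, for a given time t, for every state index j (1 ≤ j ≤ N), and is repeated for t = 1, 2, ..., T−1. Finally, in the termination step, the probability of obtaining the observation sequence O = O(1), O(2), ..., O(T) under an HMM is obtained simply as the sum of the α_j(T).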
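The three steps can be written as a short forward recursion. The sketch below assumes a single discrete observation symbol per time step (the patent's observations are pairs, whose output probability would presumably be a product of two terms), uses 0-based indices, and keeps the convention a[j][i] = probability of moving from state i to state j. The function name is hypothetical.

```python
def forward(a, b, pi, obs):
    """Return [alpha_1(T), ..., alpha_N(T)] for the observation sequence obs.

    a[j][i]: transition probability from state i to state j
    b[i][o]: probability that state i outputs symbol o
    pi[i]:   initial probability of state i
    """
    n = len(pi)
    # (1) initialization: alpha_i(1) = pi_i * b_i(O(1))
    alpha = [pi[i] * b[i][obs[0]] for i in range(n)]
    # (2) induction: alpha_j(t+1) = (sum_i alpha_i(t) * a_ji) * b_j(O(t+1))
    for o in obs[1:]:
        alpha = [sum(alpha[i] * a[j][i] for i in range(n)) * b[j][o]
                 for j in range(n)]
    return alpha

# (3) termination of the general method: P(O | HMM) = sum(forward(a, b, pi, obs))
```

Returning the whole vector of final forward probabilities, rather than their sum, leaves the caller free to apply either the general termination step or a constraint on the final state.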
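[0061]
The above is the general method of computing the probability of an observation sequence given an HMM. The character corresponding to the HMM that gives the highest probability for an observation sequence, that is, for some entered handwriting, should be the recognition result. The result of this computation corresponds to Expression 22. In the method of the invention, however, the complete marginalization over the hidden states performed in the termination step above is not carried out; instead, the HMM that maximizes the probability given by the following expression is taken as the HMM of the most probable character. The termination step that forms the sum in Expression 29 is therefore not performed.
[0062]
[Expression 30]

Here arg max denotes finding the index of the HMM that attains the maximum.
[0063]
As stated, complete marginalization over all hidden states is not performed, and the constraint Q(T) = q_N is imposed. Had complete marginalization been performed, Q(T) would not remain in the above expression. In other words, the probability is computed with Q(t) at time T forcibly fixed to q_N, the final state of the HMM. The reason for this constraint is as follows.
[0064]
Consider, for example, the kanji 「口」 and 「品」. Suppose the writer enters 「口」 on a tablet or other input device, and the symbol sequence
[Expression 31]

corresponding to this input is obtained. Because 「口」 is a subset of 「品」, for the HMM of 「品」,
[Expression 32]

the values P(i) given by
[Expression 33]

often include at least one that is quite large; that is, P(i) can be quite large for one or more values of i. Marginalizing according to the following expression can therefore give a quite large value, which leads to misrecognition and must be avoided.
[Expression 34]

[0065]
For this reason, instead of setting Q(T) at the final time T to q_i and summing over the different values of i, Q(T) is constrained to q_N of the HMM for the kanji 「品」. This drastically reduces misrecognition of the kind described. For the same reason, misrecognition among 「一」, 「二」, and 「三」, or among 「木」, 「林」, and 「森」, can be prevented.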
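The effect of the final-state constraint can be seen in a toy example. The two models below are invented for illustration and are not taken from the patent: a one-state model stands in for 「口」 and a three-state model (stroke, pen-up, stroke) for a character that contains it. Symbol 0 is a pen-down symbol and symbol 1 a pen-up symbol; all numbers and names are hypothetical.

```python
def forward(a, b, pi, obs):
    """Final forward probabilities; a[j][i] is the probability of moving i -> j."""
    n = len(pi)
    alpha = [pi[i] * b[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * a[j][i] for i in range(n)) * b[j][o]
                 for j in range(n)]
    return alpha

def recognize(models, obs, constrain_final=True):
    """Pick the label whose model scores highest.

    With constrain_final=True the score is alpha_N(T), i.e. Q(T) = q_N is
    enforced; otherwise the score is the fully marginalized sum over states.
    """
    best_label, best_score = None, -1.0
    for label, (a, b, pi) in models.items():
        alpha = forward(a, b, pi, obs)
        score = alpha[-1] if constrain_final else sum(alpha)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy models (hypothetical numbers).
models = {
    "part": ([[1.0]], [[0.6, 0.4]], [1.0]),
    "whole": ([[0.5, 0.0, 0.0], [0.5, 0.5, 0.0], [0.0, 0.5, 1.0]],
              [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1]],
              [1.0, 0.0, 0.0]),
}
```

For the two-symbol input [0, 0], the "whole" model cannot reach its last state, so the constrained score rejects it, whereas the fully marginalized score would prefer it because its first state matches the input well.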
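[0066]
Learning in the HMM is described next. It should be noted that the well-known Baum-Welch method for HMM training computes, for given training data {O(t)}, the gradient of the marginal likelihood
[Expression 35]

with respect to several parameters and maximizes it by "hill climbing". Starting from some point in parameter space, there is no guarantee that this marginal likelihood is convex in a given parameter. The problem of local optima is therefore serious, and in addition a great deal of computation is needed before convergence. Even with, say, the 881 kyoiku (education) kanji and a few dozen training sets, the time required is enormous and the approach is impractical. The method according to an embodiment of the invention described below requires no iterative computation of the kind used in Baum-Welch.
[0067]
In this embodiment, suppose that C sets of training data are given for one kanji. Writing the c-th training data as {O_c(t)}, t = 1, ..., T_c, with c running from 1 to C, the data sets can be written as
[Expression 36]

The first of these data sets is processed as follows.
[0068]
[Step 1: Computing transition probabilities from the first data set]
For the first data set
[Expression 37]

a "delimiter" is inserted at each of the following times:
(i) when the value of V₁(t) changes, that is, when the pen-up or pen-down state changes; and
(ii) when the change in the angle represented by V₂(t) is equal to or greater than a given positive threshold θ₀.
The interval between one delimiter and the next is regarded as one state and is associated with a state q_i.
Alternatively, the threshold θ₀ can be ignored and a delimiter inserted only when the value of V₁(t) changes, that is, each time the pen-up or pen-down state changes.
[0069]
For example, for the O(t) obtained for the input character 「木」 above,
[Expression 38]

is obtained, where the vertical bars are the delimiters just added. An HMM trained on {O(t)} representing an input character with K strokes therefore has at least 2K − 1 states, and the number of states increases further each time the change in angle of the compressed handwriting exceeds the threshold θ₀. This threshold can be determined empirically within a wide range, from the angular width of the quantization up to 120 degrees. The "stroke count" here means both the official stroke count used in ordinary Japanese and kanji dictionaries and the number of strokes actually made in handwriting.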
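The delimiter rule can be sketched as a single pass over the quantized pairs. In this illustration the angle threshold θ₀ is expressed in direction bins rather than degrees, which is an assumption made to keep the example short, and `segment_states` is a hypothetical name.

```python
def segment_states(symbols, n_dirs=16, theta0_bins=None):
    """Split a list of (v1, v2) pairs into per-state groups.

    A new group starts when the pen state v1 changes and, if theta0_bins is
    given, also when the direction v2 turns by at least that many bins.
    """
    segments = [[symbols[0]]]
    for prev, cur in zip(symbols, symbols[1:]):
        new_state = cur[0] != prev[0]                 # rule (i): pen up/down change
        if not new_state and theta0_bins is not None:
            d = abs(cur[1] - prev[1]) % n_dirs
            d = min(d, n_dirs - d)                    # circular difference in bins
            new_state = d >= theta0_bins              # rule (ii): large turn
        if new_state:
            segments.append([cur])
        else:
            segments[-1].append(cur)
    return segments
```

With pen-state delimiters only, a character written in K separate strokes gives K pen-down groups and K − 1 pen-up groups, which matches the lower bound of 2K − 1 states stated above.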
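[0070]
Delimiters are inserted, and states added, not only when the pen goes up or down but also when the angle of the handwriting changes greatly, so that cursive writing can be recognized: a part written in a single stroke but with a sharp bend is split for learning and recognition. For example, in the handwritten 「木」 of FIG. 4(b), the upper right part is written without lifting the pen. The horizontal and vertical bars, which corresponded to the two states q₁ and q₂ in Expression 38, would then correspond to the same state q₁ (that is, the fourth and fifth elements of Expression 38, (2, 11) and (2, 11), become (1, 11) and (1, 11), indicating pen-down), and the model would not reflect the structure of the character at all. The sequence must therefore be delimited as in Expression 39.
[Expression 39]

The parameters are then adjusted (by the smoothing procedure described below) so that q₂, which is physically pen-down but corresponds to a pen-up state in terms of the form of the character (and likewise q₄ and q₆, which also correspond to pen-up states), can output pen-down with some probability. The state transition Q that maximizes the probability P(O₂|H) with which the HMM registered from Expression 38 outputs the observation sequence of Expression 39, corresponding to the character of FIG. 4(b), is found in the manner detailed later.
[0071]
Next, the state transition probabilities a_ij and the initial state distribution π defined in the general description of the HMM are determined.
In the delimited state sequence obtained in the step above, each state q_i contains several numerical pairs. In the example above, q₁ contains three pairs and q₃ five. Writing the number of such pairs as n(O₁, q_i), the transition probabilities a_ij and the initial distribution π are defined as follows.
[Expression 40]

[0072]
The a_ij are defined in this way because the number of pairs in the training data O₁(t) that correspond to q_i is n(O₁, q_i), and, under the constraint of Expression 25, the transition from q_j to q_i (i = j or i = j + 1) is assumed to occur simply in proportion to the number of pairs, independently of the nature of the particular state.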
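Since Expression 40 is not reproduced, the following sketch shows only one plausible count-based definition, not necessarily the patent's formula. It assumes that a state holding n pairs makes n − 1 self-transitions and one exit, so that the self-transition probability is (n − 1)/n and the advance probability is 1/n, with a_NN = 1 and π = (1, 0, ..., 0) as stated in the text. The function name is hypothetical.

```python
def transitions_from_counts(counts):
    """counts[j] = n(O_1, q_j), the number of symbol pairs assigned to state j.

    Returns (a, pi) with a[i][j] = P(j -> i). Assumed formula:
    a_jj = (n_j - 1) / n_j and a_{j+1,j} = 1 / n_j for j < N, a_NN = 1.
    """
    n = len(counts)
    a = [[0.0] * n for _ in range(n)]
    for j, c in enumerate(counts[:-1]):
        a[j][j] = (c - 1) / c   # stay in state j
        a[j + 1][j] = 1 / c     # move on to state j + 1
    a[n - 1][n - 1] = 1.0       # final state constrained to 1
    pi = [1.0] + [0.0] * (n - 1)
    return a, pi
```

Under this reading a state observed for only one time step always advances, and longer states advance more slowly, which is how segment length enters the model.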
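[0073]
That the initial state distribution π is defined as above also follows from the definition of the states, under which the model always starts from state q₁.
[0074]
It should also be pointed out that, although V₂(t), the second component of O₁(t), carries only angle information and V₁(t), the first component, only pen on/off information, this hidden Markov model also contains length information. {O₁(t)} is derived using a reference length, so length information is carried by the number of repetitions n(O₁, q_i) of the numerical pairs in each state q_i.
[0075]
Next, the elements of the output probability sets {b¹_ik} and {b²_ik} must be defined. Among the n(O₁, q_i) numerical pairs corresponding to state q_i, let
n(O₁, q_i, v_1k) be the number with V₁(t) = v_1k, and
n(O₁, q_i, v_2k) be the number with V₂(t) = v_2k,
and set
[Expression 41]

That is, the pairs taking a particular value v_1k or v_2k are counted against the total number of pairs in the state, and the resulting proportion is used as the output probability.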
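The counting rule described here amounts to relative frequencies per state. A minimal sketch follows, assuming the symbols are integer codes 1..2 for the pen state and 1..16 for the direction; the function name and argument layout are illustrative.

```python
def output_probs(segments, n_pen=2, n_dirs=16):
    """Relative-frequency output probabilities for each state.

    segments[i] is the list of (v1, v2) pairs assigned to state i.
    Returns (b1, b2) with b1[i][k-1] = n(O, q_i, v_1k) / n(O, q_i) and
    b2[i][l-1] = n(O, q_i, v_2l) / n(O, q_i).
    """
    b1, b2 = [], []
    for seg in segments:
        n = len(seg)
        b1.append([sum(1 for v1, _ in seg if v1 == k) / n
                   for k in range(1, n_pen + 1)])
        b2.append([sum(1 for _, v2 in seg if v2 == l) / n
                   for l in range(1, n_dirs + 1)])
    return b1, b2
```

Estimated from a single sample, most entries are exactly zero, which is the overfitting that the smoothing step is meant to address.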
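[0076]
Defining a_ij, π, and b_ij in this way is natural given the constraint of Expression 25, but there is no compelling reason why they must be defined so; other definitions could be adopted.
[0077]
[Step 2: Smoothing]
Because the {a_ij}, {b¹_ik}, and {b²_ik} obtained above are determined from the first data {O₁(t)} alone, extreme overfitting usually occurs: for a given character, the model recognizes one particular handwriting but fails on others. With handwriting samples numbering in the thousands per character and a suitable algorithm, the overfitting problem might resolve itself, but given the time needed for training, a few dozen samples at most is realistic. A regularization-like approach to overfitting is therefore used here: a suitable process called smoothing is applied to the {a_ij}, {b¹_ik}, and {b²_ik} obtained above. The purpose of smoothing is to prevent a_ij, b¹_ik, and b²_ik from becoming zero, within the constraint of Expression 25. There are several ways to do this; the simplest, representative one is adopted below. Needless to say, the scope of the invention is not limited to this example. Smoothing of this kind is easy to carry out and, as will be seen, extremely effective.
[0078]
Smoothing procedure A
{a_ij} and {b¹_ik} are modified by the following expressions, where OLD marks the values obtained above and NEW the values newly defined by smoothing.
[Expression 42]

[0079]
Smoothing procedure B
{b²_ik} is modified by the following expression. As in procedure A, OLD marks the values obtained above and NEW the values newly defined by smoothing.
[Expression 43]

Here w_ln is chosen to satisfy
[Expression 44]

[0080]
Procedure A is so-called flooring, which avoids zero-valued elements within the constraint of Expression 25. In the experiments described later, good results were obtained with c_a and c_b between 0.7 and 0.9.
Procedure B, in addition to flooring, makes vectors (the numerical pairs above) whose direction is close to a high-probability direction also be output with some probability. There are several possible choices of w_ln. The numerical experiments below used
[Expression 45]

where g(l, n) is the angle between v_2l and v_2n (see FIG. 6), and f(θ) is a Gaussian distribution on the interval (−π, π) to which (1 − α)/2π is added as flooring, divided by the normalization constant Z(α, σ). As noted above, L_M is the number of levels into which the angle information is quantized.
[0081]
α and σ can be chosen appropriately by experience; effective results are obtained for about 0.7 ≤ α ≤ 0.9 and π/16 ≤ σ ≤ π/6. FIG. 7 shows the general shape of f(θ).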
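The two smoothing procedures might look roughly like this. Expressions 42 to 45 are not available, so both formulas are assumptions: procedure A is rendered as a mix with a uniform distribution (new = c·old + (1 − c)/K), and procedure B spreads each direction's probability over neighbouring directions with weights proportional to α times a Gaussian density plus the floor (1 − α)/2π, normalized so that the weights for each source direction sum to 1. The function names are hypothetical.

```python
import math

def floor_smooth(probs, c=0.8):
    """Assumed form of procedure A: mix a distribution with the uniform one."""
    k = len(probs)
    return [c * p + (1 - c) / k for p in probs]

def direction_smooth(b2_row, alpha=0.9, sigma=math.pi / 16):
    """Assumed form of procedure B for one state's direction probabilities."""
    n_dirs = len(b2_row)

    def f(theta):
        # Gaussian weighted by alpha plus a constant floor (unnormalized).
        gauss = math.exp(-theta ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
        return alpha * gauss + (1 - alpha) / (2 * math.pi)

    def g(l, n):
        # Angle between direction bins l and n, measured around the circle.
        d = abs(l - n) % n_dirs
        return min(d, n_dirs - d) * 2 * math.pi / n_dirs

    out = [0.0] * n_dirs
    for n, p in enumerate(b2_row):
        z = sum(f(g(l, n)) for l in range(n_dirs))   # normalizes the weights for source n
        for l in range(n_dirs):
            out[l] += p * f(g(l, n)) / z
    return out
```

After either procedure no entry is exactly zero, so a sample whose direction is one bin off from the training sample no longer receives probability zero.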
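[0082]
[Step 3: Learning from multiple data sets]
So far the creation of a hidden Markov model from the first data has been described, but to improve the recognition rate it is desirable to train on several data sets. The learning method based on the second and subsequent data sets {O_c(t)}, c = 2, ..., C, is described next, where C is a positive integer giving the number of data sets.
[0083]
The maximum-likelihood state transition {Q_c(t)} is found. For each training data set {O_c(t)}, t = 1, ..., T_c, with c = 2, ..., C, the maximum-likelihood state transition {Q_c(t)}, t = 1, ..., T_c, can be found in order from t = T_c down to t = 1 from
[Expression 46]

That is, once Q_c(T_c) is found from the first expression of Expression 46, it is substituted into the second expression to give Q_c(T_c − 1), Q_c(T_c − 2), ..., Q_c(1) in turn. Then, for each c = 2, ..., C (that is, for the second to C-th data), n(O_c, q_i), n(O_c, q_i, v_1k), and n(O_c, q_i, v_2l) can be obtained in the same way as for the first data.
[0084]
Expression 46 can be solved with the well-known Viterbi algorithm. For example, when the HMM
[Expression 47]

outputs the observation sequence O = {3, 3, 1, 2, 3}, the optimal state transition is obtained as follows: from
[Expression 48]

Q(5) = q₃; from
[Expression 49]

(with A = {a_ij}, B = {b_ij}), Q(4) = q₃; and so on, giving Q = {q₁, q₁, q₂, q₃, q₃}.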
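A compact Viterbi sketch with the final state fixed to q_N is shown below. The parameters of Expression 47 are not recoverable from the text, so the three-state model used here is hypothetical; it was chosen so that the decoded path for the observation sequence {3, 3, 1, 2, 3} has the same shape as the one quoted above. Symbols are shifted to 0-based indices in the code, and a[j][i] is the probability of moving from state i to state j.

```python
def viterbi_final_state(a, b, pi, obs):
    """Most likely state path ending in the last state, and its probability."""
    n = len(pi)
    delta = [pi[i] * b[i][obs[0]] for i in range(n)]
    back = []                                   # back[t][j] = best predecessor of j
    for o in obs[1:]:
        prev, ptr, delta = delta, [], []
        for j in range(n):
            i_best = max(range(n), key=lambda i: prev[i] * a[j][i])
            ptr.append(i_best)
            delta.append(prev[i_best] * a[j][i_best] * b[j][o])
        back.append(ptr)
    path = [n - 1]                              # constraint Q(T) = q_N
    for ptr in reversed(back):                  # trace back from t = T to t = 1
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[n - 1]

# Hypothetical model: columns of `a` are source states, rows are destinations.
a = [[0.5, 0.0, 0.0], [0.5, 0.5, 0.0], [0.0, 0.5, 1.0]]
b = [[0.1, 0.1, 0.8], [0.8, 0.1, 0.1], [0.1, 0.45, 0.45]]
pi = [1.0, 0.0, 0.0]
path, prob = viterbi_final_state(a, b, pi, [2, 2, 0, 1, 2])  # symbols 3,3,1,2,3 minus 1
```

The decoded path assigns every time step to a state, after which the per-state counts n(O_c, q_i) and the per-symbol counts can be collected exactly as for the first data set.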
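[0085]
The results from the first data set, obtained as above, and those from the second to C-th data sets are combined, and the average over the C data sets is obtained by
[Expression 50]

[0086]
[Step 4: Creating models for stroke-order variants and the like]
In the learning so far, the number of
[Expression 51]

equals the number of recognition categories. For example, to recognize the 881 kyoiku kanji, there are 881 models. The training data, however, includes samples that are unsuitable for training into a single model, such as the same character written in a different stroke order or heavily deformed.
[0087]
When there are, say, several dozen sets of training data for one character, how many HMMs to create, and how, is a major problem. Checking the stroke order and deformation in every data set by eye to decide whether a separate HMM is needed is close to impossible. A way of making this decision automatically is therefore needed.
[0088]
The method described below is an automated method based on a statistic of each data set and, as demonstrated later, it is effective.
[0089]
To explain the method, recall Expression 30 and consider
[Expression 52]

which is the logarithm of the recognition criterion (Expression 30). FIG. 8 plots Expression 52 for all the HMMs, against T₁ on the horizontal axis, for a typical data set of the 881 kyoiku kanji.
[0090]
Strictly, since there is one data set per character, this should be written as
[Expression 53]

or similar, but the simpler form is used to avoid cluttered notation. What is notable is that Expression 52 lies almost exactly on a straight line in T₁.
[0091]
In the procedure below, Expression 52 is normalized by dividing by T₁; something that may be called a relative similarity is computed between the HMM and the data set {O_c(t)}, t = 1, ..., T_c, for each c = 2, ..., C; and on that basis it is decided automatically whether to create a new HMM.
[0092]
That is, let the HMM obtained from the first data in Step 1 be
[Expression 54]

When
[Expression 55]

holds, a new HMM
[Expression 56]

is created by the procedure of Step 1. Here r_th is a value determined empirically. Note also that q_N in the denominator is based on the first data set, not on the c-th data set.
[0093]
That is, Expression 55 compares the slope with respect to T₁ of
[Expression 57]

in the numerator with the slope with respect to T_c of
[Expression 58]

in the denominator; if they differ by the threshold r_th or more, the similarity is judged to be low even though the character is the same, and a new HMM is created.
Because the method described above needs no iterative computation of the kind found in the Baum-Welch algorithm, it runs in a short time. It can therefore be called the fast HMM method.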
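The statistic compared against r_th can be sketched as a ratio of length-normalized log-likelihoods, both evaluated under the first model with the final state fixed to q_N. Expression 55 is not reproduced, so the sketch returns only the ratio and leaves the comparison with r_th to the caller; the single-symbol observations, the toy parameters in the usage note, and the function names are assumptions.

```python
import math

def final_alpha(a, b, pi, obs):
    """alpha_N(T): probability of obs with the last state fixed to q_N."""
    n = len(pi)
    alpha = [pi[i] * b[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * a[j][i] for i in range(n)) * b[j][o]
                 for j in range(n)]
    return alpha[-1]

def normalized_log_likelihood(model, obs):
    """log P(O, Q(T) = q_N | H) / T; requires a nonzero probability (smoothed model)."""
    a, b, pi = model
    return math.log(final_alpha(a, b, pi, obs)) / len(obs)

def similarity_ratio(model, obs_first, obs_other):
    """First quotient divided by second quotient, as described in [0013]."""
    return (normalized_log_likelihood(model, obs_first)
            / normalized_log_likelihood(model, obs_other))
```

Both quotients are negative, so the ratio is positive; it equals 1 when the other sample fits the model as well as the first sample does and falls toward 0 as the fit gets worse.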
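[0094]
[Speed-up]
In the character recognition method described above, the value of
[Expression 59]

is computed for every HMM. Specifically, with
[Expression 60]

one obtains
[Expression 61]

[0095]
Carried out as it stands, the values are obtained in the order
[Expression 62]

This involves much wasted work. From the constraints of the proposed method, α_i(t) = 0 when t < i, and α_i(t) does not affect α_N(T) when t > T − N + i.
[0096]
In these cases α_i(t) need not be computed. Also, since α_i(t) = 0 when t < i, α_N(T) = 0 when T < N.
[0097]
These considerations further reduce the amount of computation and make faster recognition possible. This speed-up is optional, but desirable for shortening recognition time.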
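The pruning described in [0095] and [0096] can be sketched as a banded forward pass: at time t (1-based) only states i with i ≤ t and t ≤ T − N + i are updated, and each update uses just the two allowed predecessors. The sketch assumes the left-to-right constraint with π = (1, 0, ..., 0), single-symbol observations, and a[j][i] as the probability of moving from state i to state j; the function name is hypothetical.

```python
def banded_final_alpha(a, b, obs):
    """alpha_N(T) for a left-to-right model, skipping entries that cannot matter."""
    n, T = len(a), len(obs)
    if T < n:
        return 0.0                       # too few symbols to reach q_N
    alpha = [0.0] * n
    alpha[0] = b[0][obs[0]]              # pi = (1, 0, ..., 0)
    for t in range(2, T + 1):            # 1-based time index
        lo = max(1, n - (T - t))         # states that can still reach q_N by time T
        hi = min(n, t)                   # states reachable from q_1 by time t
        new = [0.0] * n
        for i in range(lo, hi + 1):      # 1-based state index
            s = alpha[i - 1] * a[i - 1][i - 1]          # stayed in state i
            if i > 1:
                s += alpha[i - 2] * a[i - 1][i - 2]     # advanced from state i - 1
            new[i - 1] = s * b[i - 1][obs[t - 1]]
        alpha = new
    return alpha[n - 1]
```

Entries outside the band are left at zero rather than computed; they are either truly zero or never feed into α_N(T), so the returned value matches the full recursion.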
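[0098]
The flow of the recognition and learning steps described above is shown in FIG. 9 and FIG. 10, respectively: FIG. 9 gives the overall flow of the recognition phase and FIG. 10 that of the learning phase.
[0099]
[Examples]
[Recognition experiment]
A recognition experiment was carried out using an online handwritten character database (Tokyo University of Agriculture and Technology, kuchibue-d-96-02) (Masaki Nakagawa et al., "Collection and utilization of on-line handwritten character patterns written in sentence form without character-style restrictions", IEICE Technical Report, PRU 95-110, pp. 43-48 (1995)). FIG. 11 shows a small sample of the character data used. Only the 881 kyoiku kanji were targeted.
[0100]
The training data consisted of 31 data sets in total: mdb0006 to mdb0030 of kuchibue-d-96-02 plus six separately prepared sets of kyoiku kanji data. After training, five sets, mdb0001 to mdb0005 of kuchibue-d-96-02, were used as evaluation data to compute the recognition rate for which the first candidate is correct, and the recognition rate counted as correct if the correct result is among the top three candidates (top-3 rate).
[0101]
The results are shown in Table 1 as the example.
[Table 1]

[0102]
A very high recognition rate was obtained. For the 881 kyoiku kanji, the average recognition rate was 89.3% and the cumulative recognition rate up to the third candidate was 95.3%. The parameter values were θ* = 45 degrees and l* = 8 in a 240 × 240 space. The smoothing parameters were c_a = c_b = 0.8, α = 0.9, and σ = π/16. l₀ for quantization was 60, and the angle was quantized with L_M = 16. θ₀ for delimiter insertion was not used; that is, only delimiters corresponding to pen up/down were inserted, and none for angle changes. The value of r_th was 0.8.
[0103]
With these parameter values, the previously reported method disclosed in Japanese Laid-Open Patent Publication No. 8-101889 was trained on the same character data sets as the above example and made to recognize the same character data sets. As shown in Table 1 as the comparative example, the average recognition rate was 86.99% and the cumulative recognition rate up to the third candidate was 92.59%. The method of the invention thus improves the recognition rate.
[0104]
The programs used in the example and the comparative example were written in C and C++. Running the program of the example on a 120 MHz Pentium DOS/V machine under MS-Windows 3.1, recognition of the 881 kyoiku kanji (mdb0001) completed in 8 minutes 54 seconds. The same program is known to run about three times faster under Windows 95, so recognition is expected to complete in about 3 minutes under Windows 95. These figures include the speed-up described above. By contrast, the same recognition with the comparative example took about 15 minutes on average under Windows 95. The method of the invention is thus considerably faster than the prior-art method, and can therefore run on cheaper systems with lower power consumption.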
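[0105]
[Effects of the invention]
As described above, the method of the invention improves the recognition rate for cursive characters and characters written in a different stroke order, which have been difficult to recognize, and at the same time avoids the enormous computation and the instability due to local maxima found in methods that obtain the maximum-likelihood state by repeated gradient computation and hill climbing. Because recognition is faster, characters can be recognized at high speed even on simpler systems.
[Brief description of the drawings]
[FIG. 1] Diagram showing three data points and the definition of Δθ_i (i = 1, 2).
[FIG. 2] Diagram showing four data points and the definition of Δθ_i (i = 1, 2, 3).
[FIG. 3] Similar to FIG. 2, but showing the case in which Δθ₃ is larger than the threshold θ*.
[FIG. 4] Example of raw handwritten data for the character 「木」.
[FIG. 5] The raw data of FIG. 4 after preprocessing.
[FIG. 6] Example of the direction quantization pattern.
[FIG. 7] General shape of f(θ) in Expression 45.
[FIG. 8] Distribution of log P as a function of T.
[FIG. 9] Flowchart of the recognition phase according to an embodiment of the invention. HMM stands for "hidden Markov model".
[FIG. 10] Flowchart of the learning phase according to an embodiment of the invention. HMM stands for "hidden Markov model".
[FIG. 11] Examples of the handwritten character data (kuchibue-d-96-02) used in the recognition experiment.
[0001]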
BACKGROUND OF THE INVENTION
The present invention relates to an online handwritten character recognition method, and in particular to a handwritten character recognition method that accurately recognizes not only block-style (kaisho) characters but also cursive (connected-stroke) and simplified (broken-form) characters.
[0002]
[Prior art]
Many methods have already been reported for online handwritten character recognition, which plays an important role in pen-input electronic organizers, word processors, computers, and the like. For example, Japanese Laid-Open Patent Publication No. 8-101889 discloses the Reparametrized Angle Variations method, which is robust to cursive and simplified characters. Other references on this method include Mitsuru Kobayashi et al., "On-line handwritten character recognition using Reparametrized Angle Variations", IEICE Technical Report, PRU94-121, pp. 23-30 (1995), and Osamu Miyamoto et al., "On a hardware board that executes the on-line character recognition algorithm Reparametrized Angle Variations at high speed", IEICE Technical Report, PRU94-136, pp. 49-56 (1995). These methods include a distinctive data compression procedure and achieved considerably higher accuracy than earlier techniques on cursive and simplified characters, but they matched the compressed data against dictionary data by dynamic programming (DP).
[0003]
A survey is given in Tappert, C.C., et al., IEEE Trans. Patt. Anal. Machine Intell., vol. 12, No. 8, August, pp. 787-808 (1990). More recent papers include Akira Tsuruta et al., "Online handwritten character recognition system", Sharp Technical Journal 57, pp. 5-8 (1993); Katsuhiko Akiyama et al., "Online handwritten character recognition tolerant of connected strokes", Meeting on Image Recognition and Understanding (1994); and Zhao Peng et al., "Creation of a general-purpose dictionary for online recognition of hastily written characters", Information Processing Vol. 34, No. 3, pp. 418-425.
As portable electronic devices become more widespread, there is a demand for a method that recognizes sloppy, hard-to-recognize handwriting with higher accuracy.
[0004]
[Problems to be solved by the invention]
An object of the present invention is to provide a highly accurate handwritten character recognition method that recognizes and learns characters from a viewpoint entirely different from the prior art described above. The method can be implemented as a program executable on the current standard hardware of portable electronic devices such as electronic organizers.
[0005]
[Means for Solving the Problems]
The present invention proposes applying a paradigm called the fast hidden Markov model to online handwritten character recognition.
[0006]
The present invention provides an online handwritten character recognition method that uses hidden Markov models and is based on time-series data representing the movement of the handwriting coordinates and the up or down state of the input pen relative to the surface of the input device. The method comprises: a feature point extraction and data compression step of extracting feature points from the given time-series data, based on angle information and distance information between the points contained in that data, and thereby compressing the time-series data; a quantization step of quantizing the compressed time-series data according to the angle formed by adjacent lines connecting the feature points, the length of those lines, and the up or down state of the pen; an association step of inserting delimiters into the quantized time-series data based on changes between the pen-up and pen-down states, or based on those changes together with a predetermined condition on the angle, and associating each group of data between delimiters with one state of a hidden Markov model; and a probability calculation step of calculating, for the quantized and delimited time-series data, the probability that the data is obtained under each of a plurality of hidden Markov models determined in advance for the characters to be recognized. The character corresponding to the hidden Markov model that maximizes this probability is taken as the most probable character.
[0007]
In one embodiment of the invention, the feature point extraction and data compression step of the above method comprises: a selection step of selecting three or more adjacent data points from among the data points of the given time-series data that are consecutively in the pen-down state; an angle and distance calculation step of obtaining the angle between the line segment connecting the first and last of the selected data points and each line segment connecting two adjacent points among the selected data points, together with the length of each line segment connecting two adjacent points; a determination step of judging whether a predetermined relationship holds between the angle and a predetermined threshold angle, or between the segment length and a predetermined threshold length; a feature point extraction step of, when the predetermined relationship is judged to hold, extracting predetermined ones of the selected data points as feature points and discarding the other data points; and a step of repeating these steps so that the selection, angle and distance calculation, determination, and feature point extraction steps are carried out over the data points that are consecutively in the pen-down state. No data compression is performed on data points indicating pen-up.
[0008]
As another embodiment of the present invention, the transition probability a from state j to state i among N states in the hidden Markov model_ijIf i is not i = j or i = j + 1, it is constrained to zero, and a_NNTo 1 and the initial state is q_iSecure to.
[0009]
Furthermore, as yet another embodiment of the present invention, the character recognition method described above includes a transition probability a between states in a hidden Markov model in the learning phase._ijIs bound to 1_NNOther than the above, it is obtained based on the number of symbol strings of the quantized data in each of the data sets sandwiched between the delimiters.
[0010]
As still another embodiment of the present invention, the character recognition method may be configured to perform a hidden operation based on the number of symbol strings of the quantized data in each of the data sets sandwiched between the delimiters in the learning phase. Obtain the output probability from each state in the Markov model.
[0011]
As another embodiment of the present invention, the character recognition method may perform smoothing on the transition probability and the output probability in order to avoid excessive overfit caused by using a limited number of data for learning in the learning phase. It is characterized by performing processing.
[0012]
In still another embodiment of the present invention, the character recognition method described above does not perform complete marginalization for all states in the recognition phase, so as to prevent misperception due to partial correspondence of characters. Is constrained to the last state of the hidden Markov model, and the probability of certain time series data for the hidden Markov model is calculated.
[0013]
As another embodiment of the present invention, in the character recognition method, when there are a plurality of sets of learning data for the same character in the learning phase, the first hidden Markov model created by the first data set is used. Calculating the probability for the data in the data set under the constraint to the last state, dividing the logarithm of the probability by the final value of the time of the data sequence to obtain the first division result; The probability for the data of the second data set of the first hidden Markov model is calculated under the constraint to the last state, and the logarithmic value of the probability is divided by the final value of the time of the data sequence. If the value obtained by dividing the first division result by the second division result is greater than a predetermined positive threshold value based on the data of the second data set, Create a second hidden Markov model To.
[0014]
In the present invention, the “character” means a numeral, a Roman alphabet, a hiragana, a katakana, a kanji, a part such as a bias or a kanji of a kanji, a phonetic symbol, an edit symbol, a symbol for instructing an edit operation, a Korean, an Arabic language, etc. Characters, graphics, codes, icons, etc. in languages other than Japanese can be handwritten and refer to two-dimensional and sometimes three-dimensional information that generates a signal input to a computer. It should be noted that, without being bound by the conventional definition of “letter”, in this document, not only words but also symbols representing some meaning or sound are all called “letters”. In the present invention, a usable input means may be a pen input tablet, but is not limited thereto. For example, an input method based on body movement by analysis of an input image by a camera or the like is also possible.
[0015]
The method of the present invention can be provided as an executable computer program stored in a CD-ROM, a floppy disk, a hard disk, a memory chip, or other suitable storage medium. It can be provided by being incorporated in a device such as an electronic notebook, a word processor, a computer, or a pen input tablet.
[0016]
DETAILED DESCRIPTION OF THE INVENTION
[Raw data]
Raw data input from an input device such as a tablet usually consists of two-dimensional position information x(t_i) = (x₁(t_i), x₂(t_i)) and information p(t_i) indicating whether the pen is up or down. Here t_i denotes a point in time. That is, if the time at which the first data point of the handwriting of a character or partial character is captured is t₀, and sampling is performed at some timing, the sampling time of the (i + 1)-th point is t_i. The interval Δt = t_{i+1} − t_i is usually constant, but is not required to be. Usually, if Δt is sufficiently small, it suffices to keep Δt constant and sample whether the pen is up or down every Δt. When Δt is relatively large, however, sampling can also be performed at times other than multiples of Δt, for example by capturing the moments at which the pen touches the input tablet at the start of a stroke and leaves it at the end.
[0017]
A collection of such raw data
[Expression 1]

Here, M is the number of captured raw data points: roughly 10 to 20 for a single kanji stroke and roughly 80 to 100 for a 15-stroke kanji. R² denotes the two-dimensional real space, and {0, 1} means that the information p(t_i) takes one of the binary values 0 and 1. The choice of the values 0 and 1 is arbitrary; they are used here only as an example. For example, the end point of a stroke can be defined as “pen up” with p(t_i) = 0, and the other points on the stroke as “pen down” with p(t_i) = 1. In the experiments described later, character data provided in a database or the like must first be converted into the above data format.
[0018]
[Data compression including feature extraction]
Features suitable for the method of the present invention are extracted from the raw data in the above format, and the data is compressed as much as possible. The processing flow described below is adopted here, but the present invention is not limited to this feature extraction and data compression method.
[0019]
Note that, as shown in FIG. 4, raw data often contains extra short vectors (called “beard”) before and after a handwritten stroke. These hinder learning and recognition and must therefore be removed.
[0020]
First,
[Expression 2]

Then, different processing is performed depending on whether the point is a pen-up point or a pen-down point.
[0021]
[Flow for feature extraction and data compression]
When pen down
Suppose that up to time t_{D−1} the pen has been in the pen-down state, that is, at a point within a stroke where it touches the tablet:
p(t₀) = p(t₁) = p(t₂) = p(t₃) = ... = p(t_{D−1}) = 0
and that t_D is the end point of the stroke, so that by the above definition the pen-up state holds and p(t_D) = 1. In this case, if the number D of data points in one stroke is less than 3 (D < 3), the data is not compressed. When D is 3 or more, the following processing is performed.
[0022]
In this process, θ* is an angle threshold for vector compression: if the angle of a vector examined below is smaller than this value, the vectors are merged. l* is a length threshold for removing the “beard”: a vector shorter than this value is regarded as a beard. The values of θ* and l* can be selected empirically. Different values could be used in each of the expressions below, but using common values is simpler.
Step 1
Three data points x(t₀), x(t₁), x(t₂) are selected. These are three consecutive points in the captured handwriting data. If a stroke that is continuously pen-down does not contain three or more consecutive data points, no data compression is performed. From the coordinates of the three points shown in FIG. 1, the angles Δθ_i are obtained according to
[Equation 3]

In the case of FIG. 1, Δθ₁ and Δθ₂ are obtained. That is, as shown in FIG. 1, Δθ_i is the angle between x(t₂) − x(t₀) and x(t_i) − x(t_{i−1}), measured with x(t₂) − x(t₀) as the reference.
[0023]
If i = 1 or 2,
[Expression 4]

If so, go to Step 2. Here θ* and l* are positive thresholds that can be chosen empirically. Otherwise, the angle difference and the segment length are both judged to be relatively large, so x(t₀) and x(t₁) are kept as feature points, x(t₀) := x(t₁) is set, that is, the three points to be processed are advanced by one point, and the process returns to Step 1. The threshold θ* can be selected in the range from a few degrees to 90 degrees, based on the balance between the data compression rate and the recognition rate. As a guideline, a sufficient recognition rate is known to be obtained even at about 45 degrees for kanji, while a smaller value, for example about 10 degrees, is desirable for hiragana and the like. The value of l* is not particularly limited, but it can be selected in the range of about 4 to 15, assuming that one character is written in a frame of 240 × 240 elements.
[0024]
Step 2
For the four adjacent data points x(t₀), ..., x(t₃), the angles Δθ_i are defined by
[Equation 5]

That is, Δθ_i is the angle between x(t₃) − x(t₀) and x(t_i) − x(t_{i−1}), measured with x(t₃) − x(t₀) as the reference (see FIG. 2). If, for all of i = 1, 2, 3,
[Formula 6]

holds, go to Step 3; otherwise, (x(t₀), x(t₂)) is taken as the compressed data and x(t₁) is discarded. Then set x(t₀) := x(t₂) and return to Step 1.
When proceeding to a subsequent step, the same operation is repeated with the number of points increased by one. In general, the following operation is performed in Step k.
[0025]
Step k
For the k + 2 data points x(t₀), ..., x(t_{k+1}), define
[Expression 7]

The angle is measured with x(t_{k+1}) − x(t₀) as the reference. If, for all of i = 1, ..., k + 1,
[Equation 8]

holds, go to Step k + 1; otherwise, (x(t₀), x(t_{k+1})) is taken as the compressed data. That is, x(t₁), ..., x(t_k) are all discarded.
[0026]
For example, suppose that in Step 1 a short, sharply angled vector x(t₂) − x(t₁) follows x(t₁) − x(t₀). Then |Δθ₂| > θ*, |x(t₂) − x(t₁)| < l*, |Δθ₁| < θ*, and |x(t₁) − x(t₀)| > l*, so the process proceeds to Step 2. The same applies when |Δθ₂| is small. As a result, whatever the subsequent processing turns out to be, at least the data point x(t₁) is discarded. On the other hand, if both x(t₂) − x(t₁) and x(t₁) − x(t₀) are relatively long and their relative angle is large, then |Δθ₂| > θ*, |x(t₂) − x(t₁)| > l*, |Δθ₁| > θ*, and |x(t₁) − x(t₀)| > l*, so Step 1 is advanced to the next set of data points without data compression.
[0027]
Such an operation is repeated until the number of coordinate points D of the input stroke is reached, and the data compression and feature extraction processes are completed.
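The flow of Steps 1 to k can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented implementation: the test applied to each segment ("angle to the chord below θ*, or length below l*") is inferred from the worked examples in paragraph [0026], since Expressions 4, 6 and 8 are not reproduced in this text; when a step fails, the sketch keeps the last point that still passed, as in Step 2; and the names `compress_stroke`, `_mergeable` and `_angle_between` are hypothetical.

```python
import math

def _angle_between(u, v):
    """Unsigned angle in radians between 2-D vectors u and v (0 if either is zero)."""
    nu, nv = math.hypot(*u), math.hypot(*v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    cos = (u[0] * v[0] + u[1] * v[1]) / (nu * nv)
    return math.acos(max(-1.0, min(1.0, cos)))

def _mergeable(points, start, end, theta_star, l_star):
    """Assumed test: every segment between points[start] and points[end] is either
    nearly parallel to the chord points[end] - points[start] or shorter than l_star."""
    chord = (points[end][0] - points[start][0], points[end][1] - points[start][1])
    for i in range(start + 1, end + 1):
        seg = (points[i][0] - points[i - 1][0], points[i][1] - points[i - 1][1])
        small_angle = _angle_between(chord, seg) < theta_star
        short = math.hypot(*seg) < l_star
        if not (small_angle or short):
            return False
    return True

def compress_stroke(points, theta_star, l_star):
    """Return the feature points of one pen-down stroke given as a list of (x, y)."""
    if len(points) < 3:          # D < 3: no compression
        return list(points)
    features = [points[0]]
    start, last = 0, len(points) - 1
    while start < last:
        end = start + 1
        # Steps 1, 2, ..., k: keep adding one more point while the test passes.
        while end < last and _mergeable(points, start, end + 1, theta_star, l_star):
            end += 1
        features.append(points[end])   # intermediate points are discarded
        start = end                    # x(t0) := last kept point
    return features
```

For instance, with θ* = 45 degrees and l* = 5, a slightly wobbly horizontal stroke collapses to its two end points, while an L-shaped stroke keeps its corner.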
[0028]
When pen-up
When the pen is up, the information p(t) input from the tablet indicates, for example, that the pen is in the pen-up state (p(t) = 0) for t_{i−1} ≤ t ≤ t_i and in the pen-down state (p(t) = 1) for t < t_{i−1} and t_i < t. In this case, the position of the pen cannot be obtained during t_{i−1} < t < t_i, so the only position information available is x(t_{i−1}) and x(t_i). Therefore, data compression is not performed in this case.
[0029]
The points remaining after the above processing are called “feature points”. As mentioned above, in the case of kanji the data can be compressed without sacrificing the recognition rate if θ* is about 45°. The feature points obtained may even yield a neater character than the raw data. An example is shown in FIG. 4A and FIG. 5 for the kanji 木 (“tree”): FIG. 4A shows the handwritten data input to the tablet, and FIG. 5 shows the data after the data compression processing.
[0030]
[Quantization]
The data compressed in preprocessing is written anew as
[Equation 9]

Then the pen-up/pen-down, angle, and length information of this data is quantized as follows. The time t_i is replaced by the time index t (t = 1, 2, ..., M). First, according to whether the pen is up or down,
[Expression 10]

Next, the angle information obtained from x(t),
[Expression 11]

is quantized, or symbolized, according to which of L_M equal-sized angle ranges it falls in (see FIG. 6), giving
[Expression 12]

In FIG. 6 and in the numerical experiments described later, L_M = 16, but L_M is not limited to a specific value. The length information of the data is expressed by repeating the quantized pair (v_1k, v_2l). The number of repetitions is obtained by multiplying the vector length l by the reciprocal 1/l₀ of a constant l₀, adding 1, and rounding down the fractional part. Thus, for example, with an appropriately chosen l₀, the character 木 (“tree”) shown in the figure is converted into the following symbol string.
[Formula 13]

Here t differs from the t used above, but the same symbol is kept because it is again a time-series index; t is an integer from 1 to T. The constant l₀ can be selected empirically from a wide range; for example, values of about 40 to 120 are known to work when a character input area of 240 × 240 elements is used.
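The quantization can be illustrated with the following Python sketch. Several details are assumptions made for the sake of a runnable example rather than statements about Expressions 10 to 13, which are not reproduced here: pen-down is coded as v₁ = 1 and pen-up as v₂ = 2 in the first component, the angle ranges are numbered 1 to L_M starting from the positive x axis, and the repetition count is taken as floor(l / l₀) + 1. The function name `quantize` and its parameters are hypothetical.

```python
import math

def quantize(feature_points, pen_down, l0=60.0, n_sectors=16):
    """Turn feature points into a list of (v1, v2) symbol pairs.

    feature_points: list of (x, y) after data compression.
    pen_down[i]:    True if the vector from point i to point i + 1 was drawn
                    with the pen down, False if it is a pen-up move.
    l0:             reference length controlling how often a pair is repeated.
    n_sectors:      number of equal angle ranges (L_M in the text).
    """
    sector = 2.0 * math.pi / n_sectors
    symbols = []
    for i in range(len(feature_points) - 1):
        (x0, y0), (x1, y1) = feature_points[i], feature_points[i + 1]
        dx, dy = x1 - x0, y1 - y0
        angle = math.atan2(dy, dx) % (2.0 * math.pi)
        v2 = int(angle // sector) % n_sectors + 1      # angle code 1..n_sectors
        v1 = 1 if pen_down[i] else 2                   # assumed pen-state code
        repeats = int(math.hypot(dx, dy) / l0) + 1     # assumed length rule
        symbols.extend([(v1, v2)] * repeats)
    return symbols
```

Because the length of each vector survives only as a repetition count, a long stroke and a short stroke in the same direction differ solely in how many identical pairs they contribute.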
[0031]
[Hidden Markov Model]
The hidden Markov model (HMM) used for modeling is a paradigm known as a doubly stochastic model. To avoid confusion, its outline and notation are summarized here (for hidden Markov models, see Rabiner, L.R., Proceedings of the IEEE, Vol. 77, No. 2, February 1989). A precise description is also needed when considering the special characteristics of online handwritten characters in the recognition phase described later.
[0032]
Hidden Markov models are characterized by the following elements:
First, let N be the number of states in a model. The states are generally hidden, but can be given a specific physical meaning as described below. The individual states are denoted q₁, q₂, …, q_N. The state Q(t) at a given time is one of q₁, q₂, …, q_N.
[0033]
The number of observable values (also called symbols) per state is denoted M. These observable values represent the physical output of the system being modeled. The individual values are denoted v₁, v₂, ..., v_M (the set of values can be written V = {v₁, v₂, ..., v_M}). For example, in the classic statistical example of coin tossing, the observable outcomes “heads” and “tails” correspond to the values v₁ and v₂.
[0034]
Next, consider the transition probabilities a_ij between these states. If a transition is possible from any state to every state, a_ij takes positive values for all i and j. As explained below, however, depending on how the modeling is done, setting the transition probability to zero for particular i and j often causes no practical problem.
[0035]
Furthermore, the probability that the value v_h is observed from a state q_i, that is, the probability distribution of the observable values in that state, is denoted the output probability b_ih.
[0036]
Finally, consider the initial state distribution π. This gives the probability that the state at the starting time, Q(1), is each of q₁, q₂, …, q_N (π = {π₁, π₂, …, π_N}).
[0037]
Thus, given N, M, a_ij, b_ih, and π, a hidden Markov model (HMM) can be used as a generator of an observable symbol sequence O = O₁O₂O₃...O_T (where each O_i (i = 1, 2, ..., T) is one of the values in V, and T is the number of observations in the sequence). Conversely, N, M, a_ij, b_ih, and π specify an HMM.
[0038]
In order to apply such an HMM to an actual application example, it is generally necessary to solve the following three basic problems.
[0039]
[Problem 1]
Given an observed symbol sequence O = O₁O₂O₃...O_T and an HMM, how can the probability that the sequence is obtained be calculated efficiently?
[0040]
[Problem 2]
Given an observed symbol sequence O = O₁O₂O₃...O_T and an HMM, how should the corresponding state sequence q₁q₂q₃…q_T be chosen so that it is optimal in some meaningful sense, that is, so that it best explains the observations?
[0041]
[Problem 3]
How should the model parameters a_ij, b_ih, and π be adjusted so as to maximize the probability of obtaining the above symbol sequence?
[0042]
Problem 1 is an evaluation problem: given a model and an observed result, how is the probability that the model produces that observation calculated? Seen from another angle, it is a numerical evaluation of how well a given model matches a given observation. That is, when an observation is given, the model that best fits it can be selected. In character recognition, this is the part called the recognition phase.
[0043]
Problem 2 is trying to find the hidden part of the model, the “correct” state sequence. However, in general, there is no such thing as a “correct” state sequence, and in practice it is only possible to solve this problem as well as possible using certain optimization conditions. The optimization condition depends on the structure of the event to be modeled (in the present invention, handwritten character input information).
[0044]
Problem 3 is a problem of optimizing model parameters for successfully describing how a given observation result is obtained. This is a “learning” problem and is also referred to as a learning phase. By solving this problem, it is possible to create a model that best matches the actual event, that is, to optimally match the model parameters to the observed learning data.
[0045]
To obtain a handwritten character recognition method, an HMM must be created for each character. For this purpose, Problem 3 must first be solved by estimating the parameter values of each character model. In addition, the input handwriting used for learning must be segmented and associated with a state sequence, with an understanding of the physical meaning of the states used in the model. This amounts to solving Problem 2 and to adjusting the number of states, the preprocessing before HMM creation, and other modeling details so as to improve the model. Finally, character recognition, that is, evaluation of the most likely model, is performed by solving Problem 1 with the models created for the individual characters.
[0046]
We will see how the HMM concept outlined above is applied in the handwritten character recognition method of the present invention.
[0047]
The pairs (v_1k, v_2l), k = 1, 2, l = 1, 2, …, L_M, can be regarded as vector quantities corresponding to the observable symbols v₁, v₂, ... in the general description above. The hidden Markov model of a character that outputs these observations,
[Expression 14]

is defined by the following quantities:
[0048]
State Q (t)
[Expression 15]

And its transition probability
[Expression 16]

Initial probability distribution of states
[Expression 17]

Output probability
[Expression 18]

[0049]
What distinguishes a hidden Markov model (HMM) from an ordinary Markov model is the presence of these hidden (unobserved) states Q(t) and of the output probabilities. That is, the hidden state Q(t) at a given time t is one of the states q₁, q₂, …, q_N. These states can be chosen appropriately for the system to be represented by the HMM (here, a character, a part of a character, or a figure). As described in more detail later, these states are generally considered to represent states that actually exist.
[0050]
When Q(t) is the state q_i, the probability that the observable V₁(t) takes the value v_1k from this state q_i is b¹_ik. The transition probability a_ij, in contrast, is the probability of a transition from state j to state i. It is defined here for hidden states, but its meaning is the same as that of the transition probability in an ordinary Markov model.
[0051]
For a numerical sequence representing the actually input character or a part of it (corresponding to the symbol sequence above),
[Equation 19]

when an already stored template character
[Expression 20]

is given, the probability of that sequence under this character,
[Expression 21]

is given by the following equation.
[Expression 22]

This equation can be said to define a hidden Markov model, and is a starting point for character learning and recognition in the present invention.
[0052]
Learning means determining, from input data, the parameters
[Expression 23]

and N. Recognition means determining, by an appropriate criterion and on the basis of the HMMs with their learned parameters, from which character the preprocessed input data O(t) was generated.
[0053]
In general, a paradigm rarely works effectively on a specific problem as it stands. The present invention is no exception: good results are obtained by devising constraints specific to the given problem. First, as an example, the following constraint is imposed on the hidden Markov model itself. The present invention is not limited to this constraint, however, and various other constraints can be imposed. The constraint used here as an example is
[Expression 24]

This constrains a_ij to the following form.
[Expression 25]

Furthermore,
π = (1, 0, ..., 0)
is set. That is, the initial state Q(1) is always q₁. Constraining the transition probabilities in this way means that the probability of a transition to a state with a smaller index, or to a state whose index is larger by 2 or more, is zero, and that only remaining in the same state or moving to the state with the next higher index has nonzero probability. This assumption preserves the causality and temporal continuity of the preprocessed data O(t). As slightly different but similar constraints, one can also set a_ij = 0 for i < j and let a_ij take positive values otherwise, or set a_ij = 0 for i < j and for i > j + Δ (where Δ is a positive integer). That the initial state is q₁ will become clear from the “association” between states and the observed numerical sequence described below in connection with learning.
[0054]
[Character recognition]
A recognition and learning method that takes into account the special features of online handwritten character recognition is described below. Recognition and learning in the hidden Markov model are two sides of the same coin, but recognition is described first.
[0055]
First, for each character to be recognized, at least one hidden Markov model (HMM)
[Equation 26]

is prepared by the learning described later.
[0056]
Then, for each HMM, the probability (likelihood) that the given, preprocessed observed values O(t) are produced is calculated, and the character of the HMM giving the highest probability is taken to be the most likely character. This corresponds to basic Problem 1 of the hidden Markov model described above. That is, given an observed value sequence O = O(1)O(2)O(3)...O(T) and an HMM, the problem is to calculate efficiently the probability of obtaining that sequence.
[0057]
For efficient calculation, first, α_i(t) is defined as follows.
[Expression 27]
α_i(t) = P(O(1), O(2), ..., O(t), Q(t) = q_i | H)
[0058]
This is the probability, given an HMM H, that the partial observed value sequence O(1), O(2), ..., O(t) up to time t is obtained and that the state Q(t) at time t is q_i. α_i(t) can be computed as follows.
[0059]
(1) Start step
α_i(1) = π_i b_i(O(1)),  i = 1, …, N
(2) Induction step
[Expression 28]
α_j(t + 1) = [ Σ_{i=1}^{N} α_i(t) a_ji ] b_j(O(t + 1)),  t = 1, …, T − 1,  j = 1, …, N
(3) End step
[Expression 29]
P(O | H) = Σ_{i=1}^{N} α_i(T)
[0060]
First, in the start step, the forward probability is initialized as the joint probability of the state q_i and the initial observed value O(1). Here b_i(O(1)) is the probability that state q_i outputs O(1). Next, the induction step considers how the state q_j at time t + 1 can be reached from the N possible states q_i (1 ≤ i ≤ N) at time t. Since α_i(t) is the joint probability that O(1), O(2), ..., O(t) are observed and the state at time t is q_i, the joint probability that O(1), ..., O(t) are observed and the state q_j is reached at time t + 1 via the state q_i at time t is α_i(t) a_ji. Summing this product over all N possible states q_i at time t gives the probability of being in q_j at time t + 1 together with all the preceding partial observations. Once this is known, α_j(t + 1) is obtained by taking the observation O(t + 1) in state q_j into account, that is, by multiplying the sum by the output probability b_j(O(t + 1)). The induction step is carried out for all values of the state index j (1 ≤ j ≤ N) at a given time t, and is repeated for t = 1, 2, ..., T − 1. Finally, in the end step, the probability that the observed value sequence O = O(1), O(2), ..., O(T) is obtained under the HMM H is calculated simply as the sum of the α_i(T) over all i.
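The start, induction, and end steps can be written compactly in Python. This is a minimal sketch of the general forward calculation only. It indexes the transition table as `a[i][j]`, the probability of moving from state i to state j, which corresponds to a_ji in the notation of this description; the function names and the list-of-lists layout are illustrative choices.

```python
def forward(pi, a, b, obs):
    """Forward probabilities alpha[t][i] for an observation sequence.

    pi[i]   : probability that the first state is state i
    a[i][j] : probability of moving from state i to state j (a_ji in the text)
    b[i][o] : probability that state i outputs symbol o
    obs     : list of symbol indices O(1), ..., O(T)
    """
    n = len(pi)
    # Start step: alpha_i(1) = pi_i * b_i(O(1))
    alpha = [[pi[i] * b[i][obs[0]] for i in range(n)]]
    # Induction step: alpha_j(t+1) = (sum_i alpha_i(t) * a_ji) * b_j(O(t+1))
    for t in range(1, len(obs)):
        prev = alpha[-1]
        alpha.append([
            sum(prev[i] * a[i][j] for i in range(n)) * b[j][obs[t]]
            for j in range(n)
        ])
    return alpha

def sequence_probability(pi, a, b, obs):
    """End step: sum of alpha_i(T) over all states."""
    return sum(forward(pi, a, b, obs)[-1])
```

The cost is proportional to N²T, compared with the exponential cost of enumerating every state sequence.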
[0061]
The above is the general method for calculating the probability when an HMM and an observed value sequence are given. The character corresponding to the HMM that gives the highest probability for a given observed value sequence, that is, for given input handwriting data, is taken as the answer of character recognition. The result of this calculation corresponds to Equation 22. In the method of the present invention, however, the complete marginalization over all hidden states that is performed in the above end step is not carried out; instead, the HMM with the maximum probability given by the following expression is taken as the HMM of the most likely character. Accordingly, the end step of taking the sum in Equation 29 is not performed.
[0062]
[30]

Here, arg max means that the index of the HMM that takes the maximum value is calculated.
[0063]
Here, as mentioned above, complete marginalization over all hidden states is not performed, and the constraint Q(T) = q_N is imposed. If complete marginalization were performed, Q(T) would not remain in the above expression. In other words, the probability is calculated with Q(T), the state at the final time T, forcibly fixed to the final state q_N of the HMM. The reason for imposing this constraint is as follows.
[0064]
For example, consider the kanji 口 (“mouth”) and 品 (“goods”). Suppose the writer enters 口 with a pen on an input device such as a tablet, and the symbol sequence corresponding to this input,
[31]

is obtained. Since 口 is a part of 品, however, for the HMM of 品,
[Expression 32]

the quantities
[Expression 33]

often include at least one with a considerably large value. That is, for one or more values of i, P(i) can be quite large. Consequently, marginalization by the following equation may yield a considerably large value, which leads to misrecognition and must be avoided.
[Expression 34]

[0065]
Therefore, instead of taking the sum over the different values of i for Q(T) = q_i at the final time T, Q(T) is constrained to the final state q_N of the HMM. This drastically reduces the misrecognition described above. For the same reason, misrecognition among 一 (“one”), 二 (“two”), and 三 (“three”), or among 木 (“tree”), 林 (“grove”), and 森 (“forest”), can be prevented.
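The effect of the final-state constraint can be seen in a toy Python example. The two models below are invented purely for illustration (they are not the HMMs of 口 and 品): `SHORT` has a single state, and `LONG` is a three-state left-to-right model whose first state behaves like the short one. As before, `a[i][j]` is the probability of moving from state i to state j, and all names are hypothetical.

```python
def scores(pi, a, b, obs):
    """Return (marginalized, final_state) probabilities of obs under one HMM."""
    n = len(pi)
    alpha = [pi[i] * b[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * a[i][j] for i in range(n)) * b[j][o]
                 for j in range(n)]
    # Full marginalization sums over Q(T); the constrained score keeps only q_N.
    return sum(alpha), alpha[-1]

def recognize(models, obs):
    """Pick the model with the largest probability under the constraint Q(T) = q_N."""
    return max(models, key=lambda name: scores(*models[name], obs)[1])

# Toy models: symbol 0 = "stroke-like" output, symbol 1 = "pen-up-like" output.
SHORT = ([1.0], [[1.0]], [[0.99, 0.01]])
LONG = (
    [1.0, 0.0, 0.0],
    [[0.75, 0.25, 0.0], [0.0, 0.75, 0.25], [0.0, 0.0, 1.0]],
    [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1]],
)
MODELS = {"short": SHORT, "long": LONG}
```

For the input `[0, 0, 0, 0]`, which resembles only the first part of the long model, `scores(*LONG, [0, 0, 0, 0])` gives a marginalized value of about 0.30 but a final-state value of about 0.008, so the constraint removes most of the probability mass that a partial match would otherwise contribute.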
[0066]
Next, learning in the HMM is described. It should be noted that the well-known HMM learning method, the Baum-Welch method, takes the marginal likelihood of the given learning data {O(t)},
[Expression 35]

computes its gradient with respect to the parameters, and maximizes it by “hill climbing” starting from some point in the parameter space. There is no guarantee that this marginal likelihood is convex with respect to the parameters. The problem of local optima is therefore serious, and in addition a large amount of computation is required before convergence. For example, even for the roughly 881 educational kanji with a few tens of learning sets each, learning takes a great deal of time and is not practical. The method according to the embodiment of the present invention described below does not require iterative calculation of the kind used in the Baum-Welch method.
[0067]
In this embodiment of the present invention, suppose that C sets of learning data are given for a certain kanji. Writing the c-th learning data as {O_c(t)}_{t=1}^{T_c}, with c running from 1 to C, the whole data set can be expressed as
[Expression 36]

First, the following processing is applied to the first set.
[0068]
[Step 1: Calculation of transition probability based on first data set]
For the first data set
[Expression 37]

a “break” is inserted at each of the following times:
(i) each time the value of V₁(t) changes, that is, each time the pen-up or pen-down state changes, and
(ii) each time the change in the angle represented by V₂(t) exceeds a positive threshold θ₀ given in advance.
The data between one break and the next is regarded as one state and is associated with a state q_i.
Alternatively, a “break” may be inserted only at each time the value of V₁(t) changes, that is, each time the pen-up or pen-down state changes, without considering the threshold θ₀.
[0069]
For example, for the O(t) obtained above for the input character 木 (“tree”), the result is
[Formula 38]

Here the vertical bars represent the “breaks” just inserted. Accordingly, an HMM that learns an {O(t)} representing an input character with K strokes has at least 2K − 1 states, and the number of states increases each time the change in angle of the handwriting data compressed as described above exceeds the threshold θ₀. This threshold θ₀ can be chosen empirically from a wide range of 120 degrees or more in terms of the quantized angle. Here, “number of strokes” is used broadly to mean both the official stroke count adopted in ordinary Japanese dictionaries and kanji dictionaries and the number of strokes actually made when the character is written by hand.
[0070]
A “break” is inserted and a state added not only when the pen goes up or down but also when the angle of the handwriting changes greatly. The purpose is to recognize connected (cursive) writing: even when a character is written in a single stroke, a part with a large bend is split off as a separate object of learning and recognition. For example, when the character 木 (“tree”) is handwritten as shown in the figure, the horizontal bar and the vertical bar, which correspond to the two states q₁ and q₂, would otherwise be merged into the single state q₁ (that is, (4, 11) and (2, 11), the fourth and fifth elements of Equation 38, become (1, 11) and (1, 11), indicating pen down), and the model would not reflect the structure of the character at all. The data therefore needs to be divided as shown in Equation 39.
[39]

Such a segment is actually in the pen-down state but, in terms of the character shape, corresponds to the up state q₂. Because the parameters of q₂ (and likewise of q₄ and q₆, which also correspond to up states) are adjusted so that the down state can also be output with a certain probability (the smoothing procedure described below), the state transition Q that maximizes the probability P(O₂ | H) of the numerical sequence corresponding to the character shown in the figure, under the HMM H in which Equation 38 is registered, can be obtained in the manner described in detail later.
[0071]
Next, the state transition probabilities a_ij and the initial state distribution π defined in the general description of the HMM above are determined.
In the delimited sequence obtained in the above step, each state q_i contains some number of numerical pairs. In the above example, for instance, q₃ contains 5 numerical pairs. Let this number be n(O₁, q_i). The state transition probabilities a_ij and the initial state distribution π are then defined as follows.
[Formula 40]

[0072]
The reason a_ij is determined in this way is as follows. Since the number of numerical pairs in the learning data O₁(t) that correspond to q_i is n(O₁, q_i), the transitions from q_j to q_i permitted under the constraint of Expression 25 (i = j or i = j + 1) are taken to occur simply in proportion to the number of numerical pairs, independently of the nature of the particular state.
[0073]
That the initial state distribution π is determined as above is clear from the definition of the states, under which the sequence always starts from q₁.
[0074]
It should also be pointed out that, although the second component V₂(t) of O₁(t) carries only angle information and the first component V₁(t) only pen up/down information, this hidden Markov model also contains length information. That is, {O₁(t)} is derived on the basis of a reference length, and the length is contained in the number of repetitions n(O₁, q_i) of numerical pairs in each state q_i.
[0075]
Next, the value of each element of the sets of output probabilities {b¹_ik} and {b²_ik} must be defined. Among the n(O₁, q_i) numerical pairs corresponding to the state q_i, let
n(O₁, q_i, v_1k) be the number for which V₁(t) = v_1k, and
n(O₁, q_i, v_2k) be the number for which V₂(t) = v_2k. Then
[Expression 41]

is set. That is, the fraction of the total number of numerical pairs in which a particular value v_1k or v_2k occurs is taken as the output probability.
[0076]
a_ij, π, and the output probabilities are determined as described above, but there is no compelling reason why they must be determined in this way. Other definitions can be adopted.
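Step 1 can be sketched in Python as below. The segmentation and the output probabilities follow the counting described in the text. The transition probabilities, however, are an assumption: Expression 40 is not reproduced here, so the sketch uses one natural counting choice, (n − 1)/n for staying in a state that holds n pairs and 1/n for leaving it, with the last state absorbing. The threshold is expressed in units of angle ranges, and all names are hypothetical. `a[i][j]` again means a move from state i to state j.

```python
def segment(symbols, theta0_sectors=None, n_sectors=16):
    """Split a list of (v1, v2) pairs into states by inserting "breaks".

    A break is inserted when the pen code v1 changes and, if theta0_sectors
    is given, also when the angle code v2 turns by more than that many
    angle ranges (difference taken around the circle).
    """
    states = [[symbols[0]]]
    for prev, cur in zip(symbols, symbols[1:]):
        pen_changed = cur[0] != prev[0]
        turn = abs(cur[1] - prev[1]) % n_sectors
        turn = min(turn, n_sectors - turn)
        sharp = theta0_sectors is not None and turn > theta0_sectors
        if pen_changed or sharp:
            states.append([])
        states[-1].append(cur)
    return states

def estimate(states, n_pen=2, n_sectors=16):
    """Estimate (pi, a, b1, b2) from one segmented training sequence by counting."""
    n_states = len(states)
    pi = [1.0] + [0.0] * (n_states - 1)          # always start in q_1
    a = [[0.0] * n_states for _ in range(n_states)]
    b1 = [[0.0] * n_pen for _ in range(n_states)]
    b2 = [[0.0] * n_sectors for _ in range(n_states)]
    for i, pairs in enumerate(states):
        n = len(pairs)
        if i + 1 < n_states:
            a[i][i] = (n - 1) / n                # assumed form of Expression 40
            a[i][i + 1] = 1 / n
        else:
            a[i][i] = 1.0                        # assumption: last state absorbs
        for k in range(n_pen):
            b1[i][k] = sum(1 for v1, _ in pairs if v1 == k + 1) / n
        for l in range(n_sectors):
            b2[i][l] = sum(1 for _, v2 in pairs if v2 == l + 1) / n
    return pi, a, b1, b2
```

A two-stroke input segmented on pen state alone gives three states (down, up, down), in line with the 2K − 1 lower bound stated above.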
[0077]
[Step 2: Smoothing]
Because the {a_ij}, {b¹_ik}, and {b²_ik} obtained as above are determined only from the first data {O₁(t)}, they are extremely overfitted: only one specific handwriting is recognized, and other handwriting of the same character is not recognized well. This overfitting problem might resolve itself if thousands of handwritten examples per character and an appropriate algorithm were used, but in view of the time required for learning, the number of examples will probably be a few tens at most. The concept of regularization, which is used to eliminate overfitting, is therefore applied here. That is, the {a_ij}, {b¹_ik}, and {b²_ik} are subjected to appropriate smoothing. The purpose of smoothing is to prevent the values of a_ij, b¹_ik, and b²_ik from becoming zero within the range permitted by the constraint of Expression 25. There are several ways to do this; the simplest and most typical example is given below. Needless to say, the scope of the present invention is not limited to the following example. The smoothing described below is simple to perform, yet, as will be seen later, extremely effective.
[0078]
Smoothing procedure A
{a_ij} and {b¹_ik} are corrected by the following equations. Quantities marked OLD are those obtained above, and those marked NEW are newly defined by smoothing.
[Expression 42]

[0079]
Smoothing procedure B
{b²_ik} is corrected by the following equation. As in procedure A, quantities marked OLD are those obtained above, and those marked NEW are newly defined by smoothing.
[Expression 43]

Here w_ln is chosen so as to satisfy
[Expression 44]

[0080]
Procedure A above is so-called flooring, and is intended to avoid elements with a value of zero within the range permitted by the constraint of Expression 25. In the experiments described later, good results are obtained when c_a and c_b are 0.7 to 0.9.
In addition to flooring, procedure B makes vectors (the numerical pairs above) whose direction is close to a direction with a high output probability also be output with a certain probability. There are several possible ways to choose w_ln. In the numerical experiments shown below,
[Equation 45]

was used. Here g(l, n) is the angle between v_2l and v_2n (see FIG. 6), and f(θ) is a Gaussian distribution on the interval (−π, π) to which (1 − α)/2π is added as flooring, divided by a normalization constant Z(α, σ). l and n are indices, each running up to L_M, that identify the angle ranges used when the angle information was quantized as described above.
[0081]
α and σ can be chosen appropriately based on experience; effective results are obtained within the ranges 0.7 ≤ α ≤ 0.9 and π/16 ≤ σ ≤ π/6, respectively. FIG. 7 shows the general shape of f(θ).
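The two smoothing procedures might be sketched in Python as follows. Because Expressions 42 to 45 are not reproduced in this text, every formula in the sketch is an assumption chosen to match the verbal description: procedure A is modeled as mixing each probability with a uniform distribution over the permitted entries using a weight c, and procedure B as spreading each angle probability over neighboring directions with weights proportional to a Gaussian of the angular difference plus a floor of (1 − α)/2π, normalized so that the weights sum to 1. The function names are hypothetical.

```python
import math

def floor_probs(row, allowed, c=0.8):
    """Procedure-A-like flooring (assumed form): c * OLD + (1 - c) * uniform,
    where the uniform part is spread only over the permitted indices."""
    u = 1.0 / len(allowed)
    return [c * p + ((1.0 - c) * u if k in allowed else 0.0)
            for k, p in enumerate(row)]

def angular_weights(n_sectors=16, alpha=0.8, sigma=math.pi / 8):
    """Weights w[l][n] for procedure B (assumed form), normalized per row."""
    sector = 2.0 * math.pi / n_sectors

    def f(theta):
        gauss = math.exp(-theta ** 2 / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)
        return alpha * gauss + (1.0 - alpha) / (2.0 * math.pi)

    w = []
    for l in range(n_sectors):
        row = []
        for n in range(n_sectors):
            d = abs(l - n)
            d = min(d, n_sectors - d)        # angular distance around the circle
            row.append(f(d * sector))
        total = sum(row)                     # plays the role of the normalizer
        w.append([x / total for x in row])
    return w

def smooth_angles(b2_row, w):
    """NEW b2 for one state: weighted sum of the OLD values of nearby directions."""
    size = len(b2_row)
    return [sum(w[l][n] * b2_row[n] for n in range(size)) for l in range(size)]
```

With these choices a state that has only ever output one direction still assigns its largest probability to that direction, a smaller probability to its neighbors, and a small nonzero probability to every other direction.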
[0082]
[Step 3: Learning based on multiple data sets]
So far, the creation of a hidden Markov model based on the first data has been described. To improve the recognition rate, however, learning should use a plurality of data sets. Next, a learning method based on the second and subsequent data sets {O_c(t)}, c = 2, ..., C, is described. Here, C is a positive integer representing the number of data sets.
[0083]
First, the maximum likelihood state transition for {O_c(t)} is obtained. For each learning data set {O_c(t)}, t = 1, ..., T_c, with c = 2, ..., C, solving
[Equation 46]

in order from t = T_c down to t = 1 yields the maximum likelihood state transition {Q_c(t)}, t = 1, ..., T_c. That is, Q_c(T_c) is first obtained from the first equation of Formula 46 and substituted into the second equation, which gives Q_c(T_c−1), Q_c(T_c−2), ..., Q_c(1) in turn. Then, for each c = 2, ..., C (that is, for the 2nd to C-th data), n(O_c, q_i), n(O_c, q_i, v_1k), and n(O_c, q_i, v_2l) can be obtained in the same way as for the first data.
[0084]
Equation 46 can be solved using the well-known Viterbi algorithm. For example, when the HMM
[Equation 47]

outputs the observation sequence O = {3, 3, 1, 2, 3}, the optimal state transition is obtained as follows. First,
[Formula 48]

gives Q(5) = q₃. Next,
[Equation 49]

(A = {a_ij}, B = {b_ij}) gives Q(4) = q₃. Continuing in this way, Q = {q₁, q₁, q₂, q₃, q₃}.
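The backward trace described above can be sketched in Python as a standard Viterbi pass with the initial state fixed to the first state and the final state fixed to the last one. The HMM of Equation 47 is not reproduced in this text, so the matrices `A` and `B` below are hypothetical values chosen only for illustration, and a single output matrix is used instead of the two output distributions of the method. In the code, `A[i][j]` is the probability of moving from state i to state j, and states and symbols are numbered from 0.

```python
def viterbi_left_to_right(A, B, obs):
    """Most likely state path that starts in state 0 and ends in the last state."""
    n, T = len(A), len(obs)
    delta = [[0.0] * n for _ in range(T)]  # best path probability ending in each state
    back = [[0] * n for _ in range(T)]     # best predecessor of each state
    delta[0][0] = B[0][obs[0]]             # initial state is fixed to state 0
    for t in range(1, T):
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[t - 1][i] * A[i][j])
            back[t][j] = best_i
            delta[t][j] = delta[t - 1][best_i] * A[best_i][j] * B[j][obs[t]]
    # Trace back from the last state at the final time, as in Q(T), Q(T-1), ..., Q(1).
    path = [n - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, delta[T - 1][n - 1]


# Hypothetical 3-state left-to-right HMM over symbols 1, 2, 3 (stored as 0, 1, 2).
A = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]]
B = [[0.1, 0.1, 0.8],
     [0.8, 0.1, 0.1],
     [0.1, 0.45, 0.45]]
obs = [2, 2, 0, 1, 2]  # the sequence {3, 3, 1, 2, 3} with symbols shifted to start at 0
path, prob = viterbi_left_to_right(A, B, obs)
print([s + 1 for s in path])  # state numbers starting at 1
```

With these illustrative parameters the path is q₁, q₁, q₂, q₃, q₃, the same shape as the example in the text, although the parameters themselves are not those of Equation 47.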
[0085]
By combining the result based on the first data set with the results based on the second to C-th data sets obtained as described above, an average over the C data sets can be obtained by the following equation.
[Equation 50]

[0086]
[Step 4: Create models with different stroke order]
In the learning so far, the number of
[Formula 51]

matches the number of recognition categories. For example, when the 881 educational kanji are recognized, the number of models is 881. However, the learning data includes data that is not suitable for learning with the same model, such as data written in a different stroke order for the same character, or data that is significantly deformed.
[0087]
For example, when there are dozens of sets of learning data for the same character, how many HMMs to create and how to create them is a major problem. It is almost impossible to check the stroke order and deformation across the entire data set visually to decide whether another HMM should be created. Therefore, a method for making this decision automatically is needed.
[0088]
The method described below is an automated method based on certain statistics of each data set, and it is effective, as will be demonstrated later.
[0089]
To explain this method, recall Equation 30 and consider
[Formula 52]

This is the logarithm of the recognition criterion (Formula 30). FIG. 8 is a plot, with T₁ on the horizontal axis, of the values obtained when Equation 48 is applied to all HMMs for a typical data set of the 881 educational kanji.
[0090]
Strictly speaking, since there is one data set for each character, this should be written as
[Equation 53]

and so on, but it is expressed simply here to avoid complicating the notation. Note that Formula 52 lies on a nearly perfect straight line with respect to T₁.
[0091]
In the following procedure, using the slope of Formula 52 with respect to T₁, a quantity that may be called the relative similarity between the HMM and the data set {O_c(t)}, t = 1, ..., T_c, is calculated for each c = 2, ..., C, and whether to create a new HMM is decided automatically on that basis.
[0092]
That is, let the HMM (hidden Markov model) obtained from the first data in Step 1 be
[Formula 54]

Then, if
[Expression 55]

holds, a new HMM
[Expression 56]

is created. Here, r_th is a value determined empirically. Note that q_N in the denominator is based on the first data set, not the c-th data set.
[0093]
That is, according to Equation 55, the slope of
[Equation 57]

with respect to T₁ is compared with the slope of the denominator
[Formula 58]

with respect to T_c. If the two differ beyond the threshold r_th, the similarity is judged to be low even though the character is the same, and a new HMM is created.
Since the method described above does not require the iterative calculation seen in the Baum-Welch algorithm, it can be computed in a short time. This method can therefore be called a fast HMM method.
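The two slopes being compared can be sketched in Python as follows. The sketch computes the probability with the final state constrained to the last state, divides its logarithm by the sequence length, and takes the ratio of the value for the first data set to the value for the c-th data set, both under the first HMM. Expressions 54 to 58 are not reproduced in this text, so the exact comparison with r_th is left out; the HMM parameters, the two sequences, and the function names are hypothetical, and a single output matrix is used for simplicity.

```python
import math


def final_state_probability(A, B, obs):
    """P(O, Q(T) = q_N | H): forward pass from state 0 that must end in the last state."""
    n = len(A)
    alpha = [0.0] * n
    alpha[0] = B[0][obs[0]]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)]
    return alpha[n - 1]


def slope(A, B, obs):
    """Log-probability per time step: log P divided by the sequence length T."""
    p = final_state_probability(A, B, obs)
    return math.log(p) / len(obs) if p > 0.0 else float("-inf")


def relative_similarity(A, B, obs_first, obs_c):
    """Slope for the first data divided by the slope for the c-th data (assumes P < 1)."""
    return slope(A, B, obs_first) / slope(A, B, obs_c)


# Hypothetical first HMM (A[i][j]: from state i to state j) and two symbol sequences.
A = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]]
B = [[0.1, 0.1, 0.8],
     [0.8, 0.1, 0.1],
     [0.1, 0.45, 0.45]]
obs_first = [2, 2, 0, 1, 2]  # data the model fits well
obs_other = [0, 0, 2, 2, 0]  # data the model fits poorly
r = relative_similarity(A, B, obs_first, obs_other)
print(r)
```

Because only forward passes are needed, no iterative re-estimation is involved in computing this ratio.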
[0094]
[Speeding up]
In the character recognition method described above,
[Formula 59]

is calculated for all HMMs. Specifically, with
[Expression 60]

defined,
[Equation 61]

is obtained.
[0095]
If this is executed as is, the values are obtained in the order
[Expression 62]

This involves a great deal of waste. Under the constraints of the proposed method, α_i(t) = 0 when t < i, and α_i(t) does not affect α_N(T) when t > T−N+i.
[0096]
Therefore, in these cases there is no need to calculate α_i(t). Moreover, since α_i(t) = 0 when t < i, α_N(T) = 0 when T < N.
[0097]
Such considerations further reduce the amount of calculation and enable faster character recognition. This speed-up is optional, but it is desirable for shortening the recognition processing time.
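The saving can be illustrated with a small Python sketch that computes α_N(T) twice: once over all states at every time, and once only over the states i with t ≥ i and t ≤ T−N+i. It assumes a left-to-right model in which each state either stays or advances by one, with the initial state fixed to the first state; the parameters, the function names, and the use of a single output matrix are illustrative assumptions, and indices in the code start at 0.

```python
def forward_full(A, B, obs):
    """alpha_N(T) computed over every state at every time step."""
    n = len(A)
    alpha = [0.0] * n
    alpha[0] = B[0][obs[0]]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)]
    return alpha[n - 1]


def forward_banded(A, B, obs):
    """alpha_N(T) computed only for states that can still matter."""
    n, T = len(A), len(obs)
    if T < n:
        return 0.0  # too few observations to reach the last state
    slack = T - n
    alpha = [0.0] * n
    alpha[0] = B[0][obs[0]]
    for t in range(1, T):
        lo = max(0, t - slack)  # lower states can no longer reach the last state in time
        hi = min(n - 1, t)      # higher states cannot have been reached yet
        new = [0.0] * n
        for j in range(lo, hi + 1):
            stay = alpha[j] * A[j][j]
            move = alpha[j - 1] * A[j - 1][j] if j > 0 else 0.0
            new[j] = (stay + move) * B[j][obs[t]]
        alpha = new
    return alpha[n - 1]


# Hypothetical 3-state left-to-right HMM (A[i][j]: from state i to state j).
A = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]]
B = [[0.1, 0.1, 0.8],
     [0.8, 0.1, 0.1],
     [0.1, 0.45, 0.45]]
obs = [2, 2, 2, 0, 0, 1, 2, 1]
print(forward_full(A, B, obs), forward_banded(A, B, obs))
```

Both functions return the same value; the banded version simply skips the entries that are either zero or cannot influence α_N(T).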
[0098]
The flow of each step of character recognition and learning described above is shown in FIGS. 9 and 10, respectively. FIG. 9 shows the overall flow of the character recognition phase. FIG. 10 illustrates the overall flow of the learning phase.
[0099]
【Example】
[Recognition experiment]
A recognition experiment was performed using an online handwritten character database (Tokyo University of Agriculture and Technology, kuchibue-d-96-02) (Masaki Nakagawa et al., "Collection and use of online handwritten character patterns in sentence format without character-style restrictions", IEICE Technical Report, PRU 95-110, pp. 43-48 (1995)). FIG. 11 shows an example of a small part of the character data used. Only the 881 educational kanji were targeted here.
[0100]
As learning data, a total of 31 data sets were used: mdb0006 to mdb0030 of kuchibue-d-96-02 and six sets of educational kanji data prepared separately. After learning, the five sets mdb0001 to mdb0005 of kuchibue-d-96-02 were used as evaluation data to calculate two figures: the recognition rate at which the first candidate is correct, and the recognition rate at which a result is counted as correct if the correct character is among the top three candidates (third-candidate rate).
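The two rates defined above can be computed as in the following Python sketch. The candidate lists and correct characters are made-up values for illustration, not data from the experiment, and the function name `top_k_rate` is hypothetical.

```python
def top_k_rate(ranked_candidates, truths, k):
    """Fraction of samples whose correct character is among the first k candidates."""
    hits = sum(1 for cands, truth in zip(ranked_candidates, truths) if truth in cands[:k])
    return hits / len(truths)


# Made-up recognition results: candidates ordered from most to least probable.
ranked = [["木", "本", "未"],
          ["本", "木", "末"],
          ["大", "犬", "太"],
          ["土", "士", "工"]]
truth = ["木", "木", "太", "王"]

first_rate = top_k_rate(ranked, truth, 1)  # first candidate is correct
third_rate = top_k_rate(ranked, truth, 3)  # correct character within the top three
print(first_rate, third_rate)
```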
[0101]
The results are shown in Table 1 as examples.
[Table 1]

[0102]
It can be seen that an extremely high recognition rate is obtained. The average recognition rate for the 881 educational kanji was 89.3%, and the cumulative recognition rate up to third place was 95.3%. The parameter values were θ* = 45 degrees and l* = 8 in a 240 × 240 space. The smoothing parameters were c_a = c_b = 0.8, α = 0.9, and σ = π/16. L₀ for quantization was 60, and the angle quantization used L_M = 16. θ₀ for delimiter assignment was not considered; that is, only delimiters corresponding to pen up/down were inserted, and delimiters corresponding to angle changes were not. The value of r_th was 0.8.
[0103]
For comparison, learning was performed on the same character data sets as in the above example, and the same character data sets were recognized by the previously reported method disclosed in Japanese Unexamined Patent Publication No. Hei 8-101890. As shown in Table 1 as the comparative example, the average recognition rate was 86.99%, and the cumulative recognition rate up to third place was 92.59%. The method of the present invention thus improves the recognition rate.
[0104]
The programs used in these examples and comparative examples were written in C and C++. When the program of the above example was run on MS-WINDOWS 3.1 on a 120 MHz Pentium DOS/V machine, recognition of the 881 educational kanji (mdb0001) was completed in 8 minutes 54 seconds. Since the same program is known to run about three times faster on WINDOWS 95, recognition is expected to complete in about 3 minutes on WINDOWS 95. This is a result of the speed-up described above. By contrast, similar character recognition by the comparative example took about 15 minutes on average on WINDOWS 95. The recognition speed of the method of the present invention is thus considerably improved over the prior-art method. The method of the present invention can therefore run even on a less expensive system with lower power consumption.
[0105]
【Effect of the Invention】
As described above, the method of the present invention improves the recognition rate for cursive characters and for characters written in a different stroke order, which have been difficult to recognize. At the same time, it avoids the enormous amount of computation and the instability of results due to local maxima that arise in methods that obtain the maximum likelihood state by repeating gradient calculation and hill climbing. In addition, since the recognition speed is improved, even a simpler system can perform character recognition at high speed.
[Brief description of the drawings]
FIG. 1 is a diagram showing three data points and the definition of Δθ_i (i = 1, 2).
FIG. 2 is a diagram showing four data points and the definition of Δθ_i (i = 1, 2, 3).
FIG. 3 is a diagram similar to FIG. 2, showing Δθ₃ in relation to the threshold θ*.
FIG. 4 shows an example of handwritten raw data of a character “tree”.
FIG. 5 shows data obtained by preprocessing the raw data.
FIG. 6 shows an example of a direction quantization pattern.
FIG. 7 shows a general form of f (θ) in Equation 44.
FIG. 8 shows the distribution of logP as a function of T.
FIG. 9 is a flowchart of a recognition phase according to an embodiment of the present invention. Here, HMM stands for “hidden Markov model”.
FIG. 10 is a flowchart of a learning phase according to an embodiment of the present invention. Here, HMM stands for “hidden Markov model”.
FIG. 11 shows an example of handwritten character data (kuchibue-d-96-02) used in a recognition experiment.

Claims

An online handwritten character recognition method using hidden Markov models, based on time-series data representing the movement of the coordinates of the handwriting of a handwritten character and the up or down state of an input pen relative to the surface of an input device, the method comprising:
a feature point extraction and data compression step of compressing the given time-series data by extracting feature points based on angle information and distance information, in which it is determined whether the angle formed by first and second line segments connecting adjacent data points included in the time-series data, or the length of a line segment, is greater than or equal to a predetermined threshold, the data point joining the line segments being extracted as a feature point if the value is greater than or equal to the threshold and not being treated as a feature point if the value is less than the threshold;
a quantization step of quantizing the angle formed by adjacent lines connecting the feature points and expressing the length of each line as the number of repetitions of the quantized angle, thereby creating two-dimensional time-series data that includes the quantized angle and a one-dimensional data value corresponding to the up or down state of the pen;
a correspondence step of inserting delimiters into this time-series data based on changes between the pen-up and pen-down states, or based on such changes and a predetermined condition on the angle, and making each group of data between delimiters correspond to one state of a hidden Markov model; and
a probability calculation step of calculating the probability that the quantized time-series data is obtained under each of a plurality of hidden Markov models determined in advance for the characters to be recognized,
wherein the character corresponding to the hidden Markov model with the maximum probability is taken as the most probable character.

The character recognition method according to claim 1, wherein the transition probability a_ij from state j to state i among the N states of the hidden Markov model is constrained to zero except when i = j or i = j + 1, a_NN is constrained to 1, and the initial state is fixed to q_1.

The character recognition method according to claim 2, wherein, in the learning phase, the transition probabilities a_ij between states of the hidden Markov model, excluding a_NN which is constrained to 1, are obtained based on the number of symbol strings of the quantized data in each group of data between the delimiters.

The character recognition method according to claim 2 or 3, wherein, in the learning phase, the output probability from each state of the hidden Markov model is obtained based on the number of symbol strings of the quantized data in each group of data between the delimiters.

The character recognition method according to any one of claims 1 to 4, wherein, in the learning phase, smoothing is applied to the transition probabilities and output probabilities in order to avoid the excessive overfitting that arises from learning with a limited number of data.

The character recognition method according to any one of claims 1 to 5, wherein, in the recognition phase, the probability of given time-series data under a hidden Markov model is calculated without fully marginalizing over all states, with the state at the final time constrained to the last state of the hidden Markov model.

The character recognition method according to any one of claims 1 to 6, wherein, in the learning phase, when there are multiple sets of learning data for the same character: the probability of the data of the first data set under the first hidden Markov model created from the first data set is calculated without fully marginalizing over all states, with the state at the final time constrained to the last state of the hidden Markov model, and the logarithm of that probability is divided by the final time value of the data sequence to obtain a first division result; the probability of the data of a second data set under the first hidden Markov model is calculated in the same way, without fully marginalizing over all states and with the state at the final time constrained to the last state of the hidden Markov model, and the logarithm of that probability is divided by the final time value of that data sequence to obtain a second division result; and if the value obtained by dividing the first division result by the second division result is larger than a predetermined positive threshold, a second hidden Markov model is created based on the data of the second data set.

A storage medium storing a computer program for carrying out the character recognition method according to any one of claims 1 to 7.