JP2007072718A - Handwritten mathematical expression recognizing device and recognizing method

Info

Publication number: JP2007072718A
Application number: JP2005258423A
Authority: JP
Inventors: Shigeki Sagayama; 茂樹嵯峨山; Haruka Yamamoto; 遼山本; Jun Yamamoto; 隼山本; Takuya Nishimoto; 卓也西本
Original assignee: University of Tokyo NUC
Current assignee: University of Tokyo NUC
Priority date: 2005-09-06
Filing date: 2005-09-06
Publication date: 2007-03-22

Abstract

PROBLEM TO BE SOLVED: To recognize a handwritten mathematical expression by adopting a stochastic approach.

SOLUTION: This handwritten mathematical expression recognizing device comprises: an input means for reading the mathematical expression; a means for calculating the stroke likelihood of a stroke candidate constituting the mathematical expression; a means for expressing the inputted mathematical expression by the use of a plurality of mathematical expression candidates constituted of the plurality of stroke candidates; a means for calculating a structure likelihood concerning position relation between mathematical expression components which constitute the inputted handwritten mathematical expression by the use of stochastic context-free grammar; a means for calculating the likelihood of each mathematical expression candidate by the use of the calculated stroke likelihood and the structure likelihood; and a means for determining the mathematical expression based on the likelihood of each calculated mathematical expression candidate.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to an apparatus and a method for recognizing handwritten mathematical expressions.

At present, mathematical expressions are entered into computers mainly as text written in dedicated languages such as TeX and MATLAB, or through expression editors operated with a mouse. These input methods, however, require training and are not intuitive. If intuitive input of mathematical expressions with a pen tablet or similar device were realized, it could find wide application in education and research, for example for entering expressions into papers, for expression search, and as an input interface for numerical computation software.

There is a large body of existing research on online handwritten mathematical expression recognition (Non-Patent Document 1). These studies divide recognition into two stages: symbol recognition and structure recognition. First, the input stroke sequence is segmented into symbols and each symbol is recognized. The recognized symbols are then combined into a mathematical expression by rules based on their positional relationships.

However, although mathematical expressions are always written according to a strict grammar, a handwritten expression is not necessarily interpreted in only one way because of variation in the handwriting, as Equation 1 shows. Equation 1 is an example of the ambiguity in expression recognition: even for a human it is not necessarily clear whether it reads 2x or 2^x.

In existing rule-based methods, stroke recognition and structure recognition were treated as completely separate problems. As Equation 2 shows, however, the two cannot be separated. Equation 2 is an example in which a change in structure affects stroke recognition: the central part may be the parentheses ")(" in the left-hand expression, whereas in the right-hand expression it is natural to read it as "x".
Non-Patent Document 1: K. F. Chan, D. Y. Yeung: "Mathematical Expression Recognition: A Survey", Int. J. Document Anal., vol. 3, no. 1, pp. 3-15, 2000.

The object of the present invention is to recognize handwritten mathematical expressions by adopting a probabilistic approach in place of the conventional rule-based approach. Another object of the present invention is to recognize handwritten mathematical expressions from the standpoint that stroke recognition and structure recognition are closely related.

The present invention takes the view that probabilistic recognition, that is, finding the candidate with the maximum likelihood among the various mathematical expression candidates, is essential to handwritten mathematical expression recognition. It is characterized in that recognition is formulated as the problem of finding the expression candidate that maximizes the product of the stroke likelihood and the structure likelihood. In finding that candidate, the stroke likelihood can be calculated with the model of an existing method, and the structure likelihood can be calculated with a stochastic context-free grammar. In the structure likelihood calculation according to the present invention, the application probability of each expression generation rule is determined by the positional relationship between mathematical expression parts. In one preferred aspect, the likelihood distribution of the area in which an expression part was intended to be written (the hidden writing area) is used as the position feature for evaluating the positional relationship between expression parts. Alternatively, the structure likelihood may be calculated using the bounding rectangle of each observed stroke as the position feature. The likelihood of an expression candidate can be calculated efficiently with the CYK algorithm.
The handwriting recognition according to the present invention is configured as a handwritten mathematical expression recognition apparatus, a handwritten mathematical expression recognition method, a computer program for causing a computer to execute handwritten mathematical expression recognition, or a computer-readable medium on which the program is recorded. The specific configuration of the present invention is described below.
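For illustration only, the formulation described above can be written compactly as follows. The symbols are notation introduced here and do not appear in the original: E ranges over the mathematical expression candidates, L_stroke(E) is the stroke likelihood of candidate E, and L_struct(E) is its structure likelihood.

```latex
\hat{E} \;=\; \operatorname*{arg\,max}_{E}\; L_{\mathrm{stroke}}(E)\, L_{\mathrm{struct}}(E)
```

Under this reading, a candidate whose strokes each look plausible can still lose to another candidate if the positional arrangement of its parts is unlikely.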

The handwritten mathematical expression recognition apparatus according to the present invention comprises: input means for reading a handwritten mathematical expression; means for calculating the stroke likelihood of stroke candidates constituting the expression; means for representing the input expression by a plurality of expression candidates, each composed of a plurality of stroke candidates; means for calculating, using a stochastic context-free grammar, the structure likelihood relating to the positional relationship between the mathematical expression parts that constitute the input handwritten expression; means for calculating the likelihood of each expression candidate using the calculated stroke likelihood and structure likelihood; and means for determining the expression based on the calculated likelihood of each expression candidate. In one aspect, the means for determining the expression takes the expression candidate with the maximum likelihood as the expression. In another aspect, the means for determining the expression comprises means for extracting a plurality of expression candidates with high likelihood and means for selecting one expression candidate from the extracted candidates.

In one preferred aspect, the structure likelihood calculating means has: means for calculating, from the bounding rectangle of each observed stroke and for each candidate generation rule of that stroke, the likelihood distribution of the area in which the stroke was intended to be written, based on that candidate stroke generation rule; and means for calculating the structure likelihood, using the calculated likelihood distribution as the position feature of each stroke, by a stochastic context-free grammar whose application probabilities are determined by the positional relationship between mathematical expression parts including strokes. In one preferred aspect, strokes are the only expression parts for which a bounding rectangle is evaluated. For a stroke, therefore, the likelihood distribution of its hidden writing area is determined by two things: the actual bounding rectangle, and the candidate stroke generation rule (terminal symbol generation rule), that is, the candidate for what the stroke is (P(b|a,F)). For the other expression parts, it is determined by the generation rule of that expression part (non-terminal symbol generation rule) (P(a_l, a_r | a_p, R)).

In one preferred aspect, the likelihood distribution of the area in which the stroke expression part was intended to be written is calculated from the parameters of the bounding rectangle (vertical center, vertical size, horizontal start point and horizontal end point) and from random variables set for each stroke generation rule. The apparatus has means for storing a context-free grammar consisting of a plurality of expression generation rules, together with the application probability of each expression generation rule, which is determined by the positional relationship between mathematical expression parts.
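As a minimal sketch, the four bounding-rectangle parameters named above could be extracted from the sample points of one stroke as follows. The names `BoxFeatures` and `box_features` and the representation of a stroke as a list of (x, y) points are assumptions made for illustration. The random variables set per stroke generation rule are not specified in this passage, so the sketch stops at the rectangle parameters and does not compute the likelihood distribution itself.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # one (x, y) pen sample; representation assumed here


@dataclass(frozen=True)
class BoxFeatures:
    """Bounding-rectangle parameters of one stroke (hypothetical container)."""
    v_center: float  # vertical center
    v_size: float    # vertical size
    h_start: float   # horizontal start point
    h_end: float     # horizontal end point


def box_features(stroke: List[Point]) -> BoxFeatures:
    """Compute the four rectangle parameters from the sample points of a stroke."""
    if not stroke:
        raise ValueError("a stroke needs at least one sample point")
    xs = [x for x, _ in stroke]
    ys = [y for _, y in stroke]
    low, high = min(ys), max(ys)
    return BoxFeatures(
        v_center=(low + high) / 2,
        v_size=high - low,
        h_start=min(xs),
        h_end=max(xs),
    )
```

For example, `box_features([(1, 2), (3, 6), (2, 4)])` gives a vertical center of 4.0, a vertical size of 4, and a horizontal extent from 1 to 3.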

The handwritten mathematical expression recognition method according to the present invention comprises: an input step of reading a handwritten mathematical expression; a step of calculating the stroke likelihood of stroke candidates constituting the expression; a step of representing the input expression by a plurality of expression candidates, each composed of a plurality of stroke candidates; a step of calculating, using a stochastic context-free grammar, the structure likelihood relating to the positional relationship between the mathematical expression parts that constitute the input handwritten expression; a step of calculating the likelihood of each expression candidate using the calculated stroke likelihood and structure likelihood; and a step of determining the input handwritten expression based on the calculated likelihood of each expression candidate. A more specific configuration of the method is given in the description of the recognition apparatus above and in the embodiments described later.

In terms of concrete hardware, the handwritten mathematical expression recognition apparatus according to the present invention consists of a computer provided with input means, output means, storage means, computing means and display means. Accordingly, the handwritten mathematical expression recognition according to the present invention is configured as a computer program that, in order to recognize a handwritten mathematical expression, causes a computer to function as: input means for reading a handwritten mathematical expression; means for calculating the stroke likelihood of stroke candidates constituting the expression; means for representing the input expression by a plurality of expression candidates, each composed of a plurality of stroke candidates; means for calculating, using a stochastic context-free grammar, the structure likelihood relating to the positional relationship between the mathematical expression parts that constitute the input handwritten expression; means for calculating the likelihood of each expression candidate using the calculated stroke likelihood and structure likelihood; and means for determining the expression based on the calculated likelihood of each expression candidate. Alternatively, it is configured as a computer-readable medium storing that computer program. For a more specific configuration of the computer program, the description of the recognition apparatus above applies.

The present invention basically treats the stroke as the minimum unit, but the symbol can also be treated as the minimum unit. Strokes and symbols are terminal expression parts in the stochastic context-free grammar. Accordingly, another technical means adopted by the present invention comprises: input means for reading a handwritten mathematical expression; means for calculating the terminal expression part likelihood of terminal expression part candidates constituting the expression; means for representing the input expression by a plurality of expression candidates, each composed of a plurality of terminal expression part candidates; means for calculating, using a stochastic context-free grammar, the structure likelihood relating to the positional relationship between the expression parts, including terminal and non-terminal expression parts, that constitute the input handwritten expression; means for calculating the likelihood of each expression candidate using the calculated terminal expression part likelihood and structure likelihood; and means for determining the input handwritten expression based on the calculated likelihood of each expression candidate.
This technical means is likewise configured as a handwritten mathematical expression recognition apparatus, a handwritten mathematical expression recognition method, a computer program for causing a computer to execute handwritten mathematical expression recognition, or a computer-readable medium on which the program is recorded. In one aspect, the minimum unit into which the handwritten input is divided is the stroke, and the terminal expression part is a stroke. In another aspect, the minimum unit into which the handwritten input is divided is the symbol, and the terminal expression part is a symbol.

Terms needed to understand the present invention are explained here.
Mathematical expression part: Refers to stroke expression parts and to all other expression parts (characters, terms, factors and so on, defined in this specification as higher-level expression parts). Everything involved in a non-terminal symbol generation rule is an expression part. In a terminal symbol generation rule, the left-hand side is a stroke expression part and the right-hand side is an actual stroke (which is not itself an expression part).
Parent expression part: For each generation rule, the single expression part on the left-hand side.
Child expression part: For each non-terminal symbol generation rule, an expression part on the right-hand side (one or more; always two if the generation rules are expressed in Chomsky normal form).
Stroke: "Stroke" has two meanings. One is the "input stroke" or "actual stroke", which is the actual input (a sequence of sample points) from pen-down to pen-up. The other is the "stroke expression part", which is a recognition candidate for an input stroke. An input stroke is an actual sequence of sample points, whereas a stroke expression part is symbolic information. A terminal symbol generation rule generates an input stroke from a stroke expression part.
Stroke generation rule: A terminal symbol generation rule by which an input stroke is generated from a stroke expression part.
Stroke generation rule candidates: Each input stroke may have several stroke recognition candidates. That is, the terminal symbol generation rule that generated the input stroke cannot be identified uniquely, and several candidate terminal symbol generation rules must be considered at the same time. For a given input stroke, we consider that it may have been generated by the terminal symbol generation rule "minus sign → input stroke", or by the terminal symbol generation rule "horizontal bar of a plus sign → input stroke". The likelihood that each rule was applied is obtained by the stroke likelihood calculation.
Stroke sequence: The representation of the input when it is viewed as a sequence of input strokes.
Symbol: One kind of expression part; a so-called "character". It is the smallest unit that has meaning on its own.
Terminal expression part: The parent expression part of a terminal symbol generation rule. Input strokes are generated from terminal expression parts. In the present method, it is the stroke expression part. Recognition that does not take the stroke as the minimum unit is also possible. If some preprocessing can divide the expression into separate symbols, the same kind of recognition can be performed with the symbol as the minimum unit. In that case, the terminal expression part is the symbol.
Non-terminal expression part: The same as what is called a higher-level expression part above. It refers to any expression part other than a stroke expression part (or, when the symbol is the minimum unit, other than a symbol expression part). No generation rule generates an input stroke directly from it.
Likelihood distribution of the hidden writing area: Each expression part has a hidden writing area that represents the region it occupies; this "area in which the expression part was intended to be written" is called the hidden writing area. For example, expression parts being in a "horizontal" positional relationship is taken to mean that their hidden writing areas are connected horizontally. An input stroke is generated from a stroke expression part (terminal expression part), and the position of the generated input stroke (its bounding rectangle) is determined probabilistically, depending on the hidden writing area of that stroke expression part. Therefore, given the observation of an input stroke and a candidate generation rule for it, the likelihood distribution of the hidden writing area of the stroke expression part that generated it can be obtained. This likelihood distribution can be calculated as the probability distribution of the hidden writing area given the bounding rectangle.

The handwritten mathematical expression recognition of the present invention is advantageous over conventional recognition methods, particularly in structure recognition. Stroke recognition errors can also be corrected in the structure recognition process, which results in more accurate recognition of handwritten mathematical expressions.

[A] Basic Configuration of Handwritten Mathematical Expression Recognition
The present invention relates to probabilistic online handwritten mathematical expression recognition using a stochastic context-free grammar, and recognizes an expression from handwritten expression information input as a time series. In the present invention, expression recognition is formulated as the problem of finding the expression candidate that maximizes the product of the stroke likelihood and the structure likelihood. As shown in FIG. 1, the recognition method consists of a likelihood calculation for each stroke and a structure likelihood calculation based on the positional relationship of mathematical expression parts (strokes and symbols).

The stroke likelihood is the probability that the handwritten stroke is actually generated from a stroke model, and it can be calculated with an existing model-based character recognition method. In terms of FIG. 3, the stroke likelihood expresses with what probability the input "is that stroke"; the figure shows, for three handwritten symbols, the likelihood of being the stroke "a". The stroke likelihood is further explained with reference to FIG. 2, following the time series. For the first stroke, the likelihood of the root sign candidate is 0.2, that of the "v" candidate is 0.2, and that of the fraction bar candidate is 0.1. For the second stroke, the likelihood of the "a" candidate is 0.2 and that of the "d" candidate is 0.1. For the third stroke, the likelihood of the "2" candidate is 0.2 and that of the "z" candidate is 0.1. For the fourth stroke, the likelihood of the minus sign candidate is 0.2 and that of the horizontal bar of a plus sign is 0.1. For the fifth stroke, the likelihood of the "1" candidate is 0.2 and that of the vertical bar of a plus sign is 0.1. For the sixth stroke, the likelihood of the "6" candidate is 0.2 and that of the first stroke of "G" is 0.1.
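The candidate values listed above can be held in a simple table, sketched below. The likelihood values are the ones stated in the text for FIG. 2; the candidate labels, the name `STROKE_CANDIDATES` and the helper `best_by_stroke_likelihood` are hypothetical and introduced only for illustration.

```python
from typing import Dict, List

# One dictionary per input stroke, in time order: candidate label -> stroke likelihood.
# The numbers are those given in the text; the labels are illustrative.
STROKE_CANDIDATES: List[Dict[str, float]] = [
    {"root sign": 0.2, "v": 0.2, "fraction bar": 0.1},
    {"a": 0.2, "d": 0.1},
    {"2": 0.2, "z": 0.1},
    {"minus sign": 0.2, "horizontal bar of plus": 0.1},
    {"1": 0.2, "vertical bar of plus": 0.1},
    {"6": 0.2, "first stroke of G": 0.1},
]


def best_by_stroke_likelihood(candidates: List[Dict[str, float]]) -> List[str]:
    """Pick, for each stroke, the candidate with the highest stroke likelihood.

    Ties (as for the first stroke) are resolved in favor of the candidate
    listed first, which is an arbitrary choice made for this sketch.
    """
    return [max(cands, key=cands.get) for cands in candidates]
```

Choosing by stroke likelihood alone reads the fourth and fifth strokes as a minus sign followed by "1", and the first stroke is a tie that stroke likelihood cannot resolve. This is the situation in which the structure likelihood has to decide.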

The structure likelihood is the probability that the observed positional relationship between the strokes is actually generated from the expression candidate. In terms of FIG. 3, the structure likelihood expresses with what probability the input "has that structure"; the figure shows, for three patterns consisting of the symbols a and b, the likelihood that the two symbols are connected horizontally. The structure likelihood can be calculated by modeling it with a stochastic context-free grammar (SCFG), described later.

In the present invention, among the various conceivable expression candidates, the one that maximizes stroke likelihood × structure likelihood is found, and the candidate with the maximum likelihood is recognized as the handwritten expression. As shown in FIG. 3, suppose there are two strokes that cross each other at right angles, as in FIG. 2. Judged by stroke likelihood alone, the likelihood that the fourth stroke is a minus sign is 0.2 and that it is the horizontal bar of a plus sign is 0.1; the likelihood that the fifth stroke is "1" is 0.2 and that it is the vertical bar of a plus sign is 0.1. However, the structure likelihood of the candidate "-1" is low and that of the candidate "+" is high, so in the product of the stroke likelihood and the structure likelihood the candidate "+" prevails.
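The comparison between "-1" and "+" can be worked through numerically, as sketched below. The stroke likelihoods are the values given in the text. The two structure likelihoods (0.05 and 0.9) are NOT given in the original; they are hypothetical values chosen only to reflect "low" and "high".

```python
from math import prod

# Stroke likelihoods of the fourth and fifth strokes under each reading (from the text).
STROKE_LIKELIHOOD = {
    "-1": [0.2, 0.2],  # minus sign, then the digit 1
    "+": [0.1, 0.1],   # horizontal bar and vertical bar of a plus sign
}

# Structure likelihoods for two strokes crossing at right angles.
# ASSUMPTION: illustrative numbers only; the text says just "low" and "high".
STRUCTURE_LIKELIHOOD = {"-1": 0.05, "+": 0.9}


def candidate_likelihood(name: str) -> float:
    """Stroke likelihood x structure likelihood for one expression candidate."""
    return prod(STROKE_LIKELIHOOD[name]) * STRUCTURE_LIKELIHOOD[name]


def best_candidate() -> str:
    """Return the candidate with the largest combined likelihood."""
    return max(STROKE_LIKELIHOOD, key=candidate_likelihood)
```

With these assumed values, "-1" scores about 0.04 × 0.05 = 0.002 and "+" scores about 0.01 × 0.9 = 0.009, so `best_candidate()` selects "+" even though each of its strokes is individually less likely.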

[B] Modeling of Mathematical Expression Structure by Stochastic Context-Free Grammar
A context-free grammar (CFG) is a grammar described in forms such as A→B, AB→C, ABC→D, and it is suited to generating and analyzing languages that are recursive, as mathematical expressions are. For example, by repeatedly applying a few simple expression generation rules such as those in FIG. 5, various expressions consisting of the three characters a, b and c (for example, ab^abc) can be analyzed.

A stochastic context-free grammar (SCFG) is a CFG whose generation rules have application probabilities. Here the application probability is determined by the positional relationship between mathematical expression parts. For example, the application probability of the third generation rule from the top in FIG. 5 is determined by how plausibly the expression is positioned to the upper right of the character. In this way, the structure likelihood of a handwritten expression can be modeled by an SCFG whose application probabilities are determined by the positional relationship between expression parts. The structure likelihood under the SCFG can be calculated efficiently with the CYK algorithm.
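The following is a minimal sketch of a Viterbi-style CYK computation over a grammar in Chomsky normal form, shown only to make the idea concrete. It is not the grammar of the figures: the function name `cyk_best`, the non-terminal names and all probabilities are hypothetical. In particular, the rule probabilities are constants here, whereas in the text they depend on the positional relationship between the parts.

```python
from typing import Dict, List, Optional, Tuple

Rule = Tuple[str, str, str]  # (parent, left child, right child)


def cyk_best(lexical: List[Dict[str, float]],
             binary: Dict[Rule, float],
             start: str) -> Optional[Tuple[float, tuple]]:
    """Return (likelihood, parse tree) of the best parse rooted at `start`.

    lexical[i] maps each candidate symbol of stroke i to its stroke likelihood.
    binary maps each binary rule to its application probability.
    Returns None when no parse exists. Assumes at least one stroke.
    """
    n = len(lexical)
    # best[(i, j)][A] = (likelihood, tree) of the best way A covers strokes i..j-1
    best: Dict[Tuple[int, int], Dict[str, Tuple[float, tuple]]] = {}
    for i, cands in enumerate(lexical):
        best[(i, i + 1)] = {sym: (p, (sym, i)) for sym, p in cands.items()}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell: Dict[str, Tuple[float, tuple]] = {}
            for k in range(i + 1, j):  # split point between the two children
                for (parent, left, right), p_rule in binary.items():
                    if left in best[(i, k)] and right in best[(k, j)]:
                        p_left, t_left = best[(i, k)][left]
                        p_right, t_right = best[(k, j)][right]
                        score = p_rule * p_left * p_right
                        if score > cell.get(parent, (0.0, None))[0]:
                            cell[parent] = (score, (parent, t_left, t_right))
            best[(i, j)] = cell
    return best[(0, n)].get(start)


# Toy input: four strokes that could read "a + b" or "a - 1 b" (all values assumed).
TOY_LEXICAL = [
    {"Term": 0.2},
    {"Minus": 0.2, "PlusH": 0.1},
    {"Term": 0.2, "PlusV": 0.1},
    {"Term": 0.2},
]
TOY_BINARY = {
    ("Plus", "PlusH", "PlusV"): 0.9,
    ("Rhs", "Plus", "Term"): 0.8,
    ("Rhs", "Minus", "Term"): 0.8,
    ("Expr", "Term", "Rhs"): 0.5,
    ("Term", "Term", "Term"): 0.1,
}
```

Calling `cyk_best(TOY_LEXICAL, TOY_BINARY, "Expr")` fills the table bottom-up over all spans and split points, so every combination of stroke candidates and rules is considered without enumerating the expression candidates one by one. With the assumed numbers, the parse that joins the second and third strokes into a plus sign is preferred.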

[C] Mathematical Expression Grammar
The mathematical expression grammar consists of terminal symbol generation rules, which generate actual strokes from stroke (terminal) expression parts, and non-terminal symbol generation rules, which generate one or more other expression parts from (non-terminal) expression parts other than strokes.
[C-1] Non-Terminal Symbol Generation Rules
For the mathematical expression grammar, when handling polynomials for example, it suffices to prepare a context-free grammar such as the one shown in FIG. 6. Each expression generation rule in the context-free grammar is assigned a probability of being applied that depends on the positional relationship between expression parts. When the context-free grammar of FIG. 6 is converted equivalently into Chomsky normal form, the rules involved in generating the expression of FIG. 12 are the five shown in FIG. 7.

[C-2] Terminal Symbol Generation Rules
A terminal symbol generation rule F is a rule that generates an actual handwritten stroke from a stroke expression part ∈ ST. The terminal symbol generation rules related to FIG. 12 are the five shown in FIG. 8.

[D] Likelihood Distribution of the Hidden Writing Area as a Position Feature
Evaluating the positional relationship between mathematical expression parts is not a simple problem. The bounding rectangle, which has conventionally been the usual feature for positional relationships, cannot handle diverse symbol shapes in a unified way, as FIG. 9 shows, and case distinctions by character category become necessary. Moreover, besides letters of the alphabet, mathematical expressions contain many specially shaped symbols such as accents, dots and commas, so the number of cases grows and the SCFG rules become complicated as a result. FIG. 9 illustrates the limits of the bounding rectangle: the positional relationship between the symbols differs between qx and d^x, yet the bounding rectangles are identical (quoted from Non-Patent Document 1).

In the present invention, to avoid this problem, the likelihood distribution of the hidden writing area is used as a feature that handles various formula parts (symbols and strokes) in a unified manner. The hidden writing area represents where an actually written formula part (symbol or stroke) was intended to be written, and the likelihood distribution of this intended area is calculated. First, the formula generation model underlying the hidden writing area is described. As shown in FIG. 17, assuming a model of how a human generates a formula, the formula "abc^d" is considered to be written as follows.
(1) First, the center and vertical size of the whole formula are determined.
(2) The start point and end point of the first character "a" are determined. This fixes the area in which "a" is to be written.
(3) "a" is written in that area. The place actually written has stochastic fluctuation.
(4) Next, the end point of "b" is determined. This fixes the area in which "b" is to be written.
(5) "b" is written in that area. "b" tends to be written toward the top of the area.
(6) "c" is written in the same way.
(7) The center and vertical size of "d" are determined stochastically from the vertical center and vertical size of "c".
(8) The end point of "d" is determined, and "d" is written in that area.
In other words, the assumption is that we write formulas along a hidden grid that is consistent in the horizontal direction. The structure likelihood is then regarded as the probability that the observation was generated by this generative model.
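The hidden-grid assumption above can be sketched in Python for the simplest case, a single horizontal row of symbols (steps (1) to (6)). This is only an illustration under stated assumptions: the function name sample_row, the fixed cell width, and the Gaussian jitter value are hypothetical and do not come from the source, and the superscript step (7) is not modeled.

```python
import random
from typing import List, Tuple

# (vertical center, vertical size, horizontal start, horizontal end)
Area = Tuple[float, float, float, float]


def sample_row(symbols: str, center: float = 0.0, size: float = 10.0,
               width: float = 8.0, jitter: float = 0.5,
               seed: int = 0) -> Tuple[List[Area], List[Area]]:
    """Sample hidden writing areas and noisy bounding boxes for one row.

    Every symbol shares the same hidden vertical center and size, and each
    hidden area starts where the previous one ended (the hidden grid).
    The observed box is the hidden area plus Gaussian fluctuation.
    """
    rng = random.Random(seed)
    areas: List[Area] = []
    boxes: List[Area] = []
    x = 0.0
    for _ in symbols:
        area = (center, size, x, x + width)
        box = (area[0] + rng.gauss(0.0, jitter),
               area[1] + rng.gauss(0.0, jitter),
               area[2] + rng.gauss(0.0, jitter),
               area[3] + rng.gauss(0.0, jitter))
        areas.append(area)
        boxes.append(box)
        x += width  # the next hidden area begins at this end point
    return areas, boxes
```

Calling sample_row("abc") returns three hidden areas that tile the row exactly, together with three observed boxes that deviate from them; recognition works in the opposite direction, inferring the hidden areas from the observed boxes.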

Under the formula generation model described above, the rectangular area in which a formula part is intended to be written is called the hidden writing area. In this model, each stroke is assumed to be written with stochastic fluctuation relative to the hidden writing area of its stroke formula part. In other words, the position and size at which each stroke is actually written (both of which can be represented by the circumscribed rectangle) are determined stochastically from the hidden writing area of the corresponding stroke formula part. In addition, the hidden writing areas of the formula parts stand in fixed positional relationships to one another. For example, when formula parts A, B, and C are generated by the rule A → B + C (for example, A: 2(x+1), B: 2, C: (x+1)), the hidden writing areas of A, B, and C have a fixed positional relationship ("horizontal").

FIG. 11 is a conceptual diagram of the hidden writing area. The formula abc is considered to be written, with stochastic fluctuation, in rectangular areas like those in the upper part of the figure. In this case, b tends to be written higher in its rectangular area than a or c. Conversely, when characters have actually been written as in the lower part of the figure, the hidden writing area of each character, that is, the rectangular area in which it was intended to be written, has a likelihood distribution. The likelihood distribution of this hidden writing area is used as the positional feature, so that the positions of all formula parts (symbols) are handled in a unified manner. With the likelihood distribution of the hidden writing area, alphabetic characters, dots, commas, and accents can all be expected to be handled with the same feature.

The likelihood distribution of the hidden writing area is now described in more detail. When a stroke is written, the likelihood distribution of the area in which it was intended to be written is the conditional probability distribution of the hidden writing area given the circumscribed rectangle of the written stroke and the generation rule F of that stroke (that is, which stroke the writer intended). A hidden writing area can be expressed by four parameters a = (a^c, a^s, a^b, a^e), where a^c is the vertical center, a^s the vertical size, a^b the horizontal start point, and a^e the horizontal end point. Expressing its probability distribution P(a) therefore requires a distribution of the four-dimensional vector a in four-dimensional space. Assuming a normal distribution with diagonal covariance in that space (that is, a^c, a^s, a^b, and a^e are each independently normally distributed), P(a) is specified by its mean vector μ_a = (μ_a^c, μ_a^s, μ_a^b, μ_a^e) and covariance matrix σ_a = diag(σ_a^c, σ_a^s, σ_a^b, σ_a^e). The circumscribed rectangle b = (b^c, b^s, b^b, b^e) of each actual stroke is considered to be obtained stochastically from the hidden writing area a = (a^c, a^s, a^b, a^e) according to P(b | a, F). Conversely, once the circumscribed rectangle b of a stroke has been obtained, the probability distribution of the hidden writing area a is obtained by regarding P(b | a, F) as a function of a. This function is prepared in advance for each stroke generation rule; that is, the likelihood distribution of the hidden writing area produced by each stroke is learned beforehand.

Even when the circumscribed rectangle of the input stroke is the same, the likelihood distribution of the hidden writing area changes with the stroke recognition result (generation rule). For example, for the same circumscribed rectangle, the likelihood distribution of the hidden writing area differs between the stroke generation rule candidate "a → (actual stroke)" and the candidate "d → (actual stroke)". Given the parameters (b^c, b^s, b^b, b^e) of the circumscribed rectangle b of each actual stroke (b^c: vertical center, b^s: vertical size, b^b: horizontal start point, b^e: horizontal end point), the parameters (random variables) of the likelihood distribution of the hidden writing area a are taken to be determined by a^c = b^c + Pc, a^s = b^s + Ps, a^b = b^b + Pb, and a^e = b^e + Pe. Pc, Ps, Pb, and Pe are learned for each stroke, and a function that calculates, from the observed circumscribed rectangle of a formula part, the likelihood distribution of the area in which that formula part was intended to be written is stored.
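As a minimal sketch of the stored function described above, the following Python code maps an observed circumscribed rectangle b and a stroke generation rule to four independent normal distributions over the hidden writing area parameters, using a = b + P. The table OFFSETS, its numerical values, and the names Gaussian and hidden_area_distribution are hypothetical and serve only as illustration; in the described method the offsets Pc, Ps, Pb, and Pe would be learned for each stroke.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Gaussian:
    mean: float
    var: float


# Hypothetical offsets per stroke generation rule, as (mean, variance) pairs
# ordered (Pc, Ps, Pb, Pe). These values are illustrative, not from the source.
OFFSETS: Dict[str, Tuple[Tuple[float, float], ...]] = {
    "a": ((0.0, 4.0), (10.0, 9.0), (-1.0, 1.0), (1.0, 1.0)),
    "d": ((5.0, 4.0), (0.0, 9.0), (-1.0, 1.0), (1.0, 1.0)),
}


def hidden_area_distribution(bbox: Tuple[float, float, float, float],
                             rule: str) -> Tuple[Gaussian, ...]:
    """Return Gaussians for (a^c, a^s, a^b, a^e) given bbox = (b^c, b^s, b^b, b^e).

    Each component follows a = b + P with P normally distributed, so the mean
    is shifted by the offset mean and the variance is the offset variance.
    """
    return tuple(Gaussian(b + m, v) for b, (m, v) in zip(bbox, OFFSETS[rule]))
```

With the same rectangle, hidden_area_distribution(bbox, "a") and hidden_area_distribution(bbox, "d") return different distributions, which mirrors the statement that the recognition result changes the likelihood distribution even when the circumscribed rectangle is unchanged.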

[E] Structure likelihood calculation using the hidden writing areas of formula parts
[E-1] Definitions
The set of stroke formula parts is denoted ST, the set of all formula parts (including the stroke formula parts) is denoted MP, and the set of positional relationships between formula parts is denoted RP.

Among the formula generation rules, the set of non-terminal symbol generation rules is denoted R. Each non-terminal symbol generation rule R in this set carries the information R = (R^p ∈ MP, R^l ∈ MP, R^r ∈ MP, R^RP ∈ RP), where R^p, R^l, R^r, and R^RP are, in order, the name of the parent formula part, the name of the left child formula part, the name of the right child formula part, and the positional relationship that the child formula parts satisfy.
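The four-field rule described above maps naturally onto a small record type. The sketch below is an assumption-laden illustration: the part names "Exp" and "Sym", the relation labels, and the helper rules_for_children are hypothetical and are not the rules of FIG. 6 or FIG. 7.

```python
from typing import List, NamedTuple


class NonTerminalRule(NamedTuple):
    parent: str    # R^p, name of the parent formula part
    left: str      # R^l, name of the left child formula part
    right: str     # R^r, name of the right child formula part
    relation: str  # R^RP, positional relationship between the children


# Hypothetical toy rules in Chomsky normal form (two children per rule).
RULES: List[NonTerminalRule] = [
    NonTerminalRule("Exp", "Exp", "Sym", "horizontal"),
    NonTerminalRule("Exp", "Sym", "Sym", "upper_right"),
]


def rules_for_children(rules: List[NonTerminalRule],
                       left: str, right: str) -> List[NonTerminalRule]:
    """Return the rules whose children match the given pair of part names."""
    return [r for r in rules if r.left == left and r.right == right]
```

Looking rules up by their child pair is the access pattern a bottom-up parser needs when it combines two adjacent formula parts into a parent.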

The set of terminal symbol generation rules is denoted F. Each terminal symbol generation rule F in this set carries the information F = (F^p ∈ ST). An example list of the rules is shown in FIG. 6.

For each stroke of the observed handwritten formula, the shape data (sequence of sampled coordinates) of the i-th stroke is denoted t_i and its circumscribed rectangle is denoted b_i, so that the observed handwritten formula is expressed as the pairs (t_i, b_i) for i = 1, ..., N. Here, N is the total number of observed strokes.
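As a small illustration of how b_i can be derived from t_i, the following sketch computes the four circumscribed rectangle parameters from a list of sampled (x, y) points. The function name bounding_rectangle and the choice of the vertical center as the midpoint of the minimum and maximum y values are assumptions made for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) sample of a pen stroke


def bounding_rectangle(stroke: List[Point]) -> Tuple[float, float, float, float]:
    """Return (b^c, b^s, b^b, b^e) for a non-empty stroke.

    b^c: vertical center, b^s: vertical size,
    b^b: horizontal start point, b^e: horizontal end point.
    """
    xs = [x for x, _ in stroke]
    ys = [y for _, y in stroke]
    y_min, y_max = min(ys), max(ys)
    b_c = (y_min + y_max) / 2.0  # midpoint of the vertical extent
    b_s = y_max - y_min          # height of the rectangle
    b_b = min(xs)                # leftmost x
    b_e = max(xs)                # rightmost x
    return (b_c, b_s, b_b, b_e)
```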

When the input formula is
[equation]
an arbitrary formula candidate H is represented by a tree structure such as that of FIG. 12. This tree structure can be written as a nesting of formula generation rules, for example
[equation]

[E-2] Writing areas of formula parts
It is assumed that a rectangular hidden writing area exists behind each formula part, as shown in FIG. 13. The writing area of the parent formula part R_i^p of formula generation rule R_i is denoted
a_i = (a_i^c, a_i^s, a_i^b, a_i^e),
where a_i^c, a_i^s, a_i^b, and a_i^e are the coordinates and lengths shown in FIG. 14.

Next, for each positional relationship, the probability that the writing areas of the child formula parts are a_l and a_r when the writing area of the parent formula part is a_p,
P(a_l, a_r | a_p, R),
is defined.

[E-2-1] Positional relationship: horizontal
The condition that the positional relationship is horizontal is described by the following equations.
[equation]
That is, the probability can be expressed as
[equation]
where δ(x) is the Dirac delta function.

[E-2-2] Positional relationship: upper right
The condition that the positional relationship is upper right is described by the following equations.
[equation]
Here, q^1_{upper right} and q^2_{upper right} each follow a normal distribution given in advance,
[equation]
Therefore,
[equation]

[E-2-3] Other positional relationships
The lower-right and other positional relationships are likewise described either in the form
[equation]
or, by introducing a random variable q that follows a normal distribution given in advance,
[equation]
in the form
[equation]

[E-3] Application probability of terminal symbol generation rules
The probability P(t, b | a, F) that the stroke output t, b was generated from the writing area a of the stroke and the terminal symbol generation rule F is defined as
P(t, b | a, F) = P(t | F^p) P(b | a, F).
The first term on the right-hand side is the probability that the stroke shape t is the stroke F^p (the stroke likelihood), which can be calculated by an existing character recognition method. The second term on the right-hand side is the probability distribution of the actual circumscribed rectangle b of the stroke when the stroke is generated in the writing area a by the terminal symbol generation rule F. This probability is defined by a normal distribution of a - b:
[equation]
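The factorization described in [E-3], a stroke likelihood multiplied by a normal density in a - b, can be sketched as follows. The diagonal (per-component) form of the density and the parameter names offset_mean and offset_var are assumptions for this example; the source does not give the distribution parameters in the surviving text.

```python
import math
from typing import Sequence


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of a one-dimensional normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def terminal_rule_probability(stroke_likelihood: float,
                              bbox: Sequence[float],
                              area: Sequence[float],
                              offset_mean: Sequence[float],
                              offset_var: Sequence[float]) -> float:
    """Evaluate P(t, b | a, F) = P(t | F^p) * P(b | a, F).

    stroke_likelihood plays the role of P(t | F^p), supplied by a character
    recognizer. P(b | a, F) is modeled as a product of independent normal
    densities in the differences a - b, one per rectangle parameter.
    """
    p_b = 1.0
    for a_k, b_k, m_k, v_k in zip(area, bbox, offset_mean, offset_var):
        p_b *= normal_pdf(a_k - b_k, m_k, v_k)
    return stroke_likelihood * p_b
```

Because the second factor depends on the rule F through its offset parameters, two rules with the same stroke likelihood can still receive different probabilities for the same pair of rectangle and writing area.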

[E-4] Calculation of the likelihood
The method of calculating the likelihood of obtaining the observation (t, b) from a formula candidate H is shown for the example of FIG. 12. Letting a_i be the writing area of each formula part, the likelihood is
[equation]
The subscript at the lower right of each integral sign indicates the variable of integration. The factor Π_{i=1}^{11} P(t_i | R^MP) is the stroke likelihood and can be calculated by an existing character recognition method. The remaining part is the structure likelihood. By applying, to the probability distribution P(a_i) of the writing area of each formula part, the approximation that treats a_i^c, a_i^s, a_i^b, and a_i^e independently,
[equation]
the structure likelihood can be calculated sequentially as follows.

(1) First, the part
[equation]
is calculated.
(a) When b_2 and b_3 are given, P(b_2 | a_2, R_2) and P(b_3 | a_3, R_3) are normal distributions of a_2 and a_3, respectively.
(b) If P(a_2, a_3 | a_7, R_7) is given as in Equation 1, that is, if it is written as
[equation]
then
[equation]
Here, the product of normal distribution functions is a normal distribution function (up to a constant factor), and the convolution of a normal distribution function with a delta function is also a normal distribution function. Therefore, since P(a_2) and P(a_3) are normal distributions, P(a_2, a_3 | a_7, R_7) is a normal distribution function of a_7.
(c) If P(a_2, a_3 | a_7, R_7) is given as in Equation 2, that is, if it is written as
[equation]
then likewise
[equation]
and, since P(a_2), P(a_3), and P(q) are normal distributions, P(a_2, a_3 | a_7, R_7) is again a normal distribution function of a_7.
(d) Hence
[equation]
is a normal distribution function P(a_7) of a_7.
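Step (1) relies on the standard fact that the product of two normal densities in the same variable is a scaled normal density. The one-dimensional sketch below shows this identity; the function names are chosen for this example, and applying it per component presumes the independence approximation of [E-4]. How the source accumulates the scale factor into the structure likelihood is not shown in the surviving text, so this is an illustration of the identity only.

```python
import math
from typing import Tuple


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of a one-dimensional normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def gaussian_product(m1: float, v1: float,
                     m2: float, v2: float) -> Tuple[float, float, float]:
    """Return (scale, mean, var) such that, for every x,

        N(x; m1, v1) * N(x; m2, v2) == scale * N(x; mean, var).
    """
    var = 1.0 / (1.0 / v1 + 1.0 / v2)        # precisions add
    mean = var * (m1 / v1 + m2 / v2)         # precision-weighted mean
    scale = normal_pdf(m1, m2, v1 + v2)      # agreement between the two factors
    return scale, mean, var
```

The scale factor is large when the two factors agree and small when they conflict, so integrating the product over x leaves exactly this factor as a measure of how compatible the two distributions are.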

(2) Accordingly, the next integral outward becomes
[equation]
Since P(a_7) and P(b_4 | a_4, R_4) are both normal distributions, this integral is, as before, a normal distribution P(a_8).

(3) By carrying out the integration outward in the same way, the structure likelihood becomes ∫P(a_11) da_11, and evaluating this integral gives the structure likelihood.

[E-5] Likelihood calculation by the CYK algorithm
The likelihood of the formula candidates can be calculated efficiently with the CYK algorithm. FIGS. 15 and 16 show examples of the likelihood calculation. FIG. 15 shows the case in which the time series of four strokes is "2" → "a" → "2" → "root sign". For example, the likelihood of "2a" is 0.1 (likelihood of 2) × 0.2 (likelihood of a) × 0.05 (horizontal connection likelihood) = 0.001. The likelihood of "2^a" is 0.1 (likelihood of 2) × 0.2 (likelihood of a) × 0.025 (upper-right connection likelihood) = 0.0005. For the four strokes as a whole, the candidate √2a^2 has a likelihood of 0.0000001 and the candidate √2a2 has a likelihood of 0.00000005. FIG. 16 shows the case in which the time series of four strokes is "(" → "part of 5" → "part of 5" → ")". For the four strokes as a whole, the candidate "(5)" has a likelihood of 0.0000001 and the candidate "151" has a likelihood of 0.00000005.
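A minimal sketch of a CYK-style search for the most likely candidate is given below. It is heavily simplified and rests on several assumptions that are not in the source: strokes are combined only in their temporal order, every stroke candidate is a part named "Sym", and each positional relationship has a constant likelihood instead of one derived from hidden writing areas. The rule list and the name cyk_best are hypothetical; the numbers 0.1, 0.2, 0.05, and 0.025 in the usage note are the factors quoted for "2" and "a" in the example above.

```python
from typing import Dict, List, Optional, Tuple

Rule = Tuple[str, str, str, str]  # (parent, left, right, relation)


def cyk_best(strokes: List[Dict[str, float]], rules: List[Rule],
             relation_likelihood: Dict[str, float],
             start: str = "Exp") -> Optional[Tuple[float, object]]:
    """Return (likelihood, tree) of the best parse spanning all strokes.

    strokes[i] maps candidate labels of stroke i to stroke likelihoods.
    table[(i, j)][part] holds the best (likelihood, tree) for strokes i..j-1.
    """
    n = len(strokes)
    table: Dict[Tuple[int, int], Dict[str, Tuple[float, object]]] = {}
    for i, candidates in enumerate(strokes):
        label, p = max(candidates.items(), key=lambda kv: kv[1])
        table[(i, i + 1)] = {"Sym": (p, label)}
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            cell: Dict[str, Tuple[float, object]] = {}
            for k in range(i + 1, j):  # split point between the two children
                for parent, left, right, relation in rules:
                    if left in table[(i, k)] and right in table[(k, j)]:
                        p_left, t_left = table[(i, k)][left]
                        p_right, t_right = table[(k, j)][right]
                        p = p_left * p_right * relation_likelihood[relation]
                        if p > cell.get(parent, (0.0, None))[0]:
                            cell[parent] = (p, (relation, t_left, t_right))
            table[(i, j)] = cell
    return table[(0, n)].get(start)
```

For example, with strokes = [{"2": 0.1}, {"a": 0.2}], the rules ("Exp", "Sym", "Sym", "horizontal") and ("Exp", "Sym", "Sym", "upper_right"), and relation likelihoods {"horizontal": 0.05, "upper_right": 0.025}, the function selects the horizontal reading "2a". Each table cell is filled once and reused by all larger spans, which is what makes the search efficient compared with enumerating every candidate tree.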

[F] Operation confirmation experiments
[F-1] Confirming the effectiveness of the likelihood distribution of the hidden writing area
Eight formulas judged to be of moderate complexity (30 to 60 strokes) were taken from IEEE Transactions, and each was written ten times by the same writer, giving 80 handwritten formulas as evaluation data. A structure recognition experiment was carried out under the condition that stroke recognition was perfect. The parameters determining the likelihood distribution of the hidden writing area of each stroke were set heuristically, and the formula grammar allowed fractions, square roots, and accents in addition to what is shown in FIG. 5. As a result, the structure of 63 of the 80 formulas was recognized completely. The errors were only local ones, such as subscript mistakes, and further improvement in accuracy is considered possible through parameter tuning. This experiment confirmed that the likelihood distribution of the hidden writing area is effective as a feature for the structure likelihood calculation.

[F-2] Operation confirmation of the whole system
The operation of the whole system was confirmed by setting the stroke likelihoods manually. Each of the eight formulas used in [F-1] was written once, giving eight handwritten formulas as evaluation data. To confirm that stroke recognition errors are corrected by structure recognition, the first-ranked stroke likelihood candidate of every stroke was set to the letter "a" and the second-ranked candidate to the correct stroke, and the formulas were then recognized. As a result, many correct strokes were included in the final recognition results. This experiment confirmed that stroke recognition errors can be corrected in structure recognition. Although the stroke likelihoods were given by hand in this experiment, a hidden Markov model (HMM) may be used to calculate the stroke likelihood.

The present invention can be used for online handwritten mathematical expression recognition. Specifically, it is applicable to education, paper writing, formula search interfaces, interfaces to numerical computation software, and the like.

FIG. 1 is a diagram illustrating the overall configuration of handwritten mathematical expression recognition.
FIG. 2 is a diagram illustrating the stroke likelihood of stroke candidates.
FIG. 3 is a diagram illustrating the relationship between stroke likelihood and structure likelihood.
FIG. 4 is a diagram illustrating a stochastic context-free grammar.
FIG. 5 is a diagram showing an example of a stochastic context-free grammar that generates formulas.
FIG. 6 is a diagram showing an example of a stochastic context-free grammar that generates formulas.
FIG. 7 is a diagram showing the rules, obtained by equivalently converting the grammar of FIG. 6 into Chomsky normal form, that are involved in generating the formula of FIG. 12.
FIG. 8 is a diagram showing the terminal symbol generation rules relevant to FIG. 12.
FIG. 9 is an example showing the limits of the circumscribed rectangle.
FIG. 10 is a diagram illustrating the evaluation of the positional relationship between formula parts.
FIG. 11 is a conceptual diagram of the hidden writing area.
FIG. 12 is a diagram showing the correspondence between a handwritten formula and the tree structure of the correct formula candidate for it.
FIG. 13 is a diagram showing formula parts and hidden writing areas.
FIG. 14 is a diagram showing the parameters of the hidden writing area.
FIG. 15 is a diagram showing an example of likelihood calculation using the CYK algorithm.
FIG. 16 is a diagram showing an example of likelihood calculation using the CYK algorithm.
FIG. 17 is a diagram illustrating the model of how a human generates a formula.

Claims

Input means for reading handwritten mathematical formulas;
Means for calculating a stroke likelihood of stroke candidates constituting the mathematical formula;
Means for expressing an input mathematical expression with a plurality of mathematical expression candidates composed of a plurality of stroke candidates;
Means for calculating a structure likelihood relating to the positional relationships between the formula parts constituting the input handwritten mathematical expression, using a stochastic context-free grammar;
Means for calculating the likelihood of each mathematical formula candidate using the calculated stroke likelihood and structure likelihood;
Means for determining an input handwritten mathematical formula based on the calculated likelihood of each mathematical formula candidate;
An apparatus for recognizing handwritten mathematical formulas.

2. The handwritten mathematical expression recognition apparatus according to claim 1, wherein the mathematical expression determination means uses the mathematical expression candidate having the maximum likelihood as the mathematical expression.

The handwritten mathematical expression recognition apparatus according to claim 1, wherein the means for determining the mathematical expression consists of:
means for extracting a plurality of formula candidates having a high likelihood; and
means for selecting one formula candidate from the extracted plurality of formula candidates.

The handwritten mathematical expression recognition apparatus according to any one of claims 1 to 3, wherein the structure likelihood calculating means includes:
means for calculating, for the observed circumscribed rectangle of each stroke and for each generation rule candidate of that stroke, a likelihood distribution of the area in which the stroke is to be written, based on the stroke generation rule candidate; and
means for calculating the structure likelihood by a stochastic context-free grammar in which the application probability is determined by the positional relationship between formula parts including the strokes, using the calculated likelihood distribution as the positional feature of each stroke.

5. The handwritten mathematical expression recognition apparatus according to claim 1, wherein the structure likelihood calculating means calculates, for each formula part, a likelihood distribution of the area in which the formula part is to be written:
when the formula part is a stroke formula part, based on the observed circumscribed rectangle of the formula part and the generation rule of the formula part; and
when the formula part is a formula part other than a stroke, based on the areas in which the formula parts constituting that formula part are to be written and the generation rule of the formula part.

The handwritten mathematical expression recognition apparatus according to claim 4 or 5, wherein the likelihood distribution of the area in which a stroke formula part is to be written is calculated from the parameters of the vertical center, vertical size, horizontal start point, and horizontal end point of the circumscribed rectangle and from random variables set for each stroke generation rule.

7. The handwritten mathematical expression recognition apparatus according to claim 1, further comprising means for storing a context-free grammar composed of a plurality of formula generation rules, together with an application probability of each formula generation rule determined by the positional relationship between formula parts.

An input step for reading a handwritten mathematical formula;
Calculating stroke likelihood of stroke candidates constituting the mathematical formula;
Expressing the input mathematical expression with a plurality of mathematical expression candidates composed of a plurality of stroke candidates;
Calculating a structure likelihood relating to the positional relationships between the formula parts constituting the input handwritten mathematical expression, using a stochastic context-free grammar;
Calculating the likelihood of each formula candidate using the calculated stroke likelihood and structure likelihood;
Determining the input handwritten mathematical expression based on the calculated likelihood of each formula candidate;
A method for recognizing handwritten mathematical expressions, consisting of the above steps.

The method for recognizing a handwritten mathematical expression according to claim 8, wherein the step of determining the mathematical expression uses a mathematical expression candidate having the maximum likelihood as a mathematical expression.

The method for recognizing a handwritten mathematical expression according to claim 8, wherein the step of determining the mathematical expression consists of:
extracting a plurality of formula candidates having a high likelihood; and
selecting one formula candidate from the extracted plurality of formula candidates.

The method for recognizing a handwritten mathematical expression according to any one of claims 8 to 10, wherein the structure likelihood calculating step includes:
calculating, for the observed circumscribed rectangle of each stroke and for each generation rule candidate of that stroke, a likelihood distribution of the area in which the stroke is to be written, based on the stroke generation rule candidate; and
calculating the structure likelihood by a stochastic context-free grammar in which the application probability is determined by the positional relationship between formula parts including the strokes, using the calculated likelihood distribution as the positional feature of each stroke.

The method for recognizing a handwritten mathematical expression according to any one of claims 8 to 11, wherein, in the structure likelihood calculating step, a likelihood distribution of the area in which each formula part is to be written is calculated:
when the formula part is a stroke formula part, based on the observed circumscribed rectangle of the formula part and the generation rule of the formula part; and
when the formula part is a formula part other than a stroke, based on the areas in which the formula parts constituting that formula part are to be written and the generation rule of the formula part.

The method for recognizing a handwritten mathematical expression according to claim 11, wherein the likelihood distribution of the area in which a stroke formula part is to be written is calculated from the parameters of the vertical center, vertical size, horizontal start point, and horizontal end point of the circumscribed rectangle and from random variables set for each stroke generation rule.

A computer program for causing a computer to function as the handwritten mathematical expression recognition apparatus according to any one of claims 1 to 7.

A computer-readable recording medium on which is recorded a computer program for causing a computer to function as the handwritten mathematical expression recognition apparatus according to any one of claims 1 to 7.

Input means for reading handwritten mathematical formulas;
Means for calculating a terminal formula component likelihood of a terminal formula component candidate constituting the formula;
Means for expressing an input mathematical expression with a plurality of mathematical expression candidates composed of a plurality of terminal mathematical expression component candidates;
Means for calculating a structure likelihood relating to the positional relationships between the formula parts, including terminal formula parts and non-terminal formula parts, that constitute the input handwritten mathematical expression, using a stochastic context-free grammar;
Means for calculating the likelihood of each formula candidate using the calculated terminal formula component likelihood and structure likelihood;
Means for determining an input handwritten mathematical formula based on the calculated likelihood of each mathematical formula candidate;
An apparatus for recognizing handwritten mathematical formulas.

The handwritten mathematical expression recognition apparatus according to claim 16, wherein the terminal mathematical expression component is a stroke.

The handwritten mathematical expression recognition apparatus according to claim 16, wherein the terminal mathematical component is a symbol.