JP2014106961A

JP2014106961A - Method executed by computer for automatically recognizing text in Arabic, and computer program

Info

Publication number: JP2014106961A
Application number: JP2013118680A
Authority: JP
Inventors: Mohammad S. Khorsheed; Hussein K. Al-Omari
Original assignee: King Abdulaziz City for Science and Technology KACST
Current assignee: King Abdulaziz City for Science and Technology KACST
Priority date: 2009-04-27
Filing date: 2013-06-05
Publication date: 2014-06-09
Also published as: US8369612B2; US20130251247A1; US20130077864A1; US8111911B2; US8908961B2; US8761500B2; US8472707B2; US20120087584A1; US20140219562A1; US20100272361A1

Abstract

PROBLEM TO BE SOLVED: To extract text features appropriately when recognizing Arabic text. SOLUTION: A line of Arabic characters is digitized to form a two-dimensional array of pixels, each associated with a pixel value expressed as a binary number. The line of Arabic characters is divided into a plurality of line images, and a plurality of cells, each comprising a group of adjacent pixels, is defined in one of the line images. A binary cell number is formed for each cell by serializing the pixel values of the pixels in that cell. A text feature vector is formed from the binary cell numbers obtained from the cells of the line image, and the text feature vector is sent to a hidden Markov model to recognize the line of Arabic characters.

Description

This patent application is a continuation of, and claims priority to, co-pending U.S. Patent Application No. 13/325,789, entitled "System and Method for Arabic Text Recognition Based on Effective Arabic Text Feature Extraction," filed on December 14, 2011 by the same inventors and assigned to the same assignee. U.S. Patent Application No. 13/325,789 is a continuation of U.S. Patent Application No. 12/430,773, entitled "System and Method for Arabic Text Recognition Based on Effective Arabic Text Feature Extraction," filed on April 27, 2009 by the same inventors, the disclosure of which is incorporated herein by reference.

BACKGROUND OF THE INVENTION
This application relates generally to the automatic recognition of Arabic text.

Text recognition, that is, the automatic reading of text, is a branch of pattern recognition. Its goal is to read printed text with human accuracy, but faster. Many text recognition methods assume that the text can be separated into individual characters. Such techniques work well for typewritten or typeset Latin text, but cannot be applied reliably to cursive scripts such as Arabic. Previous research on recognizing handwritten Arabic text has confirmed the difficulty of segmenting Arabic words into individual characters.

Various classification schemes, such as statistical models, have been applied to Arabic text recognition. However, extracting text features appropriately remains a major obstacle to achieving accurate Arabic text recognition.

SUMMARY OF THE INVENTION
In a general aspect, the present invention relates to a method for automatically recognizing Arabic text. The method includes: obtaining a text image that includes a line of Arabic characters; digitizing the line of Arabic characters to form a two-dimensional array of pixels, each associated with a pixel value expressed as a binary number; dividing the line of Arabic characters into a plurality of line images; defining a plurality of cells in one of the line images, each cell comprising a group of adjacent pixels; serializing the pixel values of the pixels in each of the cells to form a binary cell number; forming a text feature vector according to the binary cell numbers obtained from the cells in that line image; and recognizing the line of Arabic characters by sending the text feature vector to a hidden Markov model (HMM).

In another general aspect, the present invention relates to a method for automatically recognizing Arabic text. The method includes: obtaining a text image that includes a line of Arabic characters; digitizing the line of Arabic characters to form a two-dimensional array of pixels, each associated with a pixel value expressed as a binary number, the array comprising a plurality of rows in a first direction and a plurality of columns in a second direction; counting the frequencies of consecutive pixels having the same pixel value in a column of pixels; forming a text feature vector from the frequency counts obtained from the column of pixels; and recognizing the line of Arabic characters by sending the text feature vector to a hidden Markov model.

In another general aspect, the present invention relates to a method for automatically recognizing Arabic text. The method includes: obtaining a text image that includes a line of Arabic characters; digitizing the line of Arabic characters to form a two-dimensional array of pixels, each associated with a pixel value; dividing the line of Arabic characters into a plurality of line images; reducing at least one of the line images in size to produce a miniaturized line image; serializing the pixel values of the pixels in each column of the miniaturized line image to form a series of serialized numbers, the series forming a text feature vector; and recognizing the line of Arabic characters by sending the text feature vector to a hidden Markov model.

In another general aspect, the present invention relates to a computer program comprising computer-readable program code that causes a computer to: obtain a text image that includes a line of Arabic characters; digitize the line of Arabic characters to form a two-dimensional array of pixels, each associated with a pixel value expressed as a binary number; divide the line of Arabic characters into a plurality of line images; define a plurality of cells in one of the line images, each cell comprising a group of adjacent pixels; serialize the pixel values of the pixels in each of the cells to form a binary cell number; form a text feature vector according to the binary cell numbers obtained from the cells in that line image; and recognize the line of Arabic characters by sending the text feature vector to a hidden Markov model.

Implementations of the system may include one or more of the following. The method may further include converting the binary cell numbers into decimal cell numbers, serializing the decimal cell numbers obtained from the cells in one of the line images to form a series of decimal cell numbers, and forming the text feature vector according to that series. The two-dimensional array of pixels may include a plurality of rows in a first direction and a plurality of columns in a second direction. The line of Arabic characters may be aligned substantially along the first direction, and the line images may be arranged sequentially along the first direction. At least one of the line images may have a height defined by M rows in the first direction and a width defined by N columns in the second direction, where M and N are integers. The two-dimensional array of pixels may include N rows of pixels. N may be in a range from 2 to approximately 100, or from 3 to approximately 10. The pixel values in the two-dimensional array may be expressed as single-bit binary numbers or as multi-bit binary numbers. The hidden Markov model may be implemented using the Hidden Markov Model Toolkit (HTK).

The systems and methods described in this application provide a comprehensive, quantitative, and accurate technique for extracting features from Arabic text. The disclosed Arabic character recognition is more efficient and requires less computation time than some conventional techniques. The disclosed systems and methods are also simpler and easier to use than some conventional techniques.

Although the invention has been particularly shown and described with reference to a number of embodiments, it will be understood by those skilled in the art that various changes in form and detail may be made without departing from the spirit and scope of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and form a part of the application, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.

FIG. 1 is a flowchart illustrating the Arabic text recognition process of the present disclosure.
FIG. 2 illustrates a text image containing Arabic text.
FIG. 3A illustrates dividing a text image into a plurality of line images, each containing a plurality of pixels.
FIG. 3B illustrates the pixels and pixel values in a portion of the line image shown in FIG. 3A.
FIG. 3C illustrates the pixels and pixel values in a portion of the line image shown in FIG. 3A.
FIG. 4 illustrates a text feature extraction method according to the present application.
FIG. 5 is a flowchart illustrating the text feature extraction process shown in FIG. 4.
FIG. 6 illustrates another text feature extraction method according to the present application.
FIGS. 7A to 7D illustrate another text feature extraction method according to the present disclosure.
FIG. 8 is a flowchart illustrating the text feature extraction process shown in FIGS. 7A to 7D.

DETAILED DESCRIPTION OF THE INVENTION
FIG. 1 illustrates the general flow of Arabic text recognition according to the present invention. Referring to FIGS. 1 to 3C, a text image 200 is obtained from an Arabic text document (step 110 in FIG. 1). The Arabic text in the text image 200 may be arranged in a plurality of text lines 211-214, each containing a string of cursive Arabic characters. The text lines 211-214 are divided into a plurality of line images 311-313 (step 120 in FIG. 1). Each line image 311, 312, or 313 is then divided into pixels 321-323, each assigned a pixel value (step 130 in FIG. 1). The width of a line image 311, 312, or 313 may be in a range from 2 to 100 pixels, or from 3 to 10 pixels. A line image 311, 312, or 313 may contain complete characters, partial characters, or connected characters.
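As a rough illustration of step 120, the sketch below cuts a binarized text-line image into consecutive vertical line images of a fixed width (for example, 2 to 100 pixels as stated above). It assumes the image is a list of rows of 0/1 pixel values with 1 as the background; the function name and the right-padding of the last, partial slice are hypothetical choices, not details taken from this disclosure.

```python
from typing import List

Image = List[List[int]]  # rows of pixel values; 1 = white background, 0 = ink


def split_into_line_images(image: Image, width: int, background: int = 1) -> List[Image]:
    """Cut a text-line image into consecutive vertical slices `width` columns wide.

    Assumption: the last slice is padded on the right with `background`
    pixels so that every slice has the same shape.
    """
    if width < 1:
        raise ValueError("width must be a positive number of pixels")
    n_cols = len(image[0])
    slices: List[Image] = []
    for start in range(0, n_cols, width):
        piece = []
        for row in image:
            chunk = row[start:start + width]
            piece.append(chunk + [background] * (width - len(chunk)))
        slices.append(piece)
    return slices
```

Fixed-width slices keep every frame the same shape, which makes the per-frame feature vectors directly comparable when they are later fed to the HMM.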

A pixel value represents the brightness of the text image 200 at a particular pixel location. In some implementations, a high brightness value represents a light image color (low density) at a pixel that may lie in the white background, and a low brightness value represents a dark image color (high density) that may lie within a stroke of an Arabic character. Pixel values may be expressed in different number systems, such as binary, decimal, and hexadecimal.

Referring to FIGS. 3A to 3C, the line image 311 includes an image portion 320 containing a plurality of pixels 321-323. Each of the pixels 321-323 is assigned a binary pixel value of "0" or "1". A pixel value of "1" represents the white background; a pixel value of "0" represents a dark image color (that is, a low brightness value) within a stroke of an Arabic character. The disclosed systems and methods can also accommodate multi-bit binary pixel values, which can represent image density at multiple tone levels (for example, grayscale).
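One simple way to obtain the single-bit pixel values described above from a scanned grayscale image is fixed-threshold binarization. The sketch assumes 8-bit grayscale input and a threshold of 128; both are illustrative assumptions, since the disclosure does not say how the binary values are derived.

```python
from typing import List


def binarize(gray: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Map 8-bit grayscale to the convention above: 1 = light background, 0 = dark stroke.

    The fixed threshold of 128 is an assumption for illustration only.
    """
    return [[1 if v >= threshold else 0 for v in row] for row in gray]
```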

In accordance with the present disclosure, text feature vectors may be extracted from the text line 211 or from the line images 311-313 (step 140 in FIG. 1). Various implementations of text feature extraction are discussed in detail below in connection with FIGS. 4 to 8. The exact form of the text feature vector can vary with the extraction method, as described below.

The feature vectors obtained in step 140 are then sent to a hidden Markov model (HMM) (step 150 in FIG. 1). In the present disclosure, the HMM may be implemented with the Hidden Markov Model Toolkit (HTK), a portable toolkit for building and manipulating hidden Markov models. HTK is lexicon-free and relies on character models and a grammar derived from training samples. The HMM provides a probabilistic interpretation and can tolerate variations in the patterns found in the feature vectors. Most of HTK's functionality is built into library modules available as C source code. These modules are designed to run through a conventional command-line interface, which makes it simple to write scripts that control the execution of the HTK tools.

The HMM can be trained using feature vectors obtained from text images containing known Arabic words, that is, images with known transcriptions (step 160 in FIG. 1). HTK is provided with character models and the ground truth for the training samples. The character-modeling component uses the feature vectors and their corresponding ground truth to estimate the character models. Observations generated from the training samples are used to adjust the model parameters, while observations generated from the test samples are used to evaluate system performance. Each state of the model represents a letter of the alphabet, and each feature vector corresponds to one observation. Using the prepared training data, the HTK training tools can adjust the character model parameters so that the models predict the known transcriptions.

The HMM parameters were estimated from the ground truth for the training image segments. This segmentation can also be applied to contours to find segmentation points, extract features from the resulting segments, and pass the feature vectors on as an observation sequence. Segmentation-based techniques use dynamic programming to match word images against character strings. In the training phase, scanned lines of text are taken as input together with their ground truth, that is, the text corresponding to each image. Each line is then divided into narrow vertical windows, from which feature vectors are extracted.

The trained HMM is used, together with a dictionary and a language model, to recognize the Arabic text represented by the feature vectors (step 170 in FIG. 1). The recognition phase follows the same feature-extraction process as the training phase, and the extracted feature vectors are used together with the various knowledge sources estimated during training to find the character sequence with the highest likelihood. The recognition tool requires a network that describes the transition probabilities from one model to another. A dictionary and a language model can be supplied to the tool to help the recognizer output the correct state sequence.
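Finding the highest-likelihood state sequence for a sequence of observations is classically done with the Viterbi algorithm. The sketch below is a textbook Viterbi search over a small discrete HMM in log space; it is not HTK's own decoder, and the states and probabilities in the usage note are made up purely for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _log(p: float) -> float:
    # log(0) is treated as -inf so that impossible transitions never win
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs: Sequence[int],
            states: List[str],
            start_p: Dict[str, float],
            trans_p: Dict[str, Dict[str, float]],
            emit_p: Dict[str, Dict[int, float]]) -> Tuple[List[str], float]:
    """Return the most likely state sequence for `obs` and its log-probability."""
    # score[s]: best log-probability of any state path ending in s at the current step
    score = {s: _log(start_p[s]) + _log(emit_p[s][obs[0]]) for s in states}
    back: List[Dict[str, str]] = []  # back[t][s]: best predecessor of s at step t + 1
    for o in obs[1:]:
        prev = score
        score, pointers = {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + _log(trans_p[r][s]))
            score[s] = prev[best] + _log(trans_p[best][s]) + _log(emit_p[s][o])
            pointers[s] = best
        back.append(pointers)
    # Trace back from the best final state
    last = max(states, key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]
```

For example, with two toy states "A" and "B", start probabilities 0.6/0.4, transitions A→A 0.7, A→B 0.3, B→A 0.4, B→B 0.6, and emissions A: {0: 0.9, 1: 0.1}, B: {0: 0.2, 1: 0.8}, the observation sequence [0, 1, 1] decodes to ["A", "B", "B"]. Working in log space avoids numerical underflow on long observation sequences.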

In some embodiments, referring to FIGS. 3A to 5, the line images 311-313 are digitized into an array of pixels 321-323, each characterized by a pixel value (step 510 in FIG. 5). The line image 311 is divided into a plurality of cells 410-460, as shown in FIG. 4 (step 520 in FIG. 5). Each of the cells 410-460 contains a group of adjacent pixels, such as a 3×3 array of pixels. For example, cell 420 contains pixels 422 and 423, among others.

Next, the pixel values of each cell are represented by a binary cell number (step 530 in FIG. 5). The pixel values in each cell are first serialized. For example, the nine pixels 422-423 in cell 420 are serialized row by row, in the order of the three consecutive rows, as 1, 1, 1, 1, 0, 0, 1, 0, 0. This series of binary pixel values is then mapped to a 9-bit binary cell number: the pixel value of pixel 422 is mapped to the most significant bit, and the pixel value of pixel 423 to the least significant bit. As a result, the pixel values in cell 420 are represented by the 9-bit binary cell number 111100100 (484 in decimal). In the same way, the pixel values in cells 410-460 are converted into binary cell numbers 480, each in the range 0 to 511.
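The serialization described above amounts to reading the cell row by row and shifting each pixel value into an integer, so the first pixel ends up in the most significant bit. A minimal sketch, assuming a cell is given as a list of rows of 0/1 values (the function name is hypothetical):

```python
from typing import List


def cell_number(cell: List[List[int]]) -> int:
    """Serialize a cell row by row (top row first, left to right) into one number.

    The first pixel read becomes the most significant bit and the last pixel
    the least significant bit, so a 3x3 cell yields a value from 0 to 511.
    """
    value = 0
    for row in cell:
        for bit in row:
            value = (value << 1) | bit
    return value
```

Applied to the cell 420 example, `cell_number([[1, 1, 1], [1, 0, 0], [1, 0, 0]])` gives 484, and `format(484, "09b")` gives the 9-bit string "111100100".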

The binary cell numbers of the cells in line image 311 are then converted into decimal cell numbers 490 (step 540 in FIG. 5). The decimal cell numbers 490 are serialized to form the feature vector for line image 311 (step 550 in FIG. 5). Steps 520-550 are repeated for each of the other line images. The feature vectors from the line images 311-313 are then sent to the hidden Markov model to recognize the Arabic characters in the text line (step 560 in FIG. 5).
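Steps 520 to 550 can be combined into a short routine that tiles a line image into 3×3 cells, computes each cell number, and concatenates the results. This is a sketch under stated assumptions: the cells are visited top to bottom and then left to right, and the image dimensions must be multiples of the cell size; the disclosure specifies neither. Because Python integers carry no radix, the "conversion to decimal" is simply how the numbers are printed.

```python
from typing import List

Image = List[List[int]]  # rows of 0/1 pixels


def cell_number(cell: Image) -> int:
    # Row-by-row serialization; the first pixel is the most significant bit
    value = 0
    for row in cell:
        for bit in row:
            value = (value << 1) | bit
    return value


def line_image_feature_vector(line_image: Image, cell_size: int = 3) -> List[int]:
    """Feature vector of one line image: its cell numbers in serialized order."""
    rows, cols = len(line_image), len(line_image[0])
    if rows % cell_size or cols % cell_size:
        raise ValueError("image dimensions must be multiples of cell_size")
    features = []
    # Assumed traversal order: cells top to bottom, then left to right
    for top in range(0, rows, cell_size):
        for left in range(0, cols, cell_size):
            cell = [r[left:left + cell_size] for r in line_image[top:top + cell_size]]
            features.append(cell_number(cell))  # an int, shown in decimal
    return features
```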

The extraction method described above in conjunction with FIGS. 4 and 5 is one implementation of the text feature extraction in the process of FIG. 1. It should be understood that this text feature extraction method is also compatible with multi-bit pixel values and with other numerical representations of the data string. For example, pixel values may be represented by 3-bit or 5-bit binary numbers that capture grayscale (multi-tone) information in the text image. Multi-bit pixel values can improve the accuracy with which text features are described along the edges of strokes.

Furthermore, instead of binary numbers, pixel values may be represented by any numerical range between a minimum and a maximum value. In some implementations, pixel values may be scaled (normalized) to a predetermined range, such as [0, 1] or [-1, 1], and then quantized. Feature vectors can then be obtained in the same way as in steps 530-550.
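A minimal sketch of the normalize-then-quantize step, assuming 8-bit input values and uniform quantization into 2^bits levels with round-half-up. The scaling formula, the clamping, and the rounding rule are illustrative assumptions rather than details from this disclosure.

```python
def normalize(value: float, max_value: float = 255.0, signed: bool = False) -> float:
    """Scale a raw pixel value in [0, max_value] to [0, 1], or to [-1, 1] if signed."""
    x = value / max_value
    return 2.0 * x - 1.0 if signed else x


def quantize(x: float, bits: int = 3, lo: float = 0.0, hi: float = 1.0) -> int:
    """Uniformly quantize x in [lo, hi] to an integer code in 0 .. 2**bits - 1."""
    levels = (1 << bits) - 1        # e.g. 3 bits -> codes 0..7
    t = (x - lo) / (hi - lo)        # rescale to [0, 1]
    t = min(max(t, 0.0), 1.0)       # clamp out-of-range input
    return int(t * levels + 0.5)    # round half up (t is non-negative here)
```

For a signed range, pass the matching bounds, for example `quantize(normalize(v, signed=True), lo=-1.0)`. A 3-bit code such as this could then replace the single-bit pixel values when forming the cell numbers.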

In some embodiments, referring to FIG. 6, a line image 610 is reduced in resolution (that is, miniaturized) to form a miniaturized line image 620. For example, the line image 610 may have a height of 60 pixels, and the miniaturized line image 620, scaled down by a factor of three, may have a height of 20 pixels. The miniaturized line image 620 is digitized to form an array 630 of pixels, each represented by a pixel value. The pixel values in each column of the array 630 are serialized to form a binary number, and the binary numbers from the different columns form a data string 640 that constitutes the feature vector. The feature vectors obtained from the line images of a text line are sent to the hidden Markov model to recognize the Arabic characters in that text line (step 560 in FIG. 5).
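The FIG. 6 variant can be sketched as two small steps: shrink the binary image, then turn each column into one integer. The disclosure does not say how pixels are merged when shrinking, so the "any ink in the block makes the reduced pixel ink" rule below is an assumption (chosen because it keeps thin strokes visible), as are the function names and the top-pixel-first bit order.

```python
from typing import List

Image = List[List[int]]  # rows of 0/1 pixels, 1 = background, 0 = ink


def downscale(image: Image, factor: int = 3) -> Image:
    """Shrink by `factor` in both directions.

    Assumption: a reduced pixel is ink (0) if any pixel in its factor x factor
    block is ink. Trailing rows/columns that do not fill a block are dropped.
    """
    rows, cols = len(image) // factor, len(image[0]) // factor
    return [[min(image[r * factor + i][c * factor + j]
                 for i in range(factor) for j in range(factor))
             for c in range(cols)]
            for r in range(rows)]


def column_numbers(image: Image) -> List[int]:
    """Serialize each column top to bottom into one integer (top pixel = MSB)."""
    numbers = []
    for c in range(len(image[0])):
        value = 0
        for row in image:
            value = (value << 1) | row[c]
        numbers.append(value)
    return numbers
```

With a 60-pixel-high line image and `factor=3`, each column of the reduced image holds 20 pixels and therefore serializes to a 20-bit number.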

Referring to FIGS. 7A, 7B, and 8, a line image 700 is digitized into an array of pixels (step 810 in FIG. 8), as in step 510 (FIG. 5). The pixels are arranged in a plurality of columns. Each pixel value is a single-bit binary number with the value "1" or "0". The pixel values in each column are serialized to form a column of single-bit binary numbers (step 830 in FIG. 8).

Next, as shown in FIGS. 7C and 7D, the frequencies of consecutive pixels having the same binary pixel value, "0" or "1", are counted (step 840 in FIG. 8). The frequencies are counted up to a cut-off transition number and are tabulated to form frequency counts 750 and 760 (step 850 in FIG. 8). To distinguish between two columns of pixels that have the same number of transitions but complementary pixel values, for example,

the frequency count always starts by counting pixels of value "1" from the topmost pixel of the column. In the left column, therefore, the initial count for pixel value "1" is "0", followed by a count of "3" for pixel value "0". The complementary pixel values in the two columns thus result in the following frequency counts:

It should be understood that the initial count at the start of each column could instead be made for pixel value "0" without departing from the spirit of the present invention.

Each row in the tabulated frequency counts 750, 760 (FIGS. 7C, 7D) represents a transition in pixel value from the white background (pixel value "1") to a dark text region (pixel value "0"), or vice versa. To compress the data, the frequency counts are truncated at the maximum transition number.

The frequency counts in each column of the tabulated frequency counts 750, 760 form a feature vector (step 860 in FIG. 8); in this embodiment, each column can therefore also be referred to as a vector. The feature vectors from the various columns of the line image are sent to the hidden Markov model (step 870 in FIG. 8).
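The counting scheme of FIGS. 7A to 8 is a run-length encoding of each pixel column that always begins with the count of "1" pixels, so a column starting with ink gets a leading count of 0. The sketch below follows that rule; the fixed vector length equal to the cut-off, the zero padding of short columns, and the function names are assumptions made so that every column yields a vector of the same size.

```python
from typing import List


def column_run_lengths(column: List[int], cutoff: int = 6) -> List[int]:
    """Run lengths of a 0/1 column, starting with the count of 1s from the top.

    Assumption: the result is truncated to `cutoff` entries and zero-padded to
    that length. Only the first run can be 0, so the padding is unambiguous.
    """
    counts: List[int] = []
    current, run = 1, 0              # always begin by counting value 1
    for bit in column:
        if bit == current:
            run += 1
        else:
            counts.append(run)       # close the current run (the first may be 0)
            current, run = bit, 1
    counts.append(run)
    counts = counts[:cutoff]         # truncate to compress the data
    return counts + [0] * (cutoff - len(counts))


def line_image_features(image: List[List[int]], cutoff: int = 6) -> List[List[int]]:
    """One run-length feature vector per pixel column of the line image."""
    return [column_run_lengths([row[c] for row in image], cutoff)
            for c in range(len(image[0]))]
```

As in the text, a column reading 0, 0, 0, 1, 1 from the top starts with a count of 0 for "1" followed by a count of 3 for "0", giving [0, 3, 2, 0, 0, 0], whereas its complement 1, 1, 1, 0, 0 gives [3, 2, 0, 0, 0, 0], so the two columns remain distinguishable.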

The maximum transition number is determined by statistical analysis of a large sample of Arabic text. As shown in Table 1, approximately 99.31% of the columns have six or fewer transitions. In other words, the vast majority of text images can be adequately characterized by choosing 6 as the cut-off transition number.
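The statistic behind Table 1 can be reproduced on any corpus by counting, for each pixel column, how many times the value changes between vertically adjacent pixels, then finding the smallest cut-off that covers a target share of columns. The sketch assumes that definition of a transition and a hypothetical target of 99%; it does not reproduce the corpus or the 99.31% figure reported above.

```python
from typing import List


def transitions(column: List[int]) -> int:
    """Number of value changes between vertically adjacent pixels (assumed definition)."""
    return sum(1 for a, b in zip(column, column[1:]) if a != b)


def coverage(columns: List[List[int]], cutoff: int) -> float:
    """Fraction of columns with at most `cutoff` transitions."""
    return sum(1 for c in columns if transitions(c) <= cutoff) / len(columns)


def smallest_cutoff(columns: List[List[int]], target: float = 0.99) -> int:
    """Smallest cut-off whose coverage reaches `target`."""
    if not 0 < target <= 1:
        raise ValueError("target must be in (0, 1]")
    k = 0
    while coverage(columns, k) < target:  # terminates: coverage is 1.0 at the max count
        k += 1
    return k
```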

When building an HMM-based system, the type of feature vector used to train and test the system is defined first. Feature vectors can be classified as continuous or discrete. In a system that uses continuous feature vectors, an array of coefficients (in some cases a matrix) is fed to the model. In a system that uses discrete feature vectors, a single coefficient is fed to the model. A vector quantizer converts continuous vectors into discrete vectors; this is done with the HQuant and HCopy tools supplied with HTK. HQuant builds a codebook from the training data, and HCopy later uses that codebook to generate discrete vectors. The codebook affects system performance according to its size, and its construction is affected by the amount of data used to build it. HQuant builds the codebook with a linear vector quantization algorithm, which is computationally expensive.

In this disclosure, a new method called Unique Vector Quantization (UVQ) is introduced, which reduces computation time and improves system performance. The method reduces the number of feature vectors used to build the codebook with the linear vector quantization algorithm by removing repeated feature vectors, so that only one copy of each feature vector is kept. As shown in Table 2, this greatly reduces the number of feature vectors in the corpus.
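The core of UVQ as described is a deduplication pass before codebook construction. The sketch below keeps the first occurrence of each distinct feature vector in its original order; the function name is hypothetical, and the codebook building itself (done by HQuant in the text) is not reproduced here.

```python
from typing import List, Sequence, Set, Tuple


def unique_vectors(vectors: Sequence[Sequence[int]]) -> List[List[int]]:
    """Keep one copy of each distinct feature vector, preserving first-seen order."""
    seen: Set[Tuple[int, ...]] = set()
    unique: List[List[int]] = []
    for v in vectors:
        key = tuple(v)               # lists are unhashable; tuples can go in a set
        if key not in seen:
            seen.add(key)
            unique.append(list(v))
    return unique
```

Because identical vectors would map to the same codebook region anyway, passing only the unique vectors to the codebook builder shrinks its input without discarding any distinct pattern; the size of the reduction depends on how repetitive the corpus is.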

When an attempt was made to build a codebook from all of the feature vectors of 2000 different line images, it was found that the largest codebook that could be built had a size of 728, and building it took about 9 hours. By contrast, building a codebook of size 1024 from the unique feature vectors alone took 1 hour 30 minutes. The recognition results of these experiments using mono models are shown in Table 3. Using unique feature vectors with the linear vector quantization algorithm allows a larger codebook to be built, makes the computation six times faster, and improves recognition performance.

It should be understood that the method described above is not limited to the specific examples given. The settings can be changed without departing from the spirit of the invention. For example, a cut-off transition number other than six may be selected. The height and width of the line image, as well as the size of the cells in the line image, may differ from the examples above. The form of the text feature vector may vary with the extraction method; for example, the feature vector may take the form of a number expressed in binary, decimal, or another notation.
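To illustrate that cell size and number notation are free parameters, the following hypothetical sketch serializes each cell of a binary line image into a single binary cell number, as described in the abstract, and groups the cell numbers of each vertical strip of cells into one feature vector. The row-major serialization order, the right-to-left strip order, and the handling of leftover pixels are assumptions, not details taken from this disclosure.

```python
from typing import List, Sequence


def cell_feature_vectors(line: Sequence[Sequence[int]], cell_h: int, cell_w: int) -> List[List[int]]:
    """Return one feature vector per vertical strip of cells.

    Each cell (cell_h rows x cell_w columns of 0/1 pixels) is read row by row
    into a bit string and converted to one integer, the "binary cell number".
    """
    rows, cols = len(line), len(line[0])
    vectors: List[List[int]] = []
    # Assumption: strips are visited right to left (Arabic reading order);
    # pixels that do not fill a whole cell are ignored.
    for right in range(cols, cell_w - 1, -cell_w):
        left = right - cell_w
        vector = []
        for top in range(0, rows - cell_h + 1, cell_h):
            bits = "".join(
                str(line[r][c])
                for r in range(top, top + cell_h)
                for c in range(left, right)
            )
            vector.append(int(bits, 2))
        vectors.append(vector)
    return vectors


line = [
    [0, 0, 1, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 1],
]
vectors = cell_feature_vectors(line, cell_h=2, cell_w=2)
print(vectors)                                           # [[14, 1], [1, 11]]
print([[format(n, "04b") for n in v] for v in vectors])  # [['1110', '0001'], ['0001', '1011']]
```

The same integers can be written in decimal or, with `format(n, "04b")` for a 2 x 2 cell, in binary without changing the information they carry; changing `cell_h` and `cell_w` changes only the number of bits per cell.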

The embodiments and modifications disclosed herein are to be considered illustrative in all respects and not restrictive. The scope of the present invention is defined by the claims rather than by the above description, and is intended to include all modifications within the meaning and scope equivalent to the claims. The techniques disclosed in the embodiments and their modifications are intended to be implementable alone or in combination wherever possible.

Claims

1. A computer-implemented method for automatically recognizing Arabic text, comprising:
obtaining a text image containing a line of Arabic characters;
digitizing the line of Arabic characters to form a two-dimensional array of pixels, each associated with a pixel value represented in binary, the two-dimensional array of pixels comprising a plurality of rows in a first direction and a plurality of columns in a second direction;
the method further comprising:
counting the frequency of consecutive pixels having the same pixel value in strings of pixels in a column of pixels, adjacent strings having different pixel values and being separated by a transition, wherein the counting further comprises:
stopping counting the frequency of consecutive pixels having the same pixel value in the column when the number of transitions in the column reaches a predetermined cut-off transition number;
forming a text feature vector using the frequency counts obtained from the strings in the column of pixels; and
recognizing the line of Arabic characters by sending the text feature vector to a hidden Markov model.

2. The computer-implemented method of claim 1, wherein the line of Arabic characters includes a plurality of Arabic words.

3. The computer-implemented method of claim 1, wherein the text feature vector is formed from a series of frequency counts obtained from strings of consecutive pixels in the column of pixels.

4. The computer-implemented method of claim 1, wherein the predetermined cut-off transition number is obtained by statistical analysis of Arabic text prior to the step of digitizing the line of Arabic characters.

5. The computer-implemented method of claim 1, wherein the predetermined cut-off transition number is six.

6. The computer-implemented method of claim 1, wherein the pixel values in the two-dimensional array are each represented by a single-bit binary number.

7. The computer-implemented method of claim 6, wherein counting the frequency comprises:
assigning “0” as the value of a first frequency count when the pixel value of the first one or more pixels in the column is “0”, the first frequency count being followed by the number of consecutive pixels having a pixel value of “0” at the beginning of the column.

8. The computer-implemented method of claim 6, wherein counting the frequency comprises:
assigning “0” as the value of a first frequency count when the pixel value of the one or more pixels at the top of the column is “1”, the first frequency count being followed by the number of consecutive pixels having a pixel value of “1” at the beginning of the column.

9. A computer program comprising program code that causes a computer to execute the following:
obtaining a text image containing a line of Arabic characters;
digitizing the line of Arabic characters to form a two-dimensional array of pixels, each associated with a pixel value represented in binary, the two-dimensional array of pixels having a plurality of rows in a first direction and a plurality of columns in a second direction;
the program code further causing the computer to count the frequency of consecutive pixels having the same pixel value in strings of pixels in a column of pixels, adjacent strings having different pixel values and being separated by a transition, wherein the counting further comprises stopping counting the frequency of consecutive pixels having the same pixel value when the number of transitions in the column reaches a predetermined cut-off transition number;
the program code, stored in the computer, further causing the computer to execute:
forming a text feature vector using the frequency counts obtained from the strings in the column of pixels; and
recognizing the line of Arabic characters by sending the text feature vector to a hidden Markov model.

10. The computer program of claim 9, wherein the line of Arabic characters includes a plurality of Arabic words.

11. The computer program of claim 9, wherein the text feature vector is formed from a series of frequency counts obtained from strings of consecutive pixels in the column of pixels.

12. The computer program of claim 9, wherein the predetermined cut-off transition number is obtained by statistical analysis of Arabic text prior to the step of digitizing the line of Arabic characters.

13. The computer program of claim 9, wherein the predetermined cut-off transition number is six.

14. The computer program of claim 9, wherein the pixel values in the two-dimensional array are each represented by a single-bit binary number.

15. The computer program of claim 9, wherein counting the frequency comprises:
assigning “0” as the value of a first frequency count when the pixel value of the first one or more pixels in the column is “0”, the first frequency count being followed by the number of consecutive pixels having a pixel value of “0” at the beginning of the column.

16. The computer program of claim 9, wherein counting the frequency comprises:
assigning “0” as the value of a first frequency count when the pixel value of the one or more pixels at the top of the column is “1”, the first frequency count being followed by the number of consecutive pixels having a pixel value of “1” at the beginning of the column.
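The column-counting step recited in the claims above can be sketched in Python as follows. This is a hypothetical illustration, not the patented implementation: it follows the convention in which a column whose top pixel is 1 receives a leading count of 0, uses a default cut-off of six, scans columns right to left, and zero-pads each vector to cut-off + 1 entries so that every frame has the same length. The scan order and padding are assumptions.

```python
from typing import List, Sequence


def column_run_lengths(column: Sequence[int], cutoff: int = 6) -> List[int]:
    """Run lengths of equal-valued pixels down one column, starting with a 0-run.

    A column whose top pixel is 1 gets a leading count of 0 (empty 0-run).
    Counting stops when `cutoff` transitions have been seen; the run after
    the last counted transition is not recorded.
    """
    if not column:
        return []
    counts: List[int] = [0] if column[0] == 1 else []
    transitions, run, prev = 0, 0, column[0]
    for pixel in column:
        if pixel != prev:           # a transition closes the current run
            counts.append(run)
            transitions += 1
            if transitions == cutoff:
                return counts
            run, prev = 0, pixel
        run += 1
    counts.append(run)              # last run, reached without hitting the cut-off
    return counts


def line_features(image: Sequence[Sequence[int]], cutoff: int = 6) -> List[List[int]]:
    """One fixed-length vector per pixel column, scanned right to left (assumption)."""
    height, width = len(image), len(image[0])
    features = []
    for c in reversed(range(width)):
        counts = column_run_lengths([image[r][c] for r in range(height)], cutoff)
        # Assumption: zero-pad to cutoff + 1 (leading 0 plus `cutoff` runs).
        features.append(counts + [0] * (cutoff + 1 - len(counts)))
    return features


image = [
    [1, 0],
    [1, 0],
    [0, 1],
    [0, 1],
    [0, 0],
    [1, 0],
]
print(line_features(image))  # [[2, 2, 2, 0, 0, 0, 0], [0, 2, 3, 1, 0, 0, 0]]
```

A statistically chosen cut-off other than six can be passed through the `cutoff` parameter. The mirrored convention, in which a column starting with 0 receives the leading 0 count, would only change the condition used to initialize `counts`.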