JP2012064050A - Image retrieval device, method and program

Info

Publication number: JP2012064050A
Application number: JP2010208555A
Authority: JP
Inventors: Toshihiro Yamazaki; 智弘山崎; Masaru Suzuki; 優鈴木; Shinichiro Hamada; 伸一郎浜田; Yoshiaki Mizuoka; 良彰水岡
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2010-09-16
Filing date: 2010-09-16
Publication date: 2012-03-29
Anticipated expiration: 2030-09-16
Also published as: JP5395770B2

Abstract

PROBLEM TO BE SOLVED: To provide an image retrieval device capable of extracting a character string even when characters are arranged in various directions. SOLUTION: An image retrieval device comprises an image storage part for storing an image; a character recognition part for executing character recognition processing on the image and extracting area information of each character; a metadata storage part for storing the area information of each character as metadata of the image; an input part for inputting a character string for searching for an image; a retrieval part for retrieving the area information of each character stored in the metadata storage part based on each character of the character string; a polygonal line generation part for determining central coordinates corresponding to each character of the character string from the retrieved area information and generating a polygonal line using the plurality of central coordinates; and a score calculating part for determining the linking score of the polygonal line.

Description

Embodiments of the present invention relate to an image search apparatus.

Searching for images by recognizing the characters contained in, for example, bitmap images that a user has taken with a camera has been studied.

Unlike documents with a rigidly defined format, bitmap images are often handled poorly by character recognition, and the accuracy of character string extraction is low. Consequently, search recall decreases when a user searches for bitmap images by character string.

For example, the characters in hanging advertisements in trains and buses, or on signboards, are not necessarily arranged in regular rows and columns. Moreover, characters may run in various directions even within a single image, which makes it difficult to analyze the layout correctly.

In addition, in bitmap images the brightness varies widely and the spacing between characters is often large relative to the character size. As a result, a region that should be a single region is split into many small regions, and it is difficult to extract a word as a single block.

Tomohiko Tsuji and two others, "Camera-based information acquisition system using real-time word recognition technology", February 18, 2009, IEICE Technical Report.

An image search apparatus is provided that can extract a character string even when characters are arranged in various directions.

The image search apparatus according to the present embodiment includes: an image storage unit that stores an image; a character recognition unit that performs character recognition processing on the image and extracts region information for each character; a metadata storage unit that stores the region information for each character as metadata of the image; an input unit for inputting a character string for searching for an image; a search unit that searches the region information for each character stored in the metadata storage unit based on each character of the character string; a polygonal line generation unit that obtains, from the retrieved region information, center coordinates corresponding to each character of the character string and generates a polygonal line using the plurality of center coordinates; and a score calculation unit that obtains a connection score of the polygonal line.

FIG. 1 is a diagram showing the configuration of the image search apparatus 100 according to the present embodiment.
FIG. 2 is a diagram showing the flow of processing in the image region dividing unit 120 and the character recognition unit 130.
FIG. 3 is a diagram showing a processing result of the character recognition unit 130.
FIG. 4 is a diagram showing an example of the region information for each character stored in the metadata storage unit 150.
FIG. 5 is an explanatory diagram for the case where a query containing ambiguity is input to the query input unit 160.
FIG. 6 is a diagram showing the flow of processing in the polygonal line generation unit 180.
FIG. 7 is a diagram showing an example of polygonal lines generated by the polygonal line generation unit 180.
FIG. 8 is a diagram showing example (1) of the connection scoring strategy in the score calculation unit 190.
FIG. 9 is a diagram showing example (2) of the connection scoring strategy in the score calculation unit 190.
FIG. 10 is a diagram showing example (3) of the connection scoring strategy in the score calculation unit 190.
FIG. 11 is a diagram showing example (4) of the connection scoring strategy in the score calculation unit 190.
FIG. 12 is a diagram showing an example of connection scores for the polygonal lines of FIG. 7.
FIG. 13 is a diagram showing a display example of search results in the search result presentation unit 200.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.

FIG. 1 is a diagram showing the configuration of the image search apparatus 100 according to the present embodiment. The image search apparatus 100 includes an image input unit 110 that accepts input of an image from a user, an image region dividing unit 120 that divides the input image into regions of piecewise size, a character recognition unit 130 that performs character recognition processing on each divided region and extracts region information for each character, an image storage unit 140 that stores the input image data itself, and a metadata storage unit 150 that stores the extracted region information for each character as metadata of the image.

The apparatus further includes: a region information search unit 170 that, each time the user gives a query character string to the query input unit 160 to search for an image, searches the metadata storage unit 150 for the per-character region information corresponding to each character contained in the query character string; a polygonal line generation unit 180 that obtains, based on the retrieved region information, center coordinates corresponding to each character of the query character string and generates polygonal lines using these center coordinates; a score calculation unit 190 that obtains a connection score for each generated polygonal line; and a search result presentation unit 200 that ranks the polygonal lines based on the calculated connection scores and presents, as search results, the images stored in the image storage unit 140 with the character regions on the polygonal lines superimposed on them.

The image region dividing unit 120 divides the image received from the user by the image input unit 110 into regions of piecewise size. This is because, in the bitmap images mainly targeted by the present embodiment, existing character recognition often works poorly and the accuracy of extraction as character strings is low. In other words, the recognition accuracy for individual characters is relatively high, but the vertical and horizontal relationships between characters are so complex that they cannot be recognized as correct words. For example, the characters in hanging advertisements in trains and buses, or on signboards, are not necessarily arranged in regular rows and columns, so it is difficult to analyze the layout correctly.

Incidentally, when the layout of a region of an image can be analyzed correctly by an existing layout analysis apparatus, as with a magazine or an exhibition panel, the image can be searched with existing techniques using the recognized character strings for the regions that were analyzed correctly. Therefore, the image region dividing unit 120 may divide into regions of piecewise size only those regions that could not be analyzed correctly by existing techniques.

FIG. 2 is a flowchart showing the flow of processing in the image region dividing unit 120 and the character recognition unit 130. In the image region division processing and the character recognition processing, the minimum region size S is determined first (S200). The minimum region size S may be set in advance as a number of pixels regardless of the image size, or may be determined as a proportion of the vertical or horizontal length of the image to be processed. If the same minimum region size is used for every image, the amount of processing may become very large as the image size increases; if S is determined as a proportion of the vertical or horizontal length of the image, the division may become too coarse as the image size decreases. S may therefore be determined by combining the two. After S is determined, the following is repeated: the size of the target region for character recognition is enlarged step by step in units of S, the enlarged target region is shifted little by little along both the X axis and the Y axis, and the character contained in the target region is extracted.

Specifically, the number of divisions in the X-axis direction, CX = (horizontal length of the image) / S, and the number of divisions in the Y-axis direction, CY = (vertical length of the image) / S, are obtained first (S210). In the example image shown in FIG. 3, CX = 21 and CY = 15. Next, N is set to 1 (S220). N is a value indicating how many multiples of the minimum region size the target region for character recognition has been enlarged to.

While N is incremented by 1, extraction of the character contained in the target region is repeated as follows until N × S exceeds a threshold (S230, S310).

Let I be the position of the target region in the X-axis direction and J its position in the Y-axis direction. Since the size of the target region is N, 0 ≤ I ≤ CX − N and 0 ≤ J ≤ CY − N. Accordingly, I and J are initialized to 0 (S240, S260), character recognition is performed (S280) while each is incremented by 1 (S320, S300), and when a character is recognized the recognition result is stored in the DB (S290).

For example, when N = 1, character recognition is performed on all 21 × 15 of the smallest cells shown in FIG. 3. In the case of FIG. 3, no character corresponds to a cell of this size, so nothing is extracted. Next, when N = 2, character recognition is performed on all 20 × 14 cells, each made up of 2 × 2 of the smallest cells shown in FIG. 3. In the case of FIG. 3, R3, R4, and R5 are characters corresponding to cells of this size, so "タ" (ta), "ナ" (na), and "カ" (ka) are extracted. Further, when N = 3, character recognition is performed on all 19 × 13 cells, each made up of 3 × 3 of the smallest cells shown in FIG. 3. In the case of FIG. 3, R1, R2, and R6 are characters corresponding to cells of this size, so "カ" (ka), "メ" (me), and "ラ" (ra) are extracted.
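The window enumeration described above can be sketched in Python as follows. This is only an illustrative sketch: the function names are hypothetical, the character recognizer itself is omitted, and the stopping condition "N × S exceeds a threshold" is simplified to an assumed upper bound `max_n` on N.

```python
def enumerate_windows(cx, cy, max_n):
    """Yield (n, i, j) for every n-by-n window position on a cx-by-cy grid of minimum cells."""
    for n in range(1, max_n + 1):        # enlarge the target region (S230, S310)
        for i in range(cx - n + 1):      # 0 <= I <= CX - N
            for j in range(cy - n + 1):  # 0 <= J <= CY - N
                yield n, i, j            # character recognition would run here (S280)


def windows_per_size(cx, cy, max_n):
    """Count how many window positions are visited for each window size n."""
    counts = {}
    for n, _, _ in enumerate_windows(cx, cy, max_n):
        counts[n] = counts.get(n, 0) + 1
    return counts


# Grid of FIG. 3: CX = 21, CY = 15
print(windows_per_size(21, 15, 3))  # {1: 315, 2: 280, 3: 247}
```

The counts 315, 280, and 247 correspond to the 21 × 15, 20 × 14, and 19 × 13 cells mentioned in the text.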

FIG. 4 shows an example in which the metadata storage unit 150 stores the per-character region information extracted from the image shown in FIG. 3. The figure shows that six regions with region IDs R1 to R6 have been extracted from the image with image ID P1 stored in the image storage unit 140. For example, region R1 has center coordinates (X1, Y1), the character "カ", a size of 32 pixels, a Gothic font, and the color black. Similarly, region R3 has center coordinates (X3, Y3), the character "ナ", a size of 20 pixels, a Gothic font, and the color red. As FIG. 4 shows, for simplicity the present embodiment stores the region information in the DB so that each character recognition result is unique. When the character recognition result is ambiguous, however, several candidate characters may be stored separately for the same region, each with a confidence value. Also, none of the extracted regions overlap in FIG. 4, but extracted regions may overlap, because with kanji and the like a component of one character may be recognized as another character. Furthermore, if there is information other than size, font, and color that can be extracted with the character recognition apparatus, it may also be stored in the metadata storage unit 150 as per-character region information.

Next, the processing for searching for images using the per-character region information extracted and stored as image metadata in this way will be described. The query input unit 160 receives a query character string for image search from the user and splits it into individual characters. For example, given the query character string "カメラ" (kamera, "camera"), it is split into "カ", "メ", and "ラ". For simplicity, the present embodiment assumes that the input is a query character string in which every character is determined, but a character string containing ambiguity, such as the result of applying character recognition processing to an image, may also be used as input.

FIG. 5 is a diagram for explaining the case where a query containing ambiguity is input to the query input unit 160. For example, when an image such as that shown in FIG. 5 is subjected to character recognition processing, the first and third characters are determined to be "カ" and "ラ", but it is ambiguous whether the second character is "メ" or "ナ". Even in this case, the input can be regarded as multiple query character strings whose connection scores are multiplied by the confidence of each character, for example a connection score of 0.7 for the query character string "カメラ" and 0.3 for the query character string "カナラ". Character strings containing ambiguity are therefore not discussed further below.
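As a minimal sketch of this expansion, an ambiguous query may be represented as a list of candidate (character, confidence) pairs per position and expanded into weighted query strings. The function name and the data layout are assumptions made for illustration; the confidences 0.7 and 0.3 follow the example in the text.

```python
from itertools import product
from math import prod


def expand_query(candidates):
    """Expand per-position (character, confidence) candidates into weighted query strings."""
    expanded = []
    for combo in product(*candidates):
        text = "".join(ch for ch, _ in combo)
        weight = prod(conf for _, conf in combo)  # product of per-character confidences
        expanded.append((text, weight))
    return expanded


queries = expand_query([
    [("カ", 1.0)],
    [("メ", 0.7), ("ナ", 0.3)],  # ambiguous second character
    [("ラ", 1.0)],
])
for text, weight in queries:
    print(text, weight)
```

Each weight would then be multiplied into the connection score obtained for the corresponding query string.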

The split query character string is sent to the region information search unit 170. The region information search unit 170 searches the image metadata stored in the metadata storage unit 150, that is, the per-character region information of each image, and enumerates all image IDs that contain every character of the query character string. For a query character string in which the same character appears more than once, such as "アルバニア" (Arubania, "Albania"), an image does not match as a final search result unless it contains at least two instances of "ア". However, such images can be rejected later by the connection scoring of the polygonal lines described below, so at this stage the search need only check whether all the characters are contained, ignoring how many times each occurs in the image.
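This candidate search amounts to a set-containment test, which can be sketched as below. The metadata layout (image ID mapped to a list of character and center-coordinate pairs) and all coordinates are hypothetical.

```python
def find_candidate_images(metadata, query):
    """Return IDs of images whose recognized characters include every character of the query.

    Occurrence counts are deliberately ignored at this stage, as in the text.
    """
    needed = set(query)
    return [image_id for image_id, regions in metadata.items()
            if needed <= {ch for ch, _ in regions}]


metadata = {
    "P1": [("カ", (10, 10)), ("メ", (30, 10)), ("ラ", (50, 10)), ("カ", (10, 60))],
    "P4": [("カ", (10, 10)), ("メ", (30, 10))],                    # no "ラ"
    "P5": [("ア", (5, 5)), ("ル", (15, 5)), ("バ", (25, 5)), ("ニ", (35, 5))],
}
print(find_candidate_images(metadata, "カメラ"))      # ['P1']
print(find_candidate_images(metadata, "アルバニア"))  # ['P5'], although it has only one "ア"
```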

Next, the processing by which the polygonal line generation unit 180 generates polygonal lines from the images that contain all the characters of the query character string, obtained as described above, will be described. FIG. 6 is a flowchart showing the flow of processing in the polygonal line generation unit 180. Because it involves recursive processing, this flowchart is hard to follow, so an outline is given below using pseudocode. Of the two functions below, GetSs is the main function of the polygonal line generation processing, and GetSsSub is the sub-function that handles the recursion.

GetSs(query string Q, image group Ps containing all characters of Q) {
    polygonal line set Ss = {}  // S600
    foreach (image P : Ps) {  // S610
        I = 0
        S = []
        GetSsSub(P, Q, I, S, Ss)  // S620
    }
    return Ss  // S680
}

GetSsSub(image P, query string Q, index I, polygonal line buffer S, polygonal line set Ss) {
    if (I equals the length of Q) {  // S630
        add polygonal line buffer S to polygonal line set Ss  // S660
        return
    }
    foreach (region R : region information in image P for the I-th character Q[I] of Q) {  // S640, S650
        GetSsSub(P, Q, I + 1, S + center coordinates of R, Ss)  // S670
    }
}

The main GetSs function initializes the set Ss of polygonal lines to be generated (S600) and then calls the GetSsSub function for every image (S610). When calling GetSsSub, it initializes the polygonal line buffer S and sets I = 0 (S620). While incrementing I by 1 (S670), and as long as I is less than the length of the query character string (S630), GetSsSub enumerates all regions corresponding to each character of the query character string (S640) and generates all combinations (S650). Overall, therefore, polygonal lines are generated exhaustively. In the flowchart, the step that increments I by 1 (S670) corresponds to calling GetSsSub and descending one level of recursion, and the step that decrements I by 1 (S710) corresponds to returning from GetSsSub and ascending one level of recursion. Because the polygonal line buffer S only holds data that is still being processed, the last center coordinates must be removed from S whenever the recursion ascends one level (S680). For this reason, each polygonal line completed for the query character string must be added to the polygonal line set Ss (S660).
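The pseudocode can be turned into a runnable Python sketch as follows. The names `get_ss` and `get_ss_sub`, the data layout (image ID mapped to a list of character and center-coordinate pairs), and the coordinates are assumptions for illustration. Instead of building a new buffer with `S + center`, this sketch appends to and pops from a shared buffer, mirroring the removal of the last center coordinates when the recursion ascends.

```python
def get_ss_sub(regions, query, index, buffer, polylines):
    """Recursively extend the buffer with one center per query character."""
    if index == len(query):              # S630: every character has been placed
        polylines.append(list(buffer))   # S660: store a copy of the finished polygonal line
        return
    for ch, center in regions:           # S640, S650: all regions holding query[index]
        if ch == query[index]:
            buffer.append(center)
            get_ss_sub(regions, query, index + 1, buffer, polylines)  # S670
            buffer.pop()                 # drop the last center when ascending one level


def get_ss(query, images):
    """Return, per image ID, all polygonal lines that spell the query."""
    result = {}
    for image_id, regions in images.items():   # S610
        polylines = []
        get_ss_sub(regions, query, 0, [], polylines)
        result[image_id] = polylines
    return result


images = {
    "P1": [("カ", (10, 10)), ("メ", (30, 10)), ("ラ", (50, 10)), ("カ", (10, 60))],
    "P2": [("カ", (10, 10)), ("メ", (30, 10)), ("ラ", (50, 10)),
           ("カ", (10, 60)), ("メ", (30, 60))],
    "P3": [("カ", (10, 10)), ("メ", (20, 30)), ("ラ", (30, 50))],
}
lines = get_ss("カメラ", images)
print({image_id: len(found) for image_id, found in lines.items()})  # {'P1': 2, 'P2': 4, 'P3': 1}
```

Like the pseudocode, this sketch does not prevent the same region from being used for two positions of a query with repeated characters; the text leaves the rejection of such lines to the connection scoring.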

FIG. 7 is a diagram showing an example of polygonal lines generated by the polygonal line generation unit 180. Image P1 contains two instances of "カ", so a set of 2 × 1 × 1 = 2 polygonal lines is generated for the query character string "カメラ". Image P2 contains two instances each of "カ" and "メ", so a set of 2 × 2 × 1 = 4 polygonal lines is generated. Image P3 contains only one instance of each character, so only one polygonal line is generated. Here, "×" reflects the repeated calls of the GetSsSub function.

Next, the processing by which the score calculation unit 190 calculates connection scores for the generated set of polygonal lines will be described. Various methods of calculating the connection score are conceivable. The simplest connection scoring strategy assigns a connection score of 1 or 0 according to whether all the line segments making up a polygonal line lie on a preset figure. Even in a bitmap image, the flow of the reader's gaze is unlikely to change abruptly as long as the character string is human-readable, so the figure may be, for example, a straight line or a circular arc.

However, if the score depends only on whether the segments lie on a straight line or arc, a polygonal line that deviates only slightly also receives a connection score of 0. Therefore, as shown in example (1) of the connection scoring strategy in FIG. 8, two parallel lines or concentric circles that enclose all the line segments of the polygonal line may be found, and the connection score may be calculated by regarding their spacing D as the error from a true straight line or arc.
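For the straight-line case, the spacing D might be estimated as in the following sketch. It is a simplification, not the method of FIG. 8: the two parallel lines are assumed to be parallel to the chord from the first to the last vertex (which is not necessarily the narrowest enclosing band), and the mapping from D to a score, 1 / (1 + D), is an arbitrary illustrative choice.

```python
import math


def band_width(points):
    """Width D of the band, parallel to the first-to-last chord, that contains all vertices."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("first and last vertices coincide")
    # Signed perpendicular distance of each vertex from the chord.
    offsets = [(dx * (y - y0) - dy * (x - x0)) / length for x, y in points]
    return max(offsets) - min(offsets)


def straightness_score(points):
    """Hypothetical score: 1.0 for a perfectly straight line, decreasing as D grows."""
    return 1.0 / (1.0 + band_width(points))


print(band_width([(0, 0), (1, 0), (2, 0)]))          # 0.0
print(straightness_score([(0, 0), (1, 1), (2, 0)]))  # 0.5
```

Because the band is convex, containing all vertices also means it contains all line segments of the polygonal line.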

Conversely, a strategy of lowering the connection score for polygonal lines along which the gaze flows unnaturally is conceivable. For example, line segments that cross each other or form a loop mean that the gaze moves back and forth between characters, which is not a natural flow of gaze. Therefore, as shown in example (2) of the connection scoring strategy in FIG. 9, a polygonal line whose line segments cross or loop may be given a connection score of −∞ and excluded from the search results.
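A crossing check of this kind can be sketched with the standard orientation test for segment intersection. The function names are hypothetical, and the sketch detects only proper crossings between non-adjacent segments; touching segments, collinear overlaps, and loops that merely revisit a vertex are not handled.

```python
def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def _segments_cross(a, b, c, d):
    """True if segments ab and cd cross at a single interior point."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)


def crossing_score(points):
    """Return -inf if any two non-adjacent segments cross, otherwise 0.0."""
    segments = list(zip(points, points[1:]))
    for i in range(len(segments)):
        for j in range(i + 2, len(segments)):  # adjacent segments share a vertex; skip them
            if _segments_cross(*segments[i], *segments[j]):
                return float("-inf")
    return 0.0


print(crossing_score([(0, 0), (2, 2), (2, 0), (0, 2)]))  # -inf
print(crossing_score([(0, 0), (1, 0), (2, 0), (3, 1)]))  # 0.0
```

Adding this value to any other connection score leaves non-crossing lines unchanged and pushes crossing lines to the bottom of the ranking.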

Alternatively, a connection score function may be set in advance according to the direction of a line segment, a connection score may be obtained for each line segment making up the polygonal line, and the sum or maximum of the connection scores of all the line segments may be used. Example (3) of the connection scoring strategy, shown in FIG. 10, is an example of a function that defines the correspondence between the direction (angle) of a line segment and the connection score. In terms of the flow of human gaze, character strings typically run from left to right (0 degrees) or from top to bottom (270 degrees), so the function takes large values at these angles. They almost never run from right to left (180 degrees) or from bottom to top (90 degrees), so the function takes small values at these angles.
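One possible shape for such a direction function is sketched below. The actual curve of FIG. 10 is not given in the text, so the cosine-based function here is purely an assumption chosen to peak at 0 and 270 degrees and to vanish at 90 and 180 degrees. Image coordinates are assumed to have the Y axis pointing downward.

```python
import math


def segment_angle(p, q):
    """Direction of segment pq in degrees: 0 = left to right, 270 = top to bottom."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    return math.degrees(math.atan2(-dy, dx)) % 360.0  # negate dy because image Y grows downward


def direction_score(angle_deg):
    """Hypothetical score: 1 at 0 and 270 degrees, about 0 at 90 and 180 degrees."""
    a = math.radians(angle_deg)
    return max(0.0, math.cos(a), math.cos(a - math.radians(270.0)))


def polyline_direction_score(points):
    """Sum of the per-segment direction scores (the text also allows the maximum)."""
    return sum(direction_score(segment_angle(p, q)) for p, q in zip(points, points[1:]))


print(segment_angle((0, 0), (0, 5)))                           # 270.0
print(polyline_direction_score([(0, 0), (10, 0), (20, 0)]))    # 2.0
```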

In addition, a connection score function may be set in advance according to the difference in angle from the preceding line segment, the difference in length, the ratio of lengths, and so on; a connection score may be obtained between consecutive line segments of the polygonal line; and the sum or maximum of the connection scores between all line segments may be used. Example (4) of the connection scoring strategy, shown in FIG. 11, is an example of functions that define the correspondence between the angle difference, length difference, and length ratio on the one hand and the connection score on the other. In terms of the flow of gaze, the characters of a string are typically evenly spaced, so the functions take larger values as the angle difference and length difference approach 0 and as the length ratio approaches 1. However, projective distortion often occurs in bitmap images, in which case the lengths are not equal but shrink at a constant ratio. For the length ratio, therefore, the function may take its largest value at a value smaller than 1.
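A score between two consecutive segments could, for instance, combine the turning angle and the length ratio as in the sketch below. The functional forms are assumptions, since the curves of FIG. 11 are not given in the text; the `target_ratio` parameter is a hypothetical way to move the peak of the length-ratio term below 1 to allow for projective distortion.

```python
import math


def pair_score(p, q, r, target_ratio=1.0):
    """Hypothetical score for consecutive segments pq and qr, in the range 0..1."""
    ax, ay = q[0] - p[0], q[1] - p[1]
    bx, by = r[0] - q[0], r[1] - q[1]
    len_a, len_b = math.hypot(ax, ay), math.hypot(bx, by)
    if len_a == 0 or len_b == 0:
        return 0.0
    # Unsigned turning angle between the two segments, in 0..pi.
    turn = abs(math.atan2(ax * by - ay * bx, ax * bx + ay * by))
    angle_term = 1.0 - turn / math.pi
    # Equals 1 when len_b / len_a matches target_ratio and decays symmetrically on a log scale.
    ratio_term = math.exp(-abs(math.log((len_b / len_a) / target_ratio)))
    return angle_term * ratio_term


print(pair_score((0, 0), (1, 0), (2, 0)))                     # 1.0: same direction, same length
print(pair_score((0, 0), (1, 0), (1, 1)))                     # 0.5: right-angle turn
print(pair_score((0, 0), (2, 0), (3, 0), target_ratio=0.5))   # 1.0: lengths shrink by half
```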

The description so far has assumed that no other character regions exist on or around the line segments making up a polygonal line, but in general other character regions may exist. However, a line segment crossing another surrounding character region means that the gaze crosses that character region, which is not a natural flow of gaze. In such a case, therefore, the score may be lowered.

Several connection scoring strategies have been described. The score calculation unit 190 of the present embodiment may select any one of them to calculate the connection score, or may use the sum, the maximum, or the like of the connection scores calculated by multiple scoring strategies. In addition, the connection score may be adjusted according to the similarity of the character attributes (size, font, color, etc.) in the region information corresponding to the vertices of the line segments making up the polygonal line.

FIG. 12 is a diagram showing an example of connection scores for the example polygonal lines shown in FIG. 7. The connection scoring strategy of the score calculation unit 190 may be changed each time a query character string is given, but it is normally fixed. The connection score for a given polygonal line is therefore also fixed. That is, if the connection score calculated once for a polygonal line is saved and the saved value is read out when the same query character string is given again, the polygonal line generation processing can be skipped in subsequent searches, which speeds them up.

Finally, the search result presentation unit 200 ranks the polygonal lines based on the connection scores calculated by the score calculation unit 190, obtains the images corresponding to the top-ranked polygonal lines from the image storage unit 140, and displays the character regions corresponding to the polygonal lines superimposed on the images.

FIG. 13 is a diagram showing a display example of search results in the search result presentation unit 200. Various methods of superimposed display are conceivable; in the present embodiment, a region obtained by giving the polygonal line a width equal to the character size is displayed enclosed in a frame. In the example of FIG. 12, the top-ranked polygonal lines S2-1 and S2-3 both relate to the single image P2, but superimposing many polygonal lines makes the display hard to read, so only the polygonal line with the highest connection score is displayed for each image. Because the corresponding region is explicitly superimposed on the image, the user can confirm in what arrangement the query character string appears, so search results can be presented in a way that is easy for the user to understand.
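The ranking and the "one polygonal line per image" rule can be sketched as follows. The function name and the score values are hypothetical (the values of FIG. 12 are not given in the text); only the polygonal line IDs S2-1 and S2-3 and the image ID P2 come from the description.

```python
def best_polyline_per_image(scored):
    """Keep the highest-scoring polygonal line per image and rank images by that score.

    scored: list of (polygonal line ID, image ID, connection score) tuples.
    """
    best = {}
    for polyline_id, image_id, score in scored:
        if image_id not in best or score > best[image_id][1]:
            best[image_id] = (polyline_id, score)
    ranked = sorted(best.items(), key=lambda item: item[1][1], reverse=True)
    return [(image_id, polyline_id, score) for image_id, (polyline_id, score) in ranked]


scored = [
    ("S1-1", "P1", 0.4),   # hypothetical scores
    ("S2-1", "P2", 0.9),
    ("S2-3", "P2", 0.8),
    ("S3-1", "P3", 0.6),
]
print(best_polyline_per_image(scored))
# [('P2', 'S2-1', 0.9), ('P3', 'S3-1', 0.6), ('P1', 'S1-1', 0.4)]
```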

The present embodiment is not limited to the embodiment described above as is; in the implementation stage, the constituent elements can be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the multiple constituent elements disclosed in the above embodiment. For example, some constituent elements may be deleted from all the constituent elements shown in the embodiment. Furthermore, constituent elements of different embodiments may be combined as appropriate.

Description of Symbols

100 Image search apparatus
110 Image input unit
120 Image region dividing unit
130 Character recognition unit
140 Image storage unit
150 Metadata storage unit
160 Query input unit
170 Region information search unit
180 Polygonal line generation unit
190 Score calculation unit
200 Search result presentation unit

Claims

1. An image search apparatus comprising:
an image storage unit that stores an image;
a character recognition unit that performs character recognition processing on the image and extracts region information for each character;
a metadata storage unit that stores the region information for each character as metadata of the image;
an input unit for inputting a character string for searching for an image;
a search unit that searches the region information for each character stored in the metadata storage unit based on each character of the character string;
a polygonal line generation unit that obtains, from the retrieved region information, center coordinates corresponding to each character of the character string and generates a polygonal line using the plurality of center coordinates; and
a score calculation unit that calculates a connection score of the polygonal line.

2. The image search apparatus according to claim 1, further comprising a search result presentation unit that ranks the polygonal lines based on the connection score and presents the character regions on the polygonal lines superimposed on the image.

3. An image search method comprising:
performing character recognition processing on an image stored in an image storage unit and extracting region information for each character;
storing the region information for each character in a metadata storage unit as metadata of the image;
inputting a character string for searching for an image;
searching the region information for each character stored in the metadata storage unit based on each character of the character string;
obtaining, from the retrieved region information, center coordinates corresponding to each character of the character string and generating a polygonal line using the plurality of center coordinates; and
calculating a connection score of the polygonal line.

4. An image search program for causing a computer to function as:
means for storing an image;
means for performing character recognition processing on the image and extracting region information for each character;
metadata storage means for storing the region information for each character as metadata of the image;
input means for inputting a character string for searching for an image;
means for searching the region information for each character stored in the metadata storage means based on each character of the character string;
polygonal line generation means for obtaining, from the retrieved region information, center coordinates corresponding to each character of the character string and generating a polygonal line using the plurality of center coordinates; and
means for calculating a connection score of the polygonal line.