JP2017211976A

JP2017211976A - Image processing device and image processing program

Info

Publication number: JP2017211976A
Application number: JP2016248315A
Authority: JP
Inventors: Rei Endo; Yoshihiko Kawai; Hideki Sumiyoshi; Takahiro Mochizuki; Masaki Sano
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2016-05-19
Filing date: 2016-12-21
Publication date: 2017-11-30
Anticipated expiration: 2036-12-21
Also published as: JP6818539B2

Abstract

PROBLEM TO BE SOLVED: To extract scene characters with high accuracy.
SOLUTION: An image processing device includes an extreme value region extraction unit and a determination unit. The extreme value region extraction unit extracts from an image an extreme value region in which the pixels of a first region and a second region are connected. The first region is contained in the image and consists of a plurality of connected pixels whose pixel values are equal to or less than a first value; the second region is contained in the image and consists of a plurality of connected pixels whose pixel values are equal to or less than a second value larger than the first value. The determination unit determines the degree of character likeness of the shape of a region contained in the image. For an extreme value region extracted from the image, it determines this degree based on the long sides and short sides of a plurality of rectangles that circumscribe the region and are rotated by a plurality of angles in the image plane.
SELECTED DRAWING: Figure 3

Description

The present invention relates to an image processing apparatus and an image processing program.

Conventionally, a technique is known that extracts, from an image captured by a camera, connected components (contiguous clusters of pixels with similar colors) and recognizes the characters appearing in the image (scene characters) based on the extracted connected components.

When detecting such scene characters, a known technique exhaustively extracts extreme value regions (Extremal Regions), i.e., regions whose pixel values are lower (or higher) than those of their surroundings. It takes as the initial region a region containing each pixel (connected component) whose pixel value, such as luminance, is the minimum (or maximum) in the image, and then raises (or lowers) step by step the pixel value up to which pixels are included in the region (Non-Patent Document 1).

Compared with, for example, the conventional method of extracting connected components by binarization, the technique of Non-Patent Document 1 detects more connected components and therefore requires more computation. However, because it can also extract connected components whose contrast with their surroundings is not very high, fewer connected components are missed.

FIG. 1 illustrates the conventional processing flow described in Non-Patent Document 1. In the following, the pixel with the minimum pixel value $v$ is assumed to be the darkest. First, connected components whose pixel value is the minimum ($v = 0$) are extracted. Next, connected components whose pixel values are 0 or 1 ($0 \le v \le t$ with $t = 1$) are extracted. Connected components are then acquired while $t$ is incremented by 1 until the range of $v$ extends from 0 to the maximum value ($0 \le v \le t$ with $t = 255$). In this way, connected components darker than the background can be acquired exhaustively.

Next, in the reverse of the above process, connected components whose pixel value is the maximum ($v = 255$) are extracted. Then connected components whose pixel values are 255 or 254 ($t \le v \le 255$ with $t = 254$) are extracted. Connected components are then acquired while $t$ is decremented by 1 until the range of $v$ extends from 255 down to the minimum value ($t \le v \le 255$ with $t = 0$). In this way, connected components lighter than the background can be acquired exhaustively; for example, white characters on a black background can be extracted.
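The two sweeps of FIG. 1 can be sketched naively in Python as below. This is a minimal illustration under stated assumptions, not the implementation of Non-Patent Document 1: the image is an 8-bit grayscale list of rows, 4-connectivity is assumed, every threshold is relabelled from scratch, and the helper names `components_at_threshold`, `sweep`, and `sweep_inverted` are hypothetical. Sweeping the inverted values $255 - v$ upward is equivalent to the downward sweep $t \le v \le 255$.

```python
from collections import deque


def components_at_threshold(img, t):
    """Connected components (lists of (x, y)) of pixels whose value is <= t.

    4-connectivity is an assumption of this sketch.
    """
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if img[y][x] > t or seen[y][x]:
                continue
            seen[y][x] = True
            queue, comp = deque([(x, y)]), []
            while queue:  # breadth-first flood fill restricted to pixels <= t
                cx, cy = queue.popleft()
                comp.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and not seen[ny][nx] and img[ny][nx] <= t:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            comps.append(comp)
    return comps


def sweep(img, max_value=255):
    """Dark-on-light pass: components of 0 <= v <= t for t = 0, 1, ..., max_value."""
    return {t: components_at_threshold(img, t) for t in range(max_value + 1)}


def sweep_inverted(img, max_value=255):
    """Light-on-dark pass: key s holds the components of max_value - s <= v <= max_value."""
    inverted = [[max_value - v for v in row] for row in img]
    return sweep(inverted, max_value)
```

For the 3×3 image `[[0, 9, 1], [9, 9, 9], [2, 9, 0]]` with `max_value=9`, the dark pass finds 2, 3, and 4 components at $t = 0, 1, 2$ and a single 9-pixel component at $t = 9$. Because it recomputes every component at every threshold, this naive version does the kind of redundant work that the incremental computation described below avoids.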

Features such as the shape of each extracted extreme value region are then computed to determine whether the region has character-like features. Here, the technique of Non-Patent Document 1 reduces computational cost by using features based on incrementally computable descriptors (IC descriptors).

Here, "incrementally computable" means that, given a region (parent region) and a plurality of regions (child regions) into which it is divided, the descriptor of the parent region can be calculated from the descriptors already calculated for the child regions. That is, when the descriptor of a parent region is calculated during extreme value region extraction, the calculation can be skipped for the portions already covered by its child regions. As a result, the computational cost is of order $N$ in the number of pixels $N$ of the image being processed, which enables fast processing.

Next, incrementally computable descriptors will be described with reference to FIG. 2. FIG. 2 illustrates an example of an incrementally computable descriptor in the related art.

FIG. 2 shows how the area (the number of pixels in a region), a representative incrementally computable descriptor, is calculated. Suppose there are child regions c1 and c2 and a parent region p formed by joining them, containing 5, 4, and 9 pixels, respectively.

If the area of each region is computed separately, the number of calculations required equals the total number of pixels, namely 18 (5 + 4 + 9). However, if the areas of c1 and c2 have already been calculated and both regions are contained in the parent region p, the area of p is obtained simply by adding the areas of c1 and c2. A descriptor with this property is said to be incrementally computable. Exploiting the fact that area is incrementally computable reduces the total number of calculations needed to obtain the areas of c1, c2, and p to 10 (5 + 4 + 1).
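The operation counts in the FIG. 2 example can be reproduced with a small sketch. The `Region` class and its `own_pixels` field (pixels not covered by any child) are hypothetical names introduced here; as in the text, counting one pixel or performing one addition is treated as one calculation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Region:
    name: str
    own_pixels: int  # pixels of this region that are not in any child region
    children: List["Region"] = field(default_factory=list)


def post_order(region):
    """Yield children before their parent, the order in which a sweep creates them."""
    for child in region.children:
        yield from post_order(child)
    yield region


def true_area(region):
    return region.own_pixels + sum(true_area(c) for c in region.children)


def direct_cost(root):
    """Counting each region's pixels separately: one calculation per pixel per region."""
    return sum(true_area(r) for r in post_order(root))


def incremental_areas(root):
    """Reuse child areas; only pixels outside every child are counted."""
    areas, ops = {}, 0
    for r in post_order(root):
        ops += max(len(r.children) - 1, 0)  # additions that combine the child areas
        ops += r.own_pixels                 # one calculation per newly added pixel
        areas[r.name] = sum(areas[c.name] for c in r.children) + r.own_pixels
    return areas, ops
```

With `c1 = Region("c1", 5)`, `c2 = Region("c2", 4)`, and `p = Region("p", 0, [c1, c2])`, `direct_cost(p)` returns 18 and `incremental_areas(p)` returns the areas 5, 4, and 9 at a cost of 10, matching the counts above.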

In the technique of Non-Patent Document 1, a primary determination of whether a region is a character is made using features based on incrementally computable descriptors, and a high-precision secondary determination using more complex, computationally expensive features is performed only on the regions judged character-like in the primary determination. This enables fast scene character detection. The incrementally computable descriptors used are the area descriptor $[a]$, the bounding box descriptor $[x_{\min}, y_{\min}, x_{\max}, y_{\max}]$, the perimeter descriptor $[p]$, the Euler number descriptor $[e]$, and the horizontal crossings descriptor $[c_i]$. From these descriptors, four features are derived for the primary determination: the aspect ratio $w/h$ (where $w = x_{\max} - x_{\min}$ and $h = y_{\max} - y_{\min}$), the compactness $\sqrt{a}/p$, the number of holes $1 - e$, and the horizontal crossings feature.
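A minimal sketch of three of the four primary features, computed from descriptor values as defined above. The function name and dictionary keys are hypothetical, and the horizontal crossings feature is left out because its equation is not reproduced in this text. Since $h = y_{\max} - y_{\min}$ is 0 for a one-row region, the sketch guards the division.

```python
import math


def primary_features(area, bbox, perimeter, euler):
    """Aspect ratio, compactness and hole count from IC descriptor values.

    bbox is (x_min, y_min, x_max, y_max); w and h follow the definitions in the text.
    """
    x_min, y_min, x_max, y_max = bbox
    w = x_max - x_min
    h = y_max - y_min
    return {
        "aspect_ratio": w / h if h else math.inf,    # w / h (guarded for one-row regions)
        "compactness": math.sqrt(area) / perimeter,  # sqrt(a) / p
        "holes": 1 - euler,                          # 1 - e
    }
```

For illustrative values $a = 16$, bounding box $(0, 0, 8, 4)$, $p = 16$, and $e = 0$ (a single region enclosing one hole, as in the letter "o"), the sketch gives an aspect ratio of 2.0, a compactness of 0.25, and one hole.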

Here, the horizontal crossings feature is expressed by the following equation.

In the secondary determination, for each region that passes the primary determination, the more computationally expensive hole area ratio, convex hull ratio, and number of inflection points on the outer boundary are calculated and combined with the features used in the primary determination, giving seven features in total.

Non-Patent Document 1: L. Neumann, J. Matas, "Real-Time Scene Text Localization and Recognition", Proc. 25th IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 3538-3545 (2012)

However, the conventional technique of Non-Patent Document 1 has the problem that scene characters with complex shapes, such as kanji and kana, are extracted with lower accuracy than numerals and alphabetic characters.

An object is therefore to extract scene characters with high accuracy.

An image processing apparatus includes an extreme value region extraction unit and a determination unit. The extreme value region extraction unit extracts from an image an extreme value region in which the pixels of a first region and a second region are connected, based on the first region, which is contained in the image and consists of a plurality of connected pixels whose pixel values are equal to or less than a first value, and the second region, which is contained in the image and consists of a plurality of connected pixels whose pixel values are equal to or less than a second value larger than the first value. The determination unit determines the degree of character likeness of the shape of a region contained in the image. The determination unit determines the degree of character likeness of the extreme value region based on the long sides and short sides of a plurality of rectangles that circumscribe the extreme value region extracted from the image and are rotated by a plurality of angles in the image plane.

According to the disclosed technique, scene characters can be extracted with high accuracy.

FIG. 1 is a diagram illustrating the conventional processing flow described in Non-Patent Document 1.
FIG. 2 is a diagram illustrating an example of an incrementally computable descriptor in the related art.
FIG. 3 is a functional block diagram showing the functional configuration of the image processing apparatus according to the present embodiment.
FIG. 4 is a flowchart showing an example of processing performed by the image processing apparatus.
FIG. 5 is a flowchart showing an example of the extreme value region extraction process.
FIG. 6 is a diagram illustrating an example of calculating the rotated bounding rectangle descriptor.
FIG. 7 is a diagram illustrating an example of calculating the rotated aspect ratio feature.
FIG. 8 is a flowchart showing an example of the primary extraction process for character-like extreme value regions.
FIG. 9 is a diagram showing an example of the tree structure of extreme value regions.
FIG. 10 is a flowchart showing an example of the secondary extraction process for character-like extreme value regions.
FIG. 11 is a diagram illustrating the process of skipping the character-likeness determination for child extreme value regions.
FIG. 12 is a diagram showing an example of an original image containing three-dimensional characters.
FIG. 13 is a flowchart showing an example of the three-dimensional side face removal process.
FIG. 14 is a diagram illustrating the process of removing shadow portions that are the side faces of three-dimensional characters.
FIG. 15 is a diagram illustrating the process of removing shadow portions that are the side faces of three-dimensional characters.
FIG. 16 is a flowchart showing an example of the primary extraction process for character-string-like extreme value regions.
FIG. 17 is a flowchart showing an example of the secondary extraction process for character-string-like extreme value regions.
FIG. 18 is a diagram illustrating the secondary extraction process for character-string-like extreme value regions.
FIG. 19 is a flowchart showing an example of the primary extraction process for character-string-like extreme value regions according to the second embodiment.
FIG. 20 is a diagram illustrating the process of searching for the same character string at each of a plurality of angles.
FIG. 21 is a flowchart showing an example of the process of searching for the same character string.
FIG. 22 is a diagram illustrating the process of searching for the same character string.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.

[First Embodiment]
<Functional Configuration of Image Processing Apparatus>
First, the functional configuration of the image processing apparatus according to the present embodiment will be described with reference to the drawings. FIG. 3 is a functional block diagram showing the functional configuration of the image processing apparatus according to the present embodiment.

The image processing apparatus 10 includes an image acquisition unit 11, an extreme value region extraction unit 12, a calculation unit 13, a character degree determination unit 14, a removal unit 15, a character string region detection unit 16, an estimation unit 17, a detection unit 18, a storage unit 19, and an output unit 20.

The image acquisition unit 11 acquires data of an image (scene image) in which characters appear. The image acquired by the image acquisition unit 11 may be, for example, an image captured by an imaging device such as a camera or video camera, or an image stored in advance. The imaging device may be provided inside or outside the image processing apparatus 10. The image may be a still image or a frame of a moving image.

From the image acquired by the image acquisition unit 11, the extreme value region extraction unit 12 extracts a first region, which is a connected cluster of pixels whose values are equal to or less than a first value, and a second region, which is a connected cluster of pixels whose values are equal to or less than a second value larger than the first value. The extreme value region extraction unit 12 then extracts (generates) an extreme value region, which is a connected cluster consisting of the pixels of the first region and the second region.

For a rectangle that circumscribes the first region extracted by the extreme value region extraction unit 12 and is rotated by a predetermined angle in the image plane, the calculation unit 13 calculates the coordinates of each position at which the rectangle touches the first region (first coordinates).

Likewise, for a rectangle that circumscribes the second region extracted by the extreme value region extraction unit 12 and is rotated by the predetermined angle in the image plane, the calculation unit 13 calculates the coordinates of each position at which the rectangle touches the second region (second coordinates).

Based on the first coordinates and the second coordinates, the calculation unit 13 then calculates the aspect ratio of a rectangle that circumscribes the extreme value region extracted by the extreme value region extraction unit 12 and is rotated by the predetermined angle in the image plane.

The character degree determination unit 14 determines the degree of character likeness of the shape of a region.

When the degree of character likeness of the shape of the extreme value region is equal to or greater than a predetermined threshold and the aspect ratio calculated by the calculation unit 13 is within a predetermined range, the character degree determination unit 14 skips determining the degree of character likeness of the shape of the first region.

When the degree of character likeness of the shape of the extreme value region is equal to or greater than the predetermined threshold but the aspect ratio calculated by the calculation unit 13 is outside the predetermined range, the character degree determination unit 14 determines the degree of character likeness of the shape of the first region.
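The skip rule applied by the character degree determination unit 14 can be written as a small predicate. This is a hedged sketch: the function and parameter names are hypothetical, the threshold and range values are not given in the text, and the case in which the extreme value region's own score is below the threshold is not covered above, so the sketch simply assumes that the first region is then evaluated.

```python
def should_evaluate_child(parent_score, parent_aspect, score_threshold, aspect_range):
    """Decide whether the first (child) region still needs its own character-likeness check.

    parent_score and parent_aspect are the character-likeness degree and the rotated
    aspect ratio of the enclosing extreme value region. score_threshold and
    aspect_range stand in for the "predetermined threshold" and "predetermined range".
    """
    low, high = aspect_range
    if parent_score >= score_threshold:
        # Parent looks like a character: skip the child only if the aspect ratio is also in range.
        return not (low <= parent_aspect <= high)
    # Case not described in the text; this sketch assumes the child is evaluated.
    return True
```

The design point is that a confident, well-proportioned parent region makes evaluating its children redundant, which saves work on the region tree.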

The removal unit 15 removes, from the extreme value regions, the regions corresponding to the side faces of three-dimensional characters.

Based on the heights (sizes) of a plurality of extreme value regions, the character string region detection unit 16 detects the extreme value regions contained in a character string in the image acquired by the image acquisition unit 11.

Based on the positions and sizes of the extreme value regions detected by the character string region detection unit 16, the estimation unit 17 estimates the position and size of each character contained in the character string.

Within a range determined by the position and size of each character estimated by the estimation unit 17, the detection unit 18 detects the extreme value regions contained in that character of the character string.

The storage unit 19 stores data such as the extreme value regions detected by the extreme value region extraction unit 12.

The output unit 20 outputs the character-string-like regions detected by the detection unit 18. The output unit 20 may present the positions of these regions to the user on the image, or may output data on their shapes to a character recognition device or the like.

<Processing>
Next, an overview of the processing of the image processing apparatus 10 will be described with reference to FIG. 4. FIG. 4 is a flowchart showing an example of the processing of the image processing apparatus 10.

First, in step S1, the image acquisition unit 11 acquires an image.

Subsequently, in step S2, the extreme value region extraction unit 12 extracts extreme value regions from the image.

Subsequently, in step S3, the character degree determination unit 14 makes a primary determination of whether each extreme value region extracted in step S2 is character-like, and extracts the character-like extreme value regions.

Subsequently, in step S4, the character degree determination unit 14 makes a secondary determination of whether each extreme value region extracted in step S3 is character-like, and extracts the character-like extreme value regions.

Subsequently, in step S5, the character string region detection unit 16 determines whether each extreme value region extracted in step S4 is likely to belong to a character string, and extracts the extreme value regions that are.

Subsequently, in step S6, the detection unit 18 determines in more detail whether each extreme value region extracted in step S5 is likely to belong to a character string, and extracts the extreme value regions that are. Step S6 also extracts character-string-like extreme value regions that were missed in step S5.

≪Extreme Value Region Extraction Process≫
Next, the process of extracting extreme value regions from the image in step S2 will be described with reference to FIG. 5. FIG. 5 is a flowchart showing an example of the extreme value region extraction process.

First, in step S101, the extreme value region extraction unit 12 sets the pixel value to be extracted to the minimum value (for example, 0) and empties the work buffer.

Subsequently, in step S102, the extreme value region extraction unit 12 extracts all connected components consisting of pixels with the pixel value to be extracted.

Subsequently, in step S103, the extreme value region extraction unit 12 selects one of the extracted connected components for processing.

Subsequently, in step S104, a feature calculation unit 121 of the extreme value region extraction unit 12 calculates a rotated bounding rectangle descriptor for the connected component being processed. The method of calculating the rotated bounding rectangle descriptor is described later. In addition to the rotated bounding rectangle descriptor, conventional IC descriptors such as the area descriptor may also be calculated here.

Subsequently, in step S105, the extreme value region extraction unit 12 determines whether references to one or more extreme value regions adjacent to the connected component being processed are stored in the work buffer.

If no reference to an extreme value region adjacent to the connected component being processed is stored in the work buffer (S105: NO), then in step S106 the connected component being processed is stored in the extreme value region storage buffer as a new extreme value region, a reference to the new extreme value region is stored in the work buffer, and the process proceeds to step S112, described later.

If references to extreme value regions adjacent to the connected component being processed are stored in the work buffer (S105: YES), then in step S107 the extreme value region extraction unit 12 deletes the reference to each such adjacent extreme value region from the work buffer.

In step S108, the extreme value region extraction unit 12 joins each extreme value region adjacent to the connected component being processed with that connected component to form a new extreme value region.

Subsequently, in step S109, the feature calculation unit 121 of the extreme value region extraction unit 12 calculates the rotated bounding rectangle descriptor for the new extreme value region.

Subsequently, in step S110, the extreme value region extraction unit 12 stores the new extreme value region in the extreme value region storage buffer as the parent of each extreme value region adjacent to the connected component being processed.

Subsequently, in step S111, the extreme value region extraction unit 12 stores a reference to the new extreme value region in the work buffer.

Subsequently, in step S112, the extreme value region extraction unit 12 determines whether any of the connected components extracted in step S102 has not yet been selected for processing.

If there is a connected component not yet selected (S112: YES), then in step S113 an unselected connected component is selected for processing, and the process returns to step S104.

If there is no connected component left unselected (S112: NO), then in step S114 the extreme value region extraction unit 12 increments the pixel value to be extracted by one.

Subsequently, in step S115, the extreme value region extraction unit 12 determines whether the pixel value to be extracted has exceeded the maximum value (for example, 255).

If the pixel value to be extracted has not exceeded the maximum value (S115: NO), the process returns to step S102.

If the pixel value to be extracted has exceeded the maximum value (S115: YES), the process ends.

The above describes a process that, for example to extract black characters, sets the initial pixel value to be extracted to the minimum value and increments it by 1, thereby exhaustively acquiring connected components darker than the background.

Conversely, for example to also extract white characters, the initial pixel value to be extracted may be set to the maximum value and decremented by 1, thereby also exhaustively acquiring connected components lighter than the background.
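The loop of FIG. 5 (steps S101 to S115) can be sketched in Python as follows. This is an illustrative sketch under explicit assumptions, not the apparatus's implementation: pixel values are integers from 0 to `max_value`, 4-connectivity is used, the work buffer is emulated with a union-find structure (`uf`) plus a pixel-to-region map (`owner`), the function name `extract_extremal_regions` is hypothetical, and, to keep the block short, each region carries only an area and an axis-aligned bounding box instead of the rotated bounding rectangle descriptor. As in the text, the descriptors of a new region are obtained by combining those of its adjacent (child) regions with the newly added component. For the light-on-dark pass, the same function can be run on the inverted image $255 - v$.

```python
from collections import deque


def extract_extremal_regions(img, max_value=255):
    """Return (parent, area, bbox, level) lists indexed by region id."""
    h, w = len(img), len(img[0])
    by_value = [[] for _ in range(max_value + 1)]
    for y in range(h):
        for x in range(w):
            by_value[img[y][x]].append((x, y))

    owner = {}  # pixel -> id of the region that first absorbed it
    uf = []     # union-find links: find(r) is r's current top-level region
    parent, area, bbox, level = [], [], [], []

    def find(r):
        while uf[r] != r:
            uf[r] = uf[uf[r]]  # path halving
            r = uf[r]
        return r

    def neighbours(x, y):
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < w and 0 <= ny < h:
                yield nx, ny

    for t in range(max_value + 1):                     # S101, S114, S115
        pending = set(by_value[t])
        while pending:                                  # S102, S103, S112, S113
            seed = pending.pop()
            comp, queue = [seed], deque([seed])
            while queue:                                # grow one component at level t
                for n in neighbours(*queue.popleft()):
                    if n in pending:
                        pending.remove(n)
                        comp.append(n)
                        queue.append(n)
            xs = [x for x, _ in comp]
            ys = [y for _, y in comp]
            a, box = len(comp), [min(xs), min(ys), max(xs), max(ys)]  # S104
            adjacent = {find(owner[n]) for p in comp for n in neighbours(*p) if n in owner}  # S105
            for r in adjacent:                          # S108, S109: reuse child descriptors
                a += area[r]
                box = [min(box[0], bbox[r][0]), min(box[1], bbox[r][1]),
                       max(box[2], bbox[r][2]), max(box[3], bbox[r][3])]
            new = len(uf)                               # S106 or S108: create the region
            uf.append(new)
            parent.append(None)
            area.append(a)
            bbox.append(box)
            level.append(t)
            for r in adjacent:                          # S107, S110: children get the new parent
                parent[r] = new
                uf[r] = new
            for p in comp:                              # S111: pixels resolve to the new region
                owner[p] = new
    return parent, area, bbox, level
```

For the 3×3 image `[[0, 9, 1], [9, 9, 9], [2, 9, 0]]` with `max_value=9`, four single-pixel regions appear at levels 0, 0, 1, and 2, and all four become children of one 9-pixel region created at level 9.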

≪Rotated Bounding Rectangle Descriptor Calculation Process≫
Next, the process of calculating the rotated bounding rectangle descriptor in steps S104 and S109 will be described.

The feature calculation unit 121 calculates a rotated bounding rectangle descriptor consisting of four values: $d_{\min}$, $d_{\max}$, $\bar{d}_{\min}$, and $\bar{d}_{\max}$.

Let $p_i$ denote each pixel contained in the extreme value region $P$, let $(x_i, y_i)$ be the coordinates of $p_i$, and let $h$ be the height of the input image (its length in the x direction). Then $d_{\min}$, $d_{\max}$, $\bar{d}_{\min}$, and $\bar{d}_{\max}$ are calculated by the following equations, respectively.

Here, when the rotation angle of the desired rotated bounding rectangle descriptor is $\theta$, $c = \tan(\theta)$.
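Because the four equations themselves are not reproduced here, the following is only a sketch under an explicit assumption: it uses the projections $d = c\,x + y$ and $\bar{d} = x - c\,y$ with $c = \tan(\theta)$. These are not confirmed to be the patent's equations (which also involve the image height $h$), but they reduce to $x + y$ and $x - y$ at 45 degrees, matching the upper-left, lower-right, lower-left, and upper-right correspondence described for FIG. 6 when $y$ points downward, and they reduce to the ordinary bounding box at $\theta = 0$. The function name is hypothetical.

```python
import math


def rotated_bbox_descriptor(pixels, theta):
    """Return (d_min, d_max, dbar_min, dbar_max) for pixels [(x, y), ...].

    ASSUMED projections (the original equations are not reproduced here):
        d = c * x + y,  dbar = x - c * y,  with c = tan(theta), -pi/2 < theta < pi/2.
    The lines d = const and dbar = const are perpendicular, so the four extremes
    bound a rectangle whose sides are tilted by theta relative to the image axes.
    """
    c = math.tan(theta)
    d = [c * x + y for x, y in pixels]
    dbar = [x - c * y for x, y in pixels]
    return min(d), max(d), min(dbar), max(dbar)
```

For a one-pixel-wide diagonal stroke `[(0, 0), (1, 1), (2, 2), (3, 3)]`, the axis-aligned box is square, but at $\theta = 45°$ the sketch gives $d$ spanning 0 to 6 and $\bar{d}$ collapsing to 0 (up to floating-point rounding in `tan`), exposing the stroke's thin, elongated shape.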

FIG. 6 illustrates an example of calculating the rotated bounding rectangle descriptor for a rotation angle of 45 degrees. In the coordinate system shown in FIG. 6, calculating $d_{\min}$, $d_{\max}$, $\bar{d}_{\min}$, and $\bar{d}_{\max}$ is equivalent to finding the pixels farthest to the upper left, lower right, lower left, and upper right, respectively.

Furthermore, when calculating the rotated bounding rectangle descriptor of an extreme value region $p$ with pixel values $0 \le v \le t$ ($t = i$), if $p$ contains an extreme value region $c$ with pixel values $0 \le v \le t$ ($t < i$), the pixels overlapping $c$ need not be processed again: it suffices to combine the already-computed descriptor of $c$ with the pixels outside $c$. Consequently, the computational cost of obtaining $d_{\min}$, $d_{\max}$, $\bar{d}_{\min}$, and $\bar{d}_{\max}$ for all extracted extreme value regions is of order $N$, i.e., $O(N)$, for an image with $N$ pixels.
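The incremental property can be checked directly: each component of the descriptor is a minimum or a maximum, so merging a child's stored descriptor with the descriptor of the newly added pixels gives the same result as recomputing over all pixels. The sketch below again assumes the projections $d = c\,x + y$ and $\bar{d} = x - c\,y$ (not confirmed to be the patent's exact equations); `descriptor` and `merge` are hypothetical names.

```python
def descriptor(pixels, c):
    """(d_min, d_max, dbar_min, dbar_max) with the assumed projections d = c*x + y, dbar = x - c*y."""
    d = [c * x + y for x, y in pixels]
    dbar = [x - c * y for x, y in pixels]
    return (min(d), max(d), min(dbar), max(dbar))


def merge(*descs):
    """Combine descriptors of disjoint pixel sets without revisiting any pixel."""
    return (min(t[0] for t in descs), max(t[1] for t in descs),
            min(t[2] for t in descs), max(t[3] for t in descs))
```

With $c = 1$ (45 degrees), a child region `[(0, 0), (1, 0)]` plus newly added pixels `[(2, 0), (2, 1), (3, 3)]` gives `(0, 6, 0, 2)` both by merging and by recomputation. Merging costs a constant amount per child, so each pixel is visited only once over the whole sweep, which is the source of the $O(N)$ bound.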

As a feature derived from the rotated bounding rectangle descriptor, the rotated aspect ratio feature (rotated_aspect), which can be calculated by the following equation, Equation (6), is described next.

We now show that Equation (6) gives the aspect ratio of the bounding rectangle rotated by 45 degrees. First, consider the height and width of the bounding rectangle obtained when the extreme value region is rotated 45 degrees clockwise, expressed in terms of d_min, d_max, d⁻_min and d⁻_max. Here, a bounding rectangle is a rectangle circumscribing the target region, for example the smallest-area rectangle that encloses the region. d_min and d_max are the minimum and maximum values of x + y over the pixels of the extreme value region. In other words, among the lines x + y = n that pass through pixels of the region, they are the values of n for the line closest to the origin and for the line farthest from it. Therefore, the upper side of the bounding rectangle rotated 45 degrees clockwise lies on the line x + y = d_min, and its lower side lies on the line x + y = d_max. The distance between these two parallel lines is the height of the bounding rectangle rotated by 45 degrees.

This distance is obtained with the following formula, Equation (7), for the distance between a point (x_i, y_i) and a line ax + by + c = 0 (here c denotes the constant term of the line, not tan θ): $\dfrac{|a x_i + b y_i + c|}{\sqrt{a^2 + b^2}}$

For example, substituting the point (0, d_max), which lies on the line x + y = d_max, and the line x + y − d_min = 0 into Equation (7) gives $\dfrac{|d_{\max} - d_{\min}|}{\sqrt{2}}$.

Since d_max is never smaller than d_min, the absolute value can be dropped, and the expression simplifies to $\dfrac{d_{\max} - d_{\min}}{\sqrt{2}}$.

This is the height of the bounding rectangle rotated by 45 degrees.

Applying the same reasoning to d⁻_min and d⁻_max, the width of the rotated bounding rectangle at 45 degrees is $\dfrac{d^{-}_{\max} - d^{-}_{\min}}{\sqrt{2}}$.

Taking the ratio of this height and width, the common factor $1/\sqrt{2}$ cancels, which yields Equation (6). The rotated bounding rectangle for angles other than 45 degrees can likewise be obtained with Equation (6).
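The derivation can be checked numerically with a small Python sketch. It computes the 45-degree height three ways (Equation (7), the closed form (d_max − d_min)/√2, and explicit rotation of the pixel coordinates), computes the width the same way, and shows that the 1/√2 factor cancels in their ratio. Pixel coordinates are treated as points, d⁻ is assumed to be x − y plus a constant (which drops out of max − min), and the sample region is made up for illustration.

```python
import math
from typing import List, Tuple


def point_line_distance(x0: float, y0: float, a: float, b: float, c: float) -> float:
    """Equation (7): distance from the point (x0, y0) to the line a*x + b*y + c = 0."""
    return abs(a * x0 + b * y0 + c) / math.hypot(a, b)


def rotated_extents(pixels: List[Tuple[int, int]], theta_deg: float) -> Tuple[float, float]:
    """Extents of the pixel coordinates along (cos t, sin t) and (-sin t, cos t)."""
    t = math.radians(theta_deg)
    u = [x * math.cos(t) + y * math.sin(t) for x, y in pixels]
    v = [-x * math.sin(t) + y * math.cos(t) for x, y in pixels]
    return max(u) - min(u), max(v) - min(v)


# A made-up diagonal stroke, two pixels thick.
pixels = [(0, 0), (1, 1), (2, 2), (3, 3), (1, 0), (2, 1), (3, 2)]
d = [x + y for x, y in pixels]
dbar = [x - y for x, y in pixels]  # any additive constant (such as h) cancels in max - min
d_min, d_max, dbar_min, dbar_max = min(d), max(d), min(dbar), max(dbar)

height_eq7 = point_line_distance(0, d_max, 1, 1, -d_min)  # point (0, d_max), line x + y - d_min = 0
height_closed = (d_max - d_min) / math.sqrt(2)
width_closed = (dbar_max - dbar_min) / math.sqrt(2)
height_rot, width_rot = rotated_extents(pixels, 45)

print(height_eq7, height_closed, height_rot)   # three values of about 4.243
print(width_closed, width_rot)                 # two values of about 0.707
print(height_closed / width_closed, (d_max - d_min) / (dbar_max - dbar_min))  # the 1/sqrt(2) cancels
```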

FIG. 7 illustrates an example of calculating the rotated aspect ratio feature, here the 45-degree rotated aspect ratio obtained from Equation (6). As shown in FIG. 7(A), for a region whose contour is close to a square (low elongation), both the conventional aspect ratio and the 45-degree rotated aspect ratio are close to 1.

In contrast, as shown in FIG. 7(B), for an elongated (highly stretched) region lying diagonally, the conventional aspect ratio is close to 1 while the 45-degree rotated aspect ratio is about 0.53, so the two differ substantially.

Thus, comparing the conventional aspect ratio with the aspect ratio after a 45-degree rotation gives an approximate estimate of how elongated the contour of the extreme value region is. Combining rotated aspect ratios at several angles, rather than using 45 degrees alone, improves this estimate. For character string detection, computing rotated aspect ratios in steps of, for example, 22.5 or 15 degrees already allows characters to be detected with reasonable accuracy. With 22.5-degree steps, three rotated aspect ratios are computed: 22.5, 45 and 67.5 degrees. With 15-degree steps, five are computed: 15, 30, 45, 60 and 75 degrees.
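Rotated aspect ratios at several angles can be sketched in Python as follows. Unlike the c = tan(θ) formulation above, this sketch projects the pixel coordinates explicitly onto the rotated axes (cos θ, sin θ) and (−sin θ, cos θ); at 45 degrees the two extents equal (d_max − d_min)/√2 and (d⁻_max − d⁻_min)/√2. Taking the ratio as the rotated-x extent over the rotated-y extent is an assumption, and the test region is made up (it does not reproduce the 0.53 of FIG. 7).

```python
import math
from typing import List, Sequence, Tuple


def rotated_aspect(pixels: Sequence[Tuple[int, int]], theta_deg: float) -> float:
    """Extent along the rotated x-axis divided by extent along the rotated y-axis."""
    t = math.radians(theta_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    u = [x * cos_t + y * sin_t for x, y in pixels]
    v = [-x * sin_t + y * cos_t for x, y in pixels]
    width, height = max(u) - min(u), max(v) - min(v)
    return math.inf if height == 0 else width / height


def angles_for_step(step_deg: float) -> List[float]:
    """Rotation angles strictly between 0 and 90 degrees, e.g. 22.5 -> [22.5, 45.0, 67.5]."""
    n = round(90 / step_deg)
    return [step_deg * k for k in range(1, n)]


# A diagonal band three pixels thick: clearly elongated, yet its axis-aligned box is square.
band = [(x, y) for x in range(10) for y in range(10) if abs(x - y) <= 1]
print(rotated_aspect(band, 0))  # 1.0 (the conventional aspect ratio)
print([round(rotated_aspect(band, a), 2) for a in angles_for_step(22.5)])  # [2.09, 9.0, 2.09]
```

The conventional ratio alone would call this band square; only the rotated ratios expose its elongation.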

To make the rotated aspect ratio feature easier to handle, it may be transformed, for example by the following equation, so that its value is always 1 or less.
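The transforming equation is not reproduced here. One common way to fold a ratio into the range (0, 1], offered only as an assumption, is min(r, 1/r), so that a region and its 90-degree-rotated counterpart receive the same value. The function name is hypothetical.

```python
import math


def normalize_aspect(ratio: float) -> float:
    """Fold a positive aspect ratio into (0, 1]: r and 1/r map to the same value."""
    if ratio <= 0:
        raise ValueError("aspect ratio must be positive")
    if math.isinf(ratio):
        return 0.0  # a degenerate, zero-thickness extent is treated as maximally elongated
    return min(ratio, 1.0 / ratio)


print(normalize_aspect(2.0), normalize_aspect(0.5), normalize_aspect(1.0))  # 0.5 0.5 1.0
```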

≪Primary extraction of character-like extreme value regions≫
Next, with reference to FIG. 8, the process of step S3 is described: a first-pass judgment of whether each extreme value region looks like a character, followed by extraction of the character-like regions. FIG. 8 is a flowchart showing an example of the primary extraction process for character-like extreme value regions.

At this stage, the extreme value region storage buffer still holds a relatively large number of extreme value regions, so calculating computationally expensive features for every region would take a very long time. Instead, features based on IC descriptors, which are cheap to compute, are calculated for each extreme value region, and regions that these features show to be clearly not characters are deleted from the tree. The character degree judgment based on these features may use a model trained in advance by machine learning, or a judgment rule in which a fixed threshold is set manually for each feature.

In step S201, the character degree determination unit 14 selects, as the processing target, one of the extreme value regions that the extreme value region extraction unit 12 has extracted and stored in the extreme value region storage buffer.

Next, in step S202, the character degree determination unit 14 calculates primary features from the IC descriptors of the target extreme value region, including its rotated bounding rectangle descriptor. Compared with the prior art, this adds one more feature to the primary extraction and thereby improves its performance.

Next, in step S203, the character degree determination unit 14 calculates, from the primary features, the degree to which the target extreme value region looks like a character (the character degree).

Next, in step S204, the character degree determination unit 14 determines whether the calculated character degree is greater than a predetermined threshold.

If the calculated character degree is not greater than the threshold (S204: NO), the process proceeds to step S207.

If the calculated character degree is greater than the threshold (S204: YES), in step S205 the character degree determination unit 14 determines whether the similarity between the target extreme value region and its parent extreme value region is equal to or greater than a predetermined threshold. The similarity may be calculated from the IC descriptor values of the parent and child extreme value regions.

If the similarity to the parent extreme value region is below the threshold (S205: NO), the process proceeds to step S208.

If the similarity to the parent extreme value region is equal to or greater than the threshold (S205: YES), in step S206 the character degree determination unit 14 determines whether the character degree of the parent is greater than that of the target extreme value region.

If the parent's character degree is greater (S206: YES), the target extreme value region is deleted from the extreme value region storage buffer in step S207.

If the parent's character degree is not greater (S206: NO), in step S208 the character degree determination unit 14 determines whether any of the extreme value regions stored in the extreme value region storage buffer has not yet been selected.

If there is an unselected extreme value region (S208: YES), it is selected as the processing target in step S209, and the process returns to step S202.

If there is no unselected extreme value region (S208: NO), the process ends.

As a result, extreme value regions with a low character degree are deleted from the tree of extreme value regions. In addition, when a parent and child extreme value region are similar, the child is deleted from the tree if its character degree is lower than the parent's.
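Steps S201 to S209 can be summarized as a single pass over the stored regions. In the Python sketch below, the character-degree function (IC-descriptor features plus a trained or rule-based judge) and the similarity function are passed in as hypothetical callables. Two points are assumptions not stated in the text: a region without a parent skips the S205/S206 comparison, and each surviving region is re-linked to its nearest surviving ancestor so that the tree stays usable for the secondary extraction.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Region:
    rid: int
    parent: Optional[int]  # None for a root of the extreme value region tree


def primary_extraction(
    regions: Dict[int, Region],
    score: Callable[[Region], float],               # S202-S203: cheap character degree
    similarity: Callable[[Region, Region], float],  # S205: similarity of IC descriptor values
    score_thr: float,
    sim_thr: float,
) -> Dict[int, Optional[int]]:
    """Return the surviving region ids mapped to their nearest surviving ancestor."""
    deleted = set()
    for region in regions.values():                 # S201, S208, S209: visit every region once
        s = score(region)
        if s <= score_thr:                          # S204: NO -> S207
            deleted.add(region.rid)
            continue
        if region.parent is None:                   # assumption: roots skip S205/S206
            continue
        parent = regions[region.parent]
        if similarity(region, parent) >= sim_thr and score(parent) > s:  # S205, S206: YES
            deleted.add(region.rid)                 # S207

    def surviving_ancestor(rid: int) -> Optional[int]:
        p = regions[rid].parent
        while p is not None and p in deleted:       # skip over deleted ancestors
            p = regions[p].parent
        return p

    return {rid: surviving_ancestor(rid) for rid in regions if rid not in deleted}


regions = {0: Region(0, None), 1: Region(1, 0), 2: Region(2, 1), 3: Region(3, 0)}
scores = {0: 0.9, 1: 0.5, 2: 0.8, 3: 0.1}
sims = {(1, 0): 0.95, (2, 1): 0.2}
kept = primary_extraction(regions, lambda r: scores[r.rid],
                          lambda a, b: sims[(a.rid, b.rid)], 0.3, 0.9)
print(kept)  # {0: None, 2: 0}: region 1 is too similar to a better parent, region 3 scores too low
```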

Through the extreme value region extraction process described above, each extreme value region is stored in the extreme value region storage buffer in association with its parent extreme value region. The primary extraction process then uses these parent associations to delete from the buffer the regions whose character degree is low and the regions whose IC descriptor values are close to those of their parent but whose character degree is lower. FIG. 9 shows an example of the resulting tree structure of extreme value regions, i.e., the associations between the extreme value regions stored in the buffer and their parent extreme value regions.

≪Secondary extraction of character-like extreme value regions≫
Next, with reference to FIG. 10, the process of step S4 is described: a second-pass judgment of whether each extreme value region looks like a character, followed by extraction of the character-like regions. FIG. 10 is a flowchart showing an example of the secondary extraction process for character-like extreme value regions.

In step S301, the character degree determination unit 14 first clears (empties) the work buffer.

Next, in step S302, the character degree determination unit 14 takes every extreme value region that has no associated parent from the extreme value region storage buffer, which holds the regions retained by the primary extraction process described above, and stores it in the work buffer. These parentless regions are the top-level (root) extreme value regions 501 in the tree of FIG. 9.

Next, in step S303, the character degree determination unit 14 selects one of the extreme value regions stored in the work buffer as the processing target.

Next, in step S304, the character degree determination unit 14 removes the reference to the target extreme value region from the work buffer.

Next, in step S305, the character degree determination unit 14 calculates the character degree of the target extreme value region. Step S305 may use features that are more expensive to compute than those used in step S203, for example, as in the prior art, the hole area ratio feature, the convex hull ratio feature, and the number of inflection points on the outer boundary. IC descriptors, including the rotated bounding rectangle descriptor, may be used in addition. Because the primary extraction process has already reduced the number of extreme value regions to some extent, these expensive features are computed only for the remaining regions.

Next, in step S306, the character degree determination unit 14 determines whether the calculated character degree is greater than a predetermined threshold.

If the calculated character degree is not greater than the threshold (S306: NO), the three-dimensional side surface removal process described later is performed in step S307.

Next, in step S308, the character degree determination unit 14 takes the extreme value region produced by the three-dimensional side surface removal process as the new processing target and calculates its character degree. This may be done in the same way as in step S305.

Next, in step S309, the character degree determination unit 14 determines whether the calculated character degree is greater than a predetermined threshold. This may be done in the same way as in step S306.

If the calculated character degree is greater than the threshold (S309: YES), the process proceeds to step S310, described below.

If the calculated character degree is not greater than the threshold (S309: NO), the process proceeds to step S313, described below.

If step S306 finds the character degree greater than the threshold (S306: YES), in step S310 the character degree determination unit 14 determines whether the target extreme value region is elongated (has a high degree of elongation). This determination is described later.

If the target extreme value region is not elongated (S310: NO), in step S311 the character degree determination unit 14 adds a reference to the target region to the first list, and the process proceeds to step S314, described below. The first list thus holds character-like extreme value regions.

If the target extreme value region is elongated (S310: YES), in step S312 the character degree determination unit 14 adds the target region to the second list. The second list thus holds extreme value regions that are both character-like and elongated.

Next, in step S313, the character degree determination unit 14 adds the child extreme value regions of the target region to the work buffer.

Next, in step S314, the character degree determination unit 14 determines whether the work buffer contains an unselected extreme value region.

If the work buffer contains an unselected extreme value region (S314: YES), it is selected as the processing target in step S315, and the process returns to step S304.

If the work buffer contains no unselected extreme value region (S314: NO), the process ends.

In this way, the extreme value regions stored as a tree in the extreme value region storage buffer are judged from the root downward, and for an extreme value region with a high character degree, the character degree judgment of its child extreme value regions is skipped. This is because the children of a region with a high character degree can be assumed to contain only part of a character.

For elongated extreme value regions, however, the character degree judgment of their children is not skipped. This prevents an extreme value region in which several characters are joined from being misjudged as a single character.
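The traversal of steps S301 to S315 can be sketched in Python with a queue as the work buffer. The scoring, side-removal and elongation functions are hypothetical callables. As a simplification, the three-dimensional side surface removal (S307 to S308) is modeled as a re-score of the same region id, whereas the text replaces the target with the side-removed region. Breadth-first order is also just one choice, since the flowchart only requires selecting some unselected region.

```python
from collections import deque
from typing import Callable, Dict, List, Sequence, Tuple


def secondary_extraction(
    roots: Sequence[int],
    children: Dict[int, List[int]],
    score: Callable[[int], float],                     # S305: (more expensive) character degree
    score_after_side_removal: Callable[[int], float],  # S307-S308, simplified to a re-score
    is_elongated: Callable[[int], bool],               # S310
    thr: float,
) -> Tuple[List[int], List[int]]:
    first: List[int] = []   # character-like, not elongated: children are skipped
    second: List[int] = []  # character-like and elongated: children are still examined
    work = deque(roots)     # S301-S302: the work buffer starts with the root regions
    while work:             # S314
        rid = work.popleft()  # S303/S315 select, S304 remove from the buffer
        if score(rid) <= thr and score_after_side_removal(rid) <= thr:  # S306: NO, S309: NO
            work.extend(children.get(rid, []))                          # S313
            continue
        if is_elongated(rid):                    # S310: YES
            second.append(rid)                   # S312
            work.extend(children.get(rid, []))   # S313
        else:                                    # S310: NO
            first.append(rid)                    # S311


    return first, second


scores = {0: 0.2, 1: 0.9, 2: 0.8, 3: 0.9, 4: 0.7, 5: 0.1}
rescored = {0: 0.1, 5: 0.6}
first, second = secondary_extraction(
    [0], {0: [1, 2], 1: [3], 2: [4, 5]},
    scores.__getitem__, rescored.__getitem__, lambda r: r == 2, thr=0.5)
print(first, second)  # [1, 4, 5] [2]; region 3 is never visited because its parent 1 was accepted
```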

FIG. 11 illustrates an example in which the character degree judgment of child extreme value regions is skipped. As shown in FIG. 11, combining the character degree with the judgment of whether an extreme value region is elongated makes it possible to detect single-character regions and skip unnecessary judgments, while avoiding false detection of regions in which several characters are joined.

≪Three-dimensional side surface removal process≫
Next, the three-dimensional side surface removal process of step S307 is described with reference to FIGS. 12 to 15. FIG. 12 shows an example of an original image containing three-dimensional characters. The prior art described in Non-Patent Document 1 can extract only regions in which all background pixels adjacent to the connected component are darker, or all are lighter, than the connected component.

Consequently, when black three-dimensional characters stand out from a white wall, as in FIG. 12, the background (wall surface) is lighter than the characters, but the shadowed parts of the raised characters (their side surfaces) are darker than the characters. In this case, if the initial pixel value v used for extracting connected components is the minimum (v = 0), the shadowed parts are extracted at a relatively early stage. As a result, every extracted extreme value region that contains a three-dimensional character necessarily also contains the character's shadowed part.

Conversely, if the initial pixel value v is the maximum (v = 255), the background (wall surface) around the characters is extracted at a relatively early stage. As a result, every extracted extreme value region that contains a three-dimensional character necessarily also contains the background around it.

Because the three-dimensional character is not separated from its shadowed part or from the background, character recognition based on the shape of a detected extreme value region, for example, may fail.

The removal unit 15 therefore performs the following three-dimensional side surface removal process. FIG. 13 is a flowchart showing an example of this process.

In step S307-1, the removal unit 15 divides the pixels included in the target extreme value region (the pixels that make up the region) into several groups according to their pixel values. For example, the removal unit 15 analyzes the histogram of those pixels and divides them at thresholds chosen so that the deviation within each group is small and the difference in pixel values between groups (for example, between the mean pixel values of the groups) is large. The number of groups is not limited to two; the pixels may be divided into three or more groups.

Next, in step S307-2, the removal unit 15 removes, from the separated groups (components), the group containing the shadowed part of the three-dimensional character.

If the target extreme value region was extracted with the initial pixel value v set to the minimum (v = 0), the group with lower pixel values is removed. This is because the extreme value region that would result from removing the group with higher pixel values has already been obtained at a relatively early stage of connected-component extraction.

If the target extreme value region was extracted with the initial pixel value v set to the maximum (v = 255), the group with higher pixel values is removed.
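For the two-group case, the histogram split described for step S307-1 closely matches Otsu's method (choosing the threshold that maximizes the between-class variance), so the Python sketch below uses Otsu's method as a stand-in. It then drops the darker group if the region was grown from v = 0 and the brighter group if it was grown from v = 255. The pixel representation and function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (x, y, 8-bit pixel value)


def otsu_threshold(values: Sequence[int]) -> Optional[int]:
    """Threshold t maximizing between-class variance; the groups are v <= t and v > t."""
    hist = [0] * 256
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * n for i, n in enumerate(hist))
    best_t: Optional[int] = None
    best_var = 0.0
    w0 = 0
    sum0 = 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one side is empty: not a valid split
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def remove_side_group(pixels: Sequence[Pixel], extracted_dark_first: bool) -> List[Pixel]:
    """Drop the darker group if the region was grown from v = 0, else the brighter one."""
    t = otsu_threshold([v for _, _, v in pixels])
    if t is None:  # only one pixel value present: nothing to separate
        return list(pixels)
    if extracted_dark_first:
        return [p for p in pixels if p[2] > t]
    return [p for p in pixels if p[2] <= t]


shadow = [(0, i, 20) for i in range(3)]  # darker side-surface pixels
glyph = [(1, i, 80) for i in range(5)]   # character-face pixels
print(remove_side_group(shadow + glyph, extracted_dark_first=True))  # only the glyph pixels remain
```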

FIG. 14 illustrates the process of removing the shadowed part that forms the side surface of a three-dimensional character. It shows an example histogram of the pixels in the target extreme value region, with the pixel value on the horizontal axis and the number of pixels on the vertical axis. The removal unit 15 separates the pixels of the target region into a shadow portion 511 and a portion 512 representing the character shape, and removes the pixels of the shadow portion 511 from the target extreme value region.

FIG. 15 also illustrates the process of removing the shadowed part that forms the side surface of a three-dimensional character. FIG. 15(A) is an example original image in which a three-dimensional object shaped like the character "い" is attached to a white wall. FIG. 15(B) shows the target extreme value region extracted from the original image. FIG. 15(C) shows the pixels of the target region divided into groups 511 and 512 according to their pixel values. FIG. 15(D) shows the extreme value region after the group containing the shadow portion 511 of the three-dimensional character has been removed.

When step S307-1 divides the pixels into three or more groups, several extreme value regions may be extracted, for example one with the lowest-valued (or highest-valued) group removed and another with the two lowest-valued (or highest-valued) groups removed.
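With three or more groups, the candidate regions described above can be generated as in the following Python sketch. It assumes the group thresholds have already been chosen (for example by a multi-level extension of Otsu's method, which is not shown); the function name and threshold convention are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (x, y, pixel value)


def removal_candidates(pixels: Sequence[Pixel], thresholds: Sequence[int],
                       extracted_dark_first: bool) -> List[List[Pixel]]:
    """Thresholds t0 < t1 < ... define groups v <= t0, t0 < v <= t1, ..., v > t_last.

    Returns one candidate region for each k = 1 .. (number of groups - 1), obtained by
    removing the k darkest groups (or, when extracted from v = 255, the k brightest).
    """
    bounds = sorted(thresholds)
    n_groups = len(bounds) + 1
    group = [bisect_left(bounds, v) for _, _, v in pixels]  # group index of each pixel
    candidates = []
    for k in range(1, n_groups):
        if extracted_dark_first:
            keep = [p for p, g in zip(pixels, group) if g >= k]
        else:
            keep = [p for p, g in zip(pixels, group) if g < n_groups - k]
        candidates.append(keep)
    return candidates


pixels = [(0, 0, 10), (1, 0, 10), (2, 0, 30), (3, 0, 60), (4, 0, 60)]
for region in removal_candidates(pixels, [20, 50], extracted_dark_first=True):
    print([v for _, _, v in region])  # [30, 60, 60], then [60, 60]
```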

The three-dimensional side surface removal process may be omitted when the shape of the target extreme value region clearly shows that it does not contain a character.

≪Modification of the three-dimensional side surface removal process≫
Instead of separating connected components and then removing them, extreme value regions may be extracted from the target extreme value region in the reverse order from the one used to extract the target region itself. For example, if the target region was extracted with the initial pixel value v set to the minimum (v = 0), extreme value regions are extracted from the pixels of the target region with the initial value of v set to the maximum (v = 255).

In this case, instead of steps S308 and S309, the primary extraction process for character-like extreme value regions described above is applied to the tree of extreme value regions extracted in this reverse order. The regions it retains are treated as child extreme value regions of the target region, and the process proceeds to step S313.

≪Determination of elongation (whether a region is elongated)≫
Next, the process of step S310, which determines whether an extreme value region is elongated, is described.

When several characters are adjacent, the tree may contain an extreme value region in which those characters are joined. Such a region has many character-like features and may therefore be judged character-like with high confidence. To detect one extreme value region per character, the judgment of the child (lower-level) extreme value regions is therefore not skipped when the aspect ratio is not close to 1, that is, when the region is elongated.

When, for example, the camera or the subject is tilted at capture time, the image contains rotated kanji, kana or other characters, and an extreme value region in which several characters are joined may nevertheless have an aspect ratio close to 1. Using the rotated aspect ratio feature described above, such a rotated region of joined characters can be judged to be elongated.

For example, a region may be judged not elongated when the aspect ratio and all of the rotated aspect ratios fall within the range 0.8 to 1.2.
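The example rule can be sketched in Python as follows, using 22.5-degree steps and the 0.8 to 1.2 range from the text as defaults. The aspect ratios are computed by projecting pixel coordinates onto rotated axes (an assumed formulation, with the ratio taken as rotated-x extent over rotated-y extent), and 0 degrees stands for the conventional aspect ratio. The function names and sample regions are hypothetical.

```python
import math
from typing import Sequence, Tuple


def rotated_aspect(pixels: Sequence[Tuple[int, int]], theta_deg: float) -> float:
    """Extent along the rotated x-axis divided by extent along the rotated y-axis."""
    t = math.radians(theta_deg)
    u = [x * math.cos(t) + y * math.sin(t) for x, y in pixels]
    v = [-x * math.sin(t) + y * math.cos(t) for x, y in pixels]
    height = max(v) - min(v)
    return math.inf if height == 0 else (max(u) - min(u)) / height


def is_elongated(pixels: Sequence[Tuple[int, int]],
                 angles: Sequence[float] = (22.5, 45.0, 67.5),
                 lo: float = 0.8, hi: float = 1.2) -> bool:
    """Not elongated only if the plain and every rotated aspect ratio lie in [lo, hi]."""
    ratios = [rotated_aspect(pixels, a) for a in (0.0, *angles)]
    return not all(lo <= r <= hi for r in ratios)


square = [(x, y) for x in range(5) for y in range(5)]
band = [(x, y) for x in range(10) for y in range(10) if abs(x - y) <= 1]  # tilted and elongated
bar = [(x, y) for x in range(10) for y in range(2)]                       # axis-aligned bar
print(is_elongated(square), is_elongated(band), is_elongated(bar))  # False True True
```

The diagonal band passes the 0-degree check but fails at 22.5 degrees, which is exactly the tilted case the rotated aspect ratios are meant to catch.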

≪Primary extraction of character-string-like extreme value regions≫
Next, with reference to FIG. 16, the process of step S5 is described: a judgment of whether extreme value regions look like part of a character string, followed by extraction of the regions that do. FIG. 16 is a flowchart showing an example of the primary extraction process for character-string-like extreme value regions.

In step S401, the character string region detection unit 16 empties the work buffer.

Next, in step S402, the character string region detection unit 16 selects one extreme value region ER_i as the processing target. It is chosen from the extreme value regions stored in the first list and the second list described above. Alternatively, only the first list may be used in step S402. The two options trade accuracy against speed:

- **Both lists.** Accuracy in extracting scene characters is relatively high, but processing time is relatively long.
- **First list only.** Processing time is shorter, but accuracy is somewhat lower.

Specifically, the first list contains no elongated extreme value regions. So when only the first list is used in step S402, strings composed solely of elongated extreme value regions cannot be detected, such as "iij" or "いに". In practice, however, such strings are almost nonexistent among Japanese scene characters, so this is not considered a problem. The same is expected to hold for English, Chinese, and other languages.

Moreover, suppose even one character of a string has a non-elongated extreme value region. Then steps S404 to S405, described below, can detect the elongated extreme value regions around it from the second list. The accuracy of scene character extraction therefore does not decrease.

Next, in step S403, the character string region detection unit 16 creates an array ERline_i whose element is the extreme value region being processed.

Next, in step S404, the character string region detection unit 16 selects another extreme value region ER_j as the processing target. It is chosen from the extreme value regions stored in the first list and the second list.

Next, in step S405, the character string region detection unit 16 determines whether the other extreme value region ER_j is likely to belong to the same character string as any element of the array ERline_i.

The character string region detection unit 16 uses the following cues to judge that two extreme value regions are likely to belong to the same character string:

- **Position.** The regions are close, for example when the distance between their position coordinates is at most a predetermined threshold.
- **Height.** Their heights are close. Here the height of each extreme value region may be its height in the oblique direction, calculated with the rotated bounding rectangle descriptor described above.
- **Color.** Their colors (pixel values) are close.
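The closeness criteria used in step S405 might be combined as in the sketch below. The source lists position, height, and color but does not say how they are combined or give threshold values. Several elements are therefore assumptions for illustration:

- requiring all three criteria at once;
- the `Region` fields;
- the hypothetical function name `likely_same_string`;
- the numeric defaults.

```python
import math
from dataclasses import dataclass

@dataclass
class Region:
    cx: float      # centroid x in pixels
    cy: float      # centroid y in pixels
    height: float  # e.g. oblique height from a rotated bounding rectangle
    color: float   # mean pixel value of the region

def likely_same_string(a, b, max_dist=40.0, max_height_ratio=1.5, max_color_diff=30.0):
    """Hypothetical S405 test: positions, heights, and colors must all be close."""
    close_position = math.hypot(a.cx - b.cx, a.cy - b.cy) <= max_dist
    taller, shorter = max(a.height, b.height), min(a.height, b.height)
    close_height = shorter > 0 and taller / shorter <= max_height_ratio
    close_color = abs(a.color - b.color) <= max_color_diff
    return close_position and close_height and close_color
```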

If the other extreme value region ER_j is not likely to belong to the same character string as any element of ERline_i (S405: NO), the process proceeds to step S407, described below.

If ER_j is likely to belong to the same character string as some element of ERline_i (S405: YES), the unit stores ER_j as an element of ERline_i in step S406.

Next, in step S407, the character string region detection unit 16 determines whether any extreme value region ER_j stored in the first list and the second list remains unselected.

If an unselected extreme value region ER_j remains (S407: YES), the unit selects it as the processing target in step S408, and the process proceeds to step S405.

If no unselected extreme value region ER_j remains (S407: NO), the unit determines in step S409 whether ERline_i has at least a predetermined number of elements (for example, 3).

If ERline_i has fewer elements than the predetermined number (S409: NO), the unit discards ERline_i in step S410. The process then proceeds to step S412, described below.

If ERline_i has at least the predetermined number of elements (S409: YES), the unit stores ERline_i in the array list in step S411.

Next, in step S412, the character string region detection unit 16 determines whether any extreme value region ER_i stored in the first list and the second list has not yet been selected.

If an unselected extreme value region ER_i remains (S412: YES), the unit selects it as the processing target in step S413, and the process proceeds to step S403.

If no unselected extreme value region ER_i remains (S412: NO), the character string region detection unit 16 ends the process.
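Steps S402 to S413 amount to growing one array per seed region. The sketch below makes a few assumptions:

- Regions are plain Python values.
- The S405 comparison is a caller-supplied predicate `same_string(a, b)`. This name and the function name `primary_extraction` are hypothetical.
- `min_len=3` mirrors the example predetermined number in step S409.
- The work buffer of step S401 is not modeled separately.

```python
def primary_extraction(regions, same_string, min_len=3):
    """Sketch of steps S402-S413: grow one array ERline_i per seed region ER_i
    and keep only arrays with at least `min_len` elements."""
    array_list = []
    for i, seed in enumerate(regions):              # S402, S412-S413
        line = [seed]                               # S403: create ERline_i
        for j, other in enumerate(regions):         # S404, S407-S408
            if j == i:
                continue
            # S405: is ER_j likely in the same string as any current element?
            if any(same_string(other, member) for member in line):
                line.append(other)                  # S406
        if len(line) >= min_len:                    # S409
            array_list.append(line)                 # S411
        # otherwise ERline_i is simply dropped (S410)
    return array_list
```

Consider positions [0, 1, 2, 10, 20, 21] with a predicate that accepts gaps of at most 1.5:

- Seeds 0 and 1 each yield the string {0, 1, 2}.
- Seed 2 collects only 1 in a single pass, so its array is discarded.
- The pair 20/21 is discarded because it has fewer than three elements.

The result therefore depends on visiting order, and the same string can come from several seeds. This is because the steps as described neither deduplicate arrays nor iterate to convergence.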

≪Secondary extraction of extreme value regions likely to form a character string (handling projective distortion)≫
Next, with reference to FIG. 17, the process in step S6 will be described. This step determines in more detail whether regions are likely to form a character string and extracts each extreme value region that is likely to do so. FIG. 17 is a flowchart showing an example of this secondary extraction process.

For example, when scene characters are detected in frames of recorded news footage, a signboard showing the name of a court or similar may have been filmed from an oblique angle. This causes projective distortion: characters nearer the camera appear larger, and those farther away appear smaller.

Suppose character strings are detected on the assumption that all characters in a string are roughly the same size. Then two kinds of extreme value regions may be missed: those of small, distant characters, and those that form only part of a character.

In the present embodiment, the following process therefore handles a character string affected by such projective distortion. It searches again for the extreme value regions that make up the string, based on the positions and sizes of the characters it contains.

In step S501, the estimation unit 17 selects one array ERline_i from the array list described above as the processing target.

Next, in step S502, the estimation unit 17 selects several extreme value regions to serve as the reference for the character string. They are chosen from the elements of the array ERline_i being processed. This process is described in detail later.

Next, in step S503, the estimation unit 17 calculates two straight lines that lie close to the reference extreme value regions.

Next, in step S504, the estimation unit 17 selects a new processing target. It is an extreme value region stored in the first list and the second list that is not an element of any array in the array list.

Next, in step S505, the estimation unit 17 determines whether the extreme value region being processed lies between the two straight lines calculated in step S503.
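Steps S503 and S505 can be sketched as below, under these assumptions:

- Each reference region is summarized by a circle (cx, cy, r), as in FIG. 18(E).
- The string runs roughly left to right, so each line can be written y = slope * x + intercept.
- The top line is fitted through the circle tops and the bottom line through the circle bottoms, by least squares.

The least-squares fit is one plausible reading of "two straight lines close to" the reference regions, not a method stated in the source. All function names are hypothetical.

```python
def fit_line(points):
    """Least-squares line y = slope * x + intercept through (x, y) points.
    Requires at least two distinct x values."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx

def boundary_lines(circles):
    """Two lines hugging the tops and bottoms of reference circles (cx, cy, r).
    Image coordinates: y grows downward, so the top of a circle is cy - r."""
    top = fit_line([(cx, cy - r) for cx, cy, r in circles])
    bottom = fit_line([(cx, cy + r) for cx, cy, r in circles])
    return top, bottom

def between(lines, x, y):
    """S505: does the point (x, y), e.g. a region's center, lie between the lines?"""
    (s1, b1), (s2, b2) = lines
    y1, y2 = s1 * x + b1, s2 * x + b2
    return min(y1, y2) <= y <= max(y1, y2)
```

Take three circles shrinking from r = 20 to r = 10 across the image. The fitted lines converge to the right, so at x = 150 a small, distant character is accepted only if its center lies between y = 45 and y = 55.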

If the extreme value region being processed does not lie between the two straight lines calculated in step S503 (S505: NO), the process proceeds to step S508, described below.

If it does lie between the two lines (S505: YES), the detection unit 18 proceeds to step S506. There it determines whether the region is close in color, position, and so on to any extreme value region that is an element of the array ERline_i being processed.

If the extreme value region being processed is not close in color, position, and so on to any element of ERline_i (S506: NO), the process proceeds to step S508, described below.

If it is close in color, position, and so on to some element of ERline_i (S506: YES), the detection unit 18 stores it as an element of ERline_i in step S507.

Step S405 described above determines, for example, whether the heights of the extreme value regions are close. Step S506, by contrast, does not check height for a region that lies between the two straight lines. As a result, even an extreme value region much smaller than the others is stored as an element of the array ERline_i being processed.

Next, in step S508, the estimation unit 17 determines whether any extreme value region has not yet been selected as a processing target. This check covers only regions that are stored in the first list and the second list and are not an element of any array in the array list.

If such an unselected extreme value region exists (S508: YES), the estimation unit 17 selects it as the processing target in step S509, and the process proceeds to step S503.

If no such region remains (S508: NO), the estimation unit 17 determines in step S510 whether any array ERline_i in the array list has not yet been selected as a processing target.

If an unselected array ERline_i remains (S510: YES), it is selected as the processing target in step S511, and the process proceeds to step S502.

If no unselected array ERline_i remains (S510: NO), the estimation unit 17 ends the process.
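The loop over arrays and candidate regions (steps S501 to S511) could look like the following sketch. Its main design choices:

- **Callables.** The band test from step S505 and the color/position comparison from step S506 are passed in as hypothetical callables, `in_band_for(line)` and `close_to(region, member)`.
- **No height check.** No height comparison is made, matching the note on step S506.
- **Lines computed once per array.** This is a simplification, justified because the reference regions do not change within an array.
- **Shrinking pool.** Regions added to an array are removed from the pool, because step S508 considers only regions that are not yet elements of any array.

```python
def secondary_extraction(array_list, pool, in_band_for, close_to):
    """Sketch of steps S501-S511.
    array_list  : list of arrays ERline_i (lists of regions), modified in place
    pool        : regions from the first and second lists that are in no array
    in_band_for : hypothetical; returns the S505 band test for one array
    close_to    : hypothetical S506 color/position comparison (no height check)"""
    for line in array_list:                              # S501, S510-S511
        in_band = in_band_for(line)                      # S502-S503
        for region in list(pool):                        # S504, S508-S509
            if not in_band(region):                      # S505: NO
                continue
            if any(close_to(region, m) for m in line):   # S506
                line.append(region)                      # S507
                pool.remove(region)                      # now an element of an array
    return array_list
```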

FIG. 18 illustrates the secondary extraction process. FIG. 18(A) shows an example of an original image subjected to projective distortion, in which the scene characters appear large on the near side and small on the far side.

FIG. 18(B) shows an example of the extreme value regions extracted from the original image by the secondary extraction of character-like extreme value regions described above.

FIG. 18(C) shows an example of the extreme value regions extracted by the primary extraction of extreme value regions likely to form a character string, described above. Compared with FIG. 18(B), parts of some characters are no longer extracted in FIG. 18(C).

FIG. 18(D) shows an example of calculating the height (size) of each extreme value region in the character string. For example, the radius or diameter of each extreme value region in the string is used as its height (size).

FIG. 18(E) shows an example in which several reference extreme value regions for the character string have been selected in step S502. In FIG. 18(E), the extreme value regions circumscribed by circles 521, 522, and 523 are selected as the reference for the string.

FIG. 18(F) shows an example of estimating the position and size of each character in the string from the reference. In FIG. 18(F), the band between the two straight lines 524 and 525, which lie close to circles 521, 522, and 523, is taken as the estimated position and size of each character in the string.

The estimation unit 17 may also estimate the position and size of each character from the changes in position and size across the reference extreme value regions. The detection unit 18 then examines the range derived from each character's estimated position and size. If that range contains an extreme value region whose features, such as color, are close to those of the character's extreme value regions, the detection unit 18 detects that region as part of the character.

The estimation unit 17 may also estimate the position and size of a character one character beyond the left or right end of the string detected by the character string region detection unit 16. If the detection unit 18 then detects a new character-like extreme value region at the extended location, the extension may be repeated until no new character is detected.
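The repeated one-character extension can be sketched as follows. The source does not say how the next character's position and size are extrapolated. This sketch assumes, hypothetically, that size changes by a constant ratio between neighbors and that the spacing scales with size. Together these roughly match a string receding in perspective. The callable `detect(cx, cy, r)` is a hypothetical stand-in for the detection unit 18; it returns a detected (cx, cy, r) or None. The parameter `max_steps` is only a safety cap.

```python
def extend_string(chars, detect, max_steps=50):
    """chars  : at least two (cx, cy, r) tuples ordered along the string.
    detect    : hypothetical stand-in for detection unit 18; given an estimated
                (cx, cy, r) it returns a detected (cx, cy, r) or None."""
    def next_estimate(prev, last):
        scale = last[2] / prev[2]              # size ratio between neighbors
        dx, dy = last[0] - prev[0], last[1] - prev[1]
        # assumption: spacing shrinks or grows in proportion to size
        return last[0] + dx * scale, last[1] + dy * scale, last[2] * scale

    chars = list(chars)
    for _ in range(max_steps):                 # extend past the right end
        found = detect(*next_estimate(chars[-2], chars[-1]))
        if found is None:
            break
        chars.append(found)
    for _ in range(max_steps):                 # extend past the left end
        found = detect(*next_estimate(chars[1], chars[0]))
        if found is None:
            break
        chars.insert(0, found)
    return chars
```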

FIG. 18(G) shows an example in which the extreme value regions of each character in the string have been extracted from the estimated positions and sizes.

≪Process of selecting reference extreme value regions for a character string≫
Next, the process in step S502 will be described. This step selects, from the elements of the array ERline_i being processed, several extreme value regions to serve as the reference for the character string.

The estimation unit 17 selects as reference extreme value regions those elements of ERline_i whose height (size) exceeds the average height of the elements. This allows the reference for the string to be selected correctly even when ERline_i contains extreme value regions that form only a small part of a character. If fewer than two extreme value regions exceed the average height, the tallest and second-tallest regions may, for example, be selected as the reference instead.
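The reference selection rule of step S502 is simple enough to state directly in code. This minimal sketch assumes that each region is summarized by a single height value and returns indices into the array. The function name is hypothetical.

```python
def select_reference(heights):
    """Step S502: indices of regions taller than the mean height; if fewer than
    two qualify, fall back to the two tallest regions."""
    mean = sum(heights) / len(heights)
    above = [i for i, h in enumerate(heights) if h > mean]
    if len(above) >= 2:
        return above
    by_height = sorted(range(len(heights)), key=lambda i: heights[i], reverse=True)
    return by_height[:2]
```

Two examples show both branches:

- **Normal case.** For heights [30, 28, 5, 26, 4] the mean is 18.6, so the three tall regions are chosen and the small fragments are ignored.
- **Fallback.** For [50, 10, 10, 10] only one region exceeds the mean, so the two tallest are returned.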

However, a character string is not necessarily subject to projective distortion, so the following processing may also be performed.

The estimation unit 17 takes as the reference the extreme value region with the greatest height (size) among the elements of the array ERline_i being processed.

The estimation unit 17 then builds a second array, _ERline_i. Its elements are the extreme value regions that meet all of these conditions:

- stored in the first list and the second list;
- not an element of any array in the array list;
- height (size) close to that of the reference.

The detection unit 18 then calculates the degree of character-likeness of two arrays: ERline_i, extracted by the secondary extraction process described above, and _ERline_i. It deletes whichever array has the lower degree.
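The alternative for strings without projective distortion might be sketched as below. The source does not specify several details, so the following are hypothetical:

- whether the reference region itself belongs to _ERline_i;
- the height tolerance `tol`;
- the helper `height`;
- the character-degree function `char_degree`.

Ties are resolved in favor of ERline_i as an arbitrary choice.

```python
def choose_array(line, pool, height, char_degree, tol=0.2):
    """line        : ERline_i after the secondary extraction
    pool           : regions that are not elements of any array
    height         : function giving a region's height (size)
    char_degree    : hypothetical function scoring how character-like an array is
    Builds _ERline_i around the tallest region and keeps the better array."""
    ref = max(line, key=height)
    # assumption: the reference itself is the first element of _ERline_i
    alt = [ref] + [r for r in pool if abs(height(r) - height(ref)) <= tol * height(ref)]
    # the array with the lower degree is discarded; ties keep ERline_i
    return line if char_degree(line) >= char_degree(alt) else alt
```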

[Second Embodiment]
The first embodiment described one example of the process in step S5 of FIG. 4, which determines whether regions are likely to form a character string and extracts each extreme value region likely to do so.

The second embodiment describes another example of this process in step S5 of FIG. 4.

Because the second embodiment is largely the same as the first, duplicate description is omitted where appropriate.

≪Primary extraction of extreme value regions likely to form a character string≫
Next, with reference to FIG. 19, an example of the process in step S5 of FIG. 4 will be described. This step determines whether regions are likely to form a character string and extracts each extreme value region that is likely to do so. FIG. 19 is a flowchart showing an example of this primary extraction process according to the second embodiment.

In step S1401, the character string region detection unit 16 sorts the character-like extreme value regions extracted by the processing up to step S4 in descending order of character-likeness. For example, the unit sorts the extreme value regions in the first list and the second list by the character-likeness determined by the character degree determination unit 14.

As with step S402 of the first embodiment, only the first list may be used in step S1401. The trade-off is the same:

- **Both lists.** Accuracy is relatively high, but processing time is relatively long.
- **First list only.** Processing time is shorter, but accuracy is somewhat lower.

If even one character of a string has a non-elongated extreme value region, step S2001, described below, can detect the elongated extreme value regions around it from the second list. The accuracy of scene character extraction therefore does not decrease.

Next, in step S1402, the character string region detection unit 16 selects the extreme value region with the highest character-likeness in the sorted list as the processing target. Hereinafter, the extreme value region selected as the processing target is called the "target extreme value region".

Next, in step S1403, the character string region detection unit 16 determines whether the character-likeness of the target extreme value region is lower than a predetermined threshold.

If the character-likeness of the target extreme value region is lower than the threshold (step S1403: YES), the process ends.

If it is not lower than the threshold (step S1403: NO), the character string region detection unit 16 proceeds to step S1404. There it takes the region containing the target extreme value region, either a single extreme value region or a combination of several. It then searches along each of several angles for regions belonging to the same character string as that region. This search is described later.

Next, in step S1405, the character string region detection unit 16 compares the character strings found at the various angles. It extracts the string whose constituent characters have the highest "degree of same-string likeness" and stores it in the array list described above.

This degree may be calculated, for example, from the variance of each character feature, such as position, height, width, stroke width, and color. Each variance is multiplied by a weighting coefficient for its feature, and the results are summed. The unit may then judge the string with the smallest sum to have the highest degree of same-string likeness.
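The variance-based measure of step S1405 can be illustrated as follows. This is a minimal sketch under these assumptions:

- Feature names and weights are hypothetical.
- Each character is a dict of numeric features.
- The population variance from Python's `statistics` module is used.
- The function names `string_spread` and `best_string` are hypothetical.

```python
from statistics import pvariance

def string_spread(chars, weights):
    """Weighted sum of per-feature population variances over the characters of a
    candidate string; a smaller value means a higher same-string likeness."""
    return sum(w * pvariance([c[f] for c in chars]) for f, w in weights.items())

def best_string(candidates, weights):
    """S1405: pick the candidate string (one per search angle) with the smallest spread."""
    return min(candidates, key=lambda chars: string_spread(chars, weights))
```

A candidate whose characters have heights 20, 21, and 19 and equal stroke widths has a spread of 2/3. A candidate containing one much taller, thicker character scores far higher and is rejected.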

Next, in step S1406, the character string region detection unit 16 determines whether the sorted list contains an extreme value region with the next-highest character-likeness.

If such an extreme value region exists (step S1406: YES), the unit selects it as the processing target in step S1407, and the process proceeds to step S1403 described above.

If no such region exists (step S1406: NO), the process ends.
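The control flow of steps S1401 to S1407 is sketched below. The callables are hypothetical stand-ins:

- `char_degree` for the character degree determination unit 14;
- `search_at_angles` for the multi-angle search of step S1404;
- `spread` for the same-string measure of step S1405 (smaller means more string-like).

Because the list is sorted, the first region below the threshold ends the loop, as in step S1403.

```python
def primary_extraction_v2(regions, char_degree, search_at_angles, spread, threshold):
    """Sketch of steps S1401-S1407 (second embodiment).
    char_degree(region)      -> character-likeness (higher is more character-like)
    search_at_angles(region) -> candidate strings, one per search angle (S1404)
    spread(string)           -> same-string measure, smaller is better (S1405)"""
    array_list = []
    for target in sorted(regions, key=char_degree, reverse=True):   # S1401-S1402, S1406-S1407
        if char_degree(target) < threshold:                          # S1403: YES
            break   # sorted order: every remaining region is below the threshold too
        candidates = search_at_angles(target)                        # S1404
        if candidates:
            array_list.append(min(candidates, key=spread))           # S1405
    return array_list
```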

In this embodiment, some characters are split into several extreme value regions, as happens in kanji (including Chinese characters), kana (including hiragana and katakana), Hangul, and similar scripts. Such characters are also extraction targets. Regions formed by combining several extreme value regions (composite regions) are therefore also candidates for calculating the degree of same-string likeness.

Calculating this degree for every combination of extreme value regions, however, would require an enormous amount of computation. The character string region detection unit 16 therefore performs steps S1404 and S1405 only for extreme value regions whose character-likeness is at least a predetermined threshold.

Next, with reference to FIG. 20, the process in step S1404 of searching for regions of the same character string along each of several angles will be described. FIG. 20 illustrates this process. The image contains three kanji meaning "parking lot" in Chinese. Extreme value regions that are candidates for adjacent characters are searched for along directions at 15-degree intervals.

The search directions may be aligned with the rotation angles used in steps S103 and S107. In those steps, the extreme value region extraction unit 12 calculated the rotated bounding rectangle descriptors. That is, if the extreme value region extraction unit 12 calculated descriptors such as rotated aspect ratios at 15-degree intervals, the character string region detection unit 16 also searches along directions at 15-degree intervals.

This alignment allows a "score indicating the degree of same-string likeness" (hereinafter simply the "score") to be calculated using the rotated bounding rectangles already computed for the descriptors. This reduces the amount of processing required.

The score is calculated, for example, from the differences in features between the two extreme value regions being compared, such as position, height, width, stroke width, and color. Each difference is multiplied by a weighting coefficient for its feature, and the results are summed. For features such as position, height, width, and stroke width, a smaller difference means the regions are more likely to belong to the same string. A smaller score therefore indicates a higher degree of same-string likeness. The weighting coefficients may be determined by machine learning or similar methods.
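The pairwise score can be sketched as a weighted sum of absolute feature differences. This is a minimal illustration under these assumptions:

- Each region is a dict of numeric features.
- The feature name `pos` (position along the search direction) and all weights are hypothetical.
- The function names are hypothetical.

```python
def pair_score(a, b, weights):
    """Score between two extreme value regions: weighted sum of absolute feature
    differences. Smaller means more likely to be in the same character string."""
    return sum(w * abs(a[f] - b[f]) for f, w in weights.items())

def best_neighbor(region, candidates, weights):
    """Candidate with the lowest score, i.e. the most likely adjacent character."""
    return min(candidates, key=lambda c: pair_score(region, c, weights))
```

A neighbor of similar size, stroke width, and color 25 pixels away scores far lower than a distant region of different size and color. With the example weights in the test, the scores are 5.3 and 65.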

When calculating the "score", the character string region detection unit 16 calculates the differences in position, height, and width between the two extreme value regions being compared using the rotation boundary rectangles computed in the course of calculating the rotation boundary rectangle descriptors.

The difference in position between the two extreme value regions may be the distance between the rotation boundary rectangles of the two regions. In the example of FIG. 20, the distance between the right edge of the rotation boundary rectangle 601a for the leftmost character 601 and the left edge of the rotation boundary rectangle 602a for the center character 602 is used as the difference in position between the two extreme value regions.

The differences in height and width between the two extreme value regions may be the differences in height and width between their rotation boundary rectangles. In the example of FIG. 20, the differences between the height and width of the rotation boundary rectangle 601a for the leftmost character 601 and those of the rotation boundary rectangle 602a for the center character 602 are used as the height difference and width difference of the two extreme value regions, respectively. Compared with the conventional method that uses non-rotated bounding rectangles (601b, 602b), the rotated rectangles are aligned in the vertical direction of the text, which improves the accuracy of detecting characters that belong to the same character string.
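Assuming each rotation boundary rectangle is stored as its extents in the rotated frame (`u` along the search direction, `v` across it), the position, height and width differences reduce to a few subtractions. `RRect` and `geometry_differences` are hypothetical names, and returning a signed gap (negative when the rectangles overlap) is a choice of this sketch.

```python
from typing import NamedTuple


class RRect(NamedTuple):
    """Rotation boundary rectangle as extents in the rotated frame (hypothetical)."""
    u_min: float  # along the search direction
    u_max: float
    v_min: float  # across the search direction
    v_max: float


def geometry_differences(left: RRect, right: RRect) -> dict:
    """Position, height and width differences between two rectangles at the same angle.

    position: gap from the right edge of `left` to the left edge of `right`
    (as between 601a and 602a in FIG. 20); negative if the rectangles overlap.
    """
    return {
        "position": right.u_min - left.u_max,
        "height": abs((left.v_max - left.v_min) - (right.v_max - right.v_min)),
        "width": abs((left.u_max - left.u_min) - (right.u_max - right.u_min)),
    }
```

Because both rectangles are measured in the same rotated frame, a tilted string yields small height differences, whereas axis-aligned boxes of the same characters would be inflated by the tilt.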

Stroke width and color are features that do not depend on rotation. The stroke width is the line width of a character, and the color is the color of the extreme value region. The color of the extreme value region may be expressed in the HSI color space consisting of hue, saturation, and luminance, in the RGB color space, or by luminance alone, for example.
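If the HSI option is chosen, one common way to obtain a region's color is to average its RGB values and convert the mean with the standard geometric RGB-to-HSI formulas, as sketched below. Averaging in RGB before converting (rather than averaging hue angles), the [0, 1] input range, and the function names `rgb_to_hsi` and `region_color_hsi` are assumptions of this sketch.

```python
import math


def rgb_to_hsi(r, g, b):
    """Convert RGB in [0, 1] to (hue in degrees, saturation, intensity)."""
    i = (r + g + b) / 3.0
    if i == 0:
        return 0.0, 0.0, 0.0  # black: hue and saturation are undefined
    s = 1.0 - min(r, g, b) / i
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:
        return 0.0, s, i  # grey: hue is undefined
    # Clamp to guard against rounding slightly outside acos's domain.
    theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
    h = theta if b <= g else 360.0 - theta
    return h, s, i


def region_color_hsi(pixels_rgb):
    """HSI color of a region, taken from the mean of its pixels' RGB values."""
    n = len(pixels_rgb)
    mean = [sum(p[k] for p in pixels_rgb) / n for k in range(3)]
    return rgb_to_hsi(*mean)
```

Pure red, green and blue map to hues of 0, 120 and 240 degrees respectively, and any grey maps to zero saturation.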

Next, the process of searching for the same character string in step S1404 will be described with reference to FIG. 21. FIG. 21 is a flowchart showing an example of this process.

In step S2001, the character string region detection unit 16 searches the first list and the second list described above for a group of extreme value regions that are candidates for the character adjacent to the region being processed in a predetermined angular direction, for example on its right side. Here, the "region being processed" is the region containing the extreme value region processed in step S1404; it is either a single extreme value region or a combination of several extreme value regions. Even when only the first list was used in step S1401, both the first list and the second list are used in step S2001, so that the elongated extreme value regions located around the extreme value regions in the first list are also searched.

Here, the character string region detection unit 16 searches for extreme value regions whose features, such as color, position, and stroke width, are close to those of the region being processed. More specifically, when the difference between a feature of the region being processed and the same feature of another extreme value region is less than a predetermined threshold, the character string region detection unit 16 includes that other extreme value region in the group of adjacent-character candidates. This narrows down the number of extreme value regions in the candidate group and thus reduces the amount of subsequent computation. The predetermined threshold may be determined using a machine learning method or the like.
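The filtering in step S2001 might be sketched as below. The region representation (a dict with extents along the rotated search direction, a color vector and a stroke width), the "ahead of the current region" test based on the candidate's center, and the threshold values are all assumptions made for illustration.

```python
import math

# Hypothetical thresholds; the source only says they may be set by machine learning.
THRESHOLDS = {"color": 40.0, "stroke_width": 2.0, "position": 30.0}


def adjacent_candidates(current, regions, thresholds=THRESHOLDS):
    """Step S2001 sketch: keep regions ahead of `current` along the rotated search
    direction whose feature differences are all below the thresholds.

    Each region is a dict with hypothetical keys: "u_min"/"u_max" (extent along the
    search direction), "color" (a color vector) and "stroke_width".
    """
    picked = []
    for r in regions:
        if r is current:
            continue
        center_u = 0.5 * (r["u_min"] + r["u_max"])
        if center_u <= current["u_max"]:
            continue  # not ahead of the current region in this direction
        diffs = {
            "color": math.dist(r["color"], current["color"]),
            "stroke_width": abs(r["stroke_width"] - current["stroke_width"]),
            "position": max(0.0, r["u_min"] - current["u_max"]),
        }
        if all(diffs[k] < thresholds[k] for k in thresholds):
            picked.append(r)
    return picked
```

Only regions that survive this cheap test are passed on to the score calculation, which is where the reduction in computation comes from.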

Next, in step S2002, the character string region detection unit 16 calculates the "score indicating the degree of likelihood of belonging to the same character string" between the region being processed and each extreme value region in the candidate group.

Next, in step S2003, the character string region detection unit 16 selects, from the extreme value regions in the candidate group, the one whose score with respect to the region being processed is smallest as the "adjacent character candidate (C_next)".

Next, in step S2004, the character string region detection unit 16 selects, from the candidate group, one extreme value region that is not included in the adjacent character candidate (C_next).

Next, in step S2005, the character string region detection unit 16 takes the region obtained by combining the selected extreme value region with the adjacent character candidate (C_next) as the "candidate for C_next (C_candidate)", and calculates the score of C_candidate with respect to the region being processed.

Next, in step S2006, the character string region detection unit 16 determines whether any extreme value region in the candidate group remains unselected.

If there is an unselected extreme value region (step S2006: YES), the character string region detection unit 16 selects it in step S2007.

Next, in step S2008, the character string region detection unit 16 calculates the score, with respect to the region being processed, of the region obtained by combining the selected extreme value region with the adjacent character candidate (C_next).

Next, in step S2009, the character string region detection unit 16 determines whether the score of the combined region calculated in step S2008 is smaller than the score of C_candidate.

If it is not smaller (step S2009: NO), the process returns to step S2006.

If it is smaller (step S2009: YES), in step S2010 the character string region detection unit 16 sets the combined region as the new C_candidate, and the process returns to step S2006.

If no unselected extreme value region remains in step S2006 (step S2006: NO), in step S2011 the character string region detection unit 16 determines whether the score of C_candidate is smaller than the score of the adjacent character candidate (C_next).

If it is smaller (step S2011: YES), in step S2012 the character string region detection unit 16 sets C_candidate as the new adjacent character candidate (C_next), and the process returns to step S2004.

If it is not smaller (step S2011: NO), in step S2013 the character string region detection unit 16 determines whether the score of the adjacent character candidate (C_next) is smaller than a predetermined threshold. The predetermined threshold may be determined using a machine learning method or the like.

If it is smaller than the predetermined threshold (step S2013: YES), in step S2014 the character string region detection unit 16 determines that the region being processed and the adjacent character candidate (C_next) belong to the same character string.

Next, in step S2015, the character string region detection unit 16 sets the adjacent character candidate (C_next) as the new region being processed, and the process returns to step S2001 described above.

If it is not smaller than the predetermined threshold (step S2013: NO), the process ends.
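The flowchart of FIG. 21 can be summarized as a greedy search, sketched below under stated assumptions. Regions are modeled as hashable IDs and a character as a frozenset of them; `candidates` and `score` are caller-supplied stand-ins for steps S2001 and S2002, and the toy one-dimensional data is hypothetical. Following the FIG. 22 walkthrough, the sketch continues from the union of the processed region and C_next, although step S2015 as worded could also be read as continuing from C_next alone. The sketch also stops when no unselected region remains, a case the flowchart does not address.

```python
import math
from typing import Callable, FrozenSet, Hashable, Iterable, List, Tuple

Group = FrozenSet[Hashable]
ScoreFn = Callable[[Group, Group], float]


def grow_next_character(current: Group, pool: List[Hashable],
                        score: ScoreFn) -> Tuple[Group, float]:
    """Steps S2002-S2012: pick the best single candidate as C_next, then keep
    adopting the best one-region extension (C_candidate) while it lowers the score."""
    if not pool:
        return frozenset(), math.inf
    # S2002-S2003: the single region with the smallest score becomes C_next.
    c_next = min((frozenset([r]) for r in pool), key=lambda g: score(current, g))
    next_score = score(current, c_next)
    while True:
        rest = [r for r in pool if r not in c_next]
        if not rest:
            break
        # S2004-S2010: the best "C_next plus one more region" is C_candidate.
        c_cand = min((c_next | {r} for r in rest), key=lambda g: score(current, g))
        cand_score = score(current, c_cand)
        # S2011-S2012: adopt C_candidate only if it lowers the score.
        if cand_score >= next_score:
            break
        c_next, next_score = c_cand, cand_score
    return c_next, next_score


def search_same_string(start: Group, candidates: Callable[[Group], Iterable[Hashable]],
                       score: ScoreFn, threshold: float) -> List[Group]:
    """Steps S2001-S2015: returns the character groups of one string, in search order."""
    string, current = [start], start
    while True:
        # S2001: candidates adjacent to the current region in the search direction.
        pool = [r for r in candidates(current) if r not in current]
        c_next, next_score = grow_next_character(current, pool, score)
        # S2013: stop when the best candidate does not score below the threshold.
        if not c_next or next_score >= threshold:
            return string
        # S2014-S2015: same string; continue from the merged region, as in FIG. 22.
        string.append(c_next)
        current = current | c_next


# Toy 1-D data (hypothetical): each region is an interval along the search direction.
SPANS = {"A": (0, 10), "B": (14, 24), "C1": (28, 32), "C2": (33, 38), "far": (100, 110)}


def toy_candidates(current: Group) -> List[str]:
    right_edge = max(SPANS[r][1] for r in current)
    return [r for r in SPANS if SPANS[r][0] > right_edge]


def toy_score(current: Group, group: Group) -> float:
    # Expect a 4-unit gap and 10-unit-wide characters; penalize deviations.
    gap = min(SPANS[r][0] for r in group) - max(SPANS[r][1] for r in current)
    width = max(SPANS[r][1] for r in group) - min(SPANS[r][0] for r in group)
    return abs(gap - 4) + abs(width - 10)


result = search_same_string(frozenset({"A"}), toy_candidates, toy_score, threshold=3.0)
# Expected grouping: {A}, {B}, {C1, C2}; "far" is rejected by the threshold.
```

In the toy data, "C1" and "C2" play roughly the role of regions 605 and 604 in FIG. 22: "C1" alone scores poorly because it is too narrow, but merging it with "C2" yields a character-sized rectangle and a lower score.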

Next, the process of FIG. 21 for searching for the same character string will be described with reference to FIG. 22. FIG. 22 is a diagram illustrating this process.

FIG. 22 shows an example of an image containing a Chinese parking-lot sign. FIG. 22(A) shows an example in which, of the three Chinese characters, the leftmost character 601 and the center character 602 have been determined in step S2014 to belong to the same character string. In step S2015, the region combining the one or more extreme value regions constituting character 601 and the one or more extreme value regions constituting character 602, which was the adjacent character candidate (C_next), becomes the new region being processed, and the process returns to step S2001 described above.

In steps S2002 and S2003, the score of each extreme value region with respect to the region being processed is calculated. For example, a score is calculated based on the rotation boundary rectangle 601a for character 601 and the rotation boundary rectangle 602a for character 602 (an example of the "first rectangle") shown in FIG. 22(B), together with the rotation boundary rectangle 604a for a part 604 of the rightmost character 603. Scores are likewise calculated for the rotation boundary rectangles 605a to 608a of regions 605 to 608. Assume that region 605, which is part of the rightmost character 603, has the smallest score and has been selected as the adjacent character candidate (C_next).

Then, in steps S2005 to S2010, the score with respect to the region being processed is calculated for each region obtained by combining the adjacent character candidate (C_next), that is, the single extreme value region with the smallest score, with one other extreme value region. Of these combined regions, the one with the smallest score becomes C_candidate.

For example, a score is calculated based on the rotation boundary rectangles 601a and 602a for characters 601 and 602 shown in FIG. 22(C), together with the rotation boundary rectangle 603a (an example of the "second rectangle") for the region 603 obtained by combining region 604 with region 605. Scores are likewise calculated for the rotation boundary rectangles of the regions obtained by combining region 605 with each of regions 606 to 608.

If, among the scores of these two-region combinations, the region combining region 604 with region 605 (the adjacent character candidate C_next) has the smallest score, the region combining regions 605 and 604 becomes C_candidate.

Then, if in step S2011 the score of the region combining regions 605 and 604 is smaller than the score of region 605, the current adjacent character candidate (C_next), the region combining regions 605 and 604 becomes the new adjacent character candidate (C_next) in step S2012.

Then, in a second pass through steps S2005 to S2010, the score with respect to the region being processed is calculated for each region obtained by combining regions 605 and 604 with one other extreme value region.

Assume that, in the second pass through step S2011, the score of every region combining regions 605 and 604 with another extreme value region is not smaller than the score of the adjacent character candidate (C_next), i.e., the region combining regions 605 and 604. In this case, if the score of C_next is smaller than the predetermined threshold in step S2013, then in step S2015 the region combining the extreme value regions that constitute characters 601, 602, and 603 becomes the new region being processed, and the process returns to step S2001 described above. Extreme value regions that are candidates for the adjacent character in the predetermined angular direction are then searched for based on the rotation boundary rectangles 601a to 603a for characters 601 to 603 shown in FIG. 22(C), and if, in this pass through step S2013, the score of the adjacent character candidate (C_next) is not smaller than the predetermined threshold, the process ends.

<About the program>
The image processing apparatus 1 according to the present embodiment may be configured by a computer including, for example, a CPU (Central Processing Unit), a volatile storage medium such as a RAM (Random Access Memory), a nonvolatile storage medium such as a ROM (Read Only Memory), input devices such as a mouse, keyboard, or pointing device, a display unit that displays images and data, and an interface for communicating with external devices.

In that case, each function of the image processing apparatus 1 can be realized by having the CPU execute a program (image processing program) describing these functions. The program can also be stored and distributed on a recording medium such as a magnetic disk (floppy disk, hard disk, etc.), an optical disk (CD-ROM, DVD, etc.), or a semiconductor memory. That is, the processing described above can be realized by installing, on a general-purpose PC, server, or the like, a program that causes a computer (hardware) to execute the processing of each component described above.

Part or all of the image processing apparatus 1 in the embodiment described above may also be implemented as an integrated circuit such as an LSI (Large Scale Integration). Each functional block of the image processing apparatus 1 may be implemented as a separate processor, or some or all of the blocks may be integrated into one processor.

The image processing apparatus 1 can be applied, for example, to an archive system that recognizes characters appearing in the frames of videos stored in a video database (such as footage of news coverage or past broadcast programs) and manages the recognized character strings in association with those videos as search tags. The image processing apparatus 1 can also be applied, for example, to a smartphone that recognizes characters in images captured by its camera.

<Summary>
The technique described in Non-Patent Document 1 has a problem with extraction accuracy when extracting scene characters in languages in which a single character consists of several separated regions, such as kanji, kana, and Hangul. The alphabet letters and digits targeted by that technique can each be written in a single stroke, except for "i" and "j" (that is, the region forming one character is not split); in contrast, when scene characters such as the Japanese "二", "川", or "い" are extracted, a single character is split into several extreme value regions. Moreover, while there are only about 100 alphabet letters and digits, Japanese has about 6,000 characters. When rotated scene characters are also taken into account, the processing cost of extracting scene character regions becomes enormous.

In contrast, according to the present embodiment, whether an extreme value region is elongated is determined using the rotated aspect ratio feature. If, in the tree structure of the extracted extreme value regions, there is an extreme value region that is character-like and not elongated, the extreme value regions below it in the tree represent parts of that character and are therefore not evaluated for character-likeness. This eliminates computationally expensive feature calculations, so that character region detection can be executed at high speed even for rotated scene characters such as kanji, kana, and Hangul.

Also, in the present embodiment, if there is an extreme value region in the tree that is character-like but elongated, the extreme value regions below it are evaluated for character-likeness. This prevents an extreme value region in which several characters are connected from being erroneously detected.
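The pruning rule of the two preceding paragraphs amounts to a depth-first walk of the extreme value region tree, sketched below. The node structure, the `max_aspect` limit and the classifier callback are hypothetical; the exact definition of the rotated aspect ratio lies outside this section, so it is simply stored on each node. How an elongated character-like region is reconciled with the characters found beneath it is also not covered here, so the sketch reports both.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ERNode:
    """Node of the extreme value region tree (hypothetical structure)."""
    name: str
    aspect_ratio: float  # rotated aspect ratio computed earlier (long / short side)
    children: List["ERNode"] = field(default_factory=list)


def find_character_regions(root: ERNode, is_char_like: Callable[[ERNode], bool],
                           max_aspect: float = 3.0) -> Tuple[List[str], int]:
    """Depth-first walk that skips the subtree below a compact, character-like region.

    is_char_like stands in for the expensive character-likeness classifier;
    max_aspect is a hypothetical "not elongated" limit.
    Returns (names judged character-like, number of classifier calls).
    """
    found, calls = [], 0
    stack = [root]
    while stack:
        node = stack.pop()
        calls += 1
        if is_char_like(node):
            found.append(node.name)
            if node.aspect_ratio <= max_aspect:
                continue  # compact character: descendants are only parts of it
            # elongated: may be several joined characters, so examine the children too
        stack.extend(node.children)
    return found, calls


# Toy tree: "kawa" (川) is compact, so its three strokes are never classified;
# "pair" is elongated (two joined characters), so its children are examined.
strokes = [ERNode(f"s{k}", 8.0) for k in range(3)]
kawa = ERNode("kawa", 1.1, strokes)
pair = ERNode("pair", 4.0, [ERNode("c1", 1.0), ERNode("c2", 1.0)])
root = ERNode("root", 1.2, [kawa, pair])
found, calls = find_character_regions(root, lambda n: n.name != "root")
```

Here the classifier runs five times instead of eight, because the three stroke nodes under the compact "kawa" region are skipped.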

In the prior art, character strings are found by exploiting the fact that the top line and bottom line of a character string are aligned. For alphabet letters and digits, every character except "i" and "j" is a single extreme value region (connected component), so a character string can be detected even when captured at an angle by checking whether the top and bottom lines are roughly aligned. However, when a Japanese character string such as "カニ" (crab) is captured at an angle, only one of the two strokes of "ニ" may be detected as the character adjacent to "カ". Also, when the text spans two lines, an extreme value region that forms part of a lower-line character composed of several extreme value regions, such as "お", may be detected as the character adjacent to a character in the upper line.

In contrast, according to the present embodiment, the regions of characters belonging to the same character string are searched for by calculating, along a plurality of angular directions, the degree of likelihood of belonging to the same character string using rotation boundary rectangles. This improves the accuracy of character string detection even for scene characters such as kanji, kana, and Hangul.

The embodiment described above is particularly suitable for extracting scene characters in languages in which one character consists of several separated regions, not limited to kanji, kana, Hangul, and the like, but it is also suitable for extracting scene characters such as alphabet letters and digits, in which the region forming one character is not split.

Although the embodiment has been described in detail above with reference to the drawings, the specific configuration is not limited to the one described, and various design changes and the like can be made without departing from the gist of the present invention.

DESCRIPTION OF SYMBOLS
1 Image processing apparatus
11 Image acquisition unit
12 Extreme value region extraction unit
13 Calculation unit
14 Character degree determination unit
15 Removal unit
16 Character string region detection unit
17 Estimation unit
18 Detection unit
19 Storage unit
20 Output unit

Claims

An image processing apparatus comprising:
an extreme value region extraction unit that extracts, from an image, an extreme value region in which the pixels of a first region and a second region are continuous, based on the first region, which is included in the image and in which a plurality of pixels having pixel values equal to or less than a first value are continuous, and the second region, which is included in the image and in which a plurality of pixels having pixel values equal to or less than a second value larger than the first value are continuous; and
a determination unit that determines the degree of character-likeness of the shape of a region included in the image,
wherein the determination unit determines the degree of character-likeness of the extreme value region according to the long side and the short side of a rectangle that circumscribes the extreme value region and is rotated by a predetermined angle on the plane of the image.

The image processing apparatus according to claim 1, wherein
the determination unit determines the degree of character-likeness of the shape of the first region when the aspect ratio between the long side and the short side of the rectangle that circumscribes the extreme value region and is rotated by the predetermined angle on the plane of the image is not within a predetermined range, and
does not determine the degree of character-likeness of the shape of the first region when the aspect ratio is within the predetermined range.

The image processing apparatus according to claim 2, further comprising
a calculation unit that calculates the aspect ratio of the rectangle that circumscribes the extreme value region and is rotated by the predetermined angle on the plane of the image, based on a rectangle that circumscribes the first region and is rotated by the predetermined angle on the plane of the image and a rectangle that circumscribes the second region and is rotated by the predetermined angle on the plane of the image.

The image processing apparatus according to any one of claims 1 to 3, further comprising
a removal unit that removes, from the extreme value region, a region corresponding to a side face of a three-dimensional character,
wherein the determination unit determines the degree of character-likeness of the shape of the extreme value region from which the side-face region of the three-dimensional character has been removed.

The image processing apparatus according to claim 4, wherein
the removal unit divides the pixels included in the extreme value region into a plurality of groups based on the values of those pixels, and removes a group that includes a shadow portion of a three-dimensional character.

The image processing apparatus according to any one of claims 1 to 5, further comprising
a character string region detection unit that detects a second extreme value region belonging to the same character string as a first extreme value region, based on a first rectangle that circumscribes the first extreme value region and is rotated by the predetermined angle on the plane of the image and a second rectangle that circumscribes the second extreme value region and is rotated by the predetermined angle on the plane of the image.

The image processing apparatus according to claim 6, wherein
the character string region detection unit determines whether the first extreme value region and the second extreme value region belong to the same character string, based on at least one of the difference in position (distance), the difference in height, and the difference in width between the first rectangle and the second rectangle.

The image processing apparatus according to any one of claims 1 to 7, further comprising:
an estimation unit that estimates the position and size of each character included in a character string in the image, based on the position and size of each extreme value region included in the character string; and
a detection unit that detects the extreme value regions included in each character of the character string from a range corresponding to the position and size of each character estimated by the estimation unit.

The image processing apparatus according to claim 8, wherein
the estimation unit estimates the position and size of each character included in the character string based on the positions and sizes of those extreme value regions, among the extreme value regions included in the character string, that are larger than the average of the extreme value regions.

An image processing program that causes a computer to function as the image processing apparatus according to claim 1.