JP2017117340A

JP2017117340A - Character attribute estimation device and character attribute estimation program

Info

Publication number: JP2017117340A
Application number: JP2015254409A
Authority: JP
Inventors: Rei Endo; Yoshihiko Kawai; Hideki Sumiyoshi; Takahiro Mochizuki
Original assignee: Nippon Hoso Kyokai NHK
Current assignee: Japan Broadcasting Corp
Priority date: 2015-12-25
Filing date: 2015-12-25
Publication date: 2017-06-29
Anticipated expiration: 2035-12-25
Also published as: JP6609181B2

Abstract

PROBLEM TO BE SOLVED: To accurately estimate the attributes of characters in a scene image on the basis of information other than the characters themselves.

SOLUTION: A character attribute estimation device comprises: a character region detection part that detects a region of characters in an image; an object region detection part that detects, as the region of the object on which the characters are written, pixels satisfying a condition determined by the values of at least some of the pixels adjacent to the character region detected by the character region detection part; and an attribute estimation part that performs machine learning using, as training data, a group of images corresponding to attributes of the content represented by characters, and estimates the attribute of the content represented by the characters from the result of the machine learning and the shape of the object region detected by the object region detection part.

SELECTED DRAWING: Figure 2

Description

The present invention relates to a character attribute estimation device and a character attribute estimation program.

Conventionally, techniques for recognizing characters that appear in an image captured by a camera are known. In general, character recognition first extracts a character region from the image and then recognizes the characters based on patterns such as their shape and color (see, for example, Patent Documents 1 to 3).

With the above methods, however, the shape, color, and other properties of the characters in the image cannot be extracted perfectly: part of the information may be missing, or noise may be mixed in. Conventional methods therefore correct recognition errors in some of the characters of a recognized character string by using a dictionary prepared in advance (see, for example, Patent Documents 4 and 5).

Patent Document 1: JP 2001-331803 A
Patent Document 2: JP 2005-018176 A
Patent Document 3: JP 2002-230551 A
Patent Document 4: Japanese Patent Publication No. 04-074756
Patent Document 5: Japanese Patent Publication No. 07-066423

However, the prior art offers no method for estimating character attributes from information other than the characters themselves in order to achieve highly accurate character recognition.

Accordingly, an object of the present invention is to provide a character attribute estimation device and a character attribute estimation program capable of estimating, with high accuracy, the attributes of characters in a scene image based on information other than the characters.

In one aspect of the present invention, a character attribute estimation device comprises: a character region detection unit that detects a region of characters in an image; an object region detection unit that detects, as the region of the object on which the characters are written, pixels satisfying a condition determined by the values of at least some of the pixels adjacent to the character region detected by the character region detection unit; and an attribute estimation unit that performs machine learning using, as training data, a group of images corresponding to attributes of the content represented by characters, and estimates the attribute of the content represented by the characters from the result of the machine learning and the shape of the object region detected by the object region detection unit.

In another aspect of the present invention, a character attribute estimation device comprises: a character region detection unit that detects a region of characters in an image; an object region detection unit that detects, as the region of the object on which the characters are written, pixels satisfying a condition determined by the values of at least some of the pixels adjacent to the character region detected by the character region detection unit; an object type determination unit that performs machine learning using, as training data, a group of images corresponding to object types, and determines the type of the object from the result of the machine learning and the shape of the object region detected by the object region detection unit; and an attribute estimation unit that estimates the attribute of the content represented by the characters from the object type determined by the object type determination unit.

According to the disclosed technology, the attributes of characters in a scene image can be estimated with high accuracy based on information other than the characters.

FIG. 1 is a diagram for explaining the outline of the character attribute estimation device according to the present embodiment.
FIG. 2 is a functional block diagram showing the functional configuration of the character attribute estimation device according to the present embodiment.
FIG. 3 is a diagram showing an example of an attribute estimation table.
FIG. 4 is a flowchart showing an example of processing for estimating the attributes of characters in a scene image.
FIG. 5 is a flowchart showing an example of object region detection processing.
FIG. 6 is a diagram explaining object region detection processing.
FIG. 7 is a flowchart showing an example of attribute estimation processing.
FIG. 8 is a diagram explaining attribute estimation processing.
FIG. 9 is a functional block diagram showing a modification of the functional configuration of the character attribute estimation device.
FIG. 10 is a flowchart showing a modification of the processing for estimating the attributes of characters in a scene image.

Embodiments of the present invention will be described below with reference to the drawings.

<Outline of character attribute estimation device>
FIG. 1 is a diagram for explaining the outline of the character attribute estimation device according to the present embodiment. The character attribute estimation device 1 shown in FIG. 1 receives an image (scene image 2) as input and estimates the attributes of one or more characters appearing in the image. Here, “characters” are not limited to kanji, kana, alphanumerics, and the like; they also include pictographs, symbols, figures, and other marks written to convey predetermined information. A “character attribute” is information indicating what the characters represent, such as “person name”, “place name”, or “company name (organization name)”, but is not limited to these examples.

The characters 3 appearing in the scene image 2 are written on some object 4 (for example, a name tag or a signboard), so some relationship between the characters 3 and the object 4 can be expected. An example of such a relationship is “characters written large near the center of a name tag are often a person's name”.

The character attribute estimation device 1 according to the present embodiment detects a character region, which is the set of pixels of the characters 3 appearing in the scene image 2, and estimates the attributes of the characters 3 using information around that character region. This allows the attributes of characters in a scene image to be estimated with high accuracy. In addition to outputting the character attributes, the present embodiment may also output character data recognized using those attributes.

<Functional configuration of character attribute estimation device>
Next, the functional configuration of the character attribute estimation device according to the present embodiment will be described with reference to the drawings. FIG. 2 is a functional block diagram showing the functional configuration of the character attribute estimation device according to the present embodiment.

The character attribute estimation device 1 includes an image acquisition unit 11, a character region detection unit 12, an object region detection unit 13, an object type determination unit 14, an attribute estimation unit 15, a dictionary selection unit 16, and a character recognition unit 17.

The image acquisition unit 11 acquires data of an image (scene image) containing characters. The image may be captured by an imaging device such as a still camera or video camera, or may be an image stored in advance. The imaging device may be provided inside or outside the character attribute estimation device 1. The image may be a still image or a frame of a moving image.

The character region detection unit 12 detects, in the image acquired by the image acquisition unit 11, a character region, that is, a region in which characters appear. For example, when black characters appear on a white signboard, the character region is the black portion. The character region can be detected based on, for example, the color and shape of sets of pixels with similar features or the shape of edges, but the detection method is not limited to these. The character region can be represented as, for example, the set of pixels (coordinates) constituting the characters.
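The detection method is left open above. As a minimal sketch, assuming dark characters on a lighter background, a character region can be approximated by thresholding the luminance and grouping the dark pixels into 4-connected components, each returned as a set of (row, column) coordinates. The function name, the fixed `threshold`, and the `min_size` noise filter are illustrative assumptions, not the method of this specification.

```python
from collections import deque


def detect_character_regions(image, threshold=60, min_size=2):
    """Hypothetical sketch: pixels darker than `threshold` are treated as
    character pixels and grouped into 4-connected components.
    `image` is a grayscale image given as a list of rows of ints."""
    h, w = len(image), len(image[0])
    seen = set()
    regions = []
    for r in range(h):
        for c in range(w):
            if (r, c) in seen or image[r][c] >= threshold:
                continue
            # Breadth-first flood fill over dark, 4-connected pixels.
            component = set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                component.add((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w and (ny, nx) not in seen
                            and image[ny][nx] < threshold):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            # Drop tiny components, which are more likely noise than strokes.
            if len(component) >= min_size:
                regions.append(component)
    return regions
```

A real detector would also use the color, shape, and edge cues mentioned above; a fixed threshold like this one fails for light text on a dark background.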

Based on the character region detected by the character region detection unit 12, the object region detection unit 13 performs object region detection processing, which detects, in the image acquired by the image acquisition unit 11, an object region, that is, the region of the object on which the characters are written. In this processing, the object region detection unit 13 takes at least some of the pixels adjacent to the character region detected by the character region detection unit 12 as the initial range of the object region, and detects, among the pixels lying outside these initial pixels, those that satisfy a predetermined condition as the object region.

That is, in the object region detection processing, the object region detection unit 13 sets at least some of the pixels adjacent to the character region detected by the character region detection unit 12 as the initial range of the object region. The object region detection unit 13 then checks whether any pixel adjacent to the object region satisfies a condition that depends on the pixel values of the object region, and adds each pixel that satisfies the condition to the object region in turn. By repeating this outward from the initial range, the object region detection unit 13 expands the object region and detects the final object region. The object region detection processing is described in detail later. Alternatively, the object region detection unit 13 may detect a predetermined object region in the image using a known image processing technique such as pattern recognition.

The object type determination unit 14 determines the type of the object on which the characters are written, based on the object region detected by the object region detection unit 13. The object type determination unit 14 performs machine learning in advance using, as training data, a group of images corresponding to each type of object to be detected. It then determines the type of the object on which the characters are written from the result of that machine learning and the shape of the input object region. The object type determination unit 14 also detects the color of the object region and takes that color into account when determining the object type.

The attribute estimation unit 15 estimates, from the object type determined by the object type determination unit 14, the attribute of the content represented by the characters in the character region detected by the character region detection unit 12. The attribute estimation unit 15 detects the position of the characters in the object region, the range of the characters, the size of the character whose attribute is being estimated relative to the other characters, the number of characters, and the color of the characters' background. Using, for example, a preset attribute estimation table, the attribute estimation unit 15 then performs attribute estimation processing, which estimates the attribute (category) of the characters written on the object according to the object type and at least one of: the position of the characters in the object region, the range of the characters, the relative size of the target character with respect to the other characters, the number of characters, and the background color. This makes it possible, for example, to select an appropriate dictionary (word dictionary) for recognizing the characters based on their attribute. The attribute estimation unit 15 may also output attribute information after estimating one or more attributes for the characters through the attribute estimation processing. The attribute estimation processing is described in detail later.

The dictionary selection unit 16 selects a dictionary corresponding to the character attribute estimated by the attribute estimation unit 15. For example, if the character attribute is “person name”, it selects a dictionary in which characters commonly used in person names are registered.

The character recognition unit 17 recognizes the characters in the character region detected by the character region detection unit 12, based on the data of the dictionary selected by the dictionary selection unit 16. For example, it performs pattern matching between each character registered in the dictionary and the detected character region, and recognizes the character in the region as the registered character with the highest similarity.
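A toy sketch of how dictionary selection (unit 16) and pattern matching (unit 17) could interact, assuming each dictionary entry is a small binary template and that similarity is the fraction of matching cells. The dictionaries, templates, and function names are hypothetical. The point it illustrates is that the same noisy glyph can resolve to different characters depending on which attribute-specific dictionary is selected.

```python
def similarity(a, b):
    """Fraction of cells that agree between two equally sized binary grids."""
    pairs = [(x, y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(x == y for x, y in pairs) / len(pairs)


# Hypothetical attribute-specific dictionaries: attribute -> {character: 3x3 template}.
DICTIONARIES = {
    "person name": {"I": ("010", "010", "010"), "L": ("100", "100", "111")},
    "company name": {"T": ("111", "010", "010"), "L": ("100", "100", "111")},
}


def recognize(glyph, attribute):
    # Dictionary selection: pick the dictionary for the estimated attribute.
    dictionary = DICTIONARIES[attribute]
    # Pattern matching: return the registered character with the highest similarity.
    return max(dictionary, key=lambda ch: similarity(glyph, dictionary[ch]))
```

For the glyph `("010", "010", "011")`, the “person name” dictionary yields `I` (8 of 9 cells match) while the “company name” dictionary yields `T` (6 of 9).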

The character attribute estimation device 1 shown in FIG. 2 is configured to estimate the attributes of characters contained in an image or the like and then recognize the characters using the estimation result, but the configuration is not limited to this; for example, it may stop at outputting the character attributes. In that case, the character attribute estimation device 1 includes the image acquisition unit 11, the character region detection unit 12, the object region detection unit 13, the object type determination unit 14, and the attribute estimation unit 15 described above, and outputs the character attribute information obtained from the attribute estimation unit 15.

<Example of attribute estimation table>
FIG. 3 is a diagram showing an example of the attribute estimation table. Items in the attribute estimation table include, for example, “object type”, “character position or range”, “relative size with respect to other characters”, “number of characters”, “character background color”, and “character attribute”, but are not limited to these.

“Object type” is the type, determined by the object type determination unit 14, of the object on which the characters are written.

“Character position or character range” is the position or range of the characters within the region of the object on which they are written. Values of “character position” include, for example, “upper part”, “central part”, and “lower part”. In the present embodiment, a predetermined condition for judging each “character position” may be set for each “object type”. For example, the condition for judging that the “character position” is “upper part” when the “object type” is “name tag” may be “the center of the character region lies in the uppermost of the three areas obtained by dividing the object region vertically into three equal parts”, but the condition is not limited to this.
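The “upper part” rule above can be sketched as follows, assuming regions are sets of (row, column) pixel coordinates and that the center of the character region is the midpoint of its vertical extent. The function name and the band labels are illustrative.

```python
def character_position(char_region, object_region):
    """Split the object's vertical extent into three equal bands and report
    which band contains the vertical center of the character region.
    Row r is treated as covering the interval [r, r + 1)."""
    obj_top = min(r for r, _ in object_region)
    obj_bottom = max(r for r, _ in object_region) + 1  # exclusive
    char_top = min(r for r, _ in char_region)
    char_bottom = max(r for r, _ in char_region) + 1   # exclusive
    center = (char_top + char_bottom) / 2
    band = (obj_bottom - obj_top) / 3
    if center < obj_top + band:
        return "upper part"
    if center < obj_top + 2 * band:
        return "central part"
    return "lower part"
```

With a 9-row object region, a character spanning rows 1 to 2 has its center at 2.0, inside the top band of height 3, so it is classified as “upper part”.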

Values of “character range” include, for example, “entire surface”. In the present embodiment, a predetermined condition for judging each “character range” may be set for each “object type”. For example, the condition for judging that the “character range” is “entire surface” when the “object type” is “signboard” may be “the area of a range enclosing all the character regions written on the object (for example, the minimum-area ellipse or rectangle enclosing those character regions) is, for example, 80% or more of the area of the object region”, but the condition is not limited to this. For convenience, this specification also calls a figure formed by replacing the left and right sides of a rectangle with semicircles, such as 620 and 621 in FIG. 8, an “ellipse”.
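A sketch of the “entire surface” test, under the simplifying assumption that the enclosing range is the axis-aligned bounding rectangle of all character pixels (rather than the minimum-area ellipse or rectangle) and that the area of the object region is its pixel count. The 0.8 ratio is the example value from the text; the function name is hypothetical.

```python
def covers_entire_surface(char_regions, object_region, ratio=0.8):
    """True if the bounding rectangle of all character regions covers at
    least `ratio` of the object region's area (in pixels).
    Assumes at least one non-empty character region."""
    pixels = set().union(*char_regions)
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    rect_area = (max(rows) - min(rows) + 1) * (max(cols) - min(cols) + 1)
    return rect_area >= ratio * len(object_region)
```

For a 10x10 object region, characters at opposite corners, (0, 0) and (9, 8), span a 10x9 rectangle of 90 pixels and pass the test; two adjacent characters near the middle do not.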

“Relative size with respect to other characters” expresses, when multiple characters are written in the region of an object, the size of the character whose attribute is being determined relative to the other characters in that region. Values include, for example, “large”, “medium”, “small”, and “equal”. In the present embodiment, a predetermined condition for judging each “relative size with respect to other characters” may be set for each “object type”. For example, the condition for judging that the relative size is “large” when the “object type” is “name tag” may be “the ratio of the size of the circles included in one group to the size of the circles included in another group is equal to or greater than a predetermined value”, but the condition is not limited to this.

“Number of characters” is the number of character regions contained in the object region. “Character background color” is the color of the part of the object region other than the character regions.

In the example of FIG. 3, the object type “name tag” is associated with the character attributes “person name” and “company name”. The object type “signboard” is associated with “company name”; this covers, for example, news footage that opens with a shot of a signboard placed in front of a building or conference room. The object type “name plate” is associated with “person name”; this covers, for example, press-conference footage that opens with a shot of the name plates placed on the desk.
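One way to hold such a table is as a list of rows, each pairing conditions with an attribute, and to return every attribute whose conditions all match the observed values. Only the object-type-to-attribute pairs below come from the example of FIG. 3; the position and relative-size conditions on the two “name tag” rows and the “entire surface” condition on the “signboard” row are hypothetical, chosen to echo the heuristic that large characters near the center of a name tag are often a person's name.

```python
# Hypothetical rows modeled on FIG. 3. The object-type -> attribute pairs follow
# the text; every other condition is an illustrative assumption.
ATTRIBUTE_TABLE = [
    {"object_type": "name tag", "position": "central part",
     "relative_size": "large", "attribute": "person name"},
    {"object_type": "name tag", "position": "upper part",
     "relative_size": "small", "attribute": "company name"},
    {"object_type": "signboard", "range": "entire surface",
     "attribute": "company name"},
    {"object_type": "name plate", "attribute": "person name"},
]


def estimate_attributes(observed):
    """Return the attribute of every table row whose conditions all match
    the observed values (a dict of item name -> value)."""
    matches = []
    for row in ATTRIBUTE_TABLE:
        conditions = {k: v for k, v in row.items() if k != "attribute"}
        if all(observed.get(k) == v for k, v in conditions.items()):
            matches.append(row["attribute"])
    return matches
```

Returning a list rather than a single value fits the statement that one or more attributes may be estimated for the characters.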

<Process for estimating character attributes in a scene image>
Next, the process for estimating the attributes of characters in a scene image will be described with reference to FIG. 4. FIG. 4 is a flowchart showing an example of this process.

The image acquisition unit 11 acquires a scene image (step S101).

The character region detection unit 12 detects, in the acquired scene image, a character region, that is, a region in which characters appear (step S102).

Based on the detected character region, the object region detection unit 13 performs object region detection processing to detect the object region, that is, the region of the object on which the characters are written (step S103).

The object type determination unit 14 determines the type of the object on which the characters are written, based on the object region detected by the object region detection unit 13 (step S104).

The attribute estimation unit 15 performs attribute estimation processing, which estimates the attribute of the characters written on the object based on the determined object type (step S105).

<Object region detection processing>
Next, the object region detection processing performed by the object region detection unit 13 in step S103 of FIG. 4, which detects the region of the object on which the characters are written, will be described in detail with reference to FIG. 5. FIG. 5 is a flowchart showing an example of the object region detection processing.

The object region detection unit 13 determines the initial range of the object region based on the values of the pixels at the outer edge of the character region (step S201).

Next, the object region detection unit 13 selects, from among the pixels adjacent to the object region, one pixel not yet selected as an object region candidate (step S202). It then determines whether the selected pixel has features close to those of the pixel group contained in the object region (step S203).

If the pixel does not have close features (NO in step S203), the processing proceeds to step S205, described below. If it does (YES in step S203), the pixel is added to the object region, thereby expanding the object region step by step (step S204).

The object region detection unit 13 determines whether any pixel adjacent to the object region has not yet been selected as an object region candidate (step S205).

If there is no unselected pixel (NO in step S205), the object region detection unit 13 ends the processing. If there is one (YES in step S205), the processing returns to step S202.
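Steps S201 to S205 can be sketched as the following region-growing loop, assuming a grayscale image given as a list of rows, 4-connectivity, a non-empty character region, and that a pixel has “close features” when its value is within `threshold` of the mean of the pixels already in the object region. As in the flowchart, each candidate is examined once. The character pixels are kept out of the running mean and added back at the end, since the detected object region includes the character region. The names and the threshold value are illustrative assumptions.

```python
from collections import deque

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def outer_edge(char_region, h, w):
    """Pixels 4-adjacent to the character region but outside it (step S201)."""
    edge = set()
    for r, c in char_region:
        for dr, dc in NEIGHBORS:
            p = (r + dr, c + dc)
            if 0 <= p[0] < h and 0 <= p[1] < w and p not in char_region:
                edge.add(p)
    return edge


def grow_object_region(image, char_region, threshold=30):
    """Flowchart-style region growing; each candidate pixel is examined once."""
    h, w = len(image), len(image[0])
    region = outer_edge(char_region, h, w)          # S201: initial range
    total = sum(image[r][c] for r, c in region)
    selected = region | set(char_region)            # never re-examine these
    candidates = deque()

    def enqueue_neighbors(r, c):
        for dr, dc in NEIGHBORS:
            p = (r + dr, c + dc)
            if 0 <= p[0] < h and 0 <= p[1] < w and p not in selected:
                selected.add(p)
                candidates.append(p)

    for r, c in sorted(region):
        enqueue_neighbors(r, c)
    while candidates:                               # S205: unselected pixel left?
        r, c = candidates.popleft()                 # S202: pick one candidate
        mean = total / len(region)
        if abs(image[r][c] - mean) <= threshold:    # S203: close to the region?
            region.add((r, c))                      # S204: expand the region
            total += image[r][c]
            enqueue_neighbors(r, c)
    return region | set(char_region)
```

On a 5x5 image with a 125-valued border, a 200-valued interior, and a single 0-valued character pixel at the center, the result is the 3x3 interior: the border pixels differ from the region mean by 75 and are rejected.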

<Example of object region detection>
Next, an example of object region detection according to the present embodiment will be described with reference to FIG. 6. FIG. 6 is a diagram explaining the object region detection processing.

FIG. 6(A) shows an example of the region of an object, contained in a scene image, on which the single character “1” is written. In FIG. 6(A), the pixels in the character region 601 have, for example, a luminance of about 0. Region 602 is a region whose features are close to those of the pixels adjacent to the character region 601; its pixels have, for example, a luminance of about 200. Region 603 is a region whose features are not close to those of the pixels adjacent to the character region 601; its pixels have, for example, a luminance of about 125.

The object region detection processing detects the region comprising the character region 601 and the region 602 as the object region.

FIG. 6(B) shows an example in which, in step S201 of FIG. 5, the initial range of the object region is determined based on the values of the outer edge, that is, the pixels adjacent to the character region 601. In the example of FIG. 6(B), all the pixels on the outer edge of the character region 601 are set as the initial range 604b of the object region. Alternatively, only some of the outer-edge pixels may be used as the initial range, for example by selecting the outer-edge pixel whose value is closest to the mean or median of the outer-edge pixels.
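A sketch of the two initial-range choices above, assuming 4-adjacency and a grayscale image: either every outer-edge pixel, as in FIG. 6(B), or only the single outer-edge pixel whose value is closest to the median of the outer edge. The function name and the `use_all` flag are hypothetical.

```python
from statistics import median


def initial_range(image, char_region, use_all=True):
    """Return the initial range of the object region (step S201)."""
    h, w = len(image), len(image[0])
    # Outer edge: pixels 4-adjacent to the character region but outside it.
    edge = set()
    for r, c in char_region:
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            p = (r + dr, c + dc)
            if 0 <= p[0] < h and 0 <= p[1] < w and p not in char_region:
                edge.add(p)
    if use_all:
        return edge  # FIG. 6(B): every outer-edge pixel
    # Variant: keep only the edge pixel whose value is closest to the median.
    # Sorting makes tie-breaking deterministic (smallest coordinate wins).
    m = median(image[r][c] for r, c in edge)
    best = min(sorted(edge), key=lambda p: abs(image[p[0]][p[1]] - m))
    return {best}
```

Starting from a single representative pixel makes the later growth less sensitive to outer-edge pixels that happen to straddle the boundary of the object.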

Next, in step S204 of FIG. 5, the object region is expanded step by step. First, the pixels adjacent to the initial range 604b of the object region (the pixels marked “×” in FIG. 6(B)) are selected. Then the process of adding to the object region the selected pixel whose features are closest to the current object region, for example, is repeated. In that case, for example, the pixel closest to the mean or median of the pixels currently in the object region is added each time. FIG. 6(C) shows an example in which the pixels near the lower left of the character region 601 are taken into the object region 604c first because they are relatively close to the mean or median of the pixels currently in the object region. Repeating this process expands the object region successively from 604b in FIG. 6(B) to 604c in FIG. 6(C), 604d in FIG. 6(D), 604e in FIG. 6(E), and 604f in FIG. 6(F).

Alternatively, the object region may be expanded by adding an adjacent pixel to the object region whenever its value lies between the minimum and maximum values of the pixels contained in the object region. In that case, when no more pixels can be added, the maximum of the accepted range may be increased by 1 and the minimum decreased by 1, and the process of adding adjacent pixels whose values lie within the new range repeated.

FIG. 6(F) shows an example in which the determination in step S205 of FIG. 5 is NO and the process ends. The object region 604f is a region that includes the character region 601 and the region 602. Because the features of the pixels adjacent to the object region are far from those of the pixel group in the object region (for example, the difference in features is equal to or greater than a predetermined threshold), the process ends once the expansion of the object region has converged. For example, the process ends when no pixel whose value is within a predetermined threshold of the mean or median of the current object region remains outside the current object region.

Alternatively, the process may end when no pixel whose value lies within a predetermined threshold of the range between the minimum and maximum values of the pixels in the object region remains outside the current object region.

The three RGB values may also be used as pixel values. In that case, for example, only the value of one primary color selected from RGB (for example, R) may be used, or the average of several primary colors may be used.
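
The reduction of an RGB pixel to a single value can be expressed as a small helper; the name `pixel_value` and the tuple representation are assumptions for illustration.

```python
def pixel_value(rgb, channels="RGB"):
    """Reduce an (R, G, B) tuple to one scalar: the mean of the selected primaries.

    channels="R" uses the red value alone; the default averages all three.
    """
    selected = [rgb["RGB".index(ch)] for ch in channels]
    return sum(selected) / len(selected)


print(pixel_value((30, 60, 90), "R"), pixel_value((30, 60, 90)))
```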

<Object type determination process>
Next, the process by which the object type determination unit 14 determines the type of an object will be described. The object type determination unit 14 is trained in advance by machine learning, using a group of images corresponding to each type of object to be detected as training data. The object type determination unit 14 then determines the type of the object on which the characters are written by performing generic object recognition or specific object recognition on the object region detected by the object region detection unit 13, based on features such as its shape and color.

<Attribute estimation process>
Next, the attribute estimation process performed by the attribute estimation unit 15 will be described with reference to FIG. 7. FIG. 7 is a flowchart illustrating an example of the attribute estimation process.

The attribute estimation unit 15 determines the position of the characters within the object region based on the character region detected by the character region detection unit 12 and the object region detected by the object region detection unit 13 (step S301).

Next, the attribute estimation unit 15 determines the range of the area containing the characters within the object region (step S302). Next, it determines the size of the characters relative to other characters in the object region (step S303). Next, it determines the number of characters in the object region (step S304). Next, it determines the background color of the characters in the object region (step S305).

Next, using the attribute estimation table, the attribute estimation unit 15 estimates the attribute of the content represented by the characters from the object type and from the position, range, relative size, number, and background color of the characters in the object region (step S306).
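
Step S306 amounts to a lookup in the attribute estimation table. The sketch below assumes a table of rows, each pairing a set of conditions with an attribute, where a condition omitted from a row matches anything. The two rows are hypothetical and only mirror the business card example of FIG. 8; the actual contents of the table in FIG. 3 are not reproduced here, and the names are illustrative.

```python
# Hypothetical table rows modelled on the FIG. 8 business card example.
ATTRIBUTE_TABLE = [
    ({"object_type": "business card", "position": "center",
      "relative_size": "large", "background": "white"}, "person name"),
    ({"object_type": "business card", "relative_size": "medium"}, "company name"),
]


def estimate_attribute(features, table=ATTRIBUTE_TABLE):
    """Return the attribute of the first row whose conditions all match.

    features -- dict of values determined in steps S301 to S305 plus the
                object type; keys missing from a row act as wildcards.
    """
    for conditions, attribute in table:
        if all(features.get(key) == value for key, value in conditions.items()):
            return attribute
    return None  # no row matched


name_features = {"object_type": "business card", "position": "center",
                 "relative_size": "large", "count": 4, "background": "white"}
print(estimate_attribute(name_features))  # person name
```

Because the first matching row wins, more specific rows would need to be listed before more general ones in a table of this form.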

<Example of attribute estimation>
Next, an example of attribute estimation according to the present embodiment will be described with reference to FIG. 8. FIG. 8 is a diagram illustrating an example of attribute estimation according to the present embodiment.

FIG. 8(A) illustrates the preprocessing for analyzing the characters in the region of the object 4 (the object region) detected by the object region detection unit 13. For example, the attribute estimation unit 15 generates circles 610 to 619, each enclosing one character region (the character region is inscribed in the circle), and assigns to the same group one or more circles whose distance (for example, the distance between the centers of the nearest circles) is within a predetermined range and whose difference in size (for example, radius) is within a predetermined range. In the example of FIG. 8(A), circles 610 to 613 are assigned to one group and circles 614 to 619 to another. Polygons such as squares or rectangles may be used instead of circles.
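
The grouping of FIG. 8(A) can be sketched as connected components over a "close and similar in size" relation. The sketch assumes circles given as (cx, cy, r) tuples, uses a union-find structure, and represents the two predetermined ranges by the hypothetical parameters `max_distance` and `max_radius_diff`.

```python
import math


def group_circles(circles, max_distance, max_radius_diff):
    """Group (cx, cy, r) circles: two circles are linked when their centres are
    within max_distance and their radii differ by at most max_radius_diff;
    groups are the connected components of that link relation."""
    parent = list(range(len(circles)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(circles)):
        for j in range(i + 1, len(circles)):
            (xi, yi, ri), (xj, yj, rj) = circles[i], circles[j]
            if (math.hypot(xi - xj, yi - yj) <= max_distance
                    and abs(ri - rj) <= max_radius_diff):
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(circles)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


big = [(10, 50, 10), (30, 50, 10), (50, 50, 10), (70, 50, 10)]  # one row of large characters
small = [(10 + 12 * k, 10, 6) for k in range(6)]                 # one row of small characters
print(group_circles(big + small, max_distance=25, max_radius_diff=2))
```

With four large circles in one row and six small circles in another, the call returns the index groups [0, 1, 2, 3] and [4, 5, 6, 7, 8, 9], analogous to the two groups in FIG. 8(A).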

FIG. 8(B) illustrates the processing of steps S301 to S305 in FIG. 7, which determines the position of the characters, their range, their size relative to other characters, the number of characters, and their background color in the object region.

For example, the attribute estimation unit 15 generates, for each group, the minimum-area ellipse containing all circles of the group (ellipses 620 and 621), and determines the position and range of the generated ellipses 620 and 621 within the object region.

The position of the characters in the object region is determined according to whether the predetermined condition for each position, set in the attribute estimation table of FIG. 3, is satisfied. In the example of FIG. 8(B), the centers of circles 614 to 619 satisfy the condition of lying in the top band when the object region is divided vertically into three equal parts, so the position of each character in circles 614 to 619 is determined to be "upper".
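
A position condition of the kind used in this example, which divides the object region vertically into three equal bands, could be written as follows. Image coordinates are assumed to grow downward, the labels follow the ones used in this description, and the function names are hypothetical.

```python
def vertical_position(center_y, top, bottom):
    """Return "upper", "center" or "lower" depending on which third of the
    object region [top, bottom) the y coordinate falls in (y grows downward)."""
    third = (bottom - top) / 3
    if center_y < top + third:
        return "upper"
    if center_y < top + 2 * third:
        return "center"
    return "lower"


def group_position(center_ys, top, bottom):
    """Label a group only if every circle centre falls in the same third."""
    labels = {vertical_position(y, top, bottom) for y in center_ys}
    return labels.pop() if len(labels) == 1 else None


print(group_position([10, 12, 11, 9, 10, 12], top=0, bottom=90))  # upper
```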

The range of the characters in the object region is determined according to whether the predetermined condition for each range, set in the attribute estimation table of FIG. 3, is satisfied.

The size of the characters relative to other characters in the object region is determined according to whether the predetermined condition for each size, set in the attribute estimation table of FIG. 3, is satisfied. In the example of FIG. 8(B), the ratio of the size of each circle in ellipse 620 to the size of each circle in ellipse 621 is equal to or greater than a predetermined value, so the size of each character in ellipse 620 relative to the other characters is determined to be "large". Similarly, the relative size of each character in ellipse 621 is determined to be "medium".
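
The size comparison between two groups can be sketched as below, assuming each group is given as a list of circle radii and the predetermined value is the parameter `ratio_threshold`. The label "same", returned when neither ratio reaches the threshold, is a hypothetical placeholder; the description does not state what label applies in that case.

```python
from statistics import mean


def label_relative_size(radii_a, radii_b, ratio_threshold):
    """Label two groups of characters by comparing their mean circle radii.

    Returns (label_a, label_b). "same" is a hypothetical placeholder for the
    case where neither group is larger by at least ratio_threshold.
    """
    ra, rb = mean(radii_a), mean(radii_b)
    if ra / rb >= ratio_threshold:
        return "large", "medium"
    if rb / ra >= ratio_threshold:
        return "medium", "large"
    return "same", "same"


print(label_relative_size([10, 10, 11, 9], [6, 6, 6, 6, 6, 6], ratio_threshold=1.3))
```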

The number of characters in the object region is determined, for example, from the number of circles in each group. In the example of FIG. 8(B), the group of ellipse 620 contains six circles, 614 to 619, so the number of characters is determined to be 6.

The background color of the characters in the object region is determined from the average color of the part of the object region outside the character regions. In the example of FIG. 8(B), the average value of that part falls within the predetermined range for "white", so the background color of the characters is determined to be "white".
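
The background color determination can be sketched as follows, assuming RGB tuples, pixel sets of (row, col) coordinates, and a hypothetical rule that the average counts as "white" when every channel is at least `white_min`; the function name and the label "other" are likewise illustrative.

```python
def background_color(image, object_pixels, character_pixels, white_min=200):
    """Average the RGB values of object pixels outside the character regions
    and label the result "white" when every channel reaches white_min."""
    background = [image[r][c] for r, c in set(object_pixels) - set(character_pixels)]
    if not background:
        return None, None  # the object region contains only character pixels
    n = len(background)
    average = tuple(sum(px[i] for px in background) / n for i in range(3))
    label = "white" if all(v >= white_min for v in average) else "other"
    return average, label


image = [[(250, 250, 250), (0, 0, 0)],
         [(240, 240, 240), (230, 230, 230)]]
print(background_color(image, {(0, 0), (0, 1), (1, 0), (1, 1)}, {(0, 1)}))
```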

The process of step S306 in FIG. 7, which estimates the attribute of the content represented by the characters, will now be described using the example attribute estimation table shown in FIG. 3.

For the characters in circles 610 to 613, the object type is "business card", their position in the object region is "center", their size relative to other characters is "large", the number of characters in the object region is 4, and their background color is "white". The attribute of the content they represent is therefore estimated to be "person name". Similarly, for the characters in circles 614 to 619, the object type is "business card" and the relative size is "medium", so the attribute of the content they represent is estimated to be "company name".

The values of the predetermined ranges and the predetermined thresholds used in each step of the attribute estimation process described above may be set according to the object type. In that case, the values may be made configurable by the user.

<Modification>
The object region detection unit 13 may be configured to keep expanding the object region until the entire scene image is included in it, storing the features of the object region at each expansion, and then to detect the region that satisfies a condition.

For example, each time the object region is expanded, the object region detection unit 13 calculates and stores the variation q_internal of the pixel values in the object region before the expansion (for example, the difference between the minimum and maximum values, or the variance) and the variation q_external of the pixel values in the object region after the expansion. After expanding until every pixel of the scene image is included in the object region, the object region detection unit 13 may then detect as the object region the region obtained after a number of expansions at which the variation ratio q_internal / q_external is smaller than the ratios obtained at the numbers of expansions immediately before and after it.
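
The procedure can be sketched in Python under several assumptions: the region grows by one pixel per expansion (the adjacent pixel closest to the region mean), the variation is the maximum minus the minimum, q_external is taken as the variation after adding the next pixel, and a small constant `eps` is added to both terms to avoid dividing by zero in uniform regions. All names are hypothetical.

```python
def spread(values):
    """Variation measure: max - min (the description also allows the variance)."""
    return max(values) - min(values)


def grow_and_record(image, seed, eps=1.0):
    """Grow the region one pixel at a time until it covers the whole image,
    recording (region, ratio) before each expansion, where
    ratio = (q_internal + eps) / (q_external + eps)."""
    height, width = len(image), len(image[0])
    region = set(seed)
    history = []
    while len(region) < height * width:
        values = [image[r][c] for r, c in region]
        region_mean = sum(values) / len(values)
        frontier = {
            (r + dr, c + dc)
            for r, c in region
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= r + dr < height and 0 <= c + dc < width
            and (r + dr, c + dc) not in region
        }
        best = min(frontier, key=lambda p: abs(image[p[0]][p[1]] - region_mean))
        q_internal = spread(values)
        q_external = spread(values + [image[best[0]][best[1]]])
        history.append((frozenset(region), (q_internal + eps) / (q_external + eps)))
        region.add(best)
    return history


def local_minimum_regions(history):
    """Regions whose ratio is lower than at the previous and next expansion."""
    return [
        region
        for k, (region, ratio) in enumerate(history)
        if 0 < k < len(history) - 1
        and ratio < history[k - 1][1]
        and ratio < history[k + 1][1]
    ]


history = grow_and_record([[5, 5, 5, 90, 90]], {(0, 0)})
print(local_minimum_regions(history))
```

In this one-row example the ratio dips only at the step where the uniform run of 5s is about to absorb the first 90, so the three-pixel run is the single detected region.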

Alternatively, the region at which the variation ratio q_internal / q_external is smallest among the expansion counts within a predetermined range (for example, 50 to 150 expansions) may be detected as the object region. The minimum is taken within a predetermined range rather than over all expansion counts because, if the object to be detected lies inside another, somewhat larger object (for example, when a region showing a "name tag" lies within a region showing a "person"), the minimum over all counts might select that other object instead.
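
Restricting the search to a window of expansion counts is a bounded argmin. The sketch assumes the ratios have already been recorded in a list indexed by expansion count; the default window of 50 to 150 is the example range given above, and the function name is hypothetical.

```python
def best_expansion_count(ratios, lo=50, hi=150):
    """Return the expansion count k in [lo, hi] with the smallest ratio,
    where ratios[k] is q_internal / q_external after k expansions."""
    hi = min(hi, len(ratios) - 1)
    if lo > hi:
        return None  # not enough expansions were recorded
    return min(range(lo, hi + 1), key=lambda k: ratios[k])


ratios = [1.0] * 400
ratios[80] = 0.2    # boundary of the smaller object (e.g. a name tag)
ratios[300] = 0.05  # boundary of the larger object around it
print(best_expansion_count(ratios))  # 80
```

The shallow dip at 80 expansions is chosen even though the dip at 300 expansions is deeper, which is the behavior the restricted range is meant to produce.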

When the above processing detects multiple regions as object regions, only one of them may be used for the subsequent processing, or the character attributes may be estimated for one or more of the regions and the resulting per-region attributes used for character recognition or the like.

When the object type determination unit 14 produces multiple determination results and multiple character attributes are therefore estimated, the attribute estimation unit 15 may output, for each attribute, the score obtained from the object type determination unit 14 indicating the likelihood of the corresponding determination result, or may output only the character attribute associated with the most likely determination result.
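
One way to produce either output is sketched below. It assumes each object type candidate arrives as an (object type, score) pair and that a mapping from object type to character attribute is available; the function name and the "store name" attribute in the usage are hypothetical.

```python
def attributes_from_object_types(type_scores, attribute_for_type, best_only=False):
    """Map scored object type candidates to character attributes.

    type_scores        -- (object_type, score) pairs from the type determination
    attribute_for_type -- callable mapping an object type to a character attribute
    best_only          -- if True, keep only the most likely result
    """
    results = [(attribute_for_type(t), score) for t, score in type_scores]
    results.sort(key=lambda item: item[1], reverse=True)
    return results[:1] if best_only else results


lookup = {"business card": "person name", "signboard": "store name"}.get
print(attributes_from_object_types([("signboard", 0.3), ("business card", 0.6)], lookup))
```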

The attribute estimation unit 15 may also estimate character attributes based on, for example, the object region detected by the object region detection unit 13. This case is described below. FIG. 9 is a functional block diagram showing a modification of the functional configuration of the character attribute estimation device. It differs from the functional block diagram of FIG. 2 in that the object type determination unit 14 is absent. The attribute estimation unit 15 of this modification is trained in advance by machine learning, using as training data groups of images corresponding to the attributes of the content represented by the characters to be detected. For example, the attribute "person name" is attached to multiple images showing characters of personal names, and these images are used as training data. The attribute estimation unit 15 of the modification then estimates the attribute of the content represented by the characters from the result of the machine learning and from, for example, the shape of the object region detected by the object region detection unit 13.

FIG. 10 is a flowchart showing a modification of the process for estimating the attributes of characters in a scene image. Steps S401 to S403 in FIG. 10 are the same as steps S101 to S103 in FIG. 4. In the attribute estimation process of step S404, the attribute estimation unit 15 of the modification estimates the character attributes based on the shape of the object region instead of the object type. This makes the determination of the object type by the object type determination unit 14 unnecessary.

The object type determination unit 14 may also output information on the determined object type to the character recognition unit 17. This allows the character recognition unit 17 to recognize characters using features suited to the object; for example, when the object is a cloth such as a banner, it can take into account that the characters tend to be heavily distorted and contain few straight-line components.

The character attribute estimation device 1 may be realized, for example, by cloud computing using one or more computers. In that case, the dictionary selection unit 16 and the character recognition unit 17, for example, may be implemented on a separate computer.

The character attribute estimation device 1 can be applied, for example, to a system that switches the dictionary used for character recognition according to the character attributes. For example, it can be applied to an archive system that recognizes characters appearing in the frames of videos stored in a video database (such as news-gathering footage or past broadcast programs) and manages the recognized character strings in association with each video as search tags. The character attribute estimation device 1 can also be applied, for example, to a smartphone that recognizes characters in images captured with its camera.
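
Switching the dictionary by attribute can be illustrated with Python's standard `difflib.get_close_matches`, used here as a stand-in for lexicon-based correction of raw recognition output. The dictionaries, their entries, and the function name are hypothetical; the description does not specify how the selected dictionary is matched.

```python
import difflib

# Hypothetical per-attribute dictionaries.
DICTIONARIES = {
    "person name": ["Tanaka Taro", "Suzuki Hanako"],
    "company name": ["Nippon Shoji", "Tanaka Trading"],
}


def recognize_with_dictionary(raw_text, attribute, dictionaries=DICTIONARIES, cutoff=0.6):
    """Snap raw recognition output to the closest entry in the dictionary
    selected for the estimated attribute; keep the raw text if nothing is close."""
    words = dictionaries.get(attribute)
    if not words:
        return raw_text  # no dictionary for this attribute
    matches = difflib.get_close_matches(raw_text, words, n=1, cutoff=cutoff)
    return matches[0] if matches else raw_text


print(recognize_with_dictionary("Tanaka Tar0", "person name"))  # Tanaka Taro
```

Matching against only the dictionary for the estimated attribute keeps the candidate list small, which is the effect on speed and misrecognition that this description attributes to dictionary switching.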

<About the character attribute estimation program>
The character attribute estimation device 1 according to this embodiment may be configured as a computer including, for example, a CPU (Central Processing Unit), a volatile storage medium such as RAM (Random Access Memory), a nonvolatile storage medium such as ROM (Read Only Memory), input devices such as a mouse, keyboard, or pointing device, a display unit for displaying images and data, and an interface for communicating with external devices.

In that case, each function of the character attribute estimation device 1 can be realized by having the CPU execute a program describing these functions (the character attribute estimation program). The program can also be stored and distributed on a recording medium such as a magnetic disk (floppy disk, hard disk, etc.), an optical disc (CD-ROM, DVD, etc.), or semiconductor memory. In other words, the processing described above can be realized by installing, on a general-purpose PC or server for example, a program that causes a computer (hardware) to execute the processing of each configuration described above.

Part or all of the character attribute estimation device 1 in the embodiment described above may also be implemented as an integrated circuit such as an LSI (Large Scale Integration). Each functional block of the character attribute estimation device 1 may be implemented as an individual processor, or some or all of the blocks may be integrated into a single processor.

<Summary>
According to the embodiment described above, character attributes are estimated using information around the character region. This allows the attributes of characters in a scene image to be estimated with high accuracy. The character attributes can be used, for example, to select a word dictionary that improves the recognition accuracy of characters in a scene image. Switching the dictionary used for character recognition according to the character attributes reduces the size of the dictionary to be matched, which improves recognition speed and reduces misrecognition of characters (words).

In addition, according to the embodiment described above, the object region is detected using information around the character region. Compared with conventional techniques that detect object regions in a scene image based on luminance or the like, the region of the object on which the characters are written can therefore be detected with high accuracy.

Although the embodiments have been described in detail above with reference to the drawings, the specific configuration is not limited to the above, and various design changes and the like can be made without departing from the gist of the invention. Part or all of the embodiments described above may also be combined.

DESCRIPTION OF SYMBOLS
1 Character attribute estimation device
11 Image acquisition unit
12 Character region detection unit
13 Object region detection unit
14 Object type determination unit
15 Attribute estimation unit
16 Dictionary selection unit
17 Character recognition unit

Claims

A character attribute estimation device comprising:
a character region detection unit that detects a character region in an image;
an object region detection unit that detects, as the region of an object on which the characters are written, pixels that are adjacent to the character region detected by the character region detection unit and that satisfy a condition based on at least some of the pixel values; and
an attribute estimation unit that performs machine learning using, as training data, a group of images corresponding to attributes of the content represented by characters, and estimates the attribute of the content represented by the characters from the result of the machine learning and the shape of the object region detected by the object region detection unit.

A character attribute estimation device comprising:
a character region detection unit that detects a character region in an image;
an object region detection unit that detects, as the region of an object on which the characters are written, pixels that are adjacent to the character region detected by the character region detection unit and that satisfy a condition based on at least some of the pixel values;
an object type determination unit that performs machine learning using, as training data, a group of images corresponding to object types, and determines the type of the object from the result of the machine learning and the shape of the object region detected by the object region detection unit; and
an attribute estimation unit that estimates the attribute of the content represented by the characters from the object type determined by the object type determination unit.

The character attribute estimation device according to claim 1 or 2, wherein the object region detection unit uses at least some of the pixels adjacent to the character region as the initial range of the object region, and successively adds to the object region pixels that are adjacent to the object region and that satisfy a condition based on the pixel values of the object region.

The character attribute estimation device according to any one of claims 1 to 3, wherein the attribute estimation unit estimates the attribute based on the position of the characters in the object region, the range of the characters, the size of the characters relative to other characters, the number of characters, or the background color of the characters.

The character attribute estimation device according to claim 1, further comprising:
a dictionary selection unit that selects a dictionary according to the attribute estimated by the attribute estimation unit; and
a character recognition unit that recognizes the characters based on the dictionary selected by the dictionary selection unit.

A character attribute estimation program for causing a computer to function as the character attribute estimation device according to any one of claims 1 to 5.