JP7037875B2

JP7037875B2 - Image normalization device, method, and computer-readable recording medium

Info

Publication number: JP7037875B2
Application number: JP2016122068A
Authority: JP
Inventors: 伸水谷; 良成白井; 泰恵岸野; 太納谷
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2016-06-20
Filing date: 2016-06-20
Publication date: 2022-03-17
Anticipated expiration: 2036-06-20
Also published as: JP2017227993A

Description

The present invention relates to an image normalization device, method, and computer-readable recording medium. In particular, it relates to an image normalization device, method, and computer-readable recording medium that generate normalized images for input to a recognizer.

In image recognition, the image of an object can be rotated, enlarged, or reduced. This happens when the distance between the object and the camera capturing it changes, or when the object rotates relative to the camera. Simple image recognition techniques have difficulty recognizing or identifying such images invariantly. Handling rotated, enlarged, and reduced patterns in the recognition stage alone would require a pattern recognizer/classifier prepared in advance for every such variation, which means an enormous number of recognizers/classifiers. To avoid this, methods have been devised that compute a feature of the image pattern to be recognized and compare it with the same feature of the input image pattern. For example, the image features SIFT (Scale-Invariant Feature Transform) and SURF (Speeded-Up Robust Features), described in Non-Patent Document 1, do not depend on rotation, enlargement, or reduction. They can therefore be used to discriminate and match image patterns.

[Non-Patent Document 1] Fujiyoshi Laboratory, Department of Computer Science, College of Engineering, Chubu University, "Local Image Features and Specific Object Recognition: SIFT and Recent Approaches," retrieved June 20, 2016, <http://www.vision.cs.chubu.ac.jp/cvtutorial/PDF/02SIFTandMore.pdf>

However, these features obtain the scale of the target image with a filter based on a two-dimensional isotropic Gaussian function, the Difference-of-Gaussian (DoG). For enlargement and reduction they therefore assume only the isotropic case, in which both orthogonal axes of the image plane are scaled equally. As a result, SIFT/SURF cannot be used for invariant discrimination or matching of images that are enlarged or reduced anisotropically along the two orthogonal axes. In general, the appearance of a three-dimensional object on the two-dimensional image plane is a projective transformation. Part of it, such as a trapezoidal (keystone) transformation, can be approximated within a certain range of rotation angles by independent enlargement/reduction along the two orthogonal image axes. If recognition invariant to independent scaling along the two axes were possible, some of the trapezoidal transformations within the projective transformation could be recognized invariantly. This would extend, to some degree, the applicability to trapezoidal transformations, which was difficult with SIFT/SURF.

Moreover, features such as SIFT/SURF are mainly used for template matching (see Reference 1). Template matching compares a local region in the input image with an image of a specific object, called a template, to determine whether they are the same. These features are effective for identifying or detecting specific objects such as road signs. They are difficult to use as-is, however, for classifying objects in an image (image classification) or for general object recognition/detection, which performs such classification.

[Reference 1] Manabu Hashimoto, Faculty of Engineering, Chukyo University, "The Appeal of Template Matching," <http://isl.sist.chukyo-u.ac.jp/Archives/SSII2013TS-Hashimoto.pdf>

To address this problem, image classification using a method called bag-of-keypoints (BoK) has been proposed. This method, however, has a side effect that matters for image recognition: it ignores the positional relationships among the local features in the image. If the set of local features is the same, images are assigned to the same class even when local image regions have swapped positions. The bag-of-keypoints method regards an image as a collection (bag) of the local features (keypoints) used in SIFT/SURF, and it assigns the image to a category according to statistics of that collection (see Reference 2).
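A few lines of code show why bag-of-keypoints is blind to position. The following Python sketch is illustrative only. It assumes the local descriptors have already been quantized into visual words; in practice this is typically done by k-means clustering of SIFT/SURF descriptors. The word labels and coordinates are made up.

```python
from collections import Counter


def bag_of_keypoints(keypoints):
    """Histogram of visual words; the (x, y) position of each keypoint is discarded."""
    return Counter(word for _position, word in keypoints)


# Hypothetical keypoints: ((x, y), visual word)
original = [((2, 3), "eye"), ((8, 3), "eye"), ((5, 9), "mouth")]
# Same local features, but the "mouth" and one "eye" have swapped places
swapped = [((2, 3), "eye"), ((5, 9), "eye"), ((8, 3), "mouth")]

print(bag_of_keypoints(original) == bag_of_keypoints(swapped))  # True
```

Both arrangements produce the same histogram, so any classifier that sees only the histogram must put them in the same class.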

[Reference 2] Fujiyoshi Laboratory, Department of Computer Science, College of Engineering, Chubu University, "Object Detection Using Local Features and Statistical Learning Methods," retrieved June 20, 2016, <http://www.vision.cs.chubu.ac.jp/CVTutorial/PDF/03ObjectDetection.pdf>

As a result, the positional relationships of local features in the image are ignored. In general, image classification determines which category an object in an image belongs to. Within one category it must ignore individual differences and classify by common characteristics. Between categories it must classify by the differences in features between them. Image classification therefore requires a recognizer built by statistical learning such as machine learning. Using the template matching described above (for identifying/detecting specific objects) to perform classification is usually difficult unless special measures are introduced, such as forming a category from multiple templates.

As described above, it has been difficult to satisfy two requirements at the same time: recognizing invariantly an image that is enlarged or reduced anisotropically in the two-dimensional image plane, and using SIFT/SURF features for general object recognition.

The present invention has been made in view of the above problems. Its object is to provide an image normalization device, method, and recording medium that can generate, to suit recognition by a recognizer, an image normalized by anisotropic enlargement or reduction.

To achieve the above object, an image normalization device according to a first invention generates, from an input image, a normalized image to be input to a recognizer. The device includes a candidate acquisition unit and an image normalization unit. The candidate acquisition unit encloses a figure pattern, which is a region representing an object in the input image, with a rectangular frame such that the figure pattern does not protrude beyond the frame and touches it, and acquires a candidate image using the rectangular frame. The image normalization unit normalizes the candidate image by enlarging or reducing it so that the rectangular frame takes the aspect ratio of the object recognized by the recognizer.
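The two units described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation. It assumes the following:

- The figure pattern is given as a binary mask (a list of rows of 0/1).
- The frame is axis-aligned; the rotation search described in the following paragraphs is omitted.
- Resampling is nearest-neighbour.

The function names and the 4×4 recognizer input size are hypothetical.

```python
def tight_bbox(mask):
    """Inclusive (top, left, bottom, right) of the frame touching the figure pixels."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        raise ValueError("mask contains no figure pixels")
    return rows[0], cols[0], rows[-1], cols[-1]


def crop(image, box):
    """Cut the inclusive box out of a 2-D list of rows."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize with independent row and column factors (anisotropic)."""
    in_h, in_w = len(image), len(image[0])
    return [[image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def normalize_candidate(image, mask, out_h, out_w):
    """Cut out the frame around the figure and rescale it to the recognizer's input size."""
    return resize_nearest(crop(image, tight_bbox(mask)), out_h, out_w)


# Hypothetical 6x8 mask with a 2x5 figure; the recognizer is assumed to take 4x4 inputs.
mask = [[0] * 8 for _ in range(6)]
for r in (1, 2):
    for c in range(2, 7):
        mask[r][c] = 1
print(tight_bbox(mask))                       # (1, 2, 2, 6)
print(normalize_candidate(mask, mask, 4, 4))  # 4x4 grid of ones
```

Rows and columns are resampled with separate factors. As a result, a flat 2×5 figure and a tall 10×3 figure would both fill the recognizer's input. This is the anisotropic normalization that isotropic scale estimation cannot provide.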

In the image normalization device according to the first invention, the candidate acquisition unit may acquire candidate images from a plurality of rectangular frames obtained by rotating either the figure pattern or the rectangular frame. Each candidate image uses a rectangular frame that satisfies a predetermined reference condition on the rectangular frame.

In the image normalization device according to the first invention, the reference condition may be that the perimeter, the diagonal length, or the area of the rectangular frame is minimal.
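The three reference conditions can be written as formulas. The notation below is ours, not the patent's. The figure pattern is given as pixel coordinates $(x_i, y_i)$ and rotated by an angle $\theta$. The side lengths of the touching axis-aligned frame and the three candidate criteria are then:

```latex
\begin{aligned}
x_i(\theta) &= x_i\cos\theta - y_i\sin\theta, &
y_i(\theta) &= x_i\sin\theta + y_i\cos\theta,\\
w(\theta) &= \max_i x_i(\theta) - \min_i x_i(\theta), &
h(\theta) &= \max_i y_i(\theta) - \min_i y_i(\theta),\\
P(\theta) &= 2\bigl(w(\theta) + h(\theta)\bigr), &
D(\theta) &= \sqrt{w(\theta)^2 + h(\theta)^2}, \qquad A(\theta) = w(\theta)\,h(\theta).
\end{aligned}
```

The selected frame is the one at $\theta^\ast = \arg\min_\theta$ of $P$, $D$, or $A$, depending on the chosen condition. Rotating by a further $90^\circ$ sends $(x, y)$ to $(-y, x)$, so $w(\theta + 90^\circ) = h(\theta)$ and $h(\theta + 90^\circ) = w(\theta)$. All three criteria are symmetric in $w$ and $h$, so they have period $90^\circ$. This is why a minimizing frame always comes with three companions at quarter-turn offsets.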

An image normalization method according to a second invention is performed in an image normalization device that generates, from an input image, a normalized image to be input to a recognizer. The method comprises two steps. In the first, a candidate acquisition unit encloses a figure pattern, which is a region representing an object in the input image, with a rectangular frame such that the figure pattern does not protrude beyond the frame and touches it, and acquires a candidate image using the rectangular frame. In the second, an image normalization unit normalizes the candidate image by enlarging or reducing it so that the rectangular frame takes the aspect ratio of the object recognized by the recognizer.

In the image normalization method according to the second invention, in the step of acquiring candidate images, the candidate acquisition unit may use a plurality of rectangular frames obtained by rotating either the figure pattern or the rectangular frame. From these, it acquires each candidate image using a rectangular frame that satisfies a predetermined reference condition on the rectangular frame.

In the image normalization method according to the second invention, the reference condition may be that the perimeter, the diagonal length, or the area of the rectangular frame is minimal.

A recording medium according to a third invention is a computer-readable recording medium storing a program that causes a computer to function as each unit of the image normalization device according to the first invention.

The image normalization device, method, and recording medium of the present invention work as follows. A figure pattern, which is a region representing an object in the input image, is enclosed with a rectangular frame such that the figure pattern does not protrude beyond the frame and touches it. A candidate image is acquired using the rectangular frame. The candidate image is then normalized by enlarging or reducing it so that the rectangular frame takes the aspect ratio of the object recognized by the recognizer. This makes it possible to generate, to suit recognition by the recognizer, an image normalized by anisotropic enlargement or reduction.

A block diagram showing the configuration of an image normalization device according to an embodiment of the present invention.
A diagram showing an example of an input image capturing characters.
A diagram showing an example of a character template.
A diagram showing an example of a binary image in which the background region and the character regions have been separated into figure and ground.
A diagram showing examples of figure patterns cut out with the initially fitted rectangular frames after labeling the connected regions.
A diagram showing examples of rotated figure patterns.
A diagram showing examples of candidate images.
A diagram showing an example of a candidate image that includes the rectangular frame of minimum area and a fixed region outside it.
A diagram showing an example of normalizing each of the candidate images.
A diagram showing an example of fitting rectangular frames to the input image.
A diagram showing an example of recognizing a figure pattern that can be read as the character E.
A diagram showing an example of rotating the rectangular frame.
A flowchart showing the image normalization processing routine in the image normalization device according to the embodiment of the present invention.
A diagram showing an example of an input image capturing a fish.
A diagram showing an example of separating the fish region from the background region.
A diagram showing an example of detecting a fish's spine.

<Overview of the Embodiment of the Present Invention>

First, an outline of the embodiment of the present invention will be described.

The method of this embodiment overcomes the difficulties described above. Of the two problems, it first addresses enlargement/reduction. Previously, normalization was possible only for images enlarged or reduced isotropically in the two-dimensional plane. With this method, images enlarged or reduced anisotropically can also be normalized independently along each of the two axes. The normalized image is then passed to the recognizer/discriminator, so that recognition/discrimination does not depend on enlargement/reduction deformations.

At the same time, the method allows conventional local features to be used. The recognition stage can then be not only simple template matching but also a recognizer built by statistical learning.

<Principle of the Embodiment of the Present Invention>

Next, the principle of the embodiment of the present invention will be described.

The embodiment of the present invention applies when the two-dimensional image of an object can be separated into two regions: a figure pattern inside a closed curve (a region closed with respect to the background) and the remaining background (ground). In that case, a rectangular frame that encloses the figure pattern, so that the pattern does not protrude and touches the frame, is generated according to a certain criterion. The input image, including the two-dimensional image inside and around the frame, is then normalized, cut out, and passed to the recognizer. This solves the problems above. The figure-ground separation only needs to place the object to be recognized inside some figure pattern. The object does not have to be completely separated from everything else; if it could be, there would be no need to recognize it again. All figure pattern candidates that contain the object are input to the recognizer, and the recognizer makes the decision.

The size of the object's image is captured by a rectangular frame that encloses the object's figure pattern without letting it protrude and touches it. This gives the length of each side of the frame, so the enlargement/reduction deformation can be normalized. Many rectangular frames can enclose a given figure pattern of an object's two-dimensional image. To determine one uniquely, a single reference condition is imposed, in addition to the basic requirement that the frame enclose the figure pattern without protrusion and touch it. Possible conditions include:

- minimal perimeter of the rectangular frame,
- minimal diagonal length of the rectangular frame,
- minimal area of the rectangular frame.

Depending on the shape of the figure pattern, some of these conditions may select the same rectangular frame. Under any one of them, depending on the object, at most a few rectangular frames satisfy the condition and enclose the target figure pattern. The image inside each such frame and within a certain surrounding range is then normalized according to the size of the frame. The results are input to the recognizer, at most a few times, and the best-matching result is taken as the recognizer's decision. Ideally, the surrounding range is chosen so that figure and ground each occupy half the area of the frame plus its surroundings. In practice, the range may instead be fixed from the start as some multiple of each side of the rectangular frame.
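The surrounding range can be chosen as sketched below, under assumptions of our own:

- The frame is enlarged about its centre by a single factor k on both sides.
- k is chosen so that the enlarged crop has twice the figure area, which gives the ideal 1:1 figure-to-ground ratio.
- If the figure already covers less than half of its frame, no enlargement can reach 1:1. The sketch then falls back to a fixed multiple of the sides, which the text allows.

The function names and defaults are hypothetical.

```python
import math


def margin_factor(figure_area, box_h, box_w, fallback=1.0):
    """Factor k such that a (k*box_h) x (k*box_w) crop holds figure and ground about 1:1.

    The frame touches the figure, so enlarging it only adds ground pixels:
    figure_area == k*k*box_h*box_w / 2  =>  k = sqrt(2 * figure_area / (box_h * box_w)).
    """
    k = math.sqrt(2.0 * figure_area / (box_h * box_w))
    # A figure covering less than half of its frame gives k < 1; 1:1 is then
    # unreachable, so a fixed multiple of the sides is used instead (assumption).
    return k if k >= 1.0 else fallback


def expand_box(box, k, img_h, img_w):
    """Enlarge an inclusive (top, left, bottom, right) box by k about its centre, clipped to the image."""
    top, left, bottom, right = box
    h, w = bottom - top + 1, right - left + 1
    new_h, new_w = int(round(k * h)), int(round(k * w))
    new_top = top - (new_h - h) // 2
    new_left = left - (new_w - w) // 2
    return (max(0, new_top), max(0, new_left),
            min(img_h - 1, new_top + new_h - 1), min(img_w - 1, new_left + new_w - 1))
```

For a solid 10×10 figure, `margin_factor(100, 10, 10)` gives √2. For a figure of 30 pixels in the same frame, it returns the fallback.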

Normalization with respect to rotation is achieved in two parts. When enlargement/reduction is normalized according to the size of the rectangular frame, only four possible orientations are considered. All four normalized images are then input to the recognizer, and the best-matching one is selected. This makes the input invariant to rotation as well. Alternatively, before inputting all of them, various features of the four normalized images can be used to select only the orientation that appears most suitable. Many kinds of features can be used for this.
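The four-orientation evaluation can be sketched as follows. The `score` function is a hypothetical stand-in for whatever recognizer is used; a higher value means a better match. The sketch assumes the normalized image is square, so a 90-degree rotation keeps the recognizer's input size. With a non-square input, each rotated crop would be resized after rotation instead. The optional `threshold` reflects the point made later in this section about dichotomizers: the best of the four is accepted only if its score is high enough in absolute terms.

```python
def rot90_cw(image):
    """Rotate a 2-D list of rows 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def four_orientations(image):
    """Return the image rotated by 0, 90, 180 and 270 degrees clockwise."""
    rotations = [image]
    for _ in range(3):
        rotations.append(rot90_cw(rotations[-1]))
    return rotations


def best_orientation(image, score, threshold=None):
    """Feed all four orientations to a (hypothetical) recognizer score and keep the best.

    Returns (angle_deg, rotated_image, score), or None if a threshold is given
    and even the best orientation does not reach it.
    """
    rotations = four_orientations(image)
    scores = [score(r) for r in rotations]
    best = max(range(4), key=lambda i: scores[i])
    if threshold is not None and scores[best] < threshold:
        return None
    return best * 90, rotations[best], scores[best]
```

For example, suppose a toy score rewards a figure pixel in the top-left corner. For the input `[[0, 1], [0, 0]]` it then selects the 270-degree rotation `[[1, 0], [0, 0]]`.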

The frame could be a circle (as in SIFT/SURF), an ellipse, or a polygon such as a triangle. It is made rectangular for the following reasons. An image is represented on a two-dimensional plane, so it has two independent axes, x and y. A rectangle whose sides are parallel to these axes can therefore express simple enlargement/reduction of the image, whereas other polygons have difficulty doing so. A circle can express only isotropic enlargement/reduction. For an ellipse, it is difficult to find one that touches the target figure pattern. For these reasons the frame is rectangular.
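The argument can be made concrete with the scaling transform itself; the notation is ours, not the patent's. Scaling by independent factors $s_x$ and $s_y$ along the image axes maps an axis-aligned $w \times h$ rectangle to an axis-aligned $s_x w \times s_y h$ rectangle. Normalizing the frame to a recognizer input of size $W \times H$ therefore takes exactly two independent factors:

```latex
\begin{pmatrix} x' \\ y' \end{pmatrix}
= \begin{pmatrix} s_x & 0 \\ 0 & s_y \end{pmatrix}
  \begin{pmatrix} x \\ y \end{pmatrix},
\qquad
s_x = \frac{W}{w}, \quad s_y = \frac{H}{h}.
```

Under the same map, the circle $x^2 + y^2 = r^2$ becomes the ellipse $(x'/s_x)^2 + (y'/s_y)^2 = r^2$. This is again a circle only when $s_x = s_y$, which is why a circular frame can capture only isotropic scaling.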

A specific normalization procedure is as follows. The center of the object's closed-region figure pattern is set arbitrarily. The figure pattern is rotated through various angles and enclosed in a rectangular frame at each angle, and the angle at which the frame's perimeter, diagonal length, or area is minimal is selected. The description here keeps the rectangular frame fixed and rotates the image of the object; the reverse is of course also possible. The rectangular frame must enclose the object's figure pattern without protrusion and touch it. Its sides are therefore the straight lines at the minimum and maximum x coordinates and the minimum and maximum y coordinates of the rotated closed-region figure pattern. Frames are built in the same way for various rotation angles of the object's image, and those satisfying the reference condition are selected. The frames corresponding to the figure pattern rotated by relative angles of 0, 90, 180, and 270 degrees are all selected as satisfying the condition. Enlargement/reduction is normalized according to the size of each of these four rectangular frames. The resulting images are input to the recognizer/discriminator to determine which rotation angle matches best.

If the recognizer/discriminator is a dichotomizer (binary classifier), however, being the closest is not enough, because that closeness is only relative. The decision must instead use a threshold on absolute closeness: a score at or above it means the classification can be judged correct.
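The rotation search described above can be sketched in Python. This is an illustrative brute-force version under our own assumptions:

- The figure pattern is a list of (x, y) pixel coordinates.
- Angles are sampled at a fixed step, 1 degree by default.
- Only [0, 90) degrees is searched.

The last point works because rotating by a further 90 degrees only swaps the frame's width and height. The 90-, 180- and 270-degree frames are therefore the same rectangle turned by quarter turns. The function names are hypothetical.

```python
import math

CRITERIA = {
    "perimeter": lambda w, h: 2.0 * (w + h),
    "diagonal": lambda w, h: math.hypot(w, h),
    "area": lambda w, h: w * h,
}


def frame_size(points, theta):
    """Width and height of the axis-aligned frame touching the points rotated by theta (radians).

    Rotating about the origin instead of a chosen centre only translates the
    rotated points, so w and h do not depend on the centre of rotation.
    """
    c, s = math.cos(theta), math.sin(theta)
    xs = [x * c - y * s for x, y in points]
    ys = [x * s + y * c for x, y in points]
    return max(xs) - min(xs), max(ys) - min(ys)


def best_rotation(points, criterion="area", step_deg=1.0):
    """Angle in [0, 90) degrees whose touching frame minimises the criterion.

    theta and theta + 90 degrees give the same frame with w and h swapped, so
    the 90/180/270-degree frames need no separate search.
    Returns (angle_deg, w, h).
    """
    f = CRITERIA[criterion]
    best = None
    for i in range(int(round(90.0 / step_deg))):
        deg = i * step_deg
        w, h = frame_size(points, math.radians(deg))
        if best is None or f(w, h) < best[0]:
            best = (f(w, h), deg, w, h)
    return best[1:]
```

Take a 2×2 square that has been tilted by 30 degrees. The search returns 60 degrees, the rotation that brings it back to an axis-aligned 2×2 frame (30 + 60 = 90, a quarter turn).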

The above description assumed that the two-dimensional image of the target object can be separated into two regions: a figure pattern inside a closed curve (a region closed with respect to the background) and the remaining background (ground). This is the case when the scalar value assigned to each pixel (a feature such as luminance or one RGB color component) falls into two values (two groups), so that the image separates into figure and ground. The target object can then be cut out as a two-dimensional region within a closed curve (a closed region) containing the figure, and everything else can be separated as ground. If, however, the figure pattern of the object's two-dimensional image is not a closed region, it cannot be enclosed with a rectangular frame. This happens, for example, in images where neighboring pixels have features with close, continuous values. Grayscale luminance images, in which neighboring pixels have close, continuous luminance, are a typical case.

Even then, the method can be used if a per-pixel feature is computed instead of using the image itself (the pixel luminance values). The feature is binarized or otherwise processed so that figure and ground separate and the image portion of the target object becomes a closed region. With luminance, the figure is the bright (or dark) region whose luminance is above (or below) a threshold. To divide two images into regions that differ and regions that do not, the difference between the per-pixel features of the two images can be used. For example, if a background image can be obtained in some way, the image can be divided by the difference between the features of the input image and the background image. Regions where the difference reaches a threshold become figure; the rest become ground.
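The sketch below shows figure-ground separation by thresholding, optionally against a background image, followed by extraction of the connected regions that become figure-pattern candidates. It uses plain Python and makes the following assumptions, none of which are prescribed by the text:

- Images are lists of rows of scalar features such as luminance.
- A figure pixel is one at or above the threshold, or one that differs from the background by at least the threshold.
- Regions are labelled with 4-connectivity.

```python
from collections import deque


def binarize(image, threshold, background=None):
    """Figure (1) / ground (0) mask by thresholding, or by background difference if given."""
    if background is None:
        return [[1 if v >= threshold else 0 for v in row] for row in image]
    return [[1 if abs(v - b) >= threshold else 0 for v, b in zip(row, brow)]
            for row, brow in zip(image, background)]


def label_regions(mask):
    """4-connected component labelling; returns one list of (row, col) pixels per region."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, pixels = deque([(r, c)]), []
                while queue:  # breadth-first flood fill of one region
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                regions.append(pixels)
    return regions
```

Each returned region can then be enclosed with a frame as described above. The frame does not need to exclude everything else, because all candidates go to the recognizer.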

The above method can handle closed regions obtained by image processing with various features. Some cases remain difficult, however. For example, when the target is a body part such as a human or animal face, image processing may not determine where the face ends and the neck and torso begin. Depending on lighting conditions, methods such as binarization may still work. For image recognition that targets such a specific category, however, the input image can be normalized using prior knowledge about the target object. In the task of detecting human or animal faces in an input image, eye features can be used. If the positions of both eyes are detected, the size of the face can be estimated from the distance between the eyes, and that region can be used as the image of the object. When this is uncertain, multiple image candidates for the object may be considered. Likewise, when only one eye or more than two eyes are detected, multiple candidates can be formed from the possible combinations. Techniques such as template matching can be used to detect eyes in an image. In other cases, too, prior knowledge about the target can be used instead of the frame-based method, although a frame can also be used. For a fish, for instance, a characteristic part such as its spine can be used, and many other possibilities exist.
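Eye-based normalization for faces can be sketched as follows. The face proportions `width_ratio`, `height_ratio` and `up_ratio` are hypothetical placeholders, all expressed as multiples of the inter-eye distance. They would have to be tuned for a real face category. Eye detection itself (for example by template matching) is assumed to have been done already. When more than two eyes are detected, every pair yields one candidate, as the text suggests.

```python
import math
from itertools import combinations


def face_box_from_eyes(eye_a, eye_b, width_ratio=2.0, height_ratio=2.4, up_ratio=1.0):
    """Estimate a (possibly tilted) face rectangle from two eye positions.

    Image coordinates: x to the right, y downward. All ratios are hypothetical
    multiples of the inter-eye distance d. Returns (cx, cy, width, height, angle_deg).
    """
    (x1, y1), (x2, y2) = sorted([eye_a, eye_b])  # left eye (smaller x) first
    d = math.hypot(x2 - x1, y2 - y1)
    if d == 0:
        raise ValueError("eye positions coincide")
    ux, uy = (x2 - x1) / d, (y2 - y1) / d  # unit vector along the eye line
    nx, ny = -uy, ux                       # unit normal pointing down the face
    # The box extends up_ratio*d above the eye line, so its centre lies below it.
    offset = (height_ratio / 2.0 - up_ratio) * d
    cx = (x1 + x2) / 2.0 + offset * nx
    cy = (y1 + y2) / 2.0 + offset * ny
    angle_deg = math.degrees(math.atan2(y2 - y1, x2 - x1))  # in-plane tilt of the face
    return cx, cy, width_ratio * d, height_ratio * d, angle_deg


def face_candidates(eyes, **ratios):
    """One face candidate per pair of detected eyes."""
    return [face_box_from_eyes(a, b, **ratios) for a, b in combinations(eyes, 2)]
```

The returned tilt angle can be used to undo in-plane rotation before the anisotropic resize.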

The ground pattern may also intrude into the closed region that forms the figure pattern of an object's two-dimensional image. Examples include an object with a hole, such as a donut, and pixels that should be classified as figure but are classified as ground for some reason during binarization. In this case, the frame may be fitted with the intrusion left as it is. Normalization for rotation, enlargement, and reduction then works in the same way as above.

In the following description as well, the target object is assumed, for simplicity, to be one whose image pattern can be separated into figure and ground as a region inside a closed curve. Such separation can often be achieved with various existing image processing techniques.

The image input to the recognizer is a normalized two-dimensional image covering the fitted frame and its surroundings. For the subsequent recognizer or template matching, the area ratio between the target region and the rest should preferably be set to about 1:1.

Two kinds of input image with frame information can be passed to the recognizer. The first is the binary feature obtained from the thresholding or other processing used to make the figure pattern of the object's two-dimensional image a closed region. The second is the per-pixel feature computed from an image cut out, at the same size, from the original image before that processing. In general, the recognizer performs recognition using the feature image it receives.

Because the method of this embodiment normalizes the input image, it does not depend on any particular recognizer or template-matching scheme. However, the templates and training images used by the recognizer must be prepared in the same way. A rectangular frame is fitted to each according to the criterion above, and the recognizer or template is built from images normalized to include a fixed region outside the frame. For rotation, the angle of a characteristic axis of the object is unified to a single fixed orientation. For enlargement/reduction, all training images are given the same width and height. Together, these measures satisfy the conditions that the recognizer or template matching requires.

With the method above, input images can be normalized to suit template matching or a recognizer. Difficulties arise when several target three-dimensional objects are close together and their two-dimensional images overlap. The result is then a merged image or figure pattern, so normalizing it as above and feeding it to the recognizer does not yield a correct result. In this case, some additional measure beyond the processing described here is needed before recognition is possible.

As described above, the method of this embodiment normalizes the input image to the recognizer/classifier. This normalization makes pattern recognition/identification invariant to independent enlargement/reduction along two axes. In general, the appearance of a three-dimensional object on the two-dimensional image plane is a projective transformation. Within a certain range of rotation angles, part of it, such as a trapezoidal (keystone) transformation, can be approximated by independent enlargement/reduction along two orthogonal image axes. If recognition is invariant to independent scaling along the two axes, some of the trapezoidal transformations within the projective transformation can also be recognized invariantly. This extends, to some degree, the handling of trapezoidal transformations, which is difficult with SIFT/SURF.
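
A short derivation shows why normalizing the frame removes the dependence on the two scale factors. It assumes that the two independent scaling axes coincide with the axes of the fitted frame and that the scale factors are positive. The symbols a, b, w_0, h_0, W and H are introduced here for illustration only. Under this assumption, the frame of the scaled pattern is the scaled frame of the original pattern.

```latex
% Normalization that maps the fitted w x h frame onto a W x H template frame
S = \begin{pmatrix} W/w & 0 \\ 0 & H/h \end{pmatrix}

% If the observed pattern is the template pattern (frame w_0 x h_0) scaled
% independently by a and b along the frame axes, then w = a w_0 and h = b h_0, so
S \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}
  = \begin{pmatrix} \dfrac{W}{a w_0} & 0 \\ 0 & \dfrac{H}{b h_0} \end{pmatrix}
    \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}
  = \begin{pmatrix} W/w_0 & 0 \\ 0 & H/h_0 \end{pmatrix}
```

The composite does not depend on a or b, so every independently scaled version of the pattern lands on the same normalized image. When the scaling axes are not aligned with the frame, this holds only approximately.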

Furthermore, the method of this embodiment can use any feature that can be computed from an image as the feature for image recognition. It can therefore be used not only for template matching in specific-object recognition but also for generic object recognition. Generic object recognition involves categorizing objects with large individual variation, using statistical recognizers/classifiers trained by machine learning. Example features include:
- a histogram of figure-pixel counts in local regions of the binarized image;
- histograms of the RGB component values of the original image before binarization;
- a histogram of luminance gradient orientations and their magnitudes, called HOG (Histogram of Oriented Gradients) [HOG] (see Reference 3).

Accordingly, the statistical recognizer/classifier can be chosen freely from conventional techniques. Options include not only simple template matching but also AdaBoost, NN (Neural Networks), and SVM (Support Vector Machine) [AdaBoost/NN/SVM].
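
As one illustration of the first feature listed, the sketch below counts figure pixels in each cell of a regular grid laid over a normalized binary image. The grid layout and the function name `cell_pixel_counts` are hypothetical. The text does not specify how the local regions are defined.

```python
from typing import List


def cell_pixel_counts(binary: List[List[int]], cells_y: int, cells_x: int) -> List[int]:
    """Count figure (1) pixels in each cell of a cells_y x cells_x grid,
    returned row-major as a feature vector (hypothetical layout)."""
    h, w = len(binary), len(binary[0])
    counts = [0] * (cells_y * cells_x)
    for y in range(h):
        cy = y * cells_y // h          # grid row that this pixel row falls in
        for x in range(w):
            cx = x * cells_x // w      # grid column that this pixel column falls in
            counts[cy * cells_x + cx] += binary[y][x]
    return counts
```

Because the candidate image has already been normalized to a fixed size, every candidate yields a vector of the same length. Such vectors can be fed directly to a statistical classifier.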

[Reference 3] Fujiyoshi Laboratory, Department of Computer Science, College of Engineering, Chubu University, "Human detection using HOG features and Boosting" (in Japanese), [retrieved June 20, 2016], Internet <http://www.vision.cs.chubu.ac.jp/joint_hog/pdf/HOG+Boosting_LN.pdf>

In addition, the method of this embodiment determines candidate object positions/regions from the regions where the object is present, using a binary image or the like. As a side benefit, there is no need to raster-scan the input image and process a large number of windows. The candidate local images input to the recognizer/classifier are narrowed down in advance, which can greatly reduce processing time.

The configuration of embodiments of the present invention is described in detail below with reference to the drawings.

<Configuration of the image normalization device according to the first embodiment of the present invention>

Next, the configuration of the image normalization device according to the first embodiment of the present invention is described.

As shown in FIG. 1, the image normalization device 100 according to the first embodiment can be implemented by a computer. The computer includes a CPU, a RAM, and a ROM that stores various data and a program for executing the image normalization processing routine described later. Functionally, as shown in FIG. 1, the image normalization device 100 includes an input unit 10, a calculation unit 20, and an output unit 50.

The input unit 10 accepts as an input image an image in which characters appear as objects, as shown in FIG. 2. The luminance values of the input image are grayscale. This embodiment considers the case where character patterns in the input image are rotated, enlarged, or reduced for template matching against the recognition target, the character E shown in FIG. 3. In this way, the characters can be recognized/detected invariantly.

The calculation unit 20 includes a region extraction unit 30, a candidate acquisition unit 32, an image normalization unit 34, an image recognition unit 36, and a recognizer 40.

The recognizer 40 stores a recognizer that recognizes characters by template matching. Any recognizer may be used, however; it need not be based on template matching.

As described below, the region extraction unit 30 works on the figure pattern, i.e., the region representing an object in the input image accepted by the input unit 10. It encloses the figure pattern with a rectangular frame that touches the pattern without letting any part of it protrude, and then cuts out the target region image enclosed by the frame. In this embodiment, the characters appearing in the input image are the objects.

The region extraction unit 30 first performs figure-ground separation between object regions containing character figure patterns and background regions that contain none. The input image contains image patterns of various characters with grayscale luminance. Binarizing the pixel luminance for figure-ground separation yields the image shown in FIG. 4. FIG. 4 is the binary image after thresholding the luminance values, in which the background region and the character regions have been separated.
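
The text leaves open how the luminance threshold is chosen. The sketch below uses Otsu's method as one common automatic choice; this is a swapped-in technique, not necessarily the one used in the embodiment. It also assumes that characters are darker than the background, which is exposed as the hypothetical flag `figure_is_dark`. Images are 2-D lists of 8-bit luminance values.

```python
def otsu_threshold(gray):
    """Otsu's method on 8-bit luminance: return t maximizing the between-class
    variance when pixels are split into v < t and v >= t."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * c for i, c in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t in range(1, 256):
        w0 += hist[t - 1]                  # number of pixels with value < t
        sum0 += (t - 1) * hist[t - 1]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2   # proportional to between-class variance
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(gray, threshold, figure_is_dark=True):
    """Return 1 for figure pixels and 0 for ground pixels."""
    if figure_is_dark:
        return [[1 if v < threshold else 0 for v in row] for row in gray]
    return [[1 if v >= threshold else 0 for v in row] for row in gray]
```

Any other thresholding rule can be substituted, as long as it produces the figure/ground map used by the labeling step.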

Next, the region extraction unit 30 labels the connected regions of the character figure patterns separated from the ground and counts the figure patterns. The number of figure patterns is the number of objects to be evaluated; FIG. 4 contains eight. Each target figure pattern is then enclosed by a rectangular frame, and the enclosed target region images are cut out one by one. FIG. 5 shows the target region images after the connected regions have been labeled. They are cut out with rectangular frames initially fitted to touch each figure pattern.
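
A minimal sketch of the labeling and initial frame fitting follows. It uses breadth-first connected-component labeling on a binary 2-D list. The connectivity is not stated in the text, so 8-connectivity is assumed as the default. Each component's axis-aligned bounding box is returned as (x_min, y_min, x_max, y_max); this tuple layout is a hypothetical convention.

```python
from collections import deque


def label_components(binary, connectivity=8):
    """Label connected figure regions; return (label image, list of components).
    Each component is a dict with its pixel list and its axis-aligned frame."""
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    if connectivity == 8:
        offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
    else:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    components = []
    for sy in range(h):
        for sx in range(w):
            if not binary[sy][sx] or labels[sy][sx]:
                continue
            label = len(components) + 1
            labels[sy][sx] = label
            queue, pixels = deque([(sy, sx)]), []
            while queue:                      # flood-fill one connected region
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in offsets:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not labels[ny][nx]:
                        labels[ny][nx] = label
                        queue.append((ny, nx))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            components.append({"pixels": pixels, "bbox": (min(xs), min(ys), max(xs), max(ys))})
    return labels, components
```

The length of the returned component list is the number of figure patterns, i.e. the number of objects to evaluate.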

As described below, the candidate acquisition unit 32 obtains candidate images from the target region images cut out by the region extraction unit 30. It uses rectangular frames that satisfy a reference condition on the frame. In this embodiment, the reference condition is that the frame area be minimal. Specifically, the target region image is rotated and a rectangular frame is fitted at each rotation. Of these frames, the one with the smallest area is used to obtain each candidate image.

The candidate acquisition unit 32 first rotates each cut-out target region image through various angles about an arbitrarily chosen center. FIG. 6 shows only rotations in 15-degree steps from 0 to 90 degrees (excluding 90 degrees). However, rotations through all angles over the full 360 degrees are candidates.

Next, the candidate acquisition unit 32 constructs a rectangular frame for each rotated target region image. The frame's sides are vertical/horizontal lines defined by the minimum and maximum x coordinates and the minimum and maximum y coordinates of the figure pattern's closed region. It then selects a frame under a reference condition such as minimum area. The candidate image is the image containing the interior of the selected frame plus a fixed region outside it. In FIG. 7, the numbers under each rotated target region image give the rotation angle and the area of the rectangular frame. FIG. 8 shows the resulting candidate images. Each contains the minimum-area frame among those fitted to the rotated target region images, together with the fixed outer region.
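
The sketch below implements the minimum-area search on the figure pixels of one labeled region, treated as a point set. Several choices here are assumptions, not taken from the text. The centroid is used as the rotation center; the text allows any center, and the center does not change the frame size. Angles are sampled in 1-degree steps, and the frame touches the pixel centers. The function names are hypothetical.

```python
import math


def frame_at_angle(points, deg):
    """Width and height of the axis-aligned frame around the points after
    rotating them by deg degrees about their centroid."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    t = math.radians(deg)
    c, s = math.cos(t), math.sin(t)
    xs = [c * (x - cx) - s * (y - cy) for x, y in points]
    ys = [s * (x - cx) + c * (y - cy) for x, y in points]
    return max(xs) - min(xs), max(ys) - min(ys)


def min_area_angle(points, step_deg=1):
    """(angle, area, width, height) of the smallest-area frame over 0..359 degrees.
    The small tolerance keeps the first angle when areas tie up to rounding."""
    best = None
    for deg in range(0, 360, step_deg):
        w, h = frame_at_angle(points, deg)
        if best is None or w * h < best[1] - 1e-9:
            best = (deg, w * h, w, h)
    return best
```

For a 10 x 2 block of pixel centers rotated by 30 degrees, the search returns 60 degrees with area 9. At that angle the block is axis-aligned again (turned a quarter turn), and the frame just touches its outer pixel centers.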

The image normalization unit 34 normalizes each candidate image by enlarging or reducing it so that its rectangular frame matches the aspect ratio of the characters recognized by the recognizer 40. The minimum-area frame recurs every 90 degrees of rotation. Over 360 degrees, each figure pattern therefore yields four minimum-area candidate images, as shown in FIG. 9. For each of these four candidates, the fitted frame's aspect ratio is normalized to that of the character E in FIG. 3, and template matching is performed. The resulting sum of squared differences is shown at the lower left of each candidate. The candidate with the smallest difference is enclosed in a square.
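
The sketch below shows the anisotropic normalization: each axis is scaled independently to the template's width and height, using nearest-neighbour sampling. It also generates the four 90-degree rotations that share the minimum frame area. For brevity, it assumes the crop is exactly the frame, with no outer margin; the function names are hypothetical.

```python
def rotate90_cw(img):
    """Rotate a 2-D list 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def resize_nearest(img, out_w, out_h):
    """Scale x and y independently (anisotropically) to out_w x out_h
    using nearest-neighbour sampling."""
    in_h, in_w = len(img), len(img[0])
    return [[img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def normalized_candidates(crop, tmpl_w, tmpl_h):
    """The four 90-degree rotations of a minimum-area crop, each stretched so
    its frame matches the template's tmpl_w x tmpl_h size (and aspect ratio)."""
    cands, img = [], crop
    for _ in range(4):
        cands.append(resize_nearest(img, tmpl_w, tmpl_h))
        img = rotate90_cw(img)
    return cands
```

A production version would typically use bilinear or area interpolation. Nearest-neighbour keeps the sketch dependency-free and leaves binary images binary.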

Using the recognizer 40, the image recognition unit 36 recognizes the characters in the input image from each candidate image normalized by the image normalization unit 34. It outputs the result to the output unit 50. FIG. 10 shows the candidate images overlaid on the original binary image. Candidates with small difference values are treated as matching the character-E template of the recognizer 40. This selects the framed figure patterns in FIG. 11, which are recognized as readable as the character E. The method above can thus recognize characters using images whose figure patterns have been normalized for rotation, enlargement, and reduction.
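
The matching step can be sketched as a sum of squared differences (SSD) between each normalized candidate and the template, keeping the candidate with the lowest score. The optional acceptance threshold `max_ssd` is a hypothetical stand-in for the text's "values with a small difference". The text gives no specific cut-off.

```python
def ssd(a, b):
    """Sum of squared differences between two equally sized 2-D images."""
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def best_candidate(candidates, template, max_ssd=None):
    """(index, score) of the candidate closest to the template, or None when a
    threshold is given and even the best candidate exceeds it."""
    scores = [ssd(c, template) for c in candidates]
    i = min(range(len(scores)), key=scores.__getitem__)
    if max_ssd is not None and scores[i] > max_ssd:
        return None
    return i, scores[i]
```

All candidates must already have the template's size, which the normalization step guarantees.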

In this embodiment, the problem is defined so that a mirror-reversed (flipped) character pattern counts as different from the original character pattern.

The reference condition may instead be that the perimeter or the diagonal length of the rectangular frame be minimal. Also, the description above rotates the target region image while keeping the rectangular frame fixed, but the reverse is also possible. FIG. 12 illustrates the case of rotating the rectangular frame instead.
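
The sketch below covers both alternatives: the three reference conditions, and the frame-rotating variant. To rotate the frame, the fixed points are projected onto the frame's rotated axes u and v, which gives the same extents as rotating the points the opposite way. Only 0 to 179 degrees are searched, because the frame at deg + 180 has the same extents. The names `CRITERIA`, `frame_extents`, and `best_frame_angle` are hypothetical.

```python
import math

# Reference conditions on a w x h frame; the frame that minimizes the chosen value wins.
CRITERIA = {
    "area": lambda w, h: w * h,
    "perimeter": lambda w, h: 2 * (w + h),
    "diagonal": lambda w, h: math.hypot(w, h),
}


def frame_extents(points, deg):
    """Extents of fixed points inside a frame rotated by deg degrees: project
    them onto the frame's axes u and v."""
    t = math.radians(deg)
    u = (math.cos(t), math.sin(t))
    v = (-math.sin(t), math.cos(t))
    pu = [x * u[0] + y * u[1] for x, y in points]
    pv = [x * v[0] + y * v[1] for x, y in points]
    return max(pu) - min(pu), max(pv) - min(pv)


def best_frame_angle(points, criterion="area", step_deg=1):
    """(angle, value) of the frame minimizing the chosen criterion."""
    f = CRITERIA[criterion]
    best_deg, best_val = None, math.inf
    for deg in range(0, 180, step_deg):
        val = f(*frame_extents(points, deg))
        if val < best_val - 1e-9:
            best_deg, best_val = deg, val
    return best_deg, best_val
```

For an elongated rectangle, all three criteria pick the frame aligned with its sides. For irregular shapes they can disagree, which is why the choice of condition is left open.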

<Operation of the image normalization device according to the first embodiment of the present invention>

Next, the operation of the image normalization device 100 according to the first embodiment is described. When the input unit 10 receives an input image, the image normalization device 100 executes the image normalization processing routine shown in FIG. 13.

First, step S100 processes the figure pattern representing a character in the input image received by the input unit 10. The pattern is enclosed by a rectangular frame that touches it without letting any part protrude, and the target region image enclosed by the frame is cut out.

Next, in step S102, the target region image cut out in step S100 is rotated and a rectangular frame is fitted at each rotation. Each candidate image is obtained using the frame with the smallest area among these.

In step S104, each candidate image obtained in step S102 is enlarged or reduced so that its rectangular frame matches the aspect ratio of the characters recognized by the recognizer 40.

In step S106, the recognizer 40 recognizes the characters in the input image from each candidate image normalized in step S104. The result is output to the output unit 50, and processing ends.
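
The data flow of steps S100 to S106 can be summarized as a pipeline of pluggable stages. The stage names and signatures below are hypothetical. They only mirror the order in which the routine hands target regions to candidate acquisition, normalization, and recognition.

```python
def normalization_routine(image, extract, acquire, normalize, recognize):
    """Steps S100-S106 as pluggable stages (hypothetical interfaces):
    extract(image)   -> iterable of target region images       (S100)
    acquire(region)  -> iterable of candidate images           (S102)
    normalize(cand)  -> aspect-ratio-normalized candidate      (S104)
    recognize(cands) -> recognition result for one region      (S106)
    """
    results = []
    for region in extract(image):
        candidates = [normalize(c) for c in acquire(region)]
        results.append(recognize(candidates))
    return results
```

In the first embodiment, these stages would be the labeling, minimum-area frame search, aspect-ratio resize, and template matching described above.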

As described above, the image normalization device according to the first embodiment works as follows. It encloses the figure pattern, i.e., the region representing an object in the input image, with a rectangular frame that touches the pattern without letting it protrude. It obtains a candidate image using that frame. It then enlarges or reduces the candidate image so that the frame matches the aspect ratio of the object recognized by the recognizer. The device can thereby generate an image normalized by anisotropic enlargement or reduction to suit recognition by the recognizer.

<Configuration of the image normalization device according to the second embodiment of the present invention>

Next, the configuration of the image normalization device according to the second embodiment is described. Components identical to those of the first embodiment are given the same reference numerals, and their description is omitted.

In the second embodiment, images obtained from input images of fish in water are normalized by rotation, enlargement, or reduction and then recognized.

As in the first embodiment, the image normalization device 100 according to the second embodiment includes the input unit 10, the calculation unit 20, and the output unit 50 shown in FIG. 1.

The input unit 10 accepts a video of fish in water, such as that shown in FIG. 14, as a group of input images.

As in the first embodiment, the calculation unit 20 of the second embodiment includes the region extraction unit 30, the candidate acquisition unit 32, the image normalization unit 34, the image recognition unit 36, and the recognizer 40.

The region extraction unit 30 processes each input image in the group accepted by the input unit 10 as in the first embodiment. It encloses the figure pattern representing an object in the input image with a rectangular frame that touches the pattern without letting it protrude, and cuts out the target region image enclosed by the frame. In this embodiment, the fish appearing in the input images are the objects.

Specifically, the region extraction unit 30 first performs figure-ground separation, with fish regions as the figure pattern and everything else as the ground (background) pattern. It computes a background image from the group of input images captured as video. For each pixel, it then computes the difference between the background image and an image of a scene containing fish. Pixels whose difference is at or above a threshold are treated as changed, i.e., as candidates for regions where fish appear.

Various values can be used for the per-pixel difference, such as luminance or only the H (hue) component of the HSV color representation; luminance is used here as an example. One way to construct the background image is to build a per-pixel histogram of luminance values over the video and take its mode. Various other methods, such as the EigenBackground method (see Reference 4), can also be used. For explanation, assume that the fish are the only moving objects and that there is no other motion such as water flow. Candidate fish regions can then be obtained relatively easily and cleanly. Splitting the image into changed and background regions in this way detects the regions that differ from the background. As shown in FIG. 15, this separates regions presumed to contain fish from the background.
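
The sketch below implements the per-pixel-mode background and the thresholded difference described above. Frames are 2-D lists of integer luminance values of equal size. Breaking ties between equally frequent values toward the smallest value is an arbitrary choice made here, and the threshold is a free parameter.

```python
from collections import Counter


def background_by_mode(frames):
    """Background image whose pixels are the most frequent luminance value at
    that position across all frames (ties broken toward the smaller value)."""
    h, w = len(frames[0]), len(frames[0][0])
    bg = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            counts = Counter(f[y][x] for f in frames)
            top = max(counts.values())
            bg[y][x] = min(v for v, c in counts.items() if c == top)
    return bg


def change_mask(frame, bg, threshold):
    """1 where |frame - background| >= threshold (candidate fish pixels), else 0."""
    return [[1 if abs(p - b) >= threshold else 0 for p, b in zip(fr, br)]
            for fr, br in zip(frame, bg)]
```

The mode only recovers the true background when each pixel shows the background in most frames. That holds under the text's assumption that fish are the only moving objects and pass through any given pixel only briefly.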

[Reference 4] Yasutomo Kawanishi, Masayuki Mukunoki, Michihiko Minoh, "Background image estimation based on a time-series filter in eigenspace focusing on temporal changes of the background" (in Japanese), The Institute of Electronics, Information and Communication Engineers (IEICE)

Small-area regions among those classified as fish-like figure patterns are ignored: they are judged not to be fish and are not processed further. The figure pattern may also fail to cover the entire region in which a fish appears. It suffices for the pattern to have roughly the shape of a fish. Deciding whether a region is a fish is the job of the subsequent recognizer/classifier, so the region's figure pattern only needs to carry the information that recognizer requires.
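
Discarding small regions can be done directly on a label image, such as the output of any connected-component labeling. The function below counts the pixels of each label and zeroes out labels below a minimum area. The name `remove_small_regions` and the `min_area` value are hypothetical tuning choices; the text does not give a size limit.

```python
from collections import Counter


def remove_small_regions(labels, min_area):
    """Zero out labeled regions with fewer than min_area pixels.
    labels: 2-D list of ints where 0 is ground and k > 0 marks region k."""
    sizes = Counter(v for row in labels for v in row if v)
    keep = {k for k, n in sizes.items() if n >= min_area}
    return [[v if v in keep else 0 for v in row] for row in labels]
```

The remaining labels keep their original numbers, so later stages can still refer to them.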

Next, the region extraction unit 30 labels the figure patterns that separate the regions presumed to contain fish, and counts them. Each target figure pattern is then enclosed by a rectangular frame, and the enclosed target region images are cut out one by one. The same fish may yield several target region images cut out from different input images in the group. In this embodiment, it suffices to output to the candidate acquisition unit 32 a target region image cut out from any one input image.

As in the first embodiment, the candidate acquisition unit 32 processes each input image in the group. It obtains candidate images from the target region images cut out by the region extraction unit 30, using rectangular frames that satisfy the reference condition on the frame.

As with the character figure patterns of the first embodiment, the candidate acquisition unit 32 first rotates each cut-out target region image about an arbitrarily chosen center. For each rotated target region image, it constructs a rectangular frame with vertical/horizontal sides, defined by the minimum and maximum x and y coordinates of the closed region. It then selects a frame under a reference condition such as minimum area. The candidate image is the image containing the interior of the selected frame plus a fixed outer region. The rectangular frames in FIG. 15 are the selected frames.

As in the first embodiment, the image normalization unit 34 processes each input image in the group. It normalizes each candidate image by enlarging or reducing it so that its rectangular frame matches the aspect ratio of the fish recognized by the recognizer 40.

As in the first embodiment, the image recognition unit 36 processes each input image in the group. Using the recognizer 40, it recognizes the fish in the input image from each candidate image normalized by the image normalization unit 34, and outputs the result to the output unit 50.

The other configurations and operations of the second embodiment are the same as those of the first embodiment, so a detailed description is omitted.

In the fish example of this embodiment, a fish must be recognized as a fish even when the image shows its other side, i.e., the mirror-reversed pattern. If a binary (two-class) recognizer is used, two recognizers are therefore required: one for one side of the fish and one for the opposite side.
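
A minimal sketch of the two-sided arrangement is shown below, using SSD template matching as the per-side recognizer. The usage note derives the second template by mirroring the first. This is an assumption that approximates the outline of the other side but not its markings. A real system would train or collect a separate template for each side.

```python
def mirror(img):
    """Horizontal flip of a 2-D list."""
    return [row[::-1] for row in img]


def ssd(a, b):
    """Sum of squared differences between two equally sized 2-D images."""
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def match_both_sides(candidate, template_side_a, template_side_b):
    """Score a normalized candidate against one template per side of the fish
    and return (side, score) for the closer match."""
    sa = ssd(candidate, template_side_a)
    sb = ssd(candidate, template_side_b)
    return ("a", sa) if sa <= sb else ("b", sb)
```

Usage: `match_both_sides(cand, side_a, mirror(side_a))` accepts a fish seen from either side. This only works if mirrored patterns are meant to count as the same class, which is the opposite of the character case in the first embodiment.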

For fish, the image can also be normalized by detecting the spine instead of fitting a frame. For example, the line segment in FIG. 16 is the part detected as the spine. It joins the two points, farthest apart, on the contour of the connected region of the figure pattern detected as a fish region. Thus the same object may be normalizable with a frame or by other means. Objects that are difficult to handle with a frame can be normalized by detecting their most characteristic part.
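
The spine heuristic can be sketched as follows. Contour pixels are taken to be figure pixels with at least one 4-neighbour that is ground or outside the image; this contour definition is an assumption. The farthest-apart pair is then found by brute force, and the segment's angle gives the rotation needed to make the spine horizontal. The search is O(n^2) in the number of contour pixels. The function requires at least two contour pixels, and `math.dist` needs Python 3.8 or later.

```python
import math
from itertools import combinations


def contour_pixels(binary):
    """(x, y) of figure pixels with a 4-neighbour that is ground or off-image."""
    h, w = len(binary), len(binary[0])
    out = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x]:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or not binary[ny][nx]:
                    out.append((x, y))
                    break
    return out


def spine(binary):
    """Endpoints of the longest contour chord and its angle in degrees
    (image coordinates, y pointing down)."""
    pts = contour_pixels(binary)
    p, q = max(combinations(pts, 2), key=lambda pq: math.dist(pq[0], pq[1]))
    angle = math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))
    return p, q, angle
```

Rotating the region by the negative of the returned angle aligns the spine with the x axis. That alignment plays the same role as choosing the minimum-area frame.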

As described above, the image normalization device according to the second embodiment works as follows. It encloses the figure pattern, i.e., the region representing an object in an input image of the group, with a rectangular frame that touches the pattern without letting it protrude. It obtains a candidate image using that frame. It then enlarges or reduces the candidate image so that the frame matches the aspect ratio of the object recognized by the recognizer. The device can thereby generate an image normalized by anisotropic enlargement or reduction to suit recognition by the recognizer.

The present invention is not limited to the embodiments described above, and various modifications and applications are possible without departing from the gist of the invention.

For example, the embodiments above obtain each candidate image by rotating the target region image, fitting a rectangular frame at each rotation, and using the frame that satisfies the reference condition on the frame. The invention is not limited to this. When rotation of the object need not be considered, for example, the candidate image may be obtained using a rectangular frame fitted without rotating the target region image.

10 Input unit
20 Calculation unit
30 Region extraction unit
32 Candidate acquisition unit
34 Image normalization unit
36 Image recognition unit
40 Recognizer
50 Output unit
100 Image normalization device

Claims

An image normalization device that generates, from an input image, a normalized image for input to a recognizer, the device comprising:
a region extraction unit that separates, by binarizing the input image, the two-dimensional image of a three-dimensional object shown in the input image from the background region that is not the two-dimensional image of the three-dimensional object, and that extracts, for each labeled connected region, a figure pattern, which is the figure-ground-separated region representing the three-dimensional object shown in the input image;
a candidate acquisition unit that, for each figure pattern formed by one connected region, encloses the figure pattern with a rectangular frame that touches the figure pattern without letting it protrude, and acquires a candidate image using the rectangular frame; and
an image normalization unit that normalizes the candidate image by enlarging or reducing it so that the rectangular frame has the aspect ratio of the three-dimensional object recognized by the recognizer,
wherein the candidate acquisition unit acquires each candidate image using, from among a plurality of rectangular frames obtained by rotating either the figure pattern or the rectangular frame, the rectangular frame that satisfies a predetermined reference condition on the rectangular frame.

The image normalization device according to claim 1, wherein the reference condition is that the perimeter, the diagonal length, or the area of the rectangular frame is minimal.

An image normalization method in an image normalization device that generates, from an input image, a normalized image for input to a recognizer, the method comprising:
a step in which a region extraction unit separates, by binarizing the input image, the two-dimensional image of a three-dimensional object shown in the input image from the background region that is not the two-dimensional image of the three-dimensional object, and extracts, for each connected region of labeled pixels, a figure pattern, which is the figure-ground-separated region representing the three-dimensional object shown in the input image;
a step in which a candidate acquisition unit, for each figure pattern formed by one connected region, encloses the figure pattern with a rectangular frame that touches the figure pattern without letting it protrude, and acquires a candidate image using the rectangular frame; and
a step in which an image normalization unit normalizes the candidate image by enlarging or reducing it so that the rectangular frame has the aspect ratio of the three-dimensional object recognized by the recognizer,
wherein, in the step of acquiring the candidate image, the candidate acquisition unit acquires each candidate image using, from among a plurality of rectangular frames obtained by rotating either the figure pattern or the rectangular frame, the rectangular frame that satisfies a predetermined reference condition on the rectangular frame.

The image normalization method according to claim 3, wherein the reference condition is that the perimeter, the diagonal length, or the area of the rectangular frame is minimal.

A computer-readable recording medium storing a program for causing a computer to function as each unit of the image normalization device according to claim 1 or 2.