JP7393179B2

JP7393179B2 - Photography equipment

Info

Publication number: JP7393179B2
Application number: JP2019193712A
Authority: JP
Inventors: 基文福井
Original assignee: Sumitomo Heavy Industries Ltd
Current assignee: Sumitomo Heavy Industries Ltd
Priority date: 2019-10-24
Filing date: 2019-10-24
Publication date: 2023-12-06
Anticipated expiration: 2039-10-24
Also published as: JP2021069032A

Description

The present invention relates to an imaging device suited to narrow spaces.

There are cases where one wants to photograph the surroundings or internal structure (hereinafter, the subject) of a narrow space, such as a tunnel, a narrow alley, a room, a structure with a narrow recess, or the space between two closely spaced walls. In a narrow space, the camera cannot be placed far enough from the subject, so when the camera directly faces the subject, part of the subject may fall outside the frame even with a wide-angle lens.

Patent Document 8 discloses a technique in which a camera is moved along the subject while photographing it obliquely at close range. The portion containing the subject is cropped from each captured image and converted into an image as seen from the front, and the converted images are joined in order along the scanning direction. With this method, the field of view is wider than when the camera directly faces the subject.

Patent Document 1: Japanese Patent Application Publication No. 2004-312549
Patent Document 2: International Publication No. WO08/102898
Patent Document 3: Japanese Patent Application Publication No. 2007-323616
Patent Document 4: Japanese Patent Application Publication No. 2010-073035
Patent Document 5: Japanese Patent Application Publication No. 2012-022652
Patent Document 6: Japanese Patent Application Publication No. 2012-109737
Patent Document 7: Japanese Patent Application Publication No. 2013-250891
Patent Document 8: Japanese Patent Application Publication No. 2011-126989

Non-Patent Document 1: R. Timofte, V. De Smet, and L. Van Gool, "A+: Adjusted Anchored Neighborhood Regression for Fast Super-Resolution", ACCV, 2014
Non-Patent Document 2: C. Dong, C. C. Loy, K. He, and X. Tang, "Image Super-Resolution Using Deep Convolutional Networks", TPAMI, 2016

Even with the technique of Patent Document 8, the imaging range in the height direction can be insufficient. One conceivable approach to this problem is to shoot while tilting the camera in the height (pitch) direction and to composite the resulting images. However, the tilting mechanism makes the whole imaging device structurally complex, and the multiple exposures take time.

Another conceivable approach is to shoot with several cameras that face different directions or are mounted at different heights. However, using multiple cameras complicates the system and raises its cost.

The present invention has been made in view of these problems, and one exemplary object of one of its aspects is to provide an imaging device suited to narrow spaces.

One aspect of the present invention relates to an imaging device that generates a front-view image of a side of a narrow space. The imaging device comprises a camera that captures images while moving through the narrow space in the depth direction, and an image processing unit. The image processing unit holds a trained model derived from the correlation between the image of a portion of interest in a camera image taken when the portion of interest is relatively far from the camera and the image of the same portion in a camera image taken when it is relatively close to the camera. During actual operation, the image processing unit generates the front-view image from the camera images output sequentially by the camera.

The image processing unit may crop a predetermined region from a camera image, input the cropped image to the trained model to generate a high-resolution image, and combine the high-resolution images generated sequentially as the camera moves to generate the front-view image.

The image processing unit may crop a first region that is contained in one of a pair of camera images taken at different times but not in the other, input the cropped image to the trained model to generate a high-resolution partial image, join the high-resolution partial image to the image of a second region of the other camera image of the pair to generate a high-resolution image, and combine the high-resolution images generated sequentially as the camera moves to generate the front-view image.

The image processing unit may generate the high-resolution image from an image set containing a plurality of camera images taken at different times.

For each of the camera images in the image set, a predetermined region may be defined at a different position. The image processing unit may crop the corresponding predetermined region from each camera image, input the cropped image to the trained model to generate a high-resolution intermediate image, and combine the high-resolution intermediate images obtained for the camera images to generate the high-resolution image.

The high-resolution image may be vector data.

Any combination of the above constituent elements, and any mutual substitution of the constituent elements and expressions of the present invention among methods, devices, systems, and the like, are also valid as aspects of the present invention.

According to the present invention, an imaging device suited to narrow spaces can be provided.

FIG. 1(a) is a diagram illustrating the subject photographed by the imaging device according to the embodiment, and FIG. 1(b) is a diagram showing the final image output by the imaging device.
FIG. 2 is a block diagram of the imaging device according to the embodiment.
FIG. 3(a) is a diagram illustrating the movement of the camera, and FIGS. 3(b) and 3(c) are diagrams showing the two images IMG_A and IMG_B obtained at camera viewpoints A and B in FIG. 3(a).
FIGS. 4(a) and 4(b) are diagrams showing the front-view images obtained with the first and second comparative techniques.
FIG. 5 is a diagram illustrating the learning.
FIGS. 6(a) and 6(b) are diagrams illustrating the super-resolution processing.
FIGS. 7(a) to 7(c) are diagrams illustrating the operation of the imaging device.
FIG. 8 is a block diagram of the imaging device according to the example.
FIGS. 9(a) and 9(b) are diagrams illustrating the processing of the front-view image synthesis unit.
FIG. 10 is a diagram showing the user interface of the display.
FIG. 11 is a diagram illustrating the super-resolution processing according to Modification 1.
FIG. 12 is a diagram illustrating the super-resolution processing according to Modification 2.

The present invention is described below on the basis of preferred embodiments with reference to the drawings. Identical or equivalent components, members, and processes shown in the drawings are given the same reference numerals, and redundant description is omitted as appropriate. The embodiments are illustrative and do not limit the invention; not all features described in the embodiments, or combinations of them, are necessarily essential to the invention.

FIG. 1(a) is a diagram illustrating the subject photographed by the imaging device 100 according to the embodiment. The subject is the side of a narrow space 2 (a road in this example). The imaging device 100 photographs one or both sides while advancing through the narrow space 2 in the depth direction. For brevity and ease of understanding, the description below focuses on the left side of the imaging device 100.

FIG. 1(b) is a diagram showing the final image IMGf output by the imaging device 100. The final image IMGf is a front-view image of a side (here, the left side) of the narrow space 2 and contains the objects OBJ1 to OBJ3 located on the left side of the narrow space 2.

FIG. 2 is a block diagram of the imaging device 100 according to the embodiment. The imaging device 100 includes a camera 110 and an image processing unit 120. The camera 110 faces straight ahead in the direction of travel of the imaging device 100. Alternatively, when only the left side is to be measured, the camera 110 may be mounted tilted slightly to the left.

The camera 110 captures images continuously while moving through the narrow space in the depth direction. The camera 110 may be a still camera or a video camera.

FIG. 3(a) is a diagram illustrating the movement of the camera and shows the narrow space 2 viewed from the side. FIGS. 3(b) and 3(c) are diagrams showing the two camera images IMG_A and IMG_B obtained at camera viewpoints A and B in FIG. 3(a).

Consider the central object OBJ2 (a building in this example). The image IMG_A, taken from viewpoint A, which lies earlier along the path and farther from the object OBJ2, shows the entire front of the object OBJ2. However, the object appears small in the image, so it is captured at low resolution.

In the image IMG_B, taken from viewpoint B, which is closer to the object OBJ2, the object appears larger than in the image IMG_A and its details are captured at high resolution. However, the front of the object OBJ2 is not captured in its entirety.
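The trade-off between viewpoints A and B can be illustrated with the standard pinhole-camera relation, in which the projected size of an object is inversely proportional to its distance along the optical axis. The following is only a sketch: the function name, the facade height, the focal length, and the frame height are hypothetical values chosen for illustration and do not appear in the specification.

```python
def projected_height_px(object_height_m, distance_m, focal_length_px):
    """Pinhole estimate of an object's height in the image, in pixels."""
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    return focal_length_px * object_height_m / distance_m


# Hypothetical numbers: a 10 m tall facade, an 800 px focal length,
# and a frame that is 600 px tall.
FRAME_HEIGHT_PX = 600
height_at_a = projected_height_px(10.0, 40.0, 800.0)  # viewpoint A, far
height_at_b = projected_height_px(10.0, 10.0, 800.0)  # viewpoint B, near

fits_at_a = height_at_a <= FRAME_HEIGHT_PX  # whole facade visible, few pixels
fits_at_b = height_at_b <= FRAME_HEIGHT_PX  # more pixels, but the facade is cut off
```

Under these assumed numbers the facade spans 200 pixels at viewpoint A and would need 800 pixels at viewpoint B, which exceeds the frame. This mirrors the situation described for IMG_A and IMG_B.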

Before the image processing of the imaging device 100 according to the embodiment is described, some comparative techniques are described.

In the first comparative technique, a region containing the subject (region RGN1 in FIGS. 3(b) and 3(c)) is cropped from an image taken close to the subject (the portion of interest), converted to a front view, and composited. FIG. 4(a) is a diagram showing the front-view image obtained with the first comparative technique. The first comparative technique yields a high-resolution front-view image, but the upper part of the subject is cut off.

In the second comparative technique, a region in which the whole subject appears (region RGN2 in FIGS. 3(b) and 3(c)) is cropped from an image taken far from the subject, converted to a front view, and composited. FIG. 4(b) is a diagram showing the front-view image obtained with the second comparative technique. The second comparative technique shows the entire front of the subject, but the front-view conversion stretches the image, so the resolution drops and edges and contours look blurred, as if out of focus.

Returning to FIG. 2, the image processing in this embodiment is described. The image processing unit 120 generates the final image IMGf from the output of the camera 110. The image processing unit 120 can be implemented as a combination of a processor such as a CPU (Central Processing Unit) and a software program.

A trained model 122, generated in advance by machine learning, is installed in the image processing unit 120. The trained model 122 is used for super-resolution processing of images. The machine learning of the trained model 122 is described next.

Super-resolution processing could use classical methods that require no training data (for example, bilinear or Lanczos interpolation), but their accuracy is limited. This embodiment therefore adopts a learning-based method such as the A+ method disclosed in Non-Patent Document 1 or the SRCNN method disclosed in Non-Patent Document 2. The A+ method builds a learning model called a sparse dictionary from training data. The SRCNN method uses deep learning. The training images may come from a general-purpose corpus or may be prepared independently. Training the trained model 122 requires pairs of low-resolution and high-resolution images. A common choice is a high-resolution image paired with a version of it whose resolution has been reduced by one of the classical methods above. The magnification from the low-resolution image to the corresponding high-resolution image is a predetermined value, denoted r.
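The construction of a training pair from a single high-resolution image can be sketched as follows for a one-pixel-wide column. This is an illustration under stated assumptions, not the method of the specification: the function names are hypothetical, and simple box averaging stands in for the bilinear or Lanczos resampling mentioned in the text.

```python
def downscale_column(column, r):
    """Reduce a 1-D column of pixel values by an integer factor r.

    Box averaging is used here as a stand-in for a classical resampler.
    """
    if r < 1 or len(column) % r != 0:
        raise ValueError("r must be >= 1 and divide the column length")
    return [sum(column[i:i + r]) / r for i in range(0, len(column), r)]


def make_training_pair(high_res_column, r):
    """Return (low_res, high_res); the high-res column is r times taller."""
    return downscale_column(high_res_column, r), list(high_res_column)
```

For example, `make_training_pair([0, 2, 4, 6, 8, 10], 2)` pairs the six-pixel column with a three-pixel column of block averages, so the height ratio equals r = 2.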

Alternatively, images obtained in the environment where the imaging device 100 is used, or in a similar environment, can serve as the training data. FIG. 5 is a diagram illustrating the learning. In the learning stage, a plurality of sample images SI are captured while the camera 110 is moved, and these sample images SI are used as the training data. A given object or portion in the narrow space (referred to as the portion of interest) appears at different sizes and at different positions in sample images taken from different positions.

In FIG. 5, X_# and Y_# (# = 1, 2, ...) denote regions containing the same portion of interest. For a given portion of interest, the sample image taken when the portion of interest is relatively far from the camera is called the first image, and the sample image taken when the same portion of interest is relatively close to the camera is called the second image. The trained model is derived from the correlation between the image X_# of the portion of interest in the first image and the image Y_# of the portion of interest in the second image. The image X_# is a wide-range, low-resolution image of the portion of interest, and the image Y_# is a narrow-range, high-resolution image of it. The trained model 122 can be understood as performing the conversion from X to Y. This conversion is expressed as a mapping f:
Y_# = f(X_#)
The pair of images X_# and Y_# may be one-dimensional images (vectors) one pixel wide. In this case, the height (number of pixels) of the image Y_# is r times the height (number of pixels) of the image X_#.

When the pair of images X_# and Y_# are two-dimensional images two or more pixels wide, their shapes may differ. This is because the camera optics introduce perspective that depends on the distance and position of the subject, so the same subject is captured with different shapes.

By feeding a learner the pairs of images X_# and Y_# taken from the sample images for each of a plurality of portions of interest, a converter Y = f(X) is obtained as the trained model. The image processing unit 120 may apply denoising in addition to super-resolution processing.
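The idea of learning a converter Y = f(X) from example pairs can be illustrated with a deliberately simple stand-in. The sketch below is neither the A+ method nor SRCNN. It is a toy nearest-neighbour lookup that stores the (X, Y) pairs and returns the Y whose X is closest to the input. The class and method names are hypothetical.

```python
class ExampleLookupSR:
    """Toy example-based converter: stores (X, Y) pairs, returns nearest Y."""

    def __init__(self):
        self.pairs = []

    def fit(self, xs, ys):
        """Memorise low-resolution columns xs and their high-resolution ys."""
        if len(xs) != len(ys):
            raise ValueError("xs and ys must have the same length")
        self.pairs = [(list(x), list(y)) for x, y in zip(xs, ys)]
        return self

    def predict(self, x):
        """Return the stored Y whose X has the smallest squared distance to x."""
        if not self.pairs:
            raise RuntimeError("call fit() before predict()")

        def squared_distance(pair):
            return sum((a - b) ** 2 for a, b in zip(pair[0], x))

        return list(min(self.pairs, key=squared_distance)[1])
```

A real learner generalises across examples instead of memorising them. The lookup is used here only because it makes the role of the X and Y pairs easy to see.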

Returning to FIG. 2. During operation of the imaging device 100, the image processing unit 120 crops a predetermined region from the output image of the camera 110 and inputs the low-resolution image in that region to the converter, that is, the trained model. The result is an image in which the whole predetermined region has undergone super-resolution processing (a high-resolution image). FIGS. 6(a) and 6(b) are diagrams illustrating the super-resolution processing. FIG. 6(a) shows the case where a one-dimensional low-resolution image IMGl is input. The high-resolution image IMGh can be understood as an estimate of the image that would be obtained with the camera closer to the portion of interest, and its height h' is r times the height h of the low-resolution image IMGl (r > 1).

FIG. 6(b) shows the case where a two-dimensional low-resolution image is input. The height h' of the high-resolution image IMGh is r times the height h of the low-resolution image IMGl. The width w' of the high-resolution image IMGh may be equal to or greater than the width w of the low-resolution image IMGl.
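The size relation h' = r × h can be checked against a classical baseline. The sketch below upscales a one-pixel-wide column by linear interpolation, the one-dimensional counterpart of the bilinear method that the embodiment sets aside in favour of a learned model. The function name and the endpoint-aligned sampling are assumptions made for illustration.

```python
def upscale_column_linear(column, r):
    """Upscale a 1-D column by integer factor r using linear interpolation."""
    if r < 1 or not column:
        raise ValueError("r must be >= 1 and column must be non-empty")
    h = len(column)
    out_len = h * r  # h' = r * h
    if h == 1:
        return [column[0]] * out_len
    out = []
    for j in range(out_len):
        pos = j * (h - 1) / (out_len - 1)  # position in input coordinates
        i = min(int(pos), h - 2)           # index of the lower neighbour
        t = pos - i                        # interpolation weight
        out.append((1 - t) * column[i] + t * column[i + 1])
    return out
```

Interpolation only redistributes the values already present in the input. It cannot add the detail that a closer viewpoint would reveal, which is the gap the trained model 122 is meant to fill.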

Returning to FIG. 2. The image processing unit 120 joins the high-resolution images generated sequentially as the camera 110 moves and generates the final image IMGf, which is a front-view image.

The above is the configuration of the imaging device 100. Its operation is described next. FIGS. 7(a) to 7(c) are diagrams illustrating the operation of the imaging device 100. FIG. 7(a) shows an output image of the camera 110 obtained at a certain time. The image processing unit 120 extracts a predetermined region RGNc from the image of FIG. 7(a). In this example, the predetermined region RGNc is two or more pixels wide and the cropped low-resolution image is two-dimensional image data. Alternatively, the predetermined region RGNc may be one pixel wide, in which case the low-resolution image is one-dimensional vector data.

FIG. 7(b) shows the high-resolution image IMGh obtained by inputting the image data in the region RGNc to the trained model 122. For comparison, the frame of the image IMG_B of FIG. 3(c) is drawn as a dash-dot line. The high-resolution image IMGh is rich in detail even in the parts that extend beyond the frame of the image IMG_B.

FIG. 7(c) shows an image IMGg obtained by converting the high-resolution image IMGh of FIG. 7(b) to a front view. Repeating the processing of FIGS. 7(a) to 7(c) while the camera position changes, in other words for camera images obtained at different times, produces a plurality of front-view high-resolution images IMGg. Joining them yields the final image IMGf of FIG. 1(b).
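The sequence of FIGS. 7(a) to 7(c) can be summarised as a loop over camera frames. The sketch below fixes only the order of the steps. The four callables are hypothetical placeholders for the cropping of the region RGNc, the trained model 122, the front-view conversion, and the joining step, none of which is specified here in code form.

```python
def build_front_view(frames, crop, super_resolve, to_front_view, combine):
    """Run crop -> super-resolution -> front-view conversion per frame, then join.

    All four callables are placeholders supplied by the caller.
    """
    front_view_strips = []
    for frame in frames:
        low_res = crop(frame)                           # region RGNc, FIG. 7(a)
        high_res = super_resolve(low_res)               # image IMGh, FIG. 7(b)
        front_view_strips.append(to_front_view(high_res))  # image IMGg, FIG. 7(c)
    return combine(front_view_strips)                   # final image IMGf
```

Passing the steps in as functions keeps the sketch independent of any particular model or warping routine.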

Next, a specific configuration of the imaging device 100 is described with reference to an example.

FIG. 8 is a block diagram of an imaging device 200 according to the example. The imaging device 200 includes an image capturing unit 202, an input image determination unit 204, a sharpening processing unit 206, a front-view image synthesis unit 208, and an image output unit 210.

The image capturing unit 202 is the image acquisition means and corresponds to the camera 110 of FIG. 2. The image capturing unit 202 may include a camera, means for moving the camera, means for acquiring the position or travel distance of the camera, means for controlling the attitude of the camera, and the like. The images captured by the image capturing unit 202 are input to the input image determination unit 204 together with position information.

The input image determination unit 204, the sharpening processing unit 206, and the front-view image synthesis unit 208 correspond to the image processing unit 120 of FIG. 2.

The input image determination unit 204 receives the images and position information obtained by the image capturing unit 202, selects a part of each input image (corresponding to the predetermined region RGNc of FIG. 7(a); hereinafter, the selected region), and outputs it. The selected region covers a relatively wide range located far from the camera, so far that its sharpness (effective resolution) is low. The selected region may be a line (vector) or a region with width (image). The image data of the selected region is supplied to the sharpening processing unit 206 together with the position information.

The sharpening processing unit 206 receives the selected region from the input image determination unit 204 and outputs an image with increased sharpness. "Sharpening" here includes enlarging the image, as in super-resolution processing, and removing noise (denoising). In super-resolution processing, when the selected region is vector data, the output is a vector whose number of elements is a constant multiple of that of the input. When the selected region is two-dimensional image data with width, the output is an image whose pixel counts in the vertical and horizontal directions are constant multiples of those of the input. The output of the sharpening processing unit 206 includes the high-resolution image (a one-dimensional vector or a two-dimensional image) and its position information. The trained model 122 of FIG. 2 can be used as the means of generating the high-resolution image.

FIGS. 9(a) and 9(b) are diagrams illustrating the processing of the front-view image synthesis unit 208. The front-view image synthesis unit 208 joins the high-resolution images output by the sharpening processing unit 206, using the position information output with them, into one large final image IMGf. When the high-resolution images are one-dimensional images (vectors), the high-resolution images L1 to Ln are simply joined horizontally, as shown in FIG. 9(a). When the high-resolution images are two-dimensional images, they may be pasted together with an overlap margin 4, as shown in FIG. 9(b). In that case, the pixel value in the overlap margin 4 may be the average or the maximum of the corresponding pixels. This processing yields the front-view image. If the joining produces unnatural edges at the seams, post-processing may be applied to smooth them, for example with a Gaussian filter or a Wiener filter.
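The two joining cases of FIGS. 9(a) and 9(b) can be sketched as follows. Images are represented as lists of rows, and the function names, the `overlap` parameter, and the `mode` switch are assumptions made for illustration. Placement from position information and seam smoothing are omitted.

```python
def stitch_columns(columns):
    """Join 1-D columns side by side, as in FIG. 9(a); returns a list of rows."""
    if not columns:
        raise ValueError("at least one column is required")
    height = len(columns[0])
    if any(len(c) != height for c in columns):
        raise ValueError("all columns must have the same height")
    return [[c[row] for c in columns] for row in range(height)]


def stitch_with_overlap(left, right, overlap, mode="mean"):
    """Join two 2-D images that share `overlap` columns, as in FIG. 9(b)."""
    if len(left) != len(right):
        raise ValueError("images must have the same height")
    if mode not in ("mean", "max"):
        raise ValueError("mode must be 'mean' or 'max'")
    if not 0 <= overlap <= min(len(left[0]), len(right[0])):
        raise ValueError("overlap is out of range")
    blend = max if mode == "max" else (lambda a, b: (a + b) / 2)
    out = []
    for lrow, rrow in zip(left, right):
        split = len(lrow) - overlap
        # Pixels in the overlap margin take the mean or max of both images.
        seam = [blend(a, b) for a, b in zip(lrow[split:], rrow[:overlap])]
        out.append(lrow[:split] + seam + rrow[overlap:])
    return out
```

With `left = [[1, 2, 4]]`, `right = [[6, 7, 8]]`, and an overlap of one column, the mean mode gives `[[1, 2, 5.0, 7, 8]]` and the max mode gives `[[1, 2, 6, 7, 8]]`.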

As pre-processing or post-processing for the joining, conversion processing such as correction of the camera position information may be applied to bring the result closer to a true front view.

Returning to FIG. 8, the image output unit 210 outputs the final image IMGf obtained by the front-view image synthesis unit 208. The image output unit 210 may include a display and present the final image IMGf visually to the user. The image output unit 210 may have a zoom function. When there are several final images, it may have a function for displaying them simultaneously or a function for displaying them one at a time. When the final image is large, the displayed portion may be changed through a GUI (Graphical User Interface) element such as a slide bar.

The image output unit 210 may also have a function for marking and highlighting a portion of interest to the user (for example, an abnormal or damaged portion). It may also have a function for calculating the position information of the portion of interest to the user and displaying the value. The image output unit 210 may save the data of the final image IMGf to a storage medium.

FIG. 10 is a diagram showing the user interface of the display. An area 302 of the display 300 shows the output image of the camera. Play, stop, and rewind buttons 306 are provided alongside the area 302 so that the image (or frame) shown in the area 302 can be controlled.

A marking 308 can be displayed in the area 302. The marking 308 is a region designated by the user, and whether the marking 308 is displayed can be controlled with a marking button 310. For example, when the imaging device 200 is a diagnostic device, the region enclosed by the marking 308 is the target of abnormality detection processing, and abnormal or damaged locations within the region are detected.

An area 310 shows the final image IMGf obtained by the image processing. The area 310 may be divided into a plurality of columns 312. For example, different parts of the final image IMGf may be displayed selectively in the columns 312. When front-view images of both the right and left sides of the narrow space are generated, the final image of the left side may be displayed in one of the columns 312 and the final image of the right side in another. When the whole final image does not fit in a column 312, a slide bar 314 may be displayed so that the user can select the displayed range. Buttons for enlarging and reducing the image may be added to, or provided instead of, the slide bar 314.

A marking 316 can also be displayed in area 310, and the display portion of the marking 316 (that is, area 310) can be enlarged or reduced with buttons 320 and 322.

The above is the configuration of the photographing device 200. The photographing device 200 can generate a front-view image covering a wide range, which makes it easy for a person to find visually what exists where in the image. Furthermore, because the image has almost no distortion, it is also easy to find what exists where automatically by image recognition processing. The accuracy of visual processing and automatic image recognition processing that use the output of the photographing device 200 can therefore be improved.

The use of the photographing device 100 and the photographing device 200 is not particularly limited, but they are widely applicable to cases such as tunnels, clay pipes, and coke ovens, where the measurement target is long and narrow and the camera cannot be made to face the wall surface directly, or where installing the camera is difficult.

The present invention has been described above based on an embodiment. Those skilled in the art will understand that the present invention is not limited to the above embodiment, that various design changes and modifications are possible, and that such modifications also fall within the scope of the present invention. Such modifications are described below.

(Modification 1)
In the super-resolution processing described so far, a high-resolution image of one gazed portion was generated from a single image. In Modification 1, a high-resolution image of one gazed portion is generated from a set of camera images taken at different distances. FIG. 11 is a diagram illustrating super-resolution processing according to Modification 1. FIG. 11 shows a plurality of (three in this example) camera images obtained at different times, in other words, images of the same gazed portion taken from different distances. A region specific to each camera image is defined so that it contains the same gazed portion. The image of the specific region is cropped from each camera image to generate three low-resolution images IMGl1 to IMGl3. Applying super-resolution processing to the three low-resolution images IMGl1 to IMGl3 yields three high-resolution intermediate images IMGh1 to IMGh3. The three high-resolution intermediate images IMGh1 to IMGh3 are combined to generate one high-resolution image IMGh.
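The flow of Modification 1 (crop a region from each camera image, super-resolve each crop, then combine the intermediate results) can be sketched in Python as follows. This is only a minimal sketch under stated assumptions: images are plain 2-D lists of pixel values, `super_resolve` is a nearest-neighbor placeholder standing in for the trained converter, and pixel-wise averaging is assumed as the combining step, since the document says only that the intermediate images are combined. All function names are hypothetical.

```python
def crop(image, top, left, height, width):
    """Cut a rectangular region out of a 2-D list of pixel values."""
    return [row[left:left + width] for row in image[top:top + height]]


def super_resolve(patch, out_h, out_w):
    """Placeholder for the trained converter: nearest-neighbor resampling.

    A real system would run the learned model here; this stand-in only
    brings every patch to the same output size.
    """
    in_h, in_w = len(patch), len(patch[0])
    return [[patch[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def combine(images):
    """Merge equally sized images by pixel-wise averaging (an assumption)."""
    n = len(images)
    h, w = len(images[0]), len(images[0][0])
    return [[sum(img[r][c] for img in images) / n for c in range(w)]
            for r in range(h)]


def modification1(frames, regions, out_h, out_w):
    """Each frames[i] is cropped with its own regions[i] = (top, left, height, width)."""
    low_res = [crop(f, *reg) for f, reg in zip(frames, regions)]        # IMGl1..IMGl3
    intermediate = [super_resolve(p, out_h, out_w) for p in low_res]   # IMGh1..IMGh3
    return combine(intermediate)                                        # IMGh
```

Because the same gazed portion appears larger as the camera approaches, the regions in `regions` would typically differ in size as well as position, which is why each crop is brought to a common output size before combining.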

(Modification 2)
FIG. 12 is a diagram illustrating super-resolution processing according to Modification 2. In the super-resolution processing of FIGS. 6(a) and 6(b), the region to be cropped was defined so as to include the entire gazed portion. In the modification of FIG. 12, by contrast, a part of the gazed portion is cropped and subjected to super-resolution processing. The left side of FIG. 12 shows an image obtained when the camera is far from the gazed portion 400, and the right side of FIG. 12 shows an image obtained when the camera is close to the gazed portion 400. In the images of FIG. 12, reference numeral 402 denotes the part that goes out of frame when the camera approaches the gazed portion. In this modification, this part 402 is cropped, and the cropped image is subjected to super-resolution processing. As a result, a high-resolution image portion (high-resolution partial image) 404 is obtained for the part that goes out of frame when the camera comes close to the gazed portion. In addition, the range 406 of the gazed portion that remains in frame in the image obtained when the camera is close to the gazed portion is selected, and the two portions 404 and 406 are joined to generate the high-resolution image IMGh.
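The joining step of Modification 2 can be sketched in Python as follows. This is a hedged illustration, not the patented implementation: images are plain 2-D lists, `super_resolve` is a nearest-neighbor placeholder for the trained converter, and the portion 404 is assumed to be stacked above the range 406 in the height direction, following the claims, which describe the cropped region as adjacent to the upper side and the two portions as connected in the height direction. The function and parameter names are hypothetical.

```python
def crop(image, top, left, height, width):
    """Cut a rectangular region out of a 2-D list of pixel values."""
    return [row[left:left + width] for row in image[top:top + height]]


def super_resolve(patch, out_h, out_w):
    """Placeholder for the trained converter: nearest-neighbor resampling."""
    in_h, in_w = len(patch), len(patch[0])
    return [[patch[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def modification2(far_frame, region_402, near_frame, region_406, out_h_404):
    """Build IMGh from the far frame's part 402 and the near frame's range 406.

    Regions are (top, left, height, width) tuples.
    """
    # Part 402: visible only in the far frame, so it is low resolution.
    part_402 = crop(far_frame, *region_402)
    # Super-resolve it to the width of range 406 so the two can be joined.
    width_406 = region_406[3]
    part_404 = super_resolve(part_402, out_h_404, width_406)
    # Range 406: already captured at high resolution in the near frame.
    part_406 = crop(near_frame, *region_406)
    # Stack vertically, 404 above 406 (assumed ordering).
    return part_404 + part_406
```

The design point this illustrates is that only the part lost from the near frame needs super-resolution; the rest of the gazed portion is taken directly from the close-up image.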

2 Narrow space
100 Photographing device
110 Camera
120 Image processing unit
122 Trained model
200 Photographing device
202 Image capturing unit
204 Input image determination unit
206 Sharpening processing unit
208 Front-view image synthesis unit
210 Image output unit

Claims

1. A photographing device that generates a front-view image of a side of a narrow space, comprising:
a camera that photographs the narrow space while moving in the depth direction; and
an image processing unit that has a converter based on a trained model derived in a learning stage and that generates the front-view image based on camera images sequentially output from the camera during actual operation,
wherein the trained model is configured to be derived using a pair of a first image and a second image from among a plurality of sample images taken while moving the camera in the depth direction in the learning stage, the first image showing a gazed portion, which is an object or part present in the narrow space, at a predetermined first portion, and the second image being taken deeper than the first image and showing the gazed portion at a predetermined second portion, the trained model being derived from the correlation between an image X of the gazed portion in the first image and the image of the gazed portion in the second image,
the image processing unit crops a predetermined region from the camera image during the actual operation, inputs the cropped image to the converter, generates a high-resolution image in which the predetermined region has been increased in resolution in the height direction, and combines the plurality of high-resolution images generated as the camera moves to generate the front-view image.

2. The photographing device according to claim 1, wherein
the predetermined region includes the first portion and is longer than the first portion in the height direction, and
the image processing unit laterally combines the high-resolution images that are sequentially generated as the camera moves to generate the front-view image.

3. The photographing device according to claim 1, wherein
the predetermined region is adjacent to the upper side of the first portion and is defined to correspond to the part of the gazed portion that goes out of frame in the height direction in the second image, and
the image processing unit:
crops the predetermined region from one of the camera images taken while the camera is moving and inputs it to the converter to generate a high-resolution first image portion;
generates a second image portion by selecting the range of the gazed portion from another camera image, among the camera images taken while the camera is moving, in which the gazed portion appears at the second portion;
generates the high-resolution image by connecting the first image portion and the second image portion in the height direction; and
generates the front-view image by combining a plurality of the high-resolution images generated as the camera moves.

4. The photographing device according to claim 1, wherein the image processing unit generates the high-resolution image based on an image set including a plurality of the camera images taken at different times.

5. The photographing device according to claim 4, wherein
for each of the plurality of camera images included in the image set, a predetermined region is defined at a different position where the same object should appear,
the image processing unit crops the corresponding predetermined region from each camera image and inputs the cropped image to the trained model to generate a high-resolution intermediate image, and
the high-resolution image is generated by combining the plurality of high-resolution intermediate images obtained for the plurality of camera images.

6. The photographing device according to claim 2, wherein the high-resolution image is vector data.