WO2020022329A1 - Object detection/recognition device, method, and program - Google Patents
Object detection/recognition device, method, and program
- Publication number
- WO2020022329A1 (PCT/JP2019/028838)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- recognition
- category
- detection
- unit
- candidate region
- Prior art date
Links
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
Definitions
- the present invention relates to an object detection / recognition apparatus, method, and program, and more particularly to an object detection / recognition apparatus, method, and program for detecting and recognizing an object in an image.
- There is a technology that detects an object from a video or an image and recognizes a category (class) of the detected object. This technique is used to analyze and understand the scene contents of video and images.
- an object candidate area representing a subject object or person
- the feature amount of the object is obtained in the object candidate region, and the category is recognized using the feature amount.
- detection and recognition of an object in a video or image are realized by integrating recognition results of individual object candidate regions. For example, object detection and recognition by deep learning includes the following method.
- the input image is divided into S * S cell regions, and P bounding boxes having different widths and lengths are determined in advance for each region.
- a model of a neural network such as a CNN (Convolutional Neural Network) (for example, VGG (Visual Geometry Group)) is used
- for each cell, the probability that an object in the S * S region belongs to a certain category, the P bounding boxes corresponding to the cell, and the bounding box representing the region of a true object with high reliability (the reliability index is determined by the length, height, and coordinates of the bounding box) are derived simultaneously, whereby object detection and recognition are realized.
- images and videos shot in a real environment often have a complicated background or a small-sized object.
- an image of a forest or a mountain as shown in FIG. 4, a flying drone, or a small road sign in a road scene is a frequently cited target of object detection and recognition.
- the method of Non-Patent Document 1 divides the entire image into fixed-size candidate regions and estimates the type and region of the object for each divided region. Therefore, when a plurality of objects appear in an object candidate region or when small objects are present, the amount of information (features) representing those objects in the output feature map decreases as the CNN layers become deeper, and the accuracy of detection and recognition is reduced.
- the present invention has been made to solve the above problems, and has as its object to provide an object detection / recognition apparatus, method, and program capable of accurately detecting and recognizing a small-sized object.
- an object detection / recognition device includes a separation unit that acquires an image to be subjected to object detection and separates a background included in the image from a foreground in which the object is captured; an object candidate region extraction unit that extracts, from among the regions representing the separated foreground, a region of a predetermined size or smaller as an object candidate region; an input image generation unit that generates, based on the extracted object candidate region, a plurality of input images corresponding to the object candidate region and its periphery; and a detection / recognition unit that inputs each of the generated input images to a CNN trained in advance for detecting an object and recognizing the category of the object.
- CNN Convolutional Neural Network
- the image may be acquired for each scene of a video that is a target of object detection, and the processes of the separation unit, the object candidate region extraction unit, the input image generation unit, and the detection / recognition unit may be performed for each image of the scene.
- the input image generation unit may generate the plurality of input images by upsampling the extracted object candidate region and a plurality of regions obtained from the periphery of the object candidate region.
- the integration of the recognition results by the detection / recognition unit may determine the category with the highest reliability by using, for each category, the maximum value or the average value of the reliabilities calculated for each of the input images during category recognition, and take it as the integrated recognition result.
- the object detection / recognition device may further include a learning unit that learns the CNN by using the object detected from the image by the detection / recognition unit and the category of the object, and the detection / recognition unit may perform the detection and the recognition using the learned CNN.
- An object detection / recognition method includes a step in which a separation unit acquires an image to be subjected to object detection and separates a background included in the image from a foreground in which the object is captured; a step in which an object candidate region extraction unit extracts, from among the regions representing the separated foreground, a region of a predetermined size or smaller as an object candidate region; a step in which an input image generation unit generates, based on the extracted object candidate region, a plurality of input images corresponding to the object candidate region and its periphery; and a step in which a detection / recognition unit inputs each of the generated input images to a CNN trained in advance for detecting an object and recognizing its category.
- a program according to a third invention is a program for causing a computer to function as each unit of the object detection and recognition device according to the first invention.
- an image to be subjected to object detection is acquired, and a background included in the image is separated from a foreground in which the object is captured.
- from among the regions representing the separated foreground, a region of a predetermined size or smaller is extracted as an object candidate region, and a plurality of input images corresponding to the object candidate region and its periphery are generated based on the extracted object candidate region.
- each of the plurality of input images is input to a CNN trained in advance for detecting an object and recognizing its category; the position of the object included in the input image is detected for each input image; the category of the object detected for each input image is recognized; and the category recognition results are integrated based on the recognition result of the object category for each input image, which provides the effect that even small-sized objects can be detected and recognized accurately.
- FIG. 1 is a block diagram illustrating a configuration of an object detection and recognition device according to an embodiment of the present invention.
- FIG. 2 is a diagram illustrating an example of a case where a plurality of regions are generated from an object candidate region.
- FIG. 3 is a flowchart illustrating an object detection and recognition processing routine in the object detection and recognition device according to the embodiment of the present invention.
- FIG. 4 is a diagram illustrating an example of an image subject to object detection and recognition.
- an outline of the embodiment of the present invention will be described.
- a technique that separates the background and the foreground in which the object appears in a video or image, extracts an object candidate area representing a small-sized object from the foreground, and performs detection and recognition only on the extracted object candidate area is considered effective for detecting objects of small size.
- by providing a separation unit that separates the background and the foreground in a video or image, a unit that extracts object candidate regions of a certain size or smaller from the foreground, and a unit that generates input images to be input to a CNN trained in advance on a deep-learning basis for detecting objects and recognizing their categories, a small-sized object can be detected and recognized accurately.
- an object detection / recognition device 100 can be configured as a computer including a CPU, a RAM, and a ROM storing a program for executing an object detection / recognition processing routine described later and various data.
- the object detection / recognition apparatus 100 includes a storage unit 20, an acquisition unit 22, a separation unit 30, an object candidate region extraction unit 32, an input image generation unit 34, a detection / recognition unit 36, and a learning unit 38.
- the storage unit 20 stores videos to be subjected to object detection and recognition. Upon receiving a processing instruction from the acquisition unit 22, the storage unit 20 outputs a video to the acquisition unit 22. The detection results and recognition results obtained by the detection / recognition unit 36 are also stored in the storage unit 20. Note that images, rather than videos, may be stored in the storage unit 20, and the separation unit 30, the object candidate region extraction unit 32, the input image generation unit 34, and the detection / recognition unit 36 may perform the object detection and recognition processing for each image.
- the acquisition unit 22 outputs a detection and recognition processing instruction to the storage unit 20, acquires the video stored in the storage unit 20, and outputs the acquired video to the separation unit 30. Further, after the processing of the detection / recognition unit 36, it outputs a learning processing instruction to the storage unit 20, acquires the integrated result of the detection results of the input images and the category recognition results stored in the storage unit 20, and outputs it to the learning unit 38.
- the separation unit 30 acquires an image to be subjected to object detection for each video scene and separates the background included in the image from the foreground in which the object appears. First, the separation unit 30 extracts images (v1, v2, ..., vN) from the video received from the acquisition unit 22 as frames at fixed time intervals. Next, the background and the foreground in which the object appears are separated using dynamic features between the preceding and succeeding frames in chronological order. For the separation, for example, the cv2.absdiff() function of the image processing library OpenCV may be used. The separated foreground is output to the object candidate area extraction unit 32.
- the object candidate area extraction unit 32 extracts, as an object candidate area, an area having a size equal to or smaller than a predetermined size from the areas representing the foreground separated by the separation unit 30 and outputs the extracted area to the input image generation unit 34. Specifically, first, an area representing a foreground is extracted using an object area extraction method using edge information as shown in Non-Patent Document 2 below.
- Non-Patent Document 2: Edge Boxes: Locating Object Proposals from Edges, C. Lawrence Zitnick and Piotr Dollár, ECCV 2014
- the object candidate area extracting unit 32 calculates the size of each of the extracted areas representing the foreground. For example, the area of the bounding box of the region representing the foreground may be calculated. Then, an area having a predetermined size or less (for example, 50 * 50 pixels or less) is extracted as an object candidate area.
- the input image generation unit 34 generates, based on the extracted object candidate region, a plurality of input images corresponding to the object candidate region and its periphery. For example, a plurality of input images are generated by upsampling the extracted object candidate region and a plurality of regions obtained from its periphery. Specifically, as shown in FIG. 2, with the box e taken as one object candidate region, a region adjacent to it at the upper left and having the same area as e is generated as region a; similarly, a plurality of regions such as b, c, and d can be generated using the lower-left, upper-right, and lower-right peripheral regions.
- upsampling processing is performed on the areas a, b, c, d, and e.
- the regions can be enlarged by using nearest-neighbor or bilinear interpolation in image processing.
- the upsampled areas a, b, c, d, and e are output to the detection / recognition unit 36 as an input image.
- the parameters of the CNN for detecting objects and recognizing their categories, trained on a deep-learning basis, can be tuned using high-resolution images or a plurality of training images. Since tuning the CNN parameters is expected to improve detection and recognition accuracy, the input images of the peripheral regions a, b, c, and d are created with the aim of increasing the amount of data in addition to the region e.
- FIG. 2 is one embodiment, and other methods of increasing data may be employed depending on application needs. For example, an area in which the area of e is enlarged may be generated to increase the data.
- the detection / recognition unit 36 inputs each of the plurality of input images generated by the input image generation unit 34 to a CNN trained in advance for detecting objects and recognizing their categories, detects the position of the object included in the input image for each input image, recognizes the category of the object detected for each input image, and integrates the category recognition results based on the recognition result of the object category for each input image.
- as a detection and recognition method using the CNN, for example, Yolo described in Non-Patent Document 1 is used. In this method, an object detection probability for each cell is calculated for each input image, and in the category recognition, a reliability of each category is calculated for each of a plurality of bounding boxes.
- from the per-category reliabilities calculated for each of the input images, the maximum reliability of each category may be obtained, and the category with the highest maximum value may be taken as the integrated category recognition result.
- alternatively, the category having the highest average reliability over the input images may be taken as the integrated category recognition result.
- the detection / recognition unit 36 stores in the storage unit 20 the integrated result of the detection results for the objects in the input images and the category recognition results.
- the learning unit 38 receives from the acquisition unit 22 the integrated result of the detection results of the input images and the category recognition results stored in the storage unit 20, tunes the parameters of the CNN for detecting objects and recognizing their categories using the detection and recognition results, and feeds the learning result back to the detection / recognition unit 36.
- a general CNN learning method such as an error back propagation method may be used.
- the detection and recognition unit 36 can detect and recognize the object using the CNN whose parameters are tuned.
- the processing of the learning unit 38 may be performed at any timing, separately from the series of object detection and recognition processes performed by the acquisition unit 22, the separation unit 30, the object candidate region extraction unit 32, the input image generation unit 34, and the detection / recognition unit 36.
- the object detection and recognition device 100 executes an object detection and recognition processing routine shown in FIG.
- in step S100, the acquisition unit 22 outputs a detection and recognition processing instruction to the storage unit 20, acquires the video stored in the storage unit 20, and outputs the acquired video to the separation unit 30.
- in step S102, the separation unit 30 extracts, for each video scene, the images (v1, v2, ..., vN) to be subjected to object detection from frames at fixed time intervals.
- in step S104, the separation unit 30 selects a target image vi.
- in step S106, the separation unit 30 separates, for the target image vi, the background and the foreground in which the object appears, using dynamic features between the preceding and succeeding frames in chronological order.
- in step S108, the object candidate region extraction unit 32 extracts, from among the regions representing the foreground separated in step S106, regions of a predetermined size or smaller as object candidate regions and outputs them to the input image generation unit 34.
- in step S110, the input image generation unit 34 generates, for the target image vi, a plurality of input images corresponding to the object candidate region and its periphery, based on the extracted object candidate region.
- in step S112, the detection / recognition unit 36 inputs, for the target image vi, each of the plurality of input images generated in step S110 to the CNN trained in advance for detecting objects and recognizing their categories, detects the position of the object included in the input image for each input image, recognizes the category of the object detected for each input image, and integrates the category recognition results based on the recognition result of the object category for each input image.
- in step S114, the detection / recognition unit 36 stores, in the storage unit 20, the integrated result of the detection results for the objects in the input images and the category recognition results for the target image vi.
- in step S116, the detection / recognition unit 36 determines whether all target images vi have been processed; if so, it ends the object detection / recognition processing routine, and if not, it returns to step S104, selects the next target image vi, and repeats the processing.
- an image to be detected is acquired, and the background included in the image and the foreground in which the object is captured are separated.
- from among the regions representing the separated foreground, a region of a predetermined size or smaller is extracted as an object candidate region, and a plurality of input images corresponding to the object candidate region and its periphery are generated based on the extracted object candidate region.
- each of the generated input images is input to a CNN trained in advance for detecting objects and recognizing their categories; the position of the object included in the input image is detected for each input image; the category of the object detected for each input image is recognized; and the category recognition results are integrated based on the recognition result of the object category for each input image, whereby even small-sized objects are detected and recognized accurately.
- the case where the learning unit 38 is included in the object detection / recognition apparatus 100 has been described as an example, but the present invention is not limited to this, and the learning unit may be configured as a learning apparatus separate from the object detection / recognition apparatus 100.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
Abstract
The present invention makes it possible to detect and recognize even a small object with good precision. An image in which an object is to be detected is acquired; a background contained in the image and a foreground in which an object appears are separated; among the regions representing the separated foreground, regions equal to or smaller than a prescribed size are extracted as object candidate regions; a plurality of input images corresponding to the object candidate regions and the periphery of the object candidate regions are generated on the basis of the extracted object candidate regions; each of the plurality of generated input images is input to a CNN for detecting a pre-learned object and recognizing the category of the object; for each of the input images, the position of an object included in the input image is detected, and the category of the object detected in each of the input images is recognized; and the recognition results for the object category are integrated on the basis of the object category recognition results for each of the input images.
Description
The present invention relates to an object detection/recognition apparatus, method, and program, and more particularly to an object detection/recognition apparatus, method, and program for detecting and recognizing an object in an image.
There is a technology that detects an object in a video or an image and recognizes the category (class) of the detected object. This technology is used to analyze and understand the scene content of videos and images. As a general processing flow, first, an object candidate region representing a subject (an object or a person) is extracted from the video or image. Then, a feature of the object is computed within the object candidate region, and the category is recognized using that feature. Detection and recognition of objects in the video or image are then realized by integrating the recognition results of the individual object candidate regions. For example, object detection and recognition by deep learning includes the following method.
First, the input image is divided into S*S cell regions, and P bounding boxes of different widths and lengths are determined in advance for each region. Next, the input image is fed to a neural network model such as a CNN (Convolutional Neural Network) (for example, VGG (Visual Geometry Group)), which simultaneously derives, for each cell, the probability that an object in the S*S region belongs to a certain category, the P bounding boxes corresponding to the cell, and the bounding box representing the region of a true object with high reliability (the reliability index is determined by the length, height, and coordinates of the bounding box); object detection and recognition are thereby realized.
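As a concrete illustration of this grid-based prediction, the following sketch computes the size of such a per-cell output (a minimal sketch in Python; the values of S, P, and the category count C are free parameters chosen for illustration, not values fixed by the present disclosure):

```python
# Minimal sketch of a YOLO-style (Non-Patent Document 1) output layout.
# Assumption: each of the P boxes per cell carries 4 coordinates plus one
# reliability value, and each cell carries C category probabilities.
def grid_output_size(S: int, P: int, C: int) -> int:
    per_cell = P * 5 + C      # P boxes * (x, y, w, h, reliability) + C category probabilities
    return S * S * per_cell   # one prediction vector per grid cell

# Example: a 7x7 grid with 2 boxes per cell and 20 categories -> 1470 output values.
print(grid_output_size(7, 2, 20))
```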
However, images and videos shot in a real environment often have a complicated background or contain small-sized objects. For example, video of a forest or a mountain as shown in FIG. 4, a flying drone, or a small road sign in a road scene are frequently cited targets of object detection and recognition.
The conventional method described in Non-Patent Document 1 above divides the entire image into fixed-size candidate regions and estimates the object type and region for each divided region. Therefore, when a plurality of objects appear in an object candidate region or when a small object is present, the amount of information (features) representing those objects in the output feature map decreases as the CNN layers become deeper, which lowers the accuracy of detection and recognition.
The present invention has been made to solve the above problems, and has as its object to provide an object detection/recognition apparatus, method, and program capable of accurately detecting and recognizing even a small-sized object.
In order to achieve the above object, an object detection/recognition device according to a first aspect of the invention includes: a separation unit that acquires an image to be subjected to object detection and separates a background included in the image from a foreground in which an object is captured; an object candidate region extraction unit that extracts, from among the regions representing the separated foreground, a region of a predetermined size or smaller as an object candidate region; an input image generation unit that generates, based on the extracted object candidate region, a plurality of input images corresponding to the object candidate region and the periphery of the object candidate region; and a detection/recognition unit that inputs each of the generated plurality of input images to a CNN (Convolutional Neural Network) trained in advance for detecting an object and recognizing the category of the object, detects the position of the object included in the input image for each of the input images, recognizes the category of the object detected for each of the input images, and integrates the recognition results of the object category based on the recognition result of the object category for each of the input images.
In the object detection/recognition device according to the first aspect, the image may be acquired for each scene of a video that is a target of object detection, and the processes of the separation unit, the object candidate region extraction unit, the input image generation unit, and the detection/recognition unit may be performed for each image of the scene.
In the object detection/recognition device according to the first aspect, the input image generation unit may generate the plurality of input images by upsampling the extracted object candidate region and a plurality of regions obtained from the periphery of the object candidate region.
In the object detection/recognition device according to the first aspect, the integration of the recognition results by the detection/recognition unit may determine the category with the highest reliability by using, for each category, the maximum value or the average value of the reliabilities calculated for each of the input images during category recognition, and take it as the integrated recognition result.
The object detection/recognition device according to the first aspect may further include a learning unit that learns the CNN by using the object detected from the image by the detection/recognition unit and the category of the object, and the detection/recognition unit may perform the detection and the recognition using the learned CNN.
An object detection/recognition method according to a second aspect of the invention includes the steps of: a separation unit acquiring an image to be subjected to object detection and separating a background included in the image from a foreground in which an object is captured; an object candidate region extraction unit extracting, from among the regions representing the separated foreground, a region of a predetermined size or smaller as an object candidate region; an input image generation unit generating, based on the extracted object candidate region, a plurality of input images corresponding to the object candidate region and the periphery of the object candidate region; and a detection/recognition unit inputting each of the generated plurality of input images to a CNN (Convolutional Neural Network) trained in advance for detecting an object and recognizing the category of the object, detecting the position of the object included in the input image for each of the input images, recognizing the category of the object detected for each of the input images, and integrating the recognition results of the object category based on the recognition result of the object category for each of the input images.
A program according to a third aspect of the invention is a program for causing a computer to function as each unit of the object detection/recognition device according to the first aspect.
According to the object detection/recognition apparatus, method, and program of the present invention, an image to be subjected to object detection is acquired; the background included in the image and the foreground in which an object is captured are separated; from among the regions representing the separated foreground, a region of a predetermined size or smaller is extracted as an object candidate region; based on the extracted object candidate region, a plurality of input images corresponding to the object candidate region and its periphery are generated; each of the generated plurality of input images is input to a CNN trained in advance for detecting an object and recognizing the category of the object; the position of the object included in the input image is detected for each of the input images; the category of the object detected for each of the input images is recognized; and the recognition results of the object category are integrated based on the recognition result of the object category for each of the input images. This provides the effect that even a small-sized object can be detected and recognized accurately.
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
<Overview of the Embodiment of the Present Invention>
First, an outline of the embodiment of the present invention will be described. To address the above problem, a technique that separates the background and the foreground in which objects appear in a video or image, extracts object candidate regions representing small-sized objects from the foreground, and performs object detection and recognition only on the extracted object candidate regions is considered effective for detecting small-sized objects.
In the embodiment of the present invention, a separation means that separates the background and the foreground in a video or image, a means that extracts object candidate regions of a certain size or smaller from the foreground, and a means that generates input images to be input to a CNN trained in advance on a deep-learning basis for detecting objects and recognizing their categories are provided, so that small-sized objects can be detected and recognized accurately.
<Configuration of the Object Detection/Recognition Apparatus According to the Embodiment of the Present Invention>
Next, the configuration of the object detection/recognition apparatus according to the embodiment of the present invention will be described. As shown in FIG. 1, an object detection/recognition apparatus 100 according to the embodiment of the present invention can be configured as a computer including a CPU, a RAM, and a ROM storing a program for executing an object detection/recognition processing routine described later and various data. Functionally, as shown in FIG. 1, the object detection/recognition apparatus 100 includes a storage unit 20, an acquisition unit 22, a separation unit 30, an object candidate region extraction unit 32, an input image generation unit 34, a detection/recognition unit 36, and a learning unit 38.
The storage unit 20 stores videos to be subjected to object detection and recognition. Upon receiving a processing instruction from the acquisition unit 22, the storage unit 20 outputs a video to the acquisition unit 22. The detection results and recognition results obtained by the detection/recognition unit 36 are also stored in the storage unit 20. Note that images, rather than videos, may be stored in the storage unit 20, and the object detection and recognition processing by the separation unit 30, the object candidate region extraction unit 32, the input image generation unit 34, and the detection/recognition unit 36 may be performed for each image.
The acquisition unit 22 outputs a detection and recognition processing instruction to the storage unit 20, acquires the video stored in the storage unit 20, and outputs the acquired video to the separation unit 30. Further, after the processing of the detection/recognition unit 36, it outputs a learning processing instruction to the storage unit 20, acquires the integrated result of the detection results of the input images and the category recognition results stored in the storage unit 20, and outputs it to the learning unit 38.
The separation unit 30 acquires an image to be subjected to object detection for each video scene and separates the background included in the image from the foreground in which objects appear. The separation unit 30 first extracts images (v1, v2, ..., vN) from the video received from the acquisition unit 22 as frames at fixed time intervals. Next, it separates the background and the foreground in which objects appear within each image, using dynamic features between the preceding and succeeding frames in chronological order. For the separation, for example, the cv2.absdiff() function of the image processing library OpenCV may be used. The separated foreground is output to the object candidate region extraction unit 32.
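A minimal sketch of this separation step, assuming OpenCV-style frame differencing followed by thresholding (the threshold value and the morphological cleanup below are illustrative assumptions, not values specified in this disclosure):

```python
import cv2

def separate_foreground(prev_frame, frame, thresh=30):
    """Return a binary foreground mask using frame differencing (cv2.absdiff)."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(prev_gray, gray)                   # dynamic feature between frames
    _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    # Optional cleanup of small noise in the mask (an added assumption).
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    return mask
```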
The object candidate region extraction unit 32 extracts, from among the regions representing the foreground separated by the separation unit 30, regions of a predetermined size or smaller as object candidate regions and outputs them to the input image generation unit 34. Specifically, regions representing the foreground are first extracted using an object region extraction method based on edge information, as described in Non-Patent Document 2 below.
Non-Patent Document 2: Edge Boxes: Locating Object Proposals from Edges, C. Lawrence Zitnick and Piotr Dollár, ECCV 2014
Next, the object candidate region extraction unit 32 calculates the size of each extracted region representing the foreground. For example, the area of the bounding box of each foreground region may be calculated. Regions of a predetermined size or smaller (for example, 50*50 pixels or less) are then extracted as object candidate regions.
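A minimal sketch of this size filter, assuming each foreground proposal is given as an (x, y, w, h) bounding box (the tuple representation is an assumption; the 50*50-pixel threshold comes from the example above):

```python
def extract_small_candidates(boxes, max_side=50):
    """Keep only foreground bounding boxes whose area is at most max_side * max_side pixels."""
    candidates = []
    for (x, y, w, h) in boxes:
        if w * h <= max_side * max_side:   # area of the bounding box
            candidates.append((x, y, w, h))
    return candidates
```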
The input image generation unit 34 generates, based on the extracted object candidate region, a plurality of input images corresponding to the object candidate region and its periphery. For example, a plurality of input images are generated by upsampling the extracted object candidate region and a plurality of regions obtained from its periphery. Specifically, as shown in FIG. 2, with the box e taken as one object candidate region, a region adjacent to it at the upper left and having the same area as e is generated as region a; similarly, a plurality of regions such as b, c, and d can be generated using the lower-left, upper-right, and lower-right peripheral regions. Next, upsampling is applied to the regions a, b, c, d, and e. For example, the regions can be enlarged using nearest-neighbor or bilinear interpolation. The upsampled regions a, b, c, d, and e are then output to the detection/recognition unit 36 as input images. Using a plurality of regions in this way improves recognition accuracy. Furthermore, in the learning unit 38 described later, the parameters of the CNN for detecting objects and recognizing their categories, trained on a deep-learning basis, can be tuned using high-resolution images or a plurality of training images. Since tuning the CNN parameters is expected to improve detection and recognition accuracy, the input images of the peripheral regions a, b, c, and d are created with the aim of increasing the amount of data in addition to the region e. FIG. 2 shows one embodiment, and other methods of increasing data may be adopted depending on application needs; for example, an enlarged version of the region e may be generated to increase the data.
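A minimal sketch of this input-image generation, assuming the candidate region is an (x, y, w, h) box inside a NumPy image and using OpenCV for the upsampling (the corner-neighbor layout follows FIG. 2; the scale factor and the clipping to the image border are added assumptions):

```python
import cv2

def generate_input_images(image, box, scale=4, interpolation=cv2.INTER_LINEAR):
    """Crop the candidate region e and its four corner neighbours a-d, then upsample each."""
    x, y, w, h = box
    H, W = image.shape[:2]
    # Region e plus same-sized regions adjacent at the upper-left, lower-left,
    # upper-right, and lower-right (a, b, c, d in FIG. 2).
    offsets = [(0, 0), (-w, -h), (-w, h), (w, -h), (w, h)]
    inputs = []
    for dx, dy in offsets:
        x0 = min(max(x + dx, 0), W - w)
        y0 = min(max(y + dy, 0), H - h)
        crop = image[y0:y0 + h, x0:x0 + w]
        # Nearest-neighbour (cv2.INTER_NEAREST) or bilinear (cv2.INTER_LINEAR) upsampling.
        inputs.append(cv2.resize(crop, (w * scale, h * scale), interpolation=interpolation))
    return inputs
```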
The detection/recognition unit 36 inputs each of the plurality of input images generated by the input image generation unit 34 to a CNN trained in advance for detecting objects and recognizing their categories, detects the position of the object included in the input image for each input image, recognizes the category of the object detected for each input image, and integrates the category recognition results based on the recognition result of the object category for each input image. As the detection and recognition method using the CNN, for example, Yolo described in Non-Patent Document 1 is used. In this method, for each input image, an object detection probability is calculated for each cell, and in the category recognition a reliability is calculated for each category for each of a plurality of bounding boxes. Then, from the per-category reliabilities calculated for each of the input images, the maximum reliability of each category may be obtained, and the category with the highest maximum value may be taken as the integrated category recognition result. Alternatively, using the average per-category reliability over the input images, the category with the highest average reliability may be taken as the integrated category recognition result.
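A minimal sketch of this integration step, assuming each input image yields a dictionary mapping category names to reliabilities (the dictionary representation and the category names in the example are illustrative assumptions):

```python
def integrate_categories(per_image_scores, use_average=False):
    """per_image_scores: list of {category: reliability} dicts, one per input image.
    Returns the category whose maximum (or average) reliability across images is highest."""
    pooled = {}
    for scores in per_image_scores:
        for category, reliability in scores.items():
            pooled.setdefault(category, []).append(reliability)
    if use_average:
        fused = {c: sum(v) / len(v) for c, v in pooled.items()}
    else:
        fused = {c: max(v) for c, v in pooled.items()}
    return max(fused, key=fused.get)

# Example: three input images voting on the same candidate region.
scores = [{"sign": 0.62, "bird": 0.20}, {"sign": 0.48, "bird": 0.35}, {"sign": 0.71, "bird": 0.10}]
print(integrate_categories(scores))          # maximum pooling -> "sign"
print(integrate_categories(scores, True))    # average pooling -> "sign"
```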
The detection/recognition unit 36 stores, in the storage unit 20, the integrated result of the detection results for the objects in the input images and the category recognition results.
The learning unit 38 receives, from the acquisition unit 22, the integrated result of the detection results of the input images and the category recognition results stored in the storage unit 20, tunes the parameters of the CNN for detecting objects and recognizing their categories using those detection and recognition results, and feeds the learning result back to the detection/recognition unit 36. For the learning, a general CNN learning method such as error backpropagation may be used. Through the learning by the learning unit 38, the detection/recognition unit 36 can detect and recognize objects using the CNN with tuned parameters.
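A minimal sketch of such parameter tuning, assuming a PyTorch-style detector that exposes a loss over (input image, integrated result) pairs (the compute_loss hook, the optimizer choice, and the data pairing are assumptions; the disclosure only states that standard error backpropagation may be used):

```python
import torch

def tune_cnn(model, samples, lr=1e-4, epochs=1):
    """samples: iterable of (image_tensor, target) pairs built from the stored
    detection results and integrated category labels."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for image, target in samples:
            optimizer.zero_grad()
            loss = model.compute_loss(image, target)  # hypothetical loss hook of the detector
            loss.backward()                           # error backpropagation
            optimizer.step()
    return model
```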
Note that the processing of the learning unit 38 may be performed at any timing, separately from the series of object detection and recognition processes performed by the acquisition unit 22, the separation unit 30, the object candidate region extraction unit 32, the input image generation unit 34, and the detection/recognition unit 36.
<Operation of the Object Detection/Recognition Apparatus According to the Embodiment of the Present Invention>
Next, the operation of the object detection/recognition apparatus 100 according to the embodiment of the present invention regarding object detection and recognition will be described. The object detection/recognition apparatus 100 executes the object detection/recognition processing routine shown in FIG. 3.
First, in step S100, the acquisition unit 22 outputs a detection and recognition processing instruction to the storage unit 20, acquires the video stored in the storage unit 20, and outputs the acquired video to the separation unit 30.
Next, in step S102, the separation unit 30 extracts, for each video scene, the images (v1, v2, ..., vN) to be subjected to object detection from frames at fixed time intervals.
In step S104, the separation unit 30 selects a target image vi.
In step S106, the separation unit 30 separates, for the target image vi, the background and the foreground in which objects appear, using dynamic features between the preceding and succeeding frames in chronological order.
In step S108, the object candidate region extraction unit 32 extracts, from among the regions representing the foreground separated in step S106, regions of a predetermined size or smaller as object candidate regions and outputs them to the input image generation unit 34.
In step S110, the input image generation unit 34 generates, for the target image vi, a plurality of input images corresponding to the object candidate region and its periphery based on the extracted object candidate region.
In step S112, the detection/recognition unit 36 inputs, for the target image vi, each of the plurality of input images generated in step S110 to the CNN trained in advance for detecting objects and recognizing their categories, detects the position of the object included in the input image for each input image, recognizes the category of the object detected for each input image, and integrates the category recognition results based on the recognition result of the object category for each input image.
In step S114, the detection/recognition unit 36 stores, in the storage unit 20, the integrated result of the detection results for the objects in the input images and the category recognition results for the target image vi.
In step S116, the detection/recognition unit 36 determines whether all target images vi have been processed; if so, it ends the object detection/recognition processing routine, and if not, it returns to step S104, selects the next target image vi, and repeats the processing.
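A minimal sketch of this processing routine as a whole, reusing the helper functions sketched above (frame sampling via OpenCV, the foreground_boxes helper for turning the foreground mask into bounding boxes, and detect_and_recognize standing in for the CNN of the detection/recognition unit 36 are all assumptions):

```python
import cv2

def detection_recognition_routine(video_path, detect_and_recognize, frame_step=10):
    """Steps S100-S116: sample frames, separate foreground, extract small candidate
    regions, generate input images, and integrate the per-image recognition results."""
    cap = cv2.VideoCapture(video_path)                       # S100: acquire the video
    results, prev, index = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break                                            # S116: all target images processed
        if index % frame_step == 0 and prev is not None:     # S102/S104: target image v_i
            mask = separate_foreground(prev, frame)          # S106: background/foreground separation
            boxes = foreground_boxes(mask)                   # assumed helper: boxes of foreground regions
            for box in extract_small_candidates(boxes):      # S108: small candidate regions
                inputs = generate_input_images(frame, box)   # S110: candidate + peripheral regions, upsampled
                per_image = [detect_and_recognize(im) for im in inputs]   # S112: CNN detection/recognition
                results.append((box, integrate_categories(per_image)))   # S112/S114: integrated result
        prev, index = frame, index + 1
    cap.release()
    return results
```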
As described above, according to the object detection/recognition apparatus of the embodiment of the present invention, an image to be subjected to object detection is acquired; the background included in the image and the foreground in which objects are captured are separated; from among the regions representing the separated foreground, regions of a predetermined size or smaller are extracted as object candidate regions; based on the extracted object candidate region, a plurality of input images corresponding to the object candidate region and its periphery are generated; each of the generated plurality of input images is input to a CNN trained in advance for detecting objects and recognizing their categories; the position of the object included in the input image is detected for each input image; the category of the object detected for each input image is recognized; and the category recognition results are integrated based on the recognition result of the object category for each input image. Even small-sized objects can thereby be detected and recognized accurately.
The present invention is not limited to the above-described embodiment, and various modifications and applications are possible without departing from the gist of the invention.
For example, in the above-described embodiment, the case where the learning unit 38 is included in the object detection/recognition apparatus 100 has been described as an example; however, the present invention is not limited to this, and the learning unit may be configured as a learning apparatus separate from the object detection/recognition apparatus 100.
Reference Signs List
20 Storage unit
22 Acquisition unit
30 Separation unit
32 Object candidate region extraction unit
34 Input image generation unit
36 Detection/recognition unit
38 Learning unit
100 Object detection/recognition apparatus
Claims (7)
- 物体の検出の対象となる画像を取得し、前記画像に含まれる背景と物体が写った前景とを分離する分離部と、
前記分離された前景を表す領域のうち、所定のサイズ以下の領域を物体候補領域として抽出する物体候補領域抽出部と、
前記抽出された物体候補領域に基づいて、前記物体候補領域及び前記物体候補領域の周辺に対応する複数の入力画像を生成する入力画像生成部と、
生成された前記複数の入力画像の各々を、予め学習された物体の検出及び前記物体のカテゴリの認識を行うためのCNN(Convolutional Neural Network)に入力して、前記入力画像の各々について前記入力画像に含まれる前記物体の位置を検出すると共に、前記入力画像の各々について検出された前記物体のカテゴリを認識し、前記入力画像の各々の前記物体のカテゴリの認識結果に基づいて、前記物体のカテゴリの前記認識結果を統合する検出認識部と、
物体検出認識装置。 A separation unit that obtains an image to be detected as an object and separates a background included in the image and a foreground in which the object is captured,
An object candidate region extracting unit that extracts a region equal to or smaller than a predetermined size as an object candidate region among the regions representing the separated foreground;
An input image generation unit that generates a plurality of input images corresponding to the periphery of the object candidate region and the object candidate region based on the extracted object candidate region,
Each of the plurality of generated input images is input to a CNN (Convolutional Neural Network) for detecting a pre-learned object and recognizing a category of the object. Detecting the position of the object included in, and recognizing the category of the object detected for each of the input images, based on the recognition result of the category of the object of each of the input images, the category of the object A detection recognition unit that integrates the recognition results of
Object detection and recognition device.
- The object detection and recognition device according to claim 1, wherein the image is acquired for each scene of a video that is a target of object detection, and the processes of the separation unit, the object candidate region extraction unit, the input image generation unit, and the detection and recognition unit are performed for each image of the scene.
- The object detection and recognition device according to claim 1 or claim 2, wherein the input image generation unit generates the plurality of input images by upsampling the extracted object candidate region and a plurality of regions obtained from the periphery of the object candidate region.
- The object detection and recognition device according to any one of claims 1 to 3, wherein the integration of the recognition results by the detection and recognition unit determines the category with the highest reliability by using, for each category, the maximum value or the average value of the reliabilities calculated for each category in the category recognition for each of the input images, and takes that category as the integrated recognition result.
- The object detection and recognition device according to any one of claims 1 to 4, further comprising a learning unit that trains the CNN using the object detected from the image by the detection and recognition unit and the category of the object, wherein the detection and recognition unit performs the detection and the recognition using the trained CNN.
- An object detection and recognition method comprising the steps of: a separation unit acquiring an image that is a target of object detection and separating a background included in the image from a foreground in which an object is captured; an object candidate region extraction unit extracting, from the regions representing the separated foreground, a region of a predetermined size or smaller as an object candidate region; an input image generation unit generating, based on the extracted object candidate region, a plurality of input images corresponding to the object candidate region and the periphery of the object candidate region; and a detection and recognition unit inputting each of the generated plurality of input images to a CNN (Convolutional Neural Network) trained in advance for detecting an object and recognizing a category of the object, detecting, for each of the input images, the position of the object included in the input image, recognizing the category of the object detected in each of the input images, and integrating the recognition results of the category of the object based on the recognition results of the category of the object for each of the input images.
- A program for causing a computer to function as each unit of the object detection and recognition device according to any one of claims 1 to 5.
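For readers who want a concrete picture of the processing chain recited in the claims above (a small foreground candidate region, upsampled crops of the region and its periphery, per-crop CNN category recognition, and integration of the per-category confidences by maximum or average), the following Python sketch illustrates one possible arrangement. It is only an illustration under stated assumptions, not the patent's implementation: the margin sizes, the 224x224 output resolution, the nearest-neighbour upsampling, and the `cnn_recognize` callback standing in for the trained CNN are hypothetical choices introduced here.

```python
# Minimal sketch (not the patent's reference implementation) of the claimed flow:
# crop a small foreground candidate region and its periphery at several margins,
# upsample each crop, score each crop with a CNN, and integrate the confidences.
from typing import Callable, Dict, List, Tuple
import numpy as np

Box = Tuple[int, int, int, int]  # (x, y, width, height) of a foreground candidate region

def candidate_crops(image: np.ndarray, box: Box, margins=(0, 8, 16),
                    out_size=(224, 224)) -> List[np.ndarray]:
    """Crop the candidate region and progressively larger peripheries, then upsample each crop."""
    h, w = image.shape[:2]
    x, y, bw, bh = box
    crops = []
    for m in margins:
        x0, y0 = max(0, x - m), max(0, y - m)
        x1, y1 = min(w, x + bw + m), min(h, y + bh + m)
        patch = image[y0:y1, x0:x1]
        # Nearest-neighbour upsampling keeps the sketch dependency-free;
        # a real pipeline would typically use bilinear or bicubic resizing.
        ry = np.linspace(0, patch.shape[0] - 1, out_size[0]).astype(int)
        rx = np.linspace(0, patch.shape[1] - 1, out_size[1]).astype(int)
        crops.append(patch[ry][:, rx])
    return crops

def integrate(confidences: List[Dict[str, float]], mode: str = "max") -> Tuple[str, float]:
    """Pool per-crop category confidences by per-category max (or mean) and pick the best category."""
    categories = {c for conf in confidences for c in conf}
    pooled = {}
    for c in categories:
        values = [conf.get(c, 0.0) for conf in confidences]
        pooled[c] = max(values) if mode == "max" else float(np.mean(values))
    best = max(pooled, key=pooled.get)
    return best, pooled[best]

def detect_and_recognize(image: np.ndarray, box: Box,
                         cnn_recognize: Callable[[np.ndarray], Dict[str, float]],
                         mode: str = "max") -> Tuple[str, float]:
    """Score every upsampled view of one candidate region and integrate the results."""
    per_crop = [cnn_recognize(crop) for crop in candidate_crops(image, box)]
    return integrate(per_crop, mode=mode)
```

Calling `detect_and_recognize(frame, (120, 80, 12, 12), my_cnn)` would, under these assumptions, score three upsampled views of a 12x12 candidate region and return the category whose pooled confidence is highest; passing `mode="mean"` corresponds to the average-based integration alternative recited in the claims.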
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2018140533A JP2020017136A (en) | 2018-07-26 | 2018-07-26 | Object detection and recognition apparatus, method, and program |
JP2018-140533 | 2018-07-26 | | |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020022329A1 (en) | 2020-01-30 |
Family
ID=69182117
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2019/028838 WO2020022329A1 (en) | Object detection/recognition device, method, and program | 2018-07-26 | 2019-07-23 |
Country Status (2)
Country | Link |
---|---|
JP (1) | JP2020017136A (en) |
WO (1) | WO2020022329A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102218255B1 (en) * | 2020-09-25 | 2021-02-19 | 정안수 | System and method for analyzing image based on artificial intelligence through learning of updated areas and computer program for the same |
JP7380904B2 (en) * | 2020-09-29 | 2023-11-15 | 日本電気株式会社 | Information processing device, information processing method, and program |
- 2018-07-26: JP JP2018140533A patent/JP2020017136A/en active Pending
- 2019-07-23: WO PCT/JP2019/028838 patent/WO2020022329A1/en active Application Filing
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH07200729A (en) * | 1993-11-29 | 1995-08-04 | Hewlett Packard Co <Hp> | Optical character recognition method |
US20180157916A1 (en) * | 2016-12-05 | 2018-06-07 | Avigilon Corporation | System and method for cnn layer sharing |
US9965865B1 (en) * | 2017-03-29 | 2018-05-08 | Amazon Technologies, Inc. | Image data segmentation using depth data |
Non-Patent Citations (3)
Title |
---|
GIRSHICK, R.: "Fast R-CNN", PROCEEDINGS OF THE 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 13 December 2015 (2015-12-13), pages 1440-1448, XP055646790, ISBN: 978-1-4673-8391-2, DOI: 10.1109/ICCV.2015.169 * |
MISRA, I. ET AL.: "Watch and Learn: Semi-Supervised Learning of Object Detectors from Videos", PROCEEDINGS OF THE 2015 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 12 June 2015 (2015-06-12), pages 3593-3602, XP032793810, ISBN: 978-1-4673-6964-0, DOI: 10.1109/CVPR.2015.7298982 * |
REN, S. ET AL.: "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, vol. 39, no. 6, 6 June 2016 (2016-06-06), pages 1137-1149, XP055583592, ISSN: 0162-8828, DOI: 10.1109/TPAMI.2016.2577031 * |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022185473A1 (en) * | 2021-03-04 | 2022-09-09 | 日本電気株式会社 | Object detection model generation device, object detection model generation method, object detection device, object detection method, and recording medium |
Also Published As
Publication number | Publication date |
---|---|
JP2020017136A (en) | 2020-01-30 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19840981; Country of ref document: EP; Kind code of ref document: A1
| NENP | Non-entry into the national phase | Ref country code: DE
| 122 | Ep: pct application non-entry in european phase | Ref document number: 19840981; Country of ref document: EP; Kind code of ref document: A1