KR20170028591A

KR20170028591A - Apparatus and method for object recognition with convolution neural network

Info

Publication number: KR20170028591A
Application number: KR1020150125393A
Authority: KR
Inventors: 이영운
Original assignee: 한국전자통신연구원
Priority date: 2015-09-04
Filing date: 2015-09-04
Publication date: 2017-03-14
Also published as: KR101980360B1

Abstract

The present invention relates to an apparatus and method for recognizing an object using a convolutional neural network. The apparatus includes an image input unit for acquiring and inputting a color image and a depth image, an image processor for generating a composite image of the color image and the depth image, and correcting resolution and noise of the generated composite image, A size information extracting unit that extracts size information of an object included in the image using a depth value of the image, and a size information extracting unit that extracts size information of the object extracted by the size information extracting unit and the synthesized image corrected by the image processing unit, And an object recognition unit for recognizing the object by applying it to the solution neural network.

Description

[0001] Apparatus and method for object recognition using convolution neural network [0002]

The present invention relates to an apparatus and method for recognizing an object using a convolutional neural network.

Object recognition technology extracts feature points from camera images and analyzes the distribution to identify the types of objects included in the images. Representative examples of object recognition technology include face recognition, human recognition, and traffic signal recognition.

Recently, object recognition technology using convolutional neural network has appeared, which shows accuracy exceeding the recognition rate of existing object recognition technology, and consequently object recognition research using convolutional neural network is actively proceeding.

However, the object recognition technology using the existing convolution neural network does not consider the color image and the depth image simultaneously in the feature point extraction step, so it can not accurately distinguish the area of the object and can not scale-invariant Respectively.

Korean Patent Publication No. 10-2014-0104091

An object of the present invention is to provide an apparatus and method for extracting feature points by applying convolution neural networks simultaneously to a color image and a depth image to clearly distinguish an object region and applying absolute size information derived from depth information to a convolutional neural network, And an object recognition apparatus and method using a convolution neural network capable of robust object recognition.

The technical problems of the present invention are not limited to the above-mentioned technical problems, and other technical problems which are not mentioned can be understood by those skilled in the art from the following description.

According to another aspect of the present invention, there is provided an apparatus for recognizing an object using a convolution neural network, the apparatus including an image input unit for acquiring and inputting a color image and a depth image, a synthesized image of the color image and the depth image, A size information extracting unit for extracting size information of an object included in the depth image using the depth value of the depth image, a size information extracting unit for extracting size information of the object included in the depth image, And an object recognition unit for recognizing the object by applying the size information of the object extracted by the size information extraction unit to the convolution neural network.

According to another aspect of the present invention, there is provided a method of recognizing an object using a convolution neural network, the method comprising: acquiring and inputting a color image and a depth image; generating a composite image of the color image and the depth image; Extracting the size information of the object included in the image using the depth value of the depth image, and extracting size information of the extracted composite image and the extracted object, And recognizing the object by applying it to the convolution neural network.

According to the present invention, by applying the convolutional neural network to the combined image of the color image and the depth image input from the camera and the size information of the object included in the image, the object is recognized, There is an advantage to be recognized.

1 is a block diagram of an object recognition apparatus using a convolutional neural network according to the present invention.
2 is a diagram illustrating an example of a composite image generated by an object recognition apparatus using a convolutional neural network according to the present invention.
FIG. 3 and FIG. 4 are diagrams illustrating an operation flow for an object recognition method using a convolutional neural network according to the present invention.
5 is a diagram illustrating a computing system to which the apparatus according to the present invention is applied.

Hereinafter, some embodiments of the present invention will be described in detail with reference to exemplary drawings. It should be noted that, in adding reference numerals to the constituent elements of the drawings, the same constituent elements are denoted by the same reference numerals whenever possible, even if they are shown in different drawings. In the following description of the embodiments of the present invention, a detailed description of known functions and configurations incorporated herein will be omitted when it may make the difference that the embodiments of the present invention are not conclusive.

In describing the components of the embodiment of the present invention, terms such as first, second, A, B, (a), and (b) may be used. These terms are intended to distinguish the constituent elements from other constituent elements, and the terms do not limit the nature, order or order of the constituent elements. Also, unless otherwise defined, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with the meaning in the context of the relevant art and are to be interpreted in an ideal or overly formal sense unless explicitly defined in the present application Do not.

1 is a block diagram of an object recognition apparatus using a convolutional neural network according to the present invention.

1, an object recognition apparatus (hereinafter, referred to as 'object recognition apparatus') 100 using a convolution neural network according to the present invention includes a control unit 110, an image input unit 120, an input unit 130, An output unit 140, a communication unit 150, a storage unit 160, an image processing unit 170, a size information extraction unit 180, and an object recognition unit 190. Here, the control unit 110 may process signals transmitted between the respective units of the object recognition apparatus 100. [

The image input unit 120 may correspond to a camera that captures and provides a color image and a depth image. Here, the camera may have a depth sensor separately, and may include an RGB image sensor for acquiring a depth image from a color image. For example, the image input unit 120 may correspond to a Kinect that acquires an RGB image and a depth image from an RGB image sensor in real time.

The color image and the depth image obtained by the image input unit 120 may be transmitted to the image processing unit 170 and the size information extraction unit 180 through the control unit 110. [

The input unit 130 may be a key button implemented outside the object recognition apparatus 100 as a means for receiving a control command and may correspond to a soft key implemented on a display implemented in the object recognition apparatus 100 It is possible. The input unit 130 may be an input unit such as a mouse, a joystick, a jog shuttle, or a stylus pen.

The output unit 140 may include a display for displaying an operation state of the object recognition apparatus 100, an object recognition result, and the like, and may include a speaker for outputting a voice signal.

Here, the display may be used as an input device in addition to an output device when a sensor for sensing a touch operation is provided. That is, when a touch sensor such as a touch film, a touch sheet, or a touch pad is provided on the display, the display may operate as a touch screen, and the input unit 130 and the output unit 140 may be integrated.

The display may be a liquid crystal display (LCD), a thin film transistor liquid crystal display (TFT LCD), an organic light-emitting diode (OLED), a flexible display, , A field emission display (FED), and a 3D display (3D display).

The communication unit 150 may include a communication module that supports communication with a camera implemented at a remote location. In addition, the communication unit 150 may include a communication module that supports access to a server or a database implemented in the server.

The communication module may support wireless Internet access, short range communication, or wired communication. Here, the wireless Internet technology includes a wireless LAN (WLAN), a wireless broadband (Wibro), a Wi-Fi, a World Interoperability for Microwave Access (WIMAX), a High Speed Downlink Packet Access And may also include Bluetooth, ZigBee, Ultra Wideband (UWB), Radio Frequency Identification (RFID), Infrared Data Association (IrDA), and the like as the short range communication technology . The wired communication technology may include USB (Universal Serial Bus) communication, and the like.

The storage unit 160 may store data and programs necessary for the object recognition apparatus 100 to operate. For example, the storage unit 160 may store setting values for image processing, size information extraction, and object recognition in the object recognition apparatus 100, and an algorithm for performing each function may be stored. In addition, the storage unit 160 may store a command for performing an operation of the object recognition apparatus 100 and the like.

Also, the storage unit 160 may store parameter information to be referred to for image processing, size information extraction, and object recognition, and corresponding parameter values.

Here, the storage unit 160 may include a flash memory type, a hard disk type, a multimedia card micro type, a card type memory (for example, an SD or XD memory A random access memory (SRAM), a read-only memory (ROM), a programmable read-only memory (PROM), an electrically erasable programmable read-only memory (EEPROM) Erasable Programmable Read-Only Memory).

The image processing unit 170 generates a composite image of the color image and the depth image obtained by the image input unit 120. At this time, the image processor 170 may generate a composite image by mapping pixels of the color image and the depth image. For example, the image processor 170 may map the pixels of the color image corresponding to the pixels of the depth image using the depth image values.

In addition, the image processor 170 corrects the composite image of the color image and the depth image. At this time, the image processor 170 can correct the resolution of the synthesized image and correct the noise.

Here, the image processor 170 may correct the resolution of the composite image by cutting out an area to which the color image and the depth image are not mapped in the composite image.

On the other hand, the depth image has lower resolution than the color image. Accordingly, the image processor 170 can correct the resolution of the composite image by increasing the resolution of the depth image. For example, the image processor 170 may upsamplify a depth image using a Markov Random Field (MRF), a Spatiotemporal filter, and a bilateral filter that preserves an edge The resolution of the depth image can be increased.

The image processor 170 may remove the noise of the depth image using the color image information. For example, the image processor 170 may remove noise of a depth image using a bi-directional filter and a median filter. In addition, the image processor 170 can estimate the depth value of the hole in which the depth information is not input in the depth image using the color image information. Here, the image processor 170 may estimate the depth value of the corresponding hole by linearly interpolating the depth information from the surrounding color information of the hole to which the depth information is not inputted.

In this way, the image processor 170 can generate a corrected composite image by correcting the resolution of the composite image and removing the noise. The corrected synthesized image generated by the image processing unit 170 may be transmitted to the size information extracting unit 180 and the object recognizing unit 190.

The size information extracting unit 180 extracts size information of each pixel or area using the depth value of the depth image. Here, the size information extracting unit 180 can extract information such as the length, angle, and height of each pixel or region using the depth value of the depth image.

For example, the size information extracting unit 180 may extract length information of an object included in an image using Equation (1) below.

In Equation (1), s denotes an actual length of a specific object included in the depth image, d ₁ denotes a depth value of a pixel or an area where the object is located, and s 1 denotes a length of a specific object on the depth image.

Accordingly, the size information extraction unit 180 extracts the length information of the object included in the depth image through Equation (1), and transmits the extracted length information to the object recognition unit 190.

The object recognition unit 190 includes the corrected composite image transferred from the image processing unit 170, the size information transmitted from the size information extraction unit 180, and the convolutional neural network And recognizes the object.

Here, the convolution neural network is composed of a feature point extractor and a neural network classifier for extracting feature points of the input image. The feature point extractor can be defined as a series of convolution and sub-sampling processes. The feature point extractor can predict the camera motion (Ego-motion) by tracking the corner feature points extracted from the original image, and sets the region of the object having other motion components as a region of interest (ROI) . The neural network classifier is composed of a multi-layer neural network, and classifies the objects included in the set ROI.

At this time, the convolutional neural network can learn the parameters of the convolutional neural network in advance from the database included in the storage unit 160 or the database received from the external server, and can recognize the object based on the parameters of the learned convolutional neural network have. Here, the database may include a pre-prepared composite image and a ground truth.

Accordingly, the object recognition unit 190 can recognize the object by applying the corrected composite image and the size information of the object to the convolutional neural network.

For example, the object recognition unit 190 recognizes that the object included in the composite image is located at a short distance from the camera using the corrected composite image and the size information of the object, . In this case, the object recognition unit 190 can reduce the color image value in the composite image, reduce the weight of the corresponding region applied to the convolutional neural network, and recognize the object.

As another example, the object recognition unit 190 recognizes that the object included in the composite image is located far from the camera using the corrected composite image and the size information of the object, so that the weight occupied by the composite image is smaller than the absolute size . In this case, the object recognition unit 190 can increase the color image value in the composite image or enlarge the area of the object, thereby increasing the weight of the corresponding region applied to the convolutional neural network and recognizing the object.

The object recognition result by the object recognition unit 190 can be represented as shown in FIG.

In this way, the object recognition unit 190 can apply the information of the color image and the information of the depth image to the convolution neural network at the same time, thereby making it possible to recognize the area of the object clearly by reflecting the size change of the object.

The operation flow of the control device according to the present invention will be described in more detail as follows.

FIG. 3 and FIG. 4 are diagrams illustrating an operation flow for an object recognition method using a convolutional neural network according to the present invention.

Referring to FIGS. 3 and 4, when the color image and the depth image are input from the image input means such as a camera (S110), the object recognition apparatus generates a composite image of the input color image and the depth image (S120). In step 'S120', the object recognition apparatus can generate a composite image by mapping the pixels of the color image corresponding to the pixels of the depth image using the depth image values.

In addition, the object recognition apparatus corrects the synthesized image (S130). In step 'S130', the object recognition apparatus corrects the resolution of the synthesized image (S131) and removes the noise (S135), as shown in FIG.

In step 'S131', the object recognition apparatus can correct the resolution of the composite image by cutting out areas where the color image and the depth image are not mapped in the composite image, or by increasing the resolution by upsampling the depth image. Also, in step 'S135', the object recognition apparatus can remove the noise of the composite image by estimating the depth value of the hole in which the depth information is not inputted in the depth image using the color image information.

Then, the object recognition apparatus extracts the size information of the object in the image using the depth value of the depth image (S140).

The object recognition apparatus applies the synthesized image corrected in step S130 and the size information of the object extracted in step S140 to the convolutional neural network in step S150 to recognize the object in step S160.

The object recognition apparatus can recognize the area of the object clearly by reflecting the size change of the object by applying the corrected composite image and the size information of the object to the convolution neural network at the same time.

5 is a diagram illustrating a computing system to which the apparatus according to the present invention is applied.

5, a computing system 1000 includes at least one processor 1100, a memory 1300, a user interface input device 1400, a user interface output device 1500, (1600), and a network interface (1700).

The processor 1100 may be a central processing unit (CPU) or a memory device 1300 and / or a semiconductor device that performs processing for instructions stored in the storage 1600. Memory 1300 and storage 1600 may include various types of volatile or non-volatile storage media. For example, the memory 1300 may include a ROM (Read Only Memory) and a RAM (Random Access Memory).

Thus, the steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by processor 1100, or in a combination of the two. The software module may reside in a storage medium (i.e., memory 1300 and / or storage 1600) such as a RAM memory, a flash memory, a ROM memory, an EPROM memory, an EEPROM memory, a register, a hard disk, a removable disk, You may. An exemplary storage medium is coupled to the processor 1100, which can read information from, and write information to, the storage medium. Alternatively, the storage medium may be integral to the processor 1100. [ The processor and the storage medium may reside within an application specific integrated circuit (ASIC). The ASIC may reside within an object recognition device. Alternatively, the processor and the storage medium may reside as discrete components within the object recognition device.

The foregoing description is merely illustrative of the technical idea of the present invention, and various changes and modifications may be made by those skilled in the art without departing from the essential characteristics of the present invention.

Therefore, the embodiments disclosed in the present invention are intended to illustrate rather than limit the scope of the present invention, and the scope of the technical idea of the present invention is not limited by these embodiments. The scope of protection of the present invention should be construed according to the following claims, and all technical ideas within the scope of equivalents should be construed as falling within the scope of the present invention.

100: object recognition device 110:
120: image input unit 130: input unit
140: output unit 150: communication unit
160: storage unit 170: image processing unit
180: Size information extraction unit 190: Object recognition unit

Claims

An image input unit for acquiring and inputting a color image and a depth image;
An image processor for generating a composite image of the color image and the depth image, and correcting the resolution and noise of the generated composite image;
A size information extracting unit for extracting size information of an object included in the image using the depth value of the depth image; And
An object recognition unit for recognizing an object by applying the synthesized image corrected by the image processing unit and the size information of the object extracted by the size information extracting unit to the convolutional neural network,
And an object recognition unit using the convolution neural network.