WO2019117374A1

WO2019117374A1 - Apparatus and method for detecting dynamic object

Info

Publication number: WO2019117374A1
Application number: PCT/KR2017/014775
Authority: WO
Inventors: 정태영; 이상윤; 황상원; 이경재; 이준협; 이주성; 김우진
Original assignee: 연세대학교 산학협력단
Priority date: 2017-12-12
Filing date: 2017-12-14
Publication date: 2019-06-20
Also published as: KR102002228B1; KR20190069958A

Abstract

An apparatus and a method for detecting a dynamic object are disclosed. The disclosed apparatus comprises: an image obtaining unit for obtaining a stereo image of a first frame image and a second frame image; a camera movement detection unit for detecting camera movement in a first frame period and a second frame period; a depth information operation unit for obtaining a depth per pixel measurement of the first frame image by using the stereo image of the first frame image; a transformed image generation unit for generating a transformed image obtained by transforming the first frame image on the basis of the camera movement and the depth per pixel measurement; an optical flow image generation unit for generating an optical flow image by using the second frame image and the transformed image; and a dynamic object detection unit for detecting a dynamic object by using the generated optical flow image, wherein the transformed image is an image transformed by reflecting the camera movement onto the first frame image. According to the disclosed apparatus and the method, it is possible to effectively detect a dynamic object in an image captured in a moving state.

Description

Dynamic object detection apparatus and method

Embodiments of the present invention relate to an apparatus and method for detecting dynamic objects, and more particularly to an apparatus and method for detecting a dynamic object from an acquired image for three-dimensional map generation.

Recently, studies on autonomous vehicles have progressed, and studies have been actively carried out to generate sophisticated three-dimensional maps through learning by attaching various sensors to an automobile.

A 3D map generated by an image obtained by attaching a camera to an automobile should include only background information. Objects such as a moving car or a pedestrian need to be removed from the map.

However, when the image obtained from the camera sensor is used directly to generate the three-dimensional map, dynamic objects remain on the map, so they must be detected and removed.

Conventionally, an optical flow image is generated by computing the optical flow from the difference between the current frame and the next frame, and dynamic objects are detected using this optical flow image.

Such an optical flow image can effectively detect a dynamic object when the camera is stationary. However, a dynamic object cannot be properly detected when the image is acquired while the camera is moving, as is the case when producing a three-dimensional map.

The present invention proposes a method and apparatus for effectively detecting a dynamic object in an image captured in a moving state of a camera.

According to an aspect of the present invention, there is provided a dynamic object detection apparatus including: an image acquiring unit for acquiring a stereo image of a first frame image and a second frame image; a camera motion detecting unit for detecting camera motion in a first frame and a second frame; a depth information operation unit for obtaining a depth of each pixel of the first frame image using the stereo image of the first frame image; a transformed image generation unit for generating a transformed image obtained by transforming the first frame image based on the camera motion and the depth per pixel; an optical flow image generation unit for generating an optical flow image using the second frame image and the transformed image; and a dynamic object detecting unit for detecting a dynamic object using the generated optical flow image, wherein the transformed image is an image obtained by reflecting the camera motion onto the first frame image.

The camera motion detection unit independently detects the rotational motion and the linear motion.

The transformed image generating unit independently applies the rotational motion and the linear motion to generate a transformed image.

The transformed image generating unit transforms the first frame image according to the following equation based on the rotational motion.

In the above equation, r_u denotes the transformation due to rotational motion for the first frame image pixel coordinate u, r_v denotes the transformation due to rotational motion for the first frame image pixel coordinate v, yaw, pitch, and roll denote the components of the rotational motion, θ denotes the angle formed by the vertical axis of the first frame image and the coordinate (u, v), FoV denotes the viewing angle, d denotes the depth information acquired for each pixel, s denotes the distance from the vanishing point to (u, v), height denotes the number of pixels in the vertical axis of the first frame image, and width denotes the number of pixels in the horizontal axis of the first frame image.

The transformed image generation unit transforms the first frame image according to the following equation based on the linear motion.

In the above equation, t_u and t_v denote the transformations along the u-axis and the v-axis of the first frame image coordinates (u, v), d denotes the depth information obtained for each pixel, θ denotes the angle formed by the vertical axis of the first frame image and the coordinate (u, v), height denotes the number of pixels in the vertical axis of the first frame image, and width denotes the number of pixels in the horizontal axis of the first frame image.

The optical flow image generation unit generates an optical flow image using the difference image of the transformed image and the second frame image.

The dynamic object detection apparatus further includes a post-processing unit for applying an erosion filter and an expansion filter to the generated optical flow image to perform post-processing.

According to another aspect of the present invention, there is provided a dynamic object detection apparatus including: an image acquiring unit for acquiring a stereo image of a first frame image and a second frame image; a camera motion detecting unit for detecting camera motion in a first frame and a second frame; a depth information operation unit for obtaining a depth of each pixel of the first frame image using the stereo image of the first frame image; a transformed image generation unit for generating a transformed image obtained by transforming the first frame image based on the camera motion and the depth per pixel; an optical flow image generation unit for generating an optical flow image using the second frame image and the transformed image; and a dynamic object detecting unit for detecting a dynamic object using the generated optical flow image, wherein the camera motion detecting unit independently detects a rotational motion and a linear motion.

According to another aspect of the present invention, there is provided a dynamic object detection method comprising the steps of: (a) acquiring a stereo image of a first frame image and a second frame image; (b) detecting camera motion in a first frame and a second frame period; (c) obtaining a depth of each pixel of the first frame image using the stereo image of the first frame image; (d) generating a transformed image obtained by transforming the first frame image based on the camera motion and the depth per pixel; (e) generating an optical flow image using the second frame image and the transformed image; and (f) detecting a dynamic object using the generated optical flow image, wherein the transformed image is an image obtained by reflecting the camera motion onto the first frame image.

According to the present invention, dynamic objects can be effectively detected in an image captured in a moving state.

FIG. 1 is a block diagram showing a schematic structure of a dynamic object detection apparatus according to an embodiment of the present invention;

FIG. 2 is a diagram for explaining a rotational motion and a linear motion detected in accordance with an embodiment of the present invention;

FIG. 3 is a diagram for describing components of a rotational motion detected in accordance with an embodiment of the present invention;

FIG. 4 is a diagram for explaining a relationship between a world coordinate system and an image obtained through a camera;

FIG. 5 is a flowchart showing an overall flow of a dynamic object detection method according to an embodiment of the present invention.

While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the invention is not intended to be limited to the particular embodiments, but includes all modifications, equivalents, and alternatives falling within the spirit and scope of the invention. Like reference numerals are used for like elements in describing each drawing.

Hereinafter, embodiments according to the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram showing a schematic structure of a dynamic object detecting apparatus according to an embodiment of the present invention.

Referring to FIG. 1, a dynamic object detecting apparatus according to an exemplary embodiment of the present invention includes a stereo image acquiring unit 100, a camera motion detecting unit 110, a depth information calculating unit 120, a transformed image generating unit 130, an optical flow image generating unit 140, a post-processing unit 150, and a dynamic object detecting unit 160.

The dynamic object detection apparatus according to an embodiment of the present invention can be installed in a vehicle and used to produce a three-dimensional map. Because the vehicle is moving while the images for the 3D map are acquired, it is difficult to detect only the dynamic objects from the acquired images. Beyond this application, the apparatus can be used for a variety of other purposes wherever dynamic objects need to be detected from images captured while moving.

The stereo image acquisition unit 100 acquires a stereo image using a stereo camera. The stereo image acquisition unit 100 includes a first camera device for acquiring a left image and a second camera device for acquiring a right image, and independently acquires a left image and a right image.

In the present invention, the dynamic object is detected by taking the camera motion into account. This requires depth information from the captured image, so a stereo image is acquired using two or more cameras.

The camera motion detection unit 110 detects the camera motion, which is used to generate the transformed image described later.

When the camera is fixed to a vehicle, camera movement arises from the movement of the vehicle; movement may also arise from the camera itself.

The camera motion detection unit 110 detects a camera motion occurring between a first frame image and a second frame image among frames of an image acquired from the stereo image acquisition unit.

The camera motion detection unit 110 detects motion information of the camera. In the present invention, the motion of the camera is detected as two separate motions. The first is the rotational motion of the camera and the second is the linear motion of the camera. The actual motion of the camera can be regarded as the combination of the rotational motion and the linear motion, but the present invention treats the two separately.

FIG. 2 is a diagram for explaining a rotational motion and a linear motion detected according to an embodiment of the present invention.

FIG. 2(a) is a view showing the actual movement of the camera, and FIG. 2(b) is a view showing the linear motion of the camera within the movement of FIG. 2(a).

As shown in FIG. 2, the movement of the camera is represented by the sum of the rotational motion and the linear motion, and the camera motion detection unit 110 detects the linear motion and the rotational motion independently.

According to a preferred embodiment of the present invention, the movement of the camera may be detected using various known sensors such as acceleration sensors.

Rotational motion includes motion for three components yaw, pitch, and roll, and linear motion includes motion in the x-, y-, and z-axis directions.

FIG. 3 is a diagram for explaining components of rotational motion detected according to an embodiment of the present invention.

Referring to FIG. 3, roll means a motion in which the object rotates about a specific longitudinal axis (for example, the z-axis). Yaw means a motion in which the object rotates in the lateral direction (for example, about the x-axis). Pitch means a motion in which the object rotates in the up-and-down direction (for example, about the y-axis).
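The three rotational components can be sketched as elementary rotation matrices. The axis assignment below follows the examples given in the text (roll about z, yaw about x, pitch about y). The function names and the composition order are assumptions made for illustration only and are not taken from the specification.

```python
import numpy as np

def rot_x(angle):
    """Rotation by `angle` (radians) about the x-axis (yaw in the text's example)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_y(angle):
    """Rotation by `angle` (radians) about the y-axis (pitch in the text's example)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rot_z(angle):
    """Rotation by `angle` (radians) about the z-axis (roll in the text's example)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rotation_from_components(yaw, pitch, roll):
    # Assumed composition order: yaw is applied to a vector first,
    # then pitch, then roll. The specification does not state an order.
    return rot_z(roll) @ rot_y(pitch) @ rot_x(yaw)
```

With all three angles set to zero the result is the identity matrix, which corresponds to a camera that has not rotated between the two frames.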

The depth information calculation unit 120 calculates depth information for each pixel of the acquired image. Various computation methods for obtaining depth information from a stereo image may be used. For example, the disparity may be estimated based on the difference image between the left image and the right image, and the depth information per pixel may be calculated based on the estimated disparity. The depth information calculation unit 120 calculates the depth information of the first frame image using the left and right images of the first frame image.
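As an illustration of the last step, depth can be recovered from an estimated disparity with the standard rectified-stereo relation Z = f · B / d. The sketch below assumes a rectified stereo pair, a focal length in pixels, and a baseline in meters; these parameter names and the handling of invalid disparities are assumptions, not details given in the specification.

```python
import numpy as np

def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Per-pixel depth Z = f * B / d for a rectified stereo pair.

    Pixels whose disparity is zero or negative have no valid match
    and are assigned an infinite depth.
    """
    disparity = np.asarray(disparity_px, dtype=float)
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth
```

For example, with a focal length of 700 pixels and a baseline of 0.5 m, a disparity of 10 pixels corresponds to a depth of 35 m.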

The transformed image generation unit 130 generates conversion information based on the rotational motion and the linear motion that are obtained separately, and generates a transformed image based on the conversion information. Here, the conversion information is information for converting the first frame image based on the motion that occurs between the acquired first frame image and the second frame image, and it is applied to one of the stereo images of the first frame image (for example, the left image).

Such conversion information is generated to distinguish a fixed object from a dynamic object among the objects in the first frame image and the second frame image.

In other words, the conversion information describes how each pixel of the first frame image (left image) is to be converted based on the detected camera motion, and the transformed image can be regarded as an image that predicts how the first frame image will change.

In order to generate such a transformed image, the relationship between the world coordinate system and the image coordinate system should be used. In this specification, the world coordinate system is represented by (X, Y, Z), and the pixel coordinates of the acquired image are represented by (u, v).

FIG. 4 is a diagram for explaining a relationship between a world coordinate system and an image obtained through a camera.

Referring to FIG. 4, f denotes the distance between the camera lens and the image plane, that is, the focal length. D represents the distance between the object and the camera lens. FoV (Field of View) means the viewing angle of the camera. As described above, (u, v) are the pixel coordinates of the image.

In FIG. 4, the unit of the world coordinate system is the actual distance, and the unit of (u, v) is the pixel. Referring to FIG. 4, the relational expression between the world coordinate system and the image pixel coordinates is expressed by Equation 1 below.
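Equation 1 is not reproduced in this text. As a point of reference only, the sketch below uses the standard pinhole camera model, which may differ from the patent's Equation 1. It assumes a horizontal field of view, square pixels, and a principal point at the image center; the function names are hypothetical.

```python
import math

def focal_length_px(width_px, fov_deg):
    """Focal length in pixels from the horizontal FoV: f = (width / 2) / tan(FoV / 2)."""
    return (width_px / 2.0) / math.tan(math.radians(fov_deg) / 2.0)

def project(X, Y, Z, width_px, height_px, fov_deg):
    """Project a camera-centered world point (X, Y, Z) to pixel coordinates (u, v).

    Assumes the principal point lies at the image center and Z > 0.
    """
    f = focal_length_px(width_px, fov_deg)
    u = f * X / Z + width_px / 2.0
    v = f * Y / Z + height_px / 2.0
    return u, v
```

Under these assumptions, a point on the optical axis projects to the image center regardless of its distance, and a 640-pixel-wide image with a 90-degree FoV has a focal length of about 320 pixels.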

A method of generating a transformed image from the motion of the camera, on the basis of the relationship between the world coordinate system and the image coordinates, is described below.

Based on the detected rotational motion (yaw, pitch, roll), the transformation r_u due to rotational motion for the pixel coordinate u and the transformation r_v due to rotational motion for the pixel coordinate v are performed as in Equation (2) below.

In Equation (2), θ denotes the angle formed by the vertical axis of the first frame image and the coordinate (u, v), FoV denotes the viewing angle, d denotes the depth information obtained for each pixel, and s denotes the distance from the vanishing point to (u, v).

On the other hand, based on the linear motion (tx, ty, tz), the transformations t_u and t_v of the first frame image coordinates (u, v) are made according to Equation (3) below.

In the above equation, d is the depth information obtained for each pixel, θ is the angle between the vertical axis of the first frame image and the coordinate (u, v), height is the y-axis height of the image (number of y-axis pixels), and width is the x-axis width of the image (number of x-axis pixels).

The rotation transformation according to Equation (2) and the linear transformation according to Equation (3) are performed independently. This means that the two transformations may be applied sequentially in either order. For example, the linear transformation may be performed after the rotation transformation, or the order may be reversed.

Through the generated transformed image, it is possible to predict where fixed objects will be positioned in the second frame image (left image) according to the detected camera movement.
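Equations (2) and (3) are not reproduced in this text, so the sketch below swaps in a standard depth-based reprojection to illustrate the idea of predicting where static pixels should land. It is not the patent's formulation. It assumes a pinhole camera with the principal point at the image center, and a motion (R, t) that maps first-frame camera coordinates to second-frame camera coordinates; the function and parameter names are hypothetical.

```python
import numpy as np

def predict_pixel_positions(depth, R, t, focal_px):
    """Predict where each first-frame pixel lands in the second frame.

    depth:    (H, W) array of per-pixel depth for the first frame.
    R, t:     3x3 rotation and length-3 translation, with P2 = R @ P1 + t.
    focal_px: focal length in pixels (assumed equal in u and v).
    Returns (u2, v2), each of shape (H, W), valid for static scene points.
    """
    depth = np.asarray(depth, dtype=float)
    h, w = depth.shape
    cx, cy = w / 2.0, h / 2.0
    v, u = np.mgrid[0:h, 0:w].astype(float)

    # Back-project every pixel to a 3D point using its depth.
    X = (u - cx) * depth / focal_px
    Y = (v - cy) * depth / focal_px
    points = np.stack([X, Y, depth], axis=-1)          # (H, W, 3)

    # Apply the camera motion, then project back to the image plane.
    moved = points @ np.asarray(R, dtype=float).T + np.asarray(t, dtype=float)
    u2 = focal_px * moved[..., 0] / moved[..., 2] + cx
    v2 = focal_px * moved[..., 1] / moved[..., 2] + cy
    return u2, v2
```

The function returns predicted coordinates rather than a resampled image. A static point should appear at its predicted coordinates in the second frame, whereas a dynamic object will not.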

When the converted image is generated by the converted image generating unit 130, the optical flow image generating unit 140 generates an optical flow image using the converted image and the second frame image. The optical flow image is generated using the difference image between the transformed image and the second frame image. Various methods of generating an optical flow image using two images are known, and an optical flow image can be generated by any method.

For example, the optical flow image may be generated using the Lucas-Kanade method, or may be generated using a deep learning model such as FlowNet.

Since the transformed image generated by the transformed image generating unit 130 is an image reflecting the motion of the camera, the static object is positioned at the same position in the transformed image and the second frame image. However, the position of the dynamic object is different in the transformed image and the second frame image.
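Because static content coincides in the two images, a thresholded difference image already highlights candidate dynamic regions. The sketch below uses this simple difference in place of a full optical flow computation; the threshold value and function name are assumptions chosen for illustration.

```python
import numpy as np

def difference_mask(transformed, second_frame, threshold=25):
    """Binary mask that is 1 where the two grayscale images differ by more than `threshold`.

    Both inputs are cast to a signed type first so that subtracting
    unsigned 8-bit images does not wrap around.
    """
    a = np.asarray(transformed, dtype=np.int32)
    b = np.asarray(second_frame, dtype=np.int32)
    return (np.abs(a - b) > threshold).astype(np.uint8)
```

Pixels belonging to static objects produce values near zero in the difference and fall below the threshold, while pixels covered by a moving object exceed it.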

The post-processing unit 150 performs post-processing on the optical flow image for accurate dynamic object recognition. The outline portions of the optical flow image may not be clear, which can make it difficult to detect the dynamic object region, so the post-processing unit 150 performs post-processing for accurate detection of the dynamic object region.

According to an embodiment of the present invention, post-processing may be performed through filtering using an erosion filter and a dilation (expansion) filter. The expansion filter adds pixels to the edge of an object in an image. The erosion filter removes pixels from the edge of an object in an image. The present invention can perform post-processing by combining these two filters. It will be apparent to those skilled in the art that post-processing may be omitted as needed.

The dynamic object detection unit 160 detects the dynamic object from the post-processed optical flow image. The optical flow image is a binary image, and a region having a different color in the optical flow image is detected as a dynamic object.
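A minimal sketch of the two filters on a binary mask follows, using a 3x3 neighborhood. The specification does not state the kernel size or the order in which the filters are combined; erosion followed by expansion (a morphological opening) is shown here as one plausible combination, and all function names are hypothetical.

```python
import numpy as np

def dilate(mask):
    """3x3 binary expansion: a pixel is set if any pixel in its 3x3 neighborhood is set."""
    m = np.asarray(mask, dtype=bool)
    padded = np.pad(m, 1, mode="constant", constant_values=False)
    h, w = m.shape
    out = np.zeros_like(m)
    for dy in range(3):
        for dx in range(3):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

def erode(mask):
    """3x3 binary erosion: a pixel stays set only if its whole 3x3 neighborhood is set."""
    m = np.asarray(mask, dtype=bool)
    padded = np.pad(m, 1, mode="constant", constant_values=False)
    h, w = m.shape
    out = np.ones_like(m)
    for dy in range(3):
        for dx in range(3):
            out &= padded[dy:dy + h, dx:dx + w]
    return out

def opening(mask):
    """Erosion followed by expansion: removes small specks, keeps larger regions."""
    return dilate(erode(mask))
```

Applied to a mask containing a solid 3x3 block and one isolated pixel, `opening` removes the isolated pixel and restores the block to its original extent.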
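One way to turn such a binary image into discrete detections is to group set pixels into connected regions and report a bounding box for each. The specification does not prescribe a particular grouping method; the breadth-first 4-connected labeling below, and the function name, are assumptions for illustration.

```python
from collections import deque

def find_regions(mask):
    """Return bounding boxes (top, left, bottom, right) of 4-connected nonzero regions.

    `mask` is a 2D sequence (list of lists or array) of 0/1 values.
    Boxes are reported in raster-scan order of each region's first pixel.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first search over the region containing (y, x).
            queue = deque([(y, x)])
            seen[y][x] = True
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

Each returned box marks one candidate dynamic object, which could then be excluded from the map.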

The dynamic objects detected in this manner can be utilized in various forms. As described above, it can be used to remove dynamic objects in 3D map production.

Referring to FIG. 5, first, a stereo image is acquired using a camera (step 500). The first frame image and the second frame image are acquired using the camera.

As the first frame image and the second frame image are acquired, a camera motion occurring between the two frames is detected (step 502). In the present invention, the motion of the camera is divided into a rotational motion and a linear motion.

Meanwhile, depth information of each pixel of the first frame image (left image in the stereo image) is obtained from the acquired stereo image (step 504).

The converted image for the first frame image (left image in the stereo image) is generated based on the pixel-by-pixel depth information and the camera motion information (step 506). The transformed image can be generated by sequentially performing the rotation transformation and the linear transformation on the first frame image. The rotation transformation can be performed as shown in Equation (2), and the linear transformation can be performed as Equation (3).

When the transformed image is generated, an optical flow image is generated using the transformed image and the second frame image (left image in the stereo image) (step 508). As described above, the optical flow image is generated using the difference image between the transformed image and the second frame image.

Once the optical flow image is created, a post-processing is performed to further clarify the object area, and an erosion filter and an expansion filter may be used as an example, as described above, for post-processing (step 510).

A dynamic object is detected from the post-processed optical flow image (step 512).

As described above, the present invention has been described with reference to specific matters such as particular elements, and to limited embodiments and drawings. However, the present invention is not limited to the above embodiments, and various modifications and changes may be made thereto by those skilled in the art to which the present invention pertains. Accordingly, the spirit of the present invention should not be construed as being limited to the embodiments described, and the following claims, as well as all equivalents of the claims, belong to the scope of the present invention.

Claims

1. A dynamic object detection apparatus comprising:

an image acquiring unit for acquiring a stereo image of a first frame image and a second frame image;

a camera motion detecting unit for detecting camera motion in a first frame and a second frame;

a depth information operation unit for obtaining a depth of each pixel of the first frame image using the stereo image of the first frame image;

a transformed image generation unit for generating a transformed image obtained by transforming the first frame image based on the camera motion and the depth per pixel;

an optical flow image generation unit for generating an optical flow image using the second frame image and the transformed image; and

a dynamic object detection unit for detecting a dynamic object using the generated optical flow image,

wherein the transformed image is an image obtained by reflecting the camera motion onto the first frame image.
2. The apparatus of claim 1,

wherein the camera motion detecting unit independently detects the rotational motion and the linear motion.
3. The apparatus of claim 2,

wherein the transformed image generation unit generates the transformed image by applying the rotational motion and the linear motion independently.
4. The apparatus of claim 3,

wherein the transformed image generation unit transforms the first frame image according to the following equation based on the rotational motion.

In the above equation, r_u denotes the transformation due to rotational motion for the first frame image pixel coordinate u, r_v denotes the transformation due to rotational motion for the first frame image pixel coordinate v, yaw, pitch, and roll denote the components of the rotational motion, θ denotes the angle formed by the vertical axis of the first frame image and the coordinate (u, v), FoV denotes the viewing angle, d denotes the depth information acquired for each pixel, s denotes the distance from the vanishing point to (u, v), height denotes the number of pixels in the vertical axis of the first frame image, and width denotes the number of pixels in the horizontal axis of the first frame image.
5. The apparatus of claim 3,

wherein the transformed image generation unit transforms the first frame image according to the following equation based on the linear motion.

In the above equation, t_u and t_v denote the transformations along the u-axis and the v-axis of the first frame image coordinates (u, v), d denotes the depth information obtained for each pixel, θ denotes the angle formed by the vertical axis of the first frame image and the coordinate (u, v), height denotes the number of pixels in the vertical axis of the first frame image, and width denotes the number of pixels in the horizontal axis of the first frame image.
6. The apparatus of claim 1,

wherein the optical flow image generation unit generates the optical flow image using the difference image of the transformed image and the second frame image.
7. The apparatus of claim 1,

wherein the optical flow image generation unit generates the optical flow image using the difference image of the transformed image and the second frame image.
8. The apparatus of claim 1,

further comprising a post-processing unit for performing post-processing by applying an erosion filter and an expansion filter to the generated optical flow image.
9. A dynamic object detection apparatus comprising:

an image acquiring unit for acquiring a stereo image of a first frame image and a second frame image;

a camera motion detecting unit for detecting camera motion in a first frame and a second frame;

a depth information operation unit for obtaining a depth of each pixel of the first frame image using the stereo image of the first frame image;

a transformed image generation unit for generating a transformed image obtained by transforming the first frame image based on the camera motion and the depth per pixel;

an optical flow image generation unit for generating an optical flow image using the second frame image and the transformed image; and

a dynamic object detection unit for detecting a dynamic object using the generated optical flow image,

wherein the camera motion detecting unit independently detects the rotational motion and the linear motion.
10. The apparatus of claim 9,

wherein the transformed image generation unit generates the transformed image by applying the rotational motion and the linear motion independently.
11. A dynamic object detection method comprising the steps of:

(a) obtaining a stereo image of a first frame image and a second frame image;

(b) detecting camera motion in a first frame and a second frame period;

(c) obtaining a depth of each pixel of the first frame image using the stereo image of the first frame image;

(d) generating a transformed image obtained by transforming the first frame image based on the camera motion and the depth per pixel;

(e) generating an optical flow image using the second frame image and the transformed image; and

(f) detecting a dynamic object using the generated optical flow image,

wherein the transformed image is an image obtained by reflecting the camera motion onto the first frame image.
12. The method of claim 11,

wherein the step (b) independently detects the rotational motion and the linear motion.
13. The method of claim 12,

wherein the step (d) independently applies the rotational motion and the linear motion to generate the transformed image.
14. The method of claim 13,

wherein the step (d) transforms the first frame image according to the following equation based on the rotational motion.

In the above equation, r_u denotes the transformation due to rotational motion for the first frame image pixel coordinate u, r_v denotes the transformation due to rotational motion for the first frame image pixel coordinate v, yaw, pitch, and roll denote the components of the rotational motion, θ denotes the angle formed by the vertical axis of the first frame image and the coordinate (u, v), FoV denotes the viewing angle, d denotes the depth information acquired for each pixel, s denotes the distance from the vanishing point to (u, v), height denotes the number of pixels in the vertical axis of the first frame image, and width denotes the number of pixels in the horizontal axis of the first frame image.
15. The method of claim 13,

wherein the step (d) transforms the first frame image according to the following equation based on the linear motion.

In the above equation, t_u and t_v denote the transformations along the u-axis and the v-axis of the first frame image coordinates (u, v), d denotes the depth information obtained for each pixel, θ denotes the angle formed by the vertical axis of the first frame image and the coordinate (u, v), height denotes the number of pixels in the vertical axis of the first frame image, and width denotes the number of pixels in the horizontal axis of the first frame image.
16. The method of claim 11,

wherein the step (e) generates the optical flow image using a difference image of the transformed image and the second frame image.
17. The method of claim 11,

wherein the step (e) generates the optical flow image using a difference image of the transformed image and the second frame image.
18. The method of claim 11,

further comprising: applying an erosion filter and an expansion filter to the generated optical flow image to perform post-processing.