WO2022102015A1 - Image information acquisition device, image information acquisition method, and computer program - Google Patents

Image information acquisition device, image information acquisition method, and computer program

Info

Publication number
WO2022102015A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
subject
information acquisition
information
unit
Prior art date
Application number
PCT/JP2020/042069
Other languages
French (fr)
Japanese (ja)
Inventor
勇 五十嵐
隆行 黒住
誠之 高村
英明 木全
Original Assignee
日本電信電話株式会社 (Nippon Telegraph and Telephone Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電信電話株式会社 filed Critical 日本電信電話株式会社
Priority to PCT/JP2020/042069 priority Critical patent/WO2022102015A1/en
Publication of WO2022102015A1 publication Critical patent/WO2022102015A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images

Definitions

  • The present invention relates to an image information acquisition device, an image information acquisition method, and a computer program.
  • As one such means, there is a method of acquiring and recording information on the three-dimensional appearance of a space.
  • Specific examples of such means include shooting stereo images using multiple cameras, shooting depth images using a depth sensor, and acquiring a three-dimensional point cloud using LiDAR (Light Detection and Ranging).
  • There are also techniques that use images or video whose pixel values are colors (luminance) captured by a single camera. Specific examples of such techniques include Structure from Motion (SfM) (Non-Patent Document 1) and Simultaneous Localization and Mapping (SLAM) (Non-Patent Document 2).
  • In these techniques, three-dimensional information is acquired by using a large number of images of the same subject.
  • In view of the above circumstances, an object of the present invention is to provide a technique capable of acquiring more accurate three-dimensional information by using fewer images.
  • One aspect of the present invention is an image information acquisition device including: a classification unit that classifies, from a target image that is the image to be processed, subjects appearing in the target image into subjects of the same type; and a three-dimensional information acquisition unit that acquires information indicating the three-dimensional shape of the subject based on a plurality of images of subjects classified as the same type by the classification unit.
  • One aspect of the present invention is an image information acquisition method including: a classification step of classifying, from a target image that is the image to be processed, subjects appearing in the target image into subjects of the same type; and a three-dimensional information acquisition step of acquiring information indicating the three-dimensional shape of the subject based on a plurality of images of subjects classified as the same type in the classification step.
  • One aspect of the present invention is a computer program for causing a computer to function as the above-described image information acquisition device.
  • FIG. 1 is a diagram showing a functional configuration example of the image information acquisition device 100 of the present invention.
  • the image information acquisition device 100 is configured by using information devices such as a personal computer, a server device, a game device, a smartphone, and an image pickup device.
  • the image information acquisition device 100 includes an image input unit 10, an output unit 20, a storage unit 30, and a control unit 40.
  • the image input unit 10 receives image data input to the image information acquisition device 100.
  • the image data input by the image input unit 10 may be still image data or moving image data.
  • The image input unit 10 may read image data recorded on a recording medium such as a CD-ROM or a USB memory (Universal Serial Bus Memory). Further, the image input unit 10 may receive an image captured by a still camera or a video camera from the camera. Further, when the image information acquisition device 100 is built into a still camera, a video camera, or an information processing device provided with a camera, the image input unit 10 may receive the captured image, or the image before imaging, from the bus. Further, the image input unit 10 may receive image data from another information processing device via a network.
  • the image input unit 10 may be configured in a different manner as long as it can receive input of image data.
  • the output unit 20 outputs image information and image data generated by the control unit 40.
  • the output unit 20 may write image information or image data to a recording medium such as a CD-ROM or a USB memory (Universal Serial Bus Memory).
  • When the image information acquisition device 100 is built into a still camera, a video camera, or an information processing device provided with a camera, the output unit 20 may record the generated image information and image data on a recording medium provided in these devices, or display them as a preview image on a display device provided in these devices. Further, the output unit 20 may transmit image information or image data to another information processing device via a network.
  • the output unit 20 may be configured in a different manner as long as it can output image information and image data.
  • the storage unit 30 is configured by using a storage device such as a magnetic hard disk device or a semiconductor storage device.
  • the storage unit 30 functions as, for example, an image storage unit 301 and an image information storage unit 302.
  • the image storage unit 301 stores image data input by the image input unit 10.
  • the image storage unit 301 may store still image data or moving image data.
  • the image information storage unit 302 stores image information generated by the control unit 40.
  • FIG. 2 is a diagram showing a specific example of an image information table stored in the image information storage unit 302.
  • the image information table has a record for each combination of an image to be processed (hereinafter referred to as "target image") and a subject in the target image.
  • Each record has, for example, identification information indicating a target image (hereinafter referred to as "target image identification information"), identification information indicating a subject (hereinafter referred to as "subject identification information"), and image information in association with each other.
  • the image information is information about the image of the subject in the corresponding target image.
  • The image information includes, for example, area information indicating the subject area of the subject, information indicating the three-dimensional shape of the subject (hereinafter referred to as "3D model"), and state parameters indicating the position and posture of the subject.
  • the control unit 40 is configured by using a processor such as a CPU (Central Processing Unit) and a memory.
  • When the processor executes a program, the control unit 40 functions as an input / output control unit 401, an area information acquisition unit 402, a classification unit 403, a three-dimensional information acquisition unit 404, a state parameter acquisition unit 405, an additional information acquisition unit 406, and an image generation unit 407. All or part of each function of the control unit 40 may be realized by using hardware such as an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), or an FPGA (Field Programmable Gate Array).
  • the above program may be recorded on a computer-readable recording medium.
  • Computer-readable recording media include, for example, portable media such as flexible disks, magneto-optical disks, ROMs, CD-ROMs, and semiconductor storage devices (for example, SSD: Solid State Drive), and storage devices such as hard disks and semiconductor storage devices built into computer systems.
  • the above program may be transmitted over a telecommunication line.
  • the input / output control unit 401 controls the input / output of data.
  • the input / output control unit 401 acquires image data by controlling the operation of the image input unit 10.
  • the input / output control unit 401 records the input image data in the image storage unit 301.
  • the input / output control unit 401 may temporarily record the input image data in a storage device such as a memory, if necessary.
  • The input / output control unit 401 outputs the image information recorded in the image information storage unit 302 and the image data generated by the image generation unit 407 to an external device by controlling the output unit 20.
  • The area information acquisition unit 402 acquires, for each subject in the target image, information (hereinafter referred to as "area information") indicating the area occupied by that subject (hereinafter referred to as "subject area").
  • the target image may be an image stored as a still image in the image storage unit 301, or may be an image of a frame of a moving image stored as a moving image in the image storage unit 301.
  • the target image may be one still image or frame, or may be a plurality of still images or frames.
  • the target image may be a combination of a still image and a frame. When a plurality of frames are used as the target image, a plurality of frames may be acquired from one moving image.
  • the time interval of each frame may be configured to be equal to or larger than a predetermined threshold value so that frames from different viewpoints can be obtained.
  • the frame from which the area information is acquired may be determined by the area information acquisition unit 402 based on a predetermined criterion.
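As one possible criterion for obtaining frames from sufficiently different viewpoints, the time-interval threshold mentioned above can be sketched as follows (a minimal illustration; the function name and the greedy selection strategy are assumptions, not from the patent):

```python
def select_frames(timestamps, min_interval):
    """Greedily pick frame timestamps so that consecutive picks are
    at least `min_interval` seconds apart (one possible criterion for
    obtaining frames from different viewpoints)."""
    selected = []
    for t in timestamps:
        if not selected or t - selected[-1] >= min_interval:
            selected.append(t)
    return selected

# A 30 fps clip, 2 seconds long, sampled so picks are >= 0.5 s apart.
frames = [i / 30.0 for i in range(60)]
picked = select_frames(frames, 0.5)  # -> [0.0, 0.5, 1.0, 1.5]
```

Any other criterion (for example, estimated camera motion between frames) could be substituted for the time interval.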
  • It is desirable that each still image or frame used as a target image be obtained by photographing the same subject or subjects of the same type.
  • the position of each subject in the three-dimensional space may be the same or different in the frame of each still image or moving image.
  • the subject area is an area surrounded by the outline of the subject.
  • FIG. 3 is a diagram showing a specific example of the target image.
  • A plurality of subjects are shown in the target image of FIG. 3.
  • the subject 81 and the subject 86 are heart-shaped objects.
  • The subject 81 and the subject 86 are objects of the same type, or objects having shapes similar to each other.
  • the subject 82, the subject 83, the subject 84, and the subject 85 are star-shaped objects.
  • The subject 82, the subject 83, the subject 84, and the subject 85 are objects of the same type, or objects having shapes similar to each other.
  • Each of the subjects 81 to 86 is photographed at its own position and tilted at its own angle.
  • FIG. 4 is a diagram showing a specific example of the subject area. Each shape shown in a different pattern in FIG. 4 indicates a subject area. The subject areas 91 to 96 indicate the areas of the subjects 81 to 86, respectively.
  • The area information acquisition unit 402 may estimate, for example, for each pixel in the target image, which subject the pixel corresponds to, or that it corresponds to no subject.
  • The techniques applied to this estimation need not be limited to specific ones. For example, techniques based on deep learning, such as Mask R-CNN and GAN, may be applied. Alternatively, the subject area of each subject may be specified manually.
  • the area information acquisition unit 402 records the generated area information data of each subject area in the image information storage unit 302 as image information associated with the identification information of the target image and the identification information of each subject.
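In the simplest representation, the per-pixel estimation above yields a label map from which each subject area can be read off as a mask. A minimal sketch (the label-map representation and the function name are illustrative assumptions):

```python
import numpy as np

def extract_subject_areas(label_map):
    """Given a per-pixel label map (0 = no subject, k > 0 = subject k),
    return a dict mapping each subject id to its boolean area mask."""
    areas = {}
    for k in np.unique(label_map):
        if k == 0:
            continue  # pixels belonging to no subject
        areas[int(k)] = (label_map == k)
    return areas

# Toy 4x4 label map with two subjects.
labels = np.array([
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [2, 0, 0, 0],
    [2, 2, 0, 0],
])
areas = extract_subject_areas(labels)  # masks for subjects 1 and 2
```

Each mask here plays the role of the area information recorded per subject identifier.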
  • the classification unit 403 classifies each subject area for each subject of the same type.
  • The subject areas to be classified are not limited to those obtained from one target image (the same target image); the subjects of a plurality of subject areas obtained from each of a plurality of target images may be classified together.
  • For example, when n target images, from each of which m subject areas have been acquired, are used (m and n are both integers of 1 or more), m × n subject areas may be the target of classification.
  • The classification unit 403 classifies, into the same group, subject areas whose subjects have the same or similar appearance.
  • the technique applied to the classification unit 403 need not be limited to a specific one.
  • the classification unit 403 may classify the subject areas of the same category into the same group.
  • the classification unit 403 may calculate the similarity between subject areas of the same category based on the feature amount, and classify the subject areas having high similarity into the same group. By performing the processing in this way, a more subdivided classification can be realized.
  • the classification unit 403 may determine which reference image is most similar to the subject for each subject area obtained from the target image, and may generate a group for each reference image. Further, each subject area may be manually classified.
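The similarity-based grouping described above can be sketched as a greedy assignment over feature vectors (the cosine-similarity measure, the threshold, and the greedy strategy are illustrative assumptions; the patent does not fix a specific algorithm):

```python
import numpy as np

def group_by_similarity(features, threshold=0.9):
    """Greedy grouping: assign each subject-area feature vector to the
    first group whose representative has cosine similarity >= threshold;
    otherwise start a new group."""
    reps, groups = [], []
    for idx, f in enumerate(features):
        f = f / np.linalg.norm(f)  # normalize so dot product = cosine
        for g, r in enumerate(reps):
            if float(f @ r) >= threshold:
                groups[g].append(idx)
                break
        else:
            reps.append(f)
            groups.append([idx])
    return groups

# Two heart-like and two star-like feature vectors (toy 2-D features).
feats = [np.array([1.0, 0.0]), np.array([0.99, 0.1]),
         np.array([0.0, 1.0]), np.array([0.05, 1.0])]
groups = group_by_similarity(feats, threshold=0.9)  # -> [[0, 1], [2, 3]]
```

In practice the feature vectors would come from, for example, a deep feature extractor applied to each subject area.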
  • the three-dimensional information acquisition unit 404 generates a 3D model of the subject of each group based on the information obtained from a plurality of subject areas belonging to each group.
  • the 3D model may be represented by, for example, a three-dimensional point group, a polygon, or another model. Further, the 3D model may be stored in the storage unit 30 in advance as known information.
  • the technique applied to the three-dimensional information acquisition unit 404 does not have to be limited to a specific one.
  • The three-dimensional information acquisition unit 404 may handle the images of the subject areas belonging to one group as a plurality of images of the same individual taken at different positions and in different postures.
  • the three-dimensional information acquisition unit 404 may generate a 3D model by executing Structure from Motion (SfM) using the images of the plurality of subject areas described above.
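The core geometric step that SfM performs on the images of one group — recovering a 3D point from its projections in multiple views — can be sketched with linear (DLT) triangulation. This is a minimal two-view illustration with synthetic projection matrices; a full SfM pipeline would also estimate the camera poses themselves:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from its pixel
    projections x1, x2 in two views with 3x4 projection matrices
    P1, P2. The null vector of A gives the homogeneous 3D point."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

# Two synthetic cameras observing the point (0, 0, 5).
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])              # at origin
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0], [0]])])  # shifted
X_true = np.array([0.0, 0.0, 5.0])
x1 = P1 @ np.append(X_true, 1.0); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1.0); x2 = x2[:2] / x2[2]
X = triangulate(P1, P2, x1, x2)  # recovers (0, 0, 5)
```

Repeating this over many matched points across the subject areas of a group yields the point cloud of the 3D model.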
  • the three-dimensional information acquisition unit 404 records the generated 3D model data in the image information storage unit 302 as image information associated with the identification information of the target image and the identification information of the subject indicated by the 3D model.
  • the state parameter acquisition unit 405 generates information (hereinafter referred to as "state parameter") indicating the positional relationship with the camera, the posture, and the like for the subject in each subject area.
  • the technique applied to the state parameter acquisition unit 405 does not have to be limited to a specific one.
  • the state parameter acquisition unit 405 may acquire the state parameter for each subject area by using SfM.
  • Three-dimensional world coordinates are given to the 3D model.
  • the coordinates of each point are represented by world coordinates.
  • the 3D model of the subject is represented by polygons
  • each point forming the polygon is represented by world coordinates.
  • Using the camera's internal parameters for converting world coordinates to image coordinates (for example, focal length, optical center, and distortion coefficients), a coordinate transformation matrix representing the position and orientation of the camera is estimated as a state parameter.
  • The conversion from world coordinates to camera coordinates is expressed as in Equation 1 below, where R is a coordinate transformation matrix:

    $$\begin{pmatrix} x_c \\ y_c \\ z_c \\ 1 \end{pmatrix} = R \begin{pmatrix} x_w \\ y_w \\ z_w \\ 1 \end{pmatrix} \qquad \text{(Equation 1)}$$

  • R is expressed as in Equation 2 below, where $R_{11}$ to $R_{33}$ are values corresponding to the rotation matrix and $(t_1, t_2, t_3)$ is the translation:

    $$R = \begin{pmatrix} R_{11} & R_{12} & R_{13} & t_1 \\ R_{21} & R_{22} & R_{23} & t_2 \\ R_{31} & R_{32} & R_{33} & t_3 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad \text{(Equation 2)}$$

  • $R_{11}$ to $R_{33}$ can also be expressed as in Equation 3 below by interpreting them as rotations about each coordinate axis in the order of, for example, the y-axis, z-axis, and x-axis:

    $$\begin{pmatrix} R_{11} & R_{12} & R_{13} \\ R_{21} & R_{22} & R_{23} \\ R_{31} & R_{32} & R_{33} \end{pmatrix} = R_x(\theta_x)\, R_z(\theta_z)\, R_y(\theta_y) \qquad \text{(Equation 3)}$$

  • The coordinates of the camera coordinate system can be converted to the coordinates $(i, j)$ of the image coordinate system by projection as shown below, where $f$ and $(c_x, c_y)$ are the focal length and optical center of the camera, respectively:

    $$i = f\,\frac{x_c}{z_c} + c_x, \qquad j = f\,\frac{y_c}{z_c} + c_y$$
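The world-to-image mapping under this pinhole model can be sketched as follows (the y-, z-, then x-axis rotation order follows the text, but the exact sign and composition conventions of the patent's equations are assumptions):

```python
import numpy as np

def rotation_yzx(ry, rz, rx):
    """Rotation composed about the y-, z-, then x-axis (one reading of
    the axis order described in the text; the convention is assumed)."""
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    cx, sx = np.cos(rx), np.sin(rx)
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    return Rx @ Rz @ Ry  # y applied first, then z, then x

def project(X_world, R, t, f, cx, cy):
    """World -> camera -> image coordinates (i, j), pinhole model with
    focal length f and optical center (cx, cy)."""
    Xc = R @ X_world + t
    i = f * Xc[0] / Xc[2] + cx
    j = f * Xc[1] / Xc[2] + cy
    return i, j

R = rotation_yzx(0.0, 0.0, 0.0)  # identity orientation
i, j = project(np.array([0.0, 0.0, 2.0]), R, np.zeros(3),
               500.0, 320.0, 240.0)  # point on the optical axis
```

A point on the optical axis lands at the optical center (320, 240), as expected.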
  • FIG. 5 is a diagram showing specific examples of positions and postures of each subject in a three-dimensional space. Each subject shown in the image of FIG. 3 is arranged at each position in a three-dimensional space in each posture. The position and state of each subject are represented by state parameters.
  • the processing of the state parameter acquisition unit 405 may be provided with a constraint condition that the subjects in each subject area do not overlap three-dimensionally (they do not overlap in the same space). By providing such a constraint condition, it is possible to improve the acquisition accuracy of the state parameter.
  • the state parameter acquisition unit 405 records the generated state parameter data in the image information storage unit 302 as image information associated with the identification information of the target image and the identification information of the subject indicated by the state parameter.
  • the additional information acquisition unit 406 acquires additional information for each subject area.
  • One example of the additional information is information about the relative three-dimensional position with respect to the subjects in the other subject areas of the same group.
  • the image of each subject area can be considered to match the appearance of the subject represented by the 3D model when viewed from a specific position and posture. Therefore, if a reference position that serves as a reference for the viewpoint of the 3D model is arbitrarily specified, the position relative to the reference position can be calculated. This calculation may be performed using, for example, the coordinate transformation matrix R in each subject area.
  • the relative position of the subject in each subject region can be represented by the world coordinate system.
  • the coordinates of the 3D model obtained by performing such coordinate conversion have the same positional relationship as in real space. This makes it possible to represent the relative position between subjects in each subject area.
  • Another example of the additional information is information on how the surface of the subject appears in each subject area. For example, it may be information about the texture of the surface of the subject (for example, its color, shape, and material) or information about the reflectance of light on the surface of the subject.
  • Still another example is information about the light source that illuminates the subject in each subject area (for example, the color of the light, its intensity, the position of the light source, and the number of light sources).
  • Such information may be obtained, for example, by using a trained model by deep learning or machine learning regarding a light source.
  • Another example of the additional information is information on whether or not the subject in each subject area is in contact with another subject. Such information may be acquired, using the 3D model and the state parameters, based on whether or not the distance between the closest portions of the surfaces of the subjects is smaller than a predetermined value.
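This contact test can be sketched over sampled surface points of two subjects (a brute-force nearest-distance check; the sampling and the function name are illustrative):

```python
import numpy as np

def in_contact(points_a, points_b, eps):
    """Judge contact between two subjects by whether the distance
    between the closest points of their (sampled) surfaces is below
    the predetermined value eps."""
    # Pairwise distances between the two point sets, shape (|a|, |b|).
    d = np.linalg.norm(points_a[:, None, :] - points_b[None, :, :], axis=2)
    return bool(d.min() < eps)

a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
b = np.array([[1.05, 0.0, 0.0], [3.0, 0.0, 0.0]])
touching = in_contact(a, b, eps=0.1)   # closest gap is 0.05 -> True
apart = in_contact(a, b, eps=0.01)     # -> False
```

For dense models, a spatial index (k-d tree) would replace the brute-force pairwise distances.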
  • the additional information acquisition unit 406 records the generated additional information data in the image information storage unit 302 as image information associated with the identification information of the target image and the identification information of the subject indicated by the additional information.
  • When the image generation unit 407 is given information on the camera parameters required for generating an image, it generates an image according to those camera parameters.
  • The information regarding the camera parameters may be given by the user operating an input device (keyboard, pointing device, touch panel, etc.) included in the image information acquisition device 100, or by receiving information transmitted from another information processing device.
  • When target image identification information is given together with the camera parameters, a new image is generated, based on the given camera parameters, for each subject shown in the identified target image.
  • The information regarding the camera parameters may be, for example, information indicating a focal length, an optical center, and distortion coefficients.
  • the information regarding the camera parameters may be, for example, information indicating the position of the viewpoint, the direction of the line of sight, and the size of the screen (image).
  • the information regarding the camera parameters may be any information as long as it is possible to generate an image.
  • When the target image identification information is given, the image is generated based on the image information associated with that target image.
  • The image generation unit 407 may generate an image by further performing coordinate conversion, using, for example, a transformation matrix representing a change in viewpoint, on the 3D model whose coordinates have been converted so as to represent the position of each subject. For example, an image similar to the target image may be generated based on the same viewpoint as the target image, as shown in FIG. 3, or a new image may be generated based on a viewpoint different from that of the target image.
  • FIG. 6 is a diagram showing an outline of image generation from a viewpoint different from the target image.
  • The viewpoint 86_1 is the viewpoint of the target image of FIG. 3.
  • In the target image, a value related to the subject 82 is given to the pixel at the coordinates (i_1, j_1).
  • the viewpoint 86_2 is a new viewpoint different from the target image.
  • From the viewpoint 86_2, the value for the same portion of the subject 82 falls at the pixel of the coordinates (i_2, j_2).
  • a new image is generated by performing processing such as coordinate conversion on each subject and each pixel.
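The per-pixel coordinate conversion between viewpoints can be sketched as back-projection, pose change, and re-projection (the depth and pose values here are synthetic, and the function name is illustrative):

```python
import numpy as np

def reproject_pixel(i1, j1, depth, f, cx, cy, R12, t12):
    """Carry one pixel from viewpoint 1 to viewpoint 2: back-project
    (i1, j1) with its depth into 3D, apply the relative pose
    X2 = R12 @ X1 + t12, and project again with the same intrinsics."""
    X1 = np.array([(i1 - cx) * depth / f, (j1 - cy) * depth / f, depth])
    X2 = R12 @ X1 + t12
    i2 = f * X2[0] / X2[2] + cx
    j2 = f * X2[1] / X2[2] + cy
    return i2, j2

# Pure sideways camera shift of 0.1 units, point at depth 2.
i2, j2 = reproject_pixel(320.0, 240.0, 2.0, 500.0, 320.0, 240.0,
                         np.eye(3), np.array([0.1, 0.0, 0.0]))
```

Applying this conversion to every subject and every pixel, with the depth taken from the 3D model and state parameters, yields the new image for viewpoint 86_2.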
  • FIG. 7 is a diagram showing a specific example of a newly generated image with respect to the new viewpoint 86_2.
  • FIG. 8 is a diagram showing a specific example of processing of the image information acquisition device 100.
  • the input / output control unit 401 inputs the target image to be processed and records it in the image storage unit 301 (step S101).
  • the area information acquisition unit 402 acquires area information indicating the subject area in the target image for each subject and records it in the image information storage unit 302 (step S102).
  • the classification unit 403 classifies each subject (step S103).
  • the three-dimensional information acquisition unit 404 generates a 3D model of the subject by using the information of a plurality of subjects classified into the same group, and records the generated 3D model data in the image information storage unit 302. (Step S104).
  • the state parameter acquisition unit 405 generates a state parameter for each subject and records the state parameter in the image information storage unit 302 (step S105).
  • the additional information acquisition unit 406 acquires additional information for each subject and records it in the image information storage unit 302 (step S106).
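The flow of steps S101 to S106 can be sketched as a pipeline with the per-step processing injected as callables (all names and the stand-in callables here are illustrative, not from the patent):

```python
def process_target_images(images,
                          acquire_area_info, classify, build_3d_model,
                          acquire_state_params, acquire_additional_info):
    """Sketch of steps S101-S106: record input, acquire area info,
    classify, build 3D models per group, then state parameters and
    additional information."""
    store = {"images": images}                                   # S101
    store["areas"] = [acquire_area_info(im) for im in images]    # S102
    store["groups"] = classify(store["areas"])                   # S103
    store["models"] = {g: build_3d_model(members)                # S104
                       for g, members in store["groups"].items()}
    store["state"] = acquire_state_params(store)                 # S105
    store["extra"] = acquire_additional_info(store)              # S106
    return store

# Trivial stand-in callables just to exercise the flow.
result = process_target_images(
    ["img0"],
    acquire_area_info=lambda im: [f"{im}:area0"],
    classify=lambda areas: {"group0": areas},
    build_3d_model=lambda members: {"points": len(members)},
    acquire_state_params=lambda s: "params",
    acquire_additional_info=lambda s: "extra",
)
```

In the actual device, the `store` dictionary corresponds to the image storage unit 301 and the image information storage unit 302.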
  • FIG. 9 is a diagram showing a specific example of the hardware configuration of the image information acquisition device 100.
  • the image information acquisition device 100 includes, for example, an input / output device 1, an auxiliary storage device 2, a memory 3, and a processor 4 as shown in FIG.
  • the input / output device 1 inputs / outputs information (including data) to and from the outside (including the user) in the image information acquisition device 100.
  • the input / output device 1 functions as, for example, an image input unit 10 or an output unit 20.
  • the auxiliary storage device 2 is configured by using a magnetic hard disk device or a semiconductor storage device.
  • the auxiliary storage device 2 functions as, for example, a storage unit 30.
  • the memory 3 and the processor 4 function as, for example, a control unit 40.
  • In the image information acquisition device 100 configured in this way, subjects in the image to be processed that are actually separate objects may be classified as the same type by the classification unit 403, and three-dimensional information (for example, a 3D model) is then acquired using the images of those subjects. Therefore, when multiple subjects of the same type appear in one image, it is possible to acquire more accurate three-dimensional information even from only a small number of images of each individual subject.
  • the image information acquisition device 100 may be configured not to include the image generation unit 407.
  • the image generation unit 407 may be mounted on another information processing device.
  • the image information storage unit 302 may be further mounted on the information processing apparatus on which the image generation unit 407 is mounted. With such a configuration, it becomes possible to easily generate an image of a subject in another information processing apparatus based on the image information acquired in the image information acquisition apparatus 100.
  • the image information acquisition device 100 may be mounted separately in a plurality of devices.
  • the image information acquisition device 100 may be implemented as an image information acquisition system including a plurality of devices.
  • the information processing device having the control unit 40 and the information processing device having the storage unit 30 may be mounted as different devices, or the functions of the storage unit 30 may be duplicated and mounted on a plurality of information processing devices.
  • the function of the control unit 40 may be implemented separately in a plurality of information processing devices.
  • the present invention is applicable to an apparatus for acquiring image information.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

An image information acquisition device comprising: a classification unit for classifying, from relevant images which are the images to be processed, the subjects seen in the relevant images into a subject of the same kind; and a three-dimensional information acquisition unit for acquiring information that indicates the three-dimensional shapes of the subjects, on the basis of a plurality of images of subjects classified into the subject of the same kind by the classification unit.

Description

Image information acquisition device, image information acquisition method, and computer program
 The present invention relates to an image information acquisition device, an image information acquisition method, and a computer program.
 In recent years, there has been demand for techniques for generating an image that reproduces the appearance of a subject from an arbitrary viewpoint. As one such means, there is a method of acquiring and recording information on the three-dimensional appearance of a space. Specific examples of such means include shooting stereo images using multiple cameras, shooting depth images using a depth sensor, and acquiring a three-dimensional point cloud using LiDAR (Light Detection and Ranging). There are also techniques that use images or video whose pixel values are colors (luminance) captured by a single camera. Specific examples of such techniques include Structure from Motion (SfM) (Non-Patent Document 1) and Simultaneous Localization and Mapping (SLAM) (Non-Patent Document 2). In these techniques, three-dimensional information is acquired by using a large number of images of the same subject.
 However, the conventional techniques require a large number of images, and the accuracy of the obtained three-dimensional information decreases when the number of images is small. Therefore, when obtaining three-dimensional information from a moving image, a long video is also required.
 In view of the above circumstances, an object of the present invention is to provide a technique capable of acquiring more accurate three-dimensional information by using fewer images.
 One aspect of the present invention is an image information acquisition device including: a classification unit that classifies, from a target image that is the image to be processed, subjects appearing in the target image into subjects of the same type; and a three-dimensional information acquisition unit that acquires information indicating the three-dimensional shape of the subject based on a plurality of images of subjects classified as the same type by the classification unit.
 One aspect of the present invention is an image information acquisition method including: a classification step of classifying, from a target image that is the image to be processed, subjects appearing in the target image into subjects of the same type; and a three-dimensional information acquisition step of acquiring information indicating the three-dimensional shape of the subject based on a plurality of images of subjects classified as the same type in the classification step.
 One aspect of the present invention is a computer program for causing a computer to function as the above-described image information acquisition device.
 According to the present invention, it is possible to acquire more accurate three-dimensional information by using fewer images.
 FIG. 1 is a diagram showing a functional configuration example of the image information acquisition device 100 of the present invention. FIG. 2 is a diagram showing a specific example of the image information table stored in the image information storage unit 302. FIG. 3 is a diagram showing a specific example of the target image. FIG. 4 is a diagram showing a specific example of the subject area. FIG. 5 is a diagram showing specific examples of the position and posture of each subject in three-dimensional space. FIG. 6 is a diagram showing an outline of image generation from a viewpoint different from that of the target image. FIG. 7 is a diagram showing a specific example of an image newly generated for the new viewpoint 86_2. FIG. 8 is a diagram showing a specific example of the processing of the image information acquisition device 100. FIG. 9 is a diagram showing a specific example of the hardware configuration of the image information acquisition device 100.
 Embodiments of the present invention will be described in detail with reference to the drawings.
 FIG. 1 is a diagram showing a functional configuration example of the image information acquisition device 100 of the present invention. The image information acquisition device 100 is configured using an information device such as a personal computer, a server device, a game console, a smartphone, or an image pickup device. The image information acquisition device 100 includes an image input unit 10, an output unit 20, a storage unit 30, and a control unit 40.
 The image input unit 10 receives image data input to the image information acquisition device 100. The input image data may be still image data or moving image data. The image input unit 10 may read image data recorded on a recording medium such as a CD-ROM or a USB (Universal Serial Bus) memory. The image input unit 10 may also receive images captured by a still camera or a video camera from the camera. When the image information acquisition device 100 is built into a still camera, a video camera, or an information processing device equipped with a camera, the image input unit 10 may receive the captured image, or the image before capture, over a bus. The image input unit 10 may also receive image data from another information processing device via a network. The image input unit 10 may be configured in yet other ways as long as it can receive input of image data.
 The output unit 20 outputs the image information and image data generated by the control unit 40. The output unit 20 may write image information and image data to a recording medium such as a CD-ROM or a USB (Universal Serial Bus) memory. When the image information acquisition device 100 is built into a still camera, a video camera, or an information processing device equipped with a camera, the output unit 20 may record the generated image information and image data on a recording medium provided in such a device, or display them as a preview image on a display device provided in such a device. The output unit 20 may also transmit image information and image data to another information processing device via a network. The output unit 20 may be configured in yet other ways as long as it can output image information and image data.
 The storage unit 30 is configured using a storage device such as a magnetic hard disk device or a semiconductor storage device. The storage unit 30 functions, for example, as an image storage unit 301 and an image information storage unit 302. The image storage unit 301 stores the image data input by the image input unit 10, and may store still image data or moving image data. The image information storage unit 302 stores the image information generated by the control unit 40.
 FIG. 2 is a diagram showing a specific example of the image information table stored in the image information storage unit 302. The image information table has a record for each combination of an image to be processed (hereinafter "target image") and a subject in that target image. Each record associates, for example, identification information indicating the target image (hereinafter "target image identification information"), identification information indicating the subject (hereinafter "subject identification information"), and image information. The image information is information about the image of the subject in the corresponding target image, and includes, for example, area information indicating the subject area of the subject, information indicating the three-dimensional shape of the subject (hereinafter "3D model"), and state parameters indicating the position and posture of the subject.
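As an illustrative sketch only (not part of the claimed configuration), one record of the image information table described above could be modeled as follows; all class and field names are assumptions made for this example.

```python
from dataclasses import dataclass, field

@dataclass
class ImageInfoRecord:
    """One row of the image information table: one record per
    (target image, subject) combination, as described above."""
    target_image_id: str          # target image identification information
    subject_id: str               # subject identification information
    region: list = field(default_factory=list)        # area information (e.g. mask/polygon)
    model_3d: object = None                           # 3D model (point cloud, polygons, ...)
    state_params: dict = field(default_factory=dict)  # position / posture parameters

# The table holds one record per (target image, subject) pair.
table = {}
rec = ImageInfoRecord(target_image_id="img001", subject_id="subj81")
table[(rec.target_image_id, rec.subject_id)] = rec
```

A lookup keyed by the (target image, subject) pair mirrors the table structure of FIG. 2.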
 The control unit 40 is configured using a processor such as a CPU (Central Processing Unit) and a memory. When the processor executes a program, the control unit 40 functions as an input/output control unit 401, an area information acquisition unit 402, a classification unit 403, a three-dimensional information acquisition unit 404, a state parameter acquisition unit 405, an additional information acquisition unit 406, and an image generation unit 407. All or part of the functions of the control unit 40 may be realized using hardware such as an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), or an FPGA (Field Programmable Gate Array). The above program may be recorded on a computer-readable recording medium, for example a portable medium such as a flexible disk, a magneto-optical disk, a ROM, a CD-ROM, or a semiconductor storage device (for example, an SSD: Solid State Drive), or a storage device built into a computer system such as a hard disk or a semiconductor storage device. The above program may also be transmitted via a telecommunication line.
 The input/output control unit 401 controls the input and output of data. For example, the input/output control unit 401 acquires image data by controlling the operation of the image input unit 10, records the input image data in the image storage unit 301, and may temporarily record the input image data in a storage device such as a memory as necessary. The input/output control unit 401 outputs the image information recorded in the image information storage unit 302 and the image data generated by the image generation unit 407 to external devices by controlling the output unit 20.
 The area information acquisition unit 402 acquires, for each subject, information (hereinafter "area information") indicating the area of that subject in the target image (hereinafter "subject area"). The target image may be an image stored as a still image in the image storage unit 301, or a frame of a moving image stored there as a moving image. The target image may be a single still image or frame, a plurality of still images or frames, or a combination of still images and frames. When a plurality of frames are used as target images, the frames may be acquired from a single moving image. In that case, the time interval between frames may be set to a predetermined threshold or more so that frames from different viewpoints are obtained, and the area information acquisition unit 402 may decide, based on a predetermined criterion, from which frames area information is acquired.
 In any case, when the images are used in the processing of the three-dimensional information acquisition unit 404 and the state parameter acquisition unit 405, they are desirably still images or moving-image frames obtained by photographing the same subject or subjects of the same type. When they are used in the processing of the state parameter acquisition unit 405, the position of each subject in three-dimensional space may be the same or different across the still images or frames. The subject area is the area enclosed by the outline of the subject.
 FIG. 3 is a diagram showing a specific example of a target image. A plurality of subjects appear in the target image of FIG. 3. The subjects 81 and 86 are heart-shaped objects; they are objects of the same type or have similar shapes. The subjects 82, 83, 84, and 85 are star-shaped objects; they are likewise objects of the same type or have similar shapes. Each of the subjects 81 to 86 appears at its own position, tilted at its own angle.
 FIG. 4 is a diagram showing a specific example of subject areas. Each shape drawn with a different pattern in FIG. 4 indicates a subject area. The subject areas 91 to 96 correspond to the subjects 81 to 86, respectively.
 A specific example of the processing of the area information acquisition unit 402 is as follows. The area information acquisition unit 402 may estimate, for each pixel in the target image, which subject it corresponds to, or whether it corresponds to no subject at all. The technique applied to this estimation need not be limited to any particular one; for example, techniques based on deep learning such as Mask-RCNN or GANs may be applied, or the subject area of each subject may be specified manually. The area information acquisition unit 402 records the area information of each generated subject area in the image information storage unit 302 as image information associated with the identification information of the target image and the identification information of each subject.
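As a hedged illustration of this per-pixel estimation step, the following sketch converts a per-pixel subject label map (such as one a Mask-RCNN-style model might output) into per-subject area information; the function name and data layout are assumptions of this example, not part of the described device.

```python
import numpy as np

def regions_from_label_map(label_map):
    """Given a per-pixel label map (0 = no subject, k > 0 = subject k),
    return a bounding box and pixel mask for each subject area."""
    regions = {}
    for k in np.unique(label_map):
        if k == 0:
            continue  # pixel belongs to no subject
        mask = label_map == k
        ys, xs = np.nonzero(mask)
        regions[int(k)] = {
            "bbox": (int(ys.min()), int(xs.min()), int(ys.max()), int(xs.max())),
            "mask": mask,
        }
    return regions

# Tiny synthetic label map with two subjects.
label_map = np.zeros((4, 6), dtype=int)
label_map[1:3, 1:3] = 1   # subject 1
label_map[0:2, 4:6] = 2   # subject 2
regions = regions_from_label_map(label_map)
```

The resulting dictionary plays the role of the area information recorded per subject.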
 The classification unit 403 classifies the subject areas by subject type. The subject areas to be classified are not limited to those of a single target image; subjects in subject areas obtained from a plurality of target images may be classified together. For example, when n target images are used and m subject areas are acquired from each (m and n both being integers of 1 or more), m × n subject areas may be the target of classification.
 The classification unit 403 classifies into the same group, for example, the subject areas of subjects whose appearance is identical or more similar than a predetermined criterion. The technique applied in the classification unit 403 need not be limited to any particular one. For example, when the deep learning used in the area information acquisition unit 402 can estimate the category of a subject, the classification unit 403 may classify subject areas of the same category into the same group. The classification unit 403 may also calculate, among subject areas of the same category, a similarity based on their feature amounts and classify highly similar subject areas into the same group; processing in this way realizes a finer-grained classification. When the candidate subjects are known, images in which the subjects were photographed (hereinafter "reference images") may be used in addition to the target image. In this case, the classification unit 403 may determine, for each subject area obtained from the target image, which reference image's subject it most resembles, and generate one group per reference image. The subject areas may also be classified manually.
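The feature-based grouping described here could be sketched, under the assumption of precomputed feature vectors, as a simple greedy clustering by cosine similarity; all names and the threshold value are purely illustrative.

```python
import numpy as np

def group_by_similarity(features, threshold=0.9):
    """Greedily group subject regions whose normalized feature vectors
    have cosine similarity >= threshold with a group representative.
    Feature extraction itself is assumed to happen elsewhere."""
    groups = []   # each group: list of region indices
    reps = []     # representative (first member's) feature per group
    for idx, f in enumerate(features):
        f = f / np.linalg.norm(f)
        for g, r in zip(groups, reps):
            if float(f @ r) >= threshold:
                g.append(idx)  # similar enough: same subject type
                break
        else:
            groups.append([idx])  # start a new group
            reps.append(f)
    return groups

feats = [np.array([1.0, 0.0]), np.array([0.98, 0.1]), np.array([0.0, 1.0])]
groups = group_by_similarity(feats, threshold=0.9)
```

Here the first two regions end up in one group and the third in another, mirroring the heart-shaped vs. star-shaped grouping of FIG. 3.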
 The three-dimensional information acquisition unit 404 generates a 3D model of the subject of each group based on information obtained from the plurality of subject areas belonging to that group. The 3D model may be represented, for example, by a three-dimensional point cloud, by polygons, or by another model, and may also be stored in advance in the storage unit 30 as known information. The technique applied in the three-dimensional information acquisition unit 404 need not be limited to any particular one. For example, the three-dimensional information acquisition unit 404 may treat the images of the subject areas as a plurality of images of the same individual photographed from different positions and in different postures, and may generate a 3D model by executing Structure from Motion (SfM) using those images. The three-dimensional information acquisition unit 404 records the generated 3D model in the image information storage unit 302 as image information associated with the identification information of the target image and the identification information of the subject that the 3D model represents.
 The state parameter acquisition unit 405 generates, for the subject of each subject area, information representing its positional relationship with the camera, its posture, and the like (hereinafter "state parameters"). The technique applied in the state parameter acquisition unit 405 need not be limited to any particular one; for example, the state parameter acquisition unit 405 may acquire the state parameters for each subject area by using SfM.
 A specific example of the processing of the state parameter acquisition unit 405 is as follows. Three-dimensional world coordinates are given to the 3D model: when the 3D model of a subject is represented by a three-dimensional point cloud, the coordinates of each point are expressed in world coordinates, and when it is represented by polygons, each point forming the polygons is expressed in world coordinates. The internal parameters of the camera for converting world coordinates into image coordinates (for example, the focal length, optical center, and distortion coefficients) are estimated. Furthermore, for each subject area, a coordinate transformation matrix representing the position and orientation of the camera such that the appearance of the 3D model matches the appearance of the subject in that subject area is estimated as a state parameter.
 A concrete example of expressing the coordinate transformation with the estimated state parameters is as follows. First, consider the transformation from the world coordinate system to the camera coordinate system corresponding to a certain subject area. The world-coordinate-system coordinates (X, Y, Z) can be converted into the camera-coordinate-system coordinates (X', Y', Z') by Equation 1 below.
  (X', Y', Z')^T = R (X, Y, Z, 1)^T    (Equation 1)
 In Equation 1, R is a coordinate transformation matrix, expressed as in Equation 2 below.
      | R11 R12 R13 tx |
  R = | R21 R22 R23 ty |    (Equation 2)
      | R31 R32 R33 tz |
 Of the components of R, (tx, ty, tz) represents a translation, and R11 to R33 are the values corresponding to a rotation matrix. By interpreting them as rotations performed about each coordinate axis in the order of the y-axis, z-axis, and x-axis, R11 to R33 can also be expressed as in Equation 3 below. In addition, camera-coordinate-system coordinates can be converted into image-coordinate-system coordinates (i, j) by the projection transformation of Equation 4 below, where f is the focal length of the camera and (cx, cy) is its optical center.
  | R11 R12 R13 |
  | R21 R22 R23 | = Rx(θx) Rz(θz) Ry(θy)    (Equation 3)
  | R31 R32 R33 |
  i = f · X'/Z' + cx,   j = f · Y'/Z' + cy    (Equation 4)
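Equations 1, 2, and 4 together map a world-coordinate point to image coordinates. A minimal numerical sketch, with all matrix and parameter values chosen purely for illustration, is:

```python
import numpy as np

def project_point(Xw, R, f, cx, cy):
    """Project a world-coordinate point to image coordinates:
    camera coords (X', Y', Z') = R @ (X, Y, Z, 1)   (Equations 1-2),
    then i = f*X'/Z' + cx, j = f*Y'/Z' + cy         (Equation 4)."""
    Xc = R @ np.append(Xw, 1.0)   # 3x4 extrinsic matrix times homogeneous point
    Xp, Yp, Zp = Xc
    return f * Xp / Zp + cx, f * Yp / Zp + cy

# Identity rotation; tz = 2 places the scene 2 units in front of the camera.
R = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 2.0]])
i, j = project_point(np.array([1.0, 0.5, 0.0]), R, f=100.0, cx=320.0, cy=240.0)
```

With these illustrative values the point projects to (i, j) = (370, 265).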
 R itself, or the components of R, obtained as described above are acquired as state parameters. FIG. 5 is a diagram showing specific examples of the position and posture of each subject in three-dimensional space. The subjects appearing in the image of FIG. 3 are arranged at their respective positions, in their respective postures, in three-dimensional space. The position and posture of each subject are represented by the state parameters.
 The processing of the state parameter acquisition unit 405 may be given the constraint that the subjects of the subject areas do not overlap three-dimensionally (they do not occupy the same space). Imposing such a constraint makes it possible to improve the accuracy with which the state parameters are acquired. The state parameter acquisition unit 405 records the generated state parameters in the image information storage unit 302 as image information associated with the identification information of the target image and the identification information of the subject that the state parameters describe.
 The additional information acquisition unit 406 acquires additional information for each subject area. One specific example of additional information is information on the relative three-dimensional position with respect to the subjects of the other subject areas of the same group. The image of each subject area can be regarded as matching the appearance of the subject represented by the 3D model when viewed from a specific position and posture. Therefore, if a reference position serving as the basis of the 3D model's viewpoint is specified arbitrarily, the position relative to that reference position can be calculated, for example by using the coordinate transformation matrix R of each subject area. By performing a coordinate transformation using the inverse of the matrix R obtained for each subject area, the relative position of the subject of each subject area can be expressed in the world coordinate system. The 3D model coordinates obtained by such transformations have the same positional relationships with one another as in real space, so the relative positions between the subjects of the subject areas can be represented.
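The inverse transformation described here can be sketched as follows; the 3×4 matrix R is promoted to a 4×4 homogeneous matrix before inversion, and the numeric values are illustrative only.

```python
import numpy as np

def camera_to_world(Xc, R):
    """Map a camera-coordinate point back to world coordinates by
    inverting the 3x4 extrinsic matrix R (rotation + translation),
    as used above to place each region's subject in a shared world frame."""
    R4 = np.vstack([R, [0.0, 0.0, 0.0, 1.0]])      # promote to 4x4
    Xw = np.linalg.inv(R4) @ np.append(Xc, 1.0)    # apply the inverse transform
    return Xw[:3]

# Pure translation for illustration: the camera frame is offset by (1, 2, 3).
R = np.array([[1.0, 0.0, 0.0, 1.0],
              [0.0, 1.0, 0.0, 2.0],
              [0.0, 0.0, 1.0, 3.0]])
Xw = camera_to_world(np.array([1.0, 2.0, 3.0]), R)
```

Applying each subject area's inverse transform in this way expresses every subject in the common world coordinate system.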
 Another specific example of additional information is information on how the surface of the subject appears in each subject area, for example information on the texture of the subject's surface (such as its color, shape, and material) or on the light reflectance of the surface. Yet another example is information on the light sources illuminating the subject of each subject area (for example, the color and intensity of the light, the positions of the light sources, and their number). Such information may be acquired, for example, by using a model trained by deep learning or machine learning with respect to light sources.
 Another specific example of additional information is information on whether the subject of each subject area is in contact with another subject. Such information may be acquired, by using the 3D models and the state parameters, based on whether the distance between the closest portions of the subjects' surfaces is smaller than a predetermined value. The additional information acquisition unit 406 records the generated additional information in the image information storage unit 302 as image information associated with the identification information of the target image and the identification information of the subject that the additional information describes.
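The contact determination described here could be sketched as a minimum-distance test between two point sets; the threshold and point coordinates below are illustrative assumptions.

```python
import numpy as np

def in_contact(points_a, points_b, eps=0.05):
    """Treat two subjects as touching when the smallest distance between
    their (state-parameter-transformed) surface points is below eps."""
    # Pairwise distances via broadcasting: shape (len(a), len(b)).
    d = np.linalg.norm(points_a[:, None, :] - points_b[None, :, :], axis=2)
    return bool(d.min() < eps)

a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
b = np.array([[1.02, 0.0, 0.0], [5.0, 5.0, 5.0]])
touching = in_contact(a, b)   # closest pair is 0.02 apart, below eps
```

For large point clouds a spatial index (e.g. a k-d tree) would replace the brute-force distance matrix, but the decision rule is the same.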
 When given information on the camera parameters needed to generate an image, the image generation unit 407 generates an image according to those camera parameters. The camera parameter information may be given by a user operating an input device of the image information acquisition device 100 (such as a keyboard, pointing device, or touch panel), or by receiving information transmitted from another information processing device.
 For example, when target image identification information is given together with the camera parameters, a new image is generated, based on the given camera parameters, for each subject appearing in that target image. The camera parameter information may be, for example, information indicating a focal length, optical center, and distortion coefficients, or information indicating the position of the viewpoint, the direction of the line of sight, and the size of the screen (image); it may be any information that makes it possible to generate an image. When target image identification information is given, the image is generated based on the image information associated with that target image.
 The image generation unit 407 may generate the image by, for example, taking the 3D models whose coordinates have been transformed so as to represent the position of each subject and applying a further coordinate transformation with a transformation matrix representing the change of viewpoint. For example, an image similar to the target image may be generated from the same viewpoint as the target image shown in FIG. 3, or a new image may be generated from a viewpoint different from that of the target image.
 FIG. 6 is a diagram outlining the generation of an image from a viewpoint different from that of the target image. In FIG. 6, the viewpoint 86_1 is the viewpoint of the target image of FIG. 3; the pixel at coordinates (i_1, j_1) of the target image takes a value derived from the subject 82. The viewpoint 86_2 is a new viewpoint different from that of the target image. In the new image generated from the new viewpoint 86_2, the value for the same portion of the subject 82 falls on the pixel at coordinates (i_2, j_2). The new image is generated by performing processing such as coordinate transformation on each subject and each pixel with respect to the new viewpoint 86_2. FIG. 7 is a diagram showing a specific example of the image newly generated for the new viewpoint 86_2.
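How one point of a subject lands at different pixel coordinates under the two viewpoints (86_1 and 86_2) can be illustrated numerically; the matrices and values below are arbitrary assumptions for this sketch.

```python
import numpy as np

def image_coords(Xw, R, f=100.0):
    """Project a world point through extrinsics R (optical center omitted
    for simplicity): the analogue of (i_1, j_1) vs (i_2, j_2) in FIG. 6."""
    Xc = R @ np.append(Xw, 1.0)
    return f * Xc[0] / Xc[2], f * Xc[1] / Xc[2]

p = np.array([0.5, 0.0, 0.0])   # one point on the subject

# Viewpoint 86_1: camera looking at the scene from 4 units away.
view1 = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 4.0]])
# Viewpoint 86_2: camera shifted sideways and moved closer.
view2 = np.array([[1.0, 0.0, 0.0, -1.0],
                  [0.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 2.0]])
i1, j1 = image_coords(p, view1)
i2, j2 = image_coords(p, view2)
```

The same world point thus maps to different image coordinates for each viewpoint, which is exactly the correspondence the image generation unit 407 exploits.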
 FIG. 8 is a diagram showing a specific example of the processing of the image information acquisition device 100. First, the input/output control unit 401 inputs the target image to be processed and records it in the image storage unit 301 (step S101). The area information acquisition unit 402 acquires, for each subject, area information indicating the subject area in the target image and records it in the image information storage unit 302 (step S102). The classification unit 403 classifies the subjects (step S103). The three-dimensional information acquisition unit 404 generates a 3D model of each subject by using the information of the plurality of subjects classified into the same group, and records the generated 3D model in the image information storage unit 302 (step S104). The state parameter acquisition unit 405 generates state parameters for each subject and records them in the image information storage unit 302 (step S105). The additional information acquisition unit 406 acquires additional information for each subject and records it in the image information storage unit 302 (step S106).
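The flow of steps S101 to S106 can be sketched as a single pipeline; every helper function below is a trivial stub standing in for the corresponding unit described above, not a real implementation.

```python
# Minimal stubs so the flow runs end to end; real implementations would use
# segmentation (S102), similarity grouping (S103), SfM (S104), and so on.
def acquire_regions(image):                              # S102
    return {"r1": image, "r2": image}

def classify(regions):                                   # S103
    return {"g1": list(regions)}

def build_3d_model(region_ids):                          # S104
    return {"points": len(region_ids)}

def estimate_state(region_id, models):                   # S105
    return {"R": "identity"}

def acquire_additional_info(region_id, models, params):  # S106
    return {}

def process_target_image(image, store):
    store["image"] = image                               # S101: record the input
    regions = acquire_regions(image)
    groups = classify(regions)
    models = {g: build_3d_model(ids) for g, ids in groups.items()}
    params = {r: estimate_state(r, models) for r in regions}
    extras = {r: acquire_additional_info(r, models, params) for r in regions}
    store.update(regions=regions, groups=groups, models=models,
                 params=params, extras=extras)
    return store

result = process_target_image("img001", {})
```

The point of the sketch is the ordering and data flow between the units, matching FIG. 8.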
 FIG. 9 is a diagram showing a specific example of the hardware configuration of the image information acquisition device 100. As shown in FIG. 9, the image information acquisition device 100 includes, for example, an input/output device 1, an auxiliary storage device 2, a memory 3, and a processor 4. The input/output device 1 inputs and outputs information (including data) between the image information acquisition device 100 and the outside (including the user), and functions, for example, as the image input unit 10 and the output unit 20. The auxiliary storage device 2 is configured using a magnetic hard disk device or a semiconductor storage device and functions, for example, as the storage unit 30. The memory 3 and the processor 4 function, for example, as the control unit 40.
 In the image information acquisition device 100 configured as described above, subjects in the image to be processed that the classification unit 403 has classified as subjects of the same type are treated together even if they are in fact separate objects, and three-dimensional information about the subject (for example, a 3D model) is acquired using their images. Therefore, if a plurality of subjects of the same type appear in one image, highly accurate three-dimensional information can be acquired even when only a few images exist for each individual subject.
 (Modification examples)
 The image information acquisition device 100 may be configured without the image generation unit 407. The image generation unit 407 may instead be implemented in another information processing device; in this case, the image information storage unit 302 may also be implemented in that information processing device. With this configuration, an image of a subject can easily be generated in the other information processing device based on the image information acquired by the image information acquisition device 100.
 The image information acquisition device 100 may be implemented as a plurality of devices; for example, it may be implemented as an image information acquisition system including multiple devices. For instance, an information processing device having the control unit 40 and an information processing device having the storage unit 30 may be implemented as separate devices, the functions of the storage unit 30 may be duplicated across multiple information processing devices, or the functions of the control unit 40 may be divided among multiple information processing devices.
 Although an embodiment of the present invention has been described in detail above with reference to the drawings, the specific configuration is not limited to this embodiment, and designs and the like within a range that does not depart from the gist of the present invention are also included.
 The present invention is applicable to devices that acquire image information.
100 ... image information acquisition device, 10 ... image input unit, 20 ... output unit, 30 ... storage unit, 301 ... image storage unit, 302 ... image information storage unit, 40 ... control unit, 401 ... input/output control unit, 402 ... area information acquisition unit, 403 ... classification unit, 404 ... three-dimensional information acquisition unit, 405 ... state parameter acquisition unit, 406 ... additional information acquisition unit, 407 ... image generation unit, 81-86 ... subject, 91-96 ... subject area

Claims (7)

  1.  An image information acquisition device comprising:
     a classification unit that classifies subjects appearing in a target image, which is an image to be processed, into subjects of the same type; and
     a three-dimensional information acquisition unit that acquires information indicating a three-dimensional shape of a subject based on images of the plurality of subjects classified as subjects of the same type by the classification unit.
  2.  The image information acquisition device according to claim 1, further comprising a state parameter acquisition unit that acquires, for each subject, a state parameter that is information indicating a position and an orientation in a three-dimensional space of the target image.
  3.  The image information acquisition device according to claim 2, wherein the state parameter acquisition unit performs processing based on a constraint condition that subjects do not overlap one another three-dimensionally.
  4.  The image information acquisition device according to any one of claims 1 to 3, further comprising an area information acquisition unit that acquires, from the target image and for each subject, area information that is information indicating an area occupied by the image of that subject.
  5.  The image information acquisition device according to any one of claims 1 to 4, further comprising an additional information acquisition unit that acquires information on the relative three-dimensional positions of the subjects in the target image.
  6.  An image information acquisition method comprising:
     a classification step of classifying subjects appearing in a target image, which is an image to be processed, into subjects of the same type; and
     a three-dimensional information acquisition step of acquiring information indicating a three-dimensional shape of a subject based on images of the plurality of subjects classified as subjects of the same type in the classification step.
  7.  A computer program for causing a computer to function as the image information acquisition device according to any one of claims 1 to 5.
PCT/JP2020/042069 2020-11-11 2020-11-11 Image information acquisition device, image information acquisition method, and computer program WO2022102015A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/042069 WO2022102015A1 (en) 2020-11-11 2020-11-11 Image information acquisition device, image information acquisition method, and computer program


Publications (1)

Publication Number Publication Date
WO2022102015A1 true WO2022102015A1 (en) 2022-05-19

Family

ID=81600853

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/042069 WO2022102015A1 (en) 2020-11-11 2020-11-11 Image information acquisition device, image information acquisition method, and computer program

Country Status (1)

Country Link
WO (1) WO2022102015A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2020135679A (en) * 2019-02-25 2020-08-31 富士通株式会社 Data set creation method, data set creation device, and data set creation program



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20961543; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20961543; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: JP)