WO2020017668A1 - Method and apparatus for generating avatar by using multi-view image matching

Info

Publication number: WO2020017668A1
Application number: PCT/KR2018/007996
Authority: WO
Inventors: 신후랑
Original assignee: 주식회사 이누씨
Priority date: 2018-07-16
Filing date: 2018-07-16
Publication date: 2020-01-23

Abstract

A method and an apparatus for generating an avatar by using multi-view image matching are disclosed. Provided in an embodiment of the present invention are a method and an apparatus for matching multi-view images, the method and the apparatus allowing rapid modeling so as to minimize a calculation load when an avatar is generated using multi-view image matching, thereby rapidly generating virtualization view data as an avatar or transforming same into the avatar, and enabling compositing with a background image or enabling replacement of a body part with another character.

Description

A method and apparatus for generating an avatar using multiview image registration

The present embodiment relates to a method and apparatus for generating an avatar using multi-view image matching.

The contents described below merely provide background information related to this embodiment and do not constitute prior art.

An avatar represents another self that exists in a virtual space such as the Internet or a mobile communication environment, and can take any expressible form, including not only a human but also an animal or a plant. An avatar can be produced to resemble the user's own appearance and to represent the user's characteristics, and may carry meanings such as the user's curiosity, vicarious satisfaction, and the ideal person the individual aspires to be. Users have become very interested in the production and use of avatars that represent them.

In general, photographs or moving pictures may be used as a means of showing an individual's appearance, but such data are very large and are difficult to transmit and process over the Internet or on a mobile communication terminal. In addition, since the user cannot edit or control the image or video data, it is difficult to make them appealing to other users, and the user's personality cannot be properly expressed.

Therefore, two-dimensional or three-dimensional avatars are configured in a form that can express the individuality of each user, and the exchange of avatars between users on a network, as well as data exchange using avatars, is becoming more active.

Typically, avatar creation methods can be divided into three types: a method in which a designer draws the avatar while looking directly at the user or at a picture of the user, a method of selecting a desired avatar from predetermined avatars, and a method of configuring an avatar by combining items stored in a database.

With the method of selecting a desired avatar from predetermined avatars and the method of configuring an avatar by combining items stored in a database, the user can create his or her avatar with simple operations. An avatar created in this way can emphasize personality in the form the user desires, but it cannot be produced in a form similar to the user's actual appearance.

As a method of producing an avatar similar to the user's actual appearance, a designer may create the avatar directly from an image of the user. In this conventional method, the user transmits a photograph of his or her face to an avatar service company, and a designer at the company designs an appropriate avatar according to the image provided by the user.

If the designer makes an avatar using the image provided by the user, the avatar can be made close to the user's appearance. However, when the avatar is produced by a designer, a great deal of time, resources, and effort is consumed.

A user's face can be recognized and modeled from a live image, and an avatar can be completed based on the modeled face image, but it is difficult to create an avatar highly similar to the user from only one image. Conversely, generating an avatar highly similar to the user from a plurality of images imposes a large computational load, and the avatar is generated slowly.

An object of the present embodiment is to provide a multi-view image registration method and apparatus that perform fast modeling so that the computational load is minimized when an avatar is generated using multi-view image matching, so that virtualized view data can be quickly generated as an avatar or transformed into an avatar, and the avatar can be composited with a background image or have a body part replaced with that of another character.

According to an aspect of the present embodiment, there is provided an image registration device comprising: an image acquisition unit for obtaining a plurality of pieces of multi-view image information obtained by photographing a specific object from multiple viewpoints with a plurality of cameras; an extraction unit for recognizing an object from each of the plurality of pieces of multi-view image information, extracting feature points for the object, and extracting a point cloud based on the feature points; a duplicate confirmation unit for performing mutual position matching between the point clouds and then extracting the point clouds in which overlap occurs to generate duplicate point data; and a matching unit for removing the point clouds corresponding to the duplicate point data from all the point clouds and performing modeling so that the computational load is minimized among the remaining point clouds, thereby generating an avatar capable of 360° rotation to enable a 3D virtualized view.

According to another aspect of the present embodiment, there is provided an image registration method comprising: obtaining a plurality of pieces of multi-view image information obtained by photographing a specific object from multiple viewpoints with a plurality of cameras; recognizing an object from each of the plurality of pieces of multi-view image information, extracting feature points for the object, and extracting a point cloud based on the feature points; performing mutual position matching between the point clouds and then extracting the point clouds in which overlap occurs to generate duplicate point data; and removing the point clouds corresponding to the duplicate point data from all the point clouds and performing modeling so that the computational load is minimized among the remaining point clouds, thereby generating an avatar capable of 360° rotation to enable a 3D virtualized view.

As described above, according to the present embodiment, fast modeling is performed so that the computational load is minimized when an avatar is generated using multi-view image matching. As a result, virtualized view data can be quickly generated as an avatar or transformed into an avatar, and the avatar can also be composited with a background image or have a body part replaced with that of another character.

FIGS. 1A and 1B are block diagrams schematically illustrating a multiview image matching system according to an exemplary embodiment.

FIG. 2 is a block diagram schematically illustrating a user terminal for multiview image matching according to the present embodiment.

FIG. 3 is a block diagram schematically illustrating a multiview image matching device according to an embodiment.

FIG. 4 is a view for explaining mesh modeling according to an embodiment.

FIG. 5 is a view for explaining the appearance change and the background replacement of the avatar according to the present embodiment.

FIG. 6 illustrates avatar rotation according to the present embodiment.

FIG. 7 is a flowchart illustrating a method of generating an avatar using multi-view image registration according to the present embodiment.

Hereinafter, the present embodiment will be described in detail with reference to the accompanying drawings.

The multi-view image matching system according to the present embodiment includes a plurality of cameras 110_1, 110_2, 110_N, a plurality of control devices 120_1, 120_2, 120_N, a user terminal 130 for matching, and a streaming server 140.

The plurality of cameras 110_1, 110_2, and 110_N are apparatuses for photographing a specific object. The plurality of cameras 110_1, 110_2, and 110_N photograph a specific object (eg, a user) as a multi-view and transmit the photographed object to the plurality of control devices 120_1, 120_2, and 120_N.

The plurality of cameras 110_1, 110_2, and 110_N are peripheral devices used in connection with the control devices 120_1, 120_2, and 120_N, and recognize a specific object so that games and entertainment can be experienced without a separate controller. The plurality of cameras 110_1, 110_2, and 110_N may be, for example, peripheral devices such as Kinect.

The plurality of cameras 110_1, 110_2, and 110_N may be provided with a separate sensor. When so equipped, they can recognize the motion or gesture of a specific object (user) using the sensor, and can recognize voice using a provided microphone module. The plurality of cameras 110_1, 110_2, and 110_N need a separate power source to connect to the plurality of control devices 120_1, 120_2, and 120_N.

The sensors provided in the plurality of cameras 110_1, 110_2, and 110_N are depth cameras, and provide RGB images and joint tracking information as well as depth information in real time.

The plurality of cameras 110_1, 110_2, and 110_N use the data provided by the depth sensor to detect the human body parts or poses required for gesture recognition, which enables game play or human-computer interaction.

The plurality of control apparatuses 120_1, 120_2, and 120_N are apparatuses for processing images. They receive the captured information of a specific object (for example, a user) from the plurality of cameras 110_1, 110_2, and 110_N and generate multi-view image information. The plurality of control apparatuses 120_1, 120_2, and 120_N transmit the multi-view image information obtained by photographing the specific object (eg, a user) to the user terminal 130.

The user terminal 130 generates an avatar corresponding to a specific object (eg, a user) by quickly matching the multi-view images so that the computational load is minimized. The user terminal 130 includes an image matching program 232 and generates the avatar using the installed image matching program 232. The user terminal 130 transmits the generated avatar to the streaming server 140.

As shown in FIG. 1A, the user terminal 130 creates an avatar that includes all surfaces of a specific object (eg, a user's body shape) using the plurality of cameras 110_1, 110_2, and 110_N (at least three cameras).

The streaming server 140 transmits the avatar received from the user terminal 130 to a smart phone, a tablet, a notebook, and the like. The streaming server 140 transmits and plays a multimedia file such as sound (music) or video.

Normally, a file is opened after it has been downloaded, and a large file such as a video may take a long time to download. The streaming server 140 can greatly reduce this waiting time by playing the file while it is being downloaded. The streaming server 140 may also stream the avatar received from the user terminal 130 in real time over a computer network.

As illustrated in FIG. 1B, the user terminal 130 collects 3D virtualized view image data captured by the plurality of cameras 110_1, 110_2, and 110_N, generates a 3D avatar, and transmits the 3D avatar to the streaming server 140. The streaming server 140 may transmit the 3D avatar received from the user terminal 130 to the mobile devices of general users for use in a service such as an online virtual fan meeting.

The user terminal 130 according to the present exemplary embodiment includes a CPU 210, a main memory 220, a memory 230, a display unit 240, an input unit 250, and a communication unit 260. Components included in the user terminal 130 are not necessarily limited thereto.

The user terminal 130 refers to an electronic device that performs voice or data communication via a network according to a user's key manipulation. The user terminal 130 includes a memory for storing a program or protocol for communicating with a game server via a network, a microprocessor for executing and controlling the program, and the like.

The user terminal 130 is preferably a personal computer (PC), but is not necessarily limited thereto, and may be an electronic device such as a smartphone, a tablet, a laptop, a personal digital assistant (PDA), a game console, a portable multimedia player (PMP), a PlayStation Portable (PSP), a wireless communication terminal, or a media player.

The user terminal 130 may be any of various devices that include (i) a communication device such as a communication modem for communicating with various devices or a wired/wireless network, (ii) a memory for storing various programs and data, and (iii) a microprocessor for executing programs to perform operations and control. According to at least one embodiment, the memory may be a computer-readable recording/storage medium such as random access memory (RAM), read only memory (ROM), flash memory, an optical disk, a magnetic disk, or a solid state disk (SSD).

The CPU 210 loads the image registration program 232 according to the present embodiment from the memory 230 into the main memory 220. The CPU 210 receives a game user's command through the input unit 250, which includes a touch screen, a mouse, and a keyboard. The CPU 210 executes the image matching program 232 and outputs the result to the display unit 240. The CPU 210 downloads the image registration program 232 through the communication unit 260 and stores it in the memory 230.

The image registration program 232 according to the present exemplary embodiment obtains a plurality of multi-view image information obtained by photographing a specific object from a plurality of cameras 110_1, 110_2, and 110_N. The image registration program 232 recognizes an object from each of the plurality of multi-view image information, extracts a feature point for the object, and extracts a point cloud based on the feature point.

The image matching program 232 performs mutual position matching between the point clouds and then extracts the point clouds in which overlap occurs to generate duplicate point data. The image matching program 232 removes the point clouds corresponding to the duplicate point data from all the point clouds and performs modeling so that the computational load is minimized among the remaining point clouds, thereby generating an avatar capable of 360° rotation and enabling a 3D virtualized view.

The communication unit 260 performs wired and wireless communication, including near field communication (NFC), 2G, 3G, Long Term Evolution (LTE), time-division LTE (TD-LTE), a wireless local area network (WLAN) including Wi-Fi, and a wired LAN. The communication unit 260 transmits and receives data to and from the plurality of control devices 120_1, 120_2, and 120_N by performing wired or wireless communication.

The multi-view image registration device 200 according to the present embodiment refers to a device corresponding to the image registration program 232. In other words, the image matching program 232 according to the present embodiment may be implemented as a separate device including hardware.

The multi-view image matching device 200 according to the present embodiment includes an image acquisition unit 310, an extraction unit 312, a duplication checker 314, a matching unit 316, a sensor unit 320, a composite image acquisition unit 322, and an image synthesizer 324. Components included in the multi-view image registration device 200 are not necessarily limited thereto.

Each component included in the multi-view image registration device 200 may be connected to a communication path that links the software modules or hardware modules inside the device, so that the components operate organically. These components communicate using one or more communication buses or signal lines.

Each component of the multi-view image registration device 200 illustrated in FIG. 3 refers to a unit that processes at least one function or operation, and may be implemented as a software module, a hardware module, or a combination of software and hardware.

The image acquisition unit 310 obtains a plurality of multi-view image information obtained by photographing a specific object from a plurality of cameras.

The extractor 312 recognizes an object from each of the plurality of multi-view image information. The extractor 312 extracts a feature point for the object. The extractor 312 extracts a point cloud based on the feature points.

The duplication checker 314 performs mutual location matching between the point clouds. The duplicate confirmation unit 314 extracts point clouds in which overlap between point clouds occurs, and generates duplicate point data.
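The overlap check can be sketched as follows. This is a minimal illustration, not the method of the disclosure: it assumes the point clouds have already been aligned into a common coordinate system and treats two points from different clouds as overlapping when they fall into the same cubic cell. The names `voxel_of`, `find_duplicate_points`, `remove_duplicates`, and the `cell` parameter are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxel_of(p: Point, cell: float) -> Voxel:
    """Index of the cubic cell of side `cell` that contains point p."""
    return (int(p[0] // cell), int(p[1] // cell), int(p[2] // cell))


def find_duplicate_points(clouds: List[List[Point]], cell: float) -> List[Tuple[int, int]]:
    """Return (cloud index, point index) pairs whose cell was already claimed
    by an earlier cloud. Assumption: the clouds are already aligned."""
    owner: Dict[Voxel, int] = {}
    duplicates: List[Tuple[int, int]] = []
    for ci, cloud in enumerate(clouds):
        for pi, p in enumerate(cloud):
            # The first cloud to reach a cell owns it; later clouds overlap there.
            first = owner.setdefault(voxel_of(p, cell), ci)
            if first != ci:
                duplicates.append((ci, pi))
    return duplicates


def remove_duplicates(clouds: List[List[Point]],
                      duplicates: List[Tuple[int, int]]) -> List[List[Point]]:
    """Drop the flagged points and keep every other point in its original cloud."""
    drop = set(duplicates)
    return [[p for pi, p in enumerate(cloud) if (ci, pi) not in drop]
            for ci, cloud in enumerate(clouds)]
```

Hashing points into cells keeps the check linear in the number of points, which fits the stated goal of a low computational load. The choice of cell size is a tuning assumption.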

The matching unit 316 removes the point clouds corresponding to the duplicate point data from all the point clouds and determines the point clouds that finally remain. The matching unit 316 performs modeling so that the computational load is minimized among the remaining point clouds, thereby generating an avatar capable of 360° rotation to enable a 3D virtualized view.

The matching unit 316 generates an avatar by matching the remaining point clouds based on grid information according to the multi-view received from the image acquisition unit 310 so that the computational load is minimized among the remaining point clouds.

The matching unit 316 extracts neighboring point clouds from the remaining point clouds based on the grid information and, while keeping the mesh model structures of the neighboring point clouds intact, quickly matches only the adjacent point clouds (matching parts) between the mesh model structures to generate the avatar.

The matching unit 316 extracts x, y coordinate information included on the grid information. The matching unit 316 extracts point clouds located on the same grid among the remaining point clouds by comparing x and y coordinate information. The matching unit 316 recognizes point clouds located on the same grid as neighboring point clouds.
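One way to read this neighbor test is sketched below. It assumes that all clouds share one grid, that a grid cell is identified by quantized x, y coordinates, and that two clouds are neighbors when at least one cell contains points from both. The function name `neighboring_clouds`, the cloud labels, and the `cell` size are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float, float]


def neighboring_clouds(clouds: Dict[str, List[Point]], cell: float) -> Set[Tuple[str, str]]:
    """Return pairs of cloud names that have points in the same x, y grid cell."""
    cells = defaultdict(set)
    for name, points in clouds.items():
        for x, y, _z in points:
            # Only x and y are compared, as in the description above.
            cells[(math.floor(x / cell), math.floor(y / cell))].add(name)
    pairs: Set[Tuple[str, str]] = set()
    for names in cells.values():
        ordered = sorted(names)
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                pairs.add((ordered[i], ordered[j]))
    return pairs
```

For example, a "front" cloud and a "left" cloud that both have points in cell (1, 0) are reported as the pair `("front", "left")`, while a cloud that shares no cell with any other is left out.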

The matching unit 316 keeps the already obtained mesh model structures of the neighboring point clouds as they are, without further calculation, and performs new mesh modeling only between adjacent points among the neighboring point clouds.

The matching unit 316 performs triangulation using only adjacent points among neighboring point clouds to perform new mesh modeling.
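A minimal sketch of this seam-only triangulation follows. It assumes each neighboring cloud already has a triangle mesh, that the border vertices facing each other are given as two index lists of equal length in the same order, and that all indices refer to one shared vertex array. These assumptions, and the name `stitch_seam`, are illustrative and not taken from the disclosure.

```python
from typing import List, Sequence, Tuple

Face = Tuple[int, int, int]


def stitch_seam(faces_a: Sequence[Face], faces_b: Sequence[Face],
                seam_a: Sequence[int], seam_b: Sequence[int]) -> List[Face]:
    """Join two meshes by triangulating only the strip between their borders.
    The existing faces of both meshes are reused unchanged."""
    if len(seam_a) != len(seam_b):
        raise ValueError("this sketch assumes seams with the same number of vertices")
    new_faces: List[Face] = []
    for i in range(len(seam_a) - 1):
        a0, a1 = seam_a[i], seam_a[i + 1]
        b0, b1 = seam_b[i], seam_b[i + 1]
        # Each quad (a0, a1, b1, b0) between the borders is split into two triangles.
        new_faces.append((a0, b0, a1))
        new_faces.append((a1, b0, b1))
    return list(faces_a) + list(faces_b) + new_faces
```

The number of new triangles grows with the seam length only, not with the total number of points, which is where the saving over re-meshing every remaining point comes from.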

The matching unit 316 generates a single avatar by matching a point cloud corresponding to a specific body part of the avatar with a point cloud corresponding to a specific body part of another avatar so that the computational load is minimized.

The sensor unit 320 senses or receives direction information about a specific object. The composite image acquisition unit 322 obtains actual image information. The image synthesizing unit 324 displays the avatar on the screen simultaneously with the actual image information, and causes the avatar to rotate based on the direction information received from the sensor unit 320.
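The rotation step can be illustrated with a short sketch. It assumes the direction information is a single yaw angle in degrees about the vertical (y) axis, which the text does not specify, and the name `rotate_about_y` is hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def rotate_about_y(points: List[Point], yaw_deg: float) -> List[Point]:
    """Rotate every avatar vertex about the vertical axis by yaw_deg degrees."""
    t = math.radians(yaw_deg)
    c, s = math.cos(t), math.sin(t)
    # Standard rotation about the y axis; the y coordinate is unchanged.
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in points]
```

Because the avatar covers all surfaces of the object, any yaw value in the full 360° range yields a valid view.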

FIG. 4 is a view for explaining mesh modeling according to an embodiment.

As shown in (a) of FIG. 4, in order to match between point clouds acquired from a multiview image in a graphics model, mutual location matching between point clouds is performed.

Then, the point clouds in which the overlap between the point clouds occurs are removed, and mesh modeling is performed on the point clouds that are finally left out of all the point clouds.

However, there is a limit in processing an image in real time because a lot of load and time are consumed in the process of performing mesh modeling on the point clouds that are finally left among the entire point clouds.

As shown in (b) of FIG. 4, the multiview image matching device 200 performs mutual position matching between point clouds for matching between point clouds acquired from a multiview image.

Thereafter, the multi-view image registration device 200 removes point clouds in which overlap between point clouds occurs.

A point cloud refers to a set of points belonging to a certain coordinate system. In a three-dimensional coordinate system, points are usually defined by X, Y, and Z coordinates and are often used to represent the surface of an object. Point clouds can be obtained by three-dimensional scanning. For the three-dimensional scanning operation, the extraction unit 312 in the multi-view image registration device 200 automatically measures a number of points on the surface of the object and outputs the resulting point cloud as a digital file. Point clouds are converted into polygon meshes, triangle meshes, NURBS models, and CAD models through a surface reconstruction process.

Unlike the approach shown in (a) of FIG. 4, the multi-view image registration device 200 does not perform mesh modeling on all of the point clouds that finally remain.

The multi-view image registration device 200 removes the overlapping point clouds on the grid information received from the plurality of cameras 110_1, 110_2, and 110_N, and performs matching using the already defined mesh models as they are.

In other words, the multi-view image registration device 200 maintains the mesh structure on the grid as it is at each viewpoint of the plurality of cameras 110_1, 110_2, and 110_N, so that matching can be performed easily and quickly.

When the mesh structure on the grid is maintained as it is, the accuracy of the mesh modeling is slightly reduced, but processing can be performed very quickly.

As shown in (b) of FIG. 4, when generating a graphics model, the multi-view image registration device 200 defines faces by connecting neighboring points of the point clouds in order to perform mesh modeling. Allocating a triangular face to every three points in this way is called triangulation.
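Triangulation on a per-camera grid can be sketched as follows, assuming each camera delivers its points on a regular rows by columns grid and that a boolean mask marks which grid points survive duplicate removal. The function name `grid_triangles` and the mask layout are hypothetical.

```python
from typing import List, Sequence, Tuple

Face = Tuple[int, int, int]


def grid_triangles(valid: Sequence[Sequence[bool]]) -> List[Face]:
    """Build two triangles for every grid cell whose four corners are all valid.
    Vertex indices are row-major: index = row * cols + col."""
    rows, cols = len(valid), len(valid[0])
    faces: List[Face] = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            corners_ok = (valid[r][c] and valid[r][c + 1]
                          and valid[r + 1][c] and valid[r + 1][c + 1])
            if not corners_ok:
                continue  # a removed point leaves a hole instead of a face
            top_left = r * cols + c
            top_right = top_left + 1
            bottom_left = top_left + cols
            bottom_right = bottom_left + 1
            faces.append((top_left, bottom_left, top_right))
            faces.append((top_right, bottom_left, bottom_right))
    return faces
```

Connectivity here follows directly from grid adjacency, so no nearest-neighbor search is needed to decide which points form a face.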

The multi-view image registration device 200 extracts neighboring point clouds from the remaining point clouds based on the grid information and, while keeping the mesh model structures of the neighboring point clouds as they are, quickly matches only the adjacent point clouds (matching parts) between the mesh model structures.

When the multi-view image matching device 200 quickly matches only the adjacent point clouds (matching parts) between the mesh model structures while keeping the mesh model structures of the neighboring point clouds, the accuracy of the mesh modeling is slightly degraded, but processing is very fast, which enables real-time processing of images.

New mesh modeling using all points of the matched point clouds requires a large amount of computation. By contrast, the multi-view image matching device 200 according to the present exemplary embodiment uses the grid information (x, y coordinate information) received from the plurality of cameras 110_1, 110_2, and 110_N, so that it can use the already obtained mesh models as they are, without additional calculation, and perform new mesh modeling only at the connection between two point clouds.

As shown in (a) of FIG. 5, the multiview image matching device 200 may change or copy the appearance of the generated avatar.

The multi-view image registration device 200 generates a single avatar by matching a specific body part of the avatar with a specific body part of another avatar.

For example, the multi-view image registration device 200 may generate a new avatar by matching the face (head) of the avatar with the face (head) of another avatar. In addition, the multi-view image registration device 200 may generate a new avatar by matching the face (head) of the avatar with the body (body) of another avatar.

The multi-view image registration device 200 matches a point cloud corresponding to a specific body part of the avatar with a point cloud corresponding to a specific body part of another avatar so as to generate a single avatar.
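As a rough illustration of part replacement, the sketch below assumes each avatar is stored as a mapping from a body part label to that part's points. The labels, the storage layout, and the name `replace_body_part` are hypothetical, and the seam between the swapped part and the rest of the body would still need to be matched afterwards.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
Avatar = Dict[str, List[Point]]


def replace_body_part(avatar: Avatar, donor: Avatar, part: str) -> Avatar:
    """Return a new avatar whose `part` comes from `donor`.
    All other parts are reused unchanged, so only the swapped part is touched."""
    if part not in donor:
        raise KeyError(f"donor avatar has no part named {part!r}")
    merged = dict(avatar)            # shallow copy: untouched parts are shared
    merged[part] = list(donor[part])
    return merged
```

Neither input avatar is modified, so the same source avatars can be combined repeatedly.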

As shown in (b) of FIG. 5, the multi-view image registration device 200 may replace or modify the background of the generated avatar.

The multi-view image registration device 200 may display the avatar on the screen simultaneously with the actual image information. For example, the multi-view image registration device 200 may display the avatar in an overlay form on a beach background screen, on a forest background screen, or on a living room wallpaper.
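The overlay display can be sketched with the standard alpha "over" operation. This assumes the rendered avatar is available as RGBA pixels with zero alpha outside the avatar and that the background is an RGB image of the same size. The text does not specify a compositing method, and the name `overlay` is hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def overlay(background: List[List[RGB]], avatar: List[List[RGBA]]) -> List[List[RGB]]:
    """Composite avatar pixels over background pixels of the same size."""
    result: List[List[RGB]] = []
    for bg_row, av_row in zip(background, avatar):
        row: List[RGB] = []
        for (br, bg, bb), (ar, ag, ab, alpha) in zip(bg_row, av_row):
            a = alpha / 255.0  # 0.0 keeps the background, 1.0 keeps the avatar
            row.append((round(ar * a + br * (1 - a)),
                        round(ag * a + bg * (1 - a)),
                        round(ab * a + bb * (1 - a))))
        result.append(row)
    return result
```

Replacing the background then amounts to calling `overlay` with a different background image while the avatar pixels stay the same.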

FIG. 6 illustrates avatar rotation according to the present embodiment.

The multi-view image registration device 200 extracts the overlapping point cloud from the 3D data obtained from the plurality of cameras 110_1, 110_2, and 110_N to efficiently generate a 3D virtualized view capable of 360 ° free rotation in real time.

The multi-view image registration device 200 extracts the overlapping point cloud and, based on it, minimizes the computational load required to generate the virtualized view, which increases efficiency and minimizes the time required to generate the virtualized view.

The multi-view image registration device 200 displays a complex video interface in which a 3D virtualized view and a planar 2D view having sensor-based orientation information are simultaneously controlled on one screen.

The multi-view image registration device 200 displays a virtual view and a real video view on the same screen in an overlay form. Because the real video view includes direction information, the multi-view image registration device 200 provides an interface structure that displays the corresponding rotation when the user wants to rotate the view in a given direction.

The multi-view image registration device 200 obtains a plurality of multi-view image information obtained by photographing a specific object as a multi-view from a plurality of cameras (S710). The multi-view image matching device 200 recognizes an object from each of the plurality of multi-view image information (S720).

The multi-view image registration device 200 extracts a feature point for the object and extracts a point cloud based on the feature point (S730). The duplication checker 314 of the multi-view image matching device 200 performs mutual position matching between the point clouds (S740).

The multiview image registration device 200 extracts point clouds in which overlap between point clouds occurs, and generates duplicate point data (S750). The matching unit 316 of the multi-view image matching device 200 removes the point clouds corresponding to the duplicate point data from all the point clouds and determines the point clouds that finally remain (S760).

The multi-view image registration device 200 performs modeling so that the computational load is minimized among the remaining point clouds, thereby generating an avatar capable of 360° rotation and enabling a 3D virtualized view (S770).
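The flow from S710 to S770 can be summarized as a small driver function. In this sketch every stage is passed in as a callable because the text does not fix the algorithms. The name `generate_avatar` and all parameter names are hypothetical placeholders.

```python
from typing import Callable, Iterable, List, Sequence, Tuple, TypeVar

Point = Tuple[float, float, float]
Cloud = List[Point]
Image = TypeVar("Image")
Mesh = TypeVar("Mesh")


def generate_avatar(
    images: Sequence[Image],                                      # S710: multi-view images
    extract_cloud: Callable[[Image], Cloud],                      # S720 and S730
    align: Callable[[List[Cloud]], List[Cloud]],                  # S740
    find_duplicates: Callable[[List[Cloud]], Iterable[Tuple[int, int]]],  # S750
    build_mesh: Callable[[List[Cloud]], Mesh],                    # S770
) -> Mesh:
    """Run the stages in order and return whatever build_mesh produces."""
    clouds = [extract_cloud(image) for image in images]
    aligned = align(clouds)
    duplicates = set(find_duplicates(aligned))
    # S760: keep only the points that were not flagged as duplicates.
    remaining = [[p for pi, p in enumerate(cloud) if (ci, pi) not in duplicates]
                 for ci, cloud in enumerate(aligned)]
    return build_mesh(remaining)
```

Keeping the stages as parameters makes it possible to swap in a faster or more accurate implementation of any single step without changing the overall flow.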

In operation S770, the multiview image matching device 200 generates the avatar by matching the remaining point clouds based on grid information according to the multi-view received from the image acquisition unit 310, so that the computational load is minimized among the point clouds that finally remain.

The multi-view image registration device 200 extracts neighboring point clouds from the remaining point clouds based on the grid information and, while keeping the mesh model structures of the neighboring point clouds as they are, quickly matches only the adjacent point clouds (matching parts) between the mesh model structures to generate the avatar.

The multi-view image registration device 200 extracts x, y coordinate information included on the grid information. The multi-view image registration device 200 compares x and y coordinate information and extracts point clouds located on the same grid among the remaining point clouds. The multi-view image registration device 200 recognizes point clouds located on the same grid as neighboring point clouds.

The multi-view image registration device 200 keeps the already obtained mesh model structures of the neighboring point clouds as they are, without further calculation, and performs new mesh modeling only between adjacent points among the neighboring point clouds.

The multi-view image registration device 200 performs triangulation using only adjacent points among neighboring point clouds to perform new mesh modeling.

In FIG. 7, steps S710 to S770 are described as being executed sequentially, but the method is not necessarily limited thereto. In other words, since the order of the steps described in FIG. 7 may be changed, or one or more steps may be executed in parallel, FIG. 7 is not limited to a time-series order.

As described above, the avatar generating method using the multi-view image registration according to the present embodiment described in FIG. 7 may be implemented in a program and recorded on a computer-readable recording medium. The computer-readable recording medium having recorded thereon a program for implementing an avatar generating method using multi-view image matching according to the present embodiment includes all kinds of recording devices storing data that can be read by a computer system.

The above description is merely illustrative of the technical idea of the present embodiment, and those skilled in the art to which the present embodiment belongs may make various modifications and changes without departing from the essential characteristics of the present embodiment. Therefore, the present embodiments are intended to describe, not to limit, the technical idea of the present embodiment, and the scope of the technical idea of the present embodiment is not limited by these embodiments. The scope of protection of the present embodiment should be interpreted by the following claims, and all technical ideas within the scope equivalent thereto should be construed as being included in the scope of the present embodiment.

The present invention can be applied to the field of producing characters using photographic information, and therefore has industrial applicability.

Claims

An image matching device comprising:

an image acquisition unit for obtaining a plurality of pieces of multi-view image information obtained by photographing a specific object from multiple viewpoints with a plurality of cameras;

an extraction unit for recognizing an object from each of the plurality of pieces of multi-view image information, extracting a feature point for the object, and extracting a point cloud based on the feature point;

a duplicate confirmation unit for performing mutual position matching between the point clouds and then extracting point clouds in which overlap between the point clouds occurs to generate duplicate point data; and

a matching unit for removing the point clouds corresponding to the duplicate point data from all the point clouds and performing modeling so that the computational load is minimized among the remaining point clouds, thereby generating an avatar capable of 360° rotation to enable a 3D virtualized view.
The image matching device of claim 1, wherein the matching unit generates the avatar by matching the remaining point clouds based on grid information according to the multi-view received from the image acquisition unit, so that the computational load is minimized among the finally remaining point clouds.
The image matching device of claim 2, wherein the matching unit extracts neighboring point clouds from the remaining point clouds based on the grid information and, while maintaining the mesh model structures of the neighboring point clouds, quickly matches adjacent point clouds between the mesh model structures to generate the avatar.
The image matching device of claim 3, wherein the matching unit extracts x, y coordinate information included in the grid information, compares the x, y coordinate information to extract point clouds located on the same grid among the remaining point clouds, and recognizes the point clouds located on the same grid as the neighboring point clouds.
The image matching device of claim 4, wherein the matching unit maintains the already obtained mesh model structures of the neighboring point clouds without further calculation and performs new mesh modeling between adjacent points among the neighboring point clouds.
The image matching device of claim 5, wherein the matching unit performs triangulation using only adjacent points among the neighboring point clouds to perform the new mesh modeling.
The image matching device of claim 1, further comprising:

a sensor unit for sensing or receiving direction information on the specific object;

a composite image acquisition unit for obtaining actual image information; and

a background image synthesizing unit for displaying the avatar on the screen in an overlay form simultaneously with the actual image information, and rotating the avatar based on the direction information.
The image matching device of claim 1, wherein the matching unit generates a single avatar by matching a point cloud corresponding to a specific body part of the avatar with a point cloud corresponding to a specific body part of another avatar so that the computational load is minimized.
An image matching method comprising:

obtaining a plurality of pieces of multi-view image information obtained by photographing a specific object from multiple viewpoints with a plurality of cameras;

recognizing an object from each of the plurality of pieces of multi-view image information, extracting a feature point for the object, and extracting a point cloud based on the feature point;

performing mutual position matching between the point clouds and then extracting point clouds in which overlap between the point clouds occurs to generate duplicate point data; and

removing the point clouds corresponding to the duplicate point data from all the point clouds and performing modeling so that the computational load is minimized among the remaining point clouds, thereby generating an avatar capable of 360° rotation to enable a 3D virtualized view.