WO2024090792A1 - Robot for generating 3d model and control method therefor

Info

Publication number: WO2024090792A1
Application number: PCT/KR2023/014160
Authority: WO
Inventors: 이태윤; 문보석; 우석윤
Original assignee: 삼성전자주식회사
Priority date: 2022-10-24
Filing date: 2023-09-19
Publication date: 2024-05-02
Also published as: KR20240057536A

Abstract

A robot for generating a 3D model and a control method therefor are provided. The robot comprises a traveling unit, at least one camera, an output unit, a memory for storing a map, and at least one processor which is connected to the traveling unit, the at least one camera, the output unit, and the memory and controls the robot. The at least one processor: when entering a first mode of generating a 3D model for a user, identifies a photographing location for generating the 3D model for the user on the basis of a map; controls the output unit to output a message directing the user to move to the identified photographing location; when the user is located at the photographing location, acquires a photographing path for photographing the user; acquires multiple images by photographing the user through the camera at a preconfigured photographing interval while moving along the photographing path through the traveling unit; and generates the 3D model for the user on the basis of the acquired multiple images.

Description

Robot that creates 3D models and its control method

The present disclosure relates to a robot and a control method thereof, and more specifically, to a robot and a control method for generating a 3D model of a user by photographing the user from multiple directions.

Recently, as interest in the metaverse has increased, the importance of avatars that play the role of the user's alter ego within the metaverse space is increasing. In particular, technology has recently been developed to create an avatar by creating a 3D model of the user.

Conventionally, to create a realistic 3D model, a plurality of cameras are arranged around the subject, and the 3D model is generated from the plurality of images acquired through the arranged cameras. Additionally, technologies such as motion capture are applied to generate motion for the generated 3D model.

However, for general users, going to a studio in which multiple cameras are arranged around the subject to create their own 3D model is expensive and inconvenient, and the 3D model cannot be created immediately when it is needed.

According to an embodiment of the present disclosure, a robot that generates a 3D model includes: a traveling unit; at least one camera; an output unit; a memory storing a map; and at least one processor that is connected to the traveling unit, the at least one camera, the output unit, and the memory, and controls the robot. Upon entering a first mode of generating a 3D model for a user, the at least one processor identifies a shooting location for generating the 3D model for the user based on the map. The at least one processor controls the output unit to output a message directing the user to move to the identified shooting location. When the user is located at the shooting location, the at least one processor obtains a photographing path for photographing the user. The at least one processor acquires a plurality of images by photographing the user through the camera at preset photographing intervals while moving along the photographing path through the traveling unit. The at least one processor generates the 3D model for the user based on the acquired plurality of images.

According to an embodiment of the present disclosure, a method of controlling a robot that generates a 3D model includes: upon entering a first mode of generating a 3D model for a user, identifying a shooting location for generating the 3D model for the user based on a map stored in the robot; outputting a message directing the user to move to the identified shooting location; when the user is located at the shooting location, obtaining a photographing path for photographing the user; acquiring a plurality of images by photographing the user at preset photographing intervals while moving along the photographing path; and generating the 3D model for the user based on the acquired plurality of images.

According to an embodiment of the present disclosure, in a computer-readable recording medium storing a program for executing a control method of a robot for generating a 3D model, the control method includes: upon entering a first mode of generating a 3D model for a user, identifying a shooting location for generating the 3D model for the user based on a map stored in the robot; outputting a message directing the user to move to the identified shooting location; when the user is located at the shooting location, obtaining a photographing path for photographing the user; acquiring a plurality of images by photographing the user at preset photographing intervals while moving along the photographing path; and generating the 3D model for the user based on the acquired plurality of images.

FIG. 1 is a diagram schematically showing a method of generating a 3D model for a user using a robot, according to an embodiment of the present disclosure;

FIG. 2 is a block diagram showing the configuration of a robot according to an embodiment of the present disclosure;

FIG. 3 is a flowchart illustrating a method of generating a 3D model for a user while operating in a first 3D model creation mode, according to an embodiment of the present disclosure;

FIG. 4A is a diagram for explaining a method of identifying a shooting location based on a map, according to an embodiment of the present disclosure;

FIG. 4B is a diagram illustrating a method of obtaining a photographing path for photographing a user according to an embodiment of the present disclosure;

FIG. 4C is a diagram for explaining a method of photographing a user at a preset photographing interval according to an embodiment of the present disclosure;

FIGS. 5A and 5B are diagrams for explaining a method of determining the quality of a 3D model according to an embodiment of the present disclosure;

FIG. 6 is a diagram for explaining a method of photographing a user at a reset photographing interval according to an embodiment of the present disclosure;

FIG. 7 is a flowchart illustrating a method of generating a 3D model for a user while operating in a second 3D model creation mode according to an embodiment of the present disclosure;

FIG. 8 is a diagram for explaining a method of changing a route based on a shooting location for a user, according to an embodiment of the present disclosure;

FIG. 9 is a flowchart illustrating a method of updating a 3D model according to the quality of the 3D model while operating in a second 3D model creation mode according to an embodiment of the present disclosure;

FIG. 10 is a flowchart illustrating a method for generating facial expressions of a 3D model according to an embodiment of the present disclosure; and

FIG. 11 is a flowchart illustrating a method for generating motion of a 3D model according to an embodiment of the present disclosure.

Below, various embodiments of the present disclosure are described. However, this is not intended to limit the technology of the present disclosure to specific embodiments, and the disclosure should be understood to include various modifications, equivalents, and/or alternatives to the embodiments of the present disclosure.

In this document, expressions such as “have,” “may have,” “includes,” or “may include” refer to the existence of the corresponding feature (e.g., a numerical value, function, operation, or component such as a part), and do not rule out the existence of additional features.

In this document, expressions such as “A or B,” “at least one of A or/and B,” or “one or more of A or/and B” may include all possible combinations of the items listed together. For example, “A or B,” “at least one of A and B,” or “at least one of A or B” may refer to all of the cases in which (1) at least one A is included, (2) at least one B is included, or (3) both at least one A and at least one B are included.

Expressions such as “first” or “second” used in this document can modify various components regardless of order and/or importance, and are used only to distinguish one component from other components without limiting the components. For example, a first user device and a second user device may represent different user devices regardless of order or importance. For example, a first component may be renamed a second component without departing from the scope of rights described in this document, and similarly, the second component may also be renamed the first component.

Terms such as “module,” “unit,” and “part” used in this document refer to components that perform at least one function or operation, and these components may be implemented in hardware or software, or through a combination of hardware and software. In addition, a plurality of “modules,” “units,” “parts,” etc. may be integrated into at least one module or chip and implemented as at least one processor, except in cases where each needs to be implemented as individual specific hardware.

When a component (e.g., a first component) is referred to as being “(operatively or communicatively) coupled with/to” or “connected to” another component (e.g., a second component), it should be understood that the component may be directly connected to the other component or may be connected through yet another component (e.g., a third component). On the other hand, when a component (e.g., a first component) is said to be “directly coupled” or “directly connected” to another component (e.g., a second component), it may be understood that no other component (e.g., a third component) exists between the component and the other component.

As used in this document, the expression “configured to” may, depending on the situation, be used interchangeably with, for example, “suitable for,” “having the capacity to,” “designed to,” “adapted to,” “made to,” or “capable of.” The term “configured (or set) to” may not necessarily mean “specifically designed to” in hardware. Instead, in some contexts, the expression “a device configured to” may mean that the device is “capable of” operating together with other devices or components. For example, the phrase “processor configured (or set) to perform A, B, and C” may refer to a dedicated processor for performing the operations (e.g., an embedded processor), or a general-purpose processor (e.g., a CPU or application processor) capable of performing the corresponding operations by executing one or more software programs stored in a memory device.

Terms used in this document are merely used to describe specific embodiments and may not be intended to limit the scope of other embodiments. Singular expressions may include plural expressions, unless the context clearly indicates otherwise. Terms used herein, including technical or scientific terms, may have the same meaning as commonly understood by a person of ordinary skill in the technical field described in this document. Among the terms used in this document, terms defined in general dictionaries may be interpreted to have the same or similar meaning as the meaning they have in the context of related technology, and unless clearly defined in this document, are not interpreted to have an ideal or excessively formal meaning. In some cases, even terms defined in this document cannot be interpreted to exclude embodiments of this document.

Hereinafter, the present disclosure will be described in more detail with reference to the drawings. However, in describing the present disclosure, if it is determined that a detailed description of a related known function or configuration may unnecessarily obscure the gist of the present disclosure, the detailed description thereof will be omitted. In connection with the description of the drawings, similar reference numbers may be used for similar components.

According to an embodiment of the present disclosure, the robot 100 may enter a 3D model creation mode to create a 3D model for the user 10. At this time, the 3D model creation mode may include a first 3D model creation mode and a second 3D model creation mode.

When entering the first 3D model creation mode, the robot 100 may identify a shooting location for generating a 3D model for the user 10 based on a pre-stored map. That is, based on the map, the robot 100 can identify, as the shooting location, an area in which it can travel while maintaining a threshold distance from the user 10.

Additionally, the robot 100 may output a message directing the user 10 to move to the identified shooting location.

When the user 10 is located at the shooting location, the robot 100 acquires a shooting path for photographing the user 10 and, as shown on the left side of FIG. 1, can obtain multiple images by photographing the user at preset shooting intervals while moving along the shooting path.

The robot 100 can then generate a 3D model 20 for the user, as shown on the right side of FIG. 1, based on the plurality of acquired images. At this time, the robot 100 may create the 3D model 20 for the user 10 using photogrammetry techniques or vision techniques.

When entering the second 3D model creation mode that generates a 3D model for the user, the robot 100 can recognize the user 10 while the robot 100 drives.

Additionally, the robot 100 may estimate the shooting location of the robot 100 based on information about the recognized user's location and move the robot to the estimated location.

Additionally, the robot 100 may acquire at least one image through a camera at the estimated location. At this time, the robot 100 may analyze the at least one image, classify its type (together with information about the posture and position of the camera at the time of shooting), and store the at least one image by type in a database according to the classification result. The type of an image may be classified according to the shooting angle at which the user 10 is captured.

When a plurality of preset types of images are stored in the database, the robot 100 may generate the 3D model 20 based on the plurality of types of images.

As described above, because the robot 100 creates the 3D model 20 for the user 10 at home, the user can obtain a 3D model of himself or herself without visiting a separate studio.

FIG. 2 is a block diagram showing the configuration of a robot according to an embodiment of the present disclosure. As shown in FIG. 2, the robot 100 may include a traveling unit 110, a camera 120, an output unit 130, a sensor 140, a communication interface 150, a memory 160, and at least one processor 170. According to an embodiment of the present disclosure, the robot 100 may be a robot that provides various services within the home, but this is only an embodiment, and the robot 100 may be a service robot (for example, a guidance robot) that provides services in various places such as airports, hotels, supermarkets, clothing stores, logistics, and hospitals. Additionally, the configuration of the robot 100 is not limited to the configuration shown in FIG. 2, and configurations obvious to those skilled in the art may of course be added.

The traveling unit 110 may drive (or move) the robot 100 under the control of at least one processor 170. In particular, the traveling unit 110 may include wheels that drive the robot 100 and a wheel drive motor that rotates the wheels.

In particular, the traveling unit 110 may move the robot 100 along a photographing path surrounding the user in order to photograph the user.

The camera 120 is configured to obtain an image by photographing the surroundings of the robot 100. In particular, at least one processor 170 may recognize the user by analyzing the image acquired through the camera 120. Alternatively, at least one processor 170 may generate a 3D model for the user using a plurality of images acquired through a camera. At this time, the camera 120 is

The output unit 130 can output various information. In particular, the output unit 130 may output a guidance message guiding the user to the shooting location. The output unit 130 may be implemented as a display or LED that provides a visual message, or may be implemented as a speaker that provides an auditory message.

The sensor 140 is configured to detect the environment around the robot 100 or the user's condition. In one embodiment, the sensor 140 may include a depth sensor and an Inertial Measurement Unit (IMU) sensor. The depth sensor is configured to detect obstacles around the robot 100. At least one processor 170 may obtain the distance from the robot 100 to the obstacle based on the sensing value of the depth sensor. For example, the depth sensor may include a LiDAR sensor. Alternatively, the depth sensor may include a radar sensor and a depth camera. The IMU sensor is configured to acquire posture information of the robot 100. IMU sensors may include gyro sensors and geomagnetic sensors. In addition, the robot 100 may include various sensors to detect the environment around the robot 100 or the user's status.

The communication interface 150 includes at least one circuit and can communicate with various types of external devices or servers. The communication interface 150 may include at least one of a BLE (Bluetooth Low Energy) module, a Wi-Fi communication module, a cellular communication module, a 3G (3rd generation) mobile communication module, a 4G (4th generation) mobile communication module, a 4th generation LTE (Long Term Evolution) communication module, or a 5G (5th generation) mobile communication module.

In particular, the communication interface 150 may transmit a guidance message to an external portable terminal (eg, a smart phone of the user 10, etc.).

The memory 160 may store an operating system (OS) for controlling the overall operation of the components of the robot 100 and instructions or data related to the components of the robot 100. In particular, the memory 160 may include various modules for creating a 3D model.

In particular, the memory 160 may include a database that stores captured images of the user. Additionally, the memory 160 may store a database that matches and stores facial images and expression types, and a database that matches and stores motion images and motion types. Additionally, the memory 160 can store information about a map of the home.

Meanwhile, the memory 160 may be implemented as non-volatile memory (e.g., a hard disk, a solid state drive (SSD), or flash memory), volatile memory (which may include memory in the at least one processor 170), or the like.

At least one processor 170 may be electrically connected to the memory 160 to control the overall functions and operations of the robot 100. When entering the 3D model creation mode, the at least one processor 170 may load, into the volatile memory, data for the modules stored in the non-volatile memory in order to perform various operations. Here, loading refers to an operation of reading data stored in the non-volatile memory and storing it in the volatile memory so that the at least one processor 170 can access it.

In particular, when entering the first mode for generating a 3D model for the user, the at least one processor 170 identifies a shooting location for generating a 3D model for the user based on the map. The at least one processor 170 controls the output unit 130 to output a message directing the user to move to the identified shooting location. When the user is located at the shooting location, the at least one processor 170 obtains a photographing path for photographing the user. The at least one processor 170 acquires a plurality of images by photographing the user through the camera 120 at preset photographing intervals while moving along the photographing path through the traveling unit 110. The at least one processor 170 generates a 3D model for the user based on the plurality of acquired images.

Additionally, the at least one processor 170 may acquire a plurality of projection images by projecting the generated 3D model into 2D. The at least one processor 170 may compare the plurality of images with the plurality of projection images corresponding to the plurality of images, and measure the quality value of the 3D model based on the comparison result. At this time, the at least one processor 170 may calculate a pixel difference value by comparing pixels of a plurality of first feature points included in a first image, taken at a first angle with respect to the front of the user, among the plurality of images with pixels of a plurality of second feature points included in a first projection image, obtained by projecting the 3D model in the first angle direction, among the plurality of projection images. Additionally, the at least one processor 170 may measure the quality value of the 3D model based on the calculated pixel difference value.

If the quality value of the 3D model is less than the threshold, the at least one processor 170 may reset the robot's shooting interval or shooting position. The at least one processor 170 may acquire a plurality of reset images by photographing the user at the reset photographing interval or photographing position while moving along the photographing path through the traveling unit 110. The at least one processor 170 may update the 3D model for the user based on the plurality of acquired reset images.

Meanwhile, when entering the second mode for generating a 3D model for a user, the at least one processor 170 may recognize the user while the robot is traveling. The at least one processor 170 may estimate the robot's shooting location based on information about the recognized user's location. The at least one processor 170 may control the traveling unit 110 to move the robot 100 to the estimated position. The at least one processor 170 may acquire at least one image through the camera 120 at the estimated location. The at least one processor 170 may analyze the at least one image, classify the type of the at least one image, and store it in a database.

When a plurality of preset types of images are stored in the database, the at least one processor 170 may generate a 3D model for the user based on the plurality of types of stored images.

At this time, if an image of some type among the plurality of types has not been stored, the at least one processor 170 may estimate the shooting position of the robot 100 so that the user can be photographed at the shooting angle corresponding to the unstored type.
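
The type-based collection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the four type names, their center angles, and the helper names `classify_type`, `store_image`, and `missing_types` are hypothetical and do not appear in the disclosure.

```python
# Hypothetical image types, each centered on a shooting angle
# (degrees measured from the front of the user). Assumed for illustration.
TYPE_CENTERS = {"front": 0.0, "left": 90.0, "back": 180.0, "right": 270.0}


def angular_distance(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def classify_type(shooting_angle):
    """Return the type whose center angle is closest to the shooting angle."""
    return min(TYPE_CENTERS,
               key=lambda t: angular_distance(shooting_angle, TYPE_CENTERS[t]))


def store_image(database, image_id, shooting_angle):
    """Store an image under its classified type and return that type."""
    image_type = classify_type(shooting_angle)
    database.setdefault(image_type, []).append(image_id)
    return image_type


def missing_types(database):
    """Types with no stored image, mapped to the angle to shoot from next."""
    return {t: angle for t, angle in TYPE_CENTERS.items() if not database.get(t)}


database = {}
store_image(database, "img_001", 10.0)   # classified as "front"
store_image(database, "img_002", 100.0)  # classified as "left"
print(missing_types(database))           # types still to be photographed
```

Once `missing_types` returns an empty mapping, every preset type has at least one image and model generation could start; until then, each returned angle is a candidate shooting angle for the next estimated shooting position.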

Additionally, the at least one processor 170 may acquire a plurality of projection images by projecting the generated 3D model into 2D. The at least one processor 170 may compare the plurality of types of images with the plurality of projection images corresponding to the plurality of types of images, and measure the quality value of the 3D model based on the comparison result. If the quality value of the 3D model is less than the threshold, the at least one processor 170 may reset the shooting position of the robot, acquire a plurality of reset images by photographing the user at the reset shooting position reached through the traveling unit 110, and update the 3D model for the user based on the acquired plurality of reset images.

Meanwhile, when the user is recognized, the at least one processor 170 may obtain a facial image by photographing the user's face. The at least one processor 170 may analyze the acquired facial image and obtain an expression type corresponding to the acquired facial image. The at least one processor 170 may store facial images and information about the expression type in a database. The at least one processor 170 may generate a facial expression of the 3D model based on facial images for each expression type stored in the database.

Additionally, when the user is recognized, the at least one processor 170 may acquire a motion image by capturing the user's motion. The at least one processor 170 may analyze the acquired motion image and obtain a motion type corresponding to the acquired motion image. The at least one processor 170 may store motion images and information about the motion type in a database. The at least one processor 170 may generate motion of the 3D model based on motion images for each motion type stored in the database.

Hereinafter, a method of generating a 3D model for a user while operating in the first 3D model creation mode will be described with reference to FIGS. 3 to 6.

FIG. 3 is a flowchart illustrating a method of generating a 3D model for a user while operating in a first 3D model creation mode, according to an embodiment of the present disclosure.

First, the robot 100 may enter the first 3D model creation mode (S305). Specifically, when a user voice such as “Generate a 3D model” is input from the user, the robot 100 may enter the first 3D model creation mode. Alternatively, when a 3D model creation request is received from the user terminal, the robot 100 may enter the first 3D model creation mode.

The robot 100 can identify the shooting location based on the map stored in the memory 160 (S310). Specifically, in order to create a 3D model of a user, it is necessary to photograph the user from various shooting angles while maintaining a certain distance from the user. Therefore, an area larger than a threshold size is required to photograph the user from multiple shooting angles. Accordingly, the robot 100 can search the map stored in the memory 160 for an area larger than the threshold size. If a plurality of areas larger than the threshold size are found, the robot 100 may identify the area closest to the user among the plurality of found areas as the shooting location.

Meanwhile, the threshold size may be determined according to user information (e.g., the user's height). Specifically, the taller the user, the larger the threshold size may be, and the shorter the user, the smaller the threshold size may be.

For example, as shown in the left drawing of FIG. 4A, when the user 10 is located in a first space in the home, the robot 100 may determine, based on the map, the second space 410, which is an area larger than the threshold size, as the shooting location, as shown in the right drawing of FIG. 4A.
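
The selection in step S310 can be sketched as below. This is only an illustrative sketch: the map representation (a list of free areas, each with a center and a side length), the linear height rule in `threshold_size`, and all names and numeric values are assumptions that are not specified in the disclosure.

```python
import math


def threshold_size(user_height_m, margin=1.5):
    """Hypothetical rule: the required free-area size grows with user height."""
    return margin * user_height_m


def select_shooting_location(areas, user_position, user_height_m):
    """Pick the area closest to the user among those larger than the threshold.

    `areas` is a list of dicts with a `name`, a `center` (x, y) in meters, and
    a `size` (side length of the free square in meters). Returns None when no
    area is large enough.
    """
    required = threshold_size(user_height_m)
    candidates = [a for a in areas if a["size"] > required]
    if not candidates:
        return None
    return min(candidates, key=lambda a: math.dist(a["center"], user_position))


# Illustrative map data (assumed values).
areas = [
    {"name": "bedroom", "center": (1.0, 1.0), "size": 2.0},
    {"name": "living room", "center": (5.0, 2.0), "size": 4.0},
    {"name": "garage", "center": (12.0, 8.0), "size": 5.0},
]
best = select_shooting_location(areas, user_position=(1.5, 1.0), user_height_m=1.8)
print(best["name"])  # living room
```

In this example the bedroom is closest to the user but smaller than the threshold size, so the nearest area that is large enough is chosen instead.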

The robot 100 may output a movement message to the user (S315). Specifically, the robot 100 may output a message directing the user to move to the identified shooting location. For example, the robot 100 may output a voice message to the user, “Please move to the living room,” or may move to the shooting location while outputting a voice message to the user, “Please follow me.”

When the user is located at the shooting location, the robot 100 can acquire the shooting path (S320). Specifically, the robot 100 may acquire a shooting path based on the user's location after the user moves. For example, the robot 100 may obtain a circular shooting path that maintains a certain distance from the user, centered on the point where the user is located. At this time, the distance may be determined according to user information (e.g., the user's height). For example, as shown in FIG. 4B, the robot 100 may obtain a circular shooting path 420 that maintains a certain distance R from the user 10.
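
A circular path of this kind can be sketched as a list of waypoints at distance R from the user. The sketch below is illustrative only: the waypoint count, the heading convention, and the height-to-distance rule in `radius_for_height` are assumptions introduced here, not values from the disclosure.

```python
import math


def circular_path(user_xy, radius, num_waypoints=36):
    """Return (x, y, heading) waypoints on a circle around the user.

    The heading (radians) points from each waypoint toward the user so that
    the camera keeps facing the subject.
    """
    ux, uy = user_xy
    waypoints = []
    for i in range(num_waypoints):
        theta = 2.0 * math.pi * i / num_waypoints
        x = ux + radius * math.cos(theta)
        y = uy + radius * math.sin(theta)
        heading = math.atan2(uy - y, ux - x)
        waypoints.append((x, y, heading))
    return waypoints


def radius_for_height(user_height_m, factor=1.2):
    """Hypothetical rule: keep a distance proportional to the user's height."""
    return factor * user_height_m


path = circular_path(user_xy=(2.0, 3.0), radius=radius_for_height(1.7),
                     num_waypoints=8)
for x, y, heading in path:
    print(f"x={x:.2f} y={y:.2f} heading={math.degrees(heading):.1f} deg")
```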

At this time, the robot 100 may photograph the user while moving along the photographing path and correct the photographing path based on the acquired image. For example, if the size of the user included in the acquired image is outside a threshold range, the robot 100 may correct the photographing path. That is, if the size of the user in the acquired image exceeds the threshold range, the robot 100 may correct the photographing path to increase the distance from the user. If the size of the user in the acquired image falls below the threshold range, the robot 100 may correct the photographing path to reduce the distance.
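
The path correction can be sketched as a simple rule on the user's apparent size. This is a hedged sketch: measuring size as the ratio of the user's height to the image height, the range limits, and the step size are all assumptions, since the disclosure does not specify them.

```python
def corrected_distance(distance, user_height_ratio, low=0.6, high=0.9, step=0.2):
    """Adjust the robot-to-user distance from the user's apparent size.

    `user_height_ratio` is the user's height in the image divided by the image
    height. The range [low, high] and the step (meters) are hypothetical.
    """
    if user_height_ratio > high:   # user appears too large: move farther away
        return distance + step
    if user_height_ratio < low:    # user appears too small: move closer
        return max(step, distance - step)
    return distance                # within range: keep the current path


print(corrected_distance(2.0, 0.95))  # larger radius
print(corrected_distance(2.0, 0.40))  # smaller radius
print(corrected_distance(2.0, 0.75))  # unchanged
```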

The robot 100 may acquire a plurality of images by photographing the user at preset photographing intervals while moving along the photographing path (S325). At this time, the preset shooting interval may be determined by initial settings. For example, as shown in FIG. 4C, the robot 100 may acquire five captured images at the first to fifth photographing points 430-1 to 430-5 while moving along the photographing path 420. At this time, each of the five captured images may be stored together with information about its shooting angle relative to the front of the user. Meanwhile, the robot 100 may acquire a plurality of images from a plurality of shooting positions determined based on user information (e.g., the user's height) rather than at a preset shooting interval.
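
As a rough sketch of interval-based capture, the snippet below assumes the preset interval is an angular step around the user and that the five photographing points are evenly spaced, which gives a 72-degree interval. Both points are assumptions, and `capture` is a hypothetical stand-in for the camera call.

```python
def shooting_angles(interval_deg):
    """Angles (degrees from the user's front) at which images are taken on one lap."""
    if interval_deg <= 0 or interval_deg > 360:
        raise ValueError("interval_deg must be in (0, 360]")
    count = int(360 // interval_deg)
    return [i * interval_deg for i in range(count)]


def capture_lap(interval_deg, capture):
    """Call `capture(angle)` at every shooting angle and keep the angle with each image."""
    return [{"angle": angle, "image": capture(angle)}
            for angle in shooting_angles(interval_deg)]


# `capture` is a placeholder that returns a label instead of a real image.
shots = capture_lap(72, capture=lambda angle: f"image_at_{angle}")
print([s["angle"] for s in shots])  # [0, 72, 144, 216, 288]
```

Keeping the shooting angle with each image makes it possible to later project the 3D model at the same angle for comparison.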

The robot 100 may generate a 3D model (or basic 3D model) for the user based on the plurality of acquired images (S330). At this time, the robot 100 may create a 3D model for the user 10 using photogrammetry techniques or vision techniques.

Additionally, the robot 100 may acquire a plurality of projection images to inspect the quality of the generated 3D model (S335). At this time, the plurality of projection images may be images obtained by projecting the 3D model into 2D at angles corresponding to the shooting angles at which the plurality of images were captured. For example, as shown in FIG. 5A, when the first image 510 is an image taken at a first shooting angle (e.g., 72 degrees from the front), the robot 100 may obtain the first projection image 520 by projecting the 3D model 20 at the first angle (e.g., 72 degrees from the front). In this way, the robot 100 can acquire a plurality of projection images corresponding to each of the first to fifth images captured at the photographing points 430-1 to 430-5.

The robot 100 can then obtain the quality value of the 3D model based on the plurality of images and the plurality of projection images (S340). Specifically, the robot 100 may compare the plurality of images with the plurality of projection images corresponding to the plurality of images. At this time, a pixel difference value can be calculated by comparing pixels of a plurality of first feature points included in a first image, taken at a first angle with respect to the front of the user, among the plurality of images with pixels of a plurality of second feature points included in a first projection image, obtained by projecting the 3D model in the first angle direction, among the plurality of projection images. For example, for a plurality of feature points (a point on the head, a point on the left shoulder, a point on the left hand, and a point on the left foot) of the first image 510 and the same plurality of feature points of the corresponding first projection image 520, the robot 100 can calculate the pixel difference value between corresponding pixels. The robot 100 can then obtain the quality value of the image used to create the 3D model based on the pixel difference value. Specifically, the robot 100 can obtain a lower quality value as the pixel difference value increases, and a higher quality value as the pixel difference value decreases. In the same manner, the robot 100 can obtain the quality value of the 3D model by obtaining the quality value for each of the plurality of images used to generate the 3D model.

However, this is only an example, and of course, the robot 100 can obtain the quality value of the 3D model by measuring the similarity between the image and the projected image. Alternatively, the robot 100 may obtain the quality value of the 3D model using a neural network model learned to obtain the quality value by inputting an image and a projection image. That is, the robot 100 can obtain the quality value of the 3D model by inputting an image and a projection image into the neural network model.
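
The feature-point comparison described above can be sketched as follows. This sketch makes several assumptions that the disclosure leaves open: the pixel difference is taken as the distance between the pixel coordinates of matching feature points, the mapping from difference to quality uses a hypothetical `scale` parameter, and the model quality is the plain average of the per-image values. The coordinates are invented for illustration.

```python
import math


def pixel_difference(image_points, projection_points):
    """Mean pixel distance between matching feature points of an image and its projection."""
    names = image_points.keys() & projection_points.keys()
    if not names:
        raise ValueError("no common feature points")
    total = sum(math.dist(image_points[n], projection_points[n]) for n in names)
    return total / len(names)


def image_quality(difference, scale=10.0):
    """Hypothetical mapping to (0, 1]: larger pixel differences give lower quality."""
    return 1.0 / (1.0 + difference / scale)


def model_quality(pairs):
    """Average the per-image quality values over all (image, projection) pairs."""
    return sum(image_quality(pixel_difference(i, p)) for i, p in pairs) / len(pairs)


# Illustrative feature-point coordinates (assumed values, in pixels).
first_image = {"head": (100, 20), "left_shoulder": (80, 60),
               "left_hand": (70, 120), "left_foot": (85, 200)}
first_projection = {"head": (103, 24), "left_shoulder": (80, 60),
                    "left_hand": (76, 128), "left_foot": (85, 200)}
print(pixel_difference(first_image, first_projection))
print(model_quality([(first_image, first_projection)]))
```

Any monotonically decreasing mapping would satisfy the stated relationship between pixel difference and quality; the reciprocal form is used here only because it is bounded and simple.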

The robot 100 may determine whether the quality value is less than the threshold (S345).

If the quality value is greater than or equal to the threshold (S345-N), the robot 100 may create an avatar using a previously created 3D model.

If the quality value is less than the threshold (S345-Y), the robot 100 may reset the robot's shooting interval or shooting position (S350). At this time, the robot 100 may reset the robot's shooting interval or shooting position based on the size of the error between the image and the projected image. In other words, the robot 100 can reset the shooting interval so that it is narrowed at a shooting angle where the error between the image and the projected image is large, and widened at a shooting angle where the error between the image and the projected image is small. Alternatively, the robot 100 may reset the capturing position to re-photograph at an angle where the error between the image and the projected image is large.

For example, as shown in FIG. 6, the robot 100 may reset the shooting interval or shooting position to perform rephotography at seven points 610-1 to 610-7 on the shooting path.
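
The resetting of the shooting interval in step S350 can be illustrated with a short sketch. The two error thresholds and the halving and doubling factors below are hypothetical values chosen for illustration; the disclosure only states that the interval is narrowed where the error is large and widened where it is small.

```python
def reset_intervals(errors_by_angle, base_interval, high_error, low_error):
    """Return a new shooting interval for each shooting angle.

    errors_by_angle maps a shooting angle to the error between the image
    and the projection image at that angle (all values are assumptions).
    """
    new_intervals = {}
    for angle, error in errors_by_angle.items():
        if error >= high_error:
            new_intervals[angle] = base_interval / 2   # large error: narrow
        elif error <= low_error:
            new_intervals[angle] = base_interval * 2   # small error: widen
        else:
            new_intervals[angle] = base_interval       # otherwise unchanged
    return new_intervals

def rephotograph_angles(errors_by_angle, high_error):
    """Angles at which the user should be photographed again."""
    return sorted(a for a, e in errors_by_angle.items() if e >= high_error)

errors = {0: 1.0, 60: 8.0, 120: 4.0}
print(reset_intervals(errors, 30, high_error=6.0, low_error=2.0))
print(rephotograph_angles(errors, high_error=6.0))
```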

The robot 100 may acquire a plurality of reset images by photographing the user at the reset photographing interval or photographing position while moving along the photographing path (S355).

Then, the robot 100 may update the 3D model based on the plurality of reset images (S360). Specifically, the robot 100 may update the existing 3D model by regenerating the 3D model based on a plurality of reset images.

Additionally, the robot 100 may repeatedly perform steps S335 to S345 to check the quality of the updated 3D model.
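
The repetition of the quality check and the update can be viewed as a loop. The sketch below is an assumption-laden outline: `measure_quality`, `recapture`, and `rebuild` are hypothetical callables standing in for steps S335 to S345, S350 to S355, and S360 respectively, and the `max_rounds` cap is an added safeguard that the disclosure does not mention.

```python
def refine_model(model, measure_quality, recapture, rebuild, threshold, max_rounds=5):
    """Repeat capture and rebuild until the quality reaches the threshold."""
    for _ in range(max_rounds):
        if measure_quality(model) >= threshold:
            break                      # quality is sufficient, keep the model
        images = recapture(model)      # photograph again at reset interval/position
        model = rebuild(images)        # regenerate the 3D model from new images
    return model

# Toy stand-ins: the "model" is just a number that doubles as its quality.
result = refine_model(
    0,
    measure_quality=lambda m: m,
    recapture=lambda m: m + 1,
    rebuild=lambda images: images,
    threshold=3,
)
print(result)  # 3
```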

FIG. 7 is a flowchart illustrating a method of generating a 3D model for a user while operating in a second 3D model creation mode, according to an embodiment of the present disclosure.

First, the robot 100 may enter the second 3D model creation mode (S710). Specifically, when a 3D model creation request is received from a user, the robot 100 may enter the second 3D model creation mode. At this time, the robot 100 (or user terminal) may display icons corresponding to each of the first 3D model creation mode and the second 3D model creation mode, and may enter the 3D model creation mode according to the selected icon. Meanwhile, when entering the second 3D model creation mode, the robot 100 does not immediately photograph the user, but may photograph the user while driving to perform other functions of the robot 100 (e.g., delivery function, cleaning function, etc.).

After entering the second 3D model creation mode, the robot 100 can recognize the user while driving (S720). At this time, the robot 100 can recognize the user through face recognition and voice recognition. When the user is recognized, the robot 100 can obtain information about the user's location. At this time, information about the user's location may include lighting information of the area where the user is located. Additionally, the robot 100 can track a recognized user.

The robot 100 may estimate the shooting location of the robot 100 based on information about the recognized user (e.g., current location and user posture, etc.) (S730). At this time, of course, the robot 100 can estimate the shooting location of the robot based on the location of the robot 100, surrounding obstacle information, map information, etc. in addition to information about the user. In addition, the robot 100 can identify the direction in which the light shines or the direction with illuminance above a preset value among the areas where the user is located, and estimate the shooting position of the robot 100 so that the user can be photographed in the identified direction. Alternatively, if there is an image of an unsaved type among the plurality of types required to create a 3D model, the robot 100 can estimate the shooting position of the robot so that it can shoot at a shooting angle corresponding to the unsaved type of image. For example, in a situation where images of six types (e.g., a type taken at 0 degrees, a type taken at 60 degrees, a type taken at 120 degrees, a type taken at 180 degrees, a type taken at 240 degrees, and a type taken at 300 degrees) are required to create a 3D model, if an image of the type captured at 60 degrees does not exist, the robot 100 can estimate the capturing position of the robot 100 so that it can capture the user from the 60-degree direction.
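
As an illustration of estimating a shooting position for an unsaved type, the sketch below places the robot on a circle around the user at the missing angle. It is a simplified sketch under stated assumptions: positions are 2D map coordinates, the user's facing direction is known as a heading in degrees, the shooting distance is fixed, and obstacles and lighting are ignored. None of these details are specified in the disclosure.

```python
import math

REQUIRED_ANGLES = (0, 60, 120, 180, 240, 300)  # the six example types

def missing_angles(stored_angles, required=REQUIRED_ANGLES):
    """Types (shooting angles) for which no image has been saved yet."""
    return [a for a in required if a not in stored_angles]

def shooting_position(user_xy, user_heading_deg, angle_deg, distance):
    """Map position from which the user is seen at angle_deg from the front.

    The point lies `distance` away from the user, in the direction obtained
    by rotating the user's heading by angle_deg (assumed convention).
    """
    theta = math.radians(user_heading_deg + angle_deg)
    x = user_xy[0] + distance * math.cos(theta)
    y = user_xy[1] + distance * math.sin(theta)
    return (x, y)

stored = {0, 120, 180, 240, 300}
for angle in missing_angles(stored):
    print(angle, shooting_position((1.0, 1.0), 90.0, angle, 2.0))
```

In practice the estimated point would still have to be checked against the map and obstacle information mentioned above before the robot moves to it.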

The robot 100 can move to the estimated location (S740). At this time, the robot 100 may identify a new path to move to the estimated location from the existing path. For example, as shown in FIG. 8, when the user 10 located at the first point is recognized while driving through the first path 810, the robot 100 may identify the second point 830 as the estimated location. At this time, since the second point 830 is not included in the first path 810, the robot 100 can identify a new second path 820 to move to the estimated location.

The robot 100 may acquire at least one image through the camera 120 at the estimated location (S750). After acquiring at least one image, the robot 100 may travel along the second path 820 to perform an existing function.

The robot 100 may analyze at least one image and classify the type of at least one image (S760). Specifically, the robot 100 may analyze at least one acquired image to obtain information about the direction in which the at least one image was captured or information about the user's body part included in the at least one image. And, the robot 100 can classify the at least one image into one type among a plurality of pre-stored types based on information about the direction in which the at least one image was taken or information about the user's body part included in the at least one image. Additionally, the robot 100 may classify the captured image based on the user's illumination level and skin tone. At this time, the robot 100 may analyze at least one image and classify the type of at least one image, and may also estimate information about the posture and position of the camera at the time of shooting and store it in a database.
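
Classification by shooting direction can be sketched as assigning an image to the nearest preset angle. This is an assumed rule for illustration only; the disclosure does not state how a measured direction is mapped to a type, and the function name and the six angles (taken from the example types) are hypothetical.

```python
def classify_by_angle(shooting_angle_deg, type_angles=(0, 60, 120, 180, 240, 300)):
    """Return the preset type angle closest to the measured shooting angle.

    Distances are measured around the circle, so 350 degrees is treated
    as 10 degrees away from the 0-degree type.
    """
    angle = shooting_angle_deg % 360

    def circular_gap(type_angle):
        gap = abs(angle - type_angle) % 360
        return min(gap, 360 - gap)

    return min(type_angles, key=circular_gap)

print(classify_by_angle(55))   # 60
print(classify_by_angle(350))  # 0
```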

The robot 100 may store information about at least one acquired image and the type of the image in a database (S770).

The robot 100 may determine whether a plurality of preset types of images are stored in the database (S780). Specifically, the robot 100 may determine whether all of the plurality of preset types of images needed to create a 3D model for the user have been stored. For example, the robot 100 may determine whether images of all six types needed to create a 3D model (e.g., a type taken at 0 degrees, a type taken at 60 degrees, a type taken at 120 degrees, a type taken at 180 degrees, a type taken at 240 degrees, and a type taken at 300 degrees) have been saved. At this time, the robot 100 may determine whether a preset number of images (for example, three) are stored for each type.
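
Steps S770 and S780 can be sketched with a small in-memory store. The class below is hypothetical and stands in for the database; it only tracks how many images exist per type and reports which types are still short of the preset number.

```python
class ImageDatabase:
    """Hypothetical in-memory stand-in for the image database."""

    def __init__(self, required_types, images_per_type=1):
        self.required_types = tuple(required_types)
        self.images_per_type = images_per_type
        self._images = {}  # type -> list of stored images

    def store(self, image_type, image):
        # S770: store the image together with its classified type.
        self._images.setdefault(image_type, []).append(image)

    def missing_types(self):
        # Types that do not yet hold the preset number of images.
        return [t for t in self.required_types
                if len(self._images.get(t, ())) < self.images_per_type]

    def is_complete(self):
        # S780: every required type holds enough images.
        return not self.missing_types()

db = ImageDatabase((0, 60, 120), images_per_type=2)
for image_type in (0, 0, 60, 120, 120):
    db.store(image_type, f"image_at_{image_type}")
print(db.missing_types(), db.is_complete())
```

The list returned by `missing_types` is what a later pass through step S730 would use to pick the next shooting angle.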

When a plurality of preset types of images are stored in the database (S780-Y), the robot 100 can create a 3D model using the images stored in the database (S790). At this time, the robot 100 may create a 3D model for the user 10 using photogrammetry techniques or vision techniques.

If the plurality of preset types of images are not stored in the database (S780-N), the robot 100 may repeat steps S720 to S780 to acquire the user's image again. At this time, in step S730, the robot 100 may estimate the shooting position of the robot so that it can shoot at a shooting angle corresponding to a type of image that is not stored.

Meanwhile, in the above-described embodiment, it was explained that the type of image is classified according to the user's shooting angle, but this is only an example, and the type of image may be classified according to the body part of the user included in the image. For example, the robot 100 may classify images into a first type including the user's face and torso, a second type including the user's left side profile, left arm, and left leg, a third type including the user's back and buttocks, and a fourth type including the user's right side profile, right arm, and right leg.

FIG. 9 is a flowchart illustrating a method of updating a 3D model according to the quality of the 3D model while operating in a second 3D model creation mode, according to an embodiment of the present disclosure.

First, the robot 100 can acquire a plurality of projection images (S910). At this time, the plurality of projection images may be images obtained by projecting a 3D model into 2D at an angle corresponding to the type of the plurality of images stored in the database.

Additionally, the robot 100 may compare a plurality of types of images and a plurality of projection images corresponding to the plurality of types of images (S920). At this time, the robot 100 may compare a plurality of first feature points included in a first image (i.e., a first type of image), taken at a first angle based on the front of the user among the plurality of images, with a plurality of second feature points included in a first projection image, obtained by projecting the 3D model in the first angular direction among the plurality of projection images. In this way, the robot 100 can compare a plurality of first feature points included in a plurality of types of images with a plurality of second feature points included in the corresponding projection image.

The robot 100 may obtain the quality value of the 3D model based on the comparison result (S930). Specifically, the robot 100 may compare a plurality of images and a plurality of projection images corresponding to the plurality of images. For example, the robot 100 can calculate the pixel difference value between the pixels corresponding to a plurality of feature points (a point on the head, a point on the left shoulder, a point on the left hand, and a point on the left foot) of the first type of image and a plurality of feature points (a point on the head, a point on the left shoulder, a point on the left hand, and a point on the left foot) of the first projection image corresponding to the first type of image. And, the robot 100 can obtain the quality value of the image used to create the 3D model based on the pixel difference value. Specifically, the robot 100 can obtain a lower quality value as the pixel difference value increases, and obtain a higher quality value as the pixel difference value decreases. In the same manner as described above, the robot 100 can obtain the quality value of the 3D model by obtaining quality values for each of a plurality of types of images used to generate the 3D model.

The robot 100 may determine whether the quality value is less than the threshold (S940).

If the quality value is greater than or equal to the threshold (S940-N), the robot 100 can create an avatar using an existing 3D model.

If the quality value is less than the threshold (S940-Y), the robot 100 may reset the robot's shooting position (S950). At this time, the robot 100 may reset the robot's shooting interval or shooting position based on the size of the error between the image and the projected image. In other words, the robot 100 can reset the capturing position to re-photograph at an angle where the error between the image and the projected image is large. Accordingly, while performing another function, the robot 100 can estimate the shooting position by changing the driving path to retake the image at an angle where the error between the image and the projected image is large.

The robot 100 may obtain a plurality of reset images by photographing the user at the reset shooting location (S960).

And, the robot 100 may update the 3D model based on the plurality of reset images (S970). Specifically, the robot 100 may update the existing 3D model by regenerating the 3D model based on a plurality of reset images.

That is, the robot 100 analyzes the quality of the existing 3D model and estimates the position and direction of the camera that can improve the quality of the 3D model. While driving to perform other functions, the robot 100 can determine, for the tracked user, a location and direction that can improve the quality of the 3D model, and update (or improve) the 3D model with images obtained by shooting at that location and in that direction. Additionally, the robot 100 can continuously improve the 3D model for the user by repeating the above-described process while driving.

Additionally, the robot 100 can recognize a user while driving and obtain information about the face or motion of the recognized user. Additionally, the robot 100 may generate the facial expression or motion of the 3D model based on the acquired information about the user's face or motion. This will be described in more detail with reference to FIGS. 10 and 11.

FIG. 10 is a flowchart illustrating a method for generating facial expressions of a 3D model, according to an embodiment of the present disclosure.

First, the robot 100 can recognize the user (S1010). Specifically, the robot 100 may recognize the user while driving while performing other functions of the robot 100. Alternatively, the robot 100 may recognize the user while operating in the first 3D model creation mode or the second 3D model creation mode. At this time, the robot 100 can recognize the user by recognizing the user's face, iris, voice, etc.

The robot 100 may acquire a facial image by photographing the user's face (S1020). Specifically, the robot 100 may obtain a face image of a recognized user by photographing the user.

The robot 100 may analyze the face image and obtain the expression type corresponding to the face image (S1030). At this time, the robot 100 may acquire information about the type of expression corresponding to the facial image by analyzing feature points (eg, eyes, nose, mouth, etc.) included in the facial image. Alternatively, the robot 100 may obtain information about the type of expression corresponding to the face image by inputting the face image into a learned neural network model.

The robot 100 may store information about the acquired facial image and expression type in a database (S1040). At this time, the robot 100 may store not only facial expressions but also additional information such as facial skin tone.

The robot 100 can generate the facial expression of the 3D model based on the facial image for each expression type stored in the database (S1050). At this time, the robot 100 can finally generate the facial expression of the 3D model according to the user input. Meanwhile, the robot 100 can generate facial expressions of a 3D model using photogrammetry techniques or vision techniques.

FIG. 11 is a flowchart illustrating a method for generating motion of a 3D model, according to an embodiment of the present disclosure.

First, the robot 100 can recognize the user (S1110). Specifically, the robot 100 may recognize the user while driving while performing other functions of the robot 100. Alternatively, the robot 100 may recognize the user while operating in the first 3D model creation mode or the second 3D model creation mode. At this time, the robot 100 can recognize the user by recognizing the user's face, iris, voice, etc.

The robot 100 may acquire a motion image by photographing the user (S1120). At this time, the robot 100 may obtain a motion image by filming the user's motion while moving.

The robot 100 may analyze the motion image and obtain a motion type corresponding to the motion image (S1130). At this time, the robot 100 may acquire information about the motion type corresponding to the motion image by analyzing feature points (eg, arms, legs, torso, face, etc.) included in the motion image. Alternatively, the robot 100 may obtain information about the motion type corresponding to the motion image by inputting the motion image into a learned neural network model.

The robot 100 may store information about the acquired motion image and motion type in a database (S1140).

The robot 100 may generate motion of a 3D model based on motion images for each motion type stored in the database (S1150). At this time, the robot 100 may apply the user's motion to the 3D model according to the motion searched by the user. Meanwhile, the robot 100 can apply the user's motion to the 3D model using motion capture-related graphics techniques.

Meanwhile, functions related to artificial intelligence (eg, learning function and inference function for a neural network model) according to the present disclosure are operated through at least one processor and memory of the robot.

The processor may consist of one or multiple processors. At this time, one or more processors may include at least one of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), and a Neural Processing Unit (NPU), but are not limited to the examples of the processors described above.

CPU is a general-purpose processor that can perform not only general calculations but also artificial intelligence calculations, and can efficiently execute complex programs through a multi-layer cache structure. CPUs are advantageous for serial processing, which allows organic connection between previous and next calculation results through sequential calculations. The general-purpose processor is not limited to the above-described examples, except where specified as the above-described CPU.

GPU is a processor for large-scale operations such as floating-point operations used in graphics processing, and can perform large-scale operations in parallel by integrating a large number of cores. In particular, GPUs may be more advantageous than CPUs in parallel processing methods such as convolution operations. Additionally, the GPU can be used as a co-processor to supplement the functions of the CPU. The processor for mass computation is not limited to the above-described example, except for the case specified as the above-described GPU.

NPU is a processor specialized in artificial intelligence calculations using artificial neural networks, and each layer that makes up the artificial neural network can be implemented in hardware (e.g., silicon). At this time, the NPU is designed specifically according to the company's requirements, so it has a lower degree of freedom than a CPU or GPU, but can efficiently process artificial intelligence calculations requested by the company. Meanwhile, as a processor specialized for artificial intelligence calculations, NPU can be implemented in various forms such as TPU (Tensor Processing Unit), IPU (Intelligence Processing Unit), and VPU (Vision processing unit). The artificial intelligence processor is not limited to the examples described above, except where specified as the NPU described above.

Additionally, one or more processors may be implemented as a System on Chip (SoC). At this time, in addition to one or more processors, the SoC may further include memory and a network interface such as a bus for data communication between the processor and memory.

If the SoC (System on Chip) included in the robot 100 includes a plurality of processors, the robot 100 can perform artificial intelligence-related operations (for example, operations related to learning or inference) using some of the plurality of processors. For example, the robot 100 can perform artificial intelligence-related operations using at least one of a GPU, NPU, VPU, TPU, or hardware accelerator specialized for artificial intelligence operations such as convolution operations and matrix multiplication operations among the plurality of processors. However, this is only an example, and of course, calculations related to artificial intelligence can be processed using general-purpose processors such as CPUs.

Additionally, the robot 100 may perform calculations on functions related to artificial intelligence using multiple cores (eg, dual core, quad core, etc.) included in one processor. In particular, the robot 100 can perform artificial intelligence operations such as convolution operations and matrix multiplication operations in parallel using multi-cores included in the processor.

One or more processors control input data to be processed according to predefined operation rules or artificial intelligence models stored in memory. Predefined operation rules or artificial intelligence models are characterized by being created through learning.

Here, being created through learning means that a predefined operation rule or artificial intelligence model with desired characteristics is created by applying a learning algorithm to a large number of learning data. This learning may be performed on the device itself that performs the artificial intelligence according to the present disclosure, or may be performed through a separate server/system.

An artificial intelligence model may be composed of multiple neural network layers. At least one layer has at least one weight value, and the operation of the layer is performed using the operation result of the previous layer and at least one defined operation. Examples of neural networks include Convolutional Neural Network (CNN), Deep Neural Network (DNN), Recurrent Neural Network (RNN), Restricted Boltzmann Machine (RBM), Deep Belief Network (DBN), Bidirectional Recurrent Deep Neural Network (BRDNN), Deep Q-Networks, and Transformer, and the neural network in this disclosure is not limited to the above-described examples except where specified.
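
The statement that each layer operates on the result of the previous layer can be illustrated with a minimal forward pass. The sketch assumes fully connected layers with a ReLU activation and made-up weights; it is a generic illustration and does not represent any particular model used by the robot 100.

```python
def dense_layer(inputs, weights, biases):
    """One fully connected layer followed by ReLU.

    weights[j][i] is the weight from input i to output j.
    """
    outputs = []
    for row, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(max(0.0, z))  # ReLU: the defined operation of this layer
    return outputs

def forward(inputs, layers):
    """Feed the result of each layer into the next one."""
    activation = inputs
    for weights, biases in layers:
        activation = dense_layer(activation, weights, biases)
    return activation

# Two small layers with hypothetical weight values.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 1.0]),
    ([[1.0, 1.0]], [-1.0]),
]
print(forward([2.0, 1.0], layers))  # [2.5]
```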

A learning algorithm is a method of training a target device (e.g., a robot) using a large number of learning data so that the target device can make decisions or make predictions on its own. Examples of learning algorithms include supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning, and the learning algorithm in the present disclosure is not limited to the examples described above except where specified.

Additionally, methods according to various embodiments of the present disclosure may be included and provided in a computer program product. Computer program products are commodities and can be traded between sellers and buyers. The computer program product may be distributed in the form of a machine-readable storage medium (e.g., compact disc read only memory (CD-ROM)), or may be distributed online (e.g., downloaded or uploaded) through an application store (e.g., Play Store™) or directly between two user devices (e.g., smartphones). In the case of online distribution, at least a portion of the computer program product (e.g., a downloadable app) may be at least temporarily stored or temporarily created in a machine-readable storage medium, such as the memory of a manufacturer's server, an application store's server, or a relay server.

Methods according to various embodiments of the present disclosure may be implemented as software including instructions stored in a storage medium that can be read by a machine (e.g., a computer). The machine is a device capable of calling stored instructions from the storage medium and operating according to the called instructions, and may include an electronic device (e.g., the robot 100) according to the disclosed embodiments.

Meanwhile, a storage medium that can be read by a device may be provided in the form of a non-transitory storage medium. Here, 'non-transitory storage medium' only means that it is a tangible device and does not contain signals (e.g., electromagnetic waves), and this term does not distinguish between cases where data is stored semi-permanently in a storage medium and cases where it is stored temporarily. For example, a 'non-transitory storage medium' may include a buffer where data is temporarily stored.

When the instruction is executed by a processor, the processor may perform the function corresponding to the instruction directly or using other components under the control of the processor. Instructions may contain code generated or executed by a compiler or interpreter.

In the above, preferred embodiments of the present disclosure have been shown and described, but the present disclosure is not limited to the specific embodiments described above. Of course, various modifications can be made by those of ordinary skill in the technical field to which the disclosure pertains without departing from the gist of the disclosure as claimed in the claims, and these modifications should not be understood separately from the technical ideas or perspectives of the present disclosure.

Claims

In a robot that creates a 3D model,

A traveling unit;

At least one camera;

output unit;

Memory to store maps; and

At least one processor connected to the traveling unit, the at least one camera, the output unit, and the memory, and controlling the robot,

The at least one processor,

Upon entering the first mode for creating a 3D model for the user, identifying a shooting location for generating a 3D model for the user based on the map,

Controlling the output unit to output a message to the user to move to the identified shooting location,

When the user is located at the shooting location, a shooting path for shooting the user is obtained,

Obtaining a plurality of images by photographing the user through the camera at preset photographing intervals while moving along the photographing path through the traveling unit,

A robot that generates a 3D model for the user based on the acquired plurality of images.

According to claim 1,

The at least one processor,

Obtaining a plurality of projection images by projecting the generated 3D model into 2D,

Comparing the plurality of images with a plurality of projection images corresponding to the plurality of images,

A robot that measures the quality value of the 3D model based on the comparison results.

According to claim 2,

The at least one processor,

Calculating a pixel difference value by comparing pixels of a plurality of first feature points included in a first image taken at a first angle with respect to the front of the user among the plurality of images with pixels of a plurality of second feature points included in a first projection image obtained by projecting the 3D model in the first angle direction among the plurality of projection images,

A robot that measures the quality value of the 3D model based on the calculated pixel difference value.

According to claim 3,

The at least one processor,

If the quality value of the 3D model is less than the threshold, reset the shooting interval or shooting position of the robot,

Obtaining a plurality of reset images by photographing the user at the reset photographing interval or photographing position while moving along the photographing path through the traveling unit,

A robot that updates a 3D model for the user based on the acquired plurality of reset images.

According to claim 1,

The at least one processor,

Upon entering the second mode of generating a 3D model for the user, the robot recognizes the user while driving,

Estimate the shooting location of the robot based on information about the recognized user's location,

Controlling the traveling unit to move the robot to the estimated position,

Obtaining at least one image through the camera at the estimated location,

A robot that analyzes the at least one image, classifies the type of the at least one image, and stores it in a database.

According to claim 5,

The at least one processor,

A robot that, when a plurality of preset types of images are stored in the database, creates a 3D model for the user based on the plurality of types of stored images.

According to claim 6,

The at least one processor,

A robot that, if there is an image of an unstored type among the plurality of types, estimates the shooting position of the robot so that it can shoot at a shooting angle corresponding to the image of the unstored type.

According to claim 6,

The at least one processor,

Obtaining a plurality of projection images by projecting the generated 3D model into 2D,

Comparing the plurality of types of images with a plurality of projection images corresponding to the plurality of types of images,

Based on the comparison results, the quality value of the 3D model is measured,

If the quality value of the 3D model is less than the threshold, reset the shooting position of the robot,

Obtaining a plurality of reset images by photographing the user at the reset shooting position through the traveling unit,

A robot that updates a 3D model for the user based on the acquired plurality of reset images.

According to claim 1,

The at least one processor,

When the user is recognized, obtain a facial image by photographing the user's face,

Analyzing the acquired facial image to obtain an expression type corresponding to the acquired facial image,

Store information about the face image and the expression type in a database,

A robot that generates facial expressions of a 3D model based on facial images for each expression type stored in the database.

According to claim 1,

The at least one processor,

When the user is recognized, obtain a motion image by photographing the user's motion,

Analyzing the acquired motion image to obtain a motion type corresponding to the acquired motion image,

Store information about the motion image and the motion type in a database,

A robot that generates motion of a 3D model based on motion images for each motion type stored in the database.

In the control method of a robot that creates a 3D model,

Upon entering a first mode for generating a 3D model for a user, identifying a shooting location for generating a 3D model for the user based on a map stored in the robot;

Outputting a message to the user to move to the identified shooting location;

When the user is located at the photographing location, obtaining a photographing path for photographing the user;

acquiring a plurality of images by photographing the user at preset photographing intervals while moving along the photographing path; and

A control method comprising: generating a 3D model for the user based on the acquired plurality of images.

According to claim 11,

The control method is,

acquiring a plurality of projection images by projecting the generated 3D model into 2D;

Comparing the plurality of images and a plurality of projection images corresponding to the plurality of images;

A control method comprising: measuring a quality value of the 3D model based on the comparison result.

According to claim 12,

The comparison step is,

Calculating a pixel difference value by comparing pixels of a plurality of first feature points included in a first image taken at a first angle with respect to the front of the user among the plurality of images with pixels of a plurality of second feature points included in a first projection image obtained by projecting the 3D model in the first angle direction among the plurality of projection images,

The step of measuring the quality value of the 3D model is,

A control method for measuring the quality value of the 3D model based on the calculated pixel difference value.

According to claim 13,

The control method is,

If the quality value of the 3D model is less than a threshold, resetting the shooting interval or shooting position of the robot;

acquiring a plurality of reset images by photographing the user at the reset photographing interval or photographing position while moving along the photographing path through the traveling unit; and

A control method comprising: updating a 3D model for the user based on the obtained plurality of reset images.

According to claim 11,

The control method is,

Upon entering a second mode of generating a 3D model for the user, the robot recognizes the user while driving;

estimating the shooting location of the robot based on information about the recognized user's location;

moving to the estimated location;

acquiring at least one image at the estimated location;

A control method comprising: analyzing the at least one image, classifying the type of the at least one image, and storing the type in a database.