WO2023189601A1 - Information processing device, recording medium, and information processing method - Google Patents

Information processing device, recording medium, and information processing method

Info

Publication number
WO2023189601A1
WO2023189601A1 (PCT/JP2023/010101)
Authority
WO
WIPO (PCT)
Prior art keywords
parts
character
information processing
processing device
model
Prior art date
Application number
PCT/JP2023/010101
Other languages
French (fr)
Japanese (ja)
Inventor
Koji Sato (康次 佐藤)
Original Assignee
Sony Group Corporation (ソニーグループ株式会社)
Application filed by Sony Group Corporation
Publication of WO2023189601A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20 Finite element generation, e.g. wire-frame surface description, tesselation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics

Definitions

  • the present disclosure relates to an information processing device, a recording medium, and an information processing method.
  • a method is known in which a 3D model is created by combining multiple parts.
  • as a system for managing parts of a 3D model, for example, a system is known that extracts data on parts and searches for parts using the data.
  • the system described above extracts part data from a 3D model that has already been divided into parts. In this way, the above-described system assumes that parts have already been separated from the 3D model, and does not consider how to separate the parts from the 3D model.
  • analysis of 3D data such as 3D models requires complex processing and a large amount of calculation, and is therefore highly difficult. Such analysis of 3D data is used to separate a 3D model into parts. For this reason, it has been difficult to perform the process of separating a 3D model into parts easily (for example, in a short time or with high precision). It is therefore desired that 3D models can be analyzed more easily.
  • the present disclosure provides a mechanism that allows 3D models to be analyzed more easily.
  • the information processing device of the present disclosure includes a control unit.
  • the control unit obtains a three-dimensional model of the character.
  • the control unit performs image recognition processing on an image of the three-dimensional model seen from a virtual viewpoint to estimate the positions of the character's parts.
  • the control unit estimates a part area in the three-dimensional model of the character based on the three-dimensional model and the part position.
  • FIG. 1 is a diagram for explaining an overview of a 3D model analysis process according to an embodiment of the present disclosure.
  • FIG. 2 is a block diagram illustrating a configuration example of an information processing device according to an embodiment of the present disclosure.
  • FIG. 3 is a diagram for explaining an example of rendering processing by a rendering unit according to an embodiment of the present disclosure.
  • FIG. 4 is a diagram illustrating an example of part position estimation by a position estimation unit according to an embodiment of the present disclosure.
  • FIG. 5 is a diagram illustrating another example of part position estimation by the position estimation unit according to the embodiment of the present disclosure.
  • FIG. 6 is a diagram for explaining an example of a parts region estimation process by a region estimation unit according to an embodiment of the present disclosure.
  • FIG. 7 is a diagram for explaining an example of a correction process performed by a region estimation unit according to an embodiment of the present disclosure.
  • FIG. 8 is a diagram for explaining an example of wobbling detection performed by a region estimation unit according to an embodiment of the present disclosure.
  • FIG. 9 is a diagram for explaining an example of a search process according to an embodiment of the present disclosure.
  • FIG. 10 is a diagram illustrating an example of a search UI image according to an embodiment of the present disclosure.
  • FIG. 11 is a diagram for explaining an example of extraction of feature amount information by a search processing unit according to an embodiment of the present disclosure.
  • FIG. 12 is a diagram for explaining an example of a search in a latent space by a search processing unit according to an embodiment of the present disclosure.
  • FIG. 13 is a diagram illustrating an example of a search result image according to an embodiment of the present disclosure.
  • FIG. 14 is a diagram for explaining an example of a search range changing process in a search processing unit according to an embodiment of the present disclosure.
  • FIG. 15 is a diagram illustrating another example of a search result image according to an embodiment of the present disclosure.
  • FIG. 16 is a flowchart showing an example of the flow of the first parts separation process according to an embodiment of the present disclosure.
  • FIG. 17 is a flowchart showing an example of the flow of the second parts separation process according to an embodiment of the present disclosure.
  • FIG. 18 is a diagram for explaining an example of correction of recognition results according to an embodiment of the present disclosure.
  • FIG. 19 is a diagram for explaining an example of a UI image showing an estimation result according to an embodiment of the present disclosure.
  • FIG. 20 is a diagram for explaining another example of a UI image showing an estimation result according to an embodiment of the present disclosure.
  • FIG. 21 is a diagram for explaining another example of a UI image showing an estimation result according to an embodiment of the present disclosure.
  • FIG. 22 is a flowchart showing an example of the flow of the third parts separation process according to an embodiment of the present disclosure.
  • FIG. 23 is a block diagram illustrating an example of a hardware configuration of an information processing device according to an embodiment.
  • One or more embodiments (including examples and modifications) described below can each be implemented independently. On the other hand, at least a portion of the plurality of embodiments described below may be implemented in combination with at least a portion of other embodiments as appropriate. These multiple embodiments may include novel features that are different from each other. Therefore, these multiple embodiments may contribute to solving mutually different objectives or problems, and may produce mutually different effects.
  • One method to simplify the production of 3D models is, for example, to store 3D models for each part in a database (DB), combine them to create the basic body of the 3D model, and finally perform finishing processing.
  • for example, a system can be considered that creates a 3D model of a character by combining parts such as the character's body parts (for example, the head, eyes, nose, ears, mouth, and body) as well as the character's costumes and accessories (for example, hats, glasses, and clothes).
  • 3D models of characters are complex and do not have unified specifications; however, if the system, for example, analyzes the shape of the 3D model and classifies each part of the character, the user can easily create the body of a desired 3D character model by combining those parts.
  • the system can analyze metadata for each part and store the parts and metadata in association with each other.
  • the system can create a database that stores metadata obtained from character features and part features in association with the parts, allowing the user to find and obtain a desired part more easily by searching this database.
  • the system analyzes the 3D model of the character, separates the parts, and extracts metadata, allowing the user to more easily create the 3D model of the character.
  • when a 3D model is expressed as mesh data, the 3D model is represented by a plurality of vertices, edges connecting the vertices, and surfaces formed by the vertices and edges.
  • the system separates a nose as a part from a 3D model of a character's face. In this case, it is not easy for the system to determine which vertex data of the face corresponds to the nose.
  • 3D models have a higher degree of freedom than 2D images, and there are no restrictions such as resolution. Therefore, the higher the quality of the 3D model, the larger the amount of data (for example, the number of vertices). Therefore, the analysis process of the 3D model becomes more complicated and the calculation load increases.
  • it has therefore not been easy for a system to analyze a 3D model in order to separate a specific part from the character's 3D model information (for example, the mesh data mentioned above) and to extract metadata characteristic of that part.
  • the information processing device according to the embodiment of the present disclosure performs image recognition processing using a rendered image of a 3D model of a character, and uses the result of the image recognition processing to narrow down the portion of the 3D model to be analyzed.
  • the information processing device then performs analysis on the narrowed-down portion. Thereby, the information processing device can more easily analyze the 3D model and can more easily separate parts from the 3D model.
  • FIG. 1 is a diagram for explaining an overview of a 3D model analysis process according to an embodiment of the present disclosure.
  • the analysis process in FIG. 1 is executed by the information processing device 100, for example.
  • the information processing device 100 first obtains 3D model information (hereinafter also referred to as 3D model) of a character (step S1). For example, the information processing device 100 acquires a 3D model of a character from a database.
  • the 3D model includes, for example, the mesh data described above.
  • the information processing device 100 performs rendering (drawing) of the character based on the acquired 3D model, and generates an image of the character viewed from a virtual viewpoint (step S2).
  • the information processing device 100 performs part image recognition processing on the generated image (step S3). Thereby, the information processing device 100 estimates the position of the part in the image. Note that the position of a part estimated by the information processing apparatus 100 based on image recognition processing is also referred to as an image recognition position. For example, in FIG. 1, the information processing device 100 performs right eye image recognition processing on the image, and estimates an area including the right eye as the image recognition position.
  • the information processing device 100 estimates the image recognition position in the 3D model based on the image recognition process (step S4). For example, the information processing device 100 estimates the position of the 3D model corresponding to the image recognition position in the image as the image recognition position in the 3D model.
  • the information processing device 100 performs part analysis of the 3D model based on the image recognition position in the 3D model, and estimates the region of the part in the 3D model (step S5).
  • the information processing device 100 estimates data corresponding to a part from among the mesh data of the 3D model as a 3D model of the parts area. For example, in FIG. 1, the information processing apparatus 100 estimates the vertex data group corresponding to the right eye as a 3D model of the right eye part.
  • the information processing device 100 extracts metadata of the parts (step S6). For example, the information processing device 100 extracts metadata based on a 3D model of a character, an image, and a 3D model of parts.
  • the information processing device 100 associates and stores parts and metadata (step S7).
  • the information processing device 100 associates the 3D model of the part whose region was estimated in step S5 with the metadata of the part extracted in step S6, and stores the 3D model in the database.
  • for example, the information processing apparatus 100 stores the 3D model of a part in a parts DB (database), and stores the metadata of the part in a metadata DB.
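  • as a minimal illustration of this association, the parts DB and the metadata DB can be modeled as two stores keyed by a common part ID; the Python sketch below is only an assumed record layout (the names PartRecord, MetadataRecord, store_part, and the example fields are hypothetical and are not defined in the present disclosure).

```python
from dataclasses import dataclass, field

@dataclass
class PartRecord:
    """Hypothetical parts-DB entry: the 3D model (mesh data) separated for one part."""
    part_id: str
    part_type: str        # e.g. "right_eye"
    character_id: str
    vertices: list        # vertex data separated from the character's mesh
    faces: list

@dataclass
class MetadataRecord:
    """Hypothetical metadata-DB entry, keyed by the same part_id."""
    part_id: str
    classification: dict = field(default_factory=dict)   # e.g. {"color": "blue"}
    feature_vector: list = field(default_factory=list)   # latent-space feature amount
    relative_info: dict = field(default_factory=dict)    # e.g. {"eye_to_face_ratio": 0.18}

parts_db: dict = {}      # part_id -> PartRecord   (parts DB)
metadata_db: dict = {}   # part_id -> MetadataRecord (metadata DB)

def store_part(part: PartRecord, meta: MetadataRecord) -> None:
    # Steps S5 to S7: the separated part and its metadata share one key, so a
    # metadata search can always be resolved back to a concrete part 3D model.
    assert part.part_id == meta.part_id
    parts_db[part.part_id] = part
    metadata_db[part.part_id] = meta
```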
  • the information processing device 100 acquires a 3D model of a character (an example of 3D model information).
  • the information processing device 100 performs image recognition processing on an image drawn based on a 3D model, in which the character is viewed from a virtual viewpoint, to estimate the positions of the character's parts.
  • the information processing device 100 estimates the part area in the 3D model of the character based on the 3D model and the part position.
  • in this way, the information processing apparatus 100 can narrow down the portion of the 3D model to be analyzed, and can more easily analyze the 3D model. Therefore, the information processing apparatus 100 can further reduce the processing load of separating the 3D model of the character into parts. Furthermore, the information processing apparatus 100 can separate parts from the 3D model of a character with higher precision.
  • through this analysis processing of the character's 3D shape, the information processing device 100 can efficiently create the character parts DB and the metadata DB used for searching for parts. By using the information processing device 100, a user can more efficiently create the body of a character's 3D model.
  • FIG. 2 is a block diagram illustrating a configuration example of the information processing device 100 according to the embodiment of the present disclosure.
  • to analyze the shape of a 3D character, which has a high degree of freedom and is difficult to process, the information processing apparatus 100 according to the embodiment of the present disclosure narrows the search space on the 3D model by using image processing on rendered images, and then performs feature analysis. Thereby, the information processing apparatus 100 can perform part area estimation and metadata extraction of the 3D model with a lower processing load through simpler processing.
  • the information processing device 100 shown in FIG. 2 includes a communication section 110, an input/output section 120, a storage section 130, and a control section 140.
  • the information processing device 100 may be a terminal device used by a user, such as a personal computer or a tablet terminal, or may be a server device placed on a network (for example, a cloud server device or a local server device).
  • the information processing device 100 includes both a control unit 140 that executes applications such as the analysis processing described above, and a storage unit 130 that includes a parts DB 133 and a metadata DB 134 and functions as storage.
  • some functions, such as the storage function of the storage unit 130 may be realized by an information processing device (for example, a server device) different from the information processing device 100 in FIG. 2.
  • the information processing device 100 in FIG. 2 has both an acquisition function that analyzes a 3D model of a character and acquires parts, and a search function that searches for parts.
  • the search function may be realized by an information processing device different from the information processing device 100 having the acquisition function.
  • Communication unit 110 is a communication interface for communicating with other devices.
  • the communication unit 110 is a LAN (Local Area Network) interface such as a NIC (Network Interface Card).
  • Communication unit 110 may be a wired interface or a wireless interface.
  • the communication unit 110 communicates with other devices under the control of the control unit 140.
  • the input/output unit 120 is a user interface for exchanging information with the user.
  • for example, the input/output unit 120 includes an operating device, such as a keyboard, a mouse, operation keys, or a touch panel, for the user to perform various operations.
  • the input/output unit 120 also includes a display device such as a liquid crystal display (LCD) or an organic EL (electroluminescence) display.
  • the input/output unit 120 may be an audio device such as a speaker or a buzzer.
  • the input/output unit 120 may be a lighting device such as an LED (Light Emitting Diode) lamp.
  • the storage unit 130 is realized by, for example, a semiconductor memory element such as a RAM (Random Access Memory), a ROM (Read Only Memory), or a flash memory, or a storage device such as a hard disk or an optical disk.
  • the storage unit 130 in FIG. 2 includes a 3D model DB 131, a log file DB 132, a parts DB 133, and a metadata DB 134.
  • the 3D model DB 131 is a database that stores 3D models of characters on which the information processing device 100 performs 3D shape analysis.
  • the log file DB 132 is a database that stores log files that hold analysis results of 3D shape analysis performed by the information processing device 100.
  • the parts DB 133 is a database that stores 3D models of character parts regions obtained by the information processing device 100 performing 3D shape analysis.
  • the metadata DB 134 is a database that stores metadata corresponding to parts areas.
  • the storage unit 130 stores the 3D model and metadata of the parts area in association with each other.
  • the control unit 140 is a controller that controls each unit of the information processing device 100.
  • the control unit 140 is realized by, for example, a processor such as a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or a GPU (Graphics Processing Unit).
  • the control unit 140 is realized by a processor executing various programs stored in a storage device inside the information processing device 100 using a RAM or the like as a work area.
  • the control unit 140 may be realized by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).
  • the control unit 140 includes a model acquisition unit 141, a rendering unit 142, an image recognition unit 143, a position estimation unit 144, a region estimation unit 145, an extraction unit 146, a search processing unit 147, and a UI control unit 148.
  • the control unit 140 realizes an acquisition function (application function) in which the model acquisition unit 141 to the extraction unit 146 analyze the 3D model of the character described above and acquire parts. Further, the control unit 140 realizes a search function (application function) for searching for parts using the search processing unit 147.
  • Each block (model acquisition unit 141 to UI control unit 148) constituting the control unit 140 is a functional block indicating a function of the control unit 140.
  • These functional blocks may be software blocks or hardware blocks.
  • each of the above functional blocks may be one software module realized by software (including a microprogram), or one circuit block on a semiconductor chip (die).
  • each functional block may be one processor or one integrated circuit.
  • the control unit 140 may be configured in functional units different from the above-mentioned functional blocks.
  • the functional blocks can be configured in any way.
  • further, some or all of the blocks (model acquisition unit 141 to UI control unit 148) constituting the control unit 140 may be operated by another device.
  • Model acquisition unit 141: The model acquisition unit 141 acquires a 3D model of the character by reading the 3D model from the 3D model DB 131. Note that the model acquisition unit 141 can also acquire a 3D model of a character from another device via the communication unit 110. The model acquisition unit 141 outputs the acquired 3D model to the rendering unit 142, the position estimation unit 144, and the region estimation unit 145.
  • FIG. 3 is a diagram for explaining an example of rendering processing by the rendering unit 142 according to the embodiment of the present disclosure.
  • the rendering unit 142 generates a character image (2D image) based on the 3D model by executing rendering processing.
  • the rendering unit 142 generates an image of the character viewed from the virtual viewpoint of the virtual camera C. At this time, the rendering unit 142 can generate a plurality of images with different virtual viewpoints.
  • for example, the rendering unit 142 generates images P_1 to P_N in which the character is viewed from each of a plurality of virtual cameras C_1 to C_N arranged around the character at predetermined angular intervals (for example, every 30 degrees or 45 degrees).
  • the images P_1 to P_N therefore show the character in different orientations.
  • the rendering unit 142 outputs the generated images P_1 to P_N to the image recognition unit 143. At this time, the rendering unit 142 may output information regarding the virtual cameras C_1 to C_N corresponding to the images P_1 to P_N to the image recognition unit 143.
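  • the arrangement of the virtual cameras C_1 to C_N can be sketched, for example, as viewpoints placed on a circle around the character at a fixed angular interval; the following Python sketch only computes camera positions and look-at targets (the camera_ring function and its parameters are illustrative assumptions, and the renderer itself is not shown).

```python
import math

def camera_ring(center, radius, height, step_deg=30.0):
    """Place virtual cameras C_1..C_N on a circle around the character.

    Returns (position, look_at) pairs; one image P_k is rendered per camera.
    """
    cameras = []
    n = int(round(360.0 / step_deg))
    for k in range(n):
        theta = math.radians(k * step_deg)
        position = (center[0] + radius * math.cos(theta),
                    center[1] + height,
                    center[2] + radius * math.sin(theta))
        cameras.append((position, center))  # each camera looks toward the character's center
    return cameras

# e.g. 12 virtual viewpoints at 30-degree intervals around a character centered at the origin
views = camera_ring(center=(0.0, 0.0, 0.0), radius=2.5, height=1.2, step_deg=30.0)
```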
  • the image recognition unit 143 performs image recognition processing on the images P_1 to P_N to estimate the positions of parts (image recognition positions) included in the images P_1 to P_N.
  • the image recognition unit 143 estimates the position of a pre-designated part in images P_1 to P_N.
  • the part (parts region) estimated by the image recognition unit 143 includes, for example, a region including a body part of the character.
  • the part areas include, for example, the character's eye area, nose area, mouth area, and ear area.
  • the parts areas also include areas including body parts such as fingers, palms, and feet, areas including accessories such as glasses, watches, earrings, and necklaces, and areas including costumes such as clothes and hats.
  • the parts estimated by the information processing device 100 are not limited to the examples described here.
  • the information processing device 100 can estimate any part, such as a part specified by a user or a part specialized for a character.
  • the image recognition unit 143 performs image recognition processing on the image P using pattern recognition technology or semantic segmentation technology, and determines the pixel coordinates of parts within the image P.
  • the image recognition unit 143 estimates, for example, a rectangular or polygonal area within the image P as the image recognition position of the part.
  • the image recognition unit 143 outputs information regarding the image recognition position in the image P to the position estimation unit 144.
  • the image recognition unit 143 also outputs the result of the image recognition process to the extraction unit 146. At this time, the image recognition unit 143 can output information regarding the virtual camera C corresponding to the image P to the position estimation unit 144.
  • if a part cannot be estimated, the image recognition unit 143 may associate that part with the image P and write them in a log file, for example.
  • the position estimation unit 144 estimates the part position (image recognition position) in the 3D model based on the image recognition result by the image recognition unit 143.
  • the image recognition position in the 3D model estimated by the position estimation unit 144 is a rougher position (area) than the part area ultimately estimated by the information processing device 100.
  • FIG. 4 is a diagram illustrating an example of part position estimation by the position estimation unit 144 according to the embodiment of the present disclosure.
  • the position estimating unit 144 estimates the approximate position of the part in the 3D model by calculating backwards from the settings of the virtual camera C used to render the 3D model.
  • the image recognition unit 143 estimates the image recognition position Rp in the image P_1 using the right eye as a part.
  • the image P_1 is an image obtained by rendering a 3D model viewed from the virtual camera C_1.
  • as shown in the middle diagram of FIG. 4, the position estimation unit 144 estimates the image recognition position Rm in the 3D model based on the placement of the image recognition position Rp in the image P_1 and on the virtual viewpoint and angle of view of the virtual camera C_1 in the 3D space. For example, the position estimation unit 144 estimates the image recognition position Rm in the 3D model by projecting the image recognition position Rp in the image P_1 into the 3D space of the 3D model.
  • the lower diagram in FIG. 4 is an enlarged diagram of the image recognition position Rm in the 3D model.
  • the image recognition position Rm estimated by the position estimating unit 144 represents the approximate position (area) of a part (here, the "eye") in the 3D model. Therefore, the image recognition position Rm does not necessarily match the mesh of the 3D model. That is, the contour line of the image recognition position Rm in the 3D model does not necessarily match the edge of the mesh in the 3D model.
  • the position estimation unit 144 estimates a rectangular area as the image recognition position Rm in the 3D model. In this way, the position estimating unit 144 roughly estimates the position of the parts in the 3D model. Therefore, the estimated position (area) may differ from the actual part area (for example, a surface of a 3D model).
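  • one simple way to obtain such a rough image recognition position Rm is to project each mesh vertex into the image using the settings of the virtual camera C and keep the vertices that fall inside the recognized rectangle Rp; the Python sketch below assumes a 4x4 view-projection matrix and ignores occlusion (a depth test against the rendered image would be needed for a stricter result), so it is only an illustrative assumption.

```python
import numpy as np

def project_to_pixels(vertices, view_proj, width, height):
    """Project (N, 3) vertices into pixel coordinates with a 4x4 view-projection matrix."""
    v = np.hstack([vertices, np.ones((len(vertices), 1))])   # homogeneous coordinates
    clip = v @ view_proj.T
    ndc = clip[:, :3] / clip[:, 3:4]                          # perspective divide
    px = (ndc[:, 0] * 0.5 + 0.5) * width
    py = (1.0 - (ndc[:, 1] * 0.5 + 0.5)) * height             # image y axis points down
    return np.stack([px, py], axis=1)

def vertices_in_recognition_box(vertices, view_proj, width, height, box):
    """Rough image recognition position Rm: indices of mesh vertices whose projections
    fall inside the 2D recognition rectangle Rp = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    pix = project_to_pixels(vertices, view_proj, width, height)
    inside = (pix[:, 0] >= x0) & (pix[:, 0] <= x1) & (pix[:, 1] >= y0) & (pix[:, 1] <= y1)
    return np.where(inside)[0]   # occluded or back-facing vertices are not filtered out here
```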
  • FIG. 5 is a diagram illustrating another example of part position estimation by the position estimation unit 144 according to the embodiment of the present disclosure.
  • the image recognition unit 143 can estimate the position of parts in the image P using semantic segmentation technology in addition to pattern-based image recognition.
  • FIG. 5 shows part position estimation by the position estimation unit 144 when the image recognition unit 143 estimates the part position by semantic segmentation.
  • the image recognition unit 143 performs image recognition using semantic segmentation on the image P shown in the upper diagram of FIG. 5, and estimates the image recognition position Rp1. By using semantic segmentation, the image recognition unit 143 can estimate a more detailed image recognition position Rp1 than with the image recognition process shown in FIG. 4.
  • as shown in the lower diagram of FIG. 5, the position estimation unit 144 estimates the image recognition position Rm1 in the 3D model based on the arrangement of the image recognition position Rp1 in the image P and on the virtual viewpoint and angle of view of the virtual camera C in the 3D space.
  • the method by which the position estimation unit 144 estimates the image recognition position Rm1 in the 3D model is the same as in the case of FIG. 4, but the image recognition unit 143 estimates the image recognition position Rp1 more finely than the image recognition position Rp of FIG. 4. Therefore, the image recognition position Rm1 in the 3D model estimated by the position estimation unit 144 is finer than the image recognition position Rm in FIG. 4.
  • note that the estimation of the image recognition position Rm1 (part position) in the 3D model by the position estimation unit 144 yields only a rough position (area). Therefore, the image recognition position Rm1 and the mesh of the 3D model do not necessarily match. Further, depending on the estimation result of the image recognition position Rp1 by the image recognition unit 143, the contour of the image recognition position Rm1 in the 3D model may become uneven (jagged).
  • the position estimation unit 144 outputs information regarding the estimated image recognition position (part position) in the 3D model to the region estimation unit 145.
  • if estimation of the image recognition position fails, the position estimation unit 144 writes a notification to that effect in a log file, for example. For example, based on the recognition result of the image recognition unit 143, the position estimation unit 144 determines that estimation of the image recognition position has failed when the image recognition position estimated in the 3D space is not on the 3D model. If estimation of the image recognition position fails in this way, the position estimation unit 144 may associate the 3D model of the character with the part and write them in a log file, for example.
  • the area estimation unit 145 estimates the part area in the 3D model of the character based on the 3D model and the part position Rm (image recognition position Rm) estimated by the position estimation unit 144.
  • the region estimating unit 145 estimates and extracts a 3D model (for example, mesh data) of a character's parts as a parts region. Thereby, the area estimation unit 145 separates the parts from the 3D model of the character.
  • the part position estimated by the position estimation unit 144 is the rough position (area) of the part in the 3D model of the character. Therefore, this part position may not match the area defined by the mesh of the actual 3D model, or may be a jagged area with unevenness. In this way, the parts positions estimated by the position estimation unit 144 cannot be said to have sufficient accuracy to separate the parts from the 3D model.
  • therefore, the region estimation unit 145 performs analysis of the 3D model (3D analysis) focusing on the part position Rm estimated by the position estimation unit 144, and estimates the parts region in the 3D model of the character.
  • FIG. 6 is a diagram for explaining an example of part region estimation processing by the region estimating unit 145 according to the embodiment of the present disclosure.
  • the area estimating unit 145 analyzes the 3D model within the image recognition position Rm estimated by the position estimating unit 144, and estimates a part area Rr along the mesh of the 3D model.
  • the region estimation unit 145 can estimate the part region with higher accuracy by, for example, performing an analysis based on the characteristics of the 3D shape according to the part to be extracted (for example, eyes).
  • the region estimating unit 145 performs analysis by taking into account the characteristics of the 3D structure and the characteristics of the part attributes as the characteristics of each part.
  • features of the 3D structure include, for example, the curvature, gradient, and Laplacian in the 3D model of the part.
  • Features of part attributes include volume ratio (for example, the ratio of the volume of the part to the volume of the entire character), and aspect ratio (for example, the aspect ratio of the part).
  • the region estimating unit 145 may perform analysis not on the entire 3D model of the character, but on the parts positions estimated by the position estimating unit 144. Thereby, the region estimating unit 145 can further reduce the processing load of analysis.
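  • as one hedged sketch of such a restricted 3D analysis, the candidate vertices inside Rm can be filtered by a simple local-geometry measure (here a uniform-weight discrete Laplacian, standing in for the curvature/gradient/Laplacian features mentioned above); the function names and the threshold are illustrative assumptions, not the disclosed algorithm itself.

```python
import numpy as np
from collections import defaultdict

def vertex_neighbors(faces):
    """Adjacency built from triangle faces, given as (i, j, k) vertex-index triples."""
    nbrs = defaultdict(set)
    for i, j, k in faces:
        nbrs[i] |= {j, k}
        nbrs[j] |= {i, k}
        nbrs[k] |= {i, j}
    return nbrs

def umbrella_laplacian(vertices, nbrs, idx):
    """Uniform-weight discrete Laplacian at vertex idx: mean of its neighbors minus itself."""
    ring = np.array([vertices[n] for n in nbrs[idx]])
    return ring.mean(axis=0) - vertices[idx]

def refine_part_region(vertices, faces, candidate_idx, threshold):
    """Restrict the 3D analysis to the candidate vertices inside Rm (not the whole
    character) and keep those whose local geometry is salient enough for the part."""
    nbrs = vertex_neighbors(faces)
    return [i for i in candidate_idx
            if np.linalg.norm(umbrella_laplacian(vertices, nbrs, i)) > threshold]
```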
  • the region estimation unit 145 may separate the parts region Rr estimated by the 3D analysis as the 3D model of the character's part as it is, or may correct the parts region Rr to obtain a corrected parts region Rc.
  • FIG. 7 is a diagram for explaining an example of a correction process performed by the region estimation unit 145 according to the embodiment of the present disclosure.
  • FIG. 8 is a diagram for explaining an example of wobbling detection performed by the region estimation unit 145 according to an embodiment of the present disclosure.
  • the left diagram in FIG. 7 shows the parts region Rr estimated by the region estimation unit 145.
  • the outline of the part region Rr estimated by the region estimation unit 145 may be jittery due to the mesh structure of the 3D model.
  • the region estimating unit 145 detects wobbling in the part region Rr.
  • the region estimating unit 145 performs wobbling detection using the normal information of the part region Rr. For example, the region estimation unit 145 detects the normal vector (arrow in FIG. 8) of the outline of the part region Rr.
  • the region estimation unit 145 checks the direction of the normal vector of the contour line along the contour of the part region Rr.
  • the region estimating unit 145 detects a location where the direction of the normal vector is substantially the same as a location where the contour is jittery (wobbly location).
  • the region estimating unit 145 generates a corrected part region Rc by correcting the detected wobbling portion. For example, as shown in FIG. 8, the region estimation unit 145 corrects the wobbling of the region A and generates a corrected parts region Rc. For example, the region estimating unit 145 generates a corrected part region Rc without wobbling in the outline by creating a new edge at the wobbling location.
  • the region estimation unit 145 corrects the shape of the parts region Rr according to the wobbling of the outline of the parts region Rr.
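  • a minimal sketch of this kind of wobbling detection and correction is shown below; it walks a closed boundary contour of the parts region Rr, flags vertices where consecutive boundary segments fold back sharply (one simple proxy for the normal-direction check described above), and bridges over them to form the corrected contour for Rc. The threshold and function names are illustrative assumptions.

```python
import numpy as np

def contour_wobble_indices(contour, fold_angle_deg=150.0):
    """Flag wobbly (jagged) spots on a closed (N, 3) boundary contour of the parts region Rr."""
    wobbly = []
    n = len(contour)
    for i in range(n):
        a = contour[i] - contour[i - 1]           # incoming boundary segment
        b = contour[(i + 1) % n] - contour[i]     # outgoing boundary segment
        cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
        angle = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
        if angle > fold_angle_deg:                # the contour folds back on itself
            wobbly.append(i)
    return wobbly

def smooth_contour(contour, wobbly):
    """Create the corrected contour for Rc by bridging over the flagged vertices,
    i.e. by creating a new edge between their neighbors."""
    skip = set(wobbly)
    return np.asarray([p for i, p in enumerate(contour) if i not in skip])
```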
  • the region estimation unit 145 separates the 3D model (3D model information, for example, mesh data) of the corrected parts region Rc generated by the correction as a 3D model of the character's parts.
  • the region estimation unit 145 separates the character into parts by, for example, generating a 3D model of the corrected parts region Rc from the 3D model of the character.
  • the area estimation unit 145 outputs information regarding the 3D model of the separated part to the extraction unit 146.
  • if estimation of the parts region Rr fails, the region estimation unit 145 writes a notification to that effect in a log file, for example. For example, if the region estimation unit 145 determines that there is no parts region Rr as a result of the 3D analysis, it determines that estimation of the parts region Rr has failed. If estimation of the parts region Rr fails in this way, the region estimation unit 145 may associate the 3D model of the character with the part and write them in a log file, for example.
  • the region estimation unit 145 detects wobbling according to a change in the direction of the normal vector of the parts region Rr and corrects the shape of the parts region Rr. Correction of the parts region Rr is not limited to this.
  • the region estimation unit 145 may correct the parts region Rr using machine learning.
  • the region estimating unit 145 can correct the parts region Rr using a learned correction model that receives the parts region Rr as an input and outputs the corrected parts region Rc.
  • the extraction unit 146 acquires character information regarding the character as metadata based on at least one of the image recognition processing result, the estimation result of the part position (image recognition position Rm), and the parts region Rr.
  • the metadata acquired by the extraction unit 146 includes metadata obtained by image recognition by the image recognition unit 143 and metadata obtained based on the parts region Rr estimated by the region estimation unit 145.
  • the extraction unit 146 extracts, for example, at least one of classification information, feature amount information, and relative information as metadata.
  • the classification information includes, for example, information regarding class classification in which the parts region Rr is classified into classes.
  • the feature amount information includes information regarding the feature amount vector (in other words, the latent space indicating the feature amount) of the parts region Rr.
  • the relative information includes information about the relative sizes and positions (relative positions) between a plurality of parts, information about the relative sizes and positions (relative positions) between a character and the parts, and the like.
  • the extraction unit 146 extracts, for example, classification information and feature amount information based on the image recognition result.
  • the extraction unit 146 extracts classification information from the image P using, for example, a deep learning class classification task.
  • for example, the extraction unit 146 classifies parts into the following classes based on the image recognition results. Note that the following is an example, and the extraction unit 146 may classify parts into classes other than the following classes.
  • Color: for example, the color of the eyes, hair, clothes, etc.
  • Shape: for example, for eyes, droopy or slanted; for hair, short or long, etc.
  • Genre: for example, accessories such as glasses and necklaces, or costumes such as hats and jackets
  • the extraction unit 146 extracts feature amount information by clustering on a latent space created by, for example, a variational autoencoder (VAE) or a generative adversarial network (GAN), based on the image recognition results.
  • by extracting the feature amount information, the extraction unit 146 can measure the degree of similarity of corresponding parts across characters.
  • the information processing device 100 can estimate the degree of similarity between the same parts (for example, faces, eyes, hairstyles, etc.) of different characters (for example, characters #1 and #2, not shown).
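  • for example, once each part has been mapped to a feature amount vector by such an encoder, the similarity between the same parts of different characters can be measured with a simple vector similarity; the sketch below assumes an encode() function (for example, the encoder half of a VAE) that is not defined here.

```python
import numpy as np

def cosine_similarity(v_a, v_b):
    """Similarity of two parts in the latent space (1.0 means identical direction)."""
    v_a = np.asarray(v_a, dtype=float)
    v_b = np.asarray(v_b, dtype=float)
    return float(v_a @ v_b / (np.linalg.norm(v_a) * np.linalg.norm(v_b) + 1e-12))

# encode() stands in for a trained encoder that maps a rendered part image to its
# feature amount vector; it is assumed, not defined, here.
# sim = cosine_similarity(encode(right_eye_image_char1), encode(right_eye_image_char2))
```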
  • the extraction unit 146 extracts, for example, classification information, feature amount information, and relative information based on the 3D model of the part region Rr, in other words, the analysis result of the 3D shape of the 3D model.
  • the extraction unit 146 extracts classification information and feature amount information from the 3D analysis results, for example, in the same manner as the image recognition results. At this time, the extraction unit 146 can extract classification information and feature amount information, for example, limited to the parts region Rr.
  • for example, the extraction unit 146 classifies parts into the following classes based on the 3D analysis results. Note that the following is an example, and the extraction unit 146 may classify parts into classes other than the following classes.
  • Texture: rugged, wrinkled, smooth, etc.
  • Level of detail: mesh data with many vertices (high poly), few vertices (low poly), etc.
  • by the extraction unit 146 extracting metadata using the 3D model of the parts region Rr, the extraction accuracy can be further improved and the processing load can be further reduced.
  • the extraction unit 146 extracts metadata (for example, classification information and feature amount information) using the image recognition results.
  • the extraction unit 146 can extract metadata with higher precision by extracting metadata based on the 3D analysis results together with the metadata extracted from the image recognition results.
  • suppose, for example, that the extraction unit 146 extracts metadata (classification information) indicating that the character is a "muscular male" and that the 3D model "has a hand region" from the image recognition results.
  • the extraction unit 146 then classifies the parts while also taking this metadata into account.
  • the extraction unit 146 extracts the classification information of the parts by inputting the 3D model (mesh data) of the parts region Rr and the metadata acquired from the image recognition results into a neural network.
  • by extracting metadata based on both the image recognition result and the 3D analysis result, the extraction unit 146 can extract the metadata of the part with higher accuracy.
  • the extraction unit 146 extracts, for example, relative information based on the 3D model of the character, in other words, the analysis result of the 3D shape of the 3D model.
  • the extraction unit 146 extracts, for example, the relative positional relationship and relative size of multiple parts (for example, right eye and left eye, or face and eyes) of a specific character as relative information.
  • the extraction unit 146 measures the position and size of the character's eyes relative to the head, and extracts the measurement results as metadata.
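  • as a simple assumed implementation, such relative information can be derived from axis-aligned bounding boxes of the part and of a reference region (for example, the head), expressed as dimensionless ratios; the function and field names below are illustrative.

```python
import numpy as np

def bounding_box(vertices):
    v = np.asarray(vertices, dtype=float)
    return v.min(axis=0), v.max(axis=0)

def relative_info(part_vertices, reference_vertices):
    """Relative size and position of a part (e.g. an eye) with respect to a reference
    region (e.g. the head), expressed as per-axis ratios."""
    p_min, p_max = bounding_box(part_vertices)
    r_min, r_max = bounding_box(reference_vertices)
    r_size = r_max - r_min
    return {
        "size_ratio": ((p_max - p_min) / r_size).tolist(),
        "center_offset": (((p_min + p_max) / 2 - (r_min + r_max) / 2) / r_size).tolist(),
    }
```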
  • the extraction unit 146 stores the parts and the extracted metadata in the parts DB 133 and metadata DB 134 in association with each other. Note that the region estimating unit 145 may store the parts in the parts DB 133.
  • the extraction unit 146 extracts metadata of parts by image recognition. Further, the extraction unit 146 uses the image recognition results to narrow down the metadata extraction range, and then extracts the metadata of the part by 3D shape analysis. Thereby, the extraction unit 146 can further reduce the load of extraction processing. Further, the extraction unit 146 can extract metadata with higher accuracy.
  • the metadata extracted by the extraction unit 146 is used, for example, when a user searches for parts. In this manner, the extraction unit 146 extracts metadata, associates it with parts, and stores it in the metadata DB 134, allowing the user to search for desired parts faster and more easily.
  • the search processing unit 147 presents the user with parts (an example of parts information) corresponding to the metadata according to the search conditions specified by the user.
  • the search processing unit 147 searches the metadata DB 134 according to search conditions specified by the user, and presents parts corresponding to the search results to the user.
  • for example, the search processing unit 147 presents the user with a 2D image of the part rendered from the 3D model of the part.
  • FIG. 9 is a diagram for explaining an example of search processing according to the embodiment of the present disclosure.
  • the search processing unit 147 receives parts search conditions from the user.
  • the search processing unit 147 receives search conditions via the input/output unit 120, for example.
  • the search processing unit 147 searches the metadata DB 134 by specifying metadata according to the search conditions accepted from the user.
  • the metadata DB 134 specifies parts corresponding to the metadata specified by the search processing unit 147 to the parts DB 133.
  • the parts DB 133 notifies the search processing unit 147 of the parts specified from the metadata DB 134.
  • the search processing unit 147 presents the parts acquired from the parts DB 133 to the user as a search result.
  • by the search processing unit 147 searching for parts using the metadata stored in the metadata DB 134, the user can search more easily and in a shorter time.
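  • a minimal sketch of this two-step lookup, assuming the record layout sketched earlier for the parts DB and metadata DB, is shown below: the metadata DB is filtered by the user's search conditions, and the matching part IDs are then resolved in the parts DB.

```python
def search_parts(required_tags, metadata_db, parts_db):
    """Resolve a metadata query to concrete parts.

    required_tags is the user's search condition, e.g. {"color": "blue", "shape": "droopy"};
    part IDs whose classification metadata matches are looked up in the parts DB.
    """
    hits = []
    for part_id, meta in metadata_db.items():
        if all(meta.classification.get(key) == value for key, value in required_tags.items()):
            hits.append(parts_db[part_id])
    return hits
```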
  • the search processing unit 147 presents a search UI image and accepts search conditions from the user.
  • FIG. 10 is a diagram illustrating an example of a search UI image according to an embodiment of the present disclosure.
  • the search UI image shown in FIG. 10 is generated by the UI control unit 148 based on an instruction from the search processing unit 147, for example.
  • when searching for parts based on classification information or relative information, the search processing unit 147 presents the search UI image shown in FIG. 10 to the user. For example, when the part is an "eye", the search processing unit 147 receives, via the search UI image shown in FIG. 10, search information from the user for further narrowing down the parts using classification information (corresponding to the words and keywords in FIG. 10) and relative information (corresponding to the size specification in FIG. 10).
  • the user can specify the class of the classification information, for example, by inputting a free word or selecting a tag (such as "anime" or "girl" in FIG. 10). Furthermore, the user can specify the relative positions and relative sizes of parts by adjusting numerical values using sliders. For example, in the example of FIG. 10, by adjusting the slider, the user can specify the relative size of the eyes with respect to the size of the face. Note that the user may specify the relative information by adjusting a value with a slider, or may specify the relative information by directly entering a numerical value.
  • the search processing unit 147 can obtain search conditions by the user directly specifying metadata.
  • the search processing unit 147 may acquire search conditions by the user specifying an image.
  • the search processing unit 147 extracts feature information from the image specified by the user, and performs a parts search based on the feature information.
  • FIG. 11 is a diagram for explaining an example of feature information extraction by the search processing unit 147 according to the embodiment of the present disclosure. In FIG. 11, it is assumed that the user has specified image S_0 as a search condition.
  • the search processing unit 147 inputs, for example, image S_0 to the encoder.
  • the encoder for example, extracts a feature amount vector (latent space indicating a feature amount) from an image.
  • the search processing unit 147 extracts the feature vector V_0 corresponding to the image S_0 by inputting the image S_0 to an encoder.
  • the search processing unit 147 searches for a character (or part) close to the image S_0 in the latent space using the extracted feature vector V_0.
  • FIG. 12 is a diagram for explaining an example of a search in the latent space by the search processing unit 147 according to the embodiment of the present disclosure.
  • although FIG. 12 shows a two-dimensional latent space to simplify the illustration, the actual latent space is a multidimensional space with two or more dimensions.
  • the search processing unit 147 maps the feature vector V_0 extracted from the image S_0 to the latent space.
  • the search processing unit 147 selects a representative feature vector from among the feature vectors located within the search range SR_0 that includes the feature vector V_0 in the latent space as a search result vector.
  • the search processing unit 147 can select a search result vector depending on the distance and direction in the latent space. For example, the search processing unit 147 may randomly select a search result vector from within the search range SR_0. In the example of FIG. 12, the search processing unit 147 selects feature vectors Vc_01 to Vc_04 as search result vectors.
  • the search processing unit 147 specifies the search result vector and searches the metadata DB 134, thereby acquiring the parts corresponding to the search result vector from the parts DB 133 as the search results.
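  • as one assumed implementation of this selection, the search range SR can be treated as a ball of a given radius around the query feature vector, with the search result vectors sampled at random from the vectors that fall inside it (one of the selection policies mentioned above); the function name and parameters are illustrative.

```python
import random
import numpy as np

def select_search_results(query_vec, db_vectors, radius, k=4):
    """Pick up to k representative feature vectors inside the search range SR,
    modeled here as a ball of the given radius around the query vector.
    db_vectors maps a part ID to its feature amount vector."""
    q = np.asarray(query_vec, dtype=float)
    in_range = [pid for pid, vec in db_vectors.items()
                if np.linalg.norm(np.asarray(vec, dtype=float) - q) <= radius]
    return random.sample(in_range, min(k, len(in_range)))   # random choice within SR
```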
  • the search processing unit 147 presents the acquired parts to the user as a search result.
  • FIG. 13 is a diagram illustrating an example of a search result image according to the embodiment of the present disclosure.
  • the search result image shown in FIG. 13 is generated by the UI control unit 148 based on an instruction from the search processing unit 147, for example.
  • the search processing unit 147 displays the image S_0 specified by the user in the center of the search result images. Furthermore, the search processing unit 147 displays 2D images Sc_01 to Sc_04 of parts (parts showing the upper body in the example of FIG. 13) that are search results around the image S_0.
  • by searching using the latent space (feature amount vectors), the search processing unit 147 can search for parts similar to the image specified by the user (within a predetermined search range in the latent space).
  • the search processing unit 147 can accept a change in the search range from the user using the icon I in FIG. 13.
  • FIG. 14 is a diagram for explaining an example of search range changing processing in the search processing unit 147 according to the embodiment of the present disclosure.
  • the search processing unit 147 receives a movement of the icon I as an instruction to change the search range from the user.
  • the search processing unit 147 that has received such a movement of the icon I changes the search range according to the movement of the icon I and performs a parts search process. For example, as shown in the lower diagram of FIG. 14, the search processing unit 147 searches for parts by changing the search range SR_0 to the search range SR_1.
  • the search range SR_0 is a search range according to the feature vector V_0 corresponding to the image S_0 specified by the user, and is, for example, a range centered on the feature vector V_0.
  • the search range SR_1 is a search range according to the feature vector V_1 according to the movement of the icon I, and is, for example, a range centered on the feature vector V_1.
  • the feature vector V_1 is a vector obtained by moving the feature vector V_0 in the direction of the feature vector Vc_02 according to the amount of movement of the icon I (the length of the arrow in the upper diagram of FIG. 14).
  • for example, the search processing unit 147 calculates the feature vector V_1 by moving the feature vector V_0 in the direction of the feature vector Vc_02 by the ratio of the amount of movement of the icon I to the distance between the icon I_0 and the 2D image Sc_02.
  • the search processing unit 147 selects a search result vector from the changed search range SR_1.
  • the search processing unit 147 selects a search result vector from within the search range SR_1 in the same manner as the method for selecting a search result vector from within the search range SR_0.
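  • the shift of the query vector described above can be written as the simple interpolation V_1 = V_0 + (d / D) * (Vc_02 - V_0), where d is the icon's movement and D is the on-screen distance from the icon to the 2D image Sc_02; the sketch below is an assumed implementation of this step, and the search within SR_1 can reuse the selection sketched earlier.

```python
import numpy as np

def shifted_query_vector(v0, vc, icon_move, icon_to_image_dist):
    """Move V_0 toward Vc in proportion to the icon's on-screen movement:
    V_1 = V_0 + (d / D) * (Vc - V_0)."""
    v0 = np.asarray(v0, dtype=float)
    vc = np.asarray(vc, dtype=float)
    t = np.clip(icon_move / icon_to_image_dist, 0.0, 1.0)   # clamping is an added assumption
    return v0 + t * (vc - v0)

# The new search range SR_1 is a ball of the same radius centered on V_1;
# select_search_results(V_1, db_vectors, radius) can then be reused unchanged.
```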
  • the search processing unit 147 selects feature vectors Vc_11 to Vc_14 as search result vectors.
  • FIG. 15 is a diagram illustrating another example of a search result image according to the embodiment of the present disclosure.
  • FIG. 15 shows an image showing the results of the search performed by the search processing unit 147 in the search range SR_1.
  • the search processing unit 147 displays 2D images Sc_11 to Sc_14 of parts corresponding to the feature vectors Vc_11 to Vc_14, which are search result vectors, around the icon I. Further, the search processing unit 147 may display the image S_0 specified by the user in addition to the 2D images Sc_11 to Sc_14.
  • the search processing unit 147 receives a change in the search range within the latent space from the user using the icon I, for example. Thereby, the user can more intuitively change the search range in the latent space, and can more easily search for a desired part.
  • the search processing unit 147 accepts changes in the search range using icons, more specifically, according to the movement of the icon, but the method for changing the search range is not limited to this.
  • the search processing unit 147 may change the search range when the user clicks on the 2D images Sc_01 to Sc_04. In this case, the search processing unit 147 selects a search result vector within the search range SR according to the feature vector Vc corresponding to the clicked 2D image Sc, for example.
  • the search processing unit 147 may accept a change in the search range from the user using a tool such as a slider to adjust numerical values.
  • the search processing unit 147 changes the search range SR according to a numerical value specified by the user using a slider, for example, and searches for parts.
  • in the above example, the search processing unit 147 searches the latent space using the image S_0 specified by the user, but the search in the latent space by the search processing unit 147 is not limited to this.
  • the search processing unit 147 may randomly select a feature vector corresponding to the search range SR. For example, when the user specifies a part, the search processing unit 147 randomly selects one feature vector of the specified part. The search processing unit 147 sets a search range corresponding to the selected feature amount vector in the latent space, and selects a search result vector within the set search range.
  • the search processing unit 147 can randomly search for parts and present them to the user.
  • the search processing unit 147 selects four feature vectors as search result vectors, but the number of feature vectors selected by the search processing unit 147 is not limited to four.
  • the search processing unit 147 may select three or less feature vectors as the search result vector, or may select five or more feature vectors.
  • the search processing unit 147 presents to the user 2D images corresponding to all the feature vectors selected as search result vectors, but the 2D images presented to the user are not limited to this.
  • the search processing unit 147 may present to the user a part of the 2D image corresponding to the feature vector selected as the search result vector. For example, as shown in FIG. 14, even if the search processing unit 147 selects four feature vectors Vc_11 to Vc_14, the search processing unit 147 can present three or less 2D images to the user.
  • the UI control unit 148 generates a screen (UI) and accepts operations on the UI.
  • the UI control unit 148 generates a search UI image or a search result image according to an instruction from the search processing unit 147, for example, and presents them to the user via the input/output unit 120. Further, the UI control unit 148 receives input of search conditions and changes in the search range from the user via the input/output unit 120, for example.
  • the UI control unit 148 notifies, for example, the search processing unit 147 of the input results from the user.
  • FIG. 16 is a flowchart illustrating an example of the flow of the first parts separation process according to the embodiment of the present disclosure.
  • the first parts separation process shown in FIG. 16 is executed by the information processing device 100, for example.
  • the information processing device 100 executes the first parts separation process, for example, in accordance with an instruction from a user.
  • the information processing device 100 first obtains a 3D model of a character (step S101).
  • the information processing device 100 acquires a 3D model from the 3D model DB 131, for example.
  • the information processing apparatus 100 may acquire the 3D model of the character from a range specified by the user.
  • the information processing device 100 classifies the acquired 3D model into major parts (step S102).
  • the major classification parts are parts that are larger than the parts that the information processing apparatus 100 separates in the first parts separation process.
  • the major classification parts include, for example, a head region and a body region.
  • the major classification parts may include a head region, an upper body region, and a lower body region. In this way, the information processing apparatus 100 divides the 3D model into large classification parts that are larger than the parts (for example, eyes, nose, etc.) to be separated in the first parts separation process.
  • the major classification parts are larger than the parts to be separated in the first part separation process. Therefore, the process in which the information processing apparatus 100 separates a 3D model into major parts requires less processing load than the process in which the 3D model is separated into parts (for example, eyes, nose, etc.).
  • the information processing device 100 selects one major classification part from among the divided major classification parts, renders the selected major classification part, and generates an image P (step S103).
  • the information processing device 100 performs part image recognition on the image P generated in step S103 (step S104). For example, the information processing apparatus 100 selects one part to be separated from a plurality of parts, and executes image recognition processing to estimate the position of the selected part with respect to the image P.
  • the information processing device 100 generates the image P by rendering the major classification parts. Therefore, the information processing apparatus 100 can recognize the image P with higher accuracy than when performing image recognition of an image in which the entire character is rendered.
  • the information processing device 100 determines whether or not recognition of the image P has been successful (step S105). For example, the information processing device 100 determines whether or not the recognition of the image P is successful depending on whether or not the parts can be recognized and whether or not the recognition accuracy of the parts is equal to or higher than a threshold value.
  • If it is determined that recognition of the image P has failed (step S105; No), that is, if the part could not be recognized or the recognition accuracy is less than the threshold, the information processing device 100 proceeds to step S110.
  • When it is determined that the image P has been successfully recognized (step S105; Yes), that is, when the part can be recognized and the recognition accuracy is equal to or higher than the threshold, the information processing device 100 estimates the image recognition position Rm in the 3D model (step S106). The information processing device 100 estimates the image recognition position Rm in the 3D model according to the image recognition position Rp obtained from the recognition result of the image P and the setting information of the virtual camera C.
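  • A minimal sketch of how this back-projection from Rp to a rough Rm could look, assuming a simple pinhole virtual camera and a mesh given as a NumPy vertex array (the function and parameter names are illustrative and not fixed by the present disclosure):

```python
import numpy as np

def estimate_rm_from_rp(rp_bbox, cam_pos, cam_rot, fov_y_deg, img_w, img_h, vertices):
    """Back-project the 2D recognition position Rp into the 3D model to obtain
    a rough 3D position Rm, using the settings of the virtual camera C.

    rp_bbox   -- (xmin, ymin, xmax, ymax) pixel box recognised in image P
    cam_pos   -- (3,) camera position in model space
    cam_rot   -- (3, 3) camera-to-world rotation matrix
    fov_y_deg -- vertical angle of view of the virtual camera
    vertices  -- (N, 3) vertex array of the rendered major classification part
    """
    # Centre of the recognised box in pixel coordinates.
    u = 0.5 * (rp_bbox[0] + rp_bbox[2])
    v = 0.5 * (rp_bbox[1] + rp_bbox[3])

    # Turn the pixel into a viewing ray in world space (pinhole camera model,
    # camera looking down its local -z axis).
    f = 0.5 * img_h / np.tan(np.radians(fov_y_deg) / 2.0)
    ray = cam_rot @ np.array([u - 0.5 * img_w, -(v - 0.5 * img_h), -f])
    ray /= np.linalg.norm(ray)

    # Keep the vertices closest to the ray; their centroid serves as a rough Rm.
    rel = vertices - cam_pos
    t = rel @ ray                                   # distance along the ray
    closest = cam_pos + np.outer(t, ray)            # closest point on the ray per vertex
    dist = np.linalg.norm(vertices - closest, axis=1)
    near = vertices[dist <= np.percentile(dist, 5)]
    return near.mean(axis=0)                        # rough image recognition position Rm
```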
  • the information processing device 100 estimates the parts region Rr in the 3D model based on the image recognition position Rm (step S107). For example, the information processing device 100 estimates the parts region Rr according to the characteristics of the parts to be separated.
  • the information processing device 100 extracts metadata corresponding to the parts region Rr based on the image recognition results of the parts region Rr and the image P (step S108). The information processing device 100 separates the 3D model of the parts region Rr into parts.
  • the information processing device 100 stores the parts and metadata (step S109).
  • the information processing device 100 associates parts and metadata and stores them in a parts DB 133 and a metadata DB 134, respectively.
  • the information processing device 100 determines whether all parts have been separated in the major classification parts selected in step S103 (step S110). If there are parts that have not been separated (step S110; No), the information processing device 100 returns to step S104 and executes separation processing for the parts that have not been separated yet.
  • If all parts have been separated (step S110; Yes), the information processing device 100 determines whether all major classification parts have been separated (step S111). That is, the information processing device 100 determines whether all parts have been extracted from the 3D model of the character.
  • If there are major classification parts that have not been separated yet (step S111; No), the information processing device 100 returns to step S103 and performs the parts separation process for the major classification parts that have not been separated.
  • If all major classification parts have been separated (step S111; Yes), the information processing device 100 determines whether the parts of all 3D models have been separated (step S112). That is, the information processing device 100 determines whether parts have been extracted for all characters.
  • If there is a 3D model whose parts have not been separated yet (step S112; No), the information processing device 100 returns to step S101 and obtains the 3D model of a character whose parts have not been separated.
  • If parts have been separated in all 3D models (step S112; Yes), the information processing device 100 ends the first parts separation process.
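  • The nested loop of steps S101 to S112 can be summarized as in the following sketch, in which every processing step is an injected callable because the present disclosure does not fix any particular implementation (all names are hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class SeparationResult:
    parts: dict = field(default_factory=dict)      # (character, part name) -> part mesh data
    metadata: dict = field(default_factory=dict)   # (character, part name) -> extracted metadata
    failures: list = field(default_factory=list)   # (character, major part, part name)

def first_parts_separation(characters, split_major, render, recognize,
                           to_model_position, estimate_region, extract_metadata,
                           part_names, threshold=0.5):
    """Drives the S101-S112 loop: every character, every major classification
    part, every part to be separated."""
    result = SeparationResult()
    for character in characters:                       # S101 / S112 loop over 3D models
        model = character["model"]
        for major in split_major(model):               # S102 / S111 loop over major parts
            image, camera = render(major)              # S103: render the major part
            for name in part_names:                    # S110 loop over parts to separate
                rp, score = recognize(image, name)     # S104: image recognition
                if rp is None or score < threshold:    # S105: recognition failed
                    result.failures.append((character["name"], major["name"], name))
                    continue
                rm = to_model_position(rp, camera, major)   # S106: position Rm in the 3D model
                rr = estimate_region(model, rm, name)       # S107: parts region Rr
                meta = extract_metadata(rr, rp, image)      # S108: metadata
                key = (character["name"], name)
                result.parts[key] = rr                      # S109: store part
                result.metadata[key] = meta                 # S109: store metadata
    return result
```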
  • the information processing apparatus 100 divides the 3D model into major classification parts and then renders the major classification parts.
  • the information processing device 100 may render the 3D model and then divide it into major parts.
  • the information processing device 100 renders the entire 3D model and generates an image including the entire character.
  • the information processing device 100 performs image recognition processing on the image including the entire image of the character, cuts out a region including the major classification parts, and generates the image P.
  • the information processing device 100 may generate the image P by estimating a region including the major classification parts through image recognition processing and re-rendering a 3D model corresponding to the estimated region.
  • If recognition of a part or estimation of a parts region fails, the information processing apparatus 100 can leave a log in a log file and proceed to the analysis (parts separation) of the next part or the next 3D model.
  • the information processing device 100 generates a 2D image by rendering the 3D model information of the character.
  • the information processing device 100 performs image recognition processing to recognize parts to be separated on the generated 2D image, and estimates the part position Rp in the 2D image.
  • the information processing device 100 estimates a rough part position Rm in the 3D space (3D model) based on the part position Rp in the 2D image.
  • the information processing device 100 analyzes the 3D model according to the rough part position Rm in the 3D model and the characteristics of the part, and estimates the part region Rr in the 3D model.
  • the information processing device 100 separates (generates) 3D model information (for example, mesh data) of the parts region Rr from the 3D model information of the character. Thereby, the information processing device 100 separates the parts from the character.
  • the information processing device 100 extracts metadata corresponding to the part using the parts region Rr and the results of the image recognition process.
  • the information processing device 100 stores parts and metadata in association with each other.
  • the information processing device 100 can separate parts from a character with higher accuracy while further reducing the processing load of the process of separating parts from a character. Furthermore, the information processing device 100 associates and holds parts and metadata, allowing the user to search for desired parts more accurately and in a shorter time.
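  • One possible way to keep the separated parts and their metadata associated is sketched below with SQLite and a shared part id; the schema and function name are assumptions and are not taken from the present disclosure.

```python
import json
import sqlite3

def save_part(conn, character_id, part_name, mesh_blob, metadata):
    """Store a separated part and its metadata under a shared part id, so that a
    later metadata search can return the corresponding part mesh."""
    conn.execute("""CREATE TABLE IF NOT EXISTS parts
                    (part_id INTEGER PRIMARY KEY, character_id TEXT,
                     part_name TEXT, mesh BLOB)""")
    conn.execute("""CREATE TABLE IF NOT EXISTS metadata
                    (part_id INTEGER, payload TEXT)""")
    cur = conn.execute(
        "INSERT INTO parts (character_id, part_name, mesh) VALUES (?, ?, ?)",
        (character_id, part_name, mesh_blob))
    conn.execute("INSERT INTO metadata (part_id, payload) VALUES (?, ?)",
                 (cur.lastrowid, json.dumps(metadata)))
    conn.commit()

# Usage example (values are placeholders):
# save_part(sqlite3.connect("parts.db"), "chara_01", "right_eye",
#           b"...mesh bytes...", {"class": "eye", "relative_size": 0.04})
```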
  • the information processing apparatus 100 automatically performs the process of separating parts from the character, but the user may perform part of the process. That is, the information processing device 100 may perform a process of separating parts from a character (second parts separation process) while interacting with the user.
  • FIG. 17 is a flowchart showing an example of the flow of the second parts separation process according to the embodiment of the present disclosure.
  • the second parts separation process shown in FIG. 17 is executed by the information processing device 100, for example.
  • The information processing apparatus 100 executes the second parts separation process shown in FIG. 17, for example, in accordance with an instruction from a user. Note that, in the second parts separation process shown in FIG. 17, the same processes as in the first parts separation process shown in FIG. 16 are denoted by the same step numbers, and redundant description thereof is omitted.
  • the information processing device 100 that performed image recognition of the part in step S104 presents the recognition result to the user (step S201).
  • the information processing device 100 receives a modification (change) of the recognition result from the user by presenting the recognition result to the user.
  • FIG. 18 is a diagram for explaining an example of correction of recognition results according to the embodiment of the present disclosure.
  • the information processing apparatus 100 presents the user with a UI image PU_1 in which the part position Rp, which is the recognition result, is superimposed on the image P, which is the recognition target.
  • the image recognition unit 143 of the information processing device 100 instructs the UI control unit 148 to generate the UI image PU_1.
  • the UI control unit 148 displays the part position Rp using, for example, a primitive figure such as an ellipse or a rectangle.
  • the user can check whether the part recognition by the information processing device 100 is correct using the UI image PU_1. For example, if the information processing device 100 recognizes the part incorrectly, such as when the part position Rp deviates from the actual character part position, the user corrects the part position Rp. The user corrects the part position Rp by, for example, performing a GUI operation such as drag and drop. Thereby, as shown in the lower diagram of FIG. 18, the user can instruct the information processing apparatus 100 about the correct position of the parts.
  • In some cases, such as when the information processing device 100 cannot recognize any parts, the UI control unit 148 cannot generate the UI image PU_1 in which a figure indicating the part position Rp is superimposed on the image P.
  • the UI control unit 148 presents the user with, for example, a UI image that includes the image P but does not include the figure indicating the part position Rp.
  • the user instructs the information processing apparatus 100 about the correct position of the part by drawing a figure indicating the part position Rp on the image P.
  • the UI control unit 148 may present to the user a UI image in which a figure indicating the part position Rp is drawn at a predefined position (default position) such as a corner of the image P, for example.
  • the user instructs the information processing apparatus 100 about the correct position of the part by, for example, correcting the part position Rp by performing a GUI operation such as drag and drop.
  • the information processing device 100 estimates the image recognition position Rm in the 3D model (step S106) and the parts region Rr (step S107) based on the part position Rp specified by the user.
  • the information processing device 100 presents the estimation result of the parts region Rr to the user (step S202).
  • the information processing device 100 receives a modification (change) of the estimation result from the user by presenting the estimation result of the parts region Rr to the user.
  • FIG. 19 is a diagram for explaining an example of a UI image showing estimation results according to the embodiment of the present disclosure.
  • the information processing apparatus 100 presents the user with a UI image PU_2 showing the parts region Rr.
  • the area estimation unit 145 of the information processing device 100 instructs the UI control unit 148 to generate the UI image PU_2.
  • the UI control unit 148 generates a rendered image of the 3D model including the parts region Rr as the UI image PU_2.
  • the UI image PU_2 shown in FIG. 19 is a rendered image of the parts region Rr viewed from the front.
  • the UI control unit 148 generates the UI image PU_2 by superimposing information regarding the mesh of the 3D model (for example, information indicating vertices and edges). In this way, by the information processing device 100 presenting the parts region Rr including the information regarding the mesh to the user, the user can more easily confirm the parts region Rr in the 3D model.
  • the UI control unit 148 may highlight the parts region Rr in the UI image PU_2, for example by brightly highlighting the parts region Rr.
  • the UI control unit 148 may display the area other than the parts area Rr in a display color different from that of the parts area Rr, such as by making the area other than the parts area Rr darker in the UI image PU_2.
  • the information processing device 100 may present the parts region Rr viewed from a plurality of viewpoints to the user.
  • 20 and 21 are diagrams for explaining other examples of UI images showing estimation results according to the embodiment of the present disclosure.
  • the information processing device 100 generates a UI image PU_3 that is a rendered 3D model of the parts region Rr from a different viewpoint than the UI image PU_2, and presents it to the user.
  • the information processing apparatus 100 generates a UI image PU_4, which is a 3D model of the parts region Rr rendered from a different viewpoint than the UI images PU_2 and PU_3, and presents it to the user.
  • The information processing device 100 may present the UI images PU_2 to PU_4 side by side to the user.
  • the information processing apparatus 100 may generate UI images PU_3 and PU_4 with changed viewpoints in response to instructions from the user, and present them to the user.
  • the information processing device 100 accepts corrections to the parts region Rr from the user.
  • the user modifies the parts area Rr by performing GUI operations. For example, the user modifies the parts region Rr by selecting a surface to be added to the parts region Rr by a click operation or the like. Alternatively, the user can modify the part region Rr by selecting multiple faces using a range selection tool such as a rectangle or a lasso (a freehand drawn figure).
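  • A minimal sketch of such a region edit, with the parts region Rr held as a set of face indices and the user's click or rectangle/lasso selection passed in (the names are illustrative only):

```python
def apply_region_edits(region_faces, selected_faces, remove=False):
    """Update the estimated parts region Rr, held as a set of face indices,
    with the faces the user selected via a click or a range selection tool."""
    region = set(region_faces)
    if remove:
        region -= set(selected_faces)   # user removed faces from the region
    else:
        region |= set(selected_faces)   # user added faces to the region
    return region

# Usage example:
# rr = apply_region_edits(rr, selected_by_lasso)          # add faces
# rr = apply_region_edits(rr, selected_by_click, True)    # remove faces
```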
  • In some cases, such as when the information processing device 100 cannot estimate the parts region Rr at all, the UI control unit 148 cannot generate the UI image PU_2 showing the parts region Rr.
  • the UI control unit 148 presents the user with a UI image in which a 3D model of the character is rendered, for example.
  • the UI control unit 148 may present the user with a UI image in which the major classification parts are rendered.
  • the user instructs the information processing apparatus 100 about the correct part region Rr by selecting a surface included in the UI image using a click operation or a range selection tool.
  • the information processing device 100 extracts metadata based on the parts region Rr specified by the user (step S108).
  • the subsequent processing is the same as the first parts separation processing in FIG. 16.
  • As described above, the information processing device 100 can separate parts from the character with higher accuracy by accepting at least one of the correction of the part position Rp and the correction of the parts region Rr from the user.
  • In the second parts separation process described above, the information processing apparatus 100 processed all parts while interacting with the user, that is, while confirming the estimation results of the part position Rp and the parts region Rr with the user.
  • Alternatively, the information processing apparatus 100 may accept at least one of the correction of the part position Rp and the correction of the parts region Rr from the user only when estimation has failed. Note that if the information processing device 100 fails to estimate the part position Rp or the part position Rm, it does not estimate the parts region Rr. Therefore, when the information processing apparatus 100 fails to estimate the part position Rp or the part position Rm, this also means that it fails to estimate the parts region Rr.
  • FIG. 22 is a flowchart showing an example of the flow of the third parts separation process according to the embodiment of the present disclosure.
  • the third parts separation process shown in FIG. 22 is executed by the information processing device 100, for example.
  • the information processing apparatus 100 executes the third parts separation process shown in FIG. 22, for example, in accordance with instructions from the user.
  • the information processing device 100 executes a first parts separation process (step S301). At this time, the information processing apparatus 100 writes information regarding the 3D model for which estimation of the parts position Rp or parts region Rr has failed in the log file.
  • the information processing device 100 obtains a log file from the log file DB 132 (step S302).
  • the information processing device 100 performs the second parts separation process on the 3D model for which the estimation of the part position Rp or the estimation of the parts region Rr has failed (step S303).
  • The information processing device 100 performs the second parts separation process on the 3D models whose parts failed to be separated and accepts corrections from the user, and can thereby separate parts from the character with higher precision.
  • the information processing apparatus 100 does not accept corrections from the user for all 3D models, but only accepts corrections from the user for 3D models in which parts separation has failed. Thereby, the information processing apparatus 100 can separate parts with higher precision while suppressing an increase in the burden on the user.
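  • A sketch of this log-driven retry, assuming the failures are written to the log file as JSON lines and that the first and second parts separation processes are available as callables (all names are hypothetical):

```python
import json

def third_parts_separation(log_path, run_first, run_second_interactive):
    """Run the automatic separation first, then re-process only the 3D models
    recorded in the log file as failures, this time asking the user for
    corrections."""
    run_first(log_path)                              # S301: writes failures to the log file
    with open(log_path, encoding="utf-8") as f:      # S302: read the log file
        failures = [json.loads(line) for line in f if line.strip()]
    for entry in failures:                           # S303: interactive retry per failure
        run_second_interactive(entry["character"], entry.get("part"))
```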
  • FIG. 23 is a block diagram showing an example of the hardware configuration of the information processing device 100 according to this embodiment. Note that the information processing device 800 shown in FIG. 23 can realize the information processing device 100, for example. Information processing by the information processing apparatus 100 according to the present embodiment is realized by cooperation between software and hardware described below.
  • the information processing device 800 includes, for example, a CPU 871, a ROM 872, a RAM 873, a host bus 874, a bridge 875, an external bus 876, and an interface 877.
  • the information processing device 800 also includes an input device 878, an output device 879, a storage 880, a drive 881, a connection port 882, and a communication device 883.
  • the hardware configuration shown here is an example, and some of the components may be omitted. In addition, components other than those shown here may be further included.
  • the CPU 871 functions, for example, as an arithmetic processing device or a control device, and controls the overall operation of each component or a portion thereof based on various programs recorded in the ROM 872, RAM 873, storage 880, or removable recording medium 901.
  • the CPU 871 implements operational processing within the information processing device 100.
  • the ROM 872 is a means for storing programs read into the CPU 871, data used for calculations, and the like.
  • the RAM 873 temporarily or permanently stores, for example, programs read into the CPU 871 and various parameters that change as appropriate when executing the programs.
  • the CPU 871, ROM 872, and RAM 873 are interconnected, for example, via a host bus 874 capable of high-speed data transmission.
  • the host bus 874 is connected, for example, via a bridge 875 to an external bus 876 whose data transmission speed is relatively low.
  • the external bus 876 is connected to various components via an interface 877.
  • the input device 878 includes, for example, a mouse, a keyboard, a touch panel, a button, a switch, a lever, and the like. Furthermore, as the input device 878, a remote controller (hereinafter referred to as remote control) that can transmit control signals using infrared rays or other radio waves may be used. Furthermore, the input device 878 includes an audio input device such as a microphone.
  • The output device 879 is a device that can visually or audibly notify the user of acquired information, such as a display device (for example, a CRT (Cathode Ray Tube), LCD, or organic EL display), an audio output device such as a speaker or headphones, a printer, a mobile phone, or a facsimile. Further, the output device 879 according to the present disclosure includes various vibration devices capable of outputting tactile stimulation.
  • Storage 880 is a device for storing various data.
  • As the storage 880, for example, a magnetic storage device such as a hard disk drive (HDD), a semiconductor storage device, an optical storage device, or a magneto-optical storage device is used.
  • the drive 881 is a device that reads information recorded on a removable recording medium 901 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, or writes information to the removable recording medium 901, for example.
  • the removable recording medium 901 is, for example, DVD media, Blu-ray (registered trademark) media, HD DVD media, various semiconductor storage media, and the like.
  • the removable recording medium 901 may be, for example, an IC card equipped with a non-contact IC chip, an electronic device, or the like.
  • connection port 882 is, for example, a port for connecting an external connection device 902 such as a USB (Universal Serial Bus) port, an IEEE1394 port, a SCSI (Small Computer System Interface), an RS-232C port, or an optical audio terminal.
  • the external connection device 902 is, for example, a printer, a portable music player, a digital camera, a digital video camera, or an IC recorder.
  • The communication device 883 is a communication device for connecting to a network, and is, for example, a communication card for wired or wireless LAN, Wi-Fi (registered trademark), Bluetooth (registered trademark), or WUSB (Wireless USB), a router for optical communication, or the like.
  • It is also possible to create a computer program for causing hardware such as a CPU, ROM, and RAM built into the information processing device 100 described above to exhibit the functions of the information processing device 100.
  • A computer-readable storage medium (recording medium) storing the computer program is also provided.
  • the present technology can also have the following configuration.
  • (1) An information processing device comprising a control unit that: acquires a three-dimensional model of a character; performs image recognition processing on an image of the three-dimensional model seen from a virtual viewpoint to estimate a part position of the character; and estimates a part area in the three-dimensional model of the character based on the three-dimensional model and the part position.
  • (2) The information processing device according to (1), wherein the parts area includes an area corresponding to a body part of the character.
  • (3) The information processing device according to (1) or (2), wherein the parts area includes at least one of an eye area, a nose area, a mouth area, and an ear area of the character.
  • (4) The information processing device according to any one of (1) to (3), wherein the control unit generates a three-dimensional model of the parts area from the three-dimensional model of the character based on the parts area.
  • (5) The information processing device according to any one of (1) to (4), wherein the control unit obtains character information regarding the character based on at least one of a result of the image recognition process, a result of estimating the part position, and a result of estimating the part area.
  • (6) The information processing device described above, wherein the character information includes at least one of classification information regarding class classification in which the parts area is classified into classes, feature information regarding a feature vector of the parts area, and relative information regarding the relative relationship of the parts area to the character.
  • (7) The information processing device described above, wherein the control unit stores the character information in association with parts information regarding the parts area.
  • (8) The information processing device described above, wherein the control unit presents the user with the parts information corresponding to the character information according to conditions specified by the user.
  • (9) The information processing device described above, wherein the control unit estimates the part position in the three-dimensional model based on the part position of the character in the image estimated by the image recognition process.
  • (10) The information processing device described above, wherein the control unit estimates the part position in the three-dimensional model based on the angle of view of the virtual viewpoint corresponding to the image and the part position of the character in the image.
  • (11) The information processing device according to any one of (1) to (10), wherein the control unit estimates the part area in mesh data included in the three-dimensional model.
  • (12) The information processing device according to any one of (1) to (11), wherein the control unit corrects the shape of the part area in the three-dimensional model according to wobbling in the outline of the part area.
  • (13) The information processing device according to any one of (1) to (12), wherein the control unit receives a change in at least one of the part position and the part area from a user.
  • (14) The information processing device according to (13), wherein the control unit receives from the user the change of at least one of the part position and the part area of the character whose image recognition has failed as a result of the image recognition process of the image.
  • 100 Information processing device, 110 Communication unit, 120 Input/output unit, 130 Storage unit, 131 3D model DB, 132 Log file DB, 133 Parts DB, 134 Metadata DB, 140 Control unit, 141 Model acquisition unit, 142 Rendering unit, 143 Image recognition unit, 144 Position estimation unit, 145 Area estimation unit, 146 Extraction unit, 147 Search processing unit, 148 UI control unit

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Geometry (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

This information processing device (100) comprises a control unit (140). The control unit (140) acquires a three-dimensional model of a character. The control unit (140) performs image recognition processing on an image viewed from a virtual viewpoint of the three-dimensional model, and estimates part positions of the character. The control unit (140) estimates a part area in the three-dimensional model of the character on the basis of the three-dimensional model and the part positions.

Description

Information processing device, recording medium, and information processing method
 The present disclosure relates to an information processing device, a recording medium, and an information processing method.
 Conventionally, in the field of computer graphics, when creating a three-dimensional model (hereinafter also referred to as a 3D model) in virtual space, a method is known in which a 3D model is created by combining multiple parts. Further, as a system for managing parts of a 3D model, a system is known that, for example, extracts data on parts and searches for parts using the data.
 Japanese Patent Application Publication No. 2007-109221
 The system described above extracts part data from a 3D model that has already been divided into parts. In this way, the above-described system assumes that the parts have already been separated from the 3D model, and does not consider how to separate the parts from the 3D model.
 Analysis of 3D data such as 3D models requires complex processing and a large amount of calculation, making it highly difficult. Analysis of 3D data is used to separate a 3D model into parts. For this reason, it has been difficult to perform the process of separating a 3D model into parts easily (for example, in a short time or with high precision). Therefore, it is desired that 3D models can be analyzed more easily.
 The present disclosure therefore provides a mechanism that allows 3D models to be analyzed more easily.
 Note that the above-mentioned problem or object is only one of the plurality of problems or objects that can be solved or achieved by the plurality of embodiments disclosed in this specification.
 The information processing device of the present disclosure includes a control unit. The control unit acquires a three-dimensional model of a character. The control unit performs image recognition processing on an image of the three-dimensional model seen from a virtual viewpoint to estimate the part positions of the character. The control unit estimates a part area in the three-dimensional model of the character based on the three-dimensional model and the part position.
 FIG. 1 is a diagram for explaining an overview of a 3D model analysis process according to an embodiment of the present disclosure.
 FIG. 2 is a block diagram illustrating a configuration example of an information processing device according to an embodiment of the present disclosure.
 FIG. 3 is a diagram for explaining an example of rendering processing by a rendering unit according to an embodiment of the present disclosure.
 FIG. 4 is a diagram illustrating an example of part position estimation by a position estimation unit according to an embodiment of the present disclosure.
 FIG. 5 is a diagram illustrating another example of part position estimation by the position estimation unit according to an embodiment of the present disclosure.
 FIG. 6 is a diagram for explaining an example of a parts region estimation process by a region estimation unit according to an embodiment of the present disclosure.
 FIG. 7 is a diagram for explaining an example of a correction process performed by the region estimation unit according to an embodiment of the present disclosure.
 FIG. 8 is a diagram for explaining an example of wobbling detection performed by the region estimation unit according to an embodiment of the present disclosure.
 FIG. 9 is a diagram for explaining an example of a search process according to an embodiment of the present disclosure.
 FIG. 10 is a diagram illustrating an example of a search image according to an embodiment of the present disclosure.
 FIG. 11 is a diagram for explaining an example of extraction of feature amount information by a search processing unit according to an embodiment of the present disclosure.
 FIG. 12 is a diagram for explaining an example of a search in a latent space by the search processing unit according to an embodiment of the present disclosure.
 FIG. 13 is a diagram illustrating an example of a search result image according to an embodiment of the present disclosure.
 FIG. 14 is a diagram for explaining an example of a search range changing process in the search processing unit according to an embodiment of the present disclosure.
 FIG. 15 is a diagram illustrating another example of a search result image according to an embodiment of the present disclosure.
 FIG. 16 is a flowchart showing an example of the flow of the first parts separation process according to an embodiment of the present disclosure.
 FIG. 17 is a flowchart showing an example of the flow of the second parts separation process according to an embodiment of the present disclosure.
 FIG. 18 is a diagram for explaining an example of correction of recognition results according to an embodiment of the present disclosure.
 FIG. 19 is a diagram for explaining an example of a UI image showing an estimation result according to an embodiment of the present disclosure.
 FIG. 20 is a diagram for explaining another example of a UI image showing an estimation result according to an embodiment of the present disclosure.
 FIG. 21 is a diagram for explaining another example of a UI image showing an estimation result according to an embodiment of the present disclosure.
 FIG. 22 is a flowchart showing an example of the flow of the third parts separation process according to an embodiment of the present disclosure.
 FIG. 23 is a block diagram showing an example of the hardware configuration of an information processing device according to the embodiment.
 Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings. Note that, in this specification and the drawings, components having substantially the same functional configuration are denoted by the same reference numerals, and redundant explanation is omitted.
 Furthermore, in this specification and the drawings, similar components of the embodiments may be distinguished by appending different alphabets or numbers after the same reference numeral. However, when there is no particular need to distinguish between similar components, only the same reference numeral is given.
 One or more embodiments (including examples and modifications) described below can each be implemented independently. On the other hand, at least a portion of the plurality of embodiments described below may be implemented in combination with at least a portion of other embodiments as appropriate. These multiple embodiments may include novel features that are different from each other. Therefore, these multiple embodiments may contribute to solving mutually different objectives or problems, and may produce mutually different effects.
<<1. Introduction>>
<1.1. Challenge>
 In recent years, demand for 3D character models is expected to increase due to expectations for the Metaverse. 3D character models are used not only in the Metaverse but also in video production such as movies.
 Higher-quality 3D models are required in the Metaverse and in video production. However, producing high-quality 3D models is costly. Therefore, there is a need to simplify production so that 3D models can be produced at lower cost.
 One method of simplifying the production of 3D models is, for example, to store the 3D model part by part in a database (DB), combine the parts to create the base body of a 3D model, and finally perform finishing processing. For example, a system can be considered that creates a 3D model of a character by combining parts such as the character's body parts (the head, eyes, nose, ears, mouth, body, and so on) and the character's costumes and accessories (hats, glasses, clothes, and so on).
 In order to create a 3D model by combining parts in this way, it is desirable for the system to classify and store the 3D model of the character part by part. 3D models of characters are complex and have no unified specification, but if, for example, the system analyzes the shape of the 3D model and classifies it into character parts, the user can easily create the base body of a desired 3D character by combining those parts.
 Furthermore, in order to enable the user to select (search for) a desired part, it is desirable for the system to analyze metadata for each part and store the parts and the metadata in association with each other. For example, if the system creates a database that stores metadata obtained from character features and part features in association with the parts, the user can obtain a desired part more easily by searching this database.
 In this way, if the system analyzes the 3D model of the character, separates the parts, and extracts metadata, the user can more easily create a 3D model of a character.
 However, the process of analyzing a 3D model, classifying it into parts, and extracting metadata is complex, requires a large amount of calculation, and is highly difficult. This is because a 3D model is three-dimensional (x, y, z) data, and furthermore, in a 3D model there are no rules for its constituent elements and the data structure has a very high degree of freedom.
 For example, when a 3D model is expressed as mesh data, the 3D model is expressed by a plurality of vertices, edges connecting the vertices, and faces formed by the vertices and edges.
 However, there are no fixed specifications for the representation of 3D models. For example, there are no clear rules regarding which faces and vertices correspond to which parts, or regarding the number of faces and vertices that make up a part. Therefore, it is not easy for the system to analyze the mesh data of a 3D model and extract parts.
 For example, suppose that the system separates a nose as a part from a 3D model of a character's face. In this case, it is not easy for the system to determine which of the vertex data of the face corresponds to the nose.
 Additionally, 3D models have a higher degree of freedom than 2D images, and there are no restrictions such as resolution. Therefore, the higher the quality of the 3D model, the larger the amount of data (for example, the number of vertices) becomes, which makes the analysis process of the 3D model more complicated and increases the calculation load.
 In this way, it has not been easy for the system to analyze a 3D model in order to separate a specific part from the character's 3D model information (for example, the mesh data described above) or to extract metadata that characterizes the part.
<1.2. Overview of proposed technology>
 Therefore, the information processing device according to the proposed technology of the present disclosure performs image recognition processing using a rendered image of a 3D model of a character, and uses the result of the image recognition processing to narrow down the 3D model to be analyzed. The information processing device performs analysis on the narrowed-down 3D model. Thereby, the information processing device can analyze the 3D model more easily and can more easily separate parts from the 3D model.
 FIG. 1 is a diagram for explaining an overview of a 3D model analysis process according to an embodiment of the present disclosure. The analysis process in FIG. 1 is executed by, for example, the information processing device 100.
 The information processing device 100 first obtains 3D model information (hereinafter also referred to as a 3D model) of a character (step S1). For example, the information processing device 100 acquires the 3D model of the character from a database. The 3D model includes, for example, the mesh data described above.
 Next, the information processing device 100 renders (draws) the character based on the acquired 3D model, and generates an image of the character viewed from a virtual viewpoint (step S2).
 The information processing device 100 performs part image recognition processing on the generated image (step S3). Thereby, the information processing device 100 estimates the position of the part in the image. Note that the position of a part estimated by the information processing device 100 based on the image recognition processing is also referred to as an image recognition position. For example, in FIG. 1, the information processing device 100 performs right-eye image recognition processing on the image, and estimates an area including the right eye as the image recognition position.
 The information processing device 100 estimates the image recognition position in the 3D model based on the image recognition processing (step S4). For example, the information processing device 100 estimates the position in the 3D model corresponding to the image recognition position in the image as the image recognition position in the 3D model.
 The information processing device 100 performs part analysis of the 3D model based on the image recognition position in the 3D model, and estimates the region of the part in the 3D model (step S5). The information processing device 100 estimates the data corresponding to the part from among the mesh data of the 3D model as the 3D model of the part region. For example, in FIG. 1, the information processing device 100 estimates the vertex data group corresponding to the right eye as the 3D model of the right-eye part.
 The information processing device 100 extracts metadata of the part (step S6). For example, the information processing device 100 extracts the metadata based on the 3D model of the character, the image, and the 3D model of the part.
 The information processing device 100 stores the part and the metadata in association with each other (step S7). For example, the information processing device 100 associates the 3D model of the part whose region was estimated in step S5 with the metadata of the part extracted in step S6, and stores them in databases. In the example of FIG. 1, the information processing device 100 stores the 3D model of the part in a parts DB (database) and stores the metadata of the part in a metadata DB.
 In this way, the information processing device 100 acquires a 3D model of a character (an example of three-dimensional model information). The information processing device 100 performs image recognition processing on an image drawn based on the 3D model, in which the character is viewed from a virtual viewpoint, to estimate the part positions of the character. The information processing device 100 estimates the part region in the 3D model of the character based on the 3D model and the part position.
 Thereby, the information processing device 100 can narrow down the 3D model to be analyzed and can analyze the 3D model more easily. Therefore, the information processing device 100 can further reduce the processing load of the process of separating the 3D model of the character into parts. In addition, the information processing device 100 can separate the parts of the character's 3D model with higher precision.
 Additionally, by analyzing the 3D shape of the character through this analysis process, the information processing device 100 can efficiently create the character parts DB and the metadata DB used for searching for parts. By using the information processing device 100, the user can create the base body of a 3D model of a character more efficiently.
<<2. Configuration example of information processing device>>
 FIG. 2 is a block diagram illustrating a configuration example of the information processing device 100 according to the embodiment of the present disclosure. For the shape analysis of a 3D character, which has a high degree of freedom and is difficult to process, the information processing device 100 according to the embodiment of the present disclosure narrows the search space on the 3D model by using image processing on rendered images, and then performs 3D feature analysis. Thereby, the information processing device 100 can perform part region estimation and metadata extraction for a 3D model with simpler processing and a lower processing load.
 The information processing device 100 shown in FIG. 2 includes a communication unit 110, an input/output unit 120, a storage unit 130, and a control unit 140.
 The information processing device 100 may be a terminal device used by a user, such as a personal computer or a tablet terminal, or may be a server device placed on a network (for example, a cloud server device or a local server device).
 In FIG. 2, the information processing device 100 includes both the control unit 140, which executes applications such as the analysis processing described above, and the storage unit 130, which has the parts DB 133 and the metadata DB 134 and functions as storage. Alternatively, some functions, such as the storage function of the storage unit 130, may be realized by an information processing device (for example, a server device) different from the information processing device 100 in FIG. 2.
 Furthermore, as will be described later, the information processing device 100 in FIG. 2 has both an acquisition function that analyzes the 3D model of a character and acquires parts, and a search function that searches for parts. Alternatively, the search function may be realized by an information processing device different from the information processing device 100 having the acquisition function.
(Communication unit 110)
 The communication unit 110 is a communication interface for communicating with other devices. For example, the communication unit 110 is a LAN (Local Area Network) interface such as a NIC (Network Interface Card). The communication unit 110 may be a wired interface or a wireless interface. The communication unit 110 communicates with other devices under the control of the control unit 140.
(Input/output unit 120)
 The input/output unit 120 is a user interface for exchanging information with the user. For example, the input/output unit 120 is an operating device for the user to perform various operations, such as a keyboard, a mouse, operation keys, or a touch panel. Alternatively, the input/output unit 120 is a display device such as a liquid crystal display or an organic EL (Organic Electroluminescence) display. The input/output unit 120 may be an audio device such as a speaker or a buzzer. Further, the input/output unit 120 may be a lighting device such as an LED (Light Emitting Diode) lamp.
(Storage unit 130)
 The storage unit 130 is realized by, for example, a semiconductor memory element such as a RAM (Random Access Memory), a ROM (Read Only Memory), or a flash memory, or a storage device such as a hard disk or an optical disk.
 The storage unit 130 in FIG. 2 includes a 3D model DB 131, a log file DB 132, a parts DB 133, and a metadata DB 134.
 The 3D model DB 131 is a database that stores the 3D models of characters on which the information processing device 100 performs 3D shape analysis. The log file DB 132 is a database that stores log files holding the results of the 3D shape analysis performed by the information processing device 100.
 The parts DB 133 is a database that stores the 3D models of the character part regions obtained by the information processing device 100 through 3D shape analysis. The metadata DB 134 is a database that stores the metadata corresponding to the part regions. The storage unit 130 stores the 3D models of the part regions and the metadata in association with each other.
(Control unit 140)
 The control unit 140 is a controller that controls each unit of the information processing device 100. The control unit 140 is realized by, for example, a processor such as a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or a GPU (Graphics Processing Unit). For example, the control unit 140 is realized by the processor executing various programs stored in a storage device inside the information processing device 100, using a RAM or the like as a work area. Note that the control unit 140 may be realized by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array). CPUs, MPUs, GPUs, ASICs, and FPGAs can all be regarded as controllers.
 The control unit 140 includes a model acquisition unit 141, a rendering unit 142, an image recognition unit 143, a position estimation unit 144, a region estimation unit 145, an extraction unit 146, a search processing unit 147, and a UI control unit 148. Through the model acquisition unit 141 to the extraction unit 146, the control unit 140 realizes the acquisition function (application function) described above, which analyzes the 3D model of a character and acquires parts. Through the search processing unit 147, the control unit 140 realizes the search function (application function) that searches for parts.
 Each of the blocks constituting the control unit 140 (the model acquisition unit 141 to the UI control unit 148) is a functional block indicating a function of the control unit 140. These functional blocks may be software blocks or hardware blocks. For example, each of the above functional blocks may be one software module realized by software (including microprograms), or may be one circuit block on a semiconductor chip (die). Of course, each functional block may be one processor or one integrated circuit. The control unit 140 may be configured in functional units different from the above-mentioned functional blocks. The method of configuring the functional blocks is arbitrary.
 Note that the control unit 140 may be configured in functional units different from the above-mentioned functional blocks. Further, some or all of the operations of the blocks constituting the control unit 140 (the model acquisition unit 141 to the UI control unit 148) may be performed by another device.
(Model acquisition unit 141)
 The model acquisition unit 141 acquires the 3D model of a character by reading the 3D model from the 3D model DB 131. Note that the model acquisition unit 141 may acquire the 3D model of the character from another device via the communication unit 110. The model acquisition unit 141 outputs the acquired 3D model to the rendering unit 142, the position estimation unit 144, and the region estimation unit 145.
(Rendering unit 142)
 FIG. 3 is a diagram for explaining an example of rendering processing by the rendering unit 142 according to the embodiment of the present disclosure. The rendering unit 142 generates an image (2D image) of the character based on the 3D model by executing rendering processing.
As shown in FIG. 3, the rendering unit 142 generates an image of the character viewed from the virtual viewpoint of a virtual camera C. At this time, the rendering unit 142 can generate a plurality of images with different virtual viewpoints.
For example, the rendering unit 142 generates images P_1 to P_N in which the character is viewed from each of a plurality of virtual cameras C_1 to C_N arranged around the character at a predetermined angular interval (for example, 30 degrees or 45 degrees). The images P_1 to P_N therefore contain the character in different orientations.
The rendering unit 142 outputs the generated images P_1 to P_N to the image recognition unit 143. At this time, the rendering unit 142 may also output information regarding the virtual cameras C_1 to C_N corresponding to the images P_1 to P_N to the image recognition unit 143. A minimal sketch of this camera placement is shown below.
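As a rough illustration of the multi-viewpoint rendering described above, the following Python sketch places virtual cameras on a circle around the character at a fixed angular step and records, for each camera, its position, the point it looks at, and its field of view. The `render` function is a placeholder for whatever renderer an implementation actually uses, and the look-at parameterization (fixed radius and height around the character's center) is an assumption made here, not something specified in the present disclosure.

```python
import math
from dataclasses import dataclass

@dataclass
class VirtualCamera:
    position: tuple   # camera position in world coordinates
    target: tuple     # point the camera looks at (character center)
    fov_deg: float    # field of view, also needed later for back-projection

def place_cameras(character_center, radius=2.0, height=1.5,
                  step_deg=30.0, fov_deg=45.0):
    """Place virtual cameras C_1..C_N every `step_deg` degrees around the character."""
    cameras = []
    n = int(360.0 / step_deg)
    cx, cy, cz = character_center
    for i in range(n):
        theta = math.radians(i * step_deg)
        pos = (cx + radius * math.cos(theta), cy + height, cz + radius * math.sin(theta))
        cameras.append(VirtualCamera(position=pos, target=character_center, fov_deg=fov_deg))
    return cameras

def render(model, camera):
    """Placeholder: render `model` from `camera` and return a 2D image (e.g., a numpy array)."""
    raise NotImplementedError

# Usage sketch: images P_1..P_N paired with their cameras, as passed to the image recognition step.
# cameras = place_cameras(character_center=(0.0, 0.0, 0.0), step_deg=30.0)
# images = [render(character_model, cam) for cam in cameras]
```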
(Image recognition unit 143)
Returning to FIG. 2, the image recognition unit 143 performs image recognition processing on the images P_1 to P_N and estimates the parts positions (image recognition positions) contained in the images P_1 to P_N.
The image recognition unit 143 estimates, for example, the positions of parts designated in advance within the images P_1 to P_N. Here, a part (parts region) estimated by the image recognition unit 143 is, for example, a region containing a body part of the character. Specific examples of such parts regions include the character's eye region, nose region, mouth region, and ear region.
Parts regions also include regions containing body parts such as fingers, palms, and feet, regions containing accessories such as glasses, wristwatches, earrings, and necklaces, and regions containing costumes such as clothes and hats.
Note that the parts estimated by the information processing device 100 are not limited to the examples described here. The information processing device 100 can estimate any part, such as a part designated by the user or a part specific to a particular character.
The image recognition unit 143 performs image recognition processing on an image P using pattern recognition or semantic segmentation techniques, and determines the pixel coordinates of a part within the image P. The image recognition unit 143 estimates, for example, a rectangular or polygonal region within the image P as the image recognition position of the part.
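As one hedged illustration of how such pixel coordinates might be obtained, the sketch below assumes a segmentation model that returns a boolean mask for a named part and derives a rectangular image recognition position from that mask. The `segment_part` callable is hypothetical; the disclosure itself only states that pattern recognition or semantic segmentation is used.

```python
import numpy as np

def segment_part(image: np.ndarray, part_name: str) -> np.ndarray:
    """Hypothetical semantic segmentation step: returns a boolean mask of shape (H, W)
    that is True where `part_name` (e.g., "right_eye") is detected."""
    raise NotImplementedError

def image_recognition_position(image: np.ndarray, part_name: str):
    """Return a rectangular image recognition position Rp as (x_min, y_min, x_max, y_max),
    or None if the part was not found (the caller then logs a recognition failure)."""
    mask = segment_part(image, part_name)
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```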
The image recognition unit 143 outputs information regarding the image recognition position in the image P to the position estimation unit 144, and outputs the result of the image recognition processing to the extraction unit 146. At this time, the image recognition unit 143 can also output information regarding the virtual camera C corresponding to the image P to the position estimation unit 144.
Further, if the image recognition unit 143 cannot estimate a part from the image P, it writes a message to the effect that image recognition has failed to a log file. In this case, the image recognition unit 143 may, for example, associate the part that could not be estimated with the image P and write them to the log file.
(Position estimation unit 144)
The position estimation unit 144 estimates the part position (image recognition position) in the 3D model based on the image recognition result from the image recognition unit 143. The image recognition position in the 3D model estimated by the position estimation unit 144 is a rougher position (region) than the parts region that the information processing device 100 ultimately estimates.
FIG. 4 is a diagram illustrating an example of part position estimation by the position estimation unit 144 according to the embodiment of the present disclosure. The position estimation unit 144 estimates the approximate position of a part in the 3D model by calculating backwards from the settings of the virtual camera C used to render the 3D model.
As shown in the upper diagram of FIG. 4, suppose, for example, that the image recognition unit 143 has estimated an image recognition position Rp for the right eye as a part in the image P_1. Here, the image P_1 is an image obtained by rendering the 3D model viewed from the virtual camera C_1.
The position estimation unit 144 estimates the image recognition position Rm in the 3D model, as shown in the middle diagram of FIG. 4, based on the placement of the image recognition position Rp in the image P_1 and on the virtual viewpoint and angle of view of the virtual camera C_1 in the 3D space. For example, the position estimation unit 144 estimates the image recognition position Rm in the 3D model by projecting the image recognition position Rp in the image P_1 into the 3D space of the 3D model.
Here, the lower diagram in FIG. 4 is an enlarged view of the image recognition position Rm in the 3D model. As shown in the lower diagram of FIG. 4, the image recognition position Rm estimated by the position estimation unit 144 represents the approximate position (region) of the part (here, the "eye") in the 3D model. Therefore, the image recognition position Rm does not necessarily match the mesh of the 3D model. That is, the contour line of the image recognition position Rm in the 3D model does not necessarily coincide with the edges of the mesh in the 3D model.
For example, in FIG. 4, the position estimation unit 144 estimates a rectangular region as the image recognition position Rm in the 3D model. In this way, the position estimation unit 144 only roughly estimates the part position in the 3D model, so the estimated position (region) may differ from the actual parts region (for example, the corresponding faces of the 3D model).
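The back-projection from the 2D image recognition position Rp to the rough 3D position Rm can be pictured as casting rays through the corners of Rp using the virtual camera's viewpoint and angle of view. The following sketch shows one way this could be done with a simple pinhole camera model; the camera object is assumed to expose its position, a rotation matrix, and its field of view, and `mesh.intersect` stands in for an actual ray/mesh intersection routine. None of these specifics are prescribed by the present disclosure.

```python
import numpy as np

def pixel_to_ray(u, v, width, height, fov_deg, cam_pos, cam_rot):
    """Convert a pixel (u, v) into a world-space ray under a pinhole camera model.
    `cam_rot` is a 3x3 rotation matrix mapping camera coordinates to world coordinates."""
    f = 0.5 * height / np.tan(np.radians(fov_deg) / 2.0)       # focal length in pixels
    d_cam = np.array([u - 0.5 * width, v - 0.5 * height, f])   # ray direction in camera space
    d_world = cam_rot @ (d_cam / np.linalg.norm(d_cam))
    return np.asarray(cam_pos, dtype=float), d_world

def project_rp_to_rm(rp, image_size, camera, mesh):
    """Estimate the rough 3D region Rm by casting rays through the corners of the 2D box Rp
    and intersecting them with the character mesh."""
    x0, y0, x1, y1 = rp
    w, h = image_size
    hits = []
    for (u, v) in [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]:
        origin, direction = pixel_to_ray(u, v, w, h, camera.fov_deg,
                                         camera.position, camera.rotation)
        hit = mesh.intersect(origin, direction)   # hypothetical: nearest hit point or None
        if hit is not None:
            hits.append(hit)
    return hits   # corner points of Rm on the model surface (may be empty on failure)
```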
FIG. 5 is a diagram illustrating another example of part position estimation by the position estimation unit 144 according to the embodiment of the present disclosure. As described above, the image recognition unit 143 can estimate part positions in the image P using semantic segmentation in addition to pattern-based image recognition. FIG. 5 shows the part position estimation by the position estimation unit 144 in the case where the image recognition unit 143 has estimated the part position by semantic segmentation.
For example, by performing image recognition using semantic segmentation on the image P shown in the upper diagram of FIG. 5, the image recognition unit 143 estimates the image recognition position Rp1 of the right eye in the image P, as shown in the middle diagram of FIG. 5. By using semantic segmentation, the image recognition unit 143 can estimate a finer image recognition position Rp1 than with the image recognition processing of FIG. 4.
The position estimation unit 144 estimates the image recognition position Rm1 in the 3D model, as shown in the lower diagram of FIG. 5, based on the placement of the image recognition position Rp1 in the image P and on the virtual viewpoint and angle of view of the virtual camera C in the 3D space.
The method by which the position estimation unit 144 estimates the image recognition position Rm1 in the 3D model is the same as in FIG. 4, but because the image recognition unit 143 estimates the image recognition position Rp1 more finely than the image recognition position Rp of FIG. 4, the image recognition position Rm1 in the 3D model estimated by the position estimation unit 144 is correspondingly finer than the image recognition position Rm of FIG. 4.
However, as described above, the estimation of the image recognition position Rm1 (part position) in the 3D model by the position estimation unit 144 is still an estimation of a rough position (region). Therefore, the image recognition position Rm1 does not necessarily match the mesh of the 3D model. Moreover, depending on the result of the estimation of the image recognition position Rp1 by the image recognition unit 143, the contour of the image recognition position Rm1 in the 3D model may become uneven (jagged).
The position estimation unit 144 outputs information regarding the estimated image recognition position (part position) in the 3D model to the region estimation unit 145.
If the position estimation unit 144 fails to estimate the image recognition position, it writes, for example, a message to that effect to a log file. For example, the position estimation unit 144 determines, based on the recognition result of the image recognition unit 143, that estimation of the image recognition position has failed when the image recognition position estimated in the 3D space does not lie on the 3D model. When estimation of the image recognition position fails in this way, the position estimation unit 144 may, for example, associate the 3D model of the character with the part and write them to the log file.
(Region estimation unit 145)
Returning to FIG. 2, the region estimation unit 145 estimates the parts region in the 3D model of the character based on the 3D model and on the part position Rm (image recognition position Rm) estimated by the position estimation unit 144. The region estimation unit 145 estimates and extracts a 3D model (for example, mesh data) of a part of the character as the parts region. In this way, the region estimation unit 145 separates the part from the 3D model of the character.
As described above, the part position estimated by the position estimation unit 144 is the rough position (region) of the part in the 3D model of the character. Therefore, this part position may not match the region defined by the mesh of the actual 3D model, or may be an uneven, jagged region. In other words, the part position estimated by the position estimation unit 144 does not by itself have sufficient accuracy to separate the part from the 3D model.
Therefore, the region estimation unit 145 according to the embodiment of the present disclosure performs analysis of the 3D model (3D analysis) restricted to the part position Rm estimated by the position estimation unit 144, and thereby estimates the parts region in the 3D model of the character.
FIG. 6 is a diagram for explaining an example of the parts region estimation processing by the region estimation unit 145 according to the embodiment of the present disclosure. As shown in FIG. 6, the region estimation unit 145 analyzes the 3D model within the image recognition position Rm estimated by the position estimation unit 144, and estimates a parts region Rr that follows the mesh of the 3D model.
At this time, the region estimation unit 145 can estimate the parts region with higher accuracy by, for example, performing the analysis in light of the characteristics of the 3D shape corresponding to the part to be extracted (for example, an eye).
For example, the region estimation unit 145 performs the analysis taking into account, as the characteristics of each part, characteristics of the 3D structure and characteristics of the part attributes. Characteristics of the 3D structure include, for example, the curvature, gradient, and Laplacian of the 3D model of the part. Characteristics of the part attributes include, for example, a volume ratio (for example, the ratio of the volume of the part to the volume of the entire character) and an aspect ratio (for example, the aspect ratio of the part).
Furthermore, the region estimation unit 145 can perform the analysis not on the entire 3D model of the character but only on the part position estimated by the position estimation unit 144. This allows the region estimation unit 145 to further reduce the processing load of the analysis.
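As a hedged illustration of the part-attribute features mentioned above, the following sketch computes a volume ratio and a bounding-box aspect ratio for a candidate parts region from raw vertex and face arrays. The signed-tetrahedron volume formula is standard for closed triangle meshes; applying it to a possibly open parts region is an approximation made here purely for illustration.

```python
import numpy as np

def mesh_volume(vertices: np.ndarray, faces: np.ndarray) -> float:
    """Approximate volume of a triangle mesh via signed tetrahedra against the origin.
    `vertices` is (V, 3); `faces` is (F, 3) with vertex indices."""
    v0, v1, v2 = (vertices[faces[:, i]] for i in range(3))
    return float(abs(np.einsum('ij,ij->i', v0, np.cross(v1, v2)).sum()) / 6.0)

def part_attribute_features(part_v, part_f, character_v, character_f):
    """Volume ratio (part volume / character volume) and aspect ratio of the part's
    axis-aligned bounding box, as examples of part-attribute features."""
    vol_ratio = mesh_volume(part_v, part_f) / mesh_volume(character_v, character_f)
    extent = part_v.max(axis=0) - part_v.min(axis=0)
    aspect_ratio = float(extent.max() / max(extent.min(), 1e-9))
    return {"volume_ratio": vol_ratio, "aspect_ratio": aspect_ratio}
```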
The region estimation unit 145 may separate the parts region Rr estimated by the 3D analysis as the 3D model of the character's part as it is, or it may instead use a corrected parts region Rc obtained by correcting the parts region Rr.
FIG. 7 is a diagram for explaining an example of the correction processing performed by the region estimation unit 145 according to the embodiment of the present disclosure. FIG. 8 is a diagram for explaining an example of the jaggedness detection performed by the region estimation unit 145 according to the embodiment of the present disclosure.
The left diagram in FIG. 7 shows the parts region Rr estimated by the region estimation unit 145. As indicated by region A in the left diagram of FIG. 7, the contour of the parts region Rr estimated by the region estimation unit 145 may be jagged because of the mesh structure of the 3D model.
Therefore, the region estimation unit 145 performs jaggedness detection on the parts region Rr. The region estimation unit 145 performs the jaggedness detection using the normal information of the parts region Rr. For example, the region estimation unit 145 detects the normal vectors (the arrows in FIG. 8) of the contour line of the parts region Rr.
As shown in the left diagram of FIG. 8, when the parts region Rr has no jaggedness, the normal vectors rotate in the same direction along the contour of the parts region Rr. On the other hand, as shown in the right diagram of FIG. 8, when the parts region Rr has jaggedness, there are locations along the contour of the parts region Rr where the direction of the normal vector is reversed.
The region estimation unit 145 checks the direction of the normal vectors of the contour line along the contour of the parts region Rr, and detects a location where the direction of the normal vector reverses as a location where the contour is jagged (a jagged location).
The region estimation unit 145 generates the corrected parts region Rc by correcting the detected jagged locations. For example, as shown in FIG. 8, the region estimation unit 145 corrects the jaggedness of the region A and generates the corrected parts region Rc. For example, the region estimation unit 145 generates a corrected parts region Rc whose contour has no jaggedness by creating new edges at the jagged locations.
In this way, the region estimation unit 145 corrects the shape of the parts region Rr in accordance with the jaggedness of the contour of the parts region Rr.
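One simple way to picture this normal-based check is to walk along a 2D polygonal contour and look at the turning direction at each vertex: if the contour as a whole winds counter-clockwise, vertices where the turn flips to clockwise correspond to locations where the outward normal direction reverses. The sketch below follows that interpretation; it is an illustrative reading of the technique, not the exact computation described in the disclosure.

```python
import numpy as np

def jagged_locations(contour: np.ndarray) -> list:
    """Return indices of contour vertices where the turning direction flips,
    i.e., candidate jagged locations. `contour` is an (N, 2) closed polygon."""
    n = len(contour)
    crosses = []
    for i in range(n):
        a, b, c = contour[i - 1], contour[i], contour[(i + 1) % n]
        v1, v2 = b - a, c - b
        crosses.append(v1[0] * v2[1] - v1[1] * v2[0])   # z component of the 2D cross product
    dominant = np.sign(np.sum(crosses))                 # overall winding direction
    return [i for i, c in enumerate(crosses) if c != 0 and np.sign(c) == -dominant]

# Example: a rectangle with one dented vertex is flagged at the dent.
# contour = np.array([[0, 0], [4, 0], [4, 3], [2, 2.2], [0, 3]], dtype=float)
# jagged_locations(contour)  # -> [3] (the concave vertex)
```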
The region estimation unit 145 separates the 3D model (3D model information, for example, mesh data) of the corrected parts region Rc generated by the correction as the 3D model of the character's part. For example, the region estimation unit 145 separates the character into parts by generating the 3D model of the corrected parts region Rc from the 3D model of the character. The region estimation unit 145 outputs information regarding the 3D model of the separated part to the extraction unit 146.
If the region estimation unit 145 fails to estimate the parts region Rr, it writes, for example, a message to that effect to a log file. For example, when the region estimation unit 145 determines as a result of the 3D analysis that there is no parts region Rr, it determines that estimation of the parts region Rr has failed. When estimation of the parts region Rr fails in this way, the region estimation unit 145 may, for example, associate the 3D model of the character with the part and write them to the log file.
Note that although it has been described here that the region estimation unit 145 detects jaggedness based on changes in the direction of the normal vectors of the parts region Rr and corrects the shape of the parts region Rr accordingly, the correction of the parts region Rr by the region estimation unit 145 is not limited to this.
For example, the region estimation unit 145 may correct the parts region Rr using machine learning. For example, the region estimation unit 145 may correct the parts region Rr using a trained correction model that takes the parts region Rr as input and outputs the corrected parts region Rc.
(Extraction unit 146)
Returning to FIG. 2, the extraction unit 146 acquires character information regarding the character as metadata based on at least one of the result of the image recognition processing, the estimation result of the part position (image recognition position Rm), and the parts region Rr.
The metadata acquired by the extraction unit 146 includes metadata obtained from the image recognition by the image recognition unit 143 and metadata obtained based on the parts region Rr estimated by the region estimation unit 145.
The extraction unit 146 extracts, for example, at least one of classification information, feature amount information, and relative information as metadata.
The classification information includes, for example, information regarding the class classification into which the parts region Rr is classified. The feature amount information includes information regarding the feature vector of the parts region Rr (in other words, the latent space representing the features). The relative information includes information indicating the relative sizes and positions (relative positions) between a plurality of parts, and the relative sizes and positions (relative positions) between the character and a part.
The extraction unit 146 extracts, for example, the classification information and the feature amount information based on the image recognition result. The extraction unit 146 extracts the classification information from the image P using, for example, a deep learning class classification task.
For example, the classes into which the extraction unit 146 classifies parts based on the image recognition result include the following. Note that the following is only an example, and the extraction unit 146 may classify parts into classes other than the following.
For example, the following classes can be cited as characteristics of the character.
・Whether the character is Photo Real (PR) or Non Photo Real (NPR)
・For NPR, whether the drawing style is realistic or deformed
・For NPR, the era of the style
・Age, gender, and the like
・Personality
For example, the following classes can be cited as characteristics of a part.
・Color (for example, the color of the eyes, hair, or clothes)
・Shape (for example, drooping or upturned eyes; short or long hair)
・Genre (for example, glasses or necklaces for accessories; hats or jackets for costumes)
Furthermore, based on the image recognition result, the extraction unit 146 extracts the feature amount information by clustering in a latent space built with a Variational Auto Encoder, a Generative Adversarial Network, or the like. By extracting the feature amount information, the extraction unit 146 measures the degree of similarity between parts according to the character. Using the feature amount information, the information processing device 100 can estimate the degree of similarity between the same kind of part (for example, the face, eyes, or hairstyle) of different characters (for example, characters #1 and #2, not shown).
The extraction unit 146 extracts, for example, the classification information, the feature amount information, and the relative information based on the 3D model of the parts region Rr, in other words, based on the analysis result of the 3D shape of the 3D model.
The extraction unit 146 extracts the classification information and the feature amount information from the 3D analysis result in the same manner as from the image recognition result. At this time, the extraction unit 146 can extract the classification information and the feature amount information limited, for example, to the parts region Rr.
For example, the classes into which the extraction unit 146 classifies parts based on the 3D analysis result include the following. Note that the following is only an example, and the extraction unit 146 may classify parts into classes other than the following.
・Texture (rugged, wrinkled, smooth, and so on)
・Level of detail (mesh data with many vertices (high poly), few vertices (low poly), and so on)
In this way, by having the extraction unit 146 extract metadata using the 3D model of the parts region Rr, the extraction accuracy can be further improved and the processing load can be further reduced.
Furthermore, as described above, the extraction unit 146 extracts metadata (for example, classification information and feature amount information) using the image recognition result. By performing metadata extraction based on the 3D analysis result together with the metadata extracted from the image recognition result, the extraction unit 146 can extract metadata with higher precision.
For example, suppose that the extraction unit 146 has extracted, from the image recognition result, metadata (classification information) indicating that the character is a "muscular male" and that the 3D model "has a hand region". In this case, the extraction unit 146 performs the class classification of the part including this metadata. Specifically, for example, the extraction unit 146 extracts the classification information of the part by inputting the 3D model (mesh data) of the parts region Rr and the metadata acquired from the image recognition result into a neural network.
In this way, by extracting metadata based on both the image recognition result and the 3D analysis result, the extraction unit 146 can extract the metadata of the part with higher accuracy.
The extraction unit 146 extracts, for example, the relative information based on the 3D model of the character, in other words, based on the analysis result of the 3D shape of the 3D model. The extraction unit 146 extracts, for example, the relative positional relationship and relative sizes of a plurality of parts of a specific character (for example, the right eye and the left eye, or the face and the eyes) as relative information. For example, the extraction unit 146 measures where and at what size the character's eyes are placed relative to the head, and extracts the measurement result as metadata.
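A hedged sketch of such a relative-information measurement is shown below: given the vertex sets of two separated parts (for example, an eye and the head), it reports the offset of the eye's center relative to the head's bounding box and the ratio of their bounding-box sizes. The normalization against the head's bounding box is an assumption chosen here for illustration.

```python
import numpy as np

def relative_info(part_vertices: np.ndarray, reference_vertices: np.ndarray) -> dict:
    """Relative position and size of `part_vertices` (e.g., an eye) with respect to
    `reference_vertices` (e.g., the head). Both are (N, 3) arrays of mesh vertices."""
    ref_min, ref_max = reference_vertices.min(axis=0), reference_vertices.max(axis=0)
    ref_size = ref_max - ref_min
    part_center = 0.5 * (part_vertices.min(axis=0) + part_vertices.max(axis=0))
    part_size = part_vertices.max(axis=0) - part_vertices.min(axis=0)
    return {
        # position of the part's center, normalized into the reference bounding box [0, 1]^3
        "relative_position": ((part_center - ref_min) / np.maximum(ref_size, 1e-9)).tolist(),
        # per-axis size ratio of the part to the reference
        "relative_size": (part_size / np.maximum(ref_size, 1e-9)).tolist(),
    }
```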
The extraction unit 146 stores the part and the extracted metadata in association with each other in the parts DB 133 and the metadata DB 134. Note that the part may instead be stored in the parts DB 133 by the region estimation unit 145.
As described above, the extraction unit 146 extracts the metadata of a part by image recognition. In addition, the extraction unit 146 narrows down the extraction range of the metadata using the image recognition result, and then extracts the metadata of the part by 3D shape analysis. This allows the extraction unit 146 to further reduce the load of the extraction processing and to extract metadata with higher accuracy.
The metadata extracted by the extraction unit 146 is used, for example, when a user searches for parts. By having the extraction unit 146 extract metadata and store it in the metadata DB 134 in association with parts, the user can search for a desired part faster and more easily.
(Search processing unit 147)
The search processing unit 147 presents to the user parts (an example of parts information) corresponding to metadata that matches the search conditions specified by the user. The search processing unit 147 searches the metadata DB 134 according to the search conditions specified by the user, and presents the parts corresponding to the search results to the user. At this time, the search processing unit 147 presents to the user, for example, a 2D image obtained by rendering the part based on its 3D model.
FIG. 9 is a diagram for explaining an example of the search processing according to the embodiment of the present disclosure. As shown in FIG. 9, the search processing unit 147 receives parts search conditions from the user, for example via the input/output unit 120.
The search processing unit 147 searches the metadata DB 134 by specifying metadata corresponding to the search conditions received from the user. The metadata DB 134 specifies, to the parts DB 133, the parts corresponding to the metadata specified by the search processing unit 147.
The parts DB 133 notifies the search processing unit 147 of the parts specified by the metadata DB 134. The search processing unit 147 presents the parts acquired from the parts DB 133 to the user as the search results.
In this way, by having the search processing unit 147 search for parts using the metadata stored in the metadata DB 134, the user can search more easily and in a shorter time.
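At its simplest, the metadata-keyed lookup described above can be pictured as two maps: one from metadata values to part IDs, and one from part IDs to part records. The sketch below uses in-memory dictionaries only to make the flow of FIG. 9 concrete; the databases 133 and 134 themselves are of course not limited to such a structure, and the keys and fields shown are hypothetical.

```python
# Hypothetical in-memory stand-ins for the metadata DB 134 and the parts DB 133.
metadata_db = {
    ("class", "eye"): ["part_001", "part_007"],
    ("class", "hat"): ["part_003"],
}
parts_db = {
    "part_001": {"mesh": "eye_a.mesh", "thumbnail": "eye_a.png"},
    "part_003": {"mesh": "hat_a.mesh", "thumbnail": "hat_a.png"},
    "part_007": {"mesh": "eye_b.mesh", "thumbnail": "eye_b.png"},
}

def search_parts(condition):
    """Resolve a user search condition (here a (key, value) pair) to part records:
    metadata DB -> part IDs -> parts DB -> part records presented to the user."""
    part_ids = metadata_db.get(condition, [])
    return [parts_db[pid] for pid in part_ids if pid in parts_db]

# e.g., search_parts(("class", "eye")) returns the two eye parts with their thumbnails.
```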
The search processing unit 147 presents a search UI image and receives search conditions from the user. FIG. 10 is a diagram illustrating an example of the search UI image according to the embodiment of the present disclosure. The search UI image shown in FIG. 10 is generated by the UI control unit 148, for example, based on an instruction from the search processing unit 147.
For example, when searching for parts based on the classification information or the relative information, the search processing unit 147 presents the search UI image shown in FIG. 10 to the user. For example, when the part is an "eye", the search processing unit 147 uses the search UI image of FIG. 10 to receive from the user search information that further narrows down the parts using the classification information (corresponding to the words and keywords in FIG. 10) and the relative information (corresponding to the size specification in FIG. 10).
The user can specify a class of the classification information, for example, by entering a free word or selecting a tag (such as "anime" or "girl" in FIG. 10). The user can also specify the relative positions and relative sizes of parts by adjusting numerical values with sliders. For example, in the example of FIG. 10, by adjusting the slider, the user can specify the size of the eyes relative to the size of the face. Note that the user may specify the relative information by adjusting a numerical value with the slider, or by directly entering a numerical value.
As shown in FIG. 10, the search processing unit 147 can acquire the search conditions by having the user directly specify metadata. Alternatively, the search processing unit 147 may acquire the search conditions by having the user specify an image. In this case, the search processing unit 147 extracts feature amount information from the image specified by the user and performs the parts search based on that feature amount information.
FIG. 11 is a diagram for explaining an example of feature amount information extraction by the search processing unit 147 according to the embodiment of the present disclosure. In FIG. 11, it is assumed that the user has specified an image S_0 as the search condition.
In this case, the search processing unit 147 inputs, for example, the image S_0 to an encoder. The encoder extracts, for example, a feature vector (a point in the latent space representing the features) from an image. By inputting the image S_0 to the encoder, the search processing unit 147 extracts a feature vector V_0 corresponding to the image S_0. The search processing unit 147 then searches the latent space for characters (or parts) close to the image S_0 using the extracted feature vector V_0.
FIG. 12 is a diagram for explaining an example of a search in the latent space by the search processing unit 147 according to the embodiment of the present disclosure. Although FIG. 12 shows a two-dimensional latent space to simplify the illustration, the actual latent space is a multidimensional space of two or more dimensions.
As shown in FIG. 12, the search processing unit 147 maps the feature vector V_0 extracted from the image S_0 into the latent space. The search processing unit 147 selects, as search result vectors, representative feature vectors from among the feature vectors located within a search range SR_0 that contains the feature vector V_0 in the latent space.
For example, the search processing unit 147 can select the search result vectors according to distance and direction in the latent space, or it can select the search result vectors randomly from within the search range SR_0. In the example of FIG. 12, the search processing unit 147 selects feature vectors Vc_01 to Vc_04 as the search result vectors.
The search processing unit 147 acquires the parts corresponding to the search result vectors from the parts DB 133 as search results, for example, by specifying the search result vectors and searching the metadata DB 134. The search processing unit 147 presents the acquired parts to the user as the search results.
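The latent-space step above amounts to a radius (or nearest-neighbor) query around V_0 followed by picking a few representatives. The following sketch expresses that with plain numpy; the fixed radius, the random choice of representatives, and the array-of-vectors storage are assumptions made for illustration.

```python
import numpy as np

def search_latent(query_vec, stored_vecs, radius=1.0, num_results=4, rng=None):
    """Return indices of up to `num_results` representative vectors within `radius`
    of `query_vec` in the latent space. `stored_vecs` is an (N, D) array of the
    feature vectors registered in the metadata DB."""
    rng = rng or np.random.default_rng()
    dists = np.linalg.norm(stored_vecs - query_vec, axis=1)
    in_range = np.flatnonzero(dists <= radius)          # search range SR_0
    if in_range.size == 0:
        return []
    chosen = rng.choice(in_range, size=min(num_results, in_range.size), replace=False)
    return chosen.tolist()                              # e.g., indices of Vc_01..Vc_04
```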
FIG. 13 is a diagram illustrating an example of a search result image according to the embodiment of the present disclosure. The search result image shown in FIG. 13 is generated by the UI control unit 148, for example, based on an instruction from the search processing unit 147.
In the example of FIG. 13, the search processing unit 147 displays the image S_0 specified by the user in the center of the search result image. The search processing unit 147 also displays, around the image S_0, 2D images Sc_01 to Sc_04 of the parts that are the search results (parts showing the upper body in the example of FIG. 13).
In this way, by performing the search using the latent space (feature vectors), the search processing unit 147 can, for example, retrieve parts that are similar to the image specified by the user (that is, parts lying within a predetermined search range in the latent space).
Furthermore, the search processing unit 147 can, for example, receive a change of the search range from the user by means of the icon I in FIG. 13.
FIG. 14 is a diagram for explaining an example of the search range changing processing in the search processing unit 147 according to the embodiment of the present disclosure. The search processing unit 147 receives a movement of the icon I as an instruction from the user to change the search range.
As shown in the upper diagram of FIG. 14, suppose that the user moves the icon I as indicated by the arrow. In the upper diagram of FIG. 14, the icon I before being moved by the user is denoted as icon I_0, and the icon I after the movement is denoted as icon I_1.
Having received such a movement of the icon I, the search processing unit 147 changes the search range in accordance with the movement of the icon I and performs the parts search processing. For example, as shown in the lower diagram of FIG. 14, the search processing unit 147 changes the search range from the search range SR_0 to a search range SR_1 and searches for parts.
Here, the search range SR_0 is the search range corresponding to the feature vector V_0 of the image S_0 specified by the user, and is, for example, a range centered on the feature vector V_0.
The search range SR_1 is the search range corresponding to a feature vector V_1 determined by the movement of the icon I, and is, for example, a range centered on the feature vector V_1. The feature vector V_1 is a vector obtained by moving the feature vector V_0 in the direction of the feature vector Vc_02 in accordance with the amount of movement of the icon I (the length of the arrow in the upper diagram of FIG. 14). For example, the search processing unit 147 calculates the feature vector V_1 by moving the feature vector V_0 toward the feature vector Vc_02 by the ratio of the amount of movement of the icon I to the distance between the icon I_0 and the 2D image Sc_02.
The search processing unit 147 selects search result vectors from within the changed search range SR_1, in the same manner as when selecting search result vectors from within the search range SR_0. In the example of FIG. 14, the search processing unit 147 selects feature vectors Vc_11 to Vc_14 as the search result vectors.
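Read this way, moving the icon corresponds to linearly interpolating the query vector toward the result vector associated with the image the icon is dragged toward, with the interpolation weight given by the on-screen displacement ratio. The sketch below expresses that interpretation; combining it with the radius query shown earlier yields the new search range SR_1.

```python
import numpy as np

def shifted_query(v0, vc_target, icon_move_px, icon_to_image_px):
    """Move the query vector V_0 toward the target result vector (e.g., Vc_02) by the
    ratio of the icon's on-screen movement to the icon-to-image distance."""
    t = np.clip(icon_move_px / max(icon_to_image_px, 1e-9), 0.0, 1.0)
    return (1.0 - t) * np.asarray(v0) + t * np.asarray(vc_target)   # = V_1, center of SR_1

# e.g., new_results = search_latent(shifted_query(v0, vc_02, 40, 120), stored_vecs)
```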
FIG. 15 is a diagram illustrating another example of a search result image according to the embodiment of the present disclosure. FIG. 15 shows an image presenting the results of the search performed by the search processing unit 147 within the search range SR_1.
As shown in FIG. 15, the search processing unit 147 displays 2D images Sc_11 to Sc_14 of the parts corresponding to the feature vectors Vc_11 to Vc_14, which are the search result vectors, around the icon I. In addition to the 2D images Sc_11 to Sc_14, the search processing unit 147 may also display the image S_0 specified by the user.
In this way, the search processing unit 147 receives a change of the search range in the latent space from the user, for example, by means of the icon I. This allows the user to change the search range in the latent space more intuitively and to search for a desired part more easily.
Note that although it has been described here that the search processing unit 147 receives a change of the search range by means of an icon, more specifically in accordance with the movement of the icon, the method of changing the search range is not limited to this.
For example, the search processing unit 147 may change the search range when the user clicks one of the 2D images Sc_01 to Sc_04. In this case, the search processing unit 147 selects search result vectors within a search range SR corresponding to the feature vector Vc of the clicked 2D image Sc.
Alternatively, the search processing unit 147 may receive a change of the search range from the user through a tool for adjusting numerical values, such as a slider. In this case, the search processing unit 147 changes the search range SR in accordance with, for example, a numerical value specified by the user with the slider, and searches for parts.
Furthermore, although it has been described here that the search processing unit 147 searches in the latent space using the image S_0 specified by the user, the search in the latent space by the search processing unit 147 is not limited to this.
For example, the search processing unit 147 may randomly select a feature vector corresponding to the search range SR. For example, when the user specifies a part, the search processing unit 147 randomly selects one feature vector of the specified part, sets a search range corresponding to the selected feature vector in the latent space, and selects search result vectors within the set search range.
In this way, the search processing unit 147 can also search for parts at random and present them to the user.
Further, although it has been described here that the search processing unit 147 selects four feature vectors as the search result vectors, the number of feature vectors selected by the search processing unit 147 is not limited to four. For example, the search processing unit 147 may select three or fewer feature vectors as the search result vectors, or may select five or more.
Likewise, although it has been described here that the search processing unit 147 presents to the user the 2D images corresponding to all of the feature vectors selected as search result vectors, the 2D images presented to the user are not limited to this. For example, the search processing unit 147 may present to the user only some of the 2D images corresponding to the feature vectors selected as search result vectors. For example, as shown in FIG. 14, even when the search processing unit 147 selects the four feature vectors Vc_11 to Vc_14, it may present three or fewer 2D images to the user.
(UI control unit 148)
Returning to FIG. 2, the UI control unit 148 generates screens (UIs) and receives operations on the UIs. The UI control unit 148 generates, for example, the search UI image and the search result image in accordance with instructions from the search processing unit 147, and presents them to the user via the input/output unit 120. The UI control unit 148 also receives, for example, input of search conditions and changes of the search range from the user via the input/output unit 120, and notifies the search processing unit 147, for example, of the results of the user's input.
<<3. Parts separation processing>>
(First parts separation processing)
FIG. 16 is a flowchart illustrating an example of the flow of the first parts separation processing according to the embodiment of the present disclosure. The first parts separation processing shown in FIG. 16 is executed, for example, by the information processing device 100. The information processing device 100 executes the first parts separation processing, for example, in accordance with an instruction from the user.
As shown in FIG. 16, the information processing device 100 first acquires the 3D model of a character (step S101). The information processing device 100 acquires the 3D model, for example, from the 3D model DB 131. Note that the information processing device 100 may acquire the 3D model of the character from a range specified by the user.
The information processing device 100 classifies the acquired 3D model into major classification parts (step S102). Here, a major classification part is a part that is larger than the parts separated by the information processing device 100 in the first parts separation processing. The major classification parts include, for example, a head region and a body region. Alternatively, the major classification parts may include a head region, an upper body region, and a lower body region. In this way, the information processing device 100 divides the 3D model into major classification parts that are larger than the parts to be separated in the first parts separation processing (for example, eyes and nose).
Since the major classification parts are larger than the parts to be separated in the first parts separation processing, the processing in which the information processing device 100 separates the 3D model into major classification parts requires a smaller processing load than the processing of separating the 3D model into the parts themselves (for example, eyes and nose).
Next, the information processing device 100 selects one of the divided major classification parts, renders the selected major classification part, and generates an image P (step S103).
The information processing device 100 performs part image recognition on the image P generated in step S103 (step S104). For example, the information processing device 100 selects one part to be separated from among the plurality of parts, and executes image recognition processing for estimating the position of the selected part in the image P.
As described above, the information processing device 100 renders a major classification part to generate the image P. Therefore, compared with performing image recognition on an image in which the entire character is rendered, the information processing device 100 can recognize the image P with higher accuracy.
The information processing device 100 determines whether the recognition of the image P has succeeded (step S105). For example, the information processing device 100 makes this determination depending on whether the part could be recognized and whether the recognition accuracy of the part is equal to or higher than a threshold value.
If it is determined that the recognition of the image P has failed (step S105; No), that is, if the part could not be recognized or the recognition accuracy is below the threshold value, the information processing device 100 proceeds to step S110.
If it is determined that the recognition of the image P has succeeded (step S105; Yes), that is, if the part could be recognized and the recognition accuracy is equal to or higher than the threshold value, the information processing device 100 estimates the image recognition position Rm in the 3D model (step S106). The information processing device 100 estimates the image recognition position Rm in the 3D model based on the image recognition position Rp obtained from the recognition result of the image P and on the setting information of the virtual camera C.
The information processing device 100 estimates the parts region Rr in the 3D model based on the image recognition position Rm (step S107). The information processing device 100 estimates the parts region Rr, for example, in accordance with the characteristics of the part to be separated.
The information processing device 100 extracts metadata corresponding to the parts region Rr based on the parts region Rr and the image recognition result of the image P (step S108). The information processing device 100 takes the 3D model of the parts region Rr as the separated part.
The information processing device 100 stores the part and the metadata (step S109). The information processing device 100 stores the part and the metadata in association with each other in the parts DB 133 and the metadata DB 134, respectively.
The information processing device 100 determines whether all parts have been separated from the major classification part selected in step S103 (step S110). If there is a part that has not yet been separated (step S110; No), the information processing device 100 returns to step S104 and executes the separation processing for that part.
On the other hand, if all parts have been separated (step S110; Yes), the information processing device 100 determines whether the parts have been separated from all of the major classification parts (step S111). That is, the information processing device 100 determines whether all parts have been extracted from the 3D model of the character.
If there is a major classification part from which the parts have not yet been separated (step S111; No), the information processing device 100 returns to step S103 and performs the processing of separating parts for that major classification part.
On the other hand, if the parts have been separated from all of the major classification parts (step S111; Yes), the information processing device 100 determines whether the parts have been separated from all of the 3D models (step S112). That is, the information processing device 100 determines whether all parts have been extracted for all characters.
If there is a 3D model whose parts have not yet been separated (step S112; No), the information processing device 100 returns to step S101 and acquires the 3D model of a character whose parts have not been separated.
On the other hand, if the parts have been separated from all of the 3D models (step S112; Yes), the information processing device 100 ends the first parts separation processing.
Note that it has been described here that the information processing device 100 divides the 3D model into major classification parts and then renders the major classification parts. Alternatively, the information processing device 100 may render the 3D model first and then divide it into major classification parts.
For example, the information processing device 100 renders the entire 3D model and generates an image containing the whole character. The information processing device 100 then performs, for example, image recognition processing on this image containing the whole character, and generates the image P by cutting out a region containing a major classification part. Alternatively, the information processing device 100 may generate the image P by estimating a region containing a major classification part through image recognition processing and re-rendering the 3D model corresponding to the estimated region.
 Furthermore, if at least one of the image recognition, the estimation of the part positions Rp and Rm, and the estimation of the part region Rr fails, the information processing device 100 may record a log in a log file and proceed to the analysis (parts separation) of the next part or the next 3D model.
 As described above, the information processing device 100 renders the 3D model information of the character to generate a 2D image. The information processing device 100 performs image recognition processing on the generated 2D image to recognize the part to be separated, and estimates the part position Rp in the 2D image.
 Based on the part position Rp in the 2D image, the information processing device 100 estimates a rough part position Rm in the 3D space (3D model). The information processing device 100 analyzes the 3D model according to the rough part position Rm in the 3D model and the characteristics of the part, and estimates the part region Rr in the 3D model.
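 One way to lift the 2D part position Rp to a rough 3D position Rm is to cast a ray from the virtual viewpoint through the detected image point and intersect it with the character mesh. The sketch below is only an illustration of that idea under the assumption of a simple pinhole camera; the helper intersect_ray_with_mesh() is a hypothetical name, and the disclosure does not prescribe this particular method.

```python
import numpy as np

# Sketch: back-project the 2D part position Rp into the 3D model (pinhole camera assumed).
def estimate_rough_3d_position(rp_xy, image_size, fov_y_deg, cam_pos, cam_rot, mesh):
    w, h = image_size
    # Convert the pixel coordinate of Rp to normalized device coordinates (-1..1).
    ndc_x = (2.0 * rp_xy[0] / w) - 1.0
    ndc_y = 1.0 - (2.0 * rp_xy[1] / h)

    # Build a view ray from the angle of view (field of view) of the virtual viewpoint.
    tan_half_fov = np.tan(np.radians(fov_y_deg) / 2.0)
    ray_cam = np.array([ndc_x * tan_half_fov * (w / h),   # horizontal component
                        ndc_y * tan_half_fov,             # vertical component
                        -1.0])                            # camera looks down -Z
    ray_world = cam_rot @ (ray_cam / np.linalg.norm(ray_cam))

    # Rm is taken as the first intersection of this ray with the character mesh.
    return intersect_ray_with_mesh(cam_pos, ray_world, mesh)  # hypothetical helper
```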
 The information processing device 100, for example, separates (generates) the 3D model information (for example, mesh data) of the part region Rr from the 3D model information of the character. In this way, the information processing device 100 separates the part from the character.
 The information processing device 100 also extracts metadata corresponding to the part using the part region Rr and the result of the image recognition processing. The information processing device 100 holds the part and the metadata in association with each other.
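 A compact sketch of this separate-and-register step is given below. The face-membership test, the mesh operations, and the simple key-value storage are assumptions made for illustration only; they are not the actual structure of the parts DB 133 or the metadata DB 134.

```python
# Sketch: cut the faces inside the part region Rr out of the character mesh and
# register the resulting part together with its metadata (hypothetical APIs).
def separate_and_register(character_mesh, region_rr, recognition_result, parts_db, metadata_db):
    part_faces = [f for f in character_mesh.faces if region_rr.contains(f)]  # faces in Rr
    part_mesh = character_mesh.submesh(part_faces)        # new mesh data for the part

    metadata = {
        "class": recognition_result.label,                # e.g. "eye", "mouth"
        "feature_vector": recognition_result.feature,     # feature quantity of the part
        "relative_size": part_mesh.bounds_size() / character_mesh.bounds_size(),
    }

    part_id = parts_db.add(part_mesh)                     # store the separated part
    metadata_db.add(part_id, metadata)                    # keep part and metadata linked
    return part_id
```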
 In this way, the information processing device 100 can separate parts from a character with higher accuracy while further reducing the processing load of the process of separating parts from the character. Furthermore, because the information processing device 100 holds parts and metadata in association with each other, the user can search for a desired part more accurately and in a shorter time.
(Second parts separation process)
 In the first parts separation process described above, the information processing device 100 executes the process of separating parts from the character fully automatically; however, the user may perform a part of the process. That is, the information processing device 100 may execute the process of separating parts from the character while interacting with the user (second parts separation process).
 FIG. 17 is a flowchart showing an example of the flow of the second parts separation process according to the embodiment of the present disclosure. The second parts separation process shown in FIG. 17 is executed by, for example, the information processing device 100. The information processing device 100 executes the second parts separation process shown in FIG. 17 in accordance with, for example, an instruction from the user. Note that, in the second parts separation process shown in FIG. 17, the same steps as those of the first parts separation process shown in FIG. 16 are denoted by the same reference numerals, and their description is omitted.
 As shown in FIG. 17, after performing the image recognition of the part in step S104, the information processing device 100 presents the recognition result to the user (step S201). By presenting the recognition result to the user, the information processing device 100 accepts a correction (change) of the recognition result from the user.
 FIG. 18 is a diagram for explaining an example of correction of a recognition result according to the embodiment of the present disclosure. As shown in the upper part of FIG. 18, the information processing device 100 presents the user with a UI image PU_1 in which the part position Rp, which is the recognition result, is superimposed on the image P, which is the recognition target. For example, the image recognition unit 143 of the information processing device 100 instructs the UI control unit 148 to generate the UI image PU_1. At this time, the UI control unit 148 displays the part position Rp using a primitive figure such as an ellipse or a rectangle.
 With the UI image PU_1, the user can check whether the part recognition by the information processing device 100 is correct. If the recognition of the part by the information processing device 100 is incorrect, for example if the part position Rp deviates from the actual part position of the character, the user corrects the part position Rp. The user corrects the part position Rp by performing, for example, a GUI operation such as drag and drop. As a result, as shown in the lower part of FIG. 18, the user can indicate the correct position of the part to the information processing device 100.
 Suppose that the UI control unit 148 cannot generate the UI image PU_1 in which a figure indicating the part position Rp is superimposed on the image P, for example because the information processing device 100 could not recognize the part at all. In this case, the UI control unit 148 presents the user with, for example, a UI image that includes the image P but does not include a figure indicating the part position Rp. The user indicates the correct position of the part to the information processing device 100 by drawing a figure indicating the part position Rp on the image P.
 Alternatively, the UI control unit 148 may present the user with a UI image in which a figure indicating the part position Rp is drawn at a predefined position (default position), such as a corner of the image P. The user indicates the correct position of the part to the information processing device 100 by correcting the part position Rp through a GUI operation such as drag and drop.
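 The accept-or-correct interaction described above might be organized as in the following sketch. The event loop and the widget calls (show_image(), draw_ellipse(), and so on) are purely illustrative assumptions; the disclosure does not tie the UI control unit 148 to any particular toolkit.

```python
# Sketch: present the estimated part position Rp and let the user drag it into place
# (illustrative widget API, not an actual toolkit).
def confirm_part_position(ui, image_p, estimated_rp=None):
    # Fall back to a default position (e.g., a corner of image P) when recognition failed.
    rp = estimated_rp if estimated_rp is not None else (10, 10, 40, 20)  # x, y, w, h

    ui.show_image(image_p)
    marker = ui.draw_ellipse(rp)                  # primitive figure marking Rp

    while not ui.user_confirmed():                # wait until the user accepts the result
        if ui.user_dragged(marker):
            rp = ui.get_marker_bounds(marker)     # updated position after drag and drop
    return rp                                     # corrected Rp used in the later steps
```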
 Returning to FIG. 17, the information processing device 100 estimates the part position Rm in the 3D model (step S106) and the part region Rr (step S107) based on the part position Rp indicated by the user.
 The information processing device 100 presents the estimation result of the part region Rr to the user (step S202). By presenting the estimation result of the part region Rr to the user, the information processing device 100 accepts a correction (change) of the estimation result from the user.
 FIG. 19 is a diagram for explaining an example of a UI image showing an estimation result according to the embodiment of the present disclosure. As shown in FIG. 19, the information processing device 100 presents the user with a UI image PU_2 showing the part region Rr. For example, the area estimation unit 145 of the information processing device 100 instructs the UI control unit 148 to generate the UI image PU_2.
 As shown in FIG. 19, the UI control unit 148 generates a rendered image of the 3D model including the part region Rr as the UI image PU_2. For example, the UI image PU_2 shown in FIG. 19 is a rendered image of the part region Rr viewed from the front.
 At this time, the UI control unit 148 generates the UI image PU_2 by superimposing information about the mesh of the 3D model (for example, information indicating vertices and edges). Because the information processing device 100 presents the part region Rr to the user together with this mesh information, the user can more easily confirm the part region Rr in the 3D model.
 The UI control unit 148 may emphasize the part region Rr in the UI image PU_2, for example by highlighting the part region Rr brightly. The UI control unit 148 may also display the area other than the part region Rr in a display color different from that of the part region Rr, for example by darkening the area other than the part region Rr in the UI image PU_2.
 The information processing device 100 may also present the part region Rr viewed from a plurality of viewpoints to the user. FIGS. 20 and 21 are diagrams for explaining other examples of UI images showing estimation results according to the embodiment of the present disclosure.
 As shown in FIG. 20, the information processing device 100 generates a UI image PU_3 in which the 3D model of the part region Rr is rendered from a viewpoint different from that of the UI image PU_2, and presents it to the user. As shown in FIG. 21, the information processing device 100 generates a UI image PU_4 in which the 3D model of the part region Rr is rendered from a viewpoint different from those of the UI images PU_2 and PU_3, and presents it to the user.
 The information processing device 100 may present the UI images PU_2 to PU_4 to the user side by side. Alternatively, the information processing device 100 may generate the UI images PU_3 and PU_4 with changed viewpoints in response to an instruction from the user, and present them to the user.
 Note that the UI images PU_3 and PU_4 are generated in the same manner as the UI image PU_2.
 The information processing device 100 accepts a correction of the part region Rr from the user. The user corrects the part region Rr by performing a GUI operation. For example, the user corrects the part region Rr by selecting a face to be added to the part region Rr with a click operation or the like. Alternatively, the user may correct the part region Rr by selecting a plurality of faces with a range selection tool such as a rectangle or a lasso (a freehand-drawn shape).
 Suppose that the UI control unit 148 cannot generate the UI image PU_2 showing the part region Rr, for example because the information processing device 100 could not estimate the part region Rr at all. In this case, the UI control unit 148 presents the user with, for example, a UI image in which the 3D model of the character is rendered. The UI control unit 148 may instead present the user with a UI image in which the major classification part is rendered. The user indicates the correct part region Rr to the information processing device 100 by selecting faces included in the UI image with a click operation or a range selection tool.
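 Accepting such a face-level correction of the part region Rr might look like the following sketch. The picking helpers pick_face() and faces_inside_lasso() and the event objects are assumptions introduced only for this illustration.

```python
# Sketch: let the user add or remove mesh faces from the estimated part region Rr
# by clicking single faces or lassoing several of them (hypothetical picking helpers).
def correct_part_region(ui, mesh, estimated_faces):
    selected = set(estimated_faces)                      # start from the estimated Rr

    for event in ui.events():                            # GUI events until confirmation
        if event.kind == "click":
            face = pick_face(mesh, event.position)       # face under the cursor
            selected.symmetric_difference_update({face}) # toggle its membership
        elif event.kind == "lasso":
            selected.update(faces_inside_lasso(mesh, event.polygon))  # add lassoed faces
        elif event.kind == "confirm":
            break
    return selected                                      # corrected part region Rr
```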
 Returning to FIG. 17, the information processing device 100 extracts the metadata based on the part region Rr indicated by the user (step S108). The subsequent processing is the same as in the first parts separation process of FIG. 16.
 In this way, because the information processing device 100 accepts at least one of the correction of the part position Rp and the correction of the part region Rr by the user, the information processing device 100 can separate parts from the character with higher accuracy.
(Third parts separation process)
 In the second parts separation process described above, the information processing device 100 processes all parts while interacting with the user, that is, while having the user confirm the estimation results of the part position Rp and the part region Rr. In contrast, the information processing device 100 may, for example, accept at least one of the correction of the part position Rp and the correction of the part region Rr by the user only when the estimation of the part position Rp or the part region Rr fails. Note that, if the information processing device 100 fails to estimate the part position Rp or the part position Rm, it does not estimate the part region Rr. Therefore, a failure by the information processing device 100 to estimate the part position Rp or the part position Rm means a failure to estimate the part region Rr.
 FIG. 22 is a flowchart showing an example of the flow of the third parts separation process according to the embodiment of the present disclosure. The third parts separation process shown in FIG. 22 is executed by, for example, the information processing device 100. The information processing device 100 executes the third parts separation process shown in FIG. 22 in accordance with, for example, an instruction from the user.
 As shown in FIG. 22, the information processing device 100 executes the first parts separation process (step S301). At this time, the information processing device 100 writes information about the 3D models for which the estimation of the part position Rp or the part region Rr has failed into a log file.
 Next, the information processing device 100 acquires the log file from the log file DB 132 (step S302). The information processing device 100 executes the second parts separation process on the 3D models for which the estimation of the part position Rp or the part region Rr has failed (step S303).
 In this way, the information processing device 100 executes the second parts separation process on the 3D models whose parts failed to be separated and accepts corrections from the user, so that it can separate parts from the character with higher accuracy.
 Furthermore, the information processing device 100 does not accept corrections from the user for all 3D models, but accepts corrections from the user only for the 3D models whose parts failed to be separated. As a result, the information processing device 100 can separate parts with higher accuracy while suppressing an increase in the burden on the user.
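 Seen as a whole, the third parts separation process reduces to a fully automatic batch pass followed by an interactive pass over the logged failures. The sketch below illustrates that control flow only; first_parts_separation(), second_parts_separation(), the exception type, and the list-based log are hypothetical names, not elements defined by the disclosure.

```python
# Sketch of the third parts separation process: run the automatic first pass, log the
# failures, then rerun only the failed models with user interaction (hypothetical names).
def third_parts_separation(models, log_file_db):
    for model in models:
        try:
            first_parts_separation(model)                 # automatic pass (step S301)
        except PartsSeparationError as err:               # Rp, Rm, or Rr estimation failed
            log_file_db.append({"model": model.id, "reason": str(err)})

    failed_ids = {entry["model"] for entry in log_file_db.read()}   # step S302
    for model in models:
        if model.id in failed_ids:
            second_parts_separation(model)                # interactive pass (step S303)
```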
<<4. Hardware configuration>>
 FIG. 23 is a block diagram showing an example of the hardware configuration of the information processing device 100 according to the present embodiment. Note that the information processing device 800 shown in FIG. 23 can realize, for example, the information processing device 100. Information processing by the information processing device 100 according to the present embodiment is realized by cooperation between software and the hardware described below.
 As shown in FIG. 23, the information processing device 800 includes, for example, a CPU 871, a ROM 872, a RAM 873, a host bus 874, a bridge 875, an external bus 876, and an interface 877. The information processing device 800 also includes an input device 878, an output device 879, a storage 880, a drive 881, a connection port 882, and a communication device 883. Note that the hardware configuration shown here is an example, and some of the components may be omitted. Components other than those shown here may also be included.
(CPU 871)
 The CPU 871 functions, for example, as an arithmetic processing device or a control device, and controls all or part of the operation of each component based on various programs recorded in the ROM 872, the RAM 873, the storage 880, or the removable recording medium 901.
 Specifically, the CPU 871 implements the operation processing in the information processing device 100.
(ROM 872, RAM 873)
 The ROM 872 is a means for storing programs read by the CPU 871, data used for calculations, and the like. The RAM 873 temporarily or permanently stores, for example, programs read by the CPU 871 and various parameters that change as appropriate when those programs are executed.
(Host bus 874, bridge 875, external bus 876, interface 877)
 The CPU 871, the ROM 872, and the RAM 873 are interconnected via, for example, the host bus 874, which is capable of high-speed data transmission. The host bus 874 is in turn connected, for example via the bridge 875, to the external bus 876, whose data transmission speed is relatively low. The external bus 876 is connected to various components via the interface 877.
(Input device 878)
 As the input device 878, for example, a mouse, a keyboard, a touch panel, buttons, switches, and levers are used. Furthermore, a remote controller capable of transmitting control signals using infrared rays or other radio waves may be used as the input device 878. The input device 878 also includes an audio input device such as a microphone.
(Output device 879)
 The output device 879 is a device capable of visually or audibly notifying the user of acquired information, such as a display device (for example, a CRT (Cathode Ray Tube), an LCD, or an organic EL display), an audio output device such as a speaker or headphones, a printer, a mobile phone, or a facsimile. The output device 879 according to the present disclosure also includes various vibration devices capable of outputting tactile stimuli.
(Storage 880)
 The storage 880 is a device for storing various kinds of data. As the storage 880, for example, a magnetic storage device such as a hard disk drive (HDD), a semiconductor storage device, an optical storage device, or a magneto-optical storage device is used.
(Drive 881)
 The drive 881 is a device that reads information recorded on the removable recording medium 901, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, or writes information to the removable recording medium 901.
(Removable recording medium 901)
 The removable recording medium 901 is, for example, a DVD medium, a Blu-ray (registered trademark) medium, an HD DVD medium, or any of various semiconductor storage media. Of course, the removable recording medium 901 may also be, for example, an IC card equipped with a non-contact IC chip, or an electronic device.
(Connection port 882)
 The connection port 882 is a port for connecting an external connection device 902, such as a USB (Universal Serial Bus) port, an IEEE 1394 port, a SCSI (Small Computer System Interface) port, an RS-232C port, or an optical audio terminal.
(External connection device 902)
 The external connection device 902 is, for example, a printer, a portable music player, a digital camera, a digital video camera, or an IC recorder.
(Communication device 883)
 The communication device 883 is a communication device for connecting to a network, such as a communication card for a wired or wireless LAN, Wi-Fi (registered trademark), Bluetooth (registered trademark), or WUSB (Wireless USB), a router for optical communication, a router for ADSL (Asymmetric Digital Subscriber Line), or a modem for various kinds of communication.
<<5. Summary>>
 Although preferred embodiments of the present disclosure have been described above in detail with reference to the accompanying drawings, the present technology is not limited to such examples. It is clear that a person having ordinary knowledge in the technical field of the present disclosure can conceive of various changes or modifications within the scope of the technical idea described in the claims, and it is understood that these also naturally fall within the technical scope of the present disclosure.
 For example, it is also possible to create a computer program for causing hardware such as the CPU, the ROM, and the RAM built into the information processing device 100 described above to exhibit the functions of the information processing device 100. A computer-readable storage medium (recording medium) storing the computer program is also provided.
 Furthermore, the effects described in this specification are merely explanatory or illustrative, and are not limiting. In other words, the technology according to the present disclosure can achieve other effects that are obvious to those skilled in the art from the description of this specification, in addition to or instead of the above effects.
Note that the present technology can also have the following configurations.
(1)
An information processing device comprising: a control unit that acquires a three-dimensional model of a character, performs image recognition processing on an image of the three-dimensional model viewed from a virtual viewpoint to estimate a part position of the character, and estimates a part area in the three-dimensional model of the character based on the three-dimensional model and the part position.
(2)
The information processing device according to (1), wherein the part area includes an area corresponding to a body part of the character.
(3)
The information processing device according to (1) or (2), wherein the part area includes at least one of an eye area, a nose area, a mouth area, and an ear area of the character.
(4)
The information processing device according to any one of (1) to (3), wherein the control unit generates a three-dimensional model of the part area from the three-dimensional model of the character based on the part area.
(5)
The information processing device according to any one of (1) to (4), wherein the control unit acquires character information regarding the character based on at least one of a result of the image recognition processing, an estimation result of the part position, and an estimation result of the part area.
(6)
The information processing device according to (5), wherein the character information includes at least one of classification information regarding a class classification in which the part area is classified into classes, feature information regarding a feature vector of the part area, and relative information regarding a relative relationship of the part area to the character.
(7)
The information processing device according to (5) or (6), wherein the control unit stores the character information in association with part information regarding the part area.
(8)
The information processing device according to (7), wherein the control unit presents, to a user, the part information corresponding to the character information that meets a condition specified by the user.
(9)
The information processing device according to any one of (1) to (8), wherein the control unit estimates the part position in the three-dimensional model based on the part position of the character in the image estimated by the image recognition processing.
(10)
The information processing device according to (9), wherein the control unit estimates the part position in the three-dimensional model based on an angle of view of the virtual viewpoint corresponding to the image and the part position of the character in the image.
(11)
The information processing device according to any one of (1) to (10), wherein the control unit estimates the part area in mesh data included in the three-dimensional model.
(12)
The information processing device according to any one of (1) to (11), wherein the control unit corrects a shape of the part area in the three-dimensional model according to unevenness of an outline of the part area.
(13)
The information processing device according to any one of (1) to (12), wherein the control unit receives, from a user, a change of at least one of the part position and the part area.
(14)
The information processing device according to (13), wherein the control unit receives, from the user, the change of at least one of the part position and the part area of the character for which image recognition has failed as a result of the image recognition processing of the image.
(15)
A computer-readable recording medium recording a program for causing a computer to execute: acquiring a three-dimensional model of a character; performing image recognition processing on an image of the three-dimensional model viewed from a virtual viewpoint to estimate a part position of the character; and estimating a part area in the three-dimensional model of the character based on the three-dimensional model and the part position.
(16)
An information processing method including: acquiring a three-dimensional model of a character; performing image recognition processing on an image of the three-dimensional model viewed from a virtual viewpoint to estimate a part position of the character; and estimating a part area in the three-dimensional model of the character based on the three-dimensional model and the part position.
100 Information processing device
110 Communication unit
120 Input/output unit
130 Storage unit
131 3D model DB
132 Log file DB
133 Parts DB
134 Metadata DB
140 Control unit
141 Model acquisition unit
142 Rendering unit
143 Image recognition unit
144 Position estimation unit
145 Area estimation unit
146 Extraction unit
147 Search processing unit
148 UI control unit

Claims (16)

  1. An information processing device comprising: a control unit that acquires a three-dimensional model of a character; performs image recognition processing on an image of the three-dimensional model viewed from a virtual viewpoint to estimate a part position of the character; and estimates a part area in the three-dimensional model of the character based on the three-dimensional model and the part position.
  2. The information processing device according to claim 1, wherein the part area includes an area corresponding to a body part of the character.
  3. The information processing device according to claim 1, wherein the part area includes at least one of an eye area, a nose area, a mouth area, and an ear area of the character.
  4. The information processing device according to claim 1, wherein the control unit generates a three-dimensional model of the part area from the three-dimensional model of the character based on the part area.
  5. The information processing device according to claim 1, wherein the control unit acquires character information regarding the character based on at least one of a result of the image recognition processing, an estimation result of the part position, and an estimation result of the part area.
  6. The information processing device according to claim 5, wherein the character information includes at least one of classification information regarding a class classification in which the part area is classified into classes, feature information regarding a feature vector of the part area, and relative information regarding a relative relationship of the part area to the character.
  7. The information processing device according to claim 5, wherein the control unit stores the character information in association with part information regarding the part area.
  8. The information processing device according to claim 7, wherein the control unit presents, to a user, the part information corresponding to the character information that meets a condition specified by the user.
  9. The information processing device according to claim 1, wherein the control unit estimates the part position in the three-dimensional model based on the part position of the character in the image estimated by the image recognition processing.
  10. The information processing device according to claim 9, wherein the control unit estimates the part position in the three-dimensional model based on an angle of view of the virtual viewpoint corresponding to the image and the part position of the character in the image.
  11. The information processing device according to claim 1, wherein the control unit estimates the part area in mesh data included in the three-dimensional model.
  12. The information processing device according to claim 1, wherein the control unit corrects a shape of the part area in the three-dimensional model according to unevenness of an outline of the part area.
  13. The information processing device according to claim 1, wherein the control unit receives, from a user, a change of at least one of the part position and the part area.
  14. The information processing device according to claim 13, wherein the control unit receives, from the user, the change of at least one of the part position and the part area of the character for which estimation of the part area has failed.
  15. A computer-readable recording medium recording a program for causing a computer to execute: acquiring a three-dimensional model of a character; performing image recognition processing on an image of the three-dimensional model viewed from a virtual viewpoint to estimate a part position of the character; and estimating a part area in the three-dimensional model of the character based on the three-dimensional model and the part position.
  16. An information processing method including: acquiring a three-dimensional model of a character; performing image recognition processing on an image of the three-dimensional model viewed from a virtual viewpoint to estimate a part position of the character; and estimating a part area in the three-dimensional model of the character based on the three-dimensional model and the part position.
PCT/JP2023/010101 2022-03-29 2023-03-15 Information processing device, recording medium, and information processing method WO2023189601A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2022-052947 2022-03-29
JP2022052947 2022-03-29

Publications (1)

Publication Number Publication Date
WO2023189601A1 true WO2023189601A1 (en) 2023-10-05

Family

ID=88201569

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/010101 WO2023189601A1 (en) 2022-03-29 2023-03-15 Information processing device, recording medium, and information processing method

Country Status (1)

Country Link
WO (1) WO2023189601A1 (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019053369A (en) * 2017-09-13 2019-04-04 ファナック株式会社 Three-dimensional model forming device
JP2019168251A (en) * 2018-03-22 2019-10-03 株式会社Jvcケンウッド Shape measuring apparatus, shape measuring method, and program

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MIKA INOMAKI, AND 2 OTHERS: "Automatic setting of ability values considering physical characteristics in automatic generation of game characters", JAPAN DIGITAL GAME SOCIETY 9TH ANNUAL CONFERENCE PROCEEDINGS; 2019/03/03-04, DIGRA, JP, 4 March 2019 (2019-03-04) - 4 March 2019 (2019-03-04), JP, pages 146 - 149, XP009550245 *


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23779619

Country of ref document: EP

Kind code of ref document: A1