WO2023139961A1 - Information processing device - Google Patents

Information processing device Download PDF

Info

Publication number
WO2023139961A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
avatar
head
user
face
Prior art date
Application number
PCT/JP2022/045381
Other languages
French (fr)
Japanese (ja)
Inventor
Kohei Oyama (大山 晃平)
Original Assignee
NTT DOCOMO, INC.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NTT DOCOMO, INC.
Publication of WO2023139961A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/20Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts

Definitions

  • the present invention relates to an information processing device.
  • avatars are sometimes used as users' alter ego.
  • by using technologies such as 3D scanning, it has become possible to use avatars, which are three-dimensional images of users, in a three-dimensional virtual space.
  • Patent Document 1 discloses a technique in which a server device selects one avatar from multiple avatars.
  • the user's communication device requests the server device to display a predetermined page including the avatar specified by the identification information.
  • the server device selects which of the first avatar and the second avatar having different scales to be displayed based on the presence or absence of the display area of the first avatar and the display area of the second avatar on the predetermined page.
  • the server device generates image data of the first avatar or the second avatar according to the selection result.
  • the server device transmits the generated image data of the first avatar or the second avatar to the communication device of the user.
  • the server device automatically selects one of two avatars with different scales, so whichever avatar is selected, the user cannot choose an avatar drawn in a style that matches his or her own taste. As a result, the user sometimes has to use an avatar whose style is not to his or her liking.
  • an object of the present invention is to provide an information processing apparatus that enables a user to use an avatar whose style suits the user's taste from among a plurality of avatars drawn in mutually different styles.
  • An information processing apparatus includes: a face image generation unit that generates a plurality of face images with different styles; a face image acquisition unit that acquires a first face image selected by a user from the plurality of face images; a head generation unit that generates a three-dimensional image showing the head of an avatar based on the first face image; and an avatar generation unit that generates a three-dimensional image showing the overall appearance of the avatar using the three-dimensional image showing the head of the avatar and the three-dimensional image showing the body of the avatar.
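As a rough, non-authoritative sketch of how the claimed units hand data to one another, the following Python stub models each unit as a plain function. The class names, the byte-string stand-ins for image and mesh data, and the idea that a style tag travels with each image are illustrative assumptions, not part of the publication.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Image2D:          # a face image FP (two-dimensional)
    pixels: bytes
    style: str

@dataclass
class Image3D:          # a three-dimensional image (head HP, body BP, or whole avatar WP)
    mesh: bytes
    style: str

def face_image_generation_unit(input_image: Image2D, styles: List[str]) -> List[Image2D]:
    """Generate a plurality of face images FP1..FPn, one per style."""
    return [Image2D(pixels=input_image.pixels, style=s) for s in styles]

def face_image_acquisition_unit(face_images: List[Image2D], k: int) -> Image2D:
    """Acquire the first face image FPk selected by the user (k is 1-based in the description)."""
    return face_images[k - 1]

def head_generation_unit(face_image_fpk: Image2D) -> Image3D:
    """Generate a 3-D image HP of the avatar's head in the style of FPk."""
    return Image3D(mesh=b"head-mesh", style=face_image_fpk.style)

def avatar_generation_unit(head_hp: Image3D, body_bp: Image3D) -> Image3D:
    """Combine the head HP and the body BP into the overall appearance WP."""
    return Image3D(mesh=head_hp.mesh + body_bp.mesh, style=head_hp.style)
```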
  • according to the present invention, it is possible for a user to use an avatar whose style suits the user's taste from among a plurality of avatars with mutually different styles.
  • FIG. 1 is a diagram showing the overall configuration of an information processing system 1 according to the first embodiment.
  • FIG. 2 is a block diagram showing a configuration example of a terminal device 20.
  • FIG. 3 is a diagram showing the flow of generating a three-dimensional image WP showing the overall appearance of an avatar A1.
  • FIG. 4 is a block diagram showing a configuration example of the server 10.
  • FIG. 5 is a functional block diagram of an acquisition unit 111.
  • FIG. 6 is a diagram showing an example of a plurality of face images FP1 to FP4 with different styles.
  • FIG. 7 is a flowchart showing operations of the server 10 according to the first embodiment.
  • FIG. 8 is a diagram showing an operation example of a face image acquisition unit 111E and a head generation unit 113A.
  • FIG. 9 is a block diagram showing a configuration example of a server 10B.
  • FIG. 10 is a diagram showing an example of a head/body number table HT.
  • FIG. 11 is a flowchart showing operations of a server 10B according to the third embodiment.
  • FIG. 12 is a diagram showing the overall configuration of an information processing system 1C according to Modification 1.
  • 1: First Embodiment
  • the configuration of an information processing system 1 including a server 10 as an information processing apparatus according to the first embodiment of the present invention will be described below.
  • FIG. 1 is a diagram showing the overall configuration of an information processing system 1 according to the first embodiment of the present invention.
  • the information processing system 1 displays an avatar A1 corresponding to the user U1 and an avatar A2 corresponding to the user U2 on the terminal devices 20 used by the users U1 and U2.
  • the information processing system 1 includes a server 10 and terminal devices 20 .
  • the server 10 is an example of an information processing device.
  • the server 10 and the terminal device 20 are communicably connected to each other via a communication network NET.
  • the suffix "-X" is used for the code.
  • X is an arbitrary integer of 1 or more.
  • FIG. 1 shows two terminal devices 20, i.e., a terminal device 20-1 and a terminal device 20-2. However, this number is merely an example, and the information processing system 1 may include any number of terminal devices 20.
  • in FIG. 1, it is assumed that user U1 uses the terminal device 20-1 and user U2 uses the terminal device 20-2.
  • the server 10 provides various data and cloud services to the terminal device 20 via the communication network NET.
  • the server 10 provides the terminal device 20 with various data for displaying the avatar A1 corresponding to the user U1 and the avatar A2 corresponding to the user U2 on the terminal device 20 .
  • the server 10 provides the terminal device 20-1 with various data for displaying the avatar A2 on the display 24-1 of the terminal device 20-1 used by the user U1.
  • the server 10 also provides the terminal device 20-2 with various data for displaying the avatar A1 on the display 24-2 provided in the terminal device 20-2 used by the user U2.
  • the terminal device 20-1 and the terminal device 20-2 are preferably portable terminal devices such as smartphones and tablets, for example.
  • FIG. 2 is a block diagram showing a configuration example of the terminal device 20.
  • the terminal device 20 includes a processing device 21 , a storage device 22 , a communication device 23 , a display 24 , an input device 25 and an imaging device 26 .
  • Each element of the terminal device 20 is interconnected by one or more buses for communicating information.
  • the processing device 21 is a processor that controls the terminal device 20 as a whole. Also, the processing device 21 is configured using, for example, a single chip or a plurality of chips. The processing unit 21 is configured using, for example, a central processing unit (CPU) including interfaces with peripheral devices, arithmetic units, registers, and the like. A part or all of the functions of the processing device 21 may be realized by hardware such as DSP, ASIC, PLD, and FPGA. The processing device 21 executes various processes in parallel or sequentially.
  • the storage device 22 is a recording medium that can be read and written by the processing device 21 .
  • the storage device 22 also stores a plurality of programs including the control program PR2 executed by the processing device 21 .
  • the communication device 23 is hardware as a transmission/reception device for communicating with other devices.
  • the communication device 23 is also called a network device, a network controller, a network card, a communication module, or the like, for example.
  • the communication device 23 may include a connector for wired connection and an interface circuit corresponding to the connector. Further, the communication device 23 may have a wireless communication interface. Products conforming to wired LAN, IEEE1394, and USB are examples of connectors and interface circuits for wired connection. Also, as a wireless communication interface, there are products conforming to wireless LAN, Bluetooth (registered trademark), and the like.
  • the display 24 is a device that displays images and character information.
  • the display 24 displays various images under the control of the processing device 21 .
  • various display panels such as a liquid crystal display panel and an organic EL (Electro Luminescence) display panel are preferably used as the display 24 .
  • the display 24 displays an image showing Avatar A in this embodiment. More specifically, when terminal device 20 is terminal device 20-1 used by user U1, display 24-1 mainly displays an image showing avatar A2 corresponding to user U2. On the other hand, if terminal device 20 is terminal device 20-2 used by user U2, display 24-2 mainly displays an image showing avatar A1 corresponding to user U1.
  • the input device 25 accepts operations from the user U1.
  • the input device 25 includes a pointing device such as a keyboard, touch pad, touch panel, or mouse.
  • the input device 25 may also serve as the display 24 .
  • the user U1 uploads an input image IP showing the front part of the user U1's face from the terminal device 20 to the server 10 for the purpose of generating a three-dimensional avatar.
  • the input image IP is typically a two-dimensional image generated based on the photograph of the face of the user U1.
  • the input image IP is not limited to a two-dimensional image generated based on the face photograph of the user U1.
  • the input device 25 is used by the user U1 to input the input image IP to the terminal device 20 .
  • the input image IP may be obtained by capturing an image of the user U1 with the imaging device 26 described later, or may be obtained from an external device using the communication device 23 described above.
  • the imaging device 26 outputs imaging information obtained by imaging the outside world.
  • the imaging device 26 includes, for example, a lens, an imaging element, an amplifier, and an AD converter.
  • the light condensed through the lens is converted into an image pickup signal, which is an analog signal, by the image pickup device.
  • the amplifier amplifies the imaging signal and outputs it to the AD converter.
  • the AD converter converts the amplified imaging signal, which is an analog signal, into imaging information, which is a digital signal.
  • the converted imaging information is output to the processing device 21 .
  • the imaging information output to the processing device 21 is output to the server 10 via the communication device 23 .
  • the processing device 21 functions as an acquisition unit 211, an output unit 212, and a display control unit 213 by reading the control program PR2 from the storage device 22 and executing it.
  • the acquisition unit 211 acquires image information representing the input image IP representing the front portion of the face of the user U1.
  • the server 10 generates a plurality of face images FP1 to FPn having different styles based on the input image IP.
  • n is an integer of 2 or more.
  • the server 10 outputs the plurality of generated face images FP1 to FPn to the terminal device 20.
  • the user U1 uses the input device 25 to select one face image FPk from the plurality of face images FP1 to FPn displayed on the display 24 of the terminal device 20.
  • the obtaining unit 211 obtains the selection result k of the one face image FPk.
  • "k” is an integer of 1 or more and n or less.
  • the user U1 also uses the input device 25 to input a head/body number HB for the avatar A1 to be generated, specifically, a numerical value indicating how many heads tall the avatar A1 is to be generated.
  • the acquisition unit 211 acquires the head/body count HB.
  • the acquisition unit 211 acquires image information indicating an image displayed on the display 24 from the server 10 by using the communication device 23 .
  • the output unit 212 outputs to the server 10 the image information indicating the input image IP indicating the front part of the face of the user U1, the selection result k of one face image FPk, and the head-to-body number HB obtained by the obtaining unit 211.
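A minimal sketch of what the output unit 212 might send is given below. The publication specifies only the three pieces of information (the input image IP, the selection result k, and the head/body number HB); the HTTP/JSON transport, the field names, and the server URL are assumptions made purely for illustration.

```python
import base64
import json
from urllib.request import Request, urlopen

def send_to_server(server_url: str, input_image_ip: bytes, selection_k: int, head_body_hb: int) -> bytes:
    payload = {
        "input_image": base64.b64encode(input_image_ip).decode("ascii"),  # image information for IP
        "selection_result": selection_k,                                  # index k of the chosen face image FPk
        "head_body_number": head_body_hb,                                 # HB: how many heads tall the avatar is
    }
    req = Request(server_url, data=json.dumps(payload).encode("utf-8"),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:    # the server replies with image information for the avatar
        return resp.read()
```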
  • FIG. 3 shows the flow of generating a three-dimensional image WP showing the overall appearance of avatar A1, which is the avatar corresponding to user U1.
  • a face image generation unit 112 provided in the server 10, which will be described later, generates a plurality of face images FP1 to FPn based on the input image IP.
  • Image information representing the plurality of face images FP1 to FPn is output from the server 10 to the terminal device 20.
  • as described above, the plurality of face images FP1 to FPn are displayed on the display 24 of the terminal device 20.
  • the user U1 uses the input device 25 to select one face image FPk from the plurality of face images FP1 to FPn.
  • a selection result k by the user U1 is output to the server 10 from the output unit 212 provided in the terminal device 20 .
  • the face image FPk selected by the user U1 is used by the server 10 to generate a three-dimensional image HP representing the head.
  • the head/body count HB is output to the server 10 from the output unit 212 provided in the terminal device 20 .
  • a three-dimensional image WP showing the overall appearance of the avatar A1 is generated based on the three-dimensional image HP showing the head, the number HB of the head and body, and the three-dimensional image BP showing the body parts other than the head of the avatar A1.
  • the display control unit 213 causes the display 24 to display the image indicated by the image information that the acquisition unit 211 acquired from the server 10.
  • the image information is image information indicating an image of avatar A.
  • that is, when the terminal device 20 is the terminal device 20-1, the display control unit 213 mainly displays the avatar A2 corresponding to the user U2 on the display 24-1. On the other hand, when the terminal device 20 is the terminal device 20-2, the display control unit 213 mainly displays the avatar A1 corresponding to the user U1 on the display 24-2.
  • FIG. 4 is a block diagram showing a configuration example of the server 10.
  • the server 10 comprises a processing device 11 , a storage device 12 , a communication device 13 , a display 14 and an input device 15 .
  • Each element of server 10 is interconnected by one or more buses for communicating information.
  • the processing device 11 is a processor that controls the server 10 as a whole. Also, the processing device 11 is configured using, for example, a single chip or a plurality of chips. The processing unit 11 is configured using, for example, a central processing unit (CPU) including interfaces with peripheral devices, arithmetic units, registers, and the like. A part or all of the functions of the processing device 11 may be implemented by hardware such as DSP, ASIC, PLD, and FPGA. The processing device 11 executes various processes in parallel or sequentially.
  • the storage device 12 is a recording medium that can be read and written by the processing device 11.
  • the storage device 12 also stores a plurality of programs including the control program PR1 executed by the processing device 11 .
  • the storage device 12 also stores avatar information AI.
  • Avatar information AI includes image information indicating face images FP1 to FPn generated by face image generating section 112, which will be described later.
  • the avatar information AI also includes information used by the later-described body part generation unit 114 when generating image information representing a three-dimensional image BP representing the body part of the avatar A1.
  • the communication device 13 is hardware as a transmission/reception device for communicating with other devices.
  • the communication device 13 is also called a network device, a network controller, a network card, a communication module, or the like, for example.
  • the communication device 13 may include a connector for wired connection and an interface circuit corresponding to the connector. Further, the communication device 13 may have a wireless communication interface. Products conforming to wired LAN, IEEE1394, and USB are examples of connectors and interface circuits for wired connection. Also, as a wireless communication interface, there are products conforming to wireless LAN, Bluetooth (registered trademark), and the like.
  • the display 14 is a device that displays images and character information.
  • the display 14 displays various images under the control of the processing device 11 .
  • various display panels such as a liquid crystal display panel and an organic EL display panel are preferably used as the display 14 .
  • the input device 15 is a device that receives operations from the administrator of the information processing system 1 .
  • the input device 15 includes a pointing device such as a keyboard, touch pad, touch panel, or mouse.
  • the input device 15 may also serve as the display 14 .
  • the processing device 11 functions as an acquisition unit 111, a face image generation unit 112, a head generation unit 113, a body generation unit 114, an avatar generation unit 115, and an output unit 116 by reading and executing the control program PR1 from the storage device 12.
  • the acquisition unit 111 acquires various types of information from the terminal device 20, including image information indicating the input image IP indicating the front portion of the face of the user U1, the selection result k of one face image FPk, the number of heads and bodies HB, line-of-sight information, position information, movement information, and imaging information.
  • FIG. 5 is a functional block diagram of the acquisition unit 111.
  • the acquisition unit 111 includes an input image acquisition unit 111A, a face image acquisition unit 111B, and a head/body acquisition unit 111C.
  • the input image acquisition unit 111A acquires, from the terminal device 20-1, image information representing the input image IP representing the front portion of the face of the user U1.
  • the facial image acquisition unit 111B acquires the selection result k of the first facial image FPk from the terminal device 20-1. Further, the facial image acquisition unit 111B acquires the facial image FPk from the storage device 12 based on the selection result k.
  • the head/body acquisition unit 111C acquires the head/body number HB from the terminal device 20-1.
  • the face image generation unit 112 generates a plurality of face images FP1 to FPn with different styles based on the input image IP representing the front part of the face of the user U1 acquired by the input image acquisition unit 111A.
  • the input image IP is typically a two-dimensional image generated based on the facial photograph of the user U1. That is, the facial image generation unit 112 typically generates a plurality of facial images FP1 to FPn based on an image representing a facial photograph of the user U1.
  • FIG. 6 is an example of a plurality of face images FP1 to FP4 with different styles.
  • the face images FP1 to FP4 shown in FIG. 6 are all generated from the same input image IP, but have different styles.
  • the face image generation unit 112 includes a plurality of generation engines from a first generation engine that generates the face image FP1 to an n-th generation engine that generates the face image FPn. After that, the face image generation unit 112 inputs the input image IP to each engine from the first generation engine to the n-th generation engine.
  • the face image generation unit 112 obtains the face images FP1 to FPn as the respective outputs of the first to n-th generation engines.
  • the face image generation unit 112 stores the generated face images FP1 to FPn in the storage device 12.
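The following sketch illustrates the fan-out of a single input image IP through n generation engines, as described above. The engines here are trivial stand-in callables; in a real system each would be a separate image-to-image or style-transfer model, and the style names are invented for the example.

```python
from typing import Callable, List

Engine = Callable[[bytes], bytes]   # takes the input image IP, returns one stylized face image

def make_stub_engine(style_name: str) -> Engine:
    # Stand-in for a real generation engine (e.g. a style-transfer model for one style).
    def engine(input_image_ip: bytes) -> bytes:
        return style_name.encode() + b":" + input_image_ip
    return engine

class FaceImageGenerationUnit:
    def __init__(self, engines: List[Engine]) -> None:
        self.engines = engines                      # the first to n-th generation engines

    def generate(self, input_image_ip: bytes) -> List[bytes]:
        # FP1..FPn: one face image per engine, all derived from the same input image IP
        return [engine(input_image_ip) for engine in self.engines]

# The style names below are invented for the example.
engines = [make_stub_engine(s) for s in ("anime", "watercolor", "flat-shaded", "realistic")]
face_images_fp = FaceImageGenerationUnit(engines).generate(b"raw-face-photo")
```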
  • the head generation unit 113 generates a three-dimensional image HP representing the head of the avatar A1 based on the first face image FPk acquired by the face image acquisition unit 111B. More specifically, the head generation unit 113 generates the image information representing the three-dimensional image HP, so that the overall style of the three-dimensional image HP representing the head of the avatar A1 becomes the same style as that represented by one face image FPk.
  • the body generation unit 114 uses the head/body number HB acquired by the head/body acquisition unit 111C and the avatar information AI stored in the storage device 12 to generate image information representing a three-dimensional image BP of the body of the avatar A1 other than the head.
  • the body generation unit 114 first uses information included in the avatar information AI to generate a provisional three-dimensional image SP as a three-dimensional image of the body of the avatar A1.
  • the body part generating unit 114 adjusts the size of the provisional three-dimensional image SP to set the size ratio between the three-dimensional image HP showing the head of the avatar A1 and the provisional three-dimensional image SP as the ratio indicated by the number of heads and bodies HB.
  • the body generation unit 114 sets the temporary 3D image SP after adjusting the size as the 3D image BP of the body of the avatar A1, and then generates image information indicating the 3D image BP.
  • the body generation unit 114 may set the overall style of the three-dimensional image BP to be the same style as that represented by the face image FPk.
  • the body generation unit 114 may set the overall style of the 3D image BP to be different from the style represented by the face image FPk.
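A minimal numeric sketch of the size adjustment performed by the body generation unit 114 follows. It assumes that a head/body number HB of, say, 6 means the whole avatar is six head-heights tall, so the provisional body SP is scaled to HB − 1 head heights; the publication leaves the exact ratio computation to the implementation.

```python
def body_scale_factor(head_height: float, provisional_body_height: float, head_body_hb: int) -> float:
    # body generation unit 114: scale the provisional body SP so that the avatar
    # ends up head_body_hb head-heights tall overall (assumed interpretation of HB)
    target_body_height = (head_body_hb - 1) * head_height
    return target_body_height / provisional_body_height

# Example: a 0.25 m head HP, a 1.0 m provisional body SP, and HB = 6
scale = body_scale_factor(head_height=0.25, provisional_body_height=1.0, head_body_hb=6)
print(scale)   # 1.25 -> SP is enlarged so the finished avatar is six head-heights tall
```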
  • the avatar generation unit 115 uses image information representing the three-dimensional image HP representing the head of the avatar A1 and image information representing the three-dimensional image BP representing the body of the avatar A1 to generate image information representing the three-dimensional image WP representing the overall appearance of the avatar A1.
  • the output unit 116 uses the communication device 13 to transmit image information indicating an image displayed on the display 24 of the terminal device 20 and image information indicating the plurality of face images FP1 to FPn to the terminal device 20.
  • the output unit 116 also uses the communication device 13 to transmit image information representing a three-dimensional image WP representing the overall appearance of the avatar A1 generated by the avatar generation unit 115 to the terminal device 20 .
  • the server 10 can create avatars A1 in various styles, and create avatars A1 in styles that suit the tastes of user U1.
  • FIG. 7 is a flow chart showing the operation of the server 10 according to the first embodiment. The operation of the server 10 will be described below with reference to FIG.
  • in step S1, the processing device 11 functions as the input image acquisition unit 111A.
  • the processing device 11 acquires, from the terminal device 20-1, image information representing the input image IP representing the front portion of the face of the user U1.
  • in step S2, the processing device 11 functions as the face image generation unit 112.
  • the processing device 11 generates a plurality of face images FP1 to FPn with different styles based on the input image IP representing the front portion of the face of the user U1 acquired in step S1. Further, the processing device 11 stores the generated face images FP1 to FPn in the storage device 12.
  • in step S3, the processing device 11 functions as the face image acquisition unit 111B.
  • the processing device 11 acquires the selection result k of the first face image FPk from the terminal device 20-1, and acquires the face image FPk from the storage device 12 based on the selection result k.
  • in step S4, the processing device 11 functions as the head generation unit 113.
  • the processing device 11 generates a three-dimensional image HP representing the head of the avatar A1 based on the one face image FPk acquired in step S3.
  • in step S5, the processing device 11 functions as the head/body acquisition unit 111C.
  • the processing device 11 acquires the head/body number HB from the terminal device 20-1.
  • in step S6, the processing device 11 functions as the body generation unit 114.
  • the processing device 11 uses the head/body number HB acquired in step S5 and the avatar information AI stored in the storage device 12 to generate image information representing the three-dimensional image BP of the body of the avatar A1.
  • in step S7, the processing device 11 functions as the avatar generation unit 115.
  • the processing device 11 uses image information representing a three-dimensional image HP representing the head of the avatar A1 and image information representing a three-dimensional image BP representing the body of the avatar A1 to generate image information representing a three-dimensional image WP representing the overall appearance of the avatar A1.
  • in step S8, the processing device 11 functions as the output unit 116.
  • the processing device 11 uses the communication device 13 to transmit the image information representing the three-dimensional image WP representing the overall appearance of the avatar A1 generated in step S7 to the terminal device 20. After that, the processing device 11 ends all the processes shown in FIG. 7.
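Putting steps S1 to S8 together, the server-side flow of FIG. 7 can be sketched as below. The unit implementations are trivial placeholders; only the ordering of the steps and the data handed between them follow the description.

```python
def generate_face_images(ip: bytes) -> list[bytes]:
    # S2: face image generation unit 112 (stand-in for the n style engines)
    return [b"FP1:" + ip, b"FP2:" + ip, b"FP3:" + ip]

def generate_head(fpk: bytes) -> bytes:
    # S4: head generation unit 113 -- head HP in the style of FPk
    return b"HP(" + fpk + b")"

def generate_body(hb: int) -> bytes:
    # S6: body generation unit 114 -- body BP sized by the head/body number HB
    return f"BP(heads={hb})".encode()

def generate_avatar(hp: bytes, bp: bytes) -> bytes:
    # S7: avatar generation unit 115 -- overall appearance WP
    return hp + b"+" + bp

def server_flow(input_image_ip: bytes, selection_k: int, head_body_hb: int) -> bytes:
    face_images = generate_face_images(input_image_ip)    # S1-S2: acquire IP, generate FP1..FPn
    face_image_fpk = face_images[selection_k - 1]          # S3: acquire the selected FPk (k is 1-based)
    head_hp = generate_head(face_image_fpk)                 # S4
    body_bp = generate_body(head_body_hb)                    # S5-S6: acquire HB, generate BP
    return generate_avatar(head_hp, body_bp)                 # S7-S8: assemble WP and send it back

avatar_wp = server_flow(b"face-photo", selection_k=2, head_body_hb=6)
```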
  • the server 10 as an information processing device includes a face image generation unit 112, a face image acquisition unit 111B, a head generation unit 113, a head/body acquisition unit 111C, a body generation unit 114, and an avatar generation unit 115.
  • the face image generator 112 generates a plurality of face images FP1 to FPn with different styles.
  • the face image obtaining unit 111B obtains the first face image FPk selected by the user U1 from the plurality of face images FP1 to FPn.
  • the head generation unit 113 generates a three-dimensional image HP representing the head of the avatar A1 based on one face image FPk.
  • the head and body acquisition unit 111C acquires a head and body number HB representing how many heads and bodies the avatar A1 has.
  • the body part generation unit 114 generates a three-dimensional image BP showing the body parts other than the head of the avatar A1 based on the head-to-body number HB.
  • the avatar generation unit 115 generates a 3D image WP representing the overall appearance of the avatar A1 using the 3D image HP representing the head of the avatar A1 and the 3D image BP representing the body of the avatar A1.
  • the user U1 can use the avatar A1 with a style that suits the user U1's taste from among a plurality of avatars A with different styles. Further, in the present embodiment, the server 10 separates and generates a three-dimensional image HP representing the head of the avatar A1 and a three-dimensional image BP representing the body of the avatar A1. More specifically, the user U1 selects a face image FPk with a style that matches his/her taste from among a plurality of face images FP1 to FPn with different styles. The head generation unit 113 generates a three-dimensional image HP representing the head of the avatar A1 based on the selected face image FPk.
  • the characteristics of the avatar A1 are likely to appear in the head of the avatar A1. That is, the user U1 can use the overall appearance of the avatar A1 having features that match his/her preferences by a simple means of selecting one face image FPk from the plurality of face images FP1 to FPn.
  • the body generation unit 114 does not necessarily have to generate three-dimensional images BP of the body of the avatar A1 in a plurality of different styles. That is, the processing load on the server 10 can be kept lower when generating a plurality of face images FP1 to FPn in different styles than when generating the overall appearance of a plurality of avatars A1 in different styles. Therefore, the server 10 according to the present embodiment can generate the avatar A1 having characteristics that match the preferences of the user U1 while reducing the processing load on the server 10.
  • the face image generation unit 112 generates a plurality of face images FP1 to FPn with different styles based on the input image IP as the image representing the face photograph of the user U1.
  • since the server 10 has the above-described configuration, when the user U1 uses the avatar A1 with a style that suits the user U1's preference from among a plurality of avatars A having different styles, the face of the user U1 can be reflected more realistically in the three-dimensional image HP showing the head of the avatar A1. Further, since the user U1 does not need to draw the input image IP from scratch, the user U1 can more easily obtain the three-dimensional image HP representing the head of the avatar A1. Furthermore, since the image representing the photograph of the face of user U1 is a two-dimensional image, the server 10 can reduce its own processing load compared to the case of using a three-dimensional image. On the other hand, since the photograph of the face accurately represents the features of the user U1, it is possible to create an avatar A1 that resembles the user U1.
  • An information processing system 1A according to the second embodiment of the present invention differs from the information processing system 1 according to the first embodiment in that a server 10A is provided instead of the server 10.
  • otherwise, the overall configuration of the information processing system 1A is the same as the overall configuration of the information processing system 1 according to the first embodiment shown in FIG. 1, so illustration and description thereof will be omitted.
  • the server 10A is provided with a processing device 11A instead of the processing device 11 and a storage device 12A instead of the storage device 12.
  • the processing device 11A includes an acquisition unit 111D instead of the acquisition unit 111 and a head generation unit 113A instead of the head generation unit 113 .
  • the acquisition unit 111D includes a face image acquisition unit 111E instead of the face image acquisition unit 111B.
  • the configuration of the server 10A is the same as the configuration of the server 10 according to the first embodiment shown in FIGS. 4 and 5, so illustration and description thereof will be omitted.
  • the facial image acquisition unit 111E extracts an elemental image EP representing facial elements of the user U1 from the first facial image FPk selected by the user U1.
  • the head generation unit 113A also generates a three-dimensional image HP representing the head of the avatar A1 by superimposing the element image EP extracted by the face image acquisition unit 111E on the outline image OP prepared in advance.
  • the head generation unit 113A generates a three-dimensional image HP representing the head of the avatar A1 by superimposing one element image EP included in one face image FPk selected from a plurality of face images FP1 to FPn having different styles on the outline image OP.
  • FIG. 8 is a diagram showing an operation example of the face image acquisition unit 111E and the head generation unit 113A.
  • the facial image acquisition unit 111E extracts an element image EP representing facial elements of the user U1 from the facial image FPk of the user U1.
  • the facial image acquisition unit 111E extracts elemental images EP representing eyebrows, eyes, nose, and mouth as facial elements from the facial image FPk.
  • the face image acquisition unit 111E may extract only some of these eyebrows, eyes, nose, and mouth as the element image EP.
  • the face image acquisition unit 111E may extract additional elements such as eyelashes and moles in addition to these eyebrows, eyes, nose, and mouth as elemental images EP.
  • the head generation unit 113A generates a three-dimensional image HP representing the head of the avatar A1 by superimposing the extracted element image EP on the outline image OP prepared in advance.
  • the outline image OP may be a two-dimensional image or a three-dimensional image.
  • the head generation unit 113A may generate a three-dimensional image HP representing the head of the avatar A1 by superimposing the element image EP on the outline image OP as a two-dimensional image, and then converting the outline image OP on which the element image EP is superimposed into a three-dimensional image.
  • the head generation unit 113A may generate a three-dimensional image HP representing the head of the avatar A1 by superimposing the element image EP on the outline image OP as a three-dimensional image.
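The second-embodiment flow (extract element images EP from FPk, then superimpose them on the outline image OP) can be sketched as follows. The images are represented as simple dicts keyed by element name; a real implementation would use facial-landmark detection and an image-compositing library, neither of which is specified in the publication.

```python
FACE_ELEMENTS = ("eyebrows", "eyes", "nose", "mouth")

def extract_element_images(face_image_fpk: dict) -> dict:
    # face image acquisition unit 111E: keep only the facial elements of FPk
    return {name: face_image_fpk[name] for name in FACE_ELEMENTS if name in face_image_fpk}

def superimpose_on_outline(outline_op: dict, element_images_ep: dict) -> dict:
    # head generation unit 113A: the outline OP is the shared template,
    # the superimposed elements EP carry the selected style
    head_hp = dict(outline_op)
    head_hp.update(element_images_ep)
    return head_hp

face_image_fpk = {"eyebrows": "anime-eyebrows", "eyes": "anime-eyes",
                  "nose": "anime-nose", "mouth": "anime-mouth", "hair": "ignored-here"}
outline_op = {"face_outline": "template-outline", "ears": "template-ears"}
head_hp = superimpose_on_outline(outline_op, extract_element_images(face_image_fpk))
```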
  • the server 10A can create avatars A1 in various styles while using outline images OP prepared in advance as templates.
  • the server 10A can create an avatar A1 with a style that matches the taste of the user U1, using an outline image OP prepared in advance as a template.
  • step S3 the processing device 11A functions as the facial image acquiring section 111E. After obtaining the first facial image FPk, the processing device 11A extracts an elemental image EP representing facial elements of the user U1 from the facial image FPk. Further, in step S4, the processing device 11A functions as the head generating section 113A. The processing device 11A superimposes the extracted elemental image EP on the outline image OP prepared in advance to generate a three-dimensional image HP representing the head of the avatar A1.
  • the facial image acquisition unit 111E extracts the element image EP representing the facial elements of the user U1 from the selected first facial image FPk.
  • the head generation unit 113A generates a three-dimensional image HP representing the head of the avatar A1 by superimposing the extracted element image EP on the outline image OP prepared in advance.
  • the user U1 can use an avatar A1 with a style that suits his/her taste from among a plurality of avatars A with different styles while using the outline image OP prepared in advance as a template.
  • the server 10A commonly uses outline images OP prepared in advance as templates when generating three-dimensional images HP showing the heads of a plurality of avatars A that are different from each other.
  • the server 10A allows the plurality of avatars A to have a sense of unity.
  • the server 10A can make each of the avatars A have a style that matches the preferences of each user U while giving the plurality of avatars A a sense of unity.
  • 3: Third Embodiment
  • the configuration of an information processing system 1B including a server 10B as an information processing apparatus according to a third embodiment of the present invention will be described with reference to FIGS. 9 to 11.
  • for the purpose of simplifying the description, among the components provided in the information processing system 1B according to the third embodiment, the same components as those of the information processing system 1 according to the first embodiment are denoted by the same reference numerals, and the description thereof may be omitted.
  • 3-1: Configuration of the Third Embodiment
  • 3-1-1: Overall Configuration
  • An information processing system 1B according to the third embodiment of the present invention differs from the information processing system 1 according to the first embodiment in that a server 10B is provided instead of the server 10.
  • otherwise, the overall configuration of the information processing system 1B is the same as the overall configuration of the information processing system 1 according to the first embodiment shown in FIG. 1, so illustration and description thereof will be omitted.
  • the storage device 12B stores the control program PR1B instead of the control program PR1.
  • the storage device 12B also stores a learning model LM and a head/height number table HT.
  • the learning model LM is a learning model for the age estimating unit 117 to estimate the age of the user U1 based on the input image IP representing the facial photograph of the user U1 acquired by the input image acquiring unit 111A.
  • the learning model LM is generated by learning teacher data in the learning phase.
  • the teacher data used to generate the learning model LM consists of a plurality of one-to-one pairs, each pairing feature information extracted from an input image IP showing the facial photograph of one person, as acquired by the input image acquisition unit 111A, with the age of that person.
  • the learning model LM is generated outside the server 10B.
  • the learning model LM is preferably generated in a second server (not shown).
  • the server 10B acquires the learning model LM from a second server (not shown) via the communication network NET.
  • the head and body count table HT is a table for defining the correspondence relationship between the age estimated by the age estimation unit 117 described later and the head and body count HB.
  • FIG. 10 shows an example of the head/body number table HT.
  • for example, for one of the age ranges defined in the head/body number table HT, the avatar A1 has a head/body number HB of 2.
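The head/body number table HT can be sketched as an age-range lookup, as below. The age boundaries and HB values are illustrative placeholders; the publication defines the shape of the table but the concrete ranges are not reproduced here.

```python
HEAD_BODY_TABLE_HT = [
    # (minimum age inclusive, maximum age inclusive, head/body number HB) -- illustrative values
    (0, 6, 2),      # e.g. a small child maps to a 2-heads-tall avatar
    (7, 12, 4),
    (13, 17, 6),
    (18, 200, 7),
]

def lookup_head_body_number(estimated_age: int) -> int:
    for age_min, age_max, hb in HEAD_BODY_TABLE_HT:
        if age_min <= estimated_age <= age_max:
            return hb
    raise ValueError(f"no table entry for age {estimated_age}")

print(lookup_head_body_number(5))    # -> 2
```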
  • the processing device 11B includes an acquisition unit 111F instead of the acquisition unit 111.
  • the acquisition unit 111F includes a head/body acquisition unit 111G instead of the head/body acquisition unit 111C.
  • the configuration of the acquisition unit 111F is the same as the configuration of the acquisition unit 111 according to the first embodiment shown in FIG. 5, so illustration and description thereof will be omitted.
  • the processing device 11B also includes an age estimation unit 117 and a head/body number generation unit 118.
  • the age estimation unit 117 estimates the age of the user U1 based on the input image IP showing the photograph of the face of the user U1. More specifically, the age estimation unit 117 inputs the input image IP acquired by the input image acquisition unit 111A to the learning model LM. After that, the age estimation unit 117 generates the estimated age of the user U1 by outputting the estimated age from the learning model LM. Age estimation section 117 also outputs the generated estimated age to head/body number generation section 118 .
  • based on the age estimated by the age estimation unit 117, the head/body number generation unit 118 generates a head/body number HB that indicates the size of the avatar A1. More specifically, the head/body number generation unit 118 generates the head/body number HB of the avatar A1 by checking the age acquired from the age estimation unit 117 against the head/body number table HT stored in the storage device 12B.
  • the head and body acquisition unit 111G acquires the head and body number HB of the avatar A1 generated by the head and body generation unit 118.
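The chain from the age estimation unit 117 to the head/body number generation unit 118 can be sketched as follows. The model is a stub with a scikit-learn-like predict() interface and the feature extraction is a placeholder; the real learning model LM is trained elsewhere on pairs of feature information and age, as described above.

```python
class StubAgeModel:
    def predict(self, features: list[list[float]]) -> list[float]:
        # Placeholder inference: a trained regressor would go here.
        return [30.0 for _ in features]

def extract_features(input_image_ip: bytes) -> list[float]:
    # Placeholder feature extraction from the face photograph IP.
    return [float(len(input_image_ip))]

def estimate_age(model: StubAgeModel, input_image_ip: bytes) -> int:
    # age estimation unit 117: run the input image IP through the learning model LM
    return int(model.predict([extract_features(input_image_ip)])[0])

def generate_head_body_number(estimated_age: int) -> int:
    # head/body number generation unit 118: check the age against the table HT
    # (the same illustrative table as the previous sketch, collapsed to one expression)
    return 2 if estimated_age <= 6 else 4 if estimated_age <= 12 else 6 if estimated_age <= 17 else 7

hb = generate_head_body_number(estimate_age(StubAgeModel(), b"face-photo"))
print(hb)   # -> 7 for the stub age of 30
```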
  • the server 10B can generate the avatar A1 with the head size HB that matches the impression given by the estimated age of the user U1.
  • FIG. 11 is a flow chart showing the operation of the server 10B according to the third embodiment. The operation of the server 10B will be described below with reference to FIG.
  • in step S11, the processing device 11B functions as the input image acquisition unit 111A.
  • the processing device 11B acquires, from the terminal device 20-1, image information representing the input image IP representing the front portion of the face of the user U1.
  • in step S12, the processing device 11B functions as the face image generation unit 112.
  • the processing device 11B generates a plurality of face images FP1 to FPn having different styles based on the input image IP representing the front portion of the face of the user U1 acquired in step S11. Further, the processing device 11B stores the generated face images FP1 to FPn in the storage device 12B.
  • in step S13, the processing device 11B functions as the face image acquisition unit 111B.
  • the processing device 11B acquires the selection result k of one face image FPk from the terminal device 20, and acquires the face image FPk from the storage device 12B based on the selection result k.
  • in step S14, the processing device 11B functions as the head generation unit 113.
  • the processing device 11B generates a three-dimensional image HP representing the head of the avatar A1 based on the one face image FPk acquired in step S13.
  • in step S15, the processing device 11B functions as the age estimation unit 117.
  • the processing device 11B estimates the age of the user U1 based on the input image IP acquired in step S11.
  • in step S16, the processing device 11B functions as the head/body number generation unit 118. The processing device 11B generates the head/body number HB of the avatar A1 based on the age of the user U1 estimated in step S15.
  • in step S17, the processing device 11B functions as the head/body acquisition unit 111G.
  • the processing device 11B acquires the head/body number HB generated in step S16.
  • in step S18, the processing device 11B functions as the body generation unit 114.
  • the processing device 11B generates image information representing a three-dimensional image BP of the body of the avatar A1 other than the head, using the head/body number HB acquired in step S17 and the avatar information AI stored in the storage device 12B.
  • in step S19, the processing device 11B functions as the avatar generation unit 115.
  • the processing device 11B uses the image information representing the three-dimensional image HP representing the head of the avatar A1 and the image information representing the three-dimensional image BP representing the body of the avatar A1 to generate the image information representing the three-dimensional image WP representing the overall appearance of the avatar A1.
  • in step S20, the processing device 11B functions as the output unit 116.
  • the processing device 11B transmits image information representing a three-dimensional image WP representing the overall appearance of the avatar A1 generated in step S19 to the terminal device 20 via the communication device 13. After that, the processing device 11B ends all the processes shown in FIG. 11.
  • the server 10B as an information processing device includes the age estimating section 117 and the head/body number generating section 118 .
  • the age estimation unit 117 estimates the age of the user U1 based on the image showing the photograph of the face of the user U1.
  • the head-to-body generation unit 118 generates the head-to-body number HB of the avatar A1 based on the estimated age.
  • since the server 10B has the above configuration, when the user U1 uses the avatar A1 with a style that suits the taste of the user U1 from among the plurality of avatars A having different styles, the user U1 can use the avatar A1 with a head/body number HB that matches the impression given by the estimated age of the user U1. Moreover, since the head/body number generation unit 118 generates the head/body number HB based on the age estimated by the age estimation unit 117, the user U1 does not need to input the head/body number HB. That is, the user U1 can use the avatar A1 that matches the impression given by the estimated age of the user U1 by a simple method.
  • in the embodiments described above, the servers 10 to 10B display the avatar A on the display 24 of the terminal device 20.
  • servers 10 to 10B may display avatar A on XR glasses instead of display 24 .
  • FIG. 12 is a diagram showing the overall configuration of an information processing system 1C according to this modified example.
  • the information processing system 1C uses XR technology to provide a virtual space to users U1 and U2 wearing the XR glasses 30 .
  • the information processing system 1C causes the XR glasses 30 to display an avatar A1 corresponding to the user U1 and an avatar A2 corresponding to the user U2.
  • XR technology is a general term for VR (Virtual Reality) technology, AR (Augmented Reality) technology, and MR (Mixed Reality) technology.
  • VR technology is technology for displaying a digital virtual space on a device such as VR glasses or an HMD (Head Mounted Display) employing VR technology.
  • AR technology is technology that adds information indicated by digital content to the real world in an augmented reality space displayed on a device such as AR glasses or an HMD that employs AR technology.
  • MR technology is a technology that precisely superimposes a digital virtual space on a real space using MR glasses or a device such as an HMD (Head Mounted Display) employing MR technology.
  • the information processing system 1C includes a server 10, a terminal device 20, and XR glasses 30.
  • the server 10 and the terminal device 20 are communicably connected to each other via a communication network NET.
  • the terminal device 20 and the XR glasses 30 are connected so as to be able to communicate with each other.
  • the suffix "-X" is used for the reference numerals.
  • the same is true for each component of the XR glasses 30.
  • two pairs are shown as pairs of the terminal device 20 and the XR glasses 30: the pair of the terminal device 20-1 and the XR glasses 30-1 and the pair of the terminal device 20-2 and the XR glasses 30-2.
  • the number of sets is merely an example, and the information processing system 1C can include any number of sets of the terminal device 20 and the XR glasses 30 .
  • FIG. 12 it is assumed that user U1 uses a set of terminal device 20-1 and XR glasses 30-1, and user U2 uses a set of terminal device 20-2 and XR glasses 30-2.
  • the server 10 provides various data and cloud services to the terminal device 20 via the communication network NET.
  • the server 10 provides the terminal device 20 with various data for displaying the avatar A1 corresponding to the user U1 and the avatar A2 corresponding to the user U2 on the XR glasses 30 connected to the terminal device 20 .
  • the server 10 provides the terminal device 20-1 with various data for displaying the avatar A2 on the display 38-1 of the XR glasses 30-1 used by the user U1.
  • the server 10 also provides the terminal device 20-2 with various data for displaying the avatar A1 on the display 38-2 of the XR glasses 30-2 used by the user U2.
  • the terminal device 20-1 causes the XR glasses 30-1 worn on the head by the user U1 to display virtual objects arranged in the virtual space. Further, the terminal device 20-2 causes the XR glasses 30-2 worn on the head of the user U2 to display a virtual object arranged in the virtual space.
  • the virtual space is, for example, a celestial space.
  • the virtual objects are, for example, virtual objects representing data such as still images, moving images, 3DCG models, HTML files, and text files, and virtual objects representing applications. Examples of text files include memos, source codes, diaries, and recipes. Examples of applications include browsers, applications for using SNS, and applications for generating document files.
  • the terminal device 20-1 and the terminal device 20-2 are preferably portable terminal devices such as smartphones and tablets, for example.
  • the terminal device 20-1 causes the XR glasses 30-1 to display a virtual object mainly corresponding to the avatar A2. Also, the terminal device 20-2 displays a virtual object mainly corresponding to the avatar A1 on the XR glasses 30-2.
  • the XR glasses 30 are display devices worn on the heads of users U1 and U2. More specifically, the XR glasses 30-1 are display devices worn on the head of the user U1. Also, the XR glasses 30-2 are a display device worn on the head of the user U2. The XR glasses 30 are, for example, a see-through wearable display. The XR glasses 30 are controlled by the terminal device 20 to display a virtual object on the display panel provided corresponding to each of the binocular lenses.
  • the user U1 and the user U2 can observe the avatars A1 and A2 displayed on the display 38. More specifically, the user U1 wearing the XR glasses 30-1 can observe the avatar A2 displayed on the display 38-1. On the other hand, the user U2 wearing the XR glasses 30-2 can observe the avatar A1 displayed on the display 38-2.
  • the terminal device 20 and the XR glasses 30 are implemented separately.
  • the method of realizing the terminal device 20 and the XR glasses 30 in this modified example is not limited to this.
  • the terminal device 20 and the XR glasses 30 may be implemented in a single housing by providing the XR glasses 30 with the same functions as the terminal device 20 .
  • the information processing system 1C may include a device such as an HMD that employs any one of VR technology, AR technology, and MR technology instead of the XR glasses 30 .
  • the terminal device 20-1 outputs a selection result k in which one face image FPk is selected from the plurality of face images FP1 to FPn to the servers 10 to 10B.
  • the face image FPk itself may be output from the terminal device 20-1 to the servers 10 to 10B.
  • the servers 10 to 10B acquire from the terminal device 20 the input image IP representing the front portion of the face of the user U1.
  • the servers 10 to 10B may acquire the input image IP from a device other than the terminal device 20.
  • in the above description, the storage devices 12 to 12B and the storage device 22 are exemplified by ROM, RAM, and the like, but they may be flexible disks, magneto-optical disks (e.g., compact discs, digital versatile discs, Blu-ray (registered trademark) discs), smart cards, flash memory devices (e.g., cards, sticks, key drives), CD-ROMs, registers, removable disks, hard disks, floppy (registered trademark) disks, magnetic strips, databases, servers, or other suitable storage media.
  • the program may be transmitted from a network via an electric communication line.
  • the program may be transmitted from the communication network NET via an electric communication line.
  • the information, signals, etc. described may be represented using any of a variety of different technologies.
  • the data, instructions, commands, information, signals, bits, symbols, chips, etc. that may be referred to throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or magnetic particles, optical fields or photons, or any combination thereof.
  • input/output information and the like may be stored in a specific location (for example, memory), or may be managed using a management table. Input/output information and the like can be overwritten, updated, or appended. The output information and the like may be deleted. The entered information and the like may be transmitted to another device.
  • the determination may be made by a value (0 or 1) represented using 1 bit, by a boolean value (Boolean: true or false), or by numerical comparison (for example, comparison with a predetermined value).
  • each function illustrated in FIGS. 1 to 12 is realized by any combination of at least one of hardware and software.
  • the method of realizing each functional block is not particularly limited. That is, each functional block may be implemented using one device that is physically or logically coupled, or two or more devices that are physically or logically separated may be directly or indirectly (e.g., wired, wireless, etc.) connected and implemented using these multiple devices.
  • a functional block may be implemented by combining software in the one device or the plurality of devices.
  • the programs illustrated in the above embodiments should be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or by any other name.
  • software, instructions, information, etc. may be transmitted and received via a transmission medium.
  • for example, if the software is transmitted from a website, server, or other remote source using wired technologies (coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), etc.) and/or wireless technologies (infrared, microwave, etc.), then these wired and/or wireless technologies are included within the definition of a transmission medium.
  • "system" and "network" are used interchangeably.
  • Information, parameters, etc. described in the present disclosure may be expressed using absolute values, may be expressed using relative values from a predetermined value, or may be expressed using corresponding separate information.
  • the servers 10 to 10B and the terminal device 20 may be mobile stations (MS).
  • a mobile station may also be referred to by those skilled in the art as a subscriber station, mobile unit, subscriber unit, wireless unit, remote unit, mobile device, wireless device, wireless communication device, remote device, mobile subscriber station, access terminal, mobile terminal, wireless terminal, remote terminal, handset, user agent, mobile client, client, or some other suitable term.
  • terms such as “mobile station”, “user terminal”, “user equipment (UE)", “terminal”, etc. may be used interchangeably.
  • the terms "connected" and "coupled", or any variation thereof, mean any direct or indirect connection or coupling between two or more elements, and can include the presence of one or more intermediate elements between two elements that are "connected" or "coupled" to each other.
  • Couplings or connections between elements may be physical couplings or connections, logical couplings or connections, or a combination thereof.
  • connection may be replaced with "access.”
  • two elements are considered to be “connected” or “coupled” to each other using at least one of one or more wires, cables, and printed electrical connections, and using electromagnetic energy having wavelengths in the radio frequency, microwave, and light (both visible and invisible) regions, as some non-limiting and non-exhaustive examples.
  • the phrase “based on” does not mean “based only on,” unless expressly specified otherwise. In other words, the phrase “based on” means both “based only on” and “based at least on.”
  • "judging" and "determining" as used in this disclosure may encompass a wide variety of actions.
  • "judging" and "determining" can include, for example, regarding judging, calculating, computing, processing, deriving, investigating, looking up, searching, or inquiring (e.g., searching in a table, database, or other data structure), or ascertaining, as "judging" or "determining".
  • "judging" and "determining" can include regarding receiving (e.g., receiving information), transmitting (e.g., transmitting information), input, output, or accessing (e.g., accessing data in memory) as "judging" or "determining".
  • "judging" and "determining" can include regarding resolving, selecting, choosing, establishing, comparing, and the like as "judging" or "determining".
  • "judging" and "determining" can include regarding some action as "judging" or "determining".
  • "judging" ("determining") may be replaced by "assuming", "expecting", "considering", and the like.
  • the term "A and B are different” may mean “A and B are different from each other.” The term may also mean that "A and B are different from C”. Terms such as “separate,” “coupled,” etc. may also be interpreted in the same manner as “different.”
  • notification of predetermined information is not limited to explicit notification, and may be performed implicitly (e.g., not notifying the predetermined information).

Landscapes

  • Engineering & Computer Science (AREA)
  • Architecture (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

An information processing device comprises a face image generation unit that generates a plurality of face images having different styles, a face image acquisition unit that acquires a first face image selected by a user from the plurality of face images, a head generation unit that generates a three-dimensional image of the head of an avatar on the basis of the first face image, and an avatar generation unit that generates a three-dimensional image of the outer appearance of the entire avatar on the basis of the three-dimensional image of the head of the avatar and a three-dimensional image of a body which is the rest of the avatar other than the head.

Description

Information processing device
 The present invention relates to an information processing device.
 On the Internet, characters called avatars are sometimes used as users' alter egos. In recent years, by using techniques such as 3D scanning, it has become possible to use avatars, which are three-dimensional images of users, in a three-dimensional virtual space.
 For example, Patent Document 1 discloses a technique in which a server device selects one avatar from multiple avatars. Specifically, in the technique of Patent Document 1, the user's communication device requests the server device to display a predetermined page including the avatar specified by identification information. When the display is requested, the server device selects which of a first avatar and a second avatar having different scales is to be displayed, based on the presence or absence of a display area for the first avatar and a display area for the second avatar on the predetermined page. The server device then generates image data of the first avatar or the second avatar according to the selection result, and transmits the generated image data to the user's communication device.
JP 2013-029951 A
 However, in the conventional technique, the server device automatically selects one of two avatars with different scales, so whichever avatar is selected, the user cannot personally choose an avatar drawn in a style to his or her liking. As a result, the user sometimes has to use an avatar whose drawing style is not to his or her taste.
 Therefore, an object of the present invention is to provide an information processing apparatus that enables a user to use an avatar drawn in a style that suits the user's taste from among a plurality of avatars drawn in different styles.
 An information processing apparatus according to a preferred aspect of the present invention includes: a face image generation unit that generates a plurality of face images with mutually different styles; a face image acquisition unit that acquires a first face image selected by a user from the plurality of face images; a head generation unit that generates a three-dimensional image showing the head of an avatar based on the first face image; a head-to-body acquisition unit that acquires a head-to-body number representing how many heads tall the avatar is; a body generation unit that generates, based on the head-to-body number, a three-dimensional image showing the body, which is the portion of the avatar other than the head; and an avatar generation unit that generates a three-dimensional image showing the overall appearance of the avatar using the three-dimensional image showing the head of the avatar and the three-dimensional image showing the body of the avatar.
 According to the present invention, a user can use an avatar drawn in a style that suits the user's taste from among a plurality of avatars drawn in different styles.
FIG. 1 is a diagram showing the overall configuration of an information processing system 1 according to a first embodiment.
FIG. 2 is a block diagram showing a configuration example of a terminal device 20.
FIG. 3 is a diagram showing the flow of generating a three-dimensional image WP showing the overall appearance of an avatar A1.
FIG. 4 is a block diagram showing a configuration example of a server 10.
FIG. 5 is a functional block diagram of an acquisition unit 111.
FIG. 6 is a diagram showing an example of a plurality of face images FP1 to FP4 with mutually different styles.
FIG. 7 is a flowchart showing the operation of the server 10 according to the first embodiment.
FIG. 8 is a diagram showing an operation example of a face image acquisition unit 111E and a head generation unit 113A.
FIG. 9 is a block diagram showing a configuration example of a server 10B.
FIG. 10 is a diagram showing an example of a head-to-body number table HT.
FIG. 11 is a flowchart showing the operation of the server 10B according to a third embodiment.
FIG. 12 is a diagram showing the overall configuration of an information processing system 1C according to Modification 1.
1: First Embodiment
 Hereinafter, the configuration of an information processing system 1 including a server 10 as an information processing apparatus according to the first embodiment of the present invention will be described with reference to FIGS. 1 to 7.
1-1: Configuration of the First Embodiment
1-1-1: Overall Configuration
 FIG. 1 is a diagram showing the overall configuration of the information processing system 1 according to the first embodiment of the present invention. The information processing system 1 displays an avatar A1 corresponding to a user U1 and an avatar A2 corresponding to a user U2 on terminal devices 20 used by the users U1 and U2.
 The information processing system 1 includes a server 10 and terminal devices 20. The server 10 is an example of an information processing apparatus. In the information processing system 1, the server 10 and the terminal devices 20 are communicably connected to each other via a communication network NET. In the following description, when the terminal devices 20 used by the individual users are distinguished from one another, the suffix "-X" is appended to the reference sign, where X is an arbitrary integer of 1 or more. The same applies to the constituent elements of each terminal device 20. Although FIG. 1 shows two terminal devices 20, namely a terminal device 20-1 and a terminal device 20-2, this number is merely an example, and the information processing system 1 may include any number of terminal devices 20. In FIG. 1, it is assumed that the user U1 uses the terminal device 20-1 and the user U2 uses the terminal device 20-2.
 The server 10 provides various data and cloud services to the terminal devices 20 via the communication network NET. In particular, the server 10 provides the terminal devices 20 with various data for displaying the avatar A1 corresponding to the user U1 and the avatar A2 corresponding to the user U2. More specifically, the server 10 provides the terminal device 20-1 with various data for displaying the avatar A2 on a display 24-1 of the terminal device 20-1 used by the user U1. The server 10 also provides the terminal device 20-2 with various data for displaying the avatar A1 on a display 24-2 of the terminal device 20-2 used by the user U2. The terminal devices 20-1 and 20-2 are preferably portable terminal devices such as smartphones and tablets.
1-1-2: Configuration of the Terminal Device
 FIG. 2 is a block diagram showing a configuration example of the terminal device 20. The terminal device 20 includes a processing device 21, a storage device 22, a communication device 23, a display 24, an input device 25, and an imaging device 26. The elements of the terminal device 20 are interconnected by one or more buses for communicating information.
 The processing device 21 is a processor that controls the terminal device 20 as a whole, and is configured using, for example, a single chip or a plurality of chips. The processing device 21 is configured using, for example, a central processing unit (CPU) including interfaces with peripheral devices, an arithmetic unit, registers, and the like. Some or all of the functions of the processing device 21 may be realized by hardware such as a DSP, an ASIC, a PLD, or an FPGA. The processing device 21 executes various processes in parallel or sequentially.
 The storage device 22 is a recording medium that can be read from and written to by the processing device 21, and stores a plurality of programs including a control program PR2 executed by the processing device 21.
 The communication device 23 is hardware serving as a transmission/reception device for communicating with other devices, and is also called, for example, a network device, a network controller, a network card, or a communication module. The communication device 23 may include a connector for wired connection and an interface circuit corresponding to the connector, and may also include a wireless communication interface. Examples of connectors and interface circuits for wired connection include products conforming to wired LAN, IEEE 1394, and USB. Examples of wireless communication interfaces include products conforming to wireless LAN, Bluetooth (registered trademark), and the like.
 The display 24 is a device that displays images and character information, and displays various images under the control of the processing device 21. For example, various display panels such as a liquid crystal display panel and an organic EL (Electro Luminescence) display panel are suitably used as the display 24. In particular, in this embodiment, the display 24 displays an image showing an avatar A. More specifically, when the terminal device 20 is the terminal device 20-1 used by the user U1, the display 24-1 mainly displays an image showing the avatar A2 corresponding to the user U2. On the other hand, when the terminal device 20 is the terminal device 20-2 used by the user U2, the display 24-2 mainly displays an image showing the avatar A1 corresponding to the user U1.
 The input device 25 accepts operations from the user U1. For example, the input device 25 includes a keyboard, a touch pad, a touch panel, or a pointing device such as a mouse. When the input device 25 includes a touch panel, it may also serve as the display 24.
 In this embodiment, the user U1 uploads an input image IP showing the front of the user U1's face from the terminal device 20 to the server 10 in order to generate a three-dimensional avatar. The input image IP is typically a two-dimensional image generated based on a photograph of the face of the user U1, but is not limited to such an image. At the time of uploading, the input device 25 is used by the user U1 to input the input image IP to the terminal device 20. The input image IP may be obtained by capturing an image of the user U1 with the imaging device 26 described later, or may be acquired from an external device using the communication device 23 described above.
 The imaging device 26 outputs imaging information obtained by imaging the outside world, and includes, for example, a lens, an imaging element, an amplifier, and an AD converter. Light condensed through the lens is converted by the imaging element into an imaging signal, which is an analog signal. The amplifier amplifies the imaging signal and outputs it to the AD converter. The AD converter converts the amplified imaging signal, which is an analog signal, into imaging information, which is a digital signal. The converted imaging information is output to the processing device 21, and is then output to the server 10 via the communication device 23.
 The processing device 21 functions as an acquisition unit 211, an output unit 212, and a display control unit 213 by reading the control program PR2 from the storage device 22 and executing it.
 The acquisition unit 211 acquires image information representing the input image IP, which shows the front of the face of the user U1.
 As shown in FIG. 3, which will be described later, the server 10 generates a plurality of face images FP1 to FPn having mutually different styles based on the input image IP, where "n" is an integer of 2 or more. The server 10 outputs the generated face images FP1 to FPn to the terminal device 20. The user U1 uses the input device 25 to select one face image FPk from the face images FP1 to FPn displayed on the display 24 of the terminal device 20. The acquisition unit 211 acquires the selection result k of this face image FPk, where "k" is an integer from 1 to n.
 The user U1 also uses the input device 25 to input a head-to-body number HB representing how many heads tall the avatar A1 to be generated should be, that is, a numerical value indicating the head-to-body proportion of the avatar A1. The acquisition unit 211 acquires this head-to-body number HB.
 The acquisition unit 211 also uses the communication device 23 to acquire, from the server 10, image information indicating images to be displayed on the display 24.
 The output unit 212 outputs, to the server 10, the image information indicating the input image IP showing the front of the face of the user U1, the selection result k of the face image FPk, and the head-to-body number HB, all of which are acquired by the acquisition unit 211.
 FIG. 3 shows the flow of generating a three-dimensional image WP showing the overall appearance of the avatar A1, which is the avatar corresponding to the user U1. As shown in FIG. 3, a face image generation unit 112 (described later) of the server 10 generates the face images FP1 to FPn based on the input image IP. Image information representing the face images FP1 to FPn is output from the server 10 to the terminal device 20 and, as described above, the face images FP1 to FPn are displayed on the display 24 of the terminal device 20. The user U1 uses the input device 25 to select one face image FPk from the face images FP1 to FPn, and the selection result k is output from the output unit 212 of the terminal device 20 to the server 10. The face image FPk selected by the user U1 is used by the server 10 to generate a three-dimensional image HP representing the head. The head-to-body number HB is also output from the output unit 212 of the terminal device 20 to the server 10. In the server 10, a three-dimensional image WP showing the overall appearance of the avatar A1 is generated based on the three-dimensional image HP representing the head, the head-to-body number HB, and a three-dimensional image BP representing the body, which is the portion of the avatar A1 other than the head.
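 The flow of FIG. 3 can be read as a simple pipeline. The following Python sketch only illustrates that data flow under stated assumptions; the stage names (generate_face_images, generate_head, generate_body, assemble_avatar) and data types are hypothetical and are not part of the disclosed implementation.

```python
# Hypothetical sketch of the generation flow in FIG. 3. The stage functions are
# injected as callables so the pipeline stays independent of any concrete image
# processing; all names here are illustrative assumptions.
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class AvatarPipeline:
    generate_face_images: Callable[[Any], List[Any]]  # IP -> [FP1..FPn] (server)
    generate_head: Callable[[Any], Any]                # FPk -> HP (server)
    generate_body: Callable[[Any, int], Any]           # (HP, HB) -> BP (server)
    assemble_avatar: Callable[[Any, Any], Any]         # (HP, BP) -> WP (server)

    def build(self, input_image: Any, selected_index: int, head_to_body: int) -> Any:
        face_images = self.generate_face_images(input_image)  # FP1..FPn
        selected_face = face_images[selected_index - 1]        # FPk, k is 1-based
        head = self.generate_head(selected_face)                # 3D head image HP
        body = self.generate_body(head, head_to_body)           # 3D body image BP
        return self.assemble_avatar(head, body)                 # whole avatar WP
```

 In this reading, the only terminal-side contributions are the input image IP, the selection result k, and the head-to-body number HB; everything else is computed on the server.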
 Returning to FIG. 2, the display control unit 213 causes the display 24 to display the image indicated by the image information acquired from the server 10 by the acquisition unit 211. In particular, in this embodiment, the image information indicates an image of an avatar A. That is, when the terminal device 20 is the terminal device 20-1, the display control unit 213 mainly causes the display 24-1 to display the avatar A2 corresponding to the user U2. On the other hand, when the terminal device 20 is the terminal device 20-2, the display control unit 213 mainly causes the display 24-2 to display the avatar A1 corresponding to the user U1.
1-1-3: Configuration of the Server
 FIG. 4 is a block diagram showing a configuration example of the server 10. The server 10 includes a processing device 11, a storage device 12, a communication device 13, a display 14, and an input device 15. The elements of the server 10 are interconnected by one or more buses for communicating information.
 The processing device 11 is a processor that controls the server 10 as a whole, and is configured using, for example, a single chip or a plurality of chips. The processing device 11 is configured using, for example, a central processing unit (CPU) including interfaces with peripheral devices, an arithmetic unit, registers, and the like. Some or all of the functions of the processing device 11 may be realized by hardware such as a DSP, an ASIC, a PLD, or an FPGA. The processing device 11 executes various processes in parallel or sequentially.
 The storage device 12 is a recording medium that can be read from and written to by the processing device 11, and stores a plurality of programs including a control program PR1 executed by the processing device 11. The storage device 12 also stores avatar information AI. The avatar information AI includes image information indicating the face images FP1 to FPn generated by the face image generation unit 112, which will be described later. The avatar information AI also includes information used by a body generation unit 114, which will be described later, when generating image information representing the three-dimensional image BP of the body of the avatar A1.
 The communication device 13 is hardware serving as a transmission/reception device for communicating with other devices, and is also called, for example, a network device, a network controller, a network card, or a communication module. The communication device 13 may include a connector for wired connection and an interface circuit corresponding to the connector, and may also include a wireless communication interface. Examples of connectors and interface circuits for wired connection include products conforming to wired LAN, IEEE 1394, and USB. Examples of wireless communication interfaces include products conforming to wireless LAN, Bluetooth (registered trademark), and the like.
 The display 14 is a device that displays images and character information, and displays various images under the control of the processing device 11. For example, various display panels such as a liquid crystal display panel and an organic EL display panel are suitably used as the display 14.
 The input device 15 is a device that accepts operations from the administrator of the information processing system 1. For example, the input device 15 includes a keyboard, a touch pad, a touch panel, or a pointing device such as a mouse. When the input device 15 includes a touch panel, it may also serve as the display 14.
 The processing device 11 functions as an acquisition unit 111, the face image generation unit 112, a head generation unit 113, the body generation unit 114, an avatar generation unit 115, and an output unit 116 by, for example, reading the control program PR1 from the storage device 12 and executing it.
 The acquisition unit 111 acquires various kinds of information from the terminal device 20, including the image information indicating the input image IP showing the front of the face of the user U1, the selection result k of the face image FPk, the head-to-body number HB, line-of-sight information, position information, motion information, and imaging information.
 FIG. 5 is a functional block diagram of the acquisition unit 111. As shown in FIG. 5, the acquisition unit 111 includes an input image acquisition unit 111A, a face image acquisition unit 111B, and a head-to-body acquisition unit 111C.
 The input image acquisition unit 111A acquires, from the terminal device 20-1, the image information indicating the input image IP showing the front of the face of the user U1.
 The face image acquisition unit 111B acquires the selection result k of the first face image FPk from the terminal device 20-1, and acquires the face image FPk from the storage device 12 based on the selection result k.
 The head-to-body acquisition unit 111C acquires the head-to-body number HB from the terminal device 20-1.
 Returning to FIG. 4, the face image generation unit 112 generates the plurality of face images FP1 to FPn having mutually different styles based on the input image IP, acquired by the input image acquisition unit 111A, which shows the front of the face of the user U1. As described above, the input image IP is typically a two-dimensional image generated based on a photograph of the face of the user U1. That is, the face image generation unit 112 typically generates the face images FP1 to FPn based on an image representing a photograph of the face of the user U1.
 FIG. 6 shows an example of a plurality of face images FP1 to FP4 with mutually different styles. The face images FP1 to FP4 shown in FIG. 6 are all generated from the same input image IP but differ from one another in style. The face image generation unit 112 includes a plurality of generation engines, from a first generation engine that generates the face image FP1 to an n-th generation engine that generates the face image FPn. The face image generation unit 112 inputs the input image IP to each of the first to n-th generation engines and causes them to output the face images FP1 to FPn, thereby generating the face images FP1 to FPn.
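 As a concrete illustration of this per-engine dispatch, the following sketch assumes a simple interface in which each style engine exposes a stylize method; the StyleEngine protocol and the way engines are registered are assumptions for illustration, not the disclosed implementation.

```python
# Hypothetical sketch of the face image generation unit 112.
# The StyleEngine interface and engine registration are illustrative assumptions.
from typing import List, Protocol


class StyleEngine(Protocol):
    def stylize(self, input_image: bytes) -> bytes:
        """Return a face image in this engine's style from the input photo."""
        ...


class FaceImageGenerator:
    def __init__(self, engines: List[StyleEngine]):
        # engines[0] plays the role of the first generation engine,
        # engines[n-1] the role of the n-th generation engine.
        self.engines = engines

    def generate(self, input_image: bytes) -> List[bytes]:
        # Feed the same input image IP to every engine and collect FP1..FPn.
        return [engine.stylize(input_image) for engine in self.engines]
```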
 The face image generation unit 112 also stores the generated face images FP1 to FPn in the storage device 12.
 The head generation unit 113 generates a three-dimensional image HP representing the head of the avatar A1 based on the first face image FPk acquired by the face image acquisition unit 111B. More specifically, the head generation unit 113 generates the image information representing the three-dimensional image HP so that the overall style of the three-dimensional image HP representing the head of the avatar A1 is the same as the style represented by the face image FPk.
 The body generation unit 114 uses the head-to-body number HB acquired by the head-to-body acquisition unit 111C and the avatar information AI stored in the storage device 12 to generate image information representing a three-dimensional image BP of the body, which is the portion of the avatar A1 other than the head. For example, the body generation unit 114 first generates a provisional three-dimensional image SP of the body of the avatar A1 using information included in the avatar information AI. The body generation unit 114 then adjusts the size of the provisional three-dimensional image SP so that the ratio between the size of the three-dimensional image HP representing the head of the avatar A1 and the size of the provisional three-dimensional image SP becomes the ratio indicated by the head-to-body number HB. The body generation unit 114 uses the resized provisional three-dimensional image SP as the three-dimensional image BP of the body of the avatar A1, and generates image information representing the three-dimensional image BP. When generating the image information representing the three-dimensional image BP, the body generation unit 114 may give the entire three-dimensional image BP the same style as that represented by the face image FPk, or may give it a style different from that represented by the face image FPk.
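 One way to picture the resizing step is as a uniform scaling of the provisional body model. The sketch below assumes that a 3D model is represented by its vertex coordinates and that an avatar of HB heads has a body whose height is (HB - 1) times the head height; both are assumptions made only for illustration, not statements about the disclosed implementation.

```python
# Minimal sketch of the body-scaling step in the body generation unit 114.
# Vertex-list representation and the (HB - 1) reading of the head-to-body
# number are illustrative assumptions.

def scale_body_to_head(body_vertices, head_height, head_to_body_number):
    """Uniformly scale the provisional body SP so the whole avatar is HB heads tall."""
    ys = [y for (_, y, _) in body_vertices]
    current_body_height = max(ys) - min(ys)
    target_body_height = head_height * (head_to_body_number - 1)
    scale = target_body_height / current_body_height
    # Uniform scaling keeps the proportions of the body template intact.
    return [(x * scale, y * scale, z * scale) for (x, y, z) in body_vertices]
```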
 The avatar generation unit 115 uses the image information representing the three-dimensional image HP of the head of the avatar A1 and the image information representing the three-dimensional image BP of the body of the avatar A1 to generate image information representing a three-dimensional image WP showing the overall appearance of the avatar A1.
 The output unit 116 uses the communication device 13 to transmit, to the terminal device 20, image information indicating images to be displayed on the display 24 of the terminal device 20 and image information indicating the face images FP1 to FPn.
 The output unit 116 also uses the communication device 13 to transmit, to the terminal device 20, the image information representing the three-dimensional image WP showing the overall appearance of the avatar A1 generated by the avatar generation unit 115.
 As a result, the server 10 can create avatars A1 in various styles, including an avatar A1 in a style that suits the taste of the user U1.
1-2: Operation of the First Embodiment
 FIG. 7 is a flowchart showing the operation of the server 10 according to the first embodiment. The operation of the server 10 will be described below with reference to FIG. 7.
 In step S1, the processing device 11 functions as the input image acquisition unit 111A and acquires, from the terminal device 20-1, the image information indicating the input image IP showing the front of the face of the user U1.
 In step S2, the processing device 11 functions as the face image generation unit 112 and generates the plurality of face images FP1 to FPn with mutually different styles based on the input image IP acquired in step S1. The processing device 11 also stores the generated face images FP1 to FPn in the storage device 12.
 In step S3, the processing device 11 functions as the face image acquisition unit 111B, acquires the selection result k of the first face image FPk from the terminal device 20-1, and acquires the face image FPk from the storage device 12 based on the selection result k.
 In step S4, the processing device 11 functions as the head generation unit 113 and generates the three-dimensional image HP representing the head of the avatar A1 based on the face image FPk acquired in step S3.
 In step S5, the processing device 11 functions as the head-to-body acquisition unit 111C and acquires the head-to-body number HB from the terminal device 20-1.
 In step S6, the processing device 11 functions as the body generation unit 114 and generates the image information representing the three-dimensional image BP of the body of the avatar A1, using the head-to-body number HB acquired in step S5 and the avatar information AI stored in the storage device 12.
 In step S7, the processing device 11 functions as the avatar generation unit 115 and generates the image information representing the three-dimensional image WP showing the overall appearance of the avatar A1, using the image information representing the three-dimensional image HP of the head of the avatar A1 and the image information representing the three-dimensional image BP of the body of the avatar A1.
 In step S8, the processing device 11 functions as the output unit 116 and uses the communication device 13 to transmit, to the terminal device 20, the image information representing the three-dimensional image WP generated in step S7, which shows the overall appearance of the avatar A1. The processing device 11 then ends all the processing shown in FIG. 7.
1-3: Effects of the First Embodiment
 According to the above description, the server 10 as an information processing apparatus includes the face image generation unit 112, the face image acquisition unit 111B, the head generation unit 113, the head-to-body acquisition unit 111C, the body generation unit 114, and the avatar generation unit 115. The face image generation unit 112 generates a plurality of face images FP1 to FPn with mutually different styles. The face image acquisition unit 111B acquires the first face image FPk selected by the user U1 from the face images FP1 to FPn. The head generation unit 113 generates the three-dimensional image HP representing the head of the avatar A1 based on the face image FPk. The head-to-body acquisition unit 111C acquires the head-to-body number HB representing how many heads tall the avatar A1 is. The body generation unit 114 generates, based on the head-to-body number HB, the three-dimensional image BP representing the body, which is the portion of the avatar A1 other than the head. The avatar generation unit 115 generates the three-dimensional image WP showing the overall appearance of the avatar A1, using the three-dimensional image HP of the head of the avatar A1 and the three-dimensional image BP of the body of the avatar A1.
 Because the server 10 has the above configuration, the user U1 can use an avatar A1 with a style that suits his or her taste from among a plurality of avatars A with mutually different styles. In this embodiment, the server 10 generates the three-dimensional image HP of the head of the avatar A1 and the three-dimensional image BP of the body of the avatar A1 separately. More specifically, the user U1 selects, from the face images FP1 to FPn with mutually different styles, a face image FPk whose style matches his or her taste, and the head generation unit 113 generates the three-dimensional image HP of the head of the avatar A1 based on the selected face image FPk. The characteristics of the avatar A1 tend to appear in its head, so the user U1 can obtain the overall appearance of an avatar A1 with characteristics matching his or her taste by the simple means of selecting one face image FPk from the face images FP1 to FPn. Moreover, the body generation unit 114 does not necessarily generate three-dimensional images BP of the body of the avatar A1 in multiple different styles. That is, the processing load on the server 10 is lower when generating the face images FP1 to FPn in different styles than when generating the entire appearance of multiple avatars A1 in different styles. Therefore, the server 10 according to this embodiment can generate an avatar A1 with characteristics that match the taste of the user U1 while reducing its own processing load.
 Also according to the above description, the face image generation unit 112 generates the plurality of face images FP1 to FPn with mutually different styles based on the input image IP, which is an image representing a photograph of the face of the user U1.
 Because the server 10 has the above configuration, the face of the user U1 can be reflected more realistically in the three-dimensional image HP of the head of the avatar A1 when the user U1 uses an avatar A1 with a style that suits his or her taste from among a plurality of avatars A with different styles. In addition, since the user U1 does not need to draw the input image IP from scratch, the user U1 can use the three-dimensional image HP of the head of the avatar A1 more easily. Furthermore, because the image representing the photograph of the face of the user U1 is a two-dimensional image, the server 10 can reduce its processing load compared with the case of using a three-dimensional image. At the same time, since the facial photograph accurately represents the features of the user U1, an avatar A1 resembling the user U1 can be created.
2: Second Embodiment
 Hereinafter, the configuration of an information processing system 1A including a server 10A as an information processing apparatus according to a second embodiment of the present invention will be described with reference to FIG. 8. In the following description, for simplicity, components of the information processing system 1A according to the second embodiment that are identical to those of the information processing system 1 according to the first embodiment are denoted by the same reference signs, and their description may be omitted.
2-1: Configuration of the Second Embodiment
2-1-1: Overall Configuration
 The information processing system 1A according to the second embodiment of the present invention differs from the information processing system 1 according to the first embodiment in that it includes a server 10A instead of the server 10. In all other respects, the overall configuration of the information processing system 1A is the same as that of the information processing system 1 according to the first embodiment shown in FIG. 1, so its illustration and description are omitted.
2-1-2: Configuration of the Server
 Unlike the server 10, the server 10A includes a processing device 11A instead of the processing device 11 and a storage device 12A instead of the storage device 12. Unlike the storage device 12, the storage device 12A stores a control program PR1A instead of the control program PR1. Unlike the processing device 11, the processing device 11A includes an acquisition unit 111D instead of the acquisition unit 111 and a head generation unit 113A instead of the head generation unit 113. Unlike the acquisition unit 111, the acquisition unit 111D includes a face image acquisition unit 111E instead of the face image acquisition unit 111B. In all other respects, the configuration of the server 10A is the same as that of the server 10 according to the first embodiment shown in FIGS. 4 and 5, so its illustration and description are omitted.
 The face image acquisition unit 111E extracts, from the first face image FPk selected by the user U1, an element image EP representing elements of the face of the user U1.
 The head generation unit 113A generates the three-dimensional image HP representing the head of the avatar A1 by superimposing the element image EP extracted by the face image acquisition unit 111E on an outline image OP prepared in advance. In particular, the head generation unit 113A generates the three-dimensional image HP representing the head of the avatar A1 by superimposing, on the outline image OP, the element image EP contained in the one face image FPk selected from the face images FP1 to FPn with mutually different styles.
 FIG. 8 is a diagram showing an operation example of the face image acquisition unit 111E and the head generation unit 113A. As shown in FIG. 8, the face image acquisition unit 111E extracts, from the face image FPk of the user U1, an element image EP representing elements of the face of the user U1. In the example shown in FIG. 8, the face image acquisition unit 111E extracts, from the face image FPk, element images EP representing the eyebrows, eyes, nose, and mouth as facial elements. However, the face image acquisition unit 111E may extract only some of the eyebrows, eyes, nose, and mouth as element images EP, or may additionally extract further elements such as eyelashes and moles as element images EP.
 Also, as shown in FIG. 8, the head generation unit 113A generates the three-dimensional image HP representing the head of the avatar A1 by superimposing the extracted element image EP on the outline image OP prepared in advance. The outline image OP may be a two-dimensional image or a three-dimensional image. Specifically, the head generation unit 113A may generate the three-dimensional image HP representing the head of the avatar A1 by superimposing the element image EP on the outline image OP as a two-dimensional image and then converting the outline image OP, with the element image EP superimposed on it, into a three-dimensional form. Alternatively, the head generation unit 113A may generate the three-dimensional image HP representing the head of the avatar A1 by superimposing the element image EP on the outline image OP as a three-dimensional image.
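 The extract-then-superimpose step can be summarized as below. This is a minimal sketch assuming the two-dimensional variant (composite on a 2D template, then convert to 3D); the helper callables (extract_element, paste_element, lift_to_3d) are injected because the disclosure does not specify how elements are located, pasted, or converted to 3D, and all names are illustrative assumptions.

```python
# Minimal sketch of the second-embodiment head generation: element images EP
# superimposed on a template outline image OP, then lifted to 3D.
from typing import Any, Callable, Iterable

FACE_ELEMENTS = ("eyebrows", "eyes", "nose", "mouth")


def generate_head(face_image: Any,
                  outline_image: Any,
                  extract_element: Callable[[Any, str], Any],
                  paste_element: Callable[[Any, Any, str], Any],
                  lift_to_3d: Callable[[Any], Any],
                  elements: Iterable[str] = FACE_ELEMENTS) -> Any:
    """Build the 3D head image HP from the chosen face image FPk and template OP."""
    head_2d = outline_image
    for name in elements:
        element_image = extract_element(face_image, name)      # element image EP
        head_2d = paste_element(head_2d, element_image, name)  # superimpose EP on OP
    return lift_to_3d(head_2d)  # convert the composited 2D head into the 3D image HP
```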
 As a result, the server 10A can create avatars A1 in various styles while using the outline image OP prepared in advance as a template, including an avatar A1 with a style that matches the taste of the user U1.
2-2: Operation of the Second Embodiment
 The operation of the server 10A according to the second embodiment is basically the same as that of the server 10 according to the first embodiment shown in FIG. 7, so its illustration and detailed description are omitted. In the operation of the server 10A, in step S3, the processing device 11A functions as the face image acquisition unit 111E; after acquiring the first face image FPk, it extracts, from the face image FPk, the element image EP representing the elements of the face of the user U1. In step S4, the processing device 11A functions as the head generation unit 113A and superimposes the extracted element image EP on the outline image OP prepared in advance to generate the three-dimensional image HP representing the head of the avatar A1.
2-3: Effects of the Second Embodiment
 According to the above description, in the server 10A as an information processing apparatus, the face image acquisition unit 111E extracts, from the selected first face image FPk, the element image EP representing the elements of the face of the user U1, and the head generation unit 113A generates the three-dimensional image HP representing the head of the avatar A1 by superimposing the extracted element image EP on the outline image OP prepared in advance.
 Because the server 10A has the above configuration, the user U1 can use an avatar A1 with a style that suits his or her taste from among a plurality of avatars A with different styles, while the outline image OP prepared in advance is used as a template. In particular, the server 10A uses the same outline image OP, prepared in advance as a template, when generating the three-dimensional images HP of the heads of a plurality of different avatars A. As a result, the server 10A can give the plurality of avatars A a sense of unity, while still giving each avatar A a style that matches the individual taste of each user U.
3: Third Embodiment
 Hereinafter, the configuration of an information processing system 1B including a server 10B as an information processing apparatus according to a third embodiment of the present invention will be described with reference to FIGS. 9 to 11. In the following description, for simplicity, components of the information processing system 1B according to the third embodiment that are identical to those of the information processing system 1 according to the first embodiment are denoted by the same reference signs, and their description may be omitted.
3-1: Configuration of the Third Embodiment
3-1-1: Overall Configuration
 The information processing system 1B according to the third embodiment of the present invention differs from the information processing system 1 according to the first embodiment in that it includes a server 10B instead of the server 10. In all other respects, the overall configuration of the information processing system 1B is the same as that of the information processing system 1 according to the first embodiment shown in FIG. 1, so its illustration and description are omitted.
3-1-2: Configuration of the Server
 FIG. 9 is a block diagram showing a configuration example of the server 10B. Unlike the server 10, the server 10B includes a processing device 11B instead of the processing device 11 and a storage device 12B instead of the storage device 12.
 Unlike the storage device 12, the storage device 12B stores a control program PR1B instead of the control program PR1. In addition to the components stored by the storage device 12, the storage device 12B further stores a learning model LM and a head-to-body number table HT.
 The learning model LM is a learning model used by an age estimation unit 117 to estimate the age of the user U1 based on the input image IP, acquired by the input image acquisition unit 111A, which represents a photograph of the face of the user U1.
 The learning model LM is generated in a learning phase by learning from teacher data. The teacher data used to generate the learning model LM consists of a plurality of one-to-one pairs, each pairing feature information extracted from an input image IP showing a photograph of one person's face, acquired by the input image acquisition unit 111A, with the age of that person.
 The learning model LM is generated outside the server 10B, preferably in a second server (not shown). In this case, the server 10B acquires the learning model LM from the second server (not shown) via the communication network NET.
 The head-to-body number table HT is a table defining the correspondence between the age estimated by the age estimation unit 117, which will be described later, and the head-to-body number HB. FIG. 10 shows an example of the head-to-body number table HT. In the head-to-body number table HT shown in FIG. 10, as an example, it is defined that the avatar A1 is two heads tall when the age estimated by the age estimation unit 117 is 0 years or older and younger than 3 years.
 Unlike the processing device 11, the processing device 11B includes an acquisition unit 111F instead of the acquisition unit 111. Unlike the acquisition unit 111, the acquisition unit 111F includes a head-to-body acquisition unit 111G instead of the head-to-body acquisition unit 111C. In all other respects, the configuration of the acquisition unit 111F is the same as that of the acquisition unit 111 according to the first embodiment shown in FIG. 5, so its illustration and description are omitted. In addition to the components of the processing device 11, the processing device 11B includes the age estimation unit 117 and a head-to-body number generation unit 118.
 The age estimation unit 117 estimates the age of the user U1 based on the input image IP showing the facial photograph of the user U1. More specifically, the age estimation unit 117 inputs the input image IP acquired by the input image acquisition unit 111A to the learning model LM and obtains an estimated age as its output, thereby generating the estimated age of the user U1. The age estimation unit 117 then outputs the generated estimated age to the head-to-body count generation unit 118.
 The head-to-body count generation unit 118 generates, based on the age estimated by the age estimation unit 117, a head-to-body count HB indicating how many heads tall the avatar A1 is. More specifically, the head-to-body count generation unit 118 generates the head-to-body count HB of the avatar A1 by checking the age received from the age estimation unit 117 against the head-to-body count table HT stored in the storage device 12B.
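 A minimal sketch of the head-to-body count table HT and the lookup performed by the head-to-body count generation unit 118 might look as follows. Only the first row (ages 0 to under 3 map to a two-head-tall avatar) comes from the example of FIG. 10; the remaining age ranges and counts are hypothetical placeholders.

# Sketch of the head-to-body count table HT and the lookup by unit 118.
HEAD_TO_BODY_TABLE_HT = [
    # (minimum age inclusive, maximum age exclusive, head-to-body count HB)
    (0, 3, 2),     # defined in the example of FIG. 10
    (3, 12, 4),    # hypothetical
    (12, 18, 6),   # hypothetical
    (18, 200, 7),  # hypothetical
]

def generate_head_to_body_count(estimated_age: float) -> int:
    """Return the head-to-body count HB for the age estimated by unit 117."""
    for lower, upper, hb in HEAD_TO_BODY_TABLE_HT:
        if lower <= estimated_age < upper:
            return hb
    raise ValueError(f"no table entry for age {estimated_age}")

# Example: an estimated age of 2 maps to a two-head-tall avatar.
assert generate_head_to_body_count(2) == 2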
 The head-to-body acquisition unit 111G acquires the head-to-body count HB of the avatar A1 generated by the head-to-body count generation unit 118.
 As a result, the server 10B can generate the avatar A1 with a head-to-body count HB that matches the impression given by the estimated age of the user U1.
3-2: Operation of the Third Embodiment FIG. 11 is a flowchart showing the operation of the server 10B according to the third embodiment. The operation of the server 10B will be described below with reference to FIG. 11.
 In step S11, the processing device 11B functions as the input image acquisition unit 111A. The processing device 11B acquires, from the terminal device 20-1, image information representing the input image IP, which shows the front of the face of the user U1.
 In step S12, the processing device 11B functions as the face image generation unit 112. The processing device 11B generates a plurality of face images FP1 to FPn with styles different from each other, based on the input image IP acquired in step S11. The processing device 11B stores the generated face images FP1 to FPn in the storage device 12B.
 In step S13, the processing device 11B functions as the face image acquisition unit 111B. The processing device 11B acquires, from the terminal device 20, the selection result k indicating that one face image FPk has been selected. Based on the selection result k, the processing device 11B acquires the face image FPk from the storage device 12B.
 In step S14, the processing device 11B functions as the head generation unit 113. The processing device 11B generates a three-dimensional image HP representing the head of the avatar A1, based on the face image FPk acquired in step S13.
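 Although the details of the head generation are described for the first embodiment, a hedged sketch of the superimposition recited in claim 3 (pasting extracted element images onto an outline image prepared in advance) is given below, using the Pillow imaging library as an assumed tool. The element positions, file names, and the subsequent conversion of the composite into the three-dimensional image HP are not specified by the patent.

# Sketch of superimposing extracted facial-element images onto a prepared
# outline image (cf. claim 3). All positions and file names are hypothetical.
from PIL import Image

def compose_head_texture(outline_path: str, elements: dict) -> Image.Image:
    """elements maps (x, y) paste positions to element-image file paths."""
    outline = Image.open(outline_path).convert("RGBA")
    for (x, y), element_path in elements.items():
        element = Image.open(element_path).convert("RGBA")
        outline.paste(element, (x, y), element)  # alpha channel used as mask
    return outline

# Hypothetical usage: eyes, nose, and mouth extracted from the face image FPk.
# texture = compose_head_texture("outline.png",
#                                {(90, 120): "eyes.png",
#                                 (120, 160): "nose.png",
#                                 (110, 200): "mouth.png"})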
 In step S15, the processing device 11B functions as the age estimation unit 117. The processing device 11B estimates the age of the user U1 based on the input image IP acquired in step S11.
 In step S16, the processing device 11B functions as the head-to-body count generation unit 118. The processing device 11B generates the head-to-body count HB of the avatar A1 based on the age of the user U1 estimated in step S15.
 In step S17, the processing device 11B functions as the head-to-body acquisition unit 111G. The processing device 11B acquires the head-to-body count HB generated in step S16.
 In step S18, the processing device 11B functions as the body generation unit 114. The processing device 11B generates image information representing a three-dimensional image BP of the body of the avatar A1, i.e., the portion other than the head, using the head-to-body count HB acquired in step S17 and the avatar information AI stored in the storage device 12B.
 In step S19, the processing device 11B functions as the avatar generation unit 115. Using the image information representing the three-dimensional image HP of the head of the avatar A1 and the image information representing the three-dimensional image BP of the body of the avatar A1, the processing device 11B generates image information representing a three-dimensional image WP showing the overall appearance of the avatar A1.
 In step S20, the processing device 11B functions as the output unit 116. The processing device 11B transmits the image information representing the three-dimensional image WP, generated in step S19 and showing the overall appearance of the avatar A1, to the terminal device 20 via the communication device 13. The processing device 11B then ends all the processes shown in FIG. 11.
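 The flow of steps S11 to S20 can be summarized by the following sketch. Every helper function is a hypothetical stand-in; the patent specifies only which functional unit performs each step, not how each image or count is actually computed.

# Sketch of the server 10B processing flow (steps S11 to S20). All function
# bodies are hypothetical placeholders for the corresponding functional units.
from dataclasses import dataclass

@dataclass
class Avatar:
    head_image_hp: str   # stands in for the 3D head image HP
    body_image_bp: str   # stands in for the 3D body image BP
    whole_image_wp: str  # stands in for the 3D whole-appearance image WP

def generate_face_images(input_image_ip):                 # S12, unit 112
    return [f"face_style_{i}({input_image_ip})" for i in range(3)]

def generate_head(face_image_fpk):                        # S14, unit 113
    return f"head_3d({face_image_fpk})"

def estimate_age(input_image_ip):                         # S15, unit 117
    return 25  # stand-in for the learning model LM output

def generate_head_to_body_count(age):                     # S16, unit 118
    return 2 if age < 3 else 7  # stand-in for the table HT lookup

def generate_body(head_to_body_count_hb, avatar_info_ai): # S18, unit 114
    return f"body_3d(hb={head_to_body_count_hb}, {avatar_info_ai})"

def run_server_10b(input_image_ip, selection_result_k, avatar_info_ai):
    face_images = generate_face_images(input_image_ip)    # S12 (IP acquired in S11)
    face_image_fpk = face_images[selection_result_k]      # S13
    head_hp = generate_head(face_image_fpk)                # S14
    age = estimate_age(input_image_ip)                     # S15
    hb = generate_head_to_body_count(age)                  # S16, acquired in S17
    body_bp = generate_body(hb, avatar_info_ai)            # S18
    whole_wp = f"compose({head_hp}, {body_bp})"            # S19, unit 115
    return Avatar(head_hp, body_bp, whole_wp)              # S20: output WP

avatar_a1 = run_server_10b("photo_of_U1", selection_result_k=1, avatar_info_ai="AI")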
3-3: Effects of the Third Embodiment As described above, the server 10B as an information processing apparatus includes the age estimation unit 117 and the head-to-body count generation unit 118. The age estimation unit 117 estimates the age of the user U1 based on an image showing the facial photograph of the user U1. The head-to-body count generation unit 118 generates the head-to-body count HB of the avatar A1 based on the estimated age.
 Because the server 10B has the above configuration, when the user U1 uses an avatar A1 whose style suits the taste of the user U1 from among a plurality of avatars A having styles different from each other, the user U1 can use an avatar A1 whose head-to-body count HB matches the impression given by the estimated age of the user U1. Moreover, since the head-to-body count generation unit 118 generates the head-to-body count HB based on the age estimated by the age estimation unit 117, the user U1 does not need to input the head-to-body count HB. In other words, the user U1 can, by a simple method, use an avatar A1 that matches the impression given by the estimated age of the user U1.
4: Modifications The present disclosure is not limited to the embodiments illustrated above. Specific modifications are exemplified below. Two or more aspects arbitrarily selected from the following examples may be combined.
4-1: Modification 1
In the information processing systems 1 to 1B according to the above embodiments, the servers 10 to 10B display the avatar A on the display 24 of the terminal device 20. However, the servers 10 to 10B may display the avatar A on XR glasses instead of the display 24.
 FIG. 12 is a diagram showing the overall configuration of an information processing system 1C according to this modification. The information processing system 1C uses XR technology to provide a virtual space to users U1 and U2 wearing XR glasses 30. In particular, in this modification, the information processing system 1C causes the XR glasses 30 to display an avatar A1 corresponding to the user U1 and an avatar A2 corresponding to the user U2. Note that XR technology is a general term for VR (Virtual Reality) technology, AR (Augmented Reality) technology, and MR (Mixed Reality) technology. VR technology displays a digital virtual space on a device such as VR glasses or an HMD (Head Mounted Display) employing VR technology. AR technology adds information indicated by digital content to the real world in an augmented reality space displayed on a device such as AR glasses or an HMD employing AR technology. MR technology precisely superimposes a digital virtual space on the real space using a device such as MR glasses or an HMD employing MR technology.
 As shown in FIG. 12, the information processing system 1C includes the server 10, the terminal device 20, and the XR glasses 30. In the information processing system 1C, the server 10 and the terminal device 20 are communicably connected to each other via the communication network NET. The terminal device 20 and the XR glasses 30 are also communicably connected to each other. In the following description, when the XR glasses 30 used by individual users are distinguished, the suffix "-X" is appended to the reference numerals; the same applies to the components of each pair of XR glasses 30. FIG. 12 shows two pairs of a terminal device 20 and XR glasses 30: the pair of the terminal device 20-1 and the XR glasses 30-1, and the pair of the terminal device 20-2 and the XR glasses 30-2. However, this number of pairs is merely an example, and the information processing system 1C may include any number of pairs of a terminal device 20 and XR glasses 30. In FIG. 12, it is assumed that the user U1 uses the pair of the terminal device 20-1 and the XR glasses 30-1, and the user U2 uses the pair of the terminal device 20-2 and the XR glasses 30-2.
 The server 10 provides various data and cloud services to the terminal device 20 via the communication network NET. In particular, the server 10 provides the terminal device 20 with various data for displaying the avatar A1 corresponding to the user U1 and the avatar A2 corresponding to the user U2 on the XR glasses 30 connected to the terminal device 20. More specifically, the server 10 provides the terminal device 20-1 with various data for displaying the avatar A2 on the display 38-1 of the XR glasses 30-1 used by the user U1. The server 10 also provides the terminal device 20-2 with various data for displaying the avatar A1 on the display 38-2 of the XR glasses 30-2 used by the user U2.
 The terminal device 20-1 causes the XR glasses 30-1 worn on the head of the user U1 to display virtual objects arranged in a virtual space. Similarly, the terminal device 20-2 causes the XR glasses 30-2 worn on the head of the user U2 to display virtual objects arranged in the virtual space. The virtual space is, for example, a celestial-sphere space. The virtual objects are, for example, virtual objects representing data such as still images, moving images, 3DCG models, HTML files, and text files, and virtual objects representing applications. Examples of text files include memos, source code, diaries, and recipes. Examples of applications include browsers, applications for using SNS, and applications for generating document files. The terminal device 20-1 and the terminal device 20-2 are preferably portable terminal devices such as smartphones and tablets.
 In particular, in this modification, the terminal device 20-1 causes the XR glasses 30-1 to display a virtual object mainly corresponding to the avatar A2, and the terminal device 20-2 causes the XR glasses 30-2 to display a virtual object mainly corresponding to the avatar A1.
 The XR glasses 30 are display devices worn on the heads of the users U1 and U2. More specifically, the XR glasses 30-1 are a display device worn on the head of the user U1, and the XR glasses 30-2 are a display device worn on the head of the user U2. The XR glasses 30 are, for example, a see-through wearable display. Under the control of the terminal device 20, the XR glasses 30 display virtual objects on display panels provided for each of the left-eye and right-eye lenses.
 With the above configuration, the user U1 and the user U2 can observe the avatars A1 and A2 displayed on the displays 38. More specifically, the user U1 wearing the XR glasses 30-1 can observe the avatar A2 displayed on the display 38-1, while the user U2 wearing the XR glasses 30-2 can observe the avatar A1 displayed on the display 38-2.
 In the information processing system 1C, the terminal device 20 and the XR glasses 30 are implemented as separate devices. However, the method of implementing the terminal device 20 and the XR glasses 30 in this modification is not limited to this. For example, the terminal device 20 and the XR glasses 30 may be implemented in a single housing by providing the XR glasses 30 with the same functions as the terminal device 20.
 The information processing system 1C may also include, instead of the XR glasses 30, a device such as an HMD employing any one of VR technology, AR technology, and MR technology.
4-2: Modification 2
In the information processing systems 1 to 1C according to the above embodiments, the terminal device 20-1 outputs, to the servers 10 to 10B, the selection result k indicating that one face image FPk has been selected from the plurality of face images FP1 to FPn. However, instead of the selection result k, the terminal device 20-1 may output the face image FPk itself to the servers 10 to 10B.
4-3: Modification 3
In the information processing systems 1 to 1C according to the above embodiments, the servers 10 to 10B acquire, from the terminal device 20, the input image IP showing the front of the face of the user U1. However, the servers 10 to 10B may acquire the input image IP from a device other than the terminal device 20.
5: Others (1) In the above-described embodiments, the storage devices 12 to 12B and the storage device 22 were exemplified as a ROM, a RAM, and the like, but they may be a flexible disk, a magneto-optical disk (e.g., a compact disc, a digital versatile disc, or a Blu-ray (registered trademark) disc), a smart card, a flash memory device (e.g., a card, a stick, or a key drive), a CD-ROM (Compact Disc-ROM), a register, a removable disk, a hard disk, a floppy (registered trademark) disk, a magnetic strip, a database, a server, or any other suitable storage medium. The program may also be transmitted from a network via an electric communication line, or from the communication network NET via an electric communication line.
 (2) In the above-described embodiments, the information, signals, and the like described may be represented using any of a variety of different technologies. For example, the data, instructions, commands, information, signals, bits, symbols, chips, and the like that may be referred to throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or magnetic particles, optical fields or photons, or any combination thereof.
 (3) In the above-described embodiments, input and output information and the like may be stored in a specific location (for example, a memory) or may be managed using a management table. The input and output information and the like may be overwritten, updated, or appended. The output information and the like may be deleted. The input information and the like may be transmitted to another device.
 (4) In the above-described embodiments, the determination may be made by a value represented by one bit (0 or 1), by a Boolean value (true or false), or by a numerical comparison (for example, a comparison with a predetermined value).
 (5) The order of the processing procedures, sequences, flowcharts, and the like exemplified in the above-described embodiments may be changed as long as no contradiction arises. For example, the methods described in the present disclosure present elements of the various steps in an exemplary order and are not limited to the specific order presented.
 (6) Each function illustrated in FIGS. 1 to 12 is realized by any combination of at least one of hardware and software. The method of realizing each functional block is not particularly limited. That is, each functional block may be realized using one physically or logically coupled device, or using two or more physically or logically separated devices that are connected directly or indirectly (for example, by wire or wirelessly). A functional block may also be realized by combining software with the one device or the plurality of devices.
 (7) The programs exemplified in the above-described embodiments should be interpreted broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executable files, threads of execution, procedures, functions, and the like, regardless of whether they are called software, firmware, middleware, microcode, hardware description language, or by any other name.
 Software, instructions, information, and the like may also be transmitted and received via a transmission medium. For example, when software is transmitted from a website, a server, or another remote source using at least one of a wired technology (such as a coaxial cable, an optical fiber cable, a twisted pair, or a digital subscriber line (DSL)) and a wireless technology (such as infrared or microwave), at least one of these wired and wireless technologies is included within the definition of a transmission medium.
 (8) In each of the above-described aspects, the terms "system" and "network" are used interchangeably.
 (9) The information, parameters, and the like described in the present disclosure may be represented using absolute values, using relative values from a predetermined value, or using other corresponding information.
 (10) In the above-described embodiments, the servers 10 to 10B and the terminal device 20 may be mobile stations (MS). A mobile station may also be referred to by those skilled in the art as a subscriber station, a mobile unit, a subscriber unit, a wireless unit, a remote unit, a mobile device, a wireless device, a wireless communication device, a remote device, a mobile subscriber station, an access terminal, a mobile terminal, a wireless terminal, a remote terminal, a handset, a user agent, a mobile client, a client, or some other suitable term. In the present disclosure, the terms "mobile station", "user terminal", "user equipment (UE)", "terminal", and the like may be used interchangeably.
 (11) In the above-described embodiments, the terms "connected" and "coupled", and any variations thereof, mean any direct or indirect connection or coupling between two or more elements, and include the presence of one or more intermediate elements between two elements that are "connected" or "coupled" to each other. The coupling or connection between elements may be a physical coupling or connection, a logical coupling or connection, or a combination thereof. For example, "connection" may be read as "access". As used in the present disclosure, two elements are considered to be "connected" or "coupled" to each other by using at least one of one or more electric wires, cables, and printed electrical connections, and, as some non-limiting and non-exhaustive examples, by using electromagnetic energy having wavelengths in the radio frequency region, the microwave region, and the light (both visible and invisible) region.
 (12) In the above-described embodiments, the phrase "based on" does not mean "based only on" unless otherwise specified. In other words, the phrase "based on" means both "based only on" and "based at least on".
 (13) The terms "judging" and "determining" used in the present disclosure may encompass a wide variety of actions. "Judging" and "determining" may include, for example, regarding judging, calculating, computing, processing, deriving, investigating, looking up (searching or inquiring, for example, in a table, a database, or another data structure), or ascertaining as having "judged" or "determined". "Judging" and "determining" may also include regarding receiving (for example, receiving information), transmitting (for example, transmitting information), input, output, or accessing (for example, accessing data in a memory) as having "judged" or "determined". Furthermore, "judging" and "determining" may include regarding resolving, selecting, choosing, establishing, comparing, and the like as having "judged" or "determined". In other words, "judging" and "determining" may include regarding some action as having "judged" or "determined". "Judging (determining)" may also be read as "assuming", "expecting", "considering", and the like.
 (14) Where "include", "including", and variations thereof are used in the above-described embodiments, these terms, like the term "comprising", are intended to be inclusive. Furthermore, the term "or" as used in the present disclosure is not intended to be an exclusive OR.
 (15) In the present disclosure, where articles are added by translation, such as "a", "an", and "the" in English, the present disclosure may include the case where the nouns following these articles are plural.
 (16) In the present disclosure, the phrase "A and B are different" may mean "A and B are different from each other". The phrase may also mean "A and B are each different from C". Terms such as "separated" and "coupled" may be interpreted in the same manner as "different".
 (17) Each aspect and embodiment described in the present disclosure may be used alone, may be used in combination, or may be switched and used in accordance with execution. Notification of predetermined information (for example, notification of "being X") is not limited to explicit notification and may be performed implicitly (for example, by not notifying the predetermined information).
 Although the present disclosure has been described in detail above, it is apparent to those skilled in the art that the present disclosure is not limited to the embodiments described herein. The present disclosure can be implemented with modifications and variations without departing from the spirit and scope of the present disclosure as defined by the claims. Accordingly, the description of the present disclosure is for illustrative purposes and has no restrictive meaning with respect to the present disclosure.
Reference Signs List: 1 to 1C: information processing system; 10 to 10B: server; 11 to 11B: processing device; 12 to 12B: storage device; 13: communication device; 14: display; 15: input device; 20: terminal device; 21: processing device; 22: storage device; 23: communication device; 24: display; 25: input device; 26: imaging device; 30: XR glasses; 32: storage device; 38: display; 111: acquisition unit; 111A: input image acquisition unit; 111B: face image acquisition unit; 111C: head-to-body acquisition unit; 111D: acquisition unit; 111E: face image acquisition unit; 111F: acquisition unit; 111G: head-to-body acquisition unit; 112: face image generation unit; 113, 113A: head generation unit; 114: body generation unit; 115: avatar generation unit; 116: output unit; 117: age estimation unit; 118: head-to-body count generation unit; 211: acquisition unit; 212: output unit; 213: display control unit; A1, A2: avatar; PR1 to PR3B: control program; U1, U2: user

Claims (4)

  1.  An information processing apparatus comprising:
     a face image generation unit that generates a plurality of face images with styles different from each other;
     a face image acquisition unit that acquires a first face image selected by a user from the plurality of face images;
     a head generation unit that generates, based on the first face image, a three-dimensional image showing a head of an avatar;
     a head-to-body acquisition unit that acquires a head-to-body count indicating how many heads tall the avatar is;
     a body generation unit that generates, based on the head-to-body count, a three-dimensional image showing a body of the avatar, the body being a portion of the avatar other than the head; and
     an avatar generation unit that generates a three-dimensional image showing an overall appearance of the avatar by using the three-dimensional image showing the head of the avatar and the three-dimensional image showing the body of the avatar.
  2.  The information processing apparatus according to claim 1, wherein the face image generation unit generates the plurality of face images with styles different from each other based on an image showing a facial photograph of the user.
  3.  The information processing apparatus according to claim 1, wherein
     the face image acquisition unit extracts, from the selected first face image, an element image showing an element of the user's face, and
     the head generation unit generates the three-dimensional image showing the head of the avatar by superimposing the extracted element image on an outline image prepared in advance.
  4.  The information processing apparatus according to any one of claims 1 to 3, further comprising:
     an age estimation unit that estimates an age of the user based on an image showing a facial photograph of the user; and
     a head-to-body count generation unit that generates the head-to-body count of the avatar based on the estimated age.