WO2023037812A1

WO2023037812A1 - Online dialogue support system

Info

Publication number: WO2023037812A1
Application number: PCT/JP2022/030319
Authority: WO
Inventors: 桃子阿部; 幹生岩村; 洋平藤本; 禎篤加藤
Original assignee: 株式会社Ｎｔｔドコモ
Priority date: 2021-09-10
Filing date: 2022-08-08
Publication date: 2023-03-16
Also published as: JPWO2023037812A1

Abstract

This online dialogue support system 1, which supports online dialogue between a transmission-side user terminal 10 and a receiving-side user terminal 20, is provided with: a storage unit 30 which stores a reference face image showing the face of the transmission-side user; a receiving unit 21 which acquires a hidden face image that shows the face of the transmission-side user in a state in which a subregion of the transmission user's face has been hidden; a generation unit 22 which completes the partial region of the hidden face image and generates a complementary facial image; an authentication unit 23 which performs authentication based on the reference face image and the complementary face image; and a display control unit 24 which, if authentication was successful, displays the complementary face image on the receiver-side user terminal 20.

Description

Online dialogue support system

One aspect of the present invention relates to an online dialogue support system.

In Patent Document 1, a mask pattern of an HMD region to be replaced is created from a face moving image with a head-mounted display (HMD), and replacement is performed using a region corresponding to the mask pattern in a still face image without an HMD, A device for synthesizing moving face images without an HMD is disclosed.

JP-A-11-096366

For example, in a system for online dialogue between a plurality of users, when the mechanism described in Patent Document 1 is applied, it is possible to display a face image of a certain first user without an HMD on the terminals of other users. Therefore, the effect of promoting communication between users is expected. However, when the user wearing the HMD is not actually the first user (for example, when a user other than the first user pretends to be the first user and tries to participate in the online dialogue), the other user's terminal Spoofing is encouraged by the face image of the first user being displayed above.

Therefore, it is an object of one aspect of the present invention to provide an online dialogue support system capable of preventing spoofing of a user whose face is partially hidden during online dialogue.

An online dialogue support system according to one aspect of the present invention is an online dialogue support system that supports an online dialogue between a terminal of a transmitting user and a terminal of a receiving user, and comprises a reference face indicating the face of the transmitting user. a storage unit that stores an image; an acquisition unit that acquires a hidden face image showing a face of a sending user whose face is partially hidden; A generation unit that complements and generates a complementary face image, an authentication unit that performs authentication based on the reference face image and the complementary face image, and a display that displays the complementary face image on the terminal of the receiving user when the authentication is successful. and a control unit.

In the online dialogue support system according to one aspect of the present invention, from a hidden face image in which a partial area of the sender's face is hidden, a complementary face image is generated by interpolating the partial area. Authentication is then performed based on the reference face image, which is the original (complete) face image of the sending user, and the complementary face image. If the authentication is successful, the complementary facial image is displayed on the terminal of the receiving user. According to the online dialogue support system, the receiving user can confirm that the transmitting user is the person himself/herself by confirming that the complementary face image is displayed on the receiving user's terminal. Become. Therefore, it is possible to prevent spoofing of the user on the sending side.

According to one aspect of the present invention, it is possible to provide an online dialogue support system capable of preventing spoofing of a user whose face is partially hidden in online dialogue.

1 is a diagram showing an overview of an online dialogue support system according to one embodiment; FIG. 1 is a block diagram showing an example of a functional configuration of an online dialogue support system; FIG. It is a figure which shows typically the complementing process of a face image. 4 is a sequence diagram showing an example of the operation of the online dialogue support system according to the first embodiment; FIG. FIG. 11 is a block diagram showing an example of the functional configuration of an online dialogue support system according to the second embodiment; FIG. FIG. 11 is a sequence diagram showing an example of the operation of the online dialogue support system according to the second embodiment; FIG. 12 is a block diagram showing an example of the functional configuration of an online dialogue support system according to the third embodiment; FIG. FIG. 14 is a sequence diagram showing an example of the operation of the online dialogue support system according to the third embodiment; It is a figure which shows an example of the hardware constitutions relevant to an online dialogue support system.

An embodiment of the present invention will be described in detail below with reference to the accompanying drawings. In the description of the drawings, the same or corresponding elements are denoted by the same reference numerals, and overlapping descriptions are omitted.

FIG. 1 is a diagram showing an overview of an online dialogue support system 1 according to one embodiment. The online dialogue support system 1 is a computer system that supports online dialogue between terminals of a plurality of users. In the online dialogue support system 1, a face image representing the user's face is photographed, and the face image is transmitted and received. The face image may be actual 3D data, or may show the user's whole body. In this embodiment, the user who sends the face image of himself is called the sending user, and the user who receives the face image of the sending user is called the receiving user. However, the transmitting side user and the receiving side user are not fixedly set for each user. A sending user becomes a receiving user when he/she receives another user's face image. Also, the receiving user becomes the transmitting user when transmitting his or her own face image to another user.

The online dialogue support system 1 comprises a sender's user terminal 10 and a receiver's user terminal 20 . A transmitting user terminal 10 and a receiving user terminal 20 are connected via a communication network N so as to be able to communicate with each other. The configuration of the communication network N is not limited. For example, the communication network N may include the Internet, or may include an intranet. In the example of FIG. 1, one transmitting-side user terminal 10 and one receiving-side user terminal 20 are shown, but the numbers are not limited to this. For example, the online dialogue support system 1 may comprise multiple sender user terminals 10 and multiple receiver user terminals 20 . That is, the online dialogue support system 1 can be applied as a system for conducting online dialogues among many people.

The sending user terminal 10 is a terminal used by the sending user. The type and configuration of the transmitting user terminal 10 are not limited. The sending-side user terminal 10 may be, for example, a mobile terminal such as a high-performance mobile phone (smartphone), a tablet terminal, a wearable terminal, a laptop personal computer, or a mobile phone. Alternatively, the sender user terminal 10 may be a stationary terminal such as a desktop personal computer. The sending user terminal 10 may be a user terminal possessed by each sending user as described above, or may be a server device configured to be able to communicate with each sending user's user terminal. good. Alternatively, the sender user terminal 10 may be configured by a combination of a user terminal and a server device. That is, the sender user terminal 10 may be configured by a single computer device, or may be configured by a plurality of computer devices that can communicate with each other.

The receiving user terminal 20 is a terminal used by the receiving user. The type and configuration of the receiving user terminal 20 are the same as the type and configuration of the transmitting user terminal 10 . As noted above, a receiving user can be a sending user and a sending user can be a receiving user. Therefore, when the receiving user becomes the transmitting user, the receiving user terminal 20 functions as the transmitting user terminal 10 . Also, when the transmitting user becomes the receiving user, the transmitting user terminal 10 functions as the receiving user terminal 20 .

The sending user wears a head-mounted display D on his head. The form of the head mounted display D is not limited to a specific form. The head-mounted display D can take various forms such as a goggle type, a glasses type (glasses type), a hat type, and the like. The head mounted display D is, for example, smart glasses such as XR (eXtended Reality) glasses. In one example, the head-mounted display D is AR glasses that have the function of providing the user with augmented reality (AR). That is, the head-mounted display D is a see-through type of glass configured so that the user can visually recognize the real space (outside world) as well as the virtual space. However, the head-mounted display D is not limited to the above, and may be an MR device such as MR glasses that has the function of providing mixed reality (MR) to the user, or a virtual reality (VR) to the user. It may also be a VR device such as VR glasses that has a function of providing reality.

As an example, volumetric video (or volumetric capture) technology can be applied to the online dialogue support system 1. This technology creates 3D content that accurately reproduces the subject's appearance, shape, movement, etc. It is a technology to The online dialogue support system 1 to which this technology is applied reproduces the actions of a plurality of users in 3D in real time in the same virtual space. to each user. In order to enjoy such a user experience, the sender user and the receiver user participate in the online dialogue while wearing the head-mounted display D.

[First embodiment]
FIG. 2 is a block diagram showing an example of the functional configuration of the online dialogue support system 1A(1) according to the first embodiment. The online dialogue support system 1A includes a sender user terminal 10A (10) and a receiver user terminal 20A (20). In the first embodiment, the main functions of the online dialogue support system are performed by the receiving user terminal 20A. That is, in the first embodiment, it can be considered that the receiving user terminal 20A alone constitutes an online dialogue support system. The user terminal 10A on the transmission side has an imaging unit 11 and a transmission unit 12 .

The photographing unit 11 obtains a face image by photographing the face of the user on the transmission side. For example, the photographing unit 11 photographs a reference face image representing the face of the user on the sending side. The reference face image is the original (complete) face image of the sending user, which is taken with the sending user's face not hidden. For example, the reference face image is obtained by capturing the entire face of the transmitting user in a state where the transmitting user's face is not hidden by the head mounted display D (that is, before the transmitting user wears the head mounted display D). This is an image that has been For example, the photographing unit 11 photographs the reference face image as a still image.

In addition, the photographing unit 11 photographs a hidden face image showing the face of the transmitting user with a part of the face of the transmitting user hidden. A hidden face image is an incomplete face image of the sending user captured with a part of the sending user's face hidden by an object existing between the imaging unit 11 and the sending user's face. is. For example, the hidden face image is a face image of the sending user in a state in which a part of the face of the sending user is hidden by the head mounted display D (that is, a state after the sending user wears the head mounted display D). This is an image in which the entire face is photographed. In other words, the hidden face image is an image in which the face of the sending user and the head-mounted display D that hides a part of the face of the sending user are reflected. For example, the imaging unit 11 captures hidden face images in a moving image format.

The transmission unit 12 transmits the reference face image and the hidden face image acquired by the imaging unit 11 to the receiving user terminal 20A. For example, the transmitting unit 12 transmits the reference face image to the receiving user terminal 20A before the online dialogue is started, and transmits the hidden face image to the receiving user terminal 20A after the online dialogue is started.

The receiving-side user terminal 20A has a receiving unit 21 (acquiring unit), a generating unit 22, an authenticating unit 23, and a display control unit 24.

The reception unit 21 functions as an acquisition unit that acquires the reference face image and the hidden face image by receiving the reference face image and the hidden face image from the transmission-side user terminal 10A. The receiving unit 21 stores the reference face image in the storage unit 30, which will be described later. The reference face image of the sending user may be stored (registered) directly from the sending user terminal 10 to the storage unit 30 without going through the receiving user terminal 20A.

The generation unit 22 complements a partial area of the hidden face image to generate a complemented face image. A complementary face image is a face image showing the face of the transmitting user when a part of the hidden face image is not hidden. The generation unit 22 generates a complementary face image in a moving image format. A part of the hidden face image is hidden by, for example, the head-mounted display D worn by the sender. The generation unit 22 replaces the part of the hidden face image corresponding to the head-mounted display D with another image, thereby complementing the partial area. For example, the generation unit 22 replaces the portion corresponding to the head mounted display D in the hidden face image with an image representing the face of the transmitting side user, and replaces the portion corresponding to the head mounted display D in the hidden face image with the transmitting side without the head mounted display D. Reproduce the user's face. The face image complementing process will be described later.

The authentication unit 23 performs authentication based on the reference face image and the complementary face image. The authentication unit 23 performs face authentication using a known method. In one example, the authentication unit 23 extracts feature points, face regions, and the like from the reference face image and the complementary face image. The authentication unit 23 compares the extracted values to calculate the degree of similarity between them. The authentication unit 23 determines that the authentication is successful if the degree of similarity between the two is equal to or greater than a predetermined threshold, and determines that the authentication is unsuccessful if the degree of similarity between the two is less than the predetermined threshold.

The display control unit 24 controls the display of the face image of the sending user on the receiving user terminal 20A. In response to the authentication result by the authentication unit 23, the display control unit 24 causes the complementary face image or the hidden face image to be displayed on an output device (display device) such as a display provided in the receiving user terminal 20A. For example, when the authentication is successful, the display control unit 24 causes the receiving-side user terminal 20A to display the complementary face image. On the other hand, if the authentication fails, the display control unit 24 causes the receiving-side user terminal 20A to display the hidden face image.

The storage unit 30 stores various data used or generated in the receiving user terminal 20A. For example, the storage unit 30 stores a reference face image representing the face of the sending user. The storage unit 30 may store data such as feature points of the reference face image used for authentication by the authentication unit 23 . The storage unit 30 may store at least one of the hidden face image acquired by the reception unit 21 and the complementary face image generated by the generation unit 22 . The storage unit 30 may store the shape of the head mounted display D. FIG. The storage unit 30 may be a device separate from the receiving user terminal 20A, or may be one component of the receiving user terminal 20A.

The face image complementing process will be described with reference to FIG. FIG. 3 is a diagram schematically showing complement processing of a face image. Although an example of the reference face image is shown on the side of the receiving user terminal 20A, the reference face image is not used for complementing the face image.

The sender's user terminal 10A takes a hidden face image and sends it to the receiver's user terminal 20A. In this hidden face image, a part of the sender's face is hidden by the head-mounted display D worn by the sender. Specifically, in the hidden face image, the area around the eyes of the transmitting user is hidden by the lens, frame, bridge, etc. of the head-mounted display D. FIG.

The generation unit 22 identifies a partial area R to be complemented from the hidden face image. For example, the generating unit 22 reads out the shape of the head mounted display D stored in advance in the storage unit 30, and identifies the area corresponding to the head mounted display D in the hidden face image based on the shape, thereby partially Identify the region R of

The generation unit 22 generates a complementary face image by complementing a partial region R of the hidden face image. Complementation of some regions R may be performed by machine learning. For example, the generation unit 22 may generate a plurality of face images (so-called positive examples) of a plurality of sending-side users and face images (so-called negative examples) of a plurality of users different from the sending-side user as training data by machine learning. A model is prepared in advance that is configured to input an image showing a part of the user's face and output an estimation result of another part of the sending user's face. The multiple face images of the sending users are, for example, face images showing the entire faces of the sending users photographed from various angles. The face images of a plurality of users different from the sending user are, for example, face images showing the entire faces photographed from various angles for each of the plurality of users. By performing machine learning using teacher data including positive and negative examples as described above, a model configured as follows can be obtained. That is, when an image showing a part of the face of the authentic sender user (for example, a part including the mouth that is not hidden by the head-mounted display D) is input, another image reflecting the features of the authentic sender user is displayed. Output the estimation result of the part (for example, an image including the part hidden by the head-mounted display D), and the user different from the genuine sender user (for example, the user who is trying to impersonate the genuine sender user). A model can be obtained that is configured to output an estimation result that does not reflect the features of a genuine sending user when an image showing a portion of a face is input. The generation unit 22 inputs the region of the hidden face image excluding the partial region R to the model, and interpolates the partial region R based on the output result from the model.

In another example of complementing a part of the region R, the generating unit 22 performs machine learning using face images of a plurality of sending users and face images of a plurality of users different from the sending user as teacher data. input a first range image showing a part of the partial region R, and output an estimation result of a second range showing a part of the partial region R different from the first range Prepare the model in advance. An image of the first range showing a portion of the partial region R may be acquired by the head mounted display D. FIG. For example, the head mounted display D may be provided with a camera inside the bridge portion of the glasses (on the user side). The head mounted display D captures an image of the eyes of the transmitting user (for example, a moving image including the user's eyes as subjects) while the transmitting user is wearing the head mounted display D. The head-mounted display D may capture images of the eyes of the sending user using two cameras arranged for each eye of the sending user (for example, two cameras arranged inside each lens). Then, the generating unit 22 inputs an image (for example, an image of the eyes of the transmitting user) showing a part of the transmitting user's face in the partial region R to the model, and outputs the input image and the model. Some areas may be completed based on the results.

Through the above-described processing, the part of the hidden face image corresponding to the head-mounted display D is replaced with an image representing the face of the transmitting user. As a result, the face of the transmitting user not wearing the head-mounted display D is reproduced in the partial area R of the complementary face image.

The operation of the online dialogue support system 1A will be described with reference to FIG. FIG. 4 is a sequence diagram showing the operation of the online dialogue support system 1A as a processing flow S1. In the following, it is assumed that the storage unit 30 stores in advance a reference face image representing the face of the user on the sending side.

In step S11, the photographing unit 11 photographs a hidden face image showing the face of the transmitting user with a part of the face of the transmitting user hidden. In one example, the photographing unit 11 photographs the face of the transmitting user with a part of the face hidden by the head-mounted display D worn by the transmitting user. The photographing unit 11 photographs hidden face images in a moving image format.

In step S12, the transmission unit 12 transmits the hidden face image acquired by the imaging unit 11 to the receiving user terminal 20A. For example, the transmission unit 12 transmits the hidden face image in the moving image format to the receiving user terminal 20A in real time.

In step S13, the receiving unit 21 acquires a hidden face image by receiving the hidden face image from the transmission-side user terminal 10A. The receiving section 21 may store the hidden face image in the storage section 30 .

In step S14, the generation unit 22 complements a partial area of the hidden face image to generate a complementary face image. In one example, the generation unit 22 identifies a region corresponding to the head-mounted display D in the hidden face image based on the shape of the head-mounted display D stored in advance in the storage unit 30, so that the partial region R (see FIG. 3).

As an example of complementing processing, the generation unit 22 performs machine learning using face images of a plurality of users different from the sending user as teacher data together with the face images of the sending users. A model is prepared in advance that is configured to input an image showing and output an estimation result of the other part of the face of the sending user. Input to the model and interpolate some region R based on the output results from the model. In another example, the generation unit 22 performs machine learning using facial images of a plurality of users different from the transmitting user together with facial images of a plurality of transmitting users as teacher data. Preparing in advance a model configured to input an image of a first range showing and output an estimation result of a second range showing a part different from the first range in a part of the region R, An image showing a portion of the transmitting user's face in the partial area R is input to the model, and the partial area is interpolated based on the input image and the output result from the model.

In step S15, the authentication unit 23 performs authentication based on the reference face image and the complementary face image. The authentication unit 23 performs face authentication using a known method and determines whether the authentication is successful. The authenticating unit 23 does not have to perform authentication for frames after successful authentication among the series of frames of the hidden face image in the moving image format.

In step S16, the display control unit 24 controls display of the face image of the sending user on the receiving user terminal 20A. For example, when the authentication is successful in the process of step S15, the display control unit 24 causes the receiving-side user terminal 20A to display the complementary face image. On the other hand, if the authentication fails in the process of step S15, the display control unit 24 causes the receiving-side user terminal 20A to display the hidden face image. The display control unit 24 may display the complementary face image or the hidden face image of the sending user on the head-mounted display D worn by the receiving user.

According to the online dialogue support system 1A described above, a complementary face image in which the partial region R of the face of the sending user is interpolated is generated from the hidden face image in which the partial region R is hidden. Authentication is then performed based on the reference face image, which is the original (complete) face image of the sending user, and the complementary face image. If the authentication is successful, the complementary face image is displayed on the receiving-side user terminal 20A. According to the online dialogue support system 1A, the receiving user can confirm that the transmitting user is the person himself/herself by confirming that the complementary face image is displayed on the receiving user terminal 20A. becomes. Therefore, it is possible to prevent spoofing of the user on the sending side.

The display control unit 24 causes the receiving user terminal 20A to display a hidden face image when the authentication fails. In this case, the receiving user can detect the possibility that the transmitting user is impersonating by confirming that the hidden face image is displayed on the receiving user terminal 20A.

A part of the area R is hidden by the head-mounted display D worn by the user on the sending side. The generation unit 22 complements the partial area R by replacing the part of the hidden face image corresponding to the head-mounted display D with another image. According to such a configuration, it appears as if the transmitting user is not wearing the head mounted display D on the receiving user terminal 20A. As a result, while the transmitting user visually recognizes the display on the head-mounted display D, more natural communication can be realized between the transmitting user and the receiving user.

The generation unit 22 identifies a partial region R by identifying a region corresponding to the head-mounted display D in the hidden face image based on the shape of the head-mounted display D stored in advance. In this case, since the head-mounted display D is precisely identified, the complementing accuracy of the partial area R is also improved.

The generation unit 22 inputs an image showing a part of the face of the sending user by machine learning using facial images of a plurality of users different from the sending user as teacher data together with the face images of the sending user. A model configured to output estimation results of other parts of the face of the transmitting user is prepared in advance, and an area of the hidden face image excluding a part of the area R is input to the model. A part of the region R is interpolated based on the output result from . According to such a configuration, part of the face of the transmitting user is complemented more naturally. In addition, even when the user moves, since part of the user's face is complemented according to the user's movement, the complementing accuracy of the partial area R is improved.

The generation unit 22 generates a first range indicating a part of the partial region R by machine learning using face images of a plurality of users different from the sending user as teacher data together with the face images of the sending users. A model is prepared in advance that is configured to input the image of the partial region R and output the estimation result of the second range indicating the portion different from the first range in the partial region R, and the partial region R An image showing a part of the sender's face is input to the model, and the part of the area is interpolated based on the input image and the output result from the model. According to such a configuration, a combination of an image showing a part of the transmitting user's face (in this embodiment, an image of the transmitting user's eyes) in the partial area R and the output result from the model The region R of is complemented. In this case, part of the face of the transmitting user in the partial area R is complemented by the image, so that a complemented face image closer to the original face of the user can be obtained. In addition, since the line of sight, expression, etc. of the user on the sending side are reflected in the complemented face image, part of the face of the user on the sending side is complemented more naturally.

[Second embodiment]
FIG. 5 is a block diagram showing an example of the functional configuration of an online dialogue support system 1B(1) according to the second embodiment. The online dialogue support system 1B differs from the online dialogue support system 1A in that it comprises a sender user terminal 10B (10) and a receiver user terminal 20B (20) instead of the sender user terminal 10A and the receiver user terminal 20A. do. In the online dialogue support system 1B, a complementary face image is generated on the sender's user terminal 10B, and authentication is performed based on the reference face image and the complementary face image.

The transmitting-side user terminal 10B differs from the transmitting-side user terminal 10A in that it has a generating unit 22 and an authenticating unit 23, and has a transmitting unit 12B instead of the transmitting unit 12. The receiving-side user terminal 20B differs from the receiving-side user terminal 20A in that it does not have the generation unit 22 and the authentication unit 23 and has a receiving unit 21B instead of the receiving unit 21 . The transmission unit 12B transmits the hidden face image or the complementary face image to the receiving user terminal 20B according to the authentication result of the authentication unit 23. FIG. Specifically, when the authentication by the authentication unit 23 is successful, the transmission unit 12B transmits the complementary face image to the receiving user terminal 20B. On the other hand, when the authentication by the authentication unit 23 fails, the transmission unit 12B transmits the hidden face image to the receiving user terminal 20B. The receiving unit 21B receives the hidden face image or the complementary face image from the user terminal 10B on the transmission side.

In the online dialogue support system 1B, the imaging unit 11 functions as an acquisition unit that acquires hidden face images. Further, the transmitting unit 14 that transmits the hidden face image or the complementary face image to the receiving-side user terminal 20B according to the authentication result substantially causes the receiving-side user terminal 20B to display the complementary face image when the authentication is successful. , and functions as a display control unit that displays a hidden face image on the receiving-side user terminal 20B when authentication fails. Therefore, in the second embodiment, the main functions of the online dialogue support system are performed by the sender's user terminal 10B. That is, in the second embodiment, it can be considered that the transmitting user terminal 10B alone constitutes an online dialogue support system. Note that the storage unit 30 may be a device separate from the transmitting user terminal 10B, or may be one component of the transmitting user terminal 10B.

The operation of the online dialogue support system 1B will be described with reference to FIG. FIG. 6 is a sequence diagram showing the operation of the online dialogue support system 1B as a processing flow S2. In the following, it is assumed that the storage unit 30 stores in advance a reference face image representing the face of the user on the sending side.

In step S21, the photographing unit 11 photographs a hidden face image showing the face of the transmitting user with a part of the face of the transmitting user hidden. In one example, the photographing unit 11 photographs the face of the transmitting user with a part of the face hidden by the head-mounted display D worn by the transmitting user. The photographing unit 11 photographs hidden face images in a moving image format. Also, the photographing unit 11 may store the hidden face image in the storage unit 30 .

In step S22, the generation unit 22 complements a partial region of the hidden face image to generate a complementary face image. The process of step S22 is different from step S14 in FIG. 4 in that it is performed on the transmitting user terminal 10B.

In step S23, the authentication unit 23 performs authentication based on the reference face image and the complementary face image. The process of step S23 is different from step S15 in FIG. 4 in that it is performed on the transmitting user terminal 10B.

In step S24, the transmission unit 12B transmits the hidden face image or the complementary face image to the receiving user terminal 20B. The transmission unit 12B transmits the hidden face image or the complementary face image in moving image format to the receiving user terminal 20B in real time. For example, when the authentication is successful in the process of step S23, the transmission unit 12B transmits the complementary face image to the receiving user terminal 20B. On the other hand, if authentication fails in the process of step S23, the transmission unit 12B transmits the hidden face image to the receiving user terminal 20B.

In step S25, the receiving unit 21B acquires the hidden face image or the complementary face image by receiving the hidden face image or the complementary face image from the transmission-side user terminal 10B.

In step S26, the display control unit 24 controls the display of the face image of the sending user on the receiving user terminal 20B. For example, the display control unit 24 causes the receiving-side user terminal 20B to display the hidden face image or the complementary face image acquired in the process of step S25. The display control unit 24 may display the complementary face image or the hidden face image of the sending user on the head-mounted display D worn by the receiving user.

According to the online dialogue support system 1B described above, the same effects as the online dialogue support system 1A are achieved. That is, the receiving user can confirm that the transmitting user is the person himself/herself by confirming that the complementary face image is displayed on the receiving user terminal 20B. Therefore, it is possible to prevent spoofing of the user on the sending side. In addition, according to the online dialogue support system 1B, since authentication processing is executed on the side of the user terminal 10B on the sending side, the processing load on the user terminal 20B on the receiving side can be suppressed.

[Third Embodiment]
FIG. 7 is a block diagram showing an example of the functional configuration of an online dialogue support system 1C(1) according to the third embodiment. The online dialogue support system 1C includes a sending user terminal 10C (10) and a receiving user terminal 20C (20) instead of the sending user terminal 10A and the receiving user terminal 20A, and further includes a server 40. This is different from the online dialogue support system 1A. In the online dialogue support system 1C, a complementary face image is generated on the server 40, and authentication is performed based on the reference face image and the complementary face image.

The transmitting-side user terminal 10C differs from the transmitting-side user terminal 10A in that it has a transmitting section 12C instead of the transmitting section 12. Transmission unit 12C transmits the hidden face image to server 40 . The receiving user terminal 20C differs from the receiving user terminal 20A in that it does not have the generating unit 22 and the authenticating unit 23 and has a receiving unit 21C instead of the receiving unit 21 . The receiving unit 21C receives hidden face images or complementary face images from the server 40 .

The server 40 has an image reception unit 41 (acquisition unit), a generation unit 22 , an authentication unit 23 and an image transmission unit 42 . The image receiving unit 41 functions as an acquisition unit that acquires the reference face image and the hidden face image by receiving the reference face image and the hidden face image from the transmission-side user terminal 10C. The image transmission unit 42 transmits the hidden face image or the complementary face image to the receiving user terminal 20C according to the authentication result of the authentication unit 23 . Specifically, when the authentication by the authentication unit 23 is successful, the image transmission unit 42 transmits the complementary face image to the receiving user terminal 20C. On the other hand, when the authentication by the authentication unit 23 fails, the image transmission unit 42 transmits the hidden face image to the receiving user terminal 20C. That is, the image transmission unit 42 substantially causes the user terminal 20C on the receiving side to display the complementary face image when the authentication is successful, and displays the hidden face image on the user terminal 20C on the receiving side when the authentication fails. Functions as a control unit. Therefore, in the third embodiment, the main functions of the online dialogue support system are performed by the server 40. FIG. That is, in the third embodiment, it can be considered that the server 40 alone constitutes an online dialogue support system. Note that the storage unit 30 may be a device separate from the server 40 or may be one component of the server 40 .

The operation of the online dialogue support system 1C will be described with reference to FIG. FIG. 8 is a sequence diagram showing the operation of the online dialogue support system 1C as a processing flow S3. In the following, it is assumed that the storage unit 30 stores in advance a reference face image representing the face of the user on the sending side.

The process of step S31 is the same as step S11 of FIG.

In step S32, the transmission unit 12C transmits the hidden face image acquired by the imaging unit 11 to the server 40. For example, the transmission unit 12C transmits hidden face images in moving image format to the server 40 in real time.

In step S33, the image receiving unit 41 acquires a hidden face image by receiving the hidden face image from the transmission-side user terminal 10C. The image receiving section 41 may store the hidden face image in the storage section 30 .

In step S34, the generation unit 22 complements a partial area of the hidden face image to generate a complemented face image. The processing of step S34 is different from step S14 in FIG. 4 in that it is performed on the server 40. FIG.

In step S35, the authentication unit 23 performs authentication based on the reference face image and the complementary face image. The process of step S35 is different from step S15 in FIG. 4 in that it is performed on the server 40 .

In step S36, the image transmission unit 42 transmits the hidden face image or the complementary face image to the receiving user terminal 20C. The image transmission unit 42 transmits the hidden face image or the complementary face image in moving image format to the receiving user terminal 20C in real time. For example, when the authentication is successful in the process of step S35, the image transmission unit 42 transmits the complementary face image to the receiving user terminal 20C. On the other hand, if the authentication fails in the process of step S35, the image transmission unit 42 transmits the hidden face image to the receiving user terminal 20C.

In step S37, the receiving unit 21C acquires the hidden face image or the complementary face image by receiving the hidden face image or the complementary face image from the server 40.

In step S38, the display control unit 24 controls the display of the face image of the transmitting user on the receiving user terminal 20C. For example, the display control unit 24 causes the receiving-side user terminal 20C to display the hidden face image or the complementary face image acquired in the process of step S37. The display control unit 24 may display the complementary face image or the hidden face image of the sending user on the head-mounted display D worn by the receiving user.

According to the online dialogue support system 1C described above, the same effects as the online dialogue support system 1A are achieved. That is, the receiving user can confirm that the transmitting user is the person himself/herself by confirming that the complementary face image is displayed on the receiving user terminal 20C. Therefore, it is possible to prevent spoofing of the user on the sending side. Further, according to the online dialogue support system 1C, it is possible to suppress the processing load of the receiving user terminal 20C. Furthermore, when a plurality of people (that is, between terminals of a plurality of users) conduct online dialogue, time synchronization can be easily performed regardless of the equipment, performance, etc. of the transmitting user terminal 10C.

(Modification)
In the above embodiment, an example in which a partial area of the face of the user on the sending side is hidden by the head-mounted display D has been described, but it may be hidden by a mask or the like. In the above embodiment, an example in which the hidden face image is in the form of a moving image has been described. It may be an image. Hidden face images may be obtained separately in the form of a still image used for face authentication and a moving image format after face authentication. The hidden face image may be an image in which a part of the area is hidden by, for example, mosaic. Further, in the above-described embodiment, the partial area R is specified based on the shape of the head-mounted display D stored in advance. A part of the region R may be specified by receiving an input operation or the like to specify. Furthermore, in the above-described embodiment, an example of displaying the hidden face image of the sending user when authentication fails has been described. You may end the online dialogue.

It should be noted that the block diagrams used in the description of the above embodiments show blocks for each function. These functional blocks (components) are realized by any combination of at least one of hardware and software. Also, the method of implementing each functional block is not particularly limited. That is, each functional block may be implemented using one device that is physically or logically coupled, or directly or indirectly using two or more devices that are physically or logically separated (e.g. , wired, wireless, etc.) and may be implemented using these multiple devices. A functional block may be implemented by combining software in the one device or the plurality of devices.

Functions include judging, determining, determining, calculating, calculating, processing, deriving, investigating, searching, checking, receiving, transmitting, outputting, accessing, resolving, selecting, choosing, establishing, comparing, assuming, expecting, assuming, Broadcasting, notifying, communicating, forwarding, configuring, reconfiguring, allocating, mapping, assigning, etc. can't

For example, the transmitting user terminal 10, the receiving user terminal 20, and the server 40 according to an embodiment of the present disclosure may function as computers that perform the information processing method of the present disclosure. FIG. 9 is a diagram showing an example of a hardware configuration common to the transmitting user terminal 10, the receiving user terminal 20, and the server 40 according to an embodiment of the present disclosure. Each of the sender user terminal 10, the receiver user terminal 20, and the server 40 is physically a computer including a processor 1001, a memory 1002, a storage 1003, a communication device 1004, an input device 1005, an output device 1006, a bus 1007, and the like. It may be configured as a device.

In the following explanation, the term "apparatus" can be read as a circuit, device, unit, or the like. The hardware configuration of the sender user terminal 10, the receiver user terminal 20, and the server 40 may be configured to include one or more of the devices shown in FIG. may be configured.

Each function of the sending user terminal 10, the receiving user terminal 20, and the server 40 is implemented by causing the processor 1001 to perform calculations and communication by loading predetermined software (programs) onto hardware such as the processor 1001 and memory 1002. It is realized by controlling communication by the device 1004 and controlling at least one of data reading and writing in the memory 1002 and the storage 1003 .

The processor 1001, for example, operates an operating system and controls the entire computer. The processor 1001 may be configured by a central processing unit (CPU) including an interface with peripheral devices, a control device, an arithmetic device, registers, and the like.

Also, the processor 1001 reads programs (program codes), software modules, data, etc. from at least one of the storage 1003 and the communication device 1004 to the memory 1002, and executes various processes according to them. As the program, a program that causes a computer to execute at least part of the operations described in the above embodiments is used. For example, each functional unit (eg, generating unit 22, etc.) of the transmitting user terminal 10, the receiving user terminal 20, and the server 40 may be stored in the memory 1002 and implemented by a control program that operates on the processor 1001. Other functional blocks may be similarly implemented. Although it has been explained that the above-described various processes are executed by one processor 1001, they may be executed simultaneously or sequentially by two or more processors 1001. FIG. Processor 1001 may be implemented by one or more chips. Note that the program may be transmitted from a network via an electric communication line.

The memory 1002 is a computer-readable recording medium, and is composed of at least one of, for example, ROM (Read Only Memory), EPROM (Erasable Programmable ROM), EEPROM (Electrically Erasable Programmable ROM), RAM (Random Access Memory), etc. may be The memory 1002 may also be called a register, cache, main memory (main storage device), or the like. The memory 1002 can store executable programs (program codes), software modules, etc. for implementing an information processing method according to an embodiment of the present disclosure.

The storage 1003 is a computer-readable recording medium, for example, an optical disc such as a CD-ROM (Compact Disc ROM), a hard disk drive, a flexible disc, a magneto-optical disc (for example, a compact disc, a digital versatile disc, a Blu-ray disk), smart card, flash memory (eg, card, stick, key drive), floppy disk, magnetic strip, and/or the like. Storage 1003 may also be called an auxiliary storage device. The storage medium described above may be, for example, a database, server, or other suitable medium including at least one of memory 1002 and storage 1003 .

The communication device 1004 is hardware (transmitting/receiving device) for communicating between computers via at least one of a wired network and a wireless network, and is also called a network device, a network controller, a network card, a communication module, or the like.

The input device 1005 is an input device (for example, keyboard, mouse, microphone, switch, button, sensor, etc.) that receives input from the outside. The output device 1006 is an output device (eg, display, speaker, LED lamp, etc.) that outputs to the outside. Note that the input device 1005 and the output device 1006 may be integrated (for example, a touch panel).

Each device such as the processor 1001 and the memory 1002 is connected by a bus 1007 for communicating information. The bus 1007 may be configured using a single bus, or may be configured using different buses between devices.

In addition, the transmitting side user terminal 10, the receiving side user terminal 20, and the server 40 include a microprocessor, a digital signal processor (DSP), an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), an FPGA (Field Programmable Gate Array) may be included, and the hardware may implement part or all of each functional block. For example, processor 1001 may be implemented using at least one of these pieces of hardware.

Although the present embodiment has been described in detail above, it is obvious to those skilled in the art that the present embodiment is not limited to the embodiments described herein. This embodiment can be implemented as modifications and changes without departing from the spirit and scope of the present invention defined by the description of the claims. Therefore, the description in this specification is for the purpose of illustration and explanation, and does not have any restrictive meaning with respect to the present embodiment.

The order of the processing procedures, sequences, flowcharts, etc. of each aspect/embodiment described in the present disclosure may be changed as long as there is no contradiction. For example, the methods described in this disclosure present elements of the various steps using a sample order, and are not limited to the specific order presented.

Input/output information may be stored in a specific location (for example, memory) or managed using a management table. Input/output information and the like can be overwritten, updated, or appended. The output information and the like may be deleted. The entered information and the like may be transmitted to another device.

The determination may be made by a value represented by one bit (0 or 1), by a true/false value (Boolean: true or false), or by numerical comparison (for example, a predetermined value).

Each aspect/embodiment described in the present disclosure may be used alone, may be used in combination, or may be used by switching along with execution. In addition, the notification of predetermined information (for example, notification of “being X”) is not limited to being performed explicitly, but may be performed implicitly (for example, not notifying the predetermined information). good too.

Software, whether referred to as software, firmware, middleware, microcode, hardware description language or otherwise, includes instructions, instruction sets, code, code segments, program code, programs, subprograms, and software modules. , applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, and the like.

In addition, software, instructions, information, etc. may be transmitted and received via a transmission medium. For example, the software uses at least one of wired technology (coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), etc.) and wireless technology (infrared, microwave, etc.) to website, Wired and/or wireless technologies are included within the definition of transmission medium when sent from a server or other remote source.

The information, signals, etc. described in this disclosure may be represented using any of a variety of different technologies. For example, data, instructions, commands, information, signals, bits, symbols, chips, etc. that may be referred to throughout the above description may refer to voltages, currents, electromagnetic waves, magnetic fields or magnetic particles, light fields or photons, or any of these. may be represented by a combination of

In addition, the information, parameters, etc. described in the present disclosure may be expressed using absolute values, may be expressed using relative values from a predetermined value, or may be expressed using other corresponding information. may be represented.

The names used for the parameters described above are not restrictive names in any respect. Further, the formulas, etc., using these parameters may differ from those expressly disclosed in this disclosure. The various names assigned to these various information elements are not limiting names in any way, as the various information elements can be identified by any suitable name.

The term "based on" as used in this disclosure does not mean "based only on" unless otherwise specified. In other words, the phrase "based on" means both "based only on" and "based at least on."

Any reference to elements using the "first," "second," etc. designations used in this disclosure does not generally limit the quantity or order of those elements. These designations may be used in this disclosure as a convenient method of distinguishing between two or more elements. Thus, reference to a first and second element does not imply that only two elements can be employed or that the first element must precede the second element in any way.

Where "include," "including," and variations thereof are used in this disclosure, these terms are inclusive, as is the term "comprising." is intended. Furthermore, the term "or" as used in this disclosure is not intended to be an exclusive OR.

In this disclosure, if articles are added by translation, such as a, an, and the in English, the disclosure may include that the nouns following these articles are plural.

In the present disclosure, the term "A and B are different" may mean "A and B are different from each other." The term may also mean that "A and B are different from C". Terms such as "separate," "coupled," etc. may also be interpreted in the same manner as "different."

1, 1A, 1B, 1C... Online

dialogue support system

10, 10A, 10B, 10C... Sending side user terminal 11... Photographing

unit

12, 12B, 12C... Sending

unit

20, 20A, 20B, 20C... Receiving

side User terminal

21, 21B, 21C Reception unit 22 Generation unit 23 Authentication unit 24 Display control unit 30 Storage unit 40 Server 41 Image reception unit 42 Image transmission unit D ... head-mounted display, R ... part of the area.

Claims

An online dialogue support system for supporting online dialogue between a terminal of a sending user and a terminal of a receiving user,
a storage unit that stores a reference face image representing the face of the transmitting user;
an acquisition unit that acquires a hidden face image showing the face of the sending user with a partial area of the face of the sending user hidden;
a generation unit that complements the partial area of the hidden face image to generate a complementary face image;
an authentication unit that performs authentication based on the reference face image and the complementary face image;
a display control unit that displays the complementary face image on the terminal of the receiving user when the authentication is successful;
an online dialogue support system.
The display control unit causes the terminal of the receiving user to display the hidden face image when the authentication fails.
The online dialogue support system according to claim 1.
the partial area is hidden by a head-mounted display worn by the transmitting user;
The generation unit complements the partial region by replacing a portion of the hidden face image corresponding to the head-mounted display with another image.
3. The online dialogue support system according to claim 1 or 2.
The generating unit identifies the partial area by identifying an area corresponding to the head-mounted display in the hidden face image based on the pre-stored shape of the head-mounted display.
4. The online dialogue support system according to claim 3.
The generation unit generates an image showing a part of the face of the sending user by machine learning using facial images of a plurality of users different from the sending user as teacher data together with the face images of the sending user. is prepared in advance, and a model configured to output the estimation result of the other part of the face of the transmitting user is input, and the area of the hidden face image excluding the part of the area is the input to a model and interpolate the partial region based on the output results from the model;
The online dialogue support system according to any one of claims 1-4.
The generating unit performs machine learning using facial images of a plurality of users different from the transmitting user as training data together with the facial images of the transmitting users. preparing in advance a model that is configured to input an image of one range and output an estimation result of a second range indicating a portion of the partial region that is different from the first range; inputting an image showing a part of the face of the sending user out of the partial area into the model, and interpolating the partial area based on the input image and the output result from the model;
The online dialogue support system according to any one of claims 1-4.