WO2023037812A1 - Online dialogue support system - Google Patents

Online dialogue support system

Info

Publication number
WO2023037812A1
Authority
WO
WIPO (PCT)
Prior art keywords
face image
user
face
hidden
image
Application number
PCT/JP2022/030319
Other languages
French (fr)
Japanese (ja)
Inventor
桃子 阿部
幹生 岩村
洋平 藤本
禎篤 加藤
Original Assignee
株式会社Nttドコモ
Application filed by 株式会社Nttドコモ
Priority to JP2023546844A (JPWO2023037812A1)
Publication of WO2023037812A1

Classifications

    • G06T 1/00: General purpose image data processing
    • G06T 11/80: Creating or modifying a manually drawn or painted image using a manual input device, e.g. mouse, light pen, direction keys on keyboard
    • G06T 5/50: Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06T 7/00: Image analysis
    • H04N 21/431: Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N 7/14: Systems for two-way working
    • H04N 7/18: Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast

Definitions

  • One aspect of the present invention relates to an online dialogue support system.
  • Patent Document 1 discloses a device that creates a mask pattern of the HMD region to be replaced from a moving face image of a user wearing a head-mounted display (HMD), performs replacement using the region corresponding to the mask pattern in a still face image taken without the HMD, and thereby synthesizes a moving face image without the HMD.
  • It is an object of one aspect of the present invention to provide an online dialogue support system capable of preventing spoofing by a user whose face is partially hidden during an online dialogue.
  • An online dialogue support system according to one aspect of the present invention is an online dialogue support system that supports an online dialogue between a terminal of a sending user and a terminal of a receiving user, and comprises: a storage unit that stores a reference face image showing the face of the sending user; an acquisition unit that acquires a hidden face image showing the face of the sending user with a partial area of the face hidden; a generation unit that complements the partial area of the hidden face image to generate a complementary face image; an authentication unit that performs authentication based on the reference face image and the complementary face image; and a display control unit that causes the terminal of the receiving user to display the complementary face image when the authentication succeeds.
  • In this online dialogue support system, a complementary face image in which the hidden partial area has been complemented is generated from the hidden face image of the sending user. Authentication is then performed based on the reference face image, which is the original (complete) face image of the sending user, and the complementary face image. If the authentication succeeds, the complementary face image is displayed on the terminal of the receiving user.
  • Accordingly, the receiving user can confirm that the sending user is genuine by confirming that the complementary face image is displayed on the receiving user's terminal. Spoofing by the sending user can therefore be prevented.
  • One aspect of the present invention thus provides an online dialogue support system capable of preventing spoofing by a user whose face is partially hidden in an online dialogue.
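  • The flow above can be pictured with a short sketch. This is an illustration only, not the disclosed implementation: complement_face() and authenticate() are hypothetical placeholder names for the generation unit and the authentication unit.

```python
# Hypothetical end-to-end sketch of the claimed flow (placeholder names).
def handle_frame(reference_face, hidden_face):
    complementary_face = complement_face(hidden_face)     # generation unit
    if authenticate(reference_face, complementary_face):  # authentication unit
        return complementary_face  # success: display the complementary image
    return hidden_face             # failure: fall back to the hidden image
```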
  • FIG. 1 is a diagram showing an overview of an online dialogue support system according to one embodiment.
  • FIG. 2 is a block diagram showing an example of the functional configuration of the online dialogue support system.
  • FIG. 3 is a diagram schematically showing the complementing process of a face image.
  • FIG. 4 is a sequence diagram showing an example of the operation of the online dialogue support system according to the first embodiment.
  • FIG. 5 is a block diagram showing an example of the functional configuration of an online dialogue support system according to the second embodiment.
  • FIG. 6 is a sequence diagram showing an example of the operation of the online dialogue support system according to the second embodiment.
  • FIG. 7 is a block diagram showing an example of the functional configuration of an online dialogue support system according to the third embodiment.
  • FIG. 8 is a sequence diagram showing an example of the operation of the online dialogue support system according to the third embodiment.
  • FIG. 9 is a diagram showing an example of the hardware configuration related to the online dialogue support system.
  • FIG. 1 is a diagram showing an overview of an online dialogue support system 1 according to one embodiment.
  • The online dialogue support system 1 is a computer system that supports online dialogue between the terminals of a plurality of users.
  • In the online dialogue support system 1, a face image showing a user's face is photographed, and that face image is transmitted and received.
  • The face image may be live-action 3D data, and may show the user's whole body.
  • In this embodiment, the user who sends his or her own face image is called the sending user, and the user who receives the sending user's face image is called the receiving user.
  • However, the roles of sending user and receiving user are not fixed for each user.
  • A sending user becomes a receiving user when receiving another user's face image.
  • Likewise, a receiving user becomes a sending user when transmitting his or her own face image to another user.
  • The online dialogue support system 1 comprises a sending user terminal 10 and a receiving user terminal 20.
  • The sending user terminal 10 and the receiving user terminal 20 are connected via a communication network N so as to be able to communicate with each other.
  • The configuration of the communication network N is not limited.
  • For example, the communication network N may include the Internet or an intranet.
  • In the example of FIG. 1, one sending user terminal 10 and one receiving user terminal 20 are shown, but the numbers are not limited to this.
  • For example, the online dialogue support system 1 may comprise a plurality of sending user terminals 10 and a plurality of receiving user terminals 20. That is, the online dialogue support system 1 can be applied as a system for conducting an online dialogue among many people.
  • The sending user terminal 10 is a terminal used by the sending user.
  • The type and configuration of the sending user terminal 10 are not limited.
  • The sending user terminal 10 may be, for example, a mobile terminal such as a high-performance mobile phone (smartphone), a tablet terminal, a wearable terminal, a laptop personal computer, or a mobile phone.
  • The sending user terminal 10 may also be a stationary terminal such as a desktop personal computer.
  • The sending user terminal 10 may be a user terminal possessed by each sending user as described above, or may be a server device configured to be able to communicate with each sending user's terminal.
  • The sending user terminal 10 may also be configured by a combination of a user terminal and a server device. That is, the sending user terminal 10 may be configured by a single computer device, or by a plurality of computer devices that can communicate with each other.
  • The receiving user terminal 20 is a terminal used by the receiving user.
  • The type and configuration of the receiving user terminal 20 are the same as those of the sending user terminal 10.
  • As described above, a receiving user can become a sending user and vice versa. When the receiving user becomes the sending user, the receiving user terminal 20 functions as the sending user terminal 10; likewise, when the sending user becomes the receiving user, the sending user terminal 10 functions as the receiving user terminal 20.
  • In the example of FIG. 1, the sending user wears a head-mounted display D on his or her head.
  • The form of the head-mounted display D is not limited to a specific form.
  • The head-mounted display D can take various forms, such as a goggle type, an eyeglasses type, or a hat type.
  • The head-mounted display D is, for example, a pair of smart glasses such as XR (eXtended Reality) glasses.
  • In this embodiment, the head-mounted display D is a pair of AR glasses that provides the user with augmented reality (AR); that is, it is a see-through device configured so that the user can visually recognize the real space (the outside world) as well as the virtual space.
  • However, the head-mounted display D is not limited to the above; it may be an MR device such as MR glasses that provides the user with mixed reality (MR), or a VR device such as VR glasses that provides the user with virtual reality (VR).
  • Volumetric video (volumetric capture) technology can be applied to the online dialogue support system 1.
  • This is a technology for creating 3D content that accurately reproduces a subject's appearance, shape, movement, and the like. The online dialogue support system 1 to which this technology is applied reproduces the actions of a plurality of users in 3D in real time in the same virtual space and presents them to each user. To enjoy such a user experience, the sending user and the receiving user participate in the online dialogue while wearing the head-mounted display D.
  • FIG. 2 is a block diagram showing an example of the functional configuration of the online dialogue support system 1A(1) according to the first embodiment.
  • The online dialogue support system 1A includes a sending user terminal 10A (10) and a receiving user terminal 20A (20).
  • In the first embodiment, the main functions of the online dialogue support system are performed by the receiving user terminal 20A; that is, the receiving user terminal 20A alone can be regarded as constituting an online dialogue support system.
  • The sending user terminal 10A has an imaging unit 11 and a transmission unit 12.
  • The imaging unit 11 obtains a face image by photographing the face of the sending user.
  • First, the imaging unit 11 photographs a reference face image showing the face of the sending user.
  • The reference face image is the original (complete) face image of the sending user, taken with the sending user's face not hidden.
  • In this embodiment, the reference face image is an image capturing the entire face of the sending user in a state where the face is not hidden by the head-mounted display D (that is, before the sending user puts on the head-mounted display D).
  • For example, the imaging unit 11 photographs the reference face image as a still image.
  • The imaging unit 11 also photographs a hidden face image showing the face of the sending user with a part of the face hidden.
  • A hidden face image is an incomplete face image of the sending user, captured with a part of the face hidden by an object existing between the imaging unit 11 and the sending user's face.
  • In this embodiment, the hidden face image is an image capturing the entire face of the sending user in a state where a part of the face is hidden by the head-mounted display D (that is, after the sending user puts on the head-mounted display D).
  • In other words, the hidden face image is an image in which both the face of the sending user and the head-mounted display D hiding a part of that face appear.
  • For example, the imaging unit 11 captures the hidden face image in a moving image format.
  • The transmission unit 12 transmits the reference face image and the hidden face image acquired by the imaging unit 11 to the receiving user terminal 20A.
  • For example, the transmission unit 12 transmits the reference face image to the receiving user terminal 20A before the online dialogue starts, and transmits the hidden face image after the online dialogue starts, as sketched below.
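  • As an illustration of this two-phase exchange, a sender loop might look like the following sketch; the channel and camera APIs and the message tags are assumptions, not part of the disclosure.

```python
# Hypothetical sketch: register the reference face image once, then stream
# hidden face images while the online dialogue is in progress.
def run_sender(channel, camera):
    reference = camera.capture_still()        # reference face image (no HMD)
    channel.send("reference", reference)      # sent before the dialogue starts
    for frame in camera.capture_video():      # hidden face images (HMD worn)
        channel.send("hidden_frame", frame)   # streamed during the dialogue
```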
  • The receiving user terminal 20A has a reception unit 21 (acquisition unit), a generation unit 22, an authentication unit 23, and a display control unit 24.
  • The reception unit 21 functions as an acquisition unit that acquires the reference face image and the hidden face image by receiving them from the sending user terminal 10A.
  • The reception unit 21 stores the reference face image in the storage unit 30, which will be described later.
  • Alternatively, the reference face image of the sending user may be stored (registered) in the storage unit 30 directly from the sending user terminal 10 without going through the receiving user terminal 20A.
  • The generation unit 22 complements a partial area of the hidden face image to generate a complementary face image.
  • A complementary face image is a face image showing how the sending user's face would look if the partial area of the hidden face image were not hidden.
  • For example, the generation unit 22 generates the complementary face image in a moving image format.
  • A part of the hidden face image is hidden by, for example, the head-mounted display D worn by the sending user.
  • In this case, the generation unit 22 complements the partial area by replacing the part of the hidden face image corresponding to the head-mounted display D with another image.
  • That is, the generation unit 22 replaces the portion of the hidden face image corresponding to the head-mounted display D with an image representing the sending user's face, thereby reproducing the face of the sending user as it would appear without the head-mounted display D.
  • The details of the face image complementing process will be described later.
  • The authentication unit 23 performs authentication based on the reference face image and the complementary face image.
  • For example, the authentication unit 23 performs face authentication using a known method.
  • For example, the authentication unit 23 extracts feature points, face regions, and the like from each of the reference face image and the complementary face image.
  • The authentication unit 23 then compares the extracted values to calculate the degree of similarity between the two images.
  • The authentication unit 23 determines that the authentication has succeeded if the degree of similarity is equal to or greater than a predetermined threshold, and that it has failed if the degree of similarity is less than the threshold.
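  • A minimal sketch of such threshold-based matching is shown below; embed() stands in for any off-the-shelf face feature extractor, and the threshold value 0.8 is an arbitrary illustrative choice.

```python
import numpy as np

# Hypothetical sketch of authentication unit 23: compare feature vectors of the
# reference face image and the complementary face image by cosine similarity.
def authenticate(reference_img, complementary_img, embed, threshold=0.8):
    ref_vec = embed(reference_img)
    comp_vec = embed(complementary_img)
    similarity = np.dot(ref_vec, comp_vec) / (
        np.linalg.norm(ref_vec) * np.linalg.norm(comp_vec))
    return similarity >= threshold  # True: authentication succeeded
```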
  • The display control unit 24 controls the display of the sending user's face image on the receiving user terminal 20A.
  • Specifically, the display control unit 24 causes the complementary face image or the hidden face image to be displayed on an output device (display device) such as a display provided in the receiving user terminal 20A.
  • When the authentication succeeds, the display control unit 24 causes the receiving user terminal 20A to display the complementary face image.
  • When the authentication fails, the display control unit 24 causes the receiving user terminal 20A to display the hidden face image.
  • The storage unit 30 stores various data used or generated in the receiving user terminal 20A.
  • For example, the storage unit 30 stores the reference face image showing the face of the sending user.
  • The storage unit 30 may also store data such as the feature points of the reference face image used for authentication by the authentication unit 23.
  • The storage unit 30 may store at least one of the hidden face image acquired by the reception unit 21 and the complementary face image generated by the generation unit 22.
  • The storage unit 30 may further store the shape of the head-mounted display D.
  • The storage unit 30 may be a device separate from the receiving user terminal 20A, or may be a component of the receiving user terminal 20A.
  • FIG. 3 is a diagram schematically showing the complementing process of a face image. Although an example of the reference face image is shown on the receiving user terminal 20A side, the reference face image itself is not used for complementing the face image.
  • First, the sending user terminal 10A captures a hidden face image and transmits it to the receiving user terminal 20A.
  • In this hidden face image, a part of the sending user's face is hidden by the head-mounted display D worn by the sending user.
  • Specifically, in the hidden face image, the area around the eyes of the sending user is hidden by the lenses, frame, bridge, and other parts of the head-mounted display D.
  • Next, the generation unit 22 identifies the partial area R to be complemented in the hidden face image. For example, the generation unit 22 reads out the shape of the head-mounted display D stored in advance in the storage unit 30 and, based on that shape, identifies the area corresponding to the head-mounted display D in the hidden face image, thereby identifying the partial area R.
  • The generation unit 22 then generates the complementary face image by complementing the partial area R of the hidden face image. The complementation of the partial area R may be performed by machine learning.
  • For example, the generation unit 22 prepares in advance a model obtained by machine learning that uses, as training data, a plurality of face images of the sending user (so-called positive examples) and face images of a plurality of users different from the sending user (so-called negative examples). The model is configured to receive as input an image showing a part of the sending user's face and to output an estimation result of another part of the sending user's face.
  • The plurality of face images of the sending user are, for example, face images showing the entire face of the sending user photographed from various angles.
  • The face images of the plurality of users different from the sending user are, for example, face images showing the entire face of each of those users photographed from various angles.
  • By such machine learning, a model configured as follows can be obtained: when an image showing a part of the face of the genuine sending user (for example, a part including the mouth, which is not hidden by the head-mounted display D) is input, the model outputs an estimation result of another part that reflects the features of the genuine sending user (for example, an image including the part hidden by the head-mounted display D); whereas when an image showing a part of the face of a user different from the genuine sending user (for example, a user trying to impersonate the genuine sending user) is input, the model outputs an estimation result that does not reflect the features of the genuine sending user.
  • The generation unit 22 inputs the region of the hidden face image excluding the partial area R to the model, and complements the partial area R based on the output result from the model.
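  • A minimal sketch of this mask-and-complement step follows; the boolean mask for the partial area R (derived from the stored shape of the head-mounted display D) and the model interface are assumptions for illustration.

```python
import numpy as np

# Hypothetical sketch of generation unit 22: hmd_mask marks the partial area R,
# and model() stands in for the learned estimator described above.
def complement_face(hidden_face: np.ndarray, hmd_mask: np.ndarray, model):
    visible = hidden_face * ~hmd_mask[..., None]   # region excluding area R
    estimated = model(visible)                     # estimate of the hidden part
    complementary = hidden_face.copy()
    complementary[hmd_mask] = estimated[hmd_mask]  # fill in area R only
    return complementary
```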
  • Alternatively, the generation unit 22 may prepare in advance a model obtained by machine learning that uses face images of the sending user and face images of a plurality of users different from the sending user as training data, the model being configured to receive as input an image of a first range showing a part of the partial area R and to output an estimation result of a second range showing a part of the partial area R different from the first range.
  • The image of the first range showing a part of the partial area R may be acquired by the head-mounted display D itself.
  • For example, the head-mounted display D may be provided with a camera inside the bridge portion of the glasses (on the user side).
  • In this case, the head-mounted display D captures an image of the sending user's eyes (for example, a moving image including the user's eyes as subjects) while the sending user is wearing it.
  • The head-mounted display D may also capture images of the sending user's eyes using two cameras arranged one for each eye (for example, two cameras arranged inside the respective lenses).
  • The generation unit 22 then inputs to the model an image showing a part of the sending user's face within the partial area R (for example, the image of the sending user's eyes), and may complement the partial area based on the input image and the output result from the model.
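  • A sketch of this variant is given below, assuming the eye image from the inward camera has already been warped into the coordinate frame of the hidden face image; the masks and the model interface are again hypothetical.

```python
# Hypothetical sketch of the eye-camera variant of generation unit 22: the eye
# image supplies the first range, and the model estimates the second range.
def complement_with_eye_image(hidden_face, hmd_mask, eye_image, eye_mask, model):
    estimated = model(eye_image)                   # second range estimated from eyes
    complementary = hidden_face.copy()
    complementary[hmd_mask] = estimated[hmd_mask]  # fill the hidden area R
    complementary[eye_mask] = eye_image[eye_mask]  # keep the captured eye pixels
    return complementary
```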
  • In either case, the part of the hidden face image corresponding to the head-mounted display D is replaced with an image representing the sending user's face.
  • As a result, the face of the sending user not wearing the head-mounted display D is reproduced in the partial area R of the complementary face image.
  • FIG. 4 is a sequence diagram showing the operation of the online dialogue support system 1A as processing flow S1.
  • As a premise, the storage unit 30 stores in advance the reference face image showing the face of the sending user.
  • In step S11, the imaging unit 11 photographs a hidden face image showing the face of the sending user with a part of the face hidden.
  • For example, the imaging unit 11 photographs the sending user's face with a part of it hidden by the head-mounted display D worn by the sending user.
  • The imaging unit 11 photographs the hidden face image in a moving image format.
  • In step S12, the transmission unit 12 transmits the hidden face image acquired by the imaging unit 11 to the receiving user terminal 20A.
  • For example, the transmission unit 12 transmits the hidden face image in moving image format to the receiving user terminal 20A in real time.
  • In step S13, the reception unit 21 acquires the hidden face image by receiving it from the sending user terminal 10A.
  • The reception unit 21 may store the hidden face image in the storage unit 30.
  • In step S14, the generation unit 22 complements a partial area of the hidden face image to generate a complementary face image.
  • For example, the generation unit 22 identifies the partial area R (see FIG. 3) by identifying the area corresponding to the head-mounted display D in the hidden face image based on the shape of the head-mounted display D stored in advance in the storage unit 30.
  • The generation unit 22 then prepares in advance a model obtained by machine learning that uses face images of the sending user together with face images of a plurality of users different from the sending user as training data, the model being configured to receive an image showing a part of the sending user's face and to output an estimation result of another part of the face. The generation unit 22 inputs the region of the hidden face image excluding the partial area R to the model, and complements the partial area R based on the output result from the model.
  • Alternatively, the generation unit 22 prepares in advance a model obtained by machine learning that uses face images of the sending user together with face images of a plurality of users different from the sending user as training data, the model being configured to receive an image of a first range showing a part of the partial area R and to output an estimation result of a second range showing a different part of the partial area R. The generation unit 22 inputs to the model an image showing a part of the sending user's face within the partial area R, and complements the partial area based on the input image and the output result from the model.
  • In step S15, the authentication unit 23 performs authentication based on the reference face image and the complementary face image.
  • For example, the authentication unit 23 performs face authentication using a known method and determines whether the authentication has succeeded.
  • Note that, among the series of frames of the hidden face image in moving image format, the authentication unit 23 need not perform authentication again for frames after the authentication has once succeeded, as sketched below.
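  • For illustration, the per-frame flow with this once-only authentication could look as follows, reusing the hypothetical helpers sketched earlier; display() is a placeholder for the display control unit 24.

```python
# Hypothetical per-frame loop on the receiving user terminal 20A.
def process_stream(frames, reference_face, embed, model, hmd_mask):
    authenticated = False
    for hidden_frame in frames:
        comp_frame = complement_face(hidden_frame, hmd_mask, model)  # step S14
        if not authenticated:
            # Step S15: authenticate once; later frames skip this check.
            authenticated = authenticate(reference_face, comp_frame, embed)
        # Step S16: show the complementary image only after success.
        display(comp_frame if authenticated else hidden_frame)
```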
  • In step S16, the display control unit 24 controls the display of the sending user's face image on the receiving user terminal 20A.
  • When the authentication succeeds, the display control unit 24 causes the receiving user terminal 20A to display the complementary face image.
  • When the authentication fails, the display control unit 24 causes the receiving user terminal 20A to display the hidden face image.
  • The display control unit 24 may display the sending user's complementary face image or hidden face image on the head-mounted display D worn by the receiving user.
  • As described above, in the online dialogue support system 1A, a complementary face image in which the partial area R of the sending user's face has been complemented is generated from the hidden face image in which that area is hidden. Authentication is then performed based on the reference face image, which is the original (complete) face image of the sending user, and the complementary face image. If the authentication succeeds, the complementary face image is displayed on the receiving user terminal 20A.
  • Accordingly, the receiving user can confirm that the sending user is genuine by confirming that the complementary face image is displayed on the receiving user terminal 20A. Spoofing by the sending user can therefore be prevented.
  • On the other hand, the display control unit 24 causes the receiving user terminal 20A to display the hidden face image when the authentication fails.
  • In this case, the receiving user can detect the possibility that the sending user is an impostor by confirming that the hidden face image is displayed on the receiving user terminal 20A.
  • In the present embodiment, the partial area R is hidden by the head-mounted display D worn by the sending user.
  • The generation unit 22 complements the partial area R by replacing the part of the hidden face image corresponding to the head-mounted display D with another image. With this configuration, the sending user appears on the receiving user terminal 20A as if not wearing the head-mounted display D. As a result, more natural communication can be realized between the sending user and the receiving user while the sending user continues to view the display of the head-mounted display D.
  • The generation unit 22 identifies the partial area R by identifying the area corresponding to the head-mounted display D in the hidden face image based on the shape of the head-mounted display D stored in advance. In this case, since the head-mounted display D is identified precisely, the accuracy of complementing the partial area R is also improved.
  • Furthermore, the generation unit 22 prepares in advance a model obtained by machine learning that uses face images of the sending user together with face images of a plurality of users different from the sending user as training data, the model being configured to receive an image showing a part of the sending user's face and to output estimation results of other parts of the face. The generation unit 22 inputs the region of the hidden face image excluding the partial area R to the model and complements the partial area R based on the output result from the model. With this configuration, the part of the sending user's face is complemented more naturally. In addition, even when the user moves, the part of the user's face is complemented in accordance with the movement, so the accuracy of complementing the partial area R is improved.
  • Alternatively, the generation unit 22 prepares in advance a model obtained by machine learning that uses face images of the sending user together with face images of a plurality of users different from the sending user as training data, the model being configured to receive an image of a first range showing a part of the partial area R and to output an estimation result of a second range showing a different part of the partial area R. The generation unit 22 inputs to the model an image showing a part of the sending user's face within the partial area R, and complements the partial area based on the input image and the output result from the model. In this way, the partial area R is complemented.
  • With this configuration, the part of the sending user's face within the partial area R is complemented using an actual image of that part, so a complementary face image closer to the user's original face can be obtained.
  • Moreover, since the sending user's line of sight, expression, and the like are reflected in the complementary face image, the part of the sending user's face is complemented more naturally.
  • FIG. 5 is a block diagram showing an example of the functional configuration of an online dialogue support system 1B(1) according to the second embodiment.
  • The online dialogue support system 1B differs from the online dialogue support system 1A in that it comprises a sending user terminal 10B (10) and a receiving user terminal 20B (20) instead of the sending user terminal 10A and the receiving user terminal 20A.
  • In the second embodiment, the complementary face image is generated on the sending user terminal 10B, and authentication is performed there based on the reference face image and the complementary face image.
  • The sending user terminal 10B differs from the sending user terminal 10A in that it has the generation unit 22 and the authentication unit 23, and has a transmission unit 12B instead of the transmission unit 12.
  • The receiving user terminal 20B differs from the receiving user terminal 20A in that it does not have the generation unit 22 or the authentication unit 23, and has a reception unit 21B instead of the reception unit 21.
  • The transmission unit 12B transmits the hidden face image or the complementary face image to the receiving user terminal 20B according to the authentication result of the authentication unit 23.
  • Specifically, when the authentication by the authentication unit 23 succeeds, the transmission unit 12B transmits the complementary face image to the receiving user terminal 20B; when the authentication fails, the transmission unit 12B transmits the hidden face image, as in the sketch below.
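  • A sketch of this sender-side branching, reusing the hypothetical helpers from the first embodiment (the channel API is an assumption):

```python
# Hypothetical sketch of transmission unit 12B: in the second embodiment,
# generation and authentication run on the sending user terminal 10B.
def send_frame(channel, hidden_frame, reference_face, embed, model, hmd_mask):
    comp_frame = complement_face(hidden_frame, hmd_mask, model)
    if authenticate(reference_face, comp_frame, embed):
        channel.send("frame", comp_frame)    # success: send complementary image
    else:
        channel.send("frame", hidden_frame)  # failure: send hidden image as-is
```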
  • The reception unit 21B receives the hidden face image or the complementary face image from the sending user terminal 10B.
  • In the second embodiment, the imaging unit 11 functions as the acquisition unit that acquires the hidden face image.
  • The transmission unit 12B, which transmits the hidden face image or the complementary face image to the receiving user terminal 20B according to the authentication result, thus substantially causes the receiving user terminal 20B to display the complementary face image when the authentication succeeds.
  • The storage unit 30 may be a device separate from the sending user terminal 10B, or may be a component of the sending user terminal 10B.
  • FIG. 6 is a sequence diagram showing the operation of the online dialogue support system 1B as processing flow S2.
  • As a premise, the storage unit 30 stores in advance the reference face image showing the face of the sending user.
  • In step S21, the imaging unit 11 photographs a hidden face image showing the face of the sending user with a part of the face hidden.
  • For example, the imaging unit 11 photographs the sending user's face with a part of it hidden by the head-mounted display D worn by the sending user.
  • The imaging unit 11 photographs the hidden face image in a moving image format.
  • The imaging unit 11 may store the hidden face image in the storage unit 30.
  • In step S22, the generation unit 22 complements a partial area of the hidden face image to generate a complementary face image.
  • The process of step S22 differs from step S14 in FIG. 4 only in that it is performed on the sending user terminal 10B.
  • In step S23, the authentication unit 23 performs authentication based on the reference face image and the complementary face image.
  • The process of step S23 differs from step S15 in FIG. 4 only in that it is performed on the sending user terminal 10B.
  • In step S24, the transmission unit 12B transmits the hidden face image or the complementary face image to the receiving user terminal 20B.
  • For example, the transmission unit 12B transmits the hidden face image or the complementary face image in moving image format to the receiving user terminal 20B in real time. When the authentication succeeds in step S23, the transmission unit 12B transmits the complementary face image; when the authentication fails, it transmits the hidden face image.
  • In step S25, the reception unit 21B acquires the hidden face image or the complementary face image by receiving it from the sending user terminal 10B.
  • In step S26, the display control unit 24 controls the display of the sending user's face image on the receiving user terminal 20B.
  • Specifically, the display control unit 24 causes the receiving user terminal 20B to display the hidden face image or the complementary face image acquired in step S25.
  • The display control unit 24 may display the sending user's complementary face image or hidden face image on the head-mounted display D worn by the receiving user.
  • According to the online dialogue support system 1B, the same effects as with the online dialogue support system 1A are achieved. That is, the receiving user can confirm that the sending user is genuine by confirming that the complementary face image is displayed on the receiving user terminal 20B. Spoofing by the sending user can therefore be prevented.
  • Moreover, in the online dialogue support system 1B, since the authentication processing is executed on the sending user terminal 10B side, the processing load on the receiving user terminal 20B can be suppressed.
  • FIG. 7 is a block diagram showing an example of the functional configuration of an online dialogue support system 1C(1) according to the third embodiment.
  • The online dialogue support system 1C differs from the online dialogue support system 1A in that it includes a sending user terminal 10C (10) and a receiving user terminal 20C (20) instead of the sending user terminal 10A and the receiving user terminal 20A, and further includes a server 40.
  • In the third embodiment, the complementary face image is generated on the server 40, and authentication is performed there based on the reference face image and the complementary face image.
  • The sending user terminal 10C differs from the sending user terminal 10A in that it has a transmission unit 12C instead of the transmission unit 12.
  • The transmission unit 12C transmits the hidden face image to the server 40.
  • The receiving user terminal 20C differs from the receiving user terminal 20A in that it does not have the generation unit 22 or the authentication unit 23, and has a reception unit 21C instead of the reception unit 21.
  • The reception unit 21C receives the hidden face image or the complementary face image from the server 40.
  • The server 40 has an image reception unit 41 (acquisition unit), the generation unit 22, the authentication unit 23, and an image transmission unit 42.
  • The image reception unit 41 functions as an acquisition unit that acquires the reference face image and the hidden face image by receiving them from the sending user terminal 10C.
  • The image transmission unit 42 transmits the hidden face image or the complementary face image to the receiving user terminal 20C according to the authentication result of the authentication unit 23. Specifically, when the authentication by the authentication unit 23 succeeds, the image transmission unit 42 transmits the complementary face image to the receiving user terminal 20C; when the authentication fails, it transmits the hidden face image.
  • In this sense, the image transmission unit 42 substantially causes the receiving user terminal 20C to display the complementary face image when the authentication succeeds, and the hidden face image when the authentication fails.
  • The storage unit 30 may be a device separate from the server 40, or may be a component of the server 40.
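  • Before turning to the sequence, the server-side mediation can be sketched as follows, again with the hypothetical helpers and an assumed channel API.

```python
# Hypothetical sketch of server 40: complement and authenticate centrally,
# then forward the appropriate image to the receiving user terminal 20C.
def relay(sender_ch, receiver_ch, reference_face, embed, model, hmd_mask):
    for hidden_frame in sender_ch.receive_frames():
        comp_frame = complement_face(hidden_frame, hmd_mask, model)
        if authenticate(reference_face, comp_frame, embed):
            receiver_ch.send("frame", comp_frame)    # success
        else:
            receiver_ch.send("frame", hidden_frame)  # failure
```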
  • FIG. 8 is a sequence diagram showing the operation of the online dialogue support system 1C as processing flow S3.
  • As a premise, the storage unit 30 stores in advance the reference face image showing the face of the sending user.
  • Step S31 is the same as step S11 in FIG. 4.
  • In step S32, the transmission unit 12C transmits the hidden face image acquired by the imaging unit 11 to the server 40.
  • For example, the transmission unit 12C transmits the hidden face image in moving image format to the server 40 in real time.
  • In step S33, the image reception unit 41 acquires the hidden face image by receiving it from the sending user terminal 10C.
  • The image reception unit 41 may store the hidden face image in the storage unit 30.
  • In step S34, the generation unit 22 complements a partial area of the hidden face image to generate a complementary face image.
  • The process of step S34 differs from step S14 in FIG. 4 only in that it is performed on the server 40.
  • In step S35, the authentication unit 23 performs authentication based on the reference face image and the complementary face image.
  • The process of step S35 differs from step S15 in FIG. 4 only in that it is performed on the server 40.
  • In step S36, the image transmission unit 42 transmits the hidden face image or the complementary face image to the receiving user terminal 20C.
  • For example, the image transmission unit 42 transmits the hidden face image or the complementary face image in moving image format to the receiving user terminal 20C in real time. When the authentication succeeds in step S35, the image transmission unit 42 transmits the complementary face image; when the authentication fails, it transmits the hidden face image.
  • In step S37, the reception unit 21C acquires the hidden face image or the complementary face image by receiving it from the server 40.
  • In step S38, the display control unit 24 controls the display of the sending user's face image on the receiving user terminal 20C.
  • Specifically, the display control unit 24 causes the receiving user terminal 20C to display the hidden face image or the complementary face image acquired in step S37.
  • The display control unit 24 may display the sending user's complementary face image or hidden face image on the head-mounted display D worn by the receiving user.
  • According to the online dialogue support system 1C, the same effects as with the online dialogue support system 1A are achieved. That is, the receiving user can confirm that the sending user is genuine by confirming that the complementary face image is displayed on the receiving user terminal 20C. Spoofing by the sending user can therefore be prevented. Further, according to the online dialogue support system 1C, the processing load on the receiving user terminal 20C can be suppressed. Furthermore, when many people conduct an online dialogue (that is, between the terminals of a plurality of users), time synchronization can be performed easily regardless of the equipment, performance, and the like of the sending user terminals 10C.
  • In the above embodiments, the hidden face image is in moving image format, but it may instead be a still image. Hidden face images may also be acquired separately, as a still image used for face authentication and as a moving image after face authentication.
  • The hidden face image may also be an image in which a partial area is hidden by, for example, a mosaic.
  • In the above embodiments, the partial area R is identified based on the shape of the head-mounted display D stored in advance, but the partial area R may instead be identified by receiving an input operation or the like that designates it.
  • In the above embodiments, an example in which the sending user's hidden face image is displayed when the authentication fails has been described, but the online dialogue may instead be terminated when the authentication fails.
  • Each functional block may be implemented using one physically or logically coupled device, or using two or more physically or logically separate devices connected directly or indirectly (for example, by wire or wirelessly).
  • A functional block may also be implemented by combining software with the one device or the plurality of devices.
  • Functions include judging, determining, calculating, computing, processing, deriving, investigating, searching, confirming, receiving, transmitting, outputting, accessing, resolving, selecting, choosing, establishing, comparing, assuming, expecting, regarding, broadcasting, notifying, communicating, forwarding, configuring, reconfiguring, allocating, mapping, assigning, and the like, but are not limited to these.
  • For example, the sending user terminal 10, the receiving user terminal 20, and the server 40 may each function as a computer that performs the information processing method of the present disclosure.
  • FIG. 9 is a diagram showing an example of a hardware configuration common to the transmitting user terminal 10, the receiving user terminal 20, and the server 40 according to an embodiment of the present disclosure.
  • Each of the sending user terminal 10, the receiving user terminal 20, and the server 40 may be physically configured as a computer device including a processor 1001, a memory 1002, a storage 1003, a communication device 1004, an input device 1005, an output device 1006, a bus 1007, and the like.
  • In the following description, the term "apparatus" can be read as a circuit, a device, a unit, or the like.
  • The hardware configuration of the sending user terminal 10, the receiving user terminal 20, and the server 40 may include one or more of each of the devices shown in FIG. 9, or may be configured without including some of the devices.
  • Each function of the sending user terminal 10, the receiving user terminal 20, and the server 40 is realized by loading predetermined software (a program) onto hardware such as the processor 1001 and the memory 1002, whereby the processor 1001 performs computation, controls communication by the communication device 1004, and controls at least one of reading and writing of data in the memory 1002 and the storage 1003.
  • The processor 1001, for example, operates an operating system and controls the entire computer.
  • The processor 1001 may be configured by a central processing unit (CPU) including an interface with peripheral devices, a control device, an arithmetic device, registers, and the like.
  • The processor 1001 reads programs (program codes), software modules, data, and the like from at least one of the storage 1003 and the communication device 1004 into the memory 1002, and executes various processes according to them.
  • As the program, a program that causes a computer to execute at least part of the operations described in the above embodiments is used.
  • For example, each functional unit (such as the generation unit 22) of the sending user terminal 10, the receiving user terminal 20, and the server 40 may be realized by a control program that is stored in the memory 1002 and operates on the processor 1001. Other functional blocks may be implemented similarly.
  • The processor 1001 may be implemented by one or more chips.
  • The program may be transmitted from a network via an electric telecommunication line.
  • The memory 1002 is a computer-readable recording medium, and may be composed of at least one of, for example, ROM (Read Only Memory), EPROM (Erasable Programmable ROM), EEPROM (Electrically Erasable Programmable ROM), and RAM (Random Access Memory).
  • The memory 1002 may also be called a register, a cache, a main memory (main storage device), or the like.
  • The memory 1002 can store executable programs (program codes), software modules, and the like for implementing an information processing method according to an embodiment of the present disclosure.
  • The storage 1003 is a computer-readable recording medium, and may be composed of at least one of, for example, an optical disc such as a CD-ROM (Compact Disc ROM), a hard disk drive, a flexible disc, a magneto-optical disc (for example, a compact disc, a digital versatile disc, or a Blu-ray disc), a smart card, a flash memory (for example, a card, a stick, or a key drive), a floppy disk, a magnetic strip, and the like.
  • The storage 1003 may also be called an auxiliary storage device.
  • The recording medium described above may be, for example, a database, a server, or another suitable medium including at least one of the memory 1002 and the storage 1003.
  • The communication device 1004 is hardware (a transmitting/receiving device) for communicating between computers via at least one of a wired network and a wireless network, and is also called, for example, a network device, a network controller, a network card, or a communication module.
  • The input device 1005 is an input device (for example, a keyboard, a mouse, a microphone, a switch, a button, or a sensor) that receives input from the outside.
  • The output device 1006 is an output device (for example, a display, a speaker, or an LED lamp) that performs output to the outside. The input device 1005 and the output device 1006 may be integrated (for example, as a touch panel).
  • The devices such as the processor 1001 and the memory 1002 are connected by a bus 1007 for communicating information.
  • The bus 1007 may be configured using a single bus, or may be configured using different buses between the devices.
  • The sending user terminal 10, the receiving user terminal 20, and the server 40 may be configured to include hardware such as a microprocessor, a digital signal processor (DSP), an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), or an FPGA (Field Programmable Gate Array), and part or all of each functional block may be realized by that hardware.
  • For example, the processor 1001 may be implemented using at least one of these pieces of hardware.
  • Input/output information and the like may be stored in a specific location (for example, a memory) or may be managed using a management table. Input/output information and the like can be overwritten, updated, or appended. Output information and the like may be deleted. Input information and the like may be transmitted to another device.
  • A determination may be made by a value represented by one bit (0 or 1), by a Boolean value (true or false), or by a comparison of numerical values (for example, a comparison with a predetermined value).
  • Notification of predetermined information is not limited to being performed explicitly, and may be performed implicitly (for example, by not notifying the predetermined information).
  • Software, whether referred to as software, firmware, middleware, microcode, hardware description language, or by any other name, should be interpreted broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, and the like.
  • Software, instructions, information, and the like may also be transmitted and received via a transmission medium.
  • For example, when software is transmitted from a website, a server, or another remote source using at least one of wired technology (such as coaxial cable, optical fiber cable, twisted pair, or digital subscriber line (DSL)) and wireless technology (such as infrared or microwave), at least one of these wired and wireless technologies is included within the definition of a transmission medium.
  • Data, instructions, commands, information, signals, bits, symbols, chips, and the like mentioned in the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or magnetic particles, optical fields or photons, or any combination of these.
  • Information, parameters, and the like described in the present disclosure may be expressed using absolute values, using relative values from a predetermined value, or using other corresponding information.
  • Any reference to elements using designations such as "first" and "second" as used in the present disclosure does not generally limit the quantity or order of those elements. These designations may be used in the present disclosure as a convenient way of distinguishing between two or more elements. Thus, a reference to first and second elements does not mean that only two elements can be employed, or that the first element must precede the second element in some way.
  • "A and B are different" may mean "A and B are different from each other."
  • The term may also mean that "A and B are each different from C."
  • Terms such as "separated" and "coupled" may be interpreted in the same manner as "different."

Abstract

This online dialogue support system 1, which supports an online dialogue between a sending user terminal 10 and a receiving user terminal 20, is provided with: a storage unit 30 which stores a reference face image showing the face of the sending user; a reception unit 21 which acquires a hidden face image showing the face of the sending user with a partial area of the face hidden; a generation unit 22 which complements the partial area of the hidden face image and generates a complementary face image; an authentication unit 23 which performs authentication based on the reference face image and the complementary face image; and a display control unit 24 which, when the authentication succeeds, displays the complementary face image on the receiving user terminal 20.

Description

Online dialogue support system
One aspect of the present invention relates to an online dialogue support system.

Patent Document 1 discloses a device that creates a mask pattern of the HMD region to be replaced from a moving face image of a user wearing a head-mounted display (HMD), performs replacement using the region corresponding to the mask pattern in a still face image taken without the HMD, and thereby synthesizes a moving face image without the HMD.

Patent Document 1: JP-A-11-096366

For example, when the mechanism described in Patent Document 1 is applied to a system in which a plurality of users conduct an online dialogue, the face image of a certain first user without an HMD can be displayed on the terminals of other users, which is expected to promote communication between users. However, if the user wearing the HMD is not actually the first user (for example, if a user other than the first user is trying to participate in the online dialogue by impersonating the first user), displaying the first user's face image on the other users' terminals would encourage spoofing.
 そこで、本発明の一側面は、オンライン対話において、顔の一部の領域が隠されたユーザのなりすましを防止可能なオンライン対話支援システムを提供することを目的とする。 Therefore, it is an object of one aspect of the present invention to provide an online dialogue support system capable of preventing spoofing of a user whose face is partially hidden during online dialogue.
 本発明の一側面に係るオンライン対話支援システムは、送信側ユーザの端末と受信側ユーザの端末との間のオンライン対話を支援するオンライン対話支援システムであって、送信側ユーザの顔を示す基準顔画像を記憶する記憶部と、送信側ユーザの顔の一部の領域が隠された状態の送信側ユーザの顔を示す隠れ顔画像を取得する取得部と、隠れ顔画像の一部の領域を補完し、補完顔画像を生成する生成部と、基準顔画像及び補完顔画像に基づく認証を実行する認証部と、認証が成功した場合に、受信側ユーザの端末に補完顔画像を表示させる表示制御部と、を備える。 An online dialogue support system according to one aspect of the present invention is an online dialogue support system that supports an online dialogue between a terminal of a transmitting user and a terminal of a receiving user, and comprises a reference face indicating the face of the transmitting user. a storage unit that stores an image; an acquisition unit that acquires a hidden face image showing a face of a sending user whose face is partially hidden; A generation unit that complements and generates a complementary face image, an authentication unit that performs authentication based on the reference face image and the complementary face image, and a display that displays the complementary face image on the terminal of the receiving user when the authentication is successful. and a control unit.
 本発明の一側面に係るオンライン対話支援システムにおいては、送信側ユーザの顔の一部の領域が隠された隠れ顔画像から、該一部の領域を補完した補完顔画像が生成される。そして、送信側ユーザの本来の(完全な)顔画像である基準顔画像、及び補完顔画像に基づく認証が実行される。認証が成功した場合に、受信側ユーザの端末に補完顔画像が表示される。上記オンライン対話支援システムによれば、受信側ユーザは、補完顔画像が受信側ユーザの端末に表示されていることを確認することによって、送信側ユーザが本人であることを確認することが可能となる。よって、送信側ユーザのなりすましを防止することができる。 In the online dialogue support system according to one aspect of the present invention, from a hidden face image in which a partial area of the sender's face is hidden, a complementary face image is generated by interpolating the partial area. Authentication is then performed based on the reference face image, which is the original (complete) face image of the sending user, and the complementary face image. If the authentication is successful, the complementary facial image is displayed on the terminal of the receiving user. According to the online dialogue support system, the receiving user can confirm that the transmitting user is the person himself/herself by confirming that the complementary face image is displayed on the receiving user's terminal. Become. Therefore, it is possible to prevent spoofing of the user on the sending side.
 本発明の一側面によれば、オンライン対話において、顔の一部の領域が隠されたユーザのなりすましを防止可能なオンライン対話支援システムを提供することができる。 According to one aspect of the present invention, it is possible to provide an online dialogue support system capable of preventing spoofing of a user whose face is partially hidden in online dialogue.
[Brief Description of Drawings]
FIG. 1 is a diagram showing an overview of an online dialogue support system according to one embodiment.
FIG. 2 is a block diagram showing an example of the functional configuration of the online dialogue support system.
FIG. 3 is a diagram schematically showing the face image completion process.
FIG. 4 is a sequence diagram showing an example of the operation of the online dialogue support system according to the first embodiment.
FIG. 5 is a block diagram showing an example of the functional configuration of an online dialogue support system according to the second embodiment.
FIG. 6 is a sequence diagram showing an example of the operation of the online dialogue support system according to the second embodiment.
FIG. 7 is a block diagram showing an example of the functional configuration of an online dialogue support system according to the third embodiment.
FIG. 8 is a sequence diagram showing an example of the operation of the online dialogue support system according to the third embodiment.
FIG. 9 is a diagram showing an example of a hardware configuration related to the online dialogue support system.
Hereinafter, an embodiment of the present invention will be described in detail with reference to the accompanying drawings. In the description of the drawings, identical or equivalent elements are given the same reference signs, and duplicate descriptions are omitted.
FIG. 1 is a diagram showing an overview of an online dialogue support system 1 according to one embodiment. The online dialogue support system 1 is a computer system that supports online dialogue between the terminals of a plurality of users. In the online dialogue support system 1, a face image showing a user's face is captured, and the face image is transmitted and received. The face image may be live-action 3D data and may show the user's whole body. In this embodiment, a user who transmits his or her own face image is called a sending user, and a user who receives the sending user's face image is called a receiving user. These roles are not fixed for each user: a sending user becomes a receiving user when receiving another user's face image, and a receiving user becomes a sending user when transmitting his or her own face image to another user.
The online dialogue support system 1 includes a sending user terminal 10 and a receiving user terminal 20. The sending user terminal 10 and the receiving user terminal 20 are communicably connected via a communication network N. The configuration of the communication network N is not limited; for example, it may include the Internet or an intranet. Although FIG. 1 shows one sending user terminal 10 and one receiving user terminal 20, the numbers are not limited to this. For example, the online dialogue support system 1 may include a plurality of sending user terminals 10 and a plurality of receiving user terminals 20. That is, the online dialogue support system 1 can be applied as a system for online dialogue among many people.
The sending user terminal 10 is a terminal used by the sending user. Its type and configuration are not limited. The sending user terminal 10 may be, for example, a mobile terminal such as a high-performance mobile phone (smartphone), a tablet terminal, a wearable terminal, a laptop personal computer, or a mobile phone, or it may be a stationary terminal such as a desktop personal computer. The sending user terminal 10 may be a user terminal carried by each sending user as described above, a server device configured to communicate with each sending user's terminal, or a combination of a user terminal and a server device. That is, the sending user terminal 10 may be configured as a single computer device or as a plurality of computer devices that can communicate with one another.
The receiving user terminal 20 is a terminal used by the receiving user. Its type and configuration are the same as those of the sending user terminal 10. As described above, a receiving user can become a sending user and vice versa. Accordingly, when a receiving user becomes a sending user, the receiving user terminal 20 functions as the sending user terminal 10, and when a sending user becomes a receiving user, the sending user terminal 10 functions as the receiving user terminal 20.
The sending user wears a head-mounted display D on his or her head. The head-mounted display D is not limited to any particular form; it may take various forms such as a goggle type, a glasses type, or a hat type. The head-mounted display D is, for example, smart glasses such as XR (eXtended Reality) glasses. In one example, the head-mounted display D is AR glasses having a function of providing augmented reality (AR) to the user, that is, see-through glasses configured so that the user can view the real space (the outside world) together with a virtual space. However, the head-mounted display D is not limited to this, and may be an MR device such as MR glasses having a function of providing mixed reality (MR) to the user, or a VR device such as VR glasses having a function of providing virtual reality (VR) to the user.
As an example, Volumetric Video (or Volumetric Capture) technology can be applied to the online dialogue support system 1. This technology photographs a subject from all directions using a plurality of cameras or the like, and creates 3D content that reproduces the subject's appearance, shape, movements, and so on (hereinafter, "movements, etc.") with high accuracy. The online dialogue support system 1 to which this technology is applied reproduces the 3D movements, etc. of a plurality of users in real time in the same virtual space, thereby providing each user with the experience of conversing in the same space. To enjoy this experience, the sending user and the receiving user participate in the online dialogue while wearing the head-mounted displays D.
[First Embodiment]
FIG. 2 is a block diagram showing an example of the functional configuration of an online dialogue support system 1A (1) according to the first embodiment. The online dialogue support system 1A includes a sending user terminal 10A (10) and a receiving user terminal 20A (20). In the first embodiment, the main functions of the online dialogue support system are executed by the receiving user terminal 20A; that is, the receiving user terminal 20A can be regarded as constituting the online dialogue support system by itself. The sending user terminal 10A includes an imaging unit 11 and a transmission unit 12.
The imaging unit 11 captures the sending user's face to obtain a face image. For example, the imaging unit 11 captures a reference face image showing the sending user's face. The reference face image is the sending user's original (complete) face image, captured with the face unobstructed. For example, the reference face image is an image of the sending user's entire face captured while the face is not hidden by the head-mounted display D (that is, before the sending user puts on the head-mounted display D). For example, the imaging unit 11 captures the reference face image as a still image.
The imaging unit 11 also captures a hidden face image showing the sending user's face with a partial region of the face hidden. The hidden face image is an incomplete face image of the sending user, captured while part of the face is hidden by an object located between the imaging unit 11 and the sending user's face. For example, the hidden face image is an image of the sending user's entire face captured while a partial region of the face is hidden by the head-mounted display D (that is, after the sending user has put on the head-mounted display D). In other words, the hidden face image shows the sending user's face together with the head-mounted display D that hides a partial region of the face. For example, the imaging unit 11 captures the hidden face image in a moving-image format.
The transmission unit 12 transmits the reference face image and the hidden face image acquired by the imaging unit 11 to the receiving user terminal 20A. For example, the transmission unit 12 transmits the reference face image to the receiving user terminal 20A before the online dialogue starts, and transmits the hidden face image to the receiving user terminal 20A after the online dialogue has started.
The receiving user terminal 20A includes a reception unit 21 (acquisition unit), a generation unit 22, an authentication unit 23, and a display control unit 24.
The reception unit 21 functions as an acquisition unit that acquires the reference face image and the hidden face image by receiving them from the sending user terminal 10A. The reception unit 21 stores the reference face image in a storage unit 30 described later. Note that the sending user's reference face image may be stored (registered) in the storage unit 30 directly from the sending user terminal 10 without passing through the receiving user terminal 20A.
The generation unit 22 fills in the partial region of the hidden face image to generate a completed face image. The completed face image is a face image showing the sending user's face with the partial region of the hidden face image no longer hidden. The generation unit 22 generates the completed face image in a moving-image format. The partial region of the hidden face image is hidden by, for example, the head-mounted display D worn by the sending user. The generation unit 22 fills in the partial region by replacing the portion of the hidden face image corresponding to the head-mounted display D with another image. For example, the generation unit 22 replaces the portion of the hidden face image corresponding to the head-mounted display D with an image representing that part of the sending user's face, thereby reproducing the sending user's face as it appears without the head-mounted display D. The face image completion process is described later.
The authentication unit 23 performs authentication based on the reference face image and the completed face image, using a known face authentication method. In one example, the authentication unit 23 extracts feature points, face regions, and the like from each of the reference face image and the completed face image, and calculates the similarity between the two images by comparing the extracted values. The authentication unit 23 determines that the authentication has succeeded if the similarity is equal to or greater than a predetermined threshold, and that it has failed if the similarity is less than the threshold.
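The patent does not specify this similarity check in code; the following is a minimal sketch of one way it could look, assuming a hypothetical extract_features callable (landmark vectors, deep embeddings, or the like) and an assumed threshold value:

```python
import numpy as np

# Assumed value; the text only says "a predetermined threshold".
SIMILARITY_THRESHOLD = 0.8

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def authenticate(reference_face: np.ndarray, completed_face: np.ndarray,
                 extract_features) -> bool:
    """Return True if the completed face is judged to match the reference face.

    extract_features is a hypothetical face-feature extractor; the patent
    leaves the concrete face authentication method open ("a known method").
    """
    similarity = cosine_similarity(extract_features(reference_face),
                                   extract_features(completed_face))
    return similarity >= SIMILARITY_THRESHOLD
```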
The display control unit 24 controls the display of the sending user's face image on the receiving user terminal 20A. In response to the authentication result from the authentication unit 23, the display control unit 24 displays the completed face image or the hidden face image on an output device (display device) such as a display included in the receiving user terminal 20A. For example, when the authentication succeeds, the display control unit 24 causes the receiving user terminal 20A to display the completed face image; when the authentication fails, it causes the receiving user terminal 20A to display the hidden face image.
The storage unit 30 stores various data used or generated in the receiving user terminal 20A. For example, the storage unit 30 stores the reference face image showing the sending user's face. The storage unit 30 may store data such as the feature points of the reference face image used for authentication by the authentication unit 23, and may store at least one of the hidden face image acquired by the reception unit 21 and the completed face image generated by the generation unit 22. The storage unit 30 may also store the shape of the head-mounted display D. The storage unit 30 may be a device separate from the receiving user terminal 20A or a component of the receiving user terminal 20A.
The face image completion process will be described with reference to FIG. 3, which schematically shows the process. Although an example of the reference face image is illustrated on the receiving user terminal 20A side, the reference face image is not used for completing the face image.
The sending user terminal 10A captures a hidden face image and transmits it to the receiving user terminal 20A. In this hidden face image, part of the sending user's face is hidden by the head-mounted display D worn by the sending user. Specifically, the region around the sending user's eyes is hidden by the lenses, frame, bridge, and so on of the head-mounted display D.
The generation unit 22 identifies, from the hidden face image, the partial region R to be filled in. For example, the generation unit 22 reads the shape of the head-mounted display D stored in advance in the storage unit 30 and identifies the region of the hidden face image corresponding to the head-mounted display D based on that shape, thereby identifying the partial region R.
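As an illustration only, region R could be represented as a boolean mask derived from the stored display shape. The sketch below assumes the shape is stored as a rectangle relative to a detected face box; the patent does not fix any particular representation:

```python
import numpy as np

def region_r_mask(image_shape: tuple, face_box: tuple,
                  hmd_shape: tuple) -> np.ndarray:
    """Build a boolean mask for region R from a stored HMD shape.

    face_box:  (x, y, w, h) of the detected face, in pixels.
    hmd_shape: (rx, ry, rw, rh) of the display relative to the face box,
               in [0, 1] coordinates -- an assumed storage format.
    """
    mask = np.zeros(image_shape[:2], dtype=bool)
    x, y, w, h = face_box
    rx, ry, rw, rh = hmd_shape
    x0, y0 = int(x + rx * w), int(y + ry * h)
    mask[y0:int(y0 + rh * h), x0:int(x0 + rw * w)] = True
    return mask
```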
The generation unit 22 generates the completed face image by filling in the partial region R of the hidden face image. The filling-in of the partial region R may be performed by machine learning. For example, a model is prepared in advance by machine learning that uses face images of the sending user (so-called positive examples) together with face images of a plurality of users other than the sending user (so-called negative examples) as training data; the model is configured to take as input an image showing part of the sending user's face and to output an estimate of the other part of the face. The face images of the sending user are, for example, images showing the sending user's entire face captured from various angles, and the face images of the other users are likewise whole-face images of each user captured from various angles. Machine learning with training data containing both positive and negative examples yields a model configured as follows: when given an image showing part of the genuine sending user's face (for example, a part including the mouth that is not hidden by the head-mounted display D), the model outputs an estimate of the other part (for example, an image including the part hidden by the head-mounted display D) that reflects the genuine sending user's features, whereas when given an image showing part of the face of a different user (for example, a user attempting to impersonate the genuine sending user), it outputs an estimate that does not reflect the genuine sending user's features. The generation unit 22 inputs the region of the hidden face image excluding the partial region R to this model, and fills in the partial region R based on the model's output.
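A hedged sketch of this completion step, treating the trained model as an opaque inpaint_model callable (the patent describes the training data but not a concrete architecture or interface):

```python
import numpy as np

def complete_face(hidden_face: np.ndarray, region_mask: np.ndarray,
                  inpaint_model) -> np.ndarray:
    """Fill in region R of a hidden face image using a trained model.

    inpaint_model is a placeholder for the trained model described in the
    text: it receives the image with region R blanked out plus the mask,
    and returns an estimate of the pixels in region R.
    """
    visible_only = hidden_face.copy()
    visible_only[region_mask] = 0          # keep only the unhidden part
    estimate = inpaint_model(visible_only, region_mask)
    completed = hidden_face.copy()
    completed[region_mask] = estimate[region_mask]
    return completed
```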
In another example of filling in the partial region R, a model is prepared in advance by machine learning that uses face images of the sending user together with face images of a plurality of other users as training data; the model is configured to take as input an image of a first range showing part of the partial region R and to output an estimate of a second range showing a different part of the partial region R. The image of the first range may be acquired by the head-mounted display D. For example, the head-mounted display D may have a camera on the inner (user-facing) side of the bridge of the glasses, and it captures images of the sending user's eyes (for example, a moving image including both eyes as subjects) while the sending user is wearing it. The head-mounted display D may capture the eye images with two cameras, one for each eye (for example, two cameras arranged inside the respective lenses). The generation unit 22 then inputs an image showing part of the sending user's face within the partial region R (for example, the image of the sending user's eyes) to the model, and fills in the partial region based on the input image and the model's output.
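A sketch of this variant under the same caveats; second_range_model is a placeholder for the model that estimates the second range from the first range (the eye images), and the compositing step is an assumption about how the input image and the model output might be combined:

```python
import numpy as np

def complete_face_with_eye_images(hidden_face: np.ndarray,
                                  region_mask: np.ndarray,
                                  eye_images: np.ndarray,
                                  second_range_model) -> np.ndarray:
    """Variant using eye images captured by cameras inside the HMD."""
    # The model estimates the rest of region R from the eye images.
    estimate = second_range_model(eye_images, region_mask)
    completed = hidden_face.copy()
    completed[region_mask] = estimate[region_mask]
    # The text combines the input eye images with the model output, so the
    # eye images themselves could also be composited into region R here.
    return completed
```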
Through the above processing, the portion of the hidden face image corresponding to the head-mounted display D is replaced with an image representing that part of the sending user's face. As a result, in the partial region R of the completed face image, the sending user's face is reproduced as it appears without the head-mounted display D.
The operation of the online dialogue support system 1A will be described with reference to FIG. 4, a sequence diagram showing the operation as a processing flow S1. The following assumes that the storage unit 30 stores the reference face image showing the sending user's face in advance.
In step S11, the imaging unit 11 captures a hidden face image showing the sending user's face with a partial region hidden. In one example, the imaging unit 11 captures the sending user's face with a partial region hidden by the head-mounted display D worn by the sending user. The imaging unit 11 captures the hidden face image in a moving-image format.
In step S12, the transmission unit 12 transmits the hidden face image acquired by the imaging unit 11 to the receiving user terminal 20A. For example, the transmission unit 12 transmits the hidden face image in the moving-image format to the receiving user terminal 20A in real time.
In step S13, the reception unit 21 acquires the hidden face image by receiving it from the sending user terminal 10A. The reception unit 21 may store the hidden face image in the storage unit 30.
In step S14, the generation unit 22 fills in the partial region of the hidden face image to generate a completed face image. In one example, the generation unit 22 identifies the partial region R (see FIG. 3) by identifying the region of the hidden face image corresponding to the head-mounted display D based on the shape of the head-mounted display D stored in advance in the storage unit 30.
As one example of the completion process, the generation unit 22 uses the model described above, prepared by machine learning with face images of the sending user and of other users as training data and configured to estimate the other part of the sending user's face from an image showing part of it: the generation unit 22 inputs the region of the hidden face image excluding the partial region R to the model and fills in the partial region R based on the model's output. In the other example, the generation unit 22 uses the model configured to estimate, from an image of the first range showing part of the partial region R, the second range showing a different part of the partial region R: it inputs an image showing part of the sending user's face within the partial region R to the model and fills in the partial region based on the input image and the model's output.
In step S15, the authentication unit 23 performs authentication based on the reference face image and the completed face image, using a known face authentication method, and determines whether the authentication succeeds. For the series of frames of the hidden face image in the moving-image format, the authentication unit 23 need not perform authentication for frames after a frame for which authentication has succeeded.
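Putting steps S14 and S15 together for a frame stream, with the optimization of skipping authentication after the first success, might look like the following sketch, which reuses the complete_face and authenticate placeholders from the earlier sketches:

```python
def process_hidden_face_stream(frames, region_mask, inpaint_model,
                               reference_face, extract_features):
    """Yield (completed_frame, authenticated) for each hidden-face frame.

    Once one frame authenticates successfully, later frames skip the
    check, as the text permits.
    """
    authenticated = False
    for frame in frames:
        completed = complete_face(frame, region_mask, inpaint_model)
        if not authenticated:
            authenticated = authenticate(reference_face, completed,
                                         extract_features)
        yield completed, authenticated
```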
In step S16, the display control unit 24 controls the display of the sending user's face image on the receiving user terminal 20A. For example, when the authentication in step S15 succeeds, the display control unit 24 causes the receiving user terminal 20A to display the completed face image; when it fails, the display control unit 24 causes the receiving user terminal 20A to display the hidden face image. The display control unit 24 may display the sending user's completed or hidden face image on the head-mounted display D worn by the receiving user.
According to the online dialogue support system 1A described above, a completed face image in which the partial region R is filled in is generated from the hidden face image in which the partial region R of the sending user's face is hidden. Authentication is then performed based on the completed face image and the reference face image, the sending user's original (complete) face image. When the authentication succeeds, the completed face image is displayed on the receiving user terminal 20A. The receiving user can thus confirm that the sending user is genuine by confirming that the completed face image is displayed on the receiving user terminal 20A, so impersonation of the sending user can be prevented.
When the authentication fails, the display control unit 24 causes the receiving user terminal 20A to display the hidden face image. In this case, the receiving user can detect the possibility that the sending user is an impersonator by confirming that the hidden face image is displayed on the receiving user terminal 20A.
The partial region R is hidden by the head-mounted display D worn by the sending user. The generation unit 22 fills in the partial region R by replacing the portion of the hidden face image corresponding to the head-mounted display D with another image. With this configuration, the sending user appears on the receiving user terminal 20A as if not wearing the head-mounted display D. This enables more natural communication between the sending user and the receiving user while the sending user views the display of the head-mounted display D.
The generation unit 22 identifies the partial region R by identifying the region of the hidden face image corresponding to the head-mounted display D based on the shape of the head-mounted display D stored in advance. In this case, because the head-mounted display D is identified precisely, the accuracy of filling in the partial region R also improves.
The generation unit 22 prepares in advance, by machine learning using face images of the sending user together with face images of other users as training data, a model configured to take an image showing part of the sending user's face as input and to output an estimate of the other part of the face; it inputs the region of the hidden face image excluding the partial region R to the model and fills in the partial region R based on the model's output. With this configuration, the part of the sending user's face is filled in more naturally. Moreover, even when the user moves, the part of the face is filled in following the movement, so the accuracy of filling in the partial region R improves.
The generation unit 22 prepares in advance, by machine learning using face images of the sending user together with face images of other users as training data, a model configured to take as input an image of the first range showing part of the partial region R and to output an estimate of the second range showing a different part of the partial region R; it inputs an image showing part of the sending user's face within the partial region R to the model and fills in the partial region based on the input image and the model's output. With this configuration, the partial region R is filled in by a combination of the image showing part of the sending user's face within the partial region R (in this embodiment, the image of the sending user's eyes) and the model's output. In this case, part of the sending user's face within the partial region R is filled in with an actual image, so a completed face image closer to the user's real face is obtained. In addition, the sending user's gaze, expression, and the like are reflected in the completed face image, so the part of the face is filled in more naturally.
[Second Embodiment]
FIG. 5 is a block diagram showing an example of the functional configuration of an online dialogue support system 1B (1) according to the second embodiment. The online dialogue support system 1B differs from the online dialogue support system 1A in that it includes a sending user terminal 10B (10) and a receiving user terminal 20B (20) instead of the sending user terminal 10A and the receiving user terminal 20A. In the online dialogue support system 1B, the completed face image is generated on the sending user terminal 10B, and authentication based on the reference face image and the completed face image is performed there.
The sending user terminal 10B differs from the sending user terminal 10A in that it includes the generation unit 22 and the authentication unit 23, and includes a transmission unit 12B instead of the transmission unit 12. The receiving user terminal 20B differs from the receiving user terminal 20A in that it does not include the generation unit 22 or the authentication unit 23, and includes a reception unit 21B instead of the reception unit 21. The transmission unit 12B transmits the hidden face image or the completed face image to the receiving user terminal 20B according to the authentication result of the authentication unit 23: when the authentication succeeds, the transmission unit 12B transmits the completed face image, and when it fails, the hidden face image. The reception unit 21B receives the hidden face image or the completed face image from the sending user terminal 10B.
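A minimal sketch of this sender-side flow, reusing the placeholder functions from the earlier sketches; send stands in for the transmission by the transmission unit 12B:

```python
def sender_side_route(hidden_face, region_mask, inpaint_model,
                      reference_face, extract_features, send):
    """Second embodiment: complete, authenticate, and transmit on the
    sending terminal (a sketch, not the patent's literal implementation)."""
    completed = complete_face(hidden_face, region_mask, inpaint_model)
    if authenticate(reference_face, completed, extract_features):
        send(completed)    # success: the receiver shows the completed face
    else:
        send(hidden_face)  # failure: the receiver shows the hidden face
```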
In the online dialogue support system 1B, the imaging unit 11 functions as the acquisition unit that acquires the hidden face image. The transmission unit 12B, which transmits the hidden face image or the completed face image to the receiving user terminal 20B according to the authentication result, effectively functions as the display control unit that causes the receiving user terminal 20B to display the completed face image when the authentication succeeds and the hidden face image when it fails. Accordingly, in the second embodiment, the main functions of the online dialogue support system are executed by the sending user terminal 10B; that is, the sending user terminal 10B can be regarded as constituting the online dialogue support system by itself. Note that the storage unit 30 may be a device separate from the sending user terminal 10B or a component of the sending user terminal 10B.
The operation of the online dialogue support system 1B will be described with reference to FIG. 6, a sequence diagram showing the operation as a processing flow S2. The following assumes that the storage unit 30 stores the reference face image showing the sending user's face in advance.
In step S21, the imaging unit 11 captures a hidden face image showing the sending user's face with a partial region hidden. In one example, the imaging unit 11 captures the sending user's face with a partial region hidden by the head-mounted display D worn by the sending user. The imaging unit 11 captures the hidden face image in a moving-image format, and may store it in the storage unit 30.
In step S22, the generation unit 22 fills in the partial region of the hidden face image to generate a completed face image. This step differs from step S14 in FIG. 4 in that it is performed on the sending user terminal 10B.
In step S23, the authentication unit 23 performs authentication based on the reference face image and the completed face image. This step differs from step S15 in FIG. 4 in that it is performed on the sending user terminal 10B.
In step S24, the transmission unit 12B transmits the hidden face image or the completed face image to the receiving user terminal 20B in the moving-image format in real time. For example, when the authentication in step S23 succeeds, the transmission unit 12B transmits the completed face image; when it fails, the transmission unit 12B transmits the hidden face image.
In step S25, the reception unit 21B acquires the hidden face image or the completed face image by receiving it from the sending user terminal 10B.
In step S26, the display control unit 24 controls the display of the sending user's face image on the receiving user terminal 20B. For example, the display control unit 24 causes the receiving user terminal 20B to display the hidden face image or the completed face image acquired in step S25, and may display it on the head-mounted display D worn by the receiving user.
The online dialogue support system 1B described above provides the same effects as the online dialogue support system 1A: the receiving user can confirm that the sending user is genuine by confirming that the completed face image is displayed on the receiving user terminal 20B, so impersonation of the sending user can be prevented. In addition, because the authentication process is executed on the sending user terminal 10B side, the processing load on the receiving user terminal 20B can be reduced.
[Third Embodiment]
FIG. 7 is a block diagram showing an example of the functional configuration of an online dialogue support system 1C (1) according to the third embodiment. The online dialogue support system 1C differs from the online dialogue support system 1A in that it includes a sending user terminal 10C (10) and a receiving user terminal 20C (20) instead of the sending user terminal 10A and the receiving user terminal 20A, and in that it further includes a server 40. In the online dialogue support system 1C, the completed face image is generated on the server 40, and authentication based on the reference face image and the completed face image is performed there.
The sending user terminal 10C differs from the sending user terminal 10A in that it includes a transmission unit 12C, which transmits the hidden face image to the server 40, instead of the transmission unit 12. The receiving user terminal 20C differs from the receiving user terminal 20A in that it does not include the generation unit 22 or the authentication unit 23, and includes a reception unit 21C, which receives the hidden face image or the completed face image from the server 40, instead of the reception unit 21.
The server 40 includes an image reception unit 41 (acquisition unit), the generation unit 22, the authentication unit 23, and an image transmission unit 42. The image reception unit 41 functions as the acquisition unit that acquires the reference face image and the hidden face image by receiving them from the sending user terminal 10C. The image transmission unit 42 transmits the hidden face image or the completed face image to the receiving user terminal 20C according to the authentication result of the authentication unit 23: when the authentication succeeds, it transmits the completed face image, and when it fails, the hidden face image. That is, the image transmission unit 42 effectively functions as the display control unit that causes the receiving user terminal 20C to display the completed face image when the authentication succeeds and the hidden face image when it fails. Accordingly, in the third embodiment, the main functions of the online dialogue support system are executed by the server 40; that is, the server 40 can be regarded as constituting the online dialogue support system by itself. Note that the storage unit 30 may be a device separate from the server 40 or a component of the server 40.
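A sketch of the server-side flow under the same assumptions as the earlier sketches; storage stands in for the storage unit 30 (here, a mapping from sender to registered reference face image) and send_to_receiver for the image transmission unit 42:

```python
def server_handle_frame(sender_id, hidden_face, region_mask,
                        inpaint_model, storage, extract_features,
                        send_to_receiver):
    """Third embodiment: the server completes, authenticates, and relays.

    sender_id and the dict-like storage interface are assumptions made
    for illustration; the patent does not specify a lookup mechanism.
    """
    reference_face = storage[sender_id]   # registered reference face image
    completed = complete_face(hidden_face, region_mask, inpaint_model)
    if authenticate(reference_face, completed, extract_features):
        send_to_receiver(completed)       # success: show the completed face
    else:
        send_to_receiver(hidden_face)     # failure: show the hidden face
```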
The operation of the online dialogue support system 1C will be described with reference to FIG. 8, a sequence diagram showing the operation as a processing flow S3. The following assumes that the storage unit 30 stores the reference face image showing the sending user's face in advance.
The process of step S31 is the same as that of step S11 in FIG. 4.
In step S32, the transmission unit 12C transmits the hidden face image acquired by the imaging unit 11 to the server 40, for example in the moving-image format in real time.
In step S33, the image reception unit 41 acquires the hidden face image by receiving it from the sending user terminal 10C, and may store it in the storage unit 30.
In step S34, the generation unit 22 fills in the partial region of the hidden face image to generate a completed face image. This step differs from step S14 in FIG. 4 in that it is performed on the server 40.
In step S35, the authentication unit 23 performs authentication based on the reference face image and the completed face image. This step differs from step S15 in FIG. 4 in that it is performed on the server 40.
In step S36, the image transmission unit 42 transmits the hidden face image or the completed face image to the receiving user terminal 20C in the moving-image format in real time. For example, when the authentication in step S35 succeeds, the image transmission unit 42 transmits the completed face image; when it fails, it transmits the hidden face image.
In step S37, the reception unit 21C acquires the hidden face image or the completed face image by receiving it from the server 40.
In step S38, the display control unit 24 controls the display of the sending user's face image on the receiving user terminal 20C. For example, the display control unit 24 causes the receiving user terminal 20C to display the hidden face image or the completed face image acquired in step S37, and may display it on the head-mounted display D worn by the receiving user.
The online dialogue support system 1C described above provides the same effects as the online dialogue support system 1A: the receiving user can confirm that the sending user is genuine by confirming that the completed face image is displayed on the receiving user terminal 20C, so impersonation of the sending user can be prevented. The online dialogue support system 1C also reduces the processing load on the receiving user terminal 20C. Furthermore, when an online dialogue involves many participants (that is, the terminals of a plurality of users), time synchronization can be achieved easily regardless of the equipment and performance of each sending user terminal 10C.
(Modification)
In the above embodiments, a partial region of the sending user's face is hidden by the head-mounted display D, but it may instead be hidden by a mask or the like. Although the hidden face image has been described as being in a moving-image format, it may be a still image when the sending user's face image is displayed as a still image on the receiving user terminal 20. The hidden face image may also be acquired separately as a still image used for face authentication and as a moving image used after the face authentication. The hidden face image may be an image in which a partial region is hidden by, for example, a mosaic. Further, although the partial region R is identified based on the shape of the head-mounted display D stored in advance in the above embodiments, the generation unit 22 may identify the partial region R by accepting an input operation by which the sending user or the receiving user designates the partial region R. Furthermore, although the above embodiments display the sending user's hidden face image when the authentication fails, the display control unit 24 may instead display an error message indicating the authentication failure, or the online dialogue may be terminated.
The block diagrams used in the description of the above embodiments show blocks in functional units. These functional blocks (components) are realized by any combination of at least one of hardware and software, and the method of realizing each functional block is not particularly limited. That is, each functional block may be realized using one physically or logically coupled device, or using two or more physically or logically separate devices connected directly or indirectly (for example, by wire or wirelessly). A functional block may also be realized by combining software with the one device or the plurality of devices.
Functions include, but are not limited to, judging, determining, calculating, computing, processing, deriving, investigating, searching, confirming, receiving, transmitting, outputting, accessing, resolving, selecting, choosing, establishing, comparing, assuming, expecting, regarding, broadcasting, notifying, communicating, forwarding, configuring, reconfiguring, allocating (mapping), and assigning.
For example, the sending user terminal 10, the receiving user terminal 20, and the server 40 according to an embodiment of the present disclosure may function as computers that perform the information processing method of the present disclosure. FIG. 9 is a diagram showing an example of a hardware configuration common to the sending user terminal 10, the receiving user terminal 20, and the server 40 according to an embodiment of the present disclosure. Each of them may be physically configured as a computer device including a processor 1001, a memory 1002, a storage 1003, a communication device 1004, an input device 1005, an output device 1006, a bus 1007, and so on.
 なお、以下の説明では、「装置」という文言は、回路、デバイス、ユニットなどに読み替えることができる。送信側ユーザ端末10、受信側ユーザ端末20及びサーバ40のハードウェア構成は、図9に示した各装置を1つ又は複数含むように構成されてもよいし、一部の装置を含まずに構成されてもよい。 In the following explanation, the term "apparatus" can be read as a circuit, device, unit, or the like. The hardware configuration of the sender user terminal 10, the receiver user terminal 20, and the server 40 may be configured to include one or more of the devices shown in FIG. may be configured.
 送信側ユーザ端末10、受信側ユーザ端末20及びサーバ40における各機能は、プロセッサ1001、メモリ1002などのハードウェア上に所定のソフトウェア(プログラム)を読み込ませることによって、プロセッサ1001が演算を行い、通信装置1004による通信を制御したり、メモリ1002及びストレージ1003におけるデータの読み出し及び書き込みの少なくとも一方を制御したりすることによって実現される。 Each function of the sending user terminal 10, the receiving user terminal 20, and the server 40 is implemented by causing the processor 1001 to perform calculations and communication by loading predetermined software (programs) onto hardware such as the processor 1001 and memory 1002. It is realized by controlling communication by the device 1004 and controlling at least one of data reading and writing in the memory 1002 and the storage 1003 .
The processor 1001 controls the entire computer by, for example, running an operating system. The processor 1001 may be configured as a central processing unit (CPU) including an interface with peripheral devices, a control device, an arithmetic device, registers, and the like.
The processor 1001 also reads programs (program code), software modules, data, and the like from at least one of the storage 1003 and the communication device 1004 into the memory 1002 and executes various processes in accordance with them. A program that causes a computer to execute at least part of the operations described in the above embodiments is used as the program. For example, each functional unit of the sending user terminal 10, the receiving user terminal 20, and the server 40 (for example, the generation unit 22) may be realized by a control program stored in the memory 1002 and running on the processor 1001, and the other functional blocks may be realized in the same way. Although the various processes described above have been explained as being executed by a single processor 1001, they may be executed simultaneously or sequentially by two or more processors 1001. The processor 1001 may be implemented by one or more chips. The program may be transmitted from a network via a telecommunication line.
The memory 1002 is a computer-readable recording medium and may be configured by at least one of, for example, a ROM (Read Only Memory), an EPROM (Erasable Programmable ROM), an EEPROM (Electrically Erasable Programmable ROM), and a RAM (Random Access Memory). The memory 1002 may also be called a register, a cache, a main memory (main storage device), or the like. The memory 1002 can store executable programs (program code), software modules, and the like for implementing the information processing method according to an embodiment of the present disclosure.
The storage 1003 is a computer-readable recording medium and may be configured by at least one of, for example, an optical disc such as a CD-ROM (Compact Disc ROM), a hard disk drive, a flexible disc, a magneto-optical disc (for example, a compact disc, a digital versatile disc, or a Blu-ray (registered trademark) disc), a smart card, a flash memory (for example, a card, a stick, or a key drive), a floppy (registered trademark) disk, and a magnetic strip. The storage 1003 may also be called an auxiliary storage device. The storage medium described above may be, for example, a database, a server, or another suitable medium including at least one of the memory 1002 and the storage 1003.
The communication device 1004 is hardware (a transmitting/receiving device) for communication between computers via at least one of a wired network and a wireless network, and is also called, for example, a network device, a network controller, a network card, or a communication module.
The input device 1005 is an input device that accepts input from the outside (for example, a keyboard, a mouse, a microphone, a switch, a button, or a sensor). The output device 1006 is an output device that performs output to the outside (for example, a display, a speaker, or an LED lamp). The input device 1005 and the output device 1006 may be integrated into a single component (for example, a touch panel).
The devices such as the processor 1001 and the memory 1002 are connected by a bus 1007 for communicating information. The bus 1007 may be configured as a single bus, or different buses may be used between different pairs of devices.
The sending user terminal 10, the receiving user terminal 20, and the server 40 may also include hardware such as a microprocessor, a digital signal processor (DSP), an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), or an FPGA (Field Programmable Gate Array), and part or all of each functional block may be realized by such hardware. For example, the processor 1001 may be implemented using at least one of these pieces of hardware.
Although the present embodiment has been described in detail above, it is obvious to those skilled in the art that the present embodiment is not limited to the embodiments described in this specification. The present embodiment can be carried out with modifications and alterations without departing from the spirit and scope of the present invention as defined by the claims. Accordingly, the description in this specification is for illustrative purposes and has no restrictive meaning with respect to the present embodiment.
The order of the processing procedures, sequences, flowcharts, and the like of the aspects/embodiments described in the present disclosure may be rearranged as long as no contradiction arises. For example, the methods described in the present disclosure present the elements of the various steps in an exemplary order and are not limited to the specific order presented.
Input and output information and the like may be stored in a specific location (for example, a memory) or may be managed using a management table. Input and output information and the like may be overwritten, updated, or appended. Output information and the like may be deleted. Input information and the like may be transmitted to another device.
A determination may be made by a value represented by one bit (0 or 1), by a Boolean value (true or false), or by a numerical comparison (for example, a comparison with a predetermined value).
The aspects/embodiments described in the present disclosure may be used alone, may be used in combination, or may be switched as execution proceeds. Notification of predetermined information (for example, notification of "being X") is not limited to being performed explicitly, and may be performed implicitly (for example, by not performing the notification of the predetermined information).
Software, whether referred to as software, firmware, middleware, microcode, hardware description language, or by any other name, should be interpreted broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executable files, threads of execution, procedures, functions, and the like.
Software, instructions, information, and the like may also be transmitted and received via a transmission medium. For example, when software is transmitted from a website, a server, or another remote source using at least one of a wired technology (coaxial cable, optical fiber cable, twisted pair, digital subscriber line (DSL), or the like) and a wireless technology (infrared, microwave, or the like), at least one of these wired and wireless technologies is included within the definition of a transmission medium.
The information, signals, and the like described in the present disclosure may be represented using any of a variety of different technologies. For example, the data, instructions, commands, information, signals, bits, symbols, chips, and the like that may be referred to throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or magnetic particles, optical fields or photons, or any combination thereof.
The information, parameters, and the like described in the present disclosure may be expressed using absolute values, may be expressed using values relative to a predetermined value, or may be expressed using other corresponding information.
The names used for the parameters described above are not restrictive in any respect. Furthermore, the formulas and the like that use these parameters may differ from those explicitly disclosed in the present disclosure. Since the various information elements can be identified by any suitable names, the various names assigned to these information elements are not restrictive in any respect.
The phrase "based on" as used in the present disclosure does not mean "based only on" unless explicitly stated otherwise. In other words, the phrase "based on" means both "based only on" and "based at least on".
Any reference to elements using designations such as "first" and "second" as used in the present disclosure does not generally limit the quantity or order of those elements. These designations may be used in the present disclosure as a convenient way of distinguishing between two or more elements. Accordingly, a reference to first and second elements does not mean that only two elements may be employed or that the first element must precede the second element in some way.
Where "include", "including", and variations thereof are used in the present disclosure, these terms are intended to be inclusive in the same way as the term "comprising". Furthermore, the term "or" as used in the present disclosure is not intended to be an exclusive or.
In the present disclosure, where articles such as a, an, and the in English have been added by translation, the present disclosure may include the case where a noun following these articles is plural.
In the present disclosure, the phrase "A and B are different" may mean "A and B are different from each other". The phrase may also mean "A and B are each different from C". Terms such as "separated" and "coupled" may also be interpreted in the same way as "different".
1, 1A, 1B, 1C…online dialogue support system; 10, 10A, 10B, 10C…sending user terminal; 11…imaging unit; 12, 12B, 12C…transmission unit; 20, 20A, 20B, 20C…receiving user terminal; 21, 21B, 21C…reception unit; 22…generation unit; 23…authentication unit; 24…display control unit; 30…storage unit; 40…server; 41…image reception unit; 42…image transmission unit; D…head-mounted display; R…partial area.

Claims (6)

1.  An online dialogue support system for supporting online dialogue between a terminal of a sending user and a terminal of a receiving user, the system comprising:
     a storage unit that stores a reference face image representing the face of the sending user;
     an acquisition unit that acquires a hidden face image showing the face of the sending user with a partial area of the face hidden;
     a generation unit that complements the partial area of the hidden face image to generate a complementary face image;
     an authentication unit that performs authentication based on the reference face image and the complementary face image; and
     a display control unit that causes the terminal of the receiving user to display the complementary face image when the authentication is successful.
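Purely as an illustrative reading of claim 1 (and of the failure case in claim 2 below), the following sketch maps the claimed units onto hypothetical Python functions for single-channel (grayscale) images. The mean-fill complement, the cosine-similarity authentication, and every name here are assumptions for illustration, not the claimed method.

    import numpy as np

    # Hypothetical end-to-end sketch of claims 1 and 2.  complement() stands
    # in for the generation unit; a cosine-similarity threshold stands in for
    # whatever face-authentication method the authentication unit uses.

    def complement(hidden_face: np.ndarray, mask: np.ndarray) -> np.ndarray:
        # Placeholder generation unit: fill the hidden area R (mask == True)
        # with the mean of the visible pixels; a real system would use a
        # learned model such as the one described in claim 5.
        filled = hidden_face.astype(np.float64).copy()
        filled[mask] = filled[~mask].mean()
        return filled

    def embed(face: np.ndarray) -> np.ndarray:
        v = face.astype(np.float64).ravel()
        return v / (np.linalg.norm(v) + 1e-12)   # L2-normalised placeholder embedding

    def authenticate(reference: np.ndarray, candidate: np.ndarray,
                     threshold: float = 0.9) -> bool:
        return float(embed(reference) @ embed(candidate)) >= threshold

    def support_dialogue(reference: np.ndarray, hidden: np.ndarray, mask: np.ndarray):
        completed = complement(hidden, mask)      # generation unit
        if authenticate(reference, completed):    # authentication unit
            return "display", completed          # display control unit: success
        return "display_hidden", hidden          # claim 2: show hidden image on failure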
2.  The online dialogue support system according to claim 1, wherein
     the display control unit causes the terminal of the receiving user to display the hidden face image when the authentication fails.
3.  The online dialogue support system according to claim 1 or 2, wherein
     the partial area is hidden by a head-mounted display worn by the sending user, and
     the generation unit complements the partial area by replacing a portion of the hidden face image corresponding to the head-mounted display with another image.
4.  The online dialogue support system according to claim 3, wherein
     the generation unit specifies the partial area by identifying, based on a pre-stored shape of the head-mounted display, the area of the hidden face image corresponding to the head-mounted display.
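As a hedged illustration of claims 3 and 4 only (the claims do not prescribe any particular matching algorithm), the sketch below locates the pre-stored HMD shape in the hidden face image by OpenCV template matching and replaces the matched region with another image; the choice of cv2.matchTemplate is an assumption about one possible implementation.

    import cv2
    import numpy as np

    # Assumed approach: template matching against the pre-stored HMD shape.

    def locate_hmd(hidden_face: np.ndarray, hmd_template: np.ndarray):
        # Identify the partial area R corresponding to the head-mounted display.
        result = cv2.matchTemplate(hidden_face, hmd_template, cv2.TM_CCOEFF_NORMED)
        _, _, _, top_left = cv2.minMaxLoc(result)   # location of the best match
        h, w = hmd_template.shape[:2]
        return top_left[0], top_left[1], w, h       # x, y, width, height

    def replace_hmd_region(hidden_face: np.ndarray, hmd_template: np.ndarray,
                           replacement: np.ndarray) -> np.ndarray:
        # Claim 3: replace the portion corresponding to the HMD with another image.
        x, y, w, h = locate_hmd(hidden_face, hmd_template)
        out = hidden_face.copy()
        out[y:y + h, x:x + w] = cv2.resize(replacement, (w, h))
        return out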
5.  The online dialogue support system according to any one of claims 1 to 4, wherein
     the generation unit prepares in advance a model trained by machine learning, using face images of the sending user together with face images of a plurality of users different from the sending user as teacher data, to receive an image showing part of the face of the sending user and to output an estimate of the other parts of the face, and
     the generation unit inputs the area of the hidden face image excluding the partial area into the model and complements the partial area based on the output from the model.
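The inference step of claim 5 might look like the following hedged sketch, in which model is a stand-in for the trained predictor described in the claim, and all arrays are assumed to be full-frame grayscale images of identical shape.

    import numpy as np

    # Hypothetical claim-5 inference: feed the model everything outside the
    # hidden area R, then keep the model's estimate only inside R.

    def complement_with_model(hidden_face: np.ndarray, mask: np.ndarray, model):
        visible = hidden_face * ~mask                  # zero out the hidden area R
        estimate = model(visible)                      # model estimates the full face
        return np.where(mask, estimate, hidden_face)   # real pixels kept outside R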
6.  The online dialogue support system according to any one of claims 1 to 4, wherein
     the generation unit prepares in advance a model trained by machine learning, using face images of the sending user together with face images of a plurality of users different from the sending user as teacher data, to receive an image of a first range showing part of the partial area and to output an estimate of a second range showing a portion of the partial area different from the first range, and
     the generation unit inputs into the model an image showing part of the face of the sending user within the partial area, and complements the partial area based on the input image and the output from the model.
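Claim 6 differs from claim 5 in that part of the hidden area (the first range, for example an eye image that might be captured inside the HMD) is itself available as model input. A hedged sketch under the same full-frame grayscale assumptions, with all names hypothetical:

    import numpy as np

    # Hypothetical claim-6 inference: the model maps the first range to an
    # estimate of the second range; both are pasted back into the face image.

    def complement_from_first_range(hidden_face: np.ndarray,
                                    first_range_img: np.ndarray,
                                    first_mask: np.ndarray,
                                    second_mask: np.ndarray,
                                    model) -> np.ndarray:
        second_estimate = model(first_range_img)         # estimate the second range
        out = hidden_face.copy()
        out[first_mask] = first_range_img[first_mask]    # real first-range pixels
        out[second_mask] = second_estimate[second_mask]  # estimated second-range pixels
        return out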

PCT/JP2022/030319 2021-09-10 2022-08-08 Online dialogue support system WO2023037812A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2023546844A JPWO2023037812A1 (en) 2021-09-10 2022-08-08

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2021-147550 2021-09-10
JP2021147550 2021-09-10

Publications (1)

Publication Number Publication Date
WO2023037812A1 (en)

Family

ID=85507535

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/030319 WO2023037812A1 (en) 2021-09-10 2022-08-08 Online dialogue support system

Country Status (2)

Country Link
JP (1) JPWO2023037812A1 (en)
WO (1) WO2023037812A1 (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH1196366A (en) * 1997-09-19 1999-04-09 Nippon Telegr & Teleph Corp <Ntt> Method and device for synthesizing facial image of person wearing head mount display
JP2000004395A (en) * 1998-06-15 2000-01-07 Sony Corp Image processor for video camera and head mounted display
JP2007148872A (en) * 2005-11-29 2007-06-14 Mitsubishi Electric Corp Image authentication apparatus
JP2009135705A (en) * 2007-11-29 2009-06-18 Kyocera Corp Portable terminal
JP2015142193A (en) * 2014-01-28 2015-08-03 株式会社リコー transmission terminal and program
JP2020507221A (en) * 2017-02-03 2020-03-05 ベステル エレクトロニク サナイー ベ ティカレト エー.エス. Improved method and system for video conferencing using HMD
CN112597867A (en) * 2020-12-17 2021-04-02 佛山科学技术学院 Face recognition method and system for mask, computer equipment and storage medium
JP2021114324A (en) * 2016-11-11 2021-08-05 マジック リープ, インコーポレイテッドMagic Leap, Inc. Periocular and audio synthesis of full face image

Also Published As

Publication number Publication date
JPWO2023037812A1 (en) 2023-03-16


Legal Events

Date Code Title Description
121   Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 22867134; Country of ref document: EP; Kind code of ref document: A1)
WWE   Wipo information: entry into national phase (Ref document number: 2023546844; Country of ref document: JP)