WO2023037812A1 - Online dialogue support system - Google Patents

Online dialogue support system

Info

Publication number
WO2023037812A1
Authority
WO
WIPO (PCT)
Prior art keywords
face image
user
face
hidden
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2022/030319
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
桃子 阿部
幹生 岩村
洋平 藤本
禎篤 加藤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NTT Docomo Inc
Original Assignee
NTT Docomo Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NTT Docomo Inc
Priority to JP2023546844A (patent JP7588243B2)
Publication of WO2023037812A1

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 Two-dimensional [2D] image generation
    • G06T11/80 Creating or modifying a manually drawn or painted image using a manual input device, e.g. mouse, light pen, direction keys on keyboard
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431 Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/18 Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast

Definitions

  • One aspect of the present invention relates to an online dialogue support system.
  • Patent Document 1 discloses a device that creates a mask pattern of the HMD region to be replaced from a moving face image of a person wearing a head-mounted display (HMD), replaces that region with the corresponding region of a still face image taken without the HMD, and thereby synthesizes a moving face image without the HMD.
  • An object of one aspect of the present invention is to provide an online dialogue support system capable of preventing spoofing of a user whose face is partially hidden during an online dialogue.
  • An online dialogue support system according to one aspect of the present invention supports an online dialogue between a terminal of a sending user and a terminal of a receiving user, and comprises: a storage unit that stores a reference face image showing the face of the sending user; an acquisition unit that acquires a hidden face image showing the face of the sending user with a partial region of the face hidden; a generation unit that generates a complementary face image by complementing the partial region of the hidden face image; an authentication unit that performs authentication based on the reference face image and the complementary face image; and a display control unit that displays the complementary face image on the terminal of the receiving user when the authentication succeeds.
  • In this online dialogue support system, a complementary face image is generated by complementing the hidden partial region of the hidden face image. Authentication is then performed based on the reference face image, which is the original (complete) face image of the sending user, and the complementary face image. If the authentication succeeds, the complementary face image is displayed on the terminal of the receiving user.
  • The receiving user can therefore confirm that the sending user is the genuine person by checking that the complementary face image is displayed on the receiving user's terminal. This makes it possible to prevent spoofing of the sending user.
  • According to one aspect of the present invention, it is possible to provide an online dialogue support system capable of preventing spoofing of a user whose face is partially hidden in an online dialogue.
  • FIG. 1 is a diagram showing an overview of an online dialogue support system according to one embodiment.
  • FIG. 2 is a block diagram showing an example of the functional configuration of the online dialogue support system according to the first embodiment.
  • FIG. 3 is a diagram schematically showing the face image complementing process.
  • FIG. 4 is a sequence diagram showing an example of the operation of the online dialogue support system according to the first embodiment.
  • FIG. 5 is a block diagram showing an example of the functional configuration of an online dialogue support system according to the second embodiment.
  • FIG. 6 is a sequence diagram showing an example of the operation of the online dialogue support system according to the second embodiment.
  • FIG. 7 is a block diagram showing an example of the functional configuration of an online dialogue support system according to the third embodiment.
  • FIG. 8 is a sequence diagram showing an example of the operation of the online dialogue support system according to the third embodiment.
  • FIG. 9 is a diagram showing an example of the hardware configuration of the online dialogue support system.
  • FIG. 1 is a diagram showing an overview of an online dialogue support system 1 according to one embodiment.
  • The online dialogue support system 1 is a computer system that supports online dialogue between the terminals of a plurality of users.
  • A face image showing a user's face is captured, and the face image is transmitted and received.
  • The face image may be actual 3D data, or may show the user's whole body.
  • A user who sends his or her own face image is called a sending user.
  • A user who receives the face image of the sending user is called a receiving user.
  • The roles of sending user and receiving user are not fixed for each user.
  • A sending user becomes a receiving user when receiving another user's face image.
  • A receiving user becomes a sending user when transmitting his or her own face image to another user.
  • The online dialogue support system 1 comprises a sending user terminal 10 and a receiving user terminal 20.
  • The sending user terminal 10 and the receiving user terminal 20 are connected via a communication network N so that they can communicate with each other.
  • The configuration of the communication network N is not limited.
  • The communication network N may include the Internet, or may include an intranet.
  • One sending user terminal 10 and one receiving user terminal 20 are shown, but the numbers are not limited to this.
  • The online dialogue support system 1 may comprise multiple sending user terminals 10 and multiple receiving user terminals 20. That is, the online dialogue support system 1 can be applied as a system for conducting online dialogues among many people.
  • The sending user terminal 10 is a terminal used by the sending user.
  • The type and configuration of the sending user terminal 10 are not limited.
  • The sending user terminal 10 may be, for example, a mobile terminal such as a high-performance mobile phone (smartphone), a tablet terminal, a wearable terminal, a laptop personal computer, or a mobile phone.
  • The sending user terminal 10 may be a stationary terminal such as a desktop personal computer.
  • The sending user terminal 10 may be a user terminal possessed by each sending user as described above, or may be a server device configured to be able to communicate with each sending user's user terminal.
  • The sending user terminal 10 may be configured by a combination of a user terminal and a server device. That is, the sending user terminal 10 may be configured by a single computer device, or by a plurality of computer devices that can communicate with each other.
  • The receiving user terminal 20 is a terminal used by the receiving user.
  • The type and configuration of the receiving user terminal 20 are the same as those of the sending user terminal 10.
  • A receiving user can become a sending user, and a sending user can become a receiving user. Therefore, when the receiving user becomes a sending user, the receiving user terminal 20 functions as the sending user terminal 10. Likewise, when the sending user becomes a receiving user, the sending user terminal 10 functions as the receiving user terminal 20.
  • The sending user wears a head-mounted display D on his or her head.
  • The form of the head-mounted display D is not limited to a specific form.
  • The head-mounted display D can take various forms, such as a goggle type, a glasses type, or a hat type.
  • The head-mounted display D is, for example, smart glasses such as XR (eXtended Reality) glasses.
  • In one example, the head-mounted display D is AR glasses that have the function of providing the user with augmented reality (AR). That is, the head-mounted display D is a see-through type of glasses configured so that the user can visually recognize the real space (outside world) as well as the virtual space.
  • The head-mounted display D is not limited to the above. It may be an MR device, such as MR glasses, that has the function of providing mixed reality (MR) to the user, or a VR device, such as VR glasses, that has the function of providing virtual reality (VR) to the user.
  • Volumetric video (or volumetric capture) technology can be applied to the online dialogue support system 1.
  • This technology creates 3D content that accurately reproduces the appearance, shape, movement, and so on of a subject. The online dialogue support system 1 to which this technology is applied reproduces the actions of a plurality of users in 3D, in real time, in the same virtual space, and provides this experience to each user. In order to enjoy such a user experience, the sending user and the receiving user participate in the online dialogue while wearing the head-mounted display D.
  • FIG. 2 is a block diagram showing an example of the functional configuration of the online dialogue support system 1A (1) according to the first embodiment.
  • The online dialogue support system 1A includes a sending user terminal 10A (10) and a receiving user terminal 20A (20).
  • In the first embodiment, the main functions of the online dialogue support system are performed by the receiving user terminal 20A. That is, in the first embodiment, the receiving user terminal 20A alone can be regarded as constituting an online dialogue support system.
  • The sending user terminal 10A has an imaging unit 11 and a transmission unit 12.
  • The imaging unit 11 obtains a face image by photographing the face of the sending user.
  • The imaging unit 11 captures a reference face image showing the face of the sending user.
  • The reference face image is the original (complete) face image of the sending user, taken with the sending user's face not hidden.
  • For example, the reference face image is an image of the entire face of the sending user captured while the face is not hidden by the head-mounted display D (that is, before the sending user puts on the head-mounted display D).
  • The imaging unit 11 captures the reference face image as a still image.
  • The imaging unit 11 also captures a hidden face image showing the face of the sending user with a part of the face hidden.
  • A hidden face image is an incomplete face image of the sending user, captured while a part of the sending user's face is hidden by an object located between the imaging unit 11 and the sending user's face.
  • For example, the hidden face image is an image of the sending user's entire face captured while a part of the face is hidden by the head-mounted display D (that is, after the sending user has put on the head-mounted display D).
  • The hidden face image therefore shows both the face of the sending user and the head-mounted display D that hides a part of that face.
  • The imaging unit 11 captures hidden face images in a moving image format.
  • The transmission unit 12 transmits the reference face image and the hidden face image acquired by the imaging unit 11 to the receiving user terminal 20A.
  • The transmission unit 12 transmits the reference face image to the receiving user terminal 20A before the online dialogue starts, and transmits the hidden face image to the receiving user terminal 20A after the online dialogue starts.
  • The receiving user terminal 20A has a receiving unit 21 (acquisition unit), a generation unit 22, an authentication unit 23, and a display control unit 24.
  • The receiving unit 21 functions as an acquisition unit that acquires the reference face image and the hidden face image by receiving them from the sending user terminal 10A.
  • The receiving unit 21 stores the reference face image in the storage unit 30, which will be described later.
  • The reference face image of the sending user may instead be stored (registered) directly in the storage unit 30 from the sending user terminal 10, without going through the receiving user terminal 20A.
  • The generation unit 22 complements a partial region of the hidden face image to generate a complementary face image.
  • A complementary face image is a face image showing the face of the sending user as it would appear if the partial region of the hidden face image were not hidden.
  • The generation unit 22 generates the complementary face image in a moving image format.
  • The partial region of the hidden face image is hidden by, for example, the head-mounted display D worn by the sending user.
  • The generation unit 22 complements the partial region by replacing the part of the hidden face image corresponding to the head-mounted display D with another image.
  • Specifically, the generation unit 22 replaces the part of the hidden face image corresponding to the head-mounted display D with an image showing the face of the sending user, thereby reproducing the face of the sending user without the head-mounted display D.
  • The face image complementing process will be described later.
  • The authentication unit 23 performs authentication based on the reference face image and the complementary face image.
  • The authentication unit 23 performs face authentication using a known method.
  • For example, the authentication unit 23 extracts feature points, face regions, and the like from the reference face image and from the complementary face image.
  • The authentication unit 23 compares the extracted values to calculate the degree of similarity between the two images.
  • The authentication unit 23 determines that the authentication has succeeded if the degree of similarity is equal to or greater than a predetermined threshold, and that the authentication has failed if the degree of similarity is less than the predetermined threshold.
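  • The threshold decision described above can be illustrated with a minimal sketch. The text only says that a known face authentication method is used, so everything concrete here is an assumption: the features are taken to be plain numeric vectors, cosine similarity stands in for the unspecified similarity measure, and the function names and the default threshold of 0.8 are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors.

    Assumption: the similarity measure is not specified in the text;
    cosine similarity is used here purely for illustration.
    """
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no information; treat as dissimilar
    return dot / (norm_a * norm_b)


def authenticate(reference_features, complementary_features, threshold=0.8):
    """Return True (success) when similarity >= threshold, else False (failure).

    The threshold value is a hypothetical placeholder for the
    "predetermined threshold" mentioned in the text.
    """
    similarity = cosine_similarity(reference_features, complementary_features)
    return similarity >= threshold
```

  • Note that the comparison is inclusive (`>=`), matching the "equal to or greater than" wording of the success condition.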
  • The display control unit 24 controls the display of the face image of the sending user on the receiving user terminal 20A.
  • The display control unit 24 causes the complementary face image or the hidden face image to be displayed on an output device (display device), such as a display provided in the receiving user terminal 20A.
  • When the authentication succeeds, the display control unit 24 causes the receiving user terminal 20A to display the complementary face image.
  • When the authentication fails, the display control unit 24 causes the receiving user terminal 20A to display the hidden face image.
  • The storage unit 30 stores various data used or generated in the receiving user terminal 20A.
  • The storage unit 30 stores a reference face image showing the face of the sending user.
  • The storage unit 30 may store data, such as feature points of the reference face image, used for authentication by the authentication unit 23.
  • The storage unit 30 may store at least one of the hidden face image acquired by the receiving unit 21 and the complementary face image generated by the generation unit 22.
  • The storage unit 30 may store the shape of the head-mounted display D.
  • The storage unit 30 may be a device separate from the receiving user terminal 20A, or may be one component of the receiving user terminal 20A.
  • FIG. 3 is a diagram schematically showing the face image complementing process. Although an example of the reference face image is shown on the receiving user terminal 20A side, the reference face image is not used for complementing the face image.
  • The sending user terminal 10A captures a hidden face image and sends it to the receiving user terminal 20A.
  • In this hidden face image, a part of the sending user's face is hidden by the head-mounted display D worn by the sending user.
  • In the hidden face image, the area around the eyes of the sending user is hidden by the lenses, frame, bridge, and so on of the head-mounted display D.
  • The generation unit 22 identifies the partial region R to be complemented from the hidden face image. For example, the generation unit 22 reads out the shape of the head-mounted display D stored in advance in the storage unit 30 and, based on that shape, identifies the region of the hidden face image corresponding to the head-mounted display D, thereby identifying the partial region R.
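  • As a rough illustration of identifying the partial region R from a stored shape, the sketch below builds a per-pixel mask for the frame. It assumes, hypothetically, that the stored shape is a small binary template and that the position of the head-mounted display in the frame (`top`, `left`) has already been found by some detector; neither the template format nor the detection step is described in the text.

```python
def region_mask(image_height, image_width, hmd_shape, top, left):
    """Build a boolean mask marking the partial region R in a frame.

    hmd_shape: 2D list of truthy/falsy values, a hypothetical stored
               template of the head-mounted display outline.
    top, left: assumed position of the template's upper-left corner in
               the frame, supplied by a detector that is out of scope here.
    """
    mask = [[False] * image_width for _ in range(image_height)]
    for dy, row in enumerate(hmd_shape):
        for dx, covered in enumerate(row):
            y, x = top + dy, left + dx
            # Ignore template pixels that fall outside the frame.
            if covered and 0 <= y < image_height and 0 <= x < image_width:
                mask[y][x] = True
    return mask
```

  • Pixels where the mask is True are the ones to be complemented; all other pixels of the hidden face image are left as captured.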
  • The generation unit 22 generates a complementary face image by complementing the partial region R of the hidden face image. Complementation of the partial region R may be performed using machine learning.
  • For example, the generation unit 22 prepares in advance a model trained by machine learning using, as training data, a plurality of face images of the sending user (so-called positive examples) and face images of a plurality of users different from the sending user (so-called negative examples). The model is configured to receive an image showing a part of a user's face as input and to output an estimation result of another part of the sending user's face.
  • The face images of the sending user are, for example, face images showing the entire face of the sending user photographed from various angles.
  • The face images of the users different from the sending user are, for example, face images showing the entire face of each of those users photographed from various angles.
  • By performing machine learning with such training data, a model configured as follows can be obtained. When an image showing a part of the face of the genuine sending user (for example, a part including the mouth that is not hidden by the head-mounted display D) is input, the model outputs an estimation result of another part that reflects the features of the genuine sending user (for example, an image including the part hidden by the head-mounted display D). When an image showing a part of the face of a user different from the genuine sending user (for example, a user who is trying to impersonate the genuine sending user) is input, the model outputs an estimation result that does not reflect the features of the genuine sending user.
  • The generation unit 22 inputs the region of the hidden face image excluding the partial region R to the model, and complements the partial region R based on the output result from the model.
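  • The complementing step can be sketched as a simple composition: the pixels outside the partial region R are passed to the model, and the model's estimate is written back only inside R. This is a minimal sketch under stated assumptions; `estimate_hidden_region` is a hypothetical stand-in for the trained model, images are plain 2D lists of pixel values, and hidden pixels are blanked with `None` before being passed to the model.

```python
def complement_face(hidden_image, mask, estimate_hidden_region):
    """Complement the masked region R of a hidden face image.

    hidden_image: 2D list of pixel values (the captured frame).
    mask: 2D list of booleans, True where the pixel belongs to region R.
    estimate_hidden_region: hypothetical model callable; it receives the
        frame with region R blanked out and returns a full-size 2D list
        holding its estimate.
    """
    # Only the visible part of the face (everything outside R) is shown to the model.
    visible = [
        [None if mask[y][x] else pixel for x, pixel in enumerate(row)]
        for y, row in enumerate(hidden_image)
    ]
    estimate = estimate_hidden_region(visible)
    # Keep captured pixels outside R; use the model's estimate inside R.
    return [
        [estimate[y][x] if mask[y][x] else pixel for x, pixel in enumerate(row)]
        for y, row in enumerate(hidden_image)
    ]
```

  • Because captured pixels outside R are never overwritten, movements of the visible part of the face carry through to the complementary face image unchanged.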
  • As another example, the generation unit 22 prepares in advance a model trained by machine learning using, as training data, face images of the sending user and face images of a plurality of users different from the sending user. The model is configured to receive an image of a first range showing a part of the partial region R as input and to output an estimation result of a second range showing a part of the partial region R different from the first range.
  • The image of the first range showing a part of the partial region R may be acquired by the head-mounted display D.
  • For example, the head-mounted display D may be provided with a camera inside the bridge portion of the glasses (on the user side).
  • With this camera, the head-mounted display D captures an image of the eyes of the sending user (for example, a moving image including the user's eyes as subjects) while the sending user is wearing the head-mounted display D.
  • The head-mounted display D may capture images of the eyes of the sending user using two cameras, one arranged for each eye (for example, two cameras arranged inside the respective lenses).
  • The generation unit 22 may input an image showing a part of the sending user's face in the partial region R (for example, an image of the sending user's eyes) to the model, and complement the partial region R based on the input image and the output result from the model.
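  • The two-range variant can be sketched as follows: the first range of R is filled from the image captured inside the head-mounted display, and the second range is filled from the model's estimate. The coordinate-keyed dictionaries, the function names, and the assumption that the inner camera image is already aligned to frame coordinates are all hypothetical simplifications, not details given in the text.

```python
def complement_with_inner_camera(hidden_image, first_range, second_range,
                                 inner_image, estimate_second_range):
    """Complement region R using an inner-camera image plus a model estimate.

    first_range, second_range: sets of (y, x) coordinates that together
        make up the partial region R.
    inner_image: dict mapping each (y, x) in first_range to a pixel value
        captured inside the head-mounted display (assumed pre-aligned).
    estimate_second_range: hypothetical model callable; it receives
        inner_image and returns a dict mapping each (y, x) in
        second_range to an estimated pixel value.
    """
    estimate = estimate_second_range(inner_image)
    result = [list(row) for row in hidden_image]  # copy; input stays untouched
    for (y, x) in first_range:
        result[y][x] = inner_image[(y, x)]   # real pixels, e.g. the eyes
    for (y, x) in second_range:
        result[y][x] = estimate[(y, x)]      # estimated pixels around them
    return result
```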
  • Through the above processing, the part of the hidden face image corresponding to the head-mounted display D is replaced with an image showing the face of the sending user.
  • As a result, the face of the sending user without the head-mounted display D is reproduced in the partial region R of the complementary face image.
  • FIG. 4 is a sequence diagram showing the operation of the online dialogue support system 1A as a processing flow S1.
  • The storage unit 30 stores in advance a reference face image showing the face of the sending user.
  • In step S11, the imaging unit 11 captures a hidden face image showing the face of the sending user with a part of the face hidden.
  • For example, the imaging unit 11 photographs the face of the sending user while a part of the face is hidden by the head-mounted display D worn by the sending user.
  • The imaging unit 11 captures hidden face images in a moving image format.
  • In step S12, the transmission unit 12 transmits the hidden face image acquired by the imaging unit 11 to the receiving user terminal 20A.
  • The transmission unit 12 transmits the hidden face image in the moving image format to the receiving user terminal 20A in real time.
  • In step S13, the receiving unit 21 acquires the hidden face image by receiving it from the sending user terminal 10A.
  • The receiving unit 21 may store the hidden face image in the storage unit 30.
  • In step S14, the generation unit 22 complements a partial region of the hidden face image to generate a complementary face image.
  • The generation unit 22 identifies the region of the hidden face image corresponding to the head-mounted display D based on the shape of the head-mounted display D stored in advance in the storage unit 30, thereby identifying the partial region R (see FIG. 3).
  • In one example, the generation unit 22 prepares in advance a model trained by machine learning using, as training data, face images of the sending user together with face images of a plurality of users different from the sending user. The model is configured to receive an image showing a part of a user's face as input and to output an estimation result of another part of the sending user's face. The generation unit 22 inputs the region of the hidden face image excluding the partial region R to the model and complements the partial region R based on the output result from the model.
  • In another example, the generation unit 22 prepares in advance a model trained by machine learning using the same kind of training data and configured to receive an image of a first range showing a part of the partial region R as input and to output an estimation result of a second range showing a part of the partial region R different from the first range. The generation unit 22 inputs an image showing a part of the sending user's face in the partial region R to the model and complements the partial region R based on the input image and the output result from the model.
  • In step S15, the authentication unit 23 performs authentication based on the reference face image and the complementary face image.
  • The authentication unit 23 performs face authentication using a known method and determines whether the authentication has succeeded.
  • Among the series of frames of the hidden face image in the moving image format, the authentication unit 23 does not have to perform authentication for frames that follow a successful authentication.
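  • The per-frame behaviour, including skipping authentication once it has succeeded, can be sketched as a small generator. This is an illustrative sketch only: `complement` and `authenticate` are hypothetical callables standing in for the generation unit 22 and the authentication unit 23, and the choice of which image to show follows the rule stated elsewhere in the text (complementary face image on success, hidden face image on failure).

```python
def process_stream(frames, reference_image, complement, authenticate):
    """Yield, for each hidden frame, the image to display on the receiving side.

    complement(hidden_frame) -> complementary frame (hypothetical callable).
    authenticate(reference_image, complementary_frame) -> bool (hypothetical).
    Once authentication has succeeded, later frames are not re-authenticated.
    """
    authenticated = False
    for hidden_frame in frames:
        complementary_frame = complement(hidden_frame)
        if not authenticated:
            authenticated = authenticate(reference_image, complementary_frame)
        # Show the complementary image only after successful authentication.
        yield complementary_frame if authenticated else hidden_frame
```

  • Latching the result keeps the per-frame cost down in a real-time stream, at the price of not noticing a change of person after the first success; whether that trade-off is acceptable is a design decision the text leaves open.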
  • In step S16, the display control unit 24 controls the display of the face image of the sending user on the receiving user terminal 20A.
  • When the authentication succeeds, the display control unit 24 causes the receiving user terminal 20A to display the complementary face image.
  • When the authentication fails, the display control unit 24 causes the receiving user terminal 20A to display the hidden face image.
  • The display control unit 24 may display the complementary face image or the hidden face image of the sending user on the head-mounted display D worn by the receiving user.
  • In the online dialogue support system 1A described above, a complementary face image in which the partial region R of the sending user's face is complemented is generated from the hidden face image in which the partial region R is hidden. Authentication is then performed based on the reference face image, which is the original (complete) face image of the sending user, and the complementary face image. If the authentication succeeds, the complementary face image is displayed on the receiving user terminal 20A.
  • The receiving user can therefore confirm that the sending user is the genuine person by checking that the complementary face image is displayed on the receiving user terminal 20A. This makes it possible to prevent spoofing of the sending user.
  • The display control unit 24 causes the receiving user terminal 20A to display the hidden face image when the authentication fails.
  • In this case, the receiving user can detect the possibility that the sending user is being impersonated by checking that the hidden face image is displayed on the receiving user terminal 20A.
  • The partial region R is hidden by the head-mounted display D worn by the sending user.
  • The generation unit 22 complements the partial region R by replacing the part of the hidden face image corresponding to the head-mounted display D with another image. With this configuration, the sending user appears on the receiving user terminal 20A as if he or she were not wearing the head-mounted display D. As a result, more natural communication can be realized between the sending user and the receiving user while the sending user continues to view the display on the head-mounted display D.
  • The generation unit 22 identifies the partial region R by identifying the region of the hidden face image corresponding to the head-mounted display D based on the shape of the head-mounted display D stored in advance. In this case, because the head-mounted display D is identified precisely, the complementing accuracy of the partial region R also improves.
  • The generation unit 22 prepares in advance a model trained by machine learning using, as training data, face images of the sending user together with face images of a plurality of users different from the sending user. The model is configured to receive an image showing a part of the sending user's face as input and to output an estimation result of another part of the sending user's face. The generation unit 22 inputs the region of the hidden face image excluding the partial region R to the model and complements the partial region R based on the output result from the model. With this configuration, part of the face of the sending user is complemented more naturally. In addition, even when the user moves, part of the user's face is complemented in accordance with the movement, so the complementing accuracy of the partial region R improves.
  • The generation unit 22 may also prepare in advance a model trained by machine learning using, as training data, face images of the sending user together with face images of a plurality of users different from the sending user, the model being configured to receive an image of a first range showing a part of the partial region R as input and to output an estimation result of a second range showing a part of the partial region R different from the first range. The generation unit 22 inputs an image showing a part of the sending user's face in the partial region R to the model and complements the partial region R based on the input image and the output result from the model.
  • With this configuration, part of the face of the sending user in the partial region R is complemented by the image itself, so a complementary face image closer to the original face of the user can be obtained.
  • In addition, because the line of sight, expression, and so on of the sending user are reflected in the complementary face image, part of the face of the sending user is complemented more naturally.
  • FIG. 5 is a block diagram showing an example of the functional configuration of an online dialogue support system 1B (1) according to the second embodiment.
  • The online dialogue support system 1B differs from the online dialogue support system 1A in that it comprises a sending user terminal 10B (10) and a receiving user terminal 20B (20) instead of the sending user terminal 10A and the receiving user terminal 20A.
  • In the second embodiment, the complementary face image is generated on the sending user terminal 10B, and authentication based on the reference face image and the complementary face image is also performed there.
  • The sending user terminal 10B differs from the sending user terminal 10A in that it has a generation unit 22 and an authentication unit 23, and has a transmission unit 12B instead of the transmission unit 12.
  • The receiving user terminal 20B differs from the receiving user terminal 20A in that it does not have the generation unit 22 and the authentication unit 23, and has a receiving unit 21B instead of the receiving unit 21.
  • The transmission unit 12B transmits the hidden face image or the complementary face image to the receiving user terminal 20B according to the authentication result of the authentication unit 23.
  • Specifically, when the authentication by the authentication unit 23 succeeds, the transmission unit 12B transmits the complementary face image to the receiving user terminal 20B. On the other hand, when the authentication by the authentication unit 23 fails, the transmission unit 12B transmits the hidden face image to the receiving user terminal 20B.
  • the receiving unit 21B receives the hidden face image or the complementary face image from the user terminal 10B on the transmission side.
  • the imaging unit 11 functions as an acquisition unit that acquires hidden face images.
  • the transmitting unit 14 that transmits the hidden face image or the complementary face image to the receiving-side user terminal 20B according to the authentication result substantially causes the receiving-side user terminal 20B to display the complementary face image when the authentication is successful.
  • the storage unit 30 may be a device separate from the transmitting user terminal 10B, or may be one component of the transmitting user terminal 10B.
  • FIG. 6 is a sequence diagram showing the operation of the online dialogue support system 1B as a processing flow S2.
  • The storage unit 30 stores in advance a reference face image representing the face of the transmitting user.
  • The photographing unit 11 captures a hidden face image, that is, an image of the transmitting user's face with part of the face hidden.
  • The photographing unit 11 photographs the face of the transmitting user while part of the face is hidden by the head-mounted display D worn by that user.
  • The photographing unit 11 captures the hidden face image in moving image format.
  • The photographing unit 11 may store the hidden face image in the storage unit 30.
  • In step S22, the generation unit 22 complements the partial region of the hidden face image to generate a complementary face image.
  • The processing of step S22 differs from step S14 in FIG. 4 in that it is performed on the transmitting user terminal 10B.
  • In step S23, the authentication unit 23 performs authentication based on the reference face image and the complementary face image.
  • The processing of step S23 differs from step S15 in FIG. 4 in that it is performed on the transmitting user terminal 10B.
  • In step S24, the transmission unit 12B transmits the hidden face image or the complementary face image to the receiving user terminal 20B.
  • The transmission unit 12B transmits the hidden face image or the complementary face image in moving image format to the receiving user terminal 20B in real time. For example, when the authentication succeeds in step S23, the transmission unit 12B transmits the complementary face image to the receiving user terminal 20B. When the authentication fails in step S23, the transmission unit 12B transmits the hidden face image to the receiving user terminal 20B.
  • In step S25, the receiving unit 21B acquires the hidden face image or the complementary face image by receiving it from the transmitting user terminal 10B.
  • In step S26, the display control unit 24 controls the display of the face image of the transmitting user on the receiving user terminal 20B.
  • The display control unit 24 causes the receiving user terminal 20B to display the hidden face image or the complementary face image acquired in step S25.
  • The display control unit 24 may display the complementary face image or the hidden face image of the transmitting user on the head-mounted display D worn by the receiving user.
  • The online dialogue support system 1B achieves the same effects as the online dialogue support system 1A. That is, the receiving user can confirm that the transmitting user is the genuine person by checking that the complementary face image is displayed on the receiving user terminal 20B. Spoofing of the transmitting user can therefore be prevented.
  • In addition, in the online dialogue support system 1B, the authentication processing is executed on the transmitting user terminal 10B, so the processing load on the receiving user terminal 20B can be reduced.
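The sender-side part of processing flow S2 (steps S22 to S24) can be sketched as below. This is a minimal sketch, not the disclosed implementation: the generation and authentication steps are passed in as callables, and the toy helpers `fill_hidden` and `similar_enough`, the `HIDDEN` pixel marker, and the tolerance values are all assumptions made only so the example runs.

```python
from typing import Callable, List

Image = List[List[int]]
HIDDEN = -1  # marker for pixels covered by the head-mounted display (assumption)


def fill_hidden(hidden: Image, fill: int = 128) -> Image:
    """Toy stand-in for the generation unit: replaces hidden pixels by `fill`."""
    return [[fill if p == HIDDEN else p for p in row] for row in hidden]


def similar_enough(reference: Image, candidate: Image,
                   tolerance: int = 10, min_ratio: float = 0.9) -> bool:
    """Toy stand-in for the authentication unit: share of near-equal pixels."""
    pairs = [(a, b) for ra, rb in zip(reference, candidate) for a, b in zip(ra, rb)]
    close = sum(1 for a, b in pairs if abs(a - b) <= tolerance)
    return close / len(pairs) >= min_ratio


def run_sender_flow(hidden: Image, reference: Image,
                    generate: Callable[[Image], Image],
                    authenticate: Callable[[Image, Image], bool],
                    send: Callable[[str, Image], None]) -> bool:
    """Steps S22 to S24 on the transmitting user terminal."""
    complementary = generate(hidden)                  # S22: complement the region
    auth_ok = authenticate(reference, complementary)  # S23: compare with reference
    if auth_ok:
        send("complementary", complementary)          # S24, authentication succeeded
    else:
        send("hidden", hidden)                        # S24, authentication failed
    return auth_ok
```

Because only one of the two images is ever handed to `send`, the receiving terminal needs no generation or authentication logic of its own, which is the load reduction the second embodiment describes.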
  • FIG. 7 is a block diagram showing an example of the functional configuration of an online dialogue support system 1C(1) according to the third embodiment.
  • The online dialogue support system 1C differs from the online dialogue support system 1A in that it includes a transmitting user terminal 10C (10) and a receiving user terminal 20C (20) instead of the transmitting user terminal 10A and the receiving user terminal 20A, and further includes a server 40.
  • In the online dialogue support system 1C, the complementary face image is generated on the server 40, and authentication is performed based on the reference face image and the complementary face image.
  • The transmitting user terminal 10C differs from the transmitting user terminal 10A in that it has a transmission unit 12C instead of the transmission unit 12.
  • The transmission unit 12C transmits the hidden face image to the server 40.
  • The receiving user terminal 20C differs from the receiving user terminal 20A in that it does not have the generation unit 22 and the authentication unit 23, and has a receiving unit 21C instead of the receiving unit 21.
  • The receiving unit 21C receives the hidden face image or the complementary face image from the server 40.
  • The server 40 has an image receiving unit 41 (acquisition unit), a generation unit 22, an authentication unit 23, and an image transmission unit 42.
  • The image receiving unit 41 functions as an acquisition unit that acquires the reference face image and the hidden face image by receiving them from the transmitting user terminal 10C.
  • The image transmission unit 42 transmits the hidden face image or the complementary face image to the receiving user terminal 20C according to the authentication result of the authentication unit 23. Specifically, when the authentication by the authentication unit 23 succeeds, the image transmission unit 42 transmits the complementary face image to the receiving user terminal 20C. When the authentication by the authentication unit 23 fails, the image transmission unit 42 transmits the hidden face image to the receiving user terminal 20C.
  • In effect, the image transmission unit 42 causes the receiving user terminal 20C to display the complementary face image when the authentication succeeds, and the hidden face image when the authentication fails.
  • The storage unit 30 may be a device separate from the server 40 or may be a component of the server 40.
  • FIG. 8 is a sequence diagram showing the operation of the online dialogue support system 1C as a processing flow S3.
  • The storage unit 30 stores in advance a reference face image representing the face of the transmitting user.
  • The processing of step S31 is the same as step S11.
  • In step S32, the transmission unit 12C transmits the hidden face image acquired by the photographing unit 11 to the server 40.
  • The transmission unit 12C transmits the hidden face image in moving image format to the server 40 in real time.
  • In step S33, the image receiving unit 41 acquires the hidden face image by receiving it from the transmitting user terminal 10C.
  • The image receiving unit 41 may store the hidden face image in the storage unit 30.
  • In step S34, the generation unit 22 complements the partial region of the hidden face image to generate a complementary face image.
  • The processing of step S34 differs from step S14 in FIG. 4 in that it is performed on the server 40.
  • In step S35, the authentication unit 23 performs authentication based on the reference face image and the complementary face image.
  • The processing of step S35 differs from step S15 in FIG. 4 in that it is performed on the server 40.
  • In step S36, the image transmission unit 42 transmits the hidden face image or the complementary face image to the receiving user terminal 20C.
  • The image transmission unit 42 transmits the hidden face image or the complementary face image in moving image format to the receiving user terminal 20C in real time. For example, when the authentication succeeds in step S35, the image transmission unit 42 transmits the complementary face image to the receiving user terminal 20C. When the authentication fails in step S35, the image transmission unit 42 transmits the hidden face image to the receiving user terminal 20C.
  • In step S37, the receiving unit 21C acquires the hidden face image or the complementary face image by receiving it from the server 40.
  • In step S38, the display control unit 24 controls the display of the face image of the transmitting user on the receiving user terminal 20C.
  • The display control unit 24 causes the receiving user terminal 20C to display the hidden face image or the complementary face image acquired in step S37.
  • The display control unit 24 may display the complementary face image or the hidden face image of the transmitting user on the head-mounted display D worn by the receiving user.
  • The online dialogue support system 1C achieves the same effects as the online dialogue support system 1A. That is, the receiving user can confirm that the transmitting user is the genuine person by checking that the complementary face image is displayed on the receiving user terminal 20C. Spoofing of the transmitting user can therefore be prevented. Further, the online dialogue support system 1C can reduce the processing load on the receiving user terminal 20C. Furthermore, when an online dialogue is held among a plurality of people (that is, among the terminals of a plurality of users), time synchronization can be performed easily regardless of the equipment, performance, and the like of the transmitting user terminal 10C.
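The server-side part of processing flow S3 (steps S33 to S36) might look like the sketch below. The class name `RelayServer`, its methods, and the choice to treat a missing reference face image as an authentication failure are assumptions for illustration; the generation and authentication steps are injected as callables because the disclosure does not fix their implementation.

```python
from typing import Callable, Dict, List, Optional, Tuple

Image = List[List[Optional[int]]]  # None marks a hidden pixel (assumption)


class RelayServer:
    """Hypothetical server 40: receives hidden frames and decides what to forward."""

    def __init__(self, generate: Callable[[Image], Image],
                 authenticate: Callable[[Image, Image], bool]) -> None:
        self._generate = generate
        self._authenticate = authenticate
        self._references: Dict[str, Image] = {}  # plays the role of storage unit 30

    def register_reference(self, user_id: str, reference: Image) -> None:
        """Store the reference face image of a transmitting user in advance."""
        self._references[user_id] = reference

    def handle_hidden_frame(self, user_id: str, hidden: Image) -> Tuple[str, Image]:
        """S33 to S36: receive, complement, authenticate, pick the image to send."""
        complementary = self._generate(hidden)            # S34
        reference = self._references.get(user_id)
        if reference is not None and self._authenticate(reference, complementary):  # S35
            return ("complementary", complementary)       # S36, success
        return ("hidden", hidden)                         # S36, failure
```

Keeping the reference images and both processing steps on the server means that neither terminal needs the generation or authentication units, matching the third embodiment's division of roles.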
  • In the above embodiments, the hidden face image is a moving image, but it may instead be a still image. Hidden face images may also be acquired separately as a still image used for face authentication and as a moving image used after face authentication.
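One way to read the variant in which a still image is used for face authentication and a moving image is used afterwards is sketched below. The assumption, labeled as such, is that authentication runs once on the still image and its result governs every later video frame; the function name and the callables are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Image = List[List[int]]


def stream_after_still_authentication(
    reference: Image,
    still_hidden: Image,
    video_frames: Iterable[Image],
    generate: Callable[[Image], Image],
    authenticate: Callable[[Image, Image], bool],
) -> Iterator[Tuple[str, Image]]:
    """Authenticate once on a still image, then yield the frames to display."""
    # Assumption: a single authentication result is reused for all frames.
    auth_ok = authenticate(reference, generate(still_hidden))
    for frame in video_frames:
        if auth_ok:
            yield ("complementary", generate(frame))
        else:
            yield ("hidden", frame)
```

Authenticating on a single still avoids repeating the comparison for every frame, at the cost of not re-checking identity during the stream; whether that trade-off is acceptable depends on the deployment.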
  • The hidden face image may be an image in which the partial region is obscured by, for example, mosaic processing.
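For illustration, hiding a rectangular region by a mosaic can be sketched as block averaging. This is a generic pixelation routine, not a method taken from the disclosure; the function name, the rectangular region, and the block size are assumptions, and the caller is expected to pass a rectangle that lies inside the image.

```python
from typing import List

Image = List[List[int]]


def mosaic_region(image: Image, top: int, left: int,
                  height: int, width: int, block: int = 2) -> Image:
    """Return a copy of `image` with the given rectangle pixelated.

    Each `block` x `block` tile inside the rectangle is replaced by the
    integer mean of its pixels; tiles at the edge may be smaller.
    """
    if block < 1:
        raise ValueError("block must be at least 1")
    out = [row[:] for row in image]
    for r0 in range(top, top + height, block):
        for c0 in range(left, left + width, block):
            cells = [(r, c)
                     for r in range(r0, min(r0 + block, top + height))
                     for c in range(c0, min(c0 + block, left + width))]
            mean = sum(image[r][c] for r, c in cells) // len(cells)
            for r, c in cells:
                out[r][c] = mean
    return out
```

A larger `block` hides more detail; with `block=1` the image is returned unchanged.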
  • In the above embodiments, the partial region R is specified based on the shape of the head-mounted display D stored in advance. Alternatively, the partial region R may be specified by receiving an input operation or the like that designates it.
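The two ways of specifying the partial region R can be sketched as a lookup with an optional override. The rectangle representation, the `STORED_HMD_SHAPES` table, and the key `"hmd-d"` are hypothetical; an actual head-mounted display outline would not generally be a rectangle.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class Rect:
    top: int
    left: int
    height: int
    width: int


# Shapes stored in advance, keyed by a hypothetical device identifier.
STORED_HMD_SHAPES: Dict[str, Rect] = {"hmd-d": Rect(top=1, left=0, height=2, width=4)}


def specify_region(hmd_model: str, user_rect: Optional[Rect] = None) -> Set[Tuple[int, int]]:
    """Return region R as a set of (row, column) pixels.

    A rectangle supplied by an input operation takes precedence over the
    stored shape of the head-mounted display (assumption).
    """
    rect = user_rect if user_rect is not None else STORED_HMD_SHAPES[hmd_model]
    return {(r, c)
            for r in range(rect.top, rect.top + rect.height)
            for c in range(rect.left, rect.left + rect.width)}
```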
  • In the above embodiments, an example has been described in which the hidden face image of the transmitting user is displayed when authentication fails. Alternatively, the online dialogue may be terminated when authentication fails.
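The two failure behaviors can be expressed as a small policy switch. The enum and the action strings are illustrative names only, not terms from the disclosure.

```python
from enum import Enum


class OnAuthFailure(Enum):
    SHOW_HIDDEN = "show_hidden"      # display the hidden face image
    END_DIALOGUE = "end_dialogue"    # terminate the online dialogue


def next_action(auth_ok: bool, policy: OnAuthFailure) -> str:
    """Decide what happens after authentication, given the failure policy."""
    if auth_ok:
        return "send_complementary"
    if policy is OnAuthFailure.END_DIALOGUE:
        return "end_dialogue"
    return "send_hidden"
```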
  • Each functional block may be implemented using one device that is physically or logically coupled, or using two or more physically or logically separate devices that are connected directly or indirectly (for example, by wire or wirelessly).
  • A functional block may be implemented by combining software with the one device or the plurality of devices.
  • Functions include, but are not limited to, judging, deciding, determining, calculating, computing, processing, deriving, investigating, searching, checking, receiving, transmitting, outputting, accessing, resolving, selecting, choosing, establishing, comparing, assuming, expecting, deeming, broadcasting, notifying, communicating, forwarding, configuring, reconfiguring, allocating, mapping, and assigning.
  • the transmitting user terminal 10, the receiving user terminal 20, and the server 40 may function as computers that perform the information processing method of the present disclosure.
  • FIG. 9 is a diagram showing an example of a hardware configuration common to the transmitting user terminal 10, the receiving user terminal 20, and the server 40 according to an embodiment of the present disclosure.
  • Each of the transmitting user terminal 10, the receiving user terminal 20, and the server 40 may be physically configured as a computer device including a processor 1001, a memory 1002, a storage 1003, a communication device 1004, an input device 1005, an output device 1006, a bus 1007, and the like.
  • The term "apparatus" can be read as a circuit, device, unit, or the like.
  • The hardware configuration of the transmitting user terminal 10, the receiving user terminal 20, and the server 40 may include one or more of each of the devices shown in the figure.
  • Each function of the transmitting user terminal 10, the receiving user terminal 20, and the server 40 is realized by loading predetermined software (a program) onto hardware such as the processor 1001 and the memory 1002, so that the processor 1001 performs computation, controls communication by the communication device 1004, and controls at least one of reading and writing of data in the memory 1002 and the storage 1003.
  • The processor 1001, for example, runs an operating system to control the entire computer.
  • the processor 1001 may be configured by a central processing unit (CPU) including an interface with peripheral devices, a control device, an arithmetic device, registers, and the like.
  • the processor 1001 reads programs (program codes), software modules, data, etc. from at least one of the storage 1003 and the communication device 1004 to the memory 1002, and executes various processes according to them.
  • As the program, a program that causes a computer to execute at least part of the operations described in the above embodiments is used.
  • For example, each functional unit (e.g., the generation unit 22) of the transmitting user terminal 10, the receiving user terminal 20, and the server 40 may be implemented by a control program that is stored in the memory 1002 and runs on the processor 1001.
  • Other functional blocks may be similarly implemented.
  • The processor 1001 may be implemented by one or more chips.
  • The program may be transmitted from a network via a telecommunication line.
  • The memory 1002 is a computer-readable recording medium and may be composed of at least one of, for example, ROM (Read Only Memory), EPROM (Erasable Programmable ROM), EEPROM (Electrically Erasable Programmable ROM), RAM (Random Access Memory), and the like.
  • the memory 1002 may also be called a register, cache, main memory (main storage device), or the like.
  • the memory 1002 can store executable programs (program codes), software modules, etc. for implementing an information processing method according to an embodiment of the present disclosure.
  • The storage 1003 is a computer-readable recording medium and may be composed of at least one of, for example, an optical disc such as a CD-ROM (Compact Disc ROM), a hard disk drive, a flexible disk, a magneto-optical disc (for example, a compact disc, a digital versatile disc, or a Blu-ray disc), a smart card, a flash memory (for example, a card, stick, or key drive), a floppy disk, a magnetic strip, and the like.
  • Storage 1003 may also be called an auxiliary storage device.
  • the storage medium described above may be, for example, a database, server, or other suitable medium including at least one of memory 1002 and storage 1003 .
  • the communication device 1004 is hardware (transmitting/receiving device) for communicating between computers via at least one of a wired network and a wireless network, and is also called a network device, a network controller, a network card, a communication module, or the like.
  • the input device 1005 is an input device (for example, keyboard, mouse, microphone, switch, button, sensor, etc.) that receives input from the outside.
  • the output device 1006 is an output device (eg, display, speaker, LED lamp, etc.) that outputs to the outside. Note that the input device 1005 and the output device 1006 may be integrated (for example, a touch panel).
  • Each device such as the processor 1001 and the memory 1002 is connected by a bus 1007 for communicating information.
  • the bus 1007 may be configured using a single bus, or may be configured using different buses between devices.
  • The transmitting user terminal 10, the receiving user terminal 20, and the server 40 may include hardware such as a microprocessor, a digital signal processor (DSP), an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), and an FPGA (Field Programmable Gate Array), and part or all of each functional block may be implemented by that hardware.
  • processor 1001 may be implemented using at least one of these pieces of hardware.
  • Input/output information may be stored in a specific location (for example, memory) or managed using a management table. Input/output information and the like can be overwritten, updated, or appended. The output information and the like may be deleted. The entered information and the like may be transmitted to another device.
  • The determination may be made by a value represented by one bit (0 or 1), by a truth value (Boolean: true or false), or by numerical comparison (for example, comparison with a predetermined value).
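The three forms of determination can be illustrated with three small functions. The names are hypothetical, and treating a score at or above the threshold as a positive result is an assumption; the text only says that a numerical comparison with a predetermined value may be used.

```python
def decide_by_bit(flag: int) -> bool:
    """Determination from a one-bit value: 1 means positive, 0 means negative."""
    if flag not in (0, 1):
        raise ValueError("flag must be 0 or 1")
    return flag == 1


def decide_by_boolean(value: bool) -> bool:
    """Determination from a truth value."""
    return value is True


def decide_by_comparison(score: float, threshold: float) -> bool:
    """Determination by comparing a number with a predetermined value.

    Assumption: scores at or above the threshold count as positive.
    """
    return score >= threshold
```

For face authentication, the third form is the natural fit: a similarity score between the reference face image and the complementary face image is compared with a preset threshold.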
  • Notification of predetermined information is not limited to being performed explicitly; it may be performed implicitly (for example, by not notifying the predetermined information).
  • Software, whether referred to as software, firmware, middleware, microcode, hardware description language, or by any other name, includes instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, and the like.
  • software, instructions, information, etc. may be transmitted and received via a transmission medium.
  • For example, when software is transmitted from a website, a server, or another remote source using at least one of wired technology (coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), etc.) and wireless technology (infrared, microwave, etc.), at least one of these wired and wireless technologies is included within the definition of a transmission medium.
  • Data, instructions, commands, information, signals, bits, symbols, chips, and the like may be represented by voltages, currents, electromagnetic waves, magnetic fields or magnetic particles, optical fields or photons, or any combination of these.
  • Information, parameters, and the like described in the present disclosure may be expressed using absolute values, using values relative to a predetermined value, or using other corresponding information.
  • any reference to elements using the "first,” “second,” etc. designations used in this disclosure does not generally limit the quantity or order of those elements. These designations may be used in this disclosure as a convenient method of distinguishing between two or more elements. Thus, reference to a first and second element does not imply that only two elements can be employed or that the first element must precede the second element in any way.
  • In the present disclosure, the phrase "A and B are different" may mean "A and B are different from each other."
  • The phrase may also mean that "A and B are each different from C."
  • Terms such as "separate" and "coupled" may be interpreted in the same manner as "different."

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Processing Or Creating Images (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Collating Specific Patterns (AREA)
  • Image Processing (AREA)
  • Closed-Circuit Television Systems (AREA)
PCT/JP2022/030319 2021-09-10 2022-08-08 Online dialogue support system Ceased WO2023037812A1 (ja)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2023546844A JP7588243B2 (ja) 2021-09-10 2022-08-08 Online dialogue support system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2021147550 2021-09-10
JP2021-147550 2021-09-10

Publications (1)

Publication Number Publication Date
WO2023037812A1 (ja) 2023-03-16

Family

ID=85507535

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/030319 Ceased WO2023037812A1 (ja) Online dialogue support system 2021-09-10 2022-08-08

Country Status (2)

Country Link
JP (1) JP7588243B2
WO (1) WO2023037812A1

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH1196366A (ja) * 1997-09-19 1999-04-09 Nippon Telegr & Teleph Corp <Ntt> Method and device for synthesizing a face image of a person wearing a head-mounted display
JP2000004395A (ja) * 1998-06-15 2000-01-07 Sony Corp Image processing device for a video camera, and head-mounted display
JP2007148872A (ja) * 2005-11-29 2007-06-14 Mitsubishi Electric Corp Image authentication device
JP2009135705A (ja) * 2007-11-29 2009-06-18 Kyocera Corp Portable terminal device
JP2015142193A (ja) * 2014-01-28 2015-08-03 株式会社リコー Transmission terminal and program
JP2020507221A (ja) * 2017-02-03 2020-03-05 ベステル エレクトロニク サナイー ベ ティカレト エー.エス. Improved method and system for video conferencing using an HMD
CN112597867A (zh) * 2020-12-17 2021-04-02 佛山科学技术学院 Method and system for recognizing faces wearing masks, computer device, and storage medium
JP2021114324A (ja) * 2016-11-11 2021-08-05 マジック リープ, インコーポレイテッドMagic Leap, Inc. Periocular and audio synthesis of a full face image

Also Published As

Publication number Publication date
JPWO2023037812A1 2023-03-16
JP7588243B2 (ja) 2024-11-21

Similar Documents

Publication Publication Date Title
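US12333080B2 (en) Mixed reality display system and mixed reality display terminal
TWI751161B (zh) Terminal device, smartphone, and face-recognition-based authentication method and system
WO2019130991A1 (ja) Information processing device
US11282481B2 (en) Information processing device
JP2009206924A (ja) Information processing device, information processing system, and information processing program
US10783666B2 (en) Color analysis and control using an electronic mobile device transparent display screen integral with the use of augmented reality glasses
CN113342157B (zh) Eye-tracking processing method and related device
JP7588243B2 (ja) Online dialogue support system
US20250166309A1 (en) Information interaction method, computer-readable storage medium and communication terminal
JP7733815B2 (ja) Virtual space providing device
JP7723765B2 (ja) Message transmitting device and message receiving device
JP7777039B2 (ja) Information processing device
JP2017188787A (ja) Imaging device, image composition method, and image composition program
JP7824977B2 (ja) Avatar generation device
JP2025122317A (ja) Work support device and work support method
US20260064194A1 (en) Display control apparatus
CN117041670B (zh) Image processing method and related device
EP4607323A1 (en) Controlling a headset
JP7562831B2 (ja) Information processing device
JP7713541B2 (ja) Display control device and server
US20250157090A1 (en) Display control apparatus
JP6840646B2 (ja) Information processing device, terminal device, and information processing system
JP2024162061A (ja) Virtual space management device
WO2025248775A1 (ja) Device and method
JP2025122315A (ja) Work support device and work support method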

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22867134

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023546844

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22867134

Country of ref document: EP

Kind code of ref document: A1