WO2024075707A1 - System, electronic device, method for controlling system, and program - Google Patents

System, electronic device, method for controlling system, and program Download PDF

Info

Publication number
WO2024075707A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
electronic device
gaze
image
control unit
Prior art date
Application number
PCT/JP2023/035965
Other languages
French (fr)
Japanese (ja)
Inventor
Haruya Takase (高瀬 遥矢)
Original Assignee
Kyocera Corporation (京セラ株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kyocera Corporation (京セラ株式会社)
Publication of WO2024075707A1 publication Critical patent/WO2024075707A1/en

Links

Images

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M3/00: Automatic or semi-automatic exchanges
    • H04M3/42: Systems providing special services or facilities to subscribers
    • H04M3/56: Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00: Television systems
    • H04N7/14: Systems for two-way working

Definitions

  • This disclosure relates to a system, an electronic device, a method for controlling the system, and a program.
  • remote conferences, such as web conferences and video conferences, are held using electronic devices or systems that include electronic devices.
  • in such a conference, audio and/or video of the conference in the office is acquired by, for example, an electronic device installed in the office, and transmitted to, for example, an electronic device installed in the participant's home.
  • conversely, audio and/or video at the participant's home is acquired by, for example, an electronic device installed in the participant's home, and transmitted to, for example, an electronic device installed in the office.
  • such electronic devices allow a conference to be held without all participants gathering in the same place.
  • Patent Document 1 discloses a device that displays a graphic that represents the output range of directional sound output by a speaker, superimposed on an image captured by a camera. This device makes it possible to visually grasp the output range of directional sound.
  • Patent Document 2 discloses a system in which, when a speaker and a listener in separate locations converse, a listener robot is placed on the speaker's side and a speaker robot is placed on the listener's side.
  • the system according to one embodiment includes: a first electronic device that captures video of at least one first user; a second electronic device that outputs the video of the first user to a second user and acquires information on the line of sight of the second user; and a control unit that performs control such that the first electronic device indicates the position of the second user's gaze within the video of the first user.
  • the electronic device according to one embodiment is configured to be able to communicate with another electronic device, and includes: an acquisition unit that acquires video of at least one first user; and a control unit that performs control such that the electronic device indicates the position of the gaze, within the video of the first user, of a second user who uses the other electronic device.
  • a method for controlling a system according to one embodiment includes the steps of: a first electronic device acquiring video of at least one first user; a second electronic device outputting the video of the first user to a second user; the second electronic device acquiring information on the line of sight of the second user; and controlling the first electronic device so as to indicate the position of the second user's gaze within the video of the first user.
  • a program according to one embodiment causes a computer to execute: a step in which a first electronic device acquires video of at least one first user; a step in which a second electronic device outputs the video of the first user to the second user; a step in which the second electronic device acquires information on the line of sight of the second user; and a step of controlling the first electronic device so as to indicate the position of the second user's gaze within the video of the first user. A minimal sketch of these steps follows.
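  • As a minimal illustration of the claimed control flow, the four steps above can be sketched in Python. This is a hypothetical sketch only: the device objects and their methods (capture_video, display, acquire_gaze, indicate_gaze) are illustrative assumptions, not part of the disclosure.

```python
from dataclasses import dataclass

@dataclass
class GazeInfo:
    # Normalized gaze position within the displayed video; (0, 0) = top-left.
    x: float
    y: float

def control_step(first_device, second_device) -> None:
    """One iteration of the claimed control method (illustrative only)."""
    # Step 1: the first electronic device acquires video of the first user(s).
    frame = first_device.capture_video()
    # Step 2: the second electronic device outputs that video to the second user.
    second_device.display(frame)
    # Step 3: the second electronic device acquires the second user's gaze info.
    gaze: GazeInfo = second_device.acquire_gaze()
    # Step 4: the first electronic device indicates where, within the video of
    # the first user, the second user is looking (e.g., by moving mechanical
    # eyes or redrawing an avatar's gaze toward that position).
    first_device.indicate_gaze(gaze)
```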
  • FIG. 1 is a diagram illustrating an example of a usage mode of a system according to an embodiment.
  • FIG. 2 is a functional block diagram illustrating a schematic configuration of a first electronic device according to an embodiment.
  • FIG. 3 is a diagram illustrating an example of driving by a driving unit of the first electronic device according to an embodiment.
  • FIG. 4 is a functional block diagram illustrating a schematic configuration of a second electronic device according to an embodiment.
  • FIG. 5 is a functional block diagram illustrating a schematic configuration of a third electronic device according to an embodiment.
  • FIG. 6 is a sequence diagram illustrating a basic operation of a system according to an embodiment.
  • FIG. 7 is a flowchart illustrating an operation of a system according to an embodiment.
  • FIG. 8 is a flowchart illustrating an operation of a system according to an embodiment.
  • an "electronic device” may be, for example, a device that is powered by power supplied from a power system or a battery.
  • a “system” may be, for example, a device that includes at least an electronic device.
  • a "user” may be a person who uses or may use an electronic device according to an embodiment (typically a human), and a person who uses or may use a system including an electronic device according to an embodiment.
  • a conference in which at least one participant participates by communication from a different location than the other participants is collectively referred to as a "remote conference.”
  • FIG. 1 is a diagram showing an example of how a system according to an embodiment is used.
  • participant Mg remotely participates in a conference held in a conference room MR from his/her home RL, as shown in FIG. 1.
  • participants Ma, Mb, Mc, and Md participate in the conference in the conference room MR.
  • the participants of the conference are not limited to participants Ma, Mb, Mc, and Md, and may include, for example, other participants.
  • the participants of the conference may be any number of at least one person. Participants other than participant Mg may also remotely participate in the conference from their respective homes.
  • the system according to an embodiment may include, for example, a first electronic device 1, a second electronic device 100, and a third electronic device 300.
  • the first electronic device 1, the second electronic device 100, and the third electronic device 300 are shown only in schematic form.
  • the system according to an embodiment may not include at least one of the first electronic device 1, the second electronic device 100, and the third electronic device 300, and may include devices other than the electronic devices mentioned above.
  • the first electronic device 1 may be installed in the conference room MR.
  • the second electronic device 100 may be installed in the home RL of the participant Mg.
  • the first electronic device 1 and the second electronic device 100 may be configured to be able to communicate with each other.
  • the location of the home RL of the participant Mg may be a location different from the location of the conference room MR.
  • the location of the home RL of the participant Mg may be far away from the location of the conference room MR, or may be close to the location of the conference room MR (for example, a room adjacent to the conference room MR).
  • the first electronic device 1 according to an embodiment may be connected to the second electronic device 100 according to an embodiment, for example, via a network N.
  • the third electronic device 300 according to an embodiment may be connected to at least one of the first electronic device 1 and the second electronic device 100, for example, via a network N.
  • the first electronic device 1 according to an embodiment may be connected to the second electronic device 100 according to an embodiment, by at least one of wireless and wired.
  • the third electronic device 300 according to an embodiment may be connected to at least one of the first electronic device 1 and the second electronic device 100, by at least one of wireless and wired.
  • the first electronic device 1, the second electronic device 100, and the third electronic device 300 are shown by dashed lines as being connected wirelessly and/or wired via the network N.
  • the first electronic device 1 and the second electronic device 100 may be included in a remote conference system according to an embodiment.
  • the third electronic device 300 may be included in a remote conference system according to an embodiment.
  • the network N as shown in FIG. 1 may include various electronic devices and/or devices such as a server as appropriate.
  • the network N as shown in FIG. 1 may also include devices such as a base station and/or a repeater as appropriate.
  • the first electronic device 1 and the second electronic device 100 may communicate directly.
  • the first electronic device 1 and the second electronic device 100 may communicate via at least one of other devices such as the third electronic device 300, a repeater, and/or a base station.
  • the communication unit of the first electronic device 1 and the communication unit of the second electronic device 100 may communicate.
  • in this disclosure, stating that the first electronic device 1 and the second electronic device 100 "communicate" is intended to cover not only mutual communication but also the case where one "transmits" information to the other and/or the other "receives" information transmitted by the one. The same applies to communication between any electronic devices, including the third electronic device 300.
  • the first electronic device 1 may be arranged in the conference room MR, for example as shown in FIG. 1.
  • the first electronic device 1 may be arranged in a position where it can acquire the voice and/or video of at least one of the conference participants Ma, Mb, Mc, and Md.
  • the first electronic device 1 outputs the voice and/or video of participant Mg, as described below. Therefore, the first electronic device 1 may be arranged so that the voice and/or video of participant Mg output from the first electronic device 1 reaches at least one of the conference participants Ma, Mb, Mc, and Md.
  • the second electronic device 100 may be arranged in the home RL of the participant Mg, for example, in a manner as shown in FIG. 1.
  • the second electronic device 100 may be arranged in a position where it is possible to acquire the voice and/or image of the participant Mg.
  • the second electronic device 100 may acquire the voice and/or image of the participant Mg by a microphone or a headset and/or a camera connected to the second electronic device 100.
  • the second electronic device 100 may acquire information on the gaze of the participant Mg, such as the gaze of the participant Mg, the direction of the gaze, and/or the movement of the gaze, as described below. The acquisition of gaze information by the second electronic device 100 will be described further below.
  • the second electronic device 100 outputs the audio and/or video of at least one of the participants Ma, Mb, Mc, and Md of the conference in the conference room MR, as described below. For this reason, the second electronic device 100 may be positioned so that the audio and/or video output from the second electronic device 100 reaches the participant Mg.
  • for example, the second electronic device 100 may be positioned so that the audio it outputs reaches the ears of the participant Mg via headphones, earphones, speakers, or a headset.
  • likewise, the second electronic device 100 may be positioned so that the video it outputs, presented for example on a display, is visible to the participant Mg.
  • the third electronic device 300 may be, for example, a server-like device that relays between the first electronic device 1 and the second electronic device 100. Also, the system according to one embodiment does not need to include the third electronic device 300.
  • FIG. 1 shows only one example of a usage mode of the first electronic device 1, the second electronic device 100, and the third electronic device 300 according to an embodiment.
  • the first electronic device 1, the second electronic device 100, and the third electronic device 300 according to an embodiment may be used in various other modes.
  • the remote conference system including the first electronic device 1 and the second electronic device 100 shown in FIG. 1 allows the participant Mg to behave as if he or she is participating in a conference held in the conference room MR while staying at home RL. Also, the remote conference system including the first electronic device 1 and the second electronic device 100 shown in FIG. 1 allows the conference participants Ma, Mb, Mc, and Md to feel as if the participant Mg is actually participating in the conference held in the conference room MR. That is, in the remote conference system including the first electronic device 1 and the second electronic device 100, the first electronic device 1 arranged in the conference room MR can play a role like an avatar of the participant Mg.
  • the first electronic device 1 may function as a physical avatar (such as a telepresence robot) that resembles the participant Mg. Also, the first electronic device 1 may function as a virtual avatar that displays an image of the participant Mg or an image of the participant Mg that is, for example, a character.
  • the first electronic device 1 may display the video or image of participant Mg on, for example, a display provided in the first electronic device 1 itself or an external display, or as a 3D hologram projected by the first electronic device 1.
  • the first electronic device 1 may be used in the conference room MR by participants Ma, Mb, Mc, Md, etc.
  • the second electronic device 100 described later has a function of outputting, to the first electronic device 1, the voice, video, and/or gaze information of the participant Mg acquired by the second electronic device 100 when the participant Mg speaks.
  • the first electronic device 1 likewise has a function of outputting, to the second electronic device 100, the voice and/or video of the participants Ma, Mb, Mc, Md, etc. acquired by the first electronic device 1 when those participants speak.
  • the first electronic device 1 allows the participants Ma, Mb, Mc, Md, etc. to hold a remote conference or video conference in the conference room MR even if the participant Mg is in a remote location. Therefore, the first electronic device 1 is also referred to as an electronic device "used locally" as appropriate.
  • the first electronic device 1 may be configured to reproduce the line of sight of the participant Mg. That is, the first electronic device 1 can perform an operation that simulates the line of sight of the participant Mg. Specifically, the first electronic device 1 can cause the participants Ma, Mb, Mc, Md, etc. in the conference room MR to recognize in which direction the participant Mg is looking. For example, the first electronic device 1 can cause people around the first electronic device 1 in the conference room MR to recognize whether the participant Mg is looking at the participant Ma, whether the participant Mg is looking at the participant Mb, or whether the participant Mg is not looking at any of the participants.
  • the first electronic device 1 may be various devices, but may be, for example, a specially designed device.
  • the first electronic device 1 according to one embodiment may have a housing on which an illustration of a human or the like is drawn, or may have a shape imitating at least part of a human or the like, or a robot-like shape.
  • the first electronic device 1 according to one embodiment may be, for example, a general-purpose smartphone, tablet, phablet, notebook computer (notebook PC or laptop), or computer (desktop).
  • the first electronic device 1 may draw at least a part of a human or robot on the display of a notebook PC, for example.
  • the first electronic device 1 according to one embodiment may project at least a part of a human or robot as a 3D hologram, for example.
  • the first electronic device 1 may include a control unit 10, a storage unit 20, a communication unit 30, an imaging unit 40, an audio input unit 50, an audio output unit 60, a display unit 70, and a drive unit 80.
  • the control unit 10 may also include, for example, an identification unit 12 and an estimation unit 14.
  • the first electronic device 1 may not include at least some of the functional units shown in FIG. 2, or may include components other than the functional units shown in FIG. 2.
  • the control unit 10 controls and/or manages the entire first electronic device 1, including each functional unit constituting the first electronic device 1.
  • the control unit 10 may include at least one processor, such as a CPU (Central Processing Unit) or a DSP (Digital Signal Processor), to provide control and processing power for executing various functions.
  • the control unit 10 may be realized as a single processor, as a number of processors, or as individual processors.
  • the processor may be realized as a single integrated circuit (IC).
  • the processor may be realized as a number of communicatively connected integrated circuits and discrete circuits.
  • the processor may be realized based on various other known technologies.
  • the control unit 10 may include one or more processors and memories.
  • the processor may include a general-purpose processor that loads a specific program to execute a specific function, and a dedicated processor specialized for a specific process.
  • the dedicated processor may include an application specific integrated circuit (ASIC).
  • the processor may include a programmable logic device (PLD).
  • the PLD may include a field-programmable gate array (FPGA).
  • the control unit 10 may be either a system-on-a-chip (SoC) or a system in a package (SiP) in which one or more processors work together.
  • the control unit 10 may be configured to include, for example, at least one of software and hardware resources. Furthermore, in the first electronic device 1 according to one embodiment, the control unit 10 may be configured by specific means in which software and hardware resources work together. Furthermore, in the first electronic device 1 according to one embodiment, at least one of the other functional units may also be configured by specific means in which software and hardware resources work together.
  • control unit 10 performs various operations such as control, which will be described later.
  • the identification unit 12 of the control unit 10 can perform various identification processes.
  • the estimation unit 14 can perform various estimation processes.
  • the storage unit 20 may function as a memory that stores various information.
  • the storage unit 20 may store, for example, a program executed in the control unit 10 and the results of processing executed in the control unit 10.
  • the storage unit 20 may also function as a work memory for the control unit 10.
  • the storage unit 20 may be connected to the control unit 10 by wire and/or wirelessly.
  • the storage unit 20 may include, for example, at least one of a RAM (Random Access Memory) and a ROM (Read Only Memory).
  • the storage unit 20 may be configured, for example, by a semiconductor memory or the like, but is not limited to this, and may be any storage device.
  • the storage unit 20 may be a storage medium such as a memory card inserted into the first electronic device 1 according to one embodiment.
  • the storage unit 20 may also be an internal memory of a CPU used as the control unit 10, or may be connected to the control unit 10 as a separate unit.
  • the communication unit 30 has an interface function for wireless and/or wired communication with, for example, an external device.
  • the communication method used by the communication unit 30 in one embodiment may conform to a wireless communication standard.
  • the wireless communication standard includes cellular phone communication standards such as 2G, 3G, 4G, and 5G.
  • the cellular phone communication standards include LTE (Long Term Evolution), W-CDMA (Wideband Code Division Multiple Access), CDMA2000, PDC (Personal Digital Cellular), GSM (Registered Trademark) (Global System for Mobile communications), and PHS (Personal Handy-phone System), etc.
  • wireless communication standards include WiMAX (Worldwide Interoperability for Microwave Access), IEEE 802.11, WiFi, Bluetooth (registered trademark), IrDA (Infrared Data Association), and NFC (Near Field Communication).
  • the communication unit 30 may include, for example, a modem whose communication method is standardized by ITU-T (International Telecommunication Union Telecommunication Standardization Sector).
  • the communication unit 30 may be configured to include, for example, an antenna for transmitting and receiving radio waves and an appropriate RF unit.
  • the communication unit 30 may wirelessly communicate with, for example, a communication unit of another electronic device via an antenna.
  • the communication unit 30 may have a function of transmitting any information from the first electronic device 1 to another device, and/or a function of receiving any information from another device in the first electronic device 1.
  • the communication unit 30 may wirelessly communicate with the second electronic device 100 shown in FIG. 1.
  • the communication unit 30 may wirelessly communicate with a communication unit 130 (described later) of the second electronic device 100.
  • the communication unit 30 has a function of communicating with the second electronic device 100.
  • the communication unit 30 may wirelessly communicate with the third electronic device 300 shown in FIG. 1.
  • the communication unit 30 may wirelessly communicate with a communication unit 330 (described later) of the third electronic device 300.
  • the communication unit 30 may have a function of communicating with the third electronic device 300.
  • the communication unit 30 may also be configured as an interface such as a connector for wired connection to the outside.
  • the communication unit 30 can be configured using known technology for wireless communication, so a detailed description of the hardware and the like is omitted.
  • the communication unit 30 may be connected to the control unit 10 via a wired and/or wireless connection.
  • Various pieces of information received by the communication unit 30 may be supplied to, for example, the storage unit 20 and/or the control unit 10.
  • Various pieces of information received by the communication unit 30 may be stored in, for example, a memory built into the control unit 10.
  • the communication unit 30 may transmit, for example, the results of processing by the control unit 10 and/or information stored in the storage unit 20 to the outside.
  • the imaging unit 40 may be configured to include an image sensor that captures images electronically, such as a digital camera.
  • the imaging unit 40 may be configured to include an imaging element that performs photoelectric conversion, such as a CCD (Charge Coupled Device Image Sensor) or a CMOS (Complementary Metal Oxide Semiconductor) sensor.
  • the imaging unit 40 can capture an image of the surroundings of the first electronic device 1, for example.
  • the imaging unit 40 may capture an image of the inside of the conference room MR shown in FIG. 1, for example.
  • the imaging unit 40 may capture images of participants Ma, Mb, Mc, and Md of a conference held in the conference room MR shown in FIG. 1, for example.
  • the imaging unit 40 may be configured to capture video having a predetermined range of angle of view centered on a specific direction. For example, the imaging unit 40 according to one embodiment may capture video centered on participant Mb in FIG. 1, where participant Ma and/or participant Md are not included in the angle of view.
  • the imaging unit 40 may also be configured to capture video in all directions at once (e.g., 360 degrees in the horizontal plane).
  • the imaging unit 40 may capture all-directional video including participants Ma, Mb, Mc, and Md in FIG. 1.
  • the imaging unit 40 may convert the captured image into a signal and transmit it to the control unit 10. For this reason, the imaging unit 40 may be connected to the control unit 10 via a wired and/or wireless connection. Furthermore, a signal based on the image captured by the imaging unit 40 may be supplied to any functional unit of the first electronic device 1, such as the storage unit 20 and/or the display unit 70.
  • the imaging unit 40 is not limited to an imaging device such as a digital camera, and may be any device that captures an image of the state inside the conference room MR shown in FIG. 1.
  • the imaging unit 40 may capture images of the state inside the conference room MR as still images at predetermined time intervals (e.g., 15 frames per second). Also, in one embodiment, the imaging unit 40 may capture images of the state inside the conference room MR as a continuous video. Furthermore, the imaging unit 40 may be configured to include a fixed camera, or may be configured to include a movable camera.
  • the audio input unit 50 detects (acquires) sounds or voices around the first electronic device 1, including human voices.
  • the audio input unit 50 may detect sounds or voices as air vibrations, for example with a diaphragm, and convert them into an electrical signal.
  • the audio input unit 50 may include an acoustic device that converts sound into an electrical signal, such as a microphone.
  • the audio input unit 50 may detect (acquire) the voices of at least one of the participants Ma, Mb, Mc, and Md in the conference room MR shown in FIG. 1, for example.
  • the voices (electrical signals) detected by the audio input unit 50 may be input to the control unit 10, for example. For this reason, the audio input unit 50 may be connected to the control unit 10 by wire and/or wirelessly.
  • the audio input unit 50 may be configured to include, for example, a stereo microphone or a microphone array.
  • an audio input unit 50 with multiple channels, such as a stereo microphone or a microphone array, can identify (or estimate) the direction and/or position of a sound source. With such an audio input unit 50, the direction and/or position from which a sound detected in, for example, the conference room MR originates can be identified (or estimated) relative to the first electronic device 1 equipped with the audio input unit 50.
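  • As an illustration of multi-channel direction estimation, the time difference of arrival (TDOA) between two microphones can be converted into a direction of arrival. The following Python sketch is a minimal example under stated assumptions (a far-field source, a two-microphone array, and plain cross-correlation rather than a production method such as GCC-PHAT); it is not taken from the disclosure.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees Celsius

def estimate_doa(sig_a: np.ndarray, sig_b: np.ndarray,
                 mic_distance_m: float, sample_rate_hz: int) -> float:
    """Estimate the direction of arrival in radians (0 = broadside to the
    two-microphone axis) from the lag of the cross-correlation peak."""
    corr = np.correlate(sig_a, sig_b, mode="full")
    # Lag in samples; the sign convention depends on channel ordering.
    lag = int(np.argmax(corr)) - (len(sig_b) - 1)
    tdoa = lag / sample_rate_hz
    # Far-field geometry: sin(theta) = tdoa * c / d. Clamp for numerical safety.
    sin_theta = np.clip(tdoa * SPEED_OF_SOUND / mic_distance_m, -1.0, 1.0)
    return float(np.arcsin(sin_theta))
```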
  • the audio input unit 50 may convert the acquired sound or voice into an electrical signal and supply it to the control unit 10.
  • the audio input unit 50 may also supply the electrical signal (audio signal) into which the sound or voice has been converted to a functional unit of the first electronic device 1, such as the storage unit 20.
  • the audio input unit 50 may be any device that detects (acquires) sound or voice within the conference room MR shown in FIG. 1.
  • the audio output unit 60 converts an electrical signal (audio signal) supplied from the control unit 10 into sound, and outputs it as sound or voice.
  • the audio output unit 60 may be connected to the control unit 10 by wire and/or wirelessly.
  • the audio output unit 60 may be configured to include a device having a function of outputting sound, such as an arbitrary speaker (loudspeaker).
  • the audio output unit 60 may be configured to include a directional speaker that transmits sound in a specific direction.
  • the audio output unit 60 may also be configured to be able to change the directionality of the sound.
  • the audio output unit 60 may include an amplifier or an amplification circuit that appropriately amplifies the electrical signal (audio signal).
  • the audio output unit 60 may amplify the audio signal that the communication unit 30 receives from the second electronic device 100.
  • the audio signal received from the second electronic device 100 may be, for example, the audio signal of a speaker who is currently speaking (e.g., the participant Mg shown in FIG. 1), received by the communication unit 30 from that speaker's second electronic device 100.
  • the audio output unit 60 may output the audio signal of a speaker (e.g., participant Mg shown in FIG. 1) as the voice of that speaker.
  • the display unit 70 may be any display device, such as a Liquid Crystal Display (LCD), an Organic Electro-Luminescence panel, or an Inorganic Electro-Luminescence panel.
  • the display unit 70 may also be, for example, a projector that projects a 3D hologram.
  • the display unit 70 may display various types of information, such as characters, figures, or symbols.
  • the display unit 70 may also display objects and icon images that constitute various GUIs, for example, to prompt the user to operate the first electronic device 1.
  • the display unit 70 may be connected to the control unit 10 or the like by wire and/or wirelessly.
  • the display unit 70 may be configured to include a backlight, etc., as appropriate.
  • the display unit 70 may display an image based on a video signal transmitted from the second electronic device 100.
  • the second electronic device 100 acquires, for example, audio, video, and/or gaze information of the participant Mg shown in FIG. 1 and outputs it to the first electronic device 1.
  • the display unit 70 may then represent the gaze of the participant Mg in an image, based on the video and/or gaze information of the participant Mg input to the first electronic device 1.
  • in this way, the participants Ma, Mb, Mc, and Md shown in FIG. 1 can visually grasp the gaze of the participant Mg, who is in a location away from the conference room MR.
  • the display unit 70 may display, for example, the image of the gaze of the participant Mg captured by the second electronic device 100 as is.
  • alternatively, the display unit 70 may display, for example, an image that renders the gaze of the participant Mg in character form (for example, the gaze of an avatar or robot).
  • the display unit 70 may represent the gaze of the user of the second electronic device 100 by an image.
  • the display unit 70 may also represent the gaze direction and/or gaze movement of the user of the second electronic device 100 by an image.
  • the first electronic device 1 may include a display unit 70 that represents the gaze and/or gaze direction of the user of the second electronic device 100 by an image.
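  • As one conceivable way for the display unit 70 to represent the remote user's gaze with an image, the pupils of the displayed eyes can be offset toward the received gaze direction. The following Python sketch is hypothetical; the normalized gaze convention and the pixel limit are assumptions, not part of the disclosure.

```python
def pupil_offsets(gaze_x: float, gaze_y: float,
                  max_offset_px: int = 12) -> tuple[int, int]:
    """Map a normalized gaze direction (-1..1 per axis; 0 = straight ahead)
    to pixel offsets for drawing the pupils of the displayed eyes."""
    # Clamp so the pupil stays inside the drawn eye outline.
    gx = max(-1.0, min(1.0, gaze_x))
    gy = max(-1.0, min(1.0, gaze_y))
    return round(gx * max_offset_px), round(gy * max_offset_px)
```

  • Redrawing the eyes with these offsets each time new gaze information arrives would let local participants read, at a glance, where the remote user is looking.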
  • the driving unit 80 drives a specific moving part in the first electronic device 1.
  • the driving unit 80 may be configured to include a power source such as a servo motor that drives the moving part in the first electronic device 1.
  • the driving unit 80 may drive any moving part in the first electronic device 1 under the control of the control unit 10. For this reason, the driving unit 80 may be connected to the control unit 10 by wire and/or wirelessly.
  • the driving unit 80 may drive, for example, at least a part of the housing of the first electronic device 1. Furthermore, for example, when the first electronic device 1 has a shape that imitates at least a part of a human or a robot, the driving unit 80 may drive at least a part of the shape of a human or a robot. In particular, when the first electronic device 1 has a shape that imitates at least a part of a human face or a robot face, the driving unit 80 may represent the line of sight, line of sight direction, and/or line of sight movement of a human or a robot by a physical configuration (shape) and/or movement.
  • the second electronic device 100 acquires, for example, audio, video, and/or gaze information of the participant Mg shown in FIG. 1 and outputs it to the first electronic device 1.
  • the drive unit 80 may represent the gaze of the participant Mg by a physical configuration (shape) and/or movement, based on the video and/or gaze information of the participant Mg input to the first electronic device 1.
  • by the drive unit 80 of the first electronic device 1 representing the gaze of the participant Mg in this way, the participants Ma, Mb, Mc, and Md shown in FIG. 1 can, for example, visually grasp the gaze state of the participant Mg, who is in a location away from the conference room MR.
  • the driving unit 80 may directly reproduce the gaze direction and/or movement of the participant Mg captured by the second electronic device 100, for example.
  • the driving unit 80 may express the gaze direction and/or movement of the participant Mg in character form (such as the gaze of an avatar or robot).
  • the driving unit 80 may express the gaze, gaze direction, and/or gaze movement of the user of the second electronic device 100 by a physical configuration (form) and/or movement.
  • the first electronic device 1 may include a driving unit 80 that expresses the gaze and/or gaze direction of the user of the second electronic device 100 by driving a mechanical structure.
  • FIG. 3 is a diagram illustrating an example of the operation of the driving unit 80 in the first electronic device 1 according to one embodiment.
  • the driving unit 80 may realize driving about at least one of the drive axes shown in FIG. 3 in the first electronic device 1.
  • for example, the driving unit 80 may express a negative movement of the user of the second electronic device 100 (e.g., the participant Mg), such as shaking the head from side to side, by driving the first electronic device 1 about the corresponding drive axis.
  • for example, the driving unit 80 may express an affirmative movement (a nodding movement) of the user of the second electronic device 100 (e.g., the participant Mg) by driving the first electronic device 1 about the corresponding drive axis.
  • for example, the driving unit 80 may express a movement indicating that the user of the second electronic device 100 (e.g., the participant Mg) is undecided (tilting the head) by driving about the corresponding drive axis. Likewise, the driving unit 80 may express a negative or rejecting movement (shaking the body from side to side), a polite movement (bowing), and other movements of the user of the second electronic device 100 (e.g., the participant Mg) by driving about the respective drive axes of the first electronic device 1.
  • the driving unit 80 may express the movement of the eyes E1 and E2 in the face portion Fc of the first electronic device 1 shown in FIG. 3, that is, the line of sight of the user (e.g., participant Mg) of the second electronic device 100.
  • the driving unit 80 may express the line of sight of the user (e.g., participant Mg) of the second electronic device 100 by driving at least one of the eyes E1 and E2 in the face portion Fc of the first electronic device 1.
  • the driving unit 80 may express the line of sight of the user (e.g., participant Mg) of the second electronic device 100 by driving the movement of at least one of the eyes E1 and E2 in the face portion Fc of the first electronic device 1.
  • the driving unit 80 may express the line of sight of the user (e.g., participant Mg) of the second electronic device 100 by moving, for example, at least one of the eyes E1 and E2 in the face portion Fc of the first electronic device 1 in any direction of the arrows shown in FIG. 3.
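  • As a concrete illustration of the drive-based representation, the received gaze direction could be clamped to the mechanical range of the eyes E1 and E2, with any residual rotation handed to a head drive axis (cf. FIG. 3). The following Python sketch is hypothetical; the angle limits and the eyes/head interfaces are assumptions, not part of the disclosure.

```python
from dataclasses import dataclass

@dataclass
class GazeDirection:
    yaw_deg: float    # left/right; 0 = straight ahead
    pitch_deg: float  # up/down; 0 = straight ahead

def drive_gaze(eyes, head, gaze: GazeDirection,
               eye_yaw_limit: float = 30.0,
               eye_pitch_limit: float = 20.0) -> None:
    """Drive the eyes first; delegate yaw beyond their mechanical range
    to a head rotation about a drive axis (cf. FIG. 3)."""
    eye_yaw = max(-eye_yaw_limit, min(eye_yaw_limit, gaze.yaw_deg))
    eye_pitch = max(-eye_pitch_limit, min(eye_pitch_limit, gaze.pitch_deg))
    eyes.set_angles(eye_yaw, eye_pitch)      # hypothetical servo interface
    residual_yaw = gaze.yaw_deg - eye_yaw
    if residual_yaw != 0.0:
        head.rotate_yaw(residual_yaw)        # hypothetical head-axis interface
```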
  • the display unit 70 may represent the gaze of the user of the second electronic device 100 (e.g., participant Mg) by displaying, for example, the eyes E1 and E2 in the face portion Fc shown in FIG. 3.
  • at least one of the display unit 70 and the drive unit 80 may represent the gaze of the user of the second electronic device 100 (e.g., participant Mg) by displaying and/or driving at least one of the eyes E1 and E2 of the first electronic device 1.
  • various operations expressing the emotions and/or behavior of a human such as participant Mg can be expressed through display by the display unit 70 and/or driving by the drive unit 80.
  • various known techniques may be used for such operations, so a detailed description of how the display unit 70 and/or the drive unit 80 express the emotions and/or behavior of a human such as participant Mg is omitted.
  • in this way, the first electronic device 1 according to one embodiment can perform operations expressing the emotions and/or behavior of participant Mg through display by the display unit 70 and/or driving by the drive unit 80.
  • the first electronic device 1 may be a dedicated device as described above. Meanwhile, in one embodiment, the first electronic device 1 may include, for example, an audio output unit 60 and a drive unit 80 among the functional units shown in FIG. 2. In this case, the first electronic device 1 may be connected to another electronic device to supplement at least some of the functions of the other functional units shown in FIG. 2.
  • the other electronic device may be, for example, a general-purpose smartphone, tablet, phablet, notebook computer (notebook PC or laptop), or computer (desktop).
  • the manner in which the display unit 70 and/or the drive unit 80 of the first electronic device 1 shown in FIG. 3 express various actions conveying the emotions and/or behavior of a human such as participant Mg is merely one conceivable example.
  • the first electronic device 1 may express various actions expressing the emotions and/or behavior of a human being such as participant Mg by using various configurations and/or operating modes.
  • FIG. 4 is a block diagram showing a schematic configuration of the second electronic device 100 shown in FIG. 1.
  • the second electronic device 100 may be, for example, a device used by the participant Mg at his or her home RL.
  • as described above, the first electronic device 1 has a function of outputting, to the second electronic device 100, the voice and/or video of the participants Ma, Mb, Mc, Md, etc. acquired by the first electronic device 1 when those participants speak.
  • the first electronic device 1 can express the gaze of the participant Mg.
  • the second electronic device 100 has a function of outputting, to the first electronic device 1, the voice and/or video of the participant Mg acquired by the second electronic device 100 when the participant Mg speaks. Furthermore, the second electronic device 100 has a function of outputting, to the first electronic device 1, the gaze information of the participant Mg acquired by the second electronic device 100.
  • the second electronic device 100 allows the participant Mg to hold a remote conference or video conference even when he or she is in a location far from the conference room MR. Therefore, the second electronic device 100 is also referred to, as appropriate, as an electronic device "used remotely."
  • the second electronic device 100 may include a control unit 110, a storage unit 120, a communication unit 130, an imaging unit 140, an audio input unit 150, an audio output unit 160, a display unit 170, and a gaze information acquisition unit 200.
  • the control unit 110 may also include, for example, an identification unit 112 and an estimation unit 114.
  • the second electronic device 100 may not include at least some of the functional units shown in FIG. 4, or may include components other than the functional units shown in FIG. 4.
  • the control unit 110 controls and/or manages the entire second electronic device 100, including each functional unit constituting the second electronic device 100.
  • the control unit 110 may basically be configured based on the same concept as the control unit 10 shown in FIG. 2, for example.
  • the identification unit 112 and estimation unit 114 of the control unit 110 may also be configured based on the same concept as the identification unit 12 and estimation unit 14 of the control unit 10 shown in FIG. 2, for example.
  • the storage unit 120 may function as a memory that stores various types of information.
  • the storage unit 120 may store, for example, programs executed in the control unit 110 and results of processing executed in the control unit 110.
  • the storage unit 120 may also function as a work memory for the control unit 110.
  • the storage unit 120 may be connected to the control unit 110 via a wired and/or wireless connection.
  • the storage unit 120 may basically be configured based on the same concept as the storage unit 20 shown in FIG. 2, for example.
  • the communication unit 130 has an interface function for wireless and/or wired communication.
  • the communication unit 130 may wirelessly communicate with, for example, a communication unit of another electronic device, for example, via an antenna.
  • the communication unit 130 may wirelessly communicate with the first electronic device 1 shown in FIG. 1.
  • the communication unit 130 may wirelessly communicate with the communication unit 30 of the first electronic device 1.
  • the communication unit 130 has a function of communicating with the first electronic device 1.
  • the communication unit 130 may wirelessly communicate with the third electronic device 300 shown in FIG. 1.
  • the communication unit 130 may wirelessly communicate with the communication unit 330 (described later) of the third electronic device 300.
  • the communication unit 130 may have a function of communicating with the third electronic device 300.
  • the communication unit 130 may be connected to the control unit 110 in a wired and/or wireless manner.
  • the communication unit 130 may basically have a configuration based on the same idea as the communication unit 30 shown in FIG. 2, for example.
  • the imaging unit 140 may be configured to include an image sensor that captures images electronically, such as a digital camera.
  • the imaging unit 140 may capture images of the interior of the home RL shown in FIG. 1, for example.
  • the imaging unit 140 may capture images of participants Mg who join a conference from the home RL shown in FIG. 1, for example.
  • the imaging unit 140 may convert the captured images into signals and transmit them to the control unit 110. For this reason, the imaging unit 140 may be connected to the control unit 110 by wire and/or wirelessly.
  • the imaging unit 140 may basically be configured based on the same concept as the imaging unit 40 shown in FIG. 2, for example.
  • the audio input unit 150 detects (acquires) sounds or voices around the second electronic device 100, including human voices.
  • the audio input unit 150 may detect sounds or voices as air vibrations, for example, with a diaphragm, and convert them into an electrical signal.
  • the audio input unit 150 may include an acoustic device that converts sounds into an electrical signal, such as an arbitrary microphone.
  • the audio input unit 150 may detect (acquire) the voice of the participant Mg in the home RL shown in FIG. 1, for example.
  • the voice (electrical signal) detected by the audio input unit 150 may be input to the control unit 110, for example. For this reason, the audio input unit 150 may be connected to the control unit 110 by wire and/or wirelessly.
  • the audio input unit 150 may basically be configured based on the same concept as the audio input unit 50 shown in FIG. 2, for example.
  • the audio output unit 160 converts an electrical signal (audio signal) supplied from the control unit 110 into sound, and outputs the audio signal as sound or voice.
  • the audio output unit 160 may be connected to the control unit 110 by wire and/or wirelessly.
  • the audio output unit 160 may be configured to include a device having a function of outputting sound, such as an arbitrary speaker (loudspeaker).
  • the audio output unit 160 may output a sound detected by the audio input unit 50 of the first electronic device 1.
  • the sound detected by the audio input unit 50 of the first electronic device 1 may be at least one of the voices of the participants Ma, Mb, Mc, and Md in the conference room MR shown in FIG. 1.
  • the audio output unit 160 may basically be configured based on the same idea as the audio output unit 60 shown in FIG. 2, for example.
  • the display unit 170 may be any display device, such as a Liquid Crystal Display (LCD), an Organic Electro-Luminescence panel, or an Inorganic Electro-Luminescence panel.
  • the display unit 170 may basically be configured based on the same concept as the display unit 70 shown in FIG. 2, for example.
  • Various data required for display on the display unit 170 may be supplied from, for example, the control unit 110 or the memory unit 120. For this reason, the display unit 170 may be connected to the control unit 110, etc., via a wired and/or wireless connection.
  • the display unit 170 may be, for example, a touch screen display equipped with a touch panel function that detects input by contact with the participant Mg's finger or stylus.
  • the display unit 170 may display an image based on the video signal transmitted from the first electronic device 1.
  • the display unit 170 may display images of participants Ma, Mb, Mc, Md, etc. captured by the first electronic device 1 (its imaging unit 40) as an image based on the video signal transmitted from the first electronic device 1.
  • participant Mg shown in FIG. 1 can visually know the state of participants Ma, Mb, Mc, Md, etc. in a conference room MR away from his/her home RL.
  • the display unit 170 may directly display images of the participants Ma, Mb, Mc, Md, etc. captured by the first electronic device 1. On the other hand, the display unit 170 may display images (e.g., avatars) that characterize the participants Ma, Mb, Mc, Md, etc.
  • the gaze information acquisition unit 200 acquires gaze information of the user of the second electronic device 100 (e.g., participant Mg).
  • the gaze information acquisition unit 200 may acquire gaze information of the user of the second electronic device 100, such as the gaze of the user of the second electronic device 100, the direction of the gaze, and/or the movement of the gaze.
  • the gaze information acquisition unit 200 may have a function of tracking the movement of the gaze of the user of the second electronic device 100 (e.g., participant Mg), such as an eye tracker.
  • the gaze information acquisition unit 200 may be any component capable of acquiring gaze information of the user of the second electronic device 100, such as the gaze of the user of the second electronic device 100, the direction of the gaze, and/or the movement of the gaze.
  • the second electronic device 100 may acquire gaze information of a user (e.g., participant Mg) of the second electronic device 100 based on the eye movement of the user captured by the imaging unit 140.
  • the second electronic device 100 may not include the gaze information acquisition unit 200, or the imaging unit 140 may also function as the gaze information acquisition unit 200.
  • the gaze information acquired by the gaze information acquisition unit 200 may be input to the control unit 110, for example. For this reason, the gaze information acquisition unit 200 may be connected to the control unit 110 via a wired and/or wireless connection.
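  • As one possible realization of acquiring gaze information from camera images (as noted above, the imaging unit 140 may double as the gaze information acquisition unit 200), the pupil position can be normalized within the detected eye region. The following Python sketch is hypothetical; the landmark format is an assumption, and a real system would obtain the landmarks from a face/eye detector.

```python
import numpy as np

def gaze_from_eye_landmarks(eye_landmarks: np.ndarray,
                            pupil_center: np.ndarray) -> tuple[float, float]:
    """Normalize the pupil position within the eye's bounding box to a
    gaze estimate in [-1, 1] per axis ((0, 0) = looking straight ahead).

    eye_landmarks: (N, 2) array of eye-contour points in image pixels.
    pupil_center:  (2,) array, pupil center in image pixels.
    """
    min_x, min_y = eye_landmarks.min(axis=0)
    max_x, max_y = eye_landmarks.max(axis=0)
    center_x, center_y = (min_x + max_x) / 2.0, (min_y + max_y) / 2.0
    half_w = max((max_x - min_x) / 2.0, 1e-6)  # avoid division by zero
    half_h = max((max_y - min_y) / 2.0, 1e-6)
    gx = float(np.clip((pupil_center[0] - center_x) / half_w, -1.0, 1.0))
    gy = float(np.clip((pupil_center[1] - center_y) / half_h, -1.0, 1.0))
    return gx, gy
```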
  • the second electronic device 100 may be a dedicated device as described above. Meanwhile, in one embodiment, the second electronic device 100 may include some of the functional units shown in FIG. 4, for example. In this case, the second electronic device 100 may be connected to another electronic device to supplement at least some of the functions of the other functional units shown in FIG. 4.
  • the other electronic device may be, for example, a general-purpose smartphone, tablet, phablet, notebook computer (notebook PC or laptop), or computer (desktop), etc.
  • the second electronic device 100 may be a smartphone or a laptop computer.
  • the second electronic device 100 may be a smartphone or a laptop computer with an application (program) installed for linking with the first electronic device 1.
  • FIG. 5 is a block diagram showing a schematic configuration of the third electronic device 300 shown in FIG. 1. An example of the configuration of the third electronic device 300 according to an embodiment will be described below.
  • the third electronic device 300 may be installed in a location other than the participant Mg's home RL and the conference room MR, as shown in FIG. 1.
  • the third electronic device 300 may be installed in the participant Mg's home RL or nearby, or in the conference room MR or nearby.
  • the first electronic device 1 has a function of transmitting the audio and/or video data of the participants Ma, Mb, Mc, Md, etc. acquired by the first electronic device 1 to the third electronic device 300 when the participants Ma, Mb, Mc, Md, etc. speak.
  • the third electronic device 300 may transmit the audio and/or video data received from the first electronic device 1 to the second electronic device 100.
  • the second electronic device 100 also has a function of transmitting the audio and/or video data of the participant Mg acquired by the second electronic device 100 to the third electronic device 300 when the participant Mg speaks.
  • the third electronic device 300 may transmit the audio and/or video data received from the second electronic device 100 to the first electronic device 1. In this way, the third electronic device 300 may have a function of relaying between the first electronic device 1 and the second electronic device 100.
  • the third electronic device 300 is also referred to, as appropriate, as a "server."
  • the third electronic device 300 may include a control unit 310, a storage unit 320, and a communication unit 330.
  • the control unit 310 may also include, for example, an identification unit 312 and an estimation unit 314.
  • the third electronic device 300 may not include at least some of the functional units shown in FIG. 5, or may include components other than the functional units shown in the figure.
  • the control unit 310 controls and/or manages the entire third electronic device 300, including each functional unit constituting the third electronic device 300.
  • the control unit 310 may basically be configured based on the same concept as the control unit 10 shown in FIG. 2, for example.
  • the identification unit 312 and estimation unit 314 of the control unit 310 may also be configured based on the same concept as the identification unit 12 and estimation unit 14 of the control unit 10 shown in FIG. 2, for example.
  • the storage unit 320 may function as a memory that stores various types of information.
  • the storage unit 320 may store, for example, programs executed in the control unit 310 and results of processing executed in the control unit 310.
  • the storage unit 320 may also function as a work memory for the control unit 310.
  • the storage unit 320 may be connected to the control unit 310 via a wired and/or wireless connection.
  • the storage unit 320 may basically be configured based on the same concept as the storage unit 20 shown in FIG. 2, for example.
  • the communication unit 330 has an interface function for wireless and/or wired communication.
  • the communication unit 330 may wirelessly communicate with, for example, a communication unit of another electronic device, for example, via an antenna.
  • the communication unit 330 may wirelessly communicate with the first electronic device 1 shown in FIG. 1.
  • the communication unit 330 may wirelessly communicate with the communication unit 30 of the first electronic device 1.
  • the communication unit 330 has a function of communicating with the first electronic device 1.
  • the communication unit 330 may wirelessly communicate with the second electronic device 100 shown in FIG. 1.
  • the communication unit 330 may wirelessly communicate with the communication unit 130 of the second electronic device 100.
  • the communication unit 330 may have a function of communicating with the second electronic device 100. As shown in FIG. 5, the communication unit 330 may be connected to the control unit 310 in a wired and/or wireless manner. The communication unit 330 may basically be configured based on the same idea as the communication unit 30 shown in FIG. 2.
  • the third electronic device 300 may be, for example, a specially designed device.
  • the third electronic device 300 may include, for example, some of the functional units shown in FIG. 5.
  • the third electronic device 300 may be connected to other electronic devices to supplement at least some of the functions of the other functional units shown in FIG. 5.
  • the other electronic devices may be, for example, devices such as a general-purpose computer or server.
  • the third electronic device 300 may be, for example, a relay server, a web server, or an application server.
  • the first electronic device 1 is installed in the conference room MR and acquires video and/or audio of at least one of the participants Ma, Mb, Mc, and Md.
  • the video and/or audio acquired by the first electronic device 1 is transmitted to the second electronic device 100 installed in the home RL of the participant Mg.
  • the second electronic device 100 outputs the video and/or audio of at least one of the participants Ma, Mb, Mc, and Md acquired by the first electronic device 1. This allows the participant Mg to recognize the video and/or audio of at least one of the participants Ma, Mb, Mc, and Md.
  • the second electronic device 100 is installed in the home RL of the participant Mg and acquires the voice of the participant Mg.
  • the second electronic device 100 also acquires information on the gaze of the participant Mg.
  • the voice and/or gaze information acquired by the second electronic device 100 is transmitted to the first electronic device 1 installed in the conference room MR.
  • the first electronic device 1 outputs the voice of the participant Mg received from the second electronic device 100.
  • at least one of the participants Ma, Mb, Mc, and Md can hear the voice of the participant Mg.
  • the first electronic device 1 also expresses the gaze of the participant Mg based on the gaze information of the participant Mg received from the second electronic device 100.
  • the second electronic device 100 may acquire an image of the participant Mg.
  • the image acquired by the second electronic device 100 may be transmitted to the first electronic device 1 installed in the conference room MR.
  • the first electronic device 1 may output the video of the participant Mg received from the second electronic device 100.
  • FIG. 6 is a sequence diagram explaining the basic operation of the system according to the embodiment described above.
  • FIG. 6 is a diagram showing the exchange of data and the like between the first electronic device 1, the second electronic device 100, and the third electronic device 300.
  • the basic operation when a remote conference or video conference is held using the system according to the embodiment will be explained with reference to FIG. 6.
  • the first electronic device 1 used locally may be used by the first user.
  • the first user may be, for example, at least one of the participants Ma, Mb, Mc, and Md shown in FIG. 1 (hereinafter also referred to as a local user).
  • the second electronic device 100 used remotely may be used by the second user.
  • the second user may be, for example, the participant Mg shown in FIG. 1 (hereinafter also referred to as a remote user).
  • the operation performed by the first electronic device 1 may be, in more detail, performed by, for example, the control unit 10 of the first electronic device 1.
  • the operation performed by the control unit 10 of the first electronic device 1 may be referred to as the operation performed by the first electronic device 1.
  • the operation performed by the second electronic device 100 may be, in more detail, performed by, for example, the control unit 110 of the second electronic device 100.
  • the operation performed by the control unit 110 of the second electronic device 100 may be referred to as the operation performed by the second electronic device 100.
  • the operations performed by the third electronic device 300 may be more specifically performed by, for example, the control unit 310 of the third electronic device 300.
  • the operations performed by the control unit 310 of the third electronic device 300 may be referred to as operations performed by the third electronic device 300.
  • the first electronic device 1 acquires at least one of the video and audio of the first user (e.g., at least one of the participants Ma, Mb, Mc, and Md) (step S1). Specifically, in step S1, the first electronic device 1 may capture the video of the first user using the imaging unit 40 and acquire (or detect) the audio of the first user using the audio input unit 50. Next, the first electronic device 1 encodes at least one of the video and audio of the first user (step S2). In step S2, encoding may mean compressing the video and/or audio data according to a predetermined rule and converting it into a format according to the purpose, including encryption. The first electronic device 1 may perform various known encoding methods, such as software encoding or hardware encoding.
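  • as a concrete illustration of steps S1 and S2 above, the following is a minimal sketch of capturing one video frame and encoding it for transmission. It assumes OpenCV (cv2) is available and that a webcam stands in for the imaging unit 40; the packet layout and the function name are hypothetical, not part of the disclosure.

```python
# Minimal sketch of steps S1-S2: acquire one frame and encode it.
# Assumes a webcam as the imaging unit 40; packet layout is hypothetical.
import json
from typing import Optional

import cv2

def capture_and_encode(camera_index: int = 0) -> Optional[bytes]:
    cap = cv2.VideoCapture(camera_index)
    ok, frame = cap.read()            # step S1: acquire the video frame
    cap.release()
    if not ok:
        return None
    # Step S2: compress according to a predetermined rule (JPEG here).
    ok, jpeg = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, 80])
    if not ok:
        return None
    header = json.dumps({"type": "video", "codec": "jpeg"}).encode() + b"\n"
    return header + jpeg.tobytes()
```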
  • the first electronic device 1 transmits the encoded video and/or audio data to the third electronic device 300 (step S3). Specifically, in step S3, the first electronic device 1 transmits the video and/or audio data from the communication unit 30 to the communication unit 330 of the third electronic device 300. Also in step S3, the third electronic device 300 receives the video and/or audio data transmitted from the communication unit 30 of the first electronic device 1 via the communication unit 330.
  • the third electronic device 300 transmits the encoded video and/or audio data received from the communication unit 30 to the second electronic device 100 (step S4). Specifically, in step S4, the third electronic device 300 transmits the video and/or audio data from the communication unit 330 to the communication unit 130 of the second electronic device 100. Also, in step S4, the second electronic device 100 receives the video and/or audio data transmitted from the communication unit 330 of the third electronic device 300 via the communication unit 130.
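  • to make the relay in steps S3 and S4 concrete, the sketch below shows the third electronic device 300 forwarding an encoded payload unchanged, without decoding it. The socket transport and the 4-byte length-prefixed framing are assumptions made for illustration.

```python
# Sketch of steps S3-S4: the relay forwards encoded data as-is.
# Length-prefixed framing over TCP sockets is an assumed transport.
import socket
import struct

def relay_once(src: socket.socket, dst: socket.socket) -> None:
    raw_len = src.recv(4)                     # read the 4-byte length prefix
    if len(raw_len) < 4:
        return
    (length,) = struct.unpack("!I", raw_len)
    payload = b""
    while len(payload) < length:              # read the full encoded frame
        chunk = src.recv(length - len(payload))
        if not chunk:
            return
        payload += chunk
    dst.sendall(raw_len + payload)            # forward without decoding
```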
  • next, the second electronic device 100 decodes the encoded video and/or audio data received from the communication unit 330 (step S5).
  • decoding may mean returning the encoded video and/or audio data to its original format.
  • the second electronic device 100 may perform various known decoding methods, such as software decoding or hardware decoding.
  • the second electronic device 100 presents at least one of the video and audio of the first user (e.g., at least one of participants Ma, Mb, Mc, and Md) to the second user (e.g., participant Mg) (step S6).
  • the second electronic device 100 may display the video of the first user on the display unit 170 and output the audio of the first user from the audio output unit 160.
  • in this way, the second user (e.g., participant Mg) can recognize at least one of the video and audio of the first user (e.g., at least one of participants Ma, Mb, Mc, and Md).
  • the above describes a manner in which the first electronic device 1 transmits video and/or audio of the first user to the second electronic device 100 via the third electronic device 300.
  • the second electronic device 100 can transmit audio and/or gaze information of the second user to the first electronic device 1 via the third electronic device 300.
  • the second electronic device 100 acquires at least one of the voice and gaze information of the second user (e.g., participant Mg) (step S11). Specifically, in step S11, the second electronic device 100 may acquire (or detect) the voice of the second user by the voice input unit 150. Also, in step S11, the second electronic device 100 may acquire gaze information of the second user by the gaze information acquisition unit 200. Next, the second electronic device 100 encodes at least one of the voice and gaze information of the second user (step S12).
  • the second electronic device 100 transmits the encoded voice and/or gaze data to the third electronic device 300 (step S13). Specifically, in step S13, the second electronic device 100 transmits the voice and/or gaze data from the communication unit 130 to the communication unit 330 of the third electronic device 300. Also in step S13, the third electronic device 300 receives the voice and/or gaze data transmitted from the communication unit 130 of the second electronic device 100 via the communication unit 330.
  • the third electronic device 300 transmits the encoded voice and/or gaze data received from the communication unit 130 to the first electronic device 1 (step S14). Specifically, in step S14, the third electronic device 300 transmits the voice and/or gaze data from the communication unit 330 to the communication unit 30 of the first electronic device 1. Also, in step S14, the first electronic device 1 receives the voice and/or gaze data transmitted from the communication unit 330 of the third electronic device 300 via the communication unit 30.
  • the first electronic device 1 decodes the encoded voice and/or gaze data received from the communication unit 330 (step S15).
  • the first electronic device 1 presents at least one of the voice and/or gaze of the second user (e.g., participant Mg) to the first user (e.g., at least one of participants Ma, Mb, Mc, and Md) (step S16). Specifically, in step S16, the first electronic device 1 may output the voice of the second user from the audio output unit 60. Also, in step S16, the first electronic device 1 may express the gaze of the second user by driving the drive unit 80.
  • in this way, the first user (e.g., at least one of participants Ma, Mb, Mc, and Md) can recognize the voice and/or gaze of the second user (e.g., participant Mg).
  • the operations from step S1 to step S6 and the operations from step S11 to step S16 may be executed in the reverse order. That is, the operations from step S11 to step S16 may be executed first, and then the operations from step S1 to step S6. Furthermore, the operations from step S1 to step S6 and the operations from step S11 to step S16 may be executed simultaneously, or so that they at least partially overlap, as sketched below.
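  • as a sketch of this simultaneous execution, the two directions can be modeled as independent coroutines run concurrently; the pipeline names and bodies below are hypothetical placeholders.

```python
# Sketch of running steps S1-S6 and S11-S16 concurrently.
# The coroutine bodies are placeholders for capture/encode/send and
# receive/decode/present loops.
import asyncio

async def forward_pipeline() -> None:    # steps S1-S6
    await asyncio.sleep(0)

async def reverse_pipeline() -> None:    # steps S11-S16
    await asyncio.sleep(0)

async def main() -> None:
    # Both directions run at the same time, or at least partially overlap.
    await asyncio.gather(forward_pipeline(), reverse_pipeline())

if __name__ == "__main__":
    asyncio.run(main())
```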
  • suppose that the results of eye tracking of the gaze of the user of the second electronic device 100 (participant Mg) by the gaze information acquisition unit 200 are always reflected in the gaze expression by the drive unit 80 of the first electronic device 1.
  • in this case, the acquisition of gaze information by the gaze information acquisition unit 200 of the second electronic device 100 and/or the gaze expression by the drive unit 80 of the first electronic device 1 may not be able to keep up with the actual gaze movement of participant Mg.
  • alternatively, the frequency of the gaze movement expressed by the drive unit 80 of the first electronic device 1 may become too high, which may cause discomfort to participants Ma, Mb, Mc, and Md, who visually observe the movement of the first electronic device 1.
  • to address this, the drive unit 80 of the first electronic device 1 may express the gaze only when the actual gaze of participant Mg has been fixed for a predetermined time, such as three seconds (a dwell-gating sketch follows below). However, even with this type of control, the drive unit 80 does not express the gaze until the predetermined time has elapsed, making it difficult to improve the real-time nature of the gaze expression.
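  • the dwell-gating control mentioned above can be sketched as follows: the gaze target expressed by the drive unit 80 is updated only after the remote user's gaze has remained within a small radius for a minimum time. The radius and dwell thresholds below are illustrative assumptions, not values from the disclosure.

```python
# Sketch of fixation-gated gaze expression: update the expressed gaze
# only after the gaze has stayed near one point for a minimum dwell time.
import math
import time
from typing import Optional, Tuple

class GazeGate:
    def __init__(self, radius_px: float = 40.0, dwell_s: float = 3.0):
        self.radius_px = radius_px
        self.dwell_s = dwell_s
        self._anchor: Optional[Tuple[float, float]] = None  # settled point
        self._since: float = 0.0                            # settle time

    def update(self, x: float, y: float,
               now: Optional[float] = None) -> Optional[Tuple[float, float]]:
        """Return the fixation point once confirmed, else None."""
        now = time.monotonic() if now is None else now
        if self._anchor is None or math.dist(self._anchor, (x, y)) > self.radius_px:
            self._anchor, self._since = (x, y), now   # gaze moved: restart timer
            return None
        if now - self._since >= self.dwell_s:
            return self._anchor                       # stable long enough
        return None
```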
  • alternatively, the first electronic device 1 may express the gaze automatically. For example, the first electronic device 1 may identify participants Ma, Mb, Mc, Md, etc., and control the drive unit 80 so that the gaze is directed toward an identified participant. However, with such control, the gaze of the user of the second electronic device 100 (participant Mg) is not reflected, and participants Ma, Mb, Mc, Md, etc. cannot visually recognize the gaze movement of participant Mg.
  • in contrast, a system according to one embodiment realizes a situation in which the gaze of a user of an electronic device used remotely is properly recognized by a user of an electronic device used locally.
  • FIG. 7 is a flowchart illustrating a characteristic operation of the system according to an embodiment.
  • the operation shown in FIG. 7 may be executed by at least one of the first electronic device 1, the second electronic device 100, and the third electronic device 300 included in the system according to an embodiment.
  • the operation shown in FIG. 7 will be described as being executed by the control unit 310 of the third electronic device 300.
  • the operation shown in FIG. 7 may be executed by the control unit 10 of the first electronic device 1, or may be executed by the control unit 110 of the second electronic device 100.
  • the operations shown in FIG. 7 may be executed in parallel with the operations shown in FIG. 6.
  • the operations shown in FIG. 7 may also be executed so as to interrupt the operations shown in FIG. 6 while they are being performed.
  • the operations shown in FIG. 6 may also be executed so as to interrupt the operations shown in FIG. 7 while they are being performed.
  • with reference to FIG. 7, a description will be given of characteristic operations when a remote conference or video conference is held using a system according to one embodiment.
  • the encoding and decoding of data described with reference to FIG. 6 may use known technology. For this reason, a description of the encoding and decoding of data is omitted for FIG. 7.
  • a description of content that is the same or similar to that already described in FIG. 6 may be simplified or omitted as appropriate.
  • the first electronic device 1 is assumed to be ready to acquire at least one of the video and audio of the first user (e.g., at least one of the participants Ma, Mb, Mc, and Md).
  • the first electronic device 1 is also assumed to be ready to transmit at least one of the video and audio of the first user that it has acquired to the third electronic device 300.
  • the first electronic device 1 is assumed to be ready to receive various types of information transmitted from the third electronic device 300.
  • the second electronic device 100 is assumed to be ready to acquire at least one of the voice and gaze information of the second user (e.g., participant Mg). Also, the second electronic device 100 is assumed to be ready to transmit at least one of the voice and gaze information of the second user acquired to the third electronic device 300. Furthermore, the second electronic device 100 is assumed to be ready to receive various types of information transmitted from the third electronic device 300.
  • the control unit 310 determines whether or not a voice spoken by any of the first users has been acquired by the first electronic device 1 (step S101).
  • the first user may be, for example, at least one of the participants Ma, Mb, Mc, and Md, and any of the first users may be, for example, participant Mc.
  • the first electronic device 1 may acquire the voice when any of the first users (participant Mc in this case) starts a conversation, for example, and transmit it to the third electronic device 300.
  • the control unit 310 identifies a speaker among the first users who is speaking (currently speaking) based on at least one of the video and audio of the first user acquired by the first electronic device 1 (step S102). That is, in step S102, the control unit 310 may identify a speaker (e.g., one or more) among a possible plurality of first users. Here, the control unit 310 may identify participant Mc as the speaker based on the video of multiple participants including participant Mc and the audio of participant Mc acquired by the first electronic device 1.
  • the control unit 310 may use various techniques to identify the speaker of the first user based on at least one of the video and audio of the first user. For example, the control unit 310 may identify the speaker of the first user by performing person detection from the video (image) of the first user and estimating the direction of the sound source from the audio of the first user. In this case, the direction of the sound source may be estimated in the section where the audio of the first user is detected. The control unit 310 may also perform person detection from the video (image) of the first user and determine whether the first user is speaking from the video (image) of the first user's mouth. In this case, the control unit 310 may also appropriately perform processing such as lip reading from the video (image) of the first user's mouth.
  • the control unit 310 may also perform person detection from the video (image) of the first user and identify the speaker by detecting the body movement (behavior) of the first user. As described above, the control unit 310 may identify the speaker among the first users by any process based on at least one of the video and audio of the first user (a combined detection sketch follows below).
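  • one way to realize the combined identification described above is sketched below: person bounding boxes detected in the video are matched against a direction of arrival (DOA) estimated from the audio. The detector and DOA estimator are assumed to exist upstream, and the linear angle-to-pixel mapping is a simplification.

```python
# Sketch of step S102: pick the speaker by matching detected persons
# against an audio direction of arrival. Inputs come from assumed
# upstream components; the camera/microphone geometry is simplified.
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]   # x, y, width, height in pixels

def identify_speaker(boxes: List[Box], doa_deg: float,
                     image_width: int, fov_deg: float = 90.0) -> Optional[int]:
    """Return the index of the person closest to the sound direction."""
    if not boxes:
        return None
    # Map the DOA (centered on the camera axis) to a horizontal pixel.
    px = (doa_deg / fov_deg + 0.5) * image_width
    centers = [x + w / 2.0 for (x, _y, w, _h) in boxes]
    return min(range(len(boxes)), key=lambda i: abs(centers[i] - px))
```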
  • next, the control unit 310 identifies the position of the speaker in the image of the first user acquired by the first electronic device 1 (step S103).
  • in step S102, the control unit 310 identified the speaker from among the possibly multiple first users.
  • in step S103, the control unit 310 identifies the position of the speaker (here, participant Mc) in the image containing the possibly multiple first users.
  • for example, the control unit 310 may identify the coordinates of the position of the speaker (here, participant Mc) in the image of the first user.
  • steps S102 and S103 may be executed in the control unit 310 by, for example, the identification unit 312.
  • next, the control unit 310 determines whether or not gaze information of the second user (participant Mg) has been acquired by the second electronic device 100 (step S104). The process performed when gaze information of the second user has not been acquired in step S104 will be described later.
  • when gaze information of the second user has been acquired in step S104, the control unit 310 estimates where the second user's gaze is directed in the image of the first user (step S105). That is, in step S105, the control unit 310 estimates (acquires) the position to which the second user's gaze is directed in the image of the first user, based on the gaze information of the second user acquired by the second electronic device 100.
  • for this estimation, positions (coordinates) in the image of the first user may be associated with positions to which the gaze of the second user is directed.
  • the position (two-dimensional coordinates) in the image of the first user may be converted into a position (three-dimensional coordinates) in real space of the direction of the gaze of the second user.
  • the position (three-dimensional coordinates) in real space of the direction of the gaze of the second user may be converted into a position (two-dimensional coordinates) in the image of the first user.
  • step S105 may be executed in the control unit 310 by, for example, the estimation unit 314 (a projection sketch follows below).
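  • the association between positions in the image and gaze directions in real space can be sketched with a pinhole-camera model, as below; the intrinsic parameters are illustrative assumptions.

```python
# Sketch of the 2-D/3-D association used in step S105: project a gaze
# direction into the image, and back-project a pixel to a viewing ray.
# The camera intrinsics (FX, FY, CX, CY) are illustrative.
from typing import Optional, Tuple

import numpy as np

FX, FY, CX, CY = 800.0, 800.0, 640.0, 360.0

def gaze_to_image_point(gaze_dir: np.ndarray) -> Optional[Tuple[float, float]]:
    """Project a 3-D gaze direction (camera coordinates) to 2-D pixels."""
    x, y, z = gaze_dir
    if z <= 0:                      # gaze does not point into the scene
        return None
    return (FX * x / z + CX, FY * y / z + CY)

def image_point_to_ray(u: float, v: float) -> np.ndarray:
    """Back-project a pixel to a unit viewing ray in camera coordinates."""
    ray = np.array([(u - CX) / FX, (v - CY) / FY, 1.0])
    return ray / np.linalg.norm(ray)
```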
  • next, the control unit 310 determines whether the position identified in step S103 and the position estimated in step S105 are within a predetermined distance (step S106). That is, the control unit 310 determines whether the position of the speaker in the video of the first user and the position in the video of the first user to which the gaze of the second user is directed are within a predetermined distance. When there are multiple speakers, the control unit 310 may individually determine whether the position of each speaker in the video of the first user and the position in the video of the first user to which the gaze of the second user is directed are within a predetermined distance.
  • when the determination in step S106 is positive (YES in step S106), the position to which the second user's gaze is directed is relatively close to the position of the speaker. That is, in this case, it may be determined that the second user is directing his/her gaze at the speaker. Therefore, the control unit 310 may control the first electronic device 1 to indicate that the second user's gaze is directed at the speaker (step S107). That is, in this case, the control unit 310 may control the first electronic device 1 to drive the drive unit 80 so that the gaze of the first electronic device 1 is directed at (faces) the speaker.
  • when there are multiple speakers, the control unit 310 may control the first electronic device 1 to indicate that the second user's gaze is directed at the speaker closest to the position to which the second user's gaze is directed. In this way, in a system according to one embodiment, the control unit 310 may control the first electronic device 1 to indicate the position to which the second user's gaze is most directed in the video of the first user.
  • conversely, when the determination in step S106 is negative (NO in step S106), the position toward which the second user's gaze is directed is relatively far from the position of the speaker. That is, in this case, it may be determined that the second user is not directing his/her gaze at the speaker. Therefore, the control unit 310 may control the first electronic device 1 to indicate that the second user's gaze is not directed at the speaker (step S108). That is, in this case, the control unit 310 may control the first electronic device 1 to drive the drive unit 80 so that the gaze of the first electronic device 1 is not directed at (does not directly face) the speaker. A decision sketch covering steps S106 to S108 follows below.
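  • a minimal sketch of the decision in steps S106 to S108: the speaker position and the estimated gaze position are compared against a distance threshold, and a drive command is chosen accordingly. The command names and the threshold value are hypothetical.

```python
# Sketch of steps S106-S108: distance test between the speaker position
# and the gaze position, then a drive decision. Command strings stand in
# for actual drive unit 80 control.
import math
from typing import Optional, Tuple

Point = Tuple[float, float]

def decide_gaze_action(speaker_pos: Point, gaze_pos: Optional[Point],
                       threshold_px: float = 80.0) -> str:
    if gaze_pos is None:                              # step S104: NO
        return "avert_gaze"                           # step S108
    if math.dist(speaker_pos, gaze_pos) <= threshold_px:
        return "face_speaker"                         # step S107
    return "avert_gaze"                               # step S108
```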
  • also, when gaze information of the second user has not been acquired in step S104, the second user's gaze cannot be reflected in the first electronic device 1, so the control unit 310 may likewise perform the operation of step S108.
  • for example, when participant Mc in the conference room MR speaks, participant Mg in the home RL can recognize, via the display unit 170 of the second electronic device 100, that participant Mc is speaking.
  • suppose that participant Mg in the home RL then turns his/her gaze to participant Mc, who is speaking, on the display unit 170 of the second electronic device 100.
  • in this case, the gaze of the first electronic device 1 in the conference room MR is directed at participant Mc. Therefore, participant Mc can recognize that participant Mg in the home RL is turning his/her gaze toward participant Mc.
  • other participants in the conference room MR, such as participant Ma, participant Mb, and/or participant Md, can also recognize that participant Mg in the home RL is turning his/her gaze toward participant Mc.
  • the system according to one embodiment can control the direction of gaze of the first electronic device 1 using the speaker's speech as a trigger. Therefore, the system according to one embodiment can control the gaze of a participant who is participating in a remote conference at home, for example, while reflecting the gaze of the participant on the first electronic device 1 to an extent that does not cause discomfort to the other participants. Furthermore, the system according to one embodiment can control the first electronic device 1 to instantly direct its gaze in response to the speaker's speech. Therefore, the system according to one embodiment can facilitate communication between multiple locations.
  • the control unit 310 may identify the position in real space of the speaker among the first users (e.g., participant Mc) based on the position of the first electronic device 1 in real space (e.g., in the conference room MR). In this way, the position of the speaker can be identified more accurately. Then, if the determination in step S106 is positive (YES in step S106), the control unit 310 may control the line of sight of the second user (participant Mg) expressed by the first electronic device 1 to be directed toward the position of the speaker (participant Mc) in real space.
  • in step S103 of FIG. 7, when the control unit 310 (identification unit 312) identifies the position of the speaker among the first users, it may treat the positions of the candidate speakers as areas each having a predetermined size in the image of the first user. In other words, the identification unit 312 may identify the speaker depending on whether the position of the speaker estimated based on the voice of the first user is included in one of the areas set based on the respective positions of the first users in the image of the first user (a point-in-area sketch follows below).
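  • the area-based test can be sketched as a simple point-in-rectangle check, as below; the area dimensions are illustrative assumptions.

```python
# Sketch of the area-based variant of step S103: each first user owns a
# rectangular area of predetermined size, and the speaker position
# estimated from the audio is tested against those areas.
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]

def speaker_in_user_areas(estimated_pos: Point,
                          user_positions: Dict[str, Point],
                          half_width: float = 60.0,
                          half_height: float = 90.0) -> Optional[str]:
    ex, ey = estimated_pos
    for user_id, (ux, uy) in user_positions.items():
        if abs(ex - ux) <= half_width and abs(ey - uy) <= half_height:
            return user_id    # the estimate falls in this user's area
    return None
```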
  • FIG. 8 is a flowchart illustrating the characteristic operations of the system according to one embodiment.
  • the operations shown in FIG. 8 are partial modifications of the operations shown in FIG. 7. Therefore, descriptions that are the same or similar to those already explained in FIG. 7 will be omitted as appropriate.
  • the control unit 310 identifies each first user based on the video of the first user acquired by the first electronic device 1 (step S201). That is, in step S201, the control unit 310 may identify each of the first users, of which there may be multiple. Here, the control unit 310 may identify, for example, participant Ma, participant Mb, participant Mc, and participant Md, based on the video of multiple participants including participant Mc acquired by the first electronic device 1.
  • the control unit 310 may use various techniques to identify each first user based on the video of the first user. For example, the control unit 310 may perform person detection from the video (image) of the first user to identify each first user. The control unit 310 may identify each first user by any processing based on the video of the first user. Furthermore, when the voice of the first user can be acquired, the control unit 310 may also identify each first user by taking into account an estimation of the direction of the sound source from the voice of the first user. The control unit 310 may identify each first user by any processing based on at least one of the video and voice of the first user.
  • next, the control unit 310 identifies the position of each first user in the image of the first users acquired by the first electronic device 1 (step S202).
  • in step S201, the control unit 310 identified each of the possibly multiple first users.
  • in step S202, the control unit 310 identifies the position of each first user in the image containing the possibly multiple first users.
  • for example, the control unit 310 may identify the coordinates of the position of each first user (here, for example, participant Ma, participant Mb, participant Mc, and participant Md) in the image of the first users.
  • steps S201 and S202 may be executed in the control unit 310 by, for example, the identification unit 312.
  • next, the control unit 310 determines whether or not gaze information of the second user (participant Mg) has been acquired by the second electronic device 100 (step S104).
  • the process performed in step S104 may be similar to step S104 shown in FIG. 7. The process performed when gaze information of the second user has not been acquired will be described later.
  • when gaze information of the second user has been acquired in step S104, the control unit 310 estimates where the second user's gaze is directed in the image of the first user (step S105). That is, in step S105, the control unit 310 estimates (acquires) the position to which the second user's gaze is directed in the image of the first user based on the gaze information of the second user acquired by the second electronic device 100.
  • the process performed in step S105 may be the same as step S105 shown in FIG. 7.
  • next, the control unit 310 determines whether any of the positions identified in step S202 and the position estimated in step S105 are within a predetermined distance (step S203). That is, the control unit 310 determines whether any of the positions of the first users in the first users' video and the position to which the second user's gaze is directed in that video are within a predetermined distance.
  • if the determination in step S203 is positive (YES in step S203), the position to which the second user's gaze is directed is relatively close to the position of one of the first users. In other words, in this case, it may be determined that the second user is looking at one of the first users.
  • in this case, the control unit 310 next determines whether the voice of the second user has been acquired by the second electronic device 100 (step S204).
  • the second electronic device 100 may acquire the voice when the second user starts a conversation, for example, and transmit the voice to the third electronic device 300.
  • when the voice of the second user has been acquired in step S204, the control unit 310 may control the first electronic device 1 to indicate that the second user's gaze is directed toward that first user, based on the voice of the second user (step S205). That is, in this case, the control unit 310 may control the first electronic device 1 to drive the drive unit 80 so that the gaze of the first electronic device 1 is directed toward (faces) that first user.
  • conversely, when the determination in step S203 is negative (NO in step S203), the position toward which the second user's gaze is directed is relatively far from the positions of all of the first users. That is, in this case, it may be determined that the second user is not directing his/her gaze toward any of the first users. Therefore, the control unit 310 may control the first electronic device 1 to indicate that the gaze of the second user is not directed toward any of the first users (step S206). That is, in this case, the control unit 310 may control the first electronic device 1 to drive the drive unit 80 so that the gaze of the first electronic device 1 is not directed toward (does not directly face) any of the first users.
  • if the second user's gaze information is not acquired in step S104, the second user's gaze cannot be reflected in the first electronic device 1; therefore, in this case too, the control unit 310 may perform the operation of step S206. In addition, if the second user's voice is not acquired in step S204, the second user's gaze need not be reflected in the first electronic device 1; therefore, in this case too, the control unit 310 may perform the operation of step S206.
  • the participant Mg in the home RL can recognize the participants Ma, Mb, Mc, and Md through the display unit 170 of the second electronic device 100. Assume that the participant Mg in the home RL starts speaking to the participant Mc while looking at the participant Mc on the display unit 170 of the second electronic device 100. In this case, in the conference room MR, the first electronic device 1 turns its gaze to the participant Mc in response to the speech of the participant Mg in the home RL. Therefore, the participant Mc can recognize the situation in which the participant Mg in the home RL is speaking while looking at the participant Mc. In addition, other participants in the conference room MR, such as the participant Ma, the participant Mb, and/or the participant Md, can also recognize the situation in which the participant Mg in the home RL is speaking while looking at the participant Mc.
  • the system may include, for example, the first electronic device 1, the second electronic device 100, and the control unit 310.
  • the first electronic device 1 acquires an image of at least one first user.
  • the second electronic device 100 outputs the image of the first user to the second user and acquires information on the line of sight of the second user.
  • the control unit 310 controls the position of the line of sight of the second user in the image of the first user so that it is indicated by the first electronic device 1.
  • the system according to one embodiment can control the direction of the line of sight of the first electronic device 1 using the speech of a speaker as a trigger.
  • the system according to one embodiment can control the line of sight of a participant participating in a remote conference at home or the like while reflecting the line of sight of the participant on the first electronic device 1 to a degree that does not cause discomfort to other participants. Furthermore, the system according to one embodiment can control the first electronic device 1 to immediately direct its line of sight in response to the speech of the speaker. Therefore, the system according to one embodiment can facilitate communication between multiple locations.
  • the control unit 310 may identify the position of each of the first users in real space based on the position of the first electronic device 1 in real space (e.g., in the conference room MR). In this way, the position of each of the first users can be identified more accurately. Then, if the determination in step S203 is positive (YES in step S203), the control unit 310 may control the line of sight of the second user (participant Mg) expressed by the first electronic device 1 to be directed toward the position of the relevant first user in real space.
  • the embodiments of the present disclosure can also be realized as a method, as a program executed by a processor or the like included in the device, or as a storage medium or recording medium on which the program is recorded. It should be understood that these are also included in the scope of the present disclosure.
  • the control unit 310 may execute the process of step S205 without performing the determination of step S204 shown in FIG. 8. In this case, even if the second user is not speaking, the control unit 310 can control the first electronic device 1 to drive the drive unit 80 so that the line of sight of the first electronic device 1 is directed toward (faces) one of the first users.
  • in step S106 shown in FIG. 7 and step S203 shown in FIG. 8, the control unit 310 makes the determination using a predetermined distance. Instead, the control unit 310 may identify a speaker, or a position (coordinates) in the video of the first user, that satisfies other conditions as the position toward which the second user's gaze is directed.
  • in order to execute the process of step S105, the control unit 310 may perform other control in addition to associating positions (coordinates) in the image of the first user with the positions to which the gaze of the second user is directed. In this case, the control unit 310 may further associate, for example, the time or the number of times that the gaze of the second user was directed to each position (coordinates) in the image of the first user. Then, instead of the determination using a predetermined distance in step S106 or step S203, the control unit 310 may execute the following process.
  • that is, the control unit 310 may determine the position (coordinates) in the image of the first user to which the gaze of the second user was directed for the longest time, or the greatest number of times, during a predetermined period in the past up to the time when the process is executed. The control unit 310 may then execute the process of step S107 or step S205 using the position determined in this way as the position used for controlling the first electronic device 1 (a dwell-histogram sketch follows below).
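  • the longest-dwell selection can be sketched by histogramming gaze samples over coarse image cells within a sliding window, as below; the cell size and window length are illustrative assumptions.

```python
# Sketch of the dwell-based variant: count gaze samples per image cell
# over a sliding window and pick the most-attended cell.
from collections import deque
from typing import Deque, Dict, Optional, Tuple

class GazeHistogram:
    def __init__(self, cell_px: int = 80, window: int = 150):
        self.cell_px = cell_px
        self.samples: Deque[Tuple[int, int]] = deque(maxlen=window)

    def add(self, x: float, y: float) -> None:
        self.samples.append((int(x) // self.cell_px, int(y) // self.cell_px))

    def most_attended(self) -> Optional[Tuple[float, float]]:
        if not self.samples:
            return None
        counts: Dict[Tuple[int, int], int] = {}
        for cell in self.samples:
            counts[cell] = counts.get(cell, 0) + 1
        cx, cy = max(counts, key=counts.get)
        # Return the center of the winning cell in pixel coordinates.
        return ((cx + 0.5) * self.cell_px, (cy + 0.5) * self.cell_px)
```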
  • alternatively, the control unit 310 may execute the following process in step S105. That is, the control unit 310 may not only associate positions (coordinates) in the image of the first user with the position of the second user's gaze, but may also associate with each position an evaluation value according to the time or the number of times that the gaze of the second user was directed to that position (coordinates). Then, instead of the determination using a predetermined distance in step S106 or step S203, the control unit 310 may execute the following process.
  • that is, the control unit 310 may find the position (coordinates) in the image of the first user that has the highest evaluation value among the evaluation values associated with positions (coordinates) in the image of the first user during a predetermined period in the past up to the time when the process is executed. The control unit 310 may then execute the process of step S107 or step S205 using that position as the position used for controlling the first electronic device 1.
  • the evaluation value may be set highest for the position (coordinates) in the video of the first user that corresponds to the position of the gaze of the second user, and the evaluation values assigned to the coordinates around this position may gradually decrease as the distance from the position increases.
  • the video of the first user may be divided into a number of regions, and an evaluation value may be associated with each divided region.
  • an evaluation value according to the position of the first user and/or the behavior (movement) of the first user that attracts the attention of the second user may be added to the above-mentioned evaluation value.
  • an evaluation value may be set in advance for each of the actions such as the position of the first user or the speaker, the speaker's speech volume, the physical movement of the first user, the line of sight of the first user, and/or the facial movement of the first user. Then, the evaluation value based on the position and/or the behavior of the first user may be added to the evaluation value of the corresponding position (coordinates) in the video of the first user.
  • in this case as well, the control unit 310 may execute the following process. That is, the control unit 310 may determine the position (coordinates) in the video of the first user that has the highest evaluation value among the evaluation values associated with positions (coordinates) in the video of the first user during a predetermined period in the past up to the time when the process is executed. The control unit 310 may then execute the process of step S107 or step S205 using that position as the position used for controlling the first electronic device 1.
  • in this case, the evaluation value may be set highest at the position (coordinates) in the image of the first user corresponding to the position or behavior of the first user, and the evaluation values assigned to surrounding coordinates may gradually decrease with distance from that position (an evaluation-map sketch follows below).
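  • a hedged sketch of such an evaluation map follows: gaze dwell contributes a peak that decays with distance, and salient behavior of a first user adds a preset bonus around that user's position. All weights and the decay width are illustrative assumptions, not values from the disclosure.

```python
# Sketch of an evaluation map over the first users' image: gaze samples
# and behavior events each add a Gaussian bump; the argmax is the
# position used to control the first electronic device 1.
from typing import List, Tuple

import numpy as np

def build_evaluation_map(shape: Tuple[int, int],
                         gaze_points: List[Tuple[float, float]],
                         behavior_events: List[Tuple[float, float]],
                         sigma: float = 40.0,
                         behavior_bonus: float = 5.0):
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    score = np.zeros((h, w))
    for gx, gy in gaze_points:       # highest at the gaze position
        score += np.exp(-((xs - gx) ** 2 + (ys - gy) ** 2) / (2 * sigma ** 2))
    for bx, by in behavior_events:   # preset bonus near salient behavior
        score += behavior_bonus * np.exp(
            -((xs - bx) ** 2 + (ys - by) ** 2) / (2 * sigma ** 2))
    iy, ix = np.unravel_index(np.argmax(score), score.shape)
    return (float(ix), float(iy)), score
```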
  • as described above, the control unit 310 may control the position of the gaze of the second user to be indicated by the first electronic device 1 based on various conditions. For example, the control unit 310 may perform this control based on at least one of the position of the first user and the behavior of the first user in the image of the first user, and on the position to which the gaze of the second user is most directed in the image of the first user.
  • the above-described embodiments are not limited to implementation as a system.
  • the above-described embodiments may be implemented as a control method for a system, or as a program executed in a system.
  • the above-described embodiments may be implemented as at least one of the first electronic device 1, the second electronic device 100, and the third electronic device 300.
  • the above-described embodiments may be implemented as a control method for at least one of the first electronic device 1, the second electronic device 100, and the third electronic device 300.
  • the above-described embodiments may be implemented as a program executed by at least one of the first electronic device 1, the second electronic device 100, and the third electronic device 300, or as a storage medium or recording medium on which the program is recorded.
  • the above-described embodiment may be implemented as the first electronic device 1.
  • the first electronic device 1 may be configured to be able to communicate with the second electronic device 100.
  • the first electronic device 1 may include an acquisition unit, an identification unit, an estimation unit, and a control unit.
  • the acquisition unit acquires an image of at least one first user.
  • the identification unit identifies each first user based on the image of the first user acquired by the acquisition unit, and identifies the position of each first user in the image of the first user acquired by the acquisition unit.
  • the estimation unit estimates the position of the second user's gaze direction in the image of the first user based on information on the gaze of the second user acquired by the second electronic device.
  • the control unit determines whether or not the position of any of the first users in the image of the first user and the position to which the second user's gaze is directed in the image of the first user are within a predetermined distance. Depending on the determination result, the control unit controls the first electronic device 1 to indicate that the gaze of the second user is directed toward one of the first users, based on the voice of the second user acquired by the second electronic device 100. A structural sketch follows below.
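  • the functional split described above can be sketched as a class whose methods correspond to the acquisition, identification, estimation, and control units; the interface is hypothetical, since the disclosure defines the units functionally rather than as a concrete API.

```python
# Structural sketch of the electronic device described above. Concrete
# sensing and drive logic would be supplied by subclasses.
import math
from typing import Any, List, Optional, Tuple

Point = Tuple[float, float]

class FirstElectronicDevice:
    def acquire_image(self) -> Any:
        """Acquisition unit: return one frame of the first users."""
        raise NotImplementedError

    def identify_positions(self, frame: Any) -> List[Point]:
        """Identification unit: positions of each first user in the frame."""
        raise NotImplementedError

    def estimate_gaze_position(self, gaze_info: Any) -> Optional[Point]:
        """Estimation unit: where the second user's gaze falls in the frame."""
        raise NotImplementedError

    def control(self, frame: Any, gaze_info: Any, voice_active: bool,
                threshold_px: float = 80.0) -> str:
        """Control unit: express the gaze when it rests on a first user."""
        gaze = self.estimate_gaze_position(gaze_info)
        if gaze is None or not voice_active:
            return "idle"
        for pos in self.identify_positions(frame):
            if math.dist(pos, gaze) <= threshold_px:
                return "face_user"
        return "avert_gaze"
```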

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

This system comprises: a first electronic device that acquires video of at least one first user; a second electronic device that outputs the video of the first user to a second user, and acquires information about the line of sight of the second user; and a control unit that performs control so that the first electronic device indicates a position to which the line of sight of the second user in the video of the first user is directed.

Description

SYSTEM, ELECTRONIC DEVICE, SYSTEM CONTROL METHOD, AND PROGRAM - Patent application
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to patent application No. 2022-162777, filed in Japan on October 7, 2022, the entire disclosure of which is incorporated herein by reference.
A system according to one embodiment includes:
a first electronic device that acquires an image of at least one first user;
a second electronic device that outputs the image of the first user to a second user and acquires information on the line of sight of the second user; and
a control unit that controls the first electronic device so as to indicate the position to which the second user's line of sight is directed in the image of the first user.
An electronic device according to one embodiment is an electronic device configured to be able to communicate with another electronic device, and includes:
an acquisition unit that acquires an image of at least one first user; and
a control unit that controls the electronic device so as to indicate the position to which the line of sight of a second user using the other electronic device is directed in the image of the first user.
A method for controlling a system according to one embodiment includes the steps of:
a first electronic device acquiring an image of at least one first user;
a second electronic device outputting the image of the first user to a second user;
the second electronic device acquiring information on the line of sight of the second user; and
controlling the first electronic device so as to indicate the position to which the second user's line of sight is directed in the image of the first user.
A program according to one embodiment causes a computer to execute the steps of:
causing a first electronic device to acquire an image of at least one first user;
causing a second electronic device to output the image of the first user to a second user;
causing the second electronic device to acquire information on the line of sight of the second user; and
controlling the first electronic device so as to indicate the position to which the second user's line of sight is directed in the image of the first user.
FIG. 1 is a diagram illustrating an example of a usage mode of a system according to an embodiment. FIG. 2 is a functional block diagram schematically illustrating the configuration of a first electronic device according to an embodiment. FIG. 3 is a diagram illustrating an example of driving by a drive unit of the first electronic device according to an embodiment. FIG. 4 is a functional block diagram schematically illustrating the configuration of a second electronic device according to an embodiment. FIG. 5 is a functional block diagram schematically illustrating the configuration of a third electronic device according to an embodiment. FIG. 6 is a sequence diagram illustrating the basic operation of a system according to an embodiment. FIG. 7 is a flowchart illustrating the operation of a system according to an embodiment. FIG. 8 is a flowchart illustrating the operation of a system according to an embodiment.
In the present disclosure, an "electronic device" may be, for example, a device driven by power supplied from a power system or a battery. In the present disclosure, a "system" may be, for example, one that includes at least an electronic device. In the present disclosure, a "user" may be a person who uses or may use an electronic device according to an embodiment (typically a human), and a person who uses or may use a system including an electronic device according to an embodiment. In addition, in the present disclosure, a conference in which at least one participant participates by communication from a location different from that of the other participants, such as a web conference or a video conference, is collectively referred to as a "remote conference."
Electronic devices that realize communication between multiple locations, such as in remote conferences, are desired to offer further improvements in functionality, for example to facilitate communication. An object of the present disclosure is to provide a system, an electronic device, a method for controlling a system, and a program that facilitate communication between multiple locations. According to one embodiment, a system, an electronic device, a method for controlling a system, and a program that facilitate communication between multiple locations can be provided. A system including an electronic device according to one embodiment is described in detail below with reference to the drawings.
FIG. 1 is a diagram showing an example of how a system according to an embodiment is used. The following description assumes a situation in which participant Mg remotely participates, from his/her home RL, in a conference held in a conference room MR, as shown in FIG. 1. As shown in FIG. 1, participants Ma, Mb, Mc, and Md participate in the conference in the conference room MR. The participants in the conference room MR are not limited to participants Ma, Mb, Mc, and Md, and may include, for example, other participants. The number of participants in the conference room MR may be any number of at least one. Participants other than participant Mg may also participate in the conference remotely from their respective homes.
As shown in FIG. 1, the system according to an embodiment may include, for example, a first electronic device 1, a second electronic device 100, and a third electronic device 300. In FIG. 1, the first electronic device 1, the second electronic device 100, and the third electronic device 300 are shown only in schematic form. The system according to an embodiment may omit at least any of the first electronic device 1, the second electronic device 100, and the third electronic device 300, and may include devices other than those mentioned above.
The first electronic device 1 according to one embodiment may be installed in the conference room MR. Meanwhile, the second electronic device 100 according to one embodiment may be installed in the home RL of participant Mg. The first electronic device 1 and the second electronic device 100 may be configured to be able to communicate with each other. The location of participant Mg's home RL may be different from the location of the conference room MR. Participant Mg's home RL may be far away from the conference room MR, or may be close to it (for example, a room adjacent to the conference room MR).
As shown in FIG. 1, the first electronic device 1 according to an embodiment may be connected to the second electronic device 100 according to an embodiment, for example via a network N. Also as shown in FIG. 1, the third electronic device 300 according to an embodiment may be connected to at least one of the first electronic device 1 and the second electronic device 100, for example via the network N. The first electronic device 1 may be connected to the second electronic device 100 wirelessly and/or by wire. The third electronic device 300 may be connected to at least one of the first electronic device 1 and the second electronic device 100 wirelessly and/or by wire. In FIG. 1, the wireless and/or wired connections among the first electronic device 1, the second electronic device 100, and the third electronic device 300 via the network N are indicated by dashed lines. In one embodiment, the first electronic device 1 and the second electronic device 100 may be included in a remote conference system according to an embodiment. The third electronic device 300 may also be included in the remote conference system according to an embodiment.
In the present disclosure, the network N as shown in FIG. 1 may include, as appropriate, various electronic devices and/or devices such as servers. The network N as shown in FIG. 1 may also include, as appropriate, devices such as base stations and/or repeaters. In the present disclosure, when, for example, the first electronic device 1 and the second electronic device 100 "communicate", the two devices may communicate directly, or may communicate via at least one of other devices such as the third electronic device 300, a repeater, and/or a base station. Furthermore, when, for example, the first electronic device 1 and the second electronic device 100 "communicate", more specifically, the communication unit of the first electronic device 1 and the communication unit of the second electronic device 100 may perform the communication.
The above notation may carry the same meaning not only when the first electronic device 1 and the second electronic device 100 "communicate", but also when one "transmits" information to the other and/or when the other "receives" information transmitted by the one. Furthermore, the above notation may carry the same meaning not only for the first electronic device 1 and the second electronic device 100, but also when any electronic device, including, for example, the third electronic device 300, communicates with any other electronic device.
The first electronic device 1 according to one embodiment may be arranged in the conference room MR, for example as shown in FIG. 1. In this case, the first electronic device 1 may be arranged in a position where it can acquire the voice and/or video of at least one of the conference participants Ma, Mb, Mc, and Md. Furthermore, the first electronic device 1 outputs the voice and/or video of participant Mg, as described below. Therefore, the first electronic device 1 may be arranged so that the voice and/or video of participant Mg output from the first electronic device 1 reaches at least one of the conference participants Ma, Mb, Mc, and Md.
The second electronic device 100 according to one embodiment may be arranged in participant Mg's home RL, for example in the manner shown in FIG. 1. In this case, the second electronic device 100 may be arranged in a position where it can acquire the voice and/or video of participant Mg. The second electronic device 100 may acquire the voice and/or video of participant Mg through a microphone, a headset, and/or a camera connected to the second electronic device 100. Furthermore, the second electronic device 100 according to one embodiment may acquire information on the gaze of participant Mg, such as the gaze itself, the direction of the gaze, and/or the movement of the gaze, as described below. The acquisition of gaze information by the second electronic device 100 is described further below.
 また、第2電子機器100は、後述のように、会議室MRにおける会議の参加者Ma,Mb,Mc,及びMdの少なくとも1人の音声及び/又は映像を出力する。このため、第2電子機器100は、第2電子機器100から出力される音声及び/又は映像が参加者Mgに届くように配置されてよい。第2電子機器100から出力される音声は、例えばヘッドフォン、イヤフォン、スピーカ、又はヘッドセットなどを介して、参加者Mgの耳に届くように配置されてもよい。また、第2電子機器100から出力される映像は、例えばディスプレイなどを介して、参加者Mgに視覚的に認識されるように配置されてもよい。 Furthermore, the second electronic device 100 outputs the audio and/or video of at least one of the participants Ma, Mb, Mc, and Md of the conference in the conference room MR, as described below. For this reason, the second electronic device 100 may be positioned so that the audio and/or video output from the second electronic device 100 reaches the participant Mg. The audio output from the second electronic device 100 may be positioned so that it reaches the ears of the participant Mg, for example, via headphones, earphones, speakers, or a headset. Furthermore, the video output from the second electronic device 100 may be positioned so that it is visually recognized by the participant Mg, for example, via a display.
The third electronic device 300 may be a device such as a server that relays between the first electronic device 1 and the second electronic device 100. A system according to one embodiment need not include the third electronic device 300.
FIG. 1 shows merely one example of how the first electronic device 1, the second electronic device 100, and the third electronic device 300 according to one embodiment may be used. The first electronic device 1, the second electronic device 100, and the third electronic device 300 according to one embodiment may be used in various other manners.
The remote conference system including the first electronic device 1 and the second electronic device 100 shown in FIG. 1 allows participant Mg to behave as if participating in the conference held in the conference room MR while staying at home RL. It also gives the conference participants Ma, Mb, Mc, and Md the sense that participant Mg is actually present at the conference in the conference room MR. In other words, in this remote conference system, the first electronic device 1 placed in the conference room MR can play the role of an avatar of participant Mg. In this case, the first electronic device 1 may function as a physical avatar standing in for participant Mg (for example, like a telepresence robot). Alternatively, the first electronic device 1 may function as a virtual avatar that displays an image of participant Mg or a characterized image of participant Mg. Such an image may be presented, for example, on a display provided in the first electronic device 1 itself, on an external display, or as a 3D hologram projected by the first electronic device 1.
Next, the functional configurations of the first electronic device 1, the second electronic device 100, and the third electronic device 300 according to one embodiment will be described.
FIG. 2 is a block diagram schematically showing the functional configuration of the first electronic device 1 shown in FIG. 1. An example of the configuration of the first electronic device 1 according to one embodiment is described below. As shown in FIG. 1, the first electronic device 1 may be a device used in the conference room MR by, for example, the participants Ma, Mb, Mc, and Md. The second electronic device 100, described later, has a function of outputting to the first electronic device 1 the voice, video, and/or gaze information of participant Mg that it acquires when participant Mg speaks. The first electronic device 1 has a function of outputting to the second electronic device 100 the voice and/or video of the participants Ma, Mb, Mc, Md, etc. that it acquires when they speak. With the first electronic device 1, the participants Ma, Mb, Mc, Md, etc. can hold a remote conference or video conference in the conference room MR even though participant Mg is in a remote location. The first electronic device 1 is therefore also referred to, where appropriate, as the electronic device "used locally".
 一実施形態に係る第1電子機器1は、参加者Mgの視線を再現するように構成されてよい。すなわち、第1電子機器1は、参加者Mgの視線を模擬するような動作を行うことができる。具体的には、第1電子機器1は、参加者Mgがどの方向を見ているのかを、会議室MRにおいて、参加者Ma,Mb,Mc,及びMdなどに認識させることができる。例えば、第1電子機器1は、参加者Mgが参加者Maの方を見ているか、参加者Mgが参加者Mbの方を見ているか、又は、参加者Mgがいずれの参加者の方も見ていないのかなどを、会議室MRにおいて第1電子機器1の周囲の者に認識させることができる。 The first electronic device 1 according to one embodiment may be configured to reproduce the line of sight of the participant Mg. That is, the first electronic device 1 can perform an operation that simulates the line of sight of the participant Mg. Specifically, the first electronic device 1 can cause the participants Ma, Mb, Mc, Md, etc. in the conference room MR to recognize in which direction the participant Mg is looking. For example, the first electronic device 1 can cause people around the first electronic device 1 in the conference room MR to recognize whether the participant Mg is looking at the participant Ma, whether the participant Mg is looking at the participant Mb, or whether the participant Mg is not looking at any of the participants.
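Purely as an illustration of this idea, the following Python sketch maps a remote user's reported gaze bearing to the nearest local participant; the angle convention, the participant bearings, and all function names are assumptions made for the example and are not part of the present disclosure.

    # Hypothetical bearings (degrees) of each local participant as seen
    # from the first electronic device, measured from its front direction.
    PARTICIPANT_BEARINGS = {"Ma": -60.0, "Mb": -20.0, "Mc": 20.0, "Md": 60.0}

    def gaze_target(gaze_bearing_deg: float, tolerance_deg: float = 15.0):
        """Return the participant the remote user appears to look at,
        or None if the gaze is not directed at anyone."""
        name, bearing = min(
            PARTICIPANT_BEARINGS.items(),
            key=lambda item: abs(item[1] - gaze_bearing_deg),
        )
        return name if abs(bearing - gaze_bearing_deg) <= tolerance_deg else None

    print(gaze_target(-22.5))  # -> "Mb": the device would turn its eyes toward Mb
    print(gaze_target(90.0))   # -> None: looking at no participant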
Various devices can be envisaged as the first electronic device 1 according to one embodiment; for example, it may be a specially designed device. For example, the first electronic device 1 may have a housing on which an illustration of a person or the like is drawn, a shape imitating at least part of a person or the like, or a robot-like shape. The first electronic device 1 may also be a general-purpose device such as a smartphone, tablet, phablet, notebook computer (notebook PC or laptop), or desktop computer. The first electronic device 1 according to one embodiment may, for example, draw at least part of a person or robot on the display of a notebook PC, or project at least part of a person or robot as a 3D hologram.
As shown in FIG. 2, the first electronic device 1 according to one embodiment may include a control unit 10, a storage unit 20, a communication unit 30, an imaging unit 40, an audio input unit 50, an audio output unit 60, a display unit 70, and a drive unit 80. The control unit 10 may include, for example, an identification unit 12 and an estimation unit 14. In one embodiment, the first electronic device 1 may omit at least some of the functional units shown in FIG. 2, or may include components other than the functional units shown in FIG. 2.
The control unit 10 controls and/or manages the first electronic device 1 as a whole, including each of its functional units. To provide the control and processing capability for executing various functions, the control unit 10 may include at least one processor, such as a CPU (Central Processing Unit) or a DSP (Digital Signal Processor). The control unit 10 may be implemented collectively as one processor, as several processors, or as individual processors. A processor may be implemented as a single integrated circuit (IC), as a plurality of communicably connected integrated circuits and discrete circuits, or on the basis of various other known technologies.
The control unit 10 may include one or more processors and a memory. The processors may include a general-purpose processor that loads a specific program to execute a specific function, and a dedicated processor specialized for specific processing. The dedicated processor may include an application-specific integrated circuit (ASIC). The processors may include a programmable logic device (PLD), and the PLD may include an FPGA (Field-Programmable Gate Array). The control unit 10 may be an SoC (System-on-a-Chip) or a SiP (System In a Package) in which one or more processors cooperate. The control unit 10 controls the operation of each component of the first electronic device 1.
The control unit 10 may be configured to include at least one of software and hardware resources. In the first electronic device 1 according to one embodiment, the control unit 10 may be configured by concrete means in which software and hardware resources cooperate, and at least any of the other functional units may likewise be configured by such concrete means.
Operations such as the control performed by the control unit 10 in the first electronic device 1 according to one embodiment are described further below. The identification unit 12 of the control unit 10 can perform various identification processes, and the estimation unit 14 can perform various estimation processes.
The storage unit 20 may function as a memory that stores various kinds of information. For example, the storage unit 20 may store programs executed by the control unit 10 and the results of processing executed by the control unit 10. The storage unit 20 may also function as a work memory for the control unit 10. As shown in FIG. 2, the storage unit 20 may be connected to the control unit 10 by wire and/or wirelessly. The storage unit 20 may include, for example, at least one of a RAM (Random Access Memory) and a ROM (Read Only Memory). The storage unit 20 can be configured by, for example, a semiconductor memory, but is not limited thereto and can be any storage device. For example, the storage unit 20 may be a storage medium such as a memory card inserted into the first electronic device 1 according to one embodiment. The storage unit 20 may also be an internal memory of the CPU used as the control unit 10, or may be connected to the control unit 10 as a separate unit.
The communication unit 30 has an interface function for wireless and/or wired communication with, for example, external devices. The communication method used by the communication unit 30 in one embodiment may be a wireless communication standard. Wireless communication standards include, for example, cellular phone communication standards such as 2G, 3G, 4G, and 5G. Cellular phone communication standards include, for example, LTE (Long Term Evolution), W-CDMA (Wideband Code Division Multiple Access), CDMA2000, PDC (Personal Digital Cellular), GSM (registered trademark) (Global System for Mobile communications), and PHS (Personal Handy-phone System). Wireless communication standards also include, for example, WiMAX (Worldwide Interoperability for Microwave Access), IEEE 802.11, WiFi, Bluetooth (registered trademark), IrDA (Infrared Data Association), and NFC (Near Field Communication). The communication unit 30 may include, for example, a modem whose communication method is standardized by the ITU-T (International Telecommunication Union Telecommunication Standardization Sector). The communication unit 30 can support one or more of the above communication standards.
The communication unit 30 may be configured to include, for example, an antenna for transmitting and receiving radio waves and an appropriate RF unit. The communication unit 30 may communicate wirelessly with the communication unit of another electronic device via the antenna. The communication unit 30 may have a function of transmitting arbitrary information from the first electronic device 1 to another device and/or a function of receiving arbitrary information from another device at the first electronic device 1. For example, the communication unit 30 may communicate wirelessly with the second electronic device 100 shown in FIG. 1; in this case, it may communicate wirelessly with the communication unit 130 (described later) of the second electronic device 100. Thus, in one embodiment, the communication unit 30 has a function of communicating with the second electronic device 100. Likewise, the communication unit 30 may communicate wirelessly with the third electronic device 300 shown in FIG. 1 via its communication unit 330 (described later); thus, in one embodiment, the communication unit 30 may have a function of communicating with the third electronic device 300. The communication unit 30 may also be configured as an interface, such as a connector, for a wired connection to the outside. Since the communication unit 30 can be configured using known wireless communication technology, a more detailed description of the hardware is omitted.
As shown in FIG. 2, the communication unit 30 may be connected to the control unit 10 by wire and/or wirelessly. Various kinds of information received by the communication unit 30 may be supplied to, for example, the storage unit 20 and/or the control unit 10, or may be stored in, for example, a memory built into the control unit 10. The communication unit 30 may also transmit to the outside, for example, the results of processing by the control unit 10 and/or information stored in the storage unit 20.
The imaging unit 40 may be configured to include an image sensor that captures images electronically, such as a digital camera. The imaging unit 40 may include an imaging element that performs photoelectric conversion, such as a CCD (Charge Coupled Device) image sensor or a CMOS (Complementary Metal Oxide Semiconductor) sensor. The imaging unit 40 can capture, for example, images of the surroundings of the first electronic device 1, such as the inside of the conference room MR shown in FIG. 1. In one embodiment, the imaging unit 40 may capture images of the participants Ma, Mb, Mc, Md, etc. of a conference held in the conference room MR shown in FIG. 1.
The imaging unit 40 may be configured to capture video with an angle of view covering a predetermined range centered on a specific direction. For example, the imaging unit 40 according to one embodiment may capture, in FIG. 1, video centered on participant Mb in which participant Ma and/or participant Md are not included in the angle of view. The imaging unit 40 may also be configured to simultaneously capture video in all directions (for example, 360 degrees), such as in the horizontal plane. For example, the imaging unit 40 according to one embodiment may capture, in FIG. 1, omnidirectional video that includes all of the participants Ma, Mb, Mc, and Md.
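As a purely illustrative aside, the following Python sketch shows one way a view of a predetermined angle of view, centered on a specific bearing, could be cropped out of an equirectangular 360-degree frame; the frame layout and the function name are assumptions made for this example, not part of the disclosure.

    import numpy as np

    def crop_view(frame_360: np.ndarray, center_deg: float, fov_deg: float) -> np.ndarray:
        """Cut a horizontal field of view out of an equirectangular 360-degree frame.

        frame_360: H x W x 3 image whose columns span 0..360 degrees (assumed layout).
        center_deg: bearing at the center of the desired view.
        fov_deg: width of the desired view in degrees.
        """
        h, w, _ = frame_360.shape
        cols_per_deg = w / 360.0
        half = int(fov_deg / 2 * cols_per_deg)
        center = int((center_deg % 360.0) * cols_per_deg)
        # np.take with mode="wrap" handles views that straddle the 0/360 seam.
        cols = np.arange(center - half, center + half)
        return np.take(frame_360, cols, axis=1, mode="wrap")

    frame = np.zeros((480, 1920, 3), dtype=np.uint8)  # synthetic 360-degree frame
    view = crop_view(frame, center_deg=350.0, fov_deg=90.0)
    print(view.shape)  # -> (480, 480, 3)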
The imaging unit 40 may convert captured images into signals and transmit them to the control unit 10; for this purpose, the imaging unit 40 may be connected to the control unit 10 by wire and/or wirelessly. Signals based on the images captured by the imaging unit 40 may also be supplied to any functional unit of the first electronic device 1, such as the storage unit 20 and/or the display unit 70. The imaging unit 40 is not limited to an imaging device such as a digital camera and may be any device that captures the inside of the conference room MR shown in FIG. 1.
In one embodiment, the imaging unit 40 may capture the inside of the conference room MR as still images at predetermined intervals (for example, 15 frames per second), or as continuous video. Furthermore, the imaging unit 40 may be configured to include a fixed camera or a movable camera.
The audio input unit 50 detects (acquires) sounds or voices around the first electronic device 1, including human voices. For example, the audio input unit 50 may detect sound or voice as air vibration, for example with a diaphragm, and convert it into an electrical signal. Specifically, the audio input unit 50 may include an acoustic device, such as any microphone, that converts sound into an electrical signal. In one embodiment, the audio input unit 50 may detect (acquire) the voice of at least one of the participants Ma, Mb, Mc, and Md in the conference room MR shown in FIG. 1. The voice (electrical signal) detected by the audio input unit 50 may be input to, for example, the control unit 10; for this purpose, the audio input unit 50 may be connected to the control unit 10 by wire and/or wirelessly.
In one embodiment, the audio input unit 50 may be configured to include, for example, a stereo microphone or a microphone array. An audio input unit 50 with multiple channels, such as a stereo microphone or a microphone array, makes it possible to identify (or estimate) the direction and/or position of a sound source. With such an audio input unit 50, it can be identified (or estimated) from which direction and/or position, relative to the first electronic device 1 equipped with the audio input unit 50, a sound detected in the conference room MR originates.
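For illustration, the sketch below estimates the direction of arrival of a sound from a two-channel (stereo) recording using the well-known GCC-PHAT time-delay method; the microphone spacing, sample rate, and function names are assumptions for the example, and a real microphone array would typically use more channels and calibration.

    import numpy as np

    SPEED_OF_SOUND = 343.0  # m/s

    def gcc_phat_delay(left: np.ndarray, right: np.ndarray, fs: int) -> float:
        """Estimate the inter-channel time delay (seconds) with GCC-PHAT."""
        n = len(left) + len(right)
        spec = np.fft.rfft(left, n) * np.conj(np.fft.rfft(right, n))
        spec /= np.abs(spec) + 1e-12          # PHAT weighting: keep phase only
        cc = np.fft.irfft(spec, n)
        max_lag = n // 2
        cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
        return (np.argmax(np.abs(cc)) - max_lag) / float(fs)

    def doa_degrees(left, right, fs: int, mic_distance_m: float = 0.1) -> float:
        """Convert the delay into a bearing relative to the array's broadside."""
        tau = gcc_phat_delay(left, right, fs)
        sin_theta = np.clip(tau * SPEED_OF_SOUND / mic_distance_m, -1.0, 1.0)
        return float(np.degrees(np.arcsin(sin_theta)))

    # Synthetic check: delay one channel by 2 samples at 16 kHz.
    fs = 16000
    sig = np.random.randn(fs)
    print(round(doa_degrees(sig, np.roll(sig, 2), fs), 1))  # source off to one side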
The audio input unit 50 may convert acquired sound or voice into an electrical signal and supply it to the control unit 10. The audio input unit 50 may also supply the electrical signal (audio signal) obtained from the sound or voice to a functional unit of the first electronic device 1, such as the storage unit 20. The audio input unit 50 may be any device that detects (acquires) sound or voice inside the conference room MR shown in FIG. 1.
The audio output unit 60 converts an electrical sound or voice signal (audio signal) supplied from the control unit 10 into sound, thereby outputting the audio signal as sound or voice. The audio output unit 60 may be connected to the control unit 10 by wire and/or wirelessly. The audio output unit 60 may be configured to include a device with a sound output function, such as any speaker (loudspeaker). In one embodiment, the audio output unit 60 may include a directional speaker that transmits sound in a specific direction, and may be configured so that the directionality of the sound can be changed. The audio output unit 60 may also include an amplifier or amplification circuit that appropriately amplifies the electrical signal (audio signal).
In one embodiment, the audio output unit 60 may amplify the audio signal that the communication unit 30 receives from the second electronic device 100. Here, the audio signal received from the second electronic device 100 may be, for example, the audio signal of a speaker who is speaking (for example, participant Mg shown in FIG. 1), received by the communication unit 30 from that speaker's second electronic device 100. That is, the audio output unit 60 may output the audio signal of the speaker (for example, participant Mg shown in FIG. 1) as that speaker's voice.
The display unit 70 may be any display device such as a liquid crystal display (LCD), an organic EL display (Organic Electro-Luminescence panel), or an inorganic EL display (Inorganic Electro-Luminescence panel). The display unit 70 may also be, for example, a projector that projects a 3D hologram. The display unit 70 may display various kinds of information such as characters, figures, or symbols. The display unit 70 may also display objects constituting various GUIs, icon images, and the like, for example to prompt the user to operate the first electronic device 1.
Various data necessary for display on the display unit 70 may be supplied from, for example, the control unit 10 or the storage unit 20; for this purpose, the display unit 70 may be connected to the control unit 10 and the like by wire and/or wirelessly. When the display unit 70 includes, for example, an LCD, it may be configured to include a backlight or the like as appropriate.
In one embodiment, the display unit 70 may display video based on a video signal transmitted from the second electronic device 100. As described later, the second electronic device 100 acquires, for example, the voice, video, and/or gaze information of participant Mg shown in FIG. 1 and outputs them to the first electronic device 1. The display unit 70 may then represent the gaze of participant Mg as video, based on the video and/or gaze information of participant Mg received from the second electronic device 100. By displaying the gaze of participant Mg on the display unit 70 of the first electronic device 1, the participants Ma, Mb, Mc, Md, etc. shown in FIG. 1 can visually grasp the gaze of participant Mg, who is away from the conference room MR.
The display unit 70 may, for example, display as-is the video of the gaze of participant Mg captured by the second electronic device 100, or it may display a characterized image of the gaze of participant Mg (for example, the gaze of an avatar or a robot). The display unit 70 may represent, as video, the gaze of the user of the second electronic device 100, and may also represent the direction and/or movement of that gaze. Thus, the first electronic device 1 according to one embodiment may include a display unit 70 that represents, as video, the gaze of the user of the second electronic device 100 and/or the direction of that gaze.
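As one hedged illustration of such a video representation, the Python sketch below converts a normalized gaze direction into pupil offsets for a pair of drawn avatar eyes; the coordinate convention, the EyeSprite structure, and all names are assumptions for the example only.

    from dataclasses import dataclass

    @dataclass
    class EyeSprite:
        center_x: int   # eye center in screen pixels
        center_y: int
        radius: int     # eye (sclera) radius in pixels

    def pupil_position(eye: EyeSprite, gaze_x: float, gaze_y: float,
                       travel: float = 0.5) -> tuple:
        """Place the pupil inside the eye according to a normalized gaze.

        gaze_x, gaze_y: gaze direction in [-1, 1], e.g. -1 = far left / up.
        travel: fraction of the eye radius the pupil may move from center.
        """
        gx = max(-1.0, min(1.0, gaze_x))
        gy = max(-1.0, min(1.0, gaze_y))
        return (round(eye.center_x + gx * travel * eye.radius),
                round(eye.center_y + gy * travel * eye.radius))

    # Two eyes sharing one gaze: both pupils shift toward the remote user's target.
    left, right = EyeSprite(100, 120, 40), EyeSprite(220, 120, 40)
    for eye in (left, right):
        print(pupil_position(eye, gaze_x=0.8, gaze_y=-0.2))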
The drive unit 80 drives predetermined movable parts of the first electronic device 1. The drive unit 80 may be configured to include a power source, such as a servo motor, that drives the movable parts of the first electronic device 1. Under the control of the control unit 10, the drive unit 80 may drive any movable part of the first electronic device 1; for this purpose, the drive unit 80 may be connected to the control unit 10 by wire and/or wirelessly.
In one embodiment, the drive unit 80 may drive, for example, at least part of the housing of the first electronic device 1. When the first electronic device 1 has a shape imitating at least part of a person or the like or a robot-like shape, the drive unit 80 may drive at least part of that shape. In particular, when the first electronic device 1 has a shape imitating at least part of a human face or a robot-like face, the drive unit 80 may express the gaze, the direction of the gaze, and/or the movement of the gaze of the person or robot through a physical configuration (form) and/or movement.
As described later, the second electronic device 100 acquires, for example, the voice, video, and/or gaze information of participant Mg shown in FIG. 1 and outputs them to the first electronic device 1. The drive unit 80 may then express the gaze of participant Mg through a physical configuration (form) and/or movement, based on the video and/or gaze information of participant Mg received from the second electronic device 100. With the drive unit 80 of the first electronic device 1 expressing the gaze of participant Mg, the participants Ma, Mb, Mc, Md, etc. shown in FIG. 1 can visually grasp the gaze of participant Mg, who is away from the conference room MR.
The drive unit 80 may, for example, reproduce as-is the direction and/or movement of the gaze of participant Mg captured by the second electronic device 100, or it may express a characterized form of that direction and/or movement (for example, the gaze of an avatar or a robot). The drive unit 80 may express the gaze of the user of the second electronic device 100, its direction, and/or its movement through a physical configuration (form) and/or movement. Thus, the first electronic device 1 according to one embodiment may include a drive unit 80 that expresses the gaze of the user of the second electronic device 100 and/or the direction of that gaze by driving a mechanical structure.
FIG. 3 is a diagram illustrating an example of operations performed by the drive unit 80 in the first electronic device 1 according to one embodiment.
As shown in FIG. 3, in one embodiment, the drive unit 80 may realize driving about at least one of the drive axes α, β, γ, δ, ε, and ζ of the first electronic device 1. For example, by driving about the drive axis α, the drive unit 80 may express a negative gesture of the user of the second electronic device 100 (for example, participant Mg), i.e., shaking the head from side to side. By driving about the drive axis β, the drive unit 80 may express an affirmative gesture (nodding). By driving about the drive axis γ, the drive unit 80 may express an undecided gesture (tilting the head). By driving about the drive axis δ, the drive unit 80 may express a negative or rejecting gesture (swinging the body from side to side). By driving about the drive axis ε, the drive unit 80 may express a gesture of courtesy (bowing). Likewise, by driving about the drive axis ζ, the drive unit 80 may express a movement of the user of the second electronic device 100 (for example, participant Mg).
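For illustration only, the following Python sketch plays back such gestures as swings about a single drive axis; the axis labels follow FIG. 3, but the gesture table, angles, and the set_axis_angle command are hypothetical and stand in for whatever servo interface an actual implementation would provide.

    import time

    # Hypothetical mapping from a recognized gesture to
    # (drive axis, swing amplitude in degrees, repetitions).
    GESTURES = {
        "shake_head": ("alpha",   25.0, 2),  # negative: head side to side
        "nod":        ("beta",    20.0, 2),  # affirmative
        "tilt_head":  ("gamma",   15.0, 1),  # undecided
        "sway_body":  ("delta",   10.0, 2),  # negative / rejecting
        "bow":        ("epsilon", 30.0, 1),  # courtesy
    }

    def set_axis_angle(axis: str, degrees: float) -> None:
        # Placeholder for a real servo command; here we just log it.
        print(f"axis {axis} -> {degrees:+.1f} deg")

    def perform(gesture: str, dwell_s: float = 0.2) -> None:
        """Play back a gesture as a symmetric swing about one drive axis."""
        axis, swing, reps = GESTURES[gesture]
        for _ in range(reps):
            for angle in (swing, -swing):
                set_axis_angle(axis, angle)
                time.sleep(dwell_s)
        set_axis_angle(axis, 0.0)  # return to the neutral pose

    perform("nod")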
In one embodiment, the drive unit 80 may also express the movement of the eyes E1 and E2 in the face portion Fc of the first electronic device 1 shown in FIG. 3, that is, the gaze of the user of the second electronic device 100 (for example, participant Mg). In this case, the drive unit 80 may express that gaze by driving at least one of the eyes E1 and E2 in the face portion Fc; specifically, the drive unit 80 may move at least one of the eyes E1 and E2 in any of the directions indicated by the arrows in FIG. 3.
In one embodiment, the display unit 70 may express the gaze of the user of the second electronic device 100 (for example, participant Mg) by displaying the eyes E1 and E2 in the face portion Fc shown in FIG. 3, for example. In one embodiment, at least one of the display unit 70 and the drive unit 80 may express the gaze of the user of the second electronic device 100 by representing at least one of the eyes E1 and E2 of the first electronic device 1.
As described above, various operations expressing the emotions and/or behavior of a person such as participant Mg can be expressed by display on the display unit 70 and/or driving of the drive unit 80. Various known techniques may be used for such operations, so a more detailed description of them is omitted. The first electronic device 1 according to one embodiment can perform operations expressing the emotions and/or behavior of participant Mg by display on the display unit 70 and/or driving of the drive unit 80.
In one embodiment, the first electronic device 1 may be a specially designed device, as described above. Alternatively, in one embodiment, the first electronic device 1 may include, for example, only the audio output unit 60 and the drive unit 80 among the functional units shown in FIG. 2. In this case, the first electronic device 1 may be connected to another electronic device to supplement at least some of the functions of the other functional units shown in FIG. 2. Here, the other electronic device may be, for example, a general-purpose smartphone, tablet, phablet, notebook computer (notebook PC or laptop), or desktop computer.
The manner shown in FIG. 3, in which display on the display unit 70 and/or driving of the drive unit 80 of the first electronic device 1 expresses various operations representing the emotions and/or behavior of a person such as participant Mg, is merely one conceivable example. The first electronic device 1 according to one embodiment may express such operations through various other configurations and/or operating modes.
FIG. 4 is a block diagram schematically showing the configuration of the second electronic device 100 shown in FIG. 1. An example of the configuration of the second electronic device 100 according to one embodiment is described below. As shown in FIG. 1, the second electronic device 100 may be a device used by, for example, participant Mg at home RL. The first electronic device 1 described above has a function of outputting to the second electronic device 100 the voice and/or video of the participants Ma, Mb, Mc, Md, etc. that it acquires when they speak, and it can express the gaze of participant Mg. The second electronic device 100 has a function of outputting to the first electronic device 1 the voice and/or video of participant Mg that it acquires when participant Mg speaks, as well as the gaze information of participant Mg that it acquires. With the second electronic device 100, participant Mg can hold a remote conference or video conference even at a location away from the conference room MR. The second electronic device 100 is therefore also referred to, where appropriate, as the electronic device "used remotely".
As shown in FIG. 4, the second electronic device 100 according to one embodiment may include a control unit 110, a storage unit 120, a communication unit 130, an imaging unit 140, an audio input unit 150, an audio output unit 160, a display unit 170, and a gaze information acquisition unit 200. The control unit 110 may include, for example, an identification unit 112 and an estimation unit 114. In one embodiment, the second electronic device 100 may omit at least some of the functional units shown in FIG. 4, or may include components other than the functional units shown in FIG. 4.
The control unit 110 controls and/or manages the second electronic device 100 as a whole, including each of its functional units. The control unit 110 may basically be configured based on the same concept as, for example, the control unit 10 shown in FIG. 2. The identification unit 112 and the estimation unit 114 of the control unit 110 may likewise be configured based on the same concepts as the identification unit 12 and the estimation unit 14 of the control unit 10 shown in FIG. 2, respectively.
The storage unit 120 may function as a memory that stores various kinds of information. For example, the storage unit 120 may store programs executed by the control unit 110 and the results of processing executed by the control unit 110, and may also function as a work memory for the control unit 110. As shown in FIG. 4, the storage unit 120 may be connected to the control unit 110 by wire and/or wirelessly. The storage unit 120 may basically be configured based on the same concept as, for example, the storage unit 20 shown in FIG. 2.
The communication unit 130 has an interface function for wireless and/or wired communication. The communication unit 130 may communicate wirelessly with the communication unit of another electronic device, for example via an antenna. For example, the communication unit 130 may communicate wirelessly with the first electronic device 1 shown in FIG. 1, in which case it may communicate wirelessly with the communication unit 30 of the first electronic device 1; thus, in one embodiment, the communication unit 130 has a function of communicating with the first electronic device 1. Likewise, the communication unit 130 may communicate wirelessly with the third electronic device 300 shown in FIG. 1 via its communication unit 330 (described later); thus, in one embodiment, the communication unit 130 may have a function of communicating with the third electronic device 300. As shown in FIG. 4, the communication unit 130 may be connected to the control unit 110 by wire and/or wirelessly. The communication unit 130 may basically be configured based on the same concept as, for example, the communication unit 30 shown in FIG. 2.
The imaging unit 140 may be configured to include an image sensor that captures images electronically, such as a digital camera. The imaging unit 140 may capture, for example, the inside of the home RL shown in FIG. 1; in one embodiment, it may capture participant Mg, who joins the conference from the home RL shown in FIG. 1. The imaging unit 140 may convert captured images into signals and transmit them to the control unit 110, and may therefore be connected to the control unit 110 by wire and/or wirelessly. The imaging unit 140 may basically be configured based on the same concept as, for example, the imaging unit 40 shown in FIG. 2.
The audio input unit 150 detects (acquires) sounds or voices around the second electronic device 100, including human voices. For example, the audio input unit 150 may detect sound or voice as air vibration, for example with a diaphragm, and convert it into an electrical signal. Specifically, the audio input unit 150 may include an acoustic device, such as any microphone, that converts sound into an electrical signal. In one embodiment, the audio input unit 150 may detect (acquire) the voice of participant Mg in the home RL shown in FIG. 1. The voice (electrical signal) detected by the audio input unit 150 may be input to, for example, the control unit 110; for this purpose, the audio input unit 150 may be connected to the control unit 110 by wire and/or wirelessly. The audio input unit 150 may basically be configured based on the same concept as, for example, the audio input unit 50 shown in FIG. 2.
The audio output unit 160 converts an electrical signal (audio signal) supplied from the control unit 110 into sound, thereby outputting the audio signal as sound or voice. The audio output unit 160 may be connected to the control unit 110 by wire and/or wirelessly, and may be configured to include a device with a sound output function, such as any speaker (loudspeaker). In one embodiment, the audio output unit 160 may output the voice detected by the audio input unit 50 of the first electronic device 1, which may be the voice of at least one of the participants Ma, Mb, Mc, and Md in the conference room MR shown in FIG. 1. The audio output unit 160 may basically be configured based on the same concept as, for example, the audio output unit 60 shown in FIG. 2.
The display unit 170 may be any display device such as a liquid crystal display (LCD), an organic EL display (Organic Electro-Luminescence panel), or an inorganic EL display (Inorganic Electro-Luminescence panel). The display unit 170 may basically be configured based on the same concept as, for example, the display unit 70 shown in FIG. 2. Various data necessary for display on the display unit 170 may be supplied from, for example, the control unit 110 or the storage unit 120; for this purpose, the display unit 170 may be connected to the control unit 110 and the like by wire and/or wirelessly.
The display unit 170 may be, for example, a touch screen display with a touch panel function that detects input made by contact of a finger of participant Mg or a stylus.
In one embodiment, the display unit 170 may display video based on a video signal transmitted from the first electronic device 1, for example video of the participants Ma, Mb, Mc, Md, etc. captured by (the imaging unit 40 of) the first electronic device 1. With the video of the participants Ma, Mb, Mc, Md, etc. displayed on the display unit 170 of the second electronic device 100, participant Mg shown in FIG. 1 can visually grasp the state of the participants Ma, Mb, Mc, Md, etc. in the conference room MR away from the home RL.
The display unit 170 may, for example, display as-is the video of the participants Ma, Mb, Mc, Md, etc. captured by the first electronic device 1, or it may display characterized images (for example, avatars) of the participants Ma, Mb, Mc, Md, etc.
The gaze information acquisition unit 200 acquires information on the gaze of the user of the second electronic device 100 (for example, participant Mg), such as the gaze itself, the direction of the gaze, and/or the movement of the gaze. The gaze information acquisition unit 200 may have a function of tracking the movement of the gaze of the user of the second electronic device 100 (for example, participant Mg), like an eye tracker, for example. The gaze information acquisition unit 200 may be any component capable of acquiring such gaze information of the user of the second electronic device 100.
The second electronic device 100 according to one embodiment may acquire gaze information of the user of the second electronic device 100 (for example, participant Mg) based on the eye movement of that user captured by the imaging unit 140. In this case, the second electronic device 100 need not include the gaze information acquisition unit 200, or the imaging unit 140 may double as the gaze information acquisition unit 200. The gaze information acquired by the gaze information acquisition unit 200 may be input to, for example, the control unit 110; for this purpose, the gaze information acquisition unit 200 may be connected to the control unit 110 by wire and/or wirelessly.
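For illustration only, the sketch below estimates a coarse horizontal and vertical gaze direction from the iris center relative to the eye corners in a camera image; the landmark format, the scaling factors, and the function name are assumptions for the example, and production eye tracking would rely on a dedicated tracker or a trained model.

    from typing import Tuple

    def gaze_from_landmarks(inner_corner: Tuple[float, float],
                            outer_corner: Tuple[float, float],
                            iris_center: Tuple[float, float]) -> Tuple[float, float]:
        """Return (gaze_x, gaze_y) in roughly [-1, 1] from 2D eye landmarks.

        The iris offset is normalized against the eye's width so the result
        does not depend on how large the eye appears in the frame.
        """
        eye_cx = (inner_corner[0] + outer_corner[0]) / 2.0
        eye_cy = (inner_corner[1] + outer_corner[1]) / 2.0
        eye_width = abs(outer_corner[0] - inner_corner[0]) or 1.0
        gaze_x = (iris_center[0] - eye_cx) / (eye_width / 2.0)
        gaze_y = (iris_center[1] - eye_cy) / (eye_width / 4.0)  # eyes are wider than tall
        clamp = lambda v: max(-1.0, min(1.0, v))
        return clamp(gaze_x), clamp(gaze_y)

    # Iris sitting slightly toward the outer corner and above center:
    print(gaze_from_landmarks((100.0, 200.0), (140.0, 200.0), (126.0, 197.0)))
    # -> (0.3, -0.3)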
In one embodiment, the second electronic device 100 may be a specially designed device, as described above. Alternatively, in one embodiment, the second electronic device 100 may include only some of the functional units shown in FIG. 4. In this case, the second electronic device 100 may be connected to another electronic device to supplement at least some of the functions of the other functional units shown in FIG. 4. Here, the other electronic device may be, for example, a general-purpose smartphone, tablet, phablet, notebook computer (notebook PC or laptop), or desktop computer.
In particular, a smartphone or notebook computer often has almost all of the functional units shown in FIG. 4. For this reason, in one embodiment, the second electronic device 100 may be a smartphone or notebook computer, on which an application (program) for cooperating with the first electronic device 1 may be installed.
FIG. 5 is a block diagram schematically showing the configuration of the third electronic device 300 shown in FIG. 1. An example of the configuration of the third electronic device 300 according to one embodiment is described below. As shown in FIG. 1, the third electronic device 300 may be installed at a location other than the home RL of participant Mg and the conference room MR, or it may be installed at or near the home RL of participant Mg, or at or near the conference room MR.
The first electronic device 1 has a function of transmitting to the third electronic device 300 the voice and/or video data of the participants Ma, Mb, Mc, Md, etc. that it acquires when they speak; the third electronic device 300 may transmit the voice and/or video data received from the first electronic device 1 to the second electronic device 100. Likewise, the second electronic device 100 has a function of transmitting to the third electronic device 300 the voice and/or video data of participant Mg that it acquires when participant Mg speaks; the third electronic device 300 may transmit the voice and/or video data received from the second electronic device 100 to the first electronic device 1. In this way, the third electronic device 300 may have a function of relaying between the first electronic device 1 and the second electronic device 100. The third electronic device 300 is also referred to, where appropriate, as a "server".
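As a minimal, hedged sketch of such a relay role, the following Python program forwards whatever one connected endpoint sends to the other endpoint and vice versa; the port number and the raw-byte framing are assumptions for the example, and a real conference server would add authentication, media handling, and error recovery.

    import asyncio

    clients: set = set()  # writer objects of the currently connected endpoints

    async def handle(reader: asyncio.StreamReader,
                     writer: asyncio.StreamWriter) -> None:
        """Relay every chunk received from one endpoint to all other endpoints."""
        clients.add(writer)
        try:
            while chunk := await reader.read(4096):
                for peer in list(clients):
                    if peer is not writer:       # do not echo back to the sender
                        peer.write(chunk)
                        await peer.drain()
        finally:
            clients.discard(writer)
            writer.close()

    async def main() -> None:
        # Hypothetical relay listening on port 8765 for the two devices.
        server = await asyncio.start_server(handle, host="0.0.0.0", port=8765)
        async with server:
            await server.serve_forever()

    if __name__ == "__main__":
        asyncio.run(main())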
As shown in FIG. 5, the third electronic device 300 according to one embodiment may include a control unit 310, a storage unit 320, and a communication unit 330. The control unit 310 may include, for example, an identification unit 312 and an estimation unit 314. In one embodiment, the third electronic device 300 may omit at least some of the functional units shown in FIG. 5, or may include components other than the functional units shown in the figure.
The control unit 310 controls and/or manages the third electronic device 300 as a whole, including each of the functional units constituting the third electronic device 300. The control unit 310 may basically be configured based on the same concept as, for example, the control unit 10 shown in FIG. 2. Likewise, the identification unit 312 and the estimation unit 314 of the control unit 310 may each be configured based on the same concept as, for example, the identification unit 12 and the estimation unit 14 of the control unit 10 shown in FIG. 2.
The storage unit 320 may function as a memory that stores various types of information. The storage unit 320 may store, for example, programs executed by the control unit 310 and the results of processing executed by the control unit 310. The storage unit 320 may also function as a work memory for the control unit 310. As shown in FIG. 5, the storage unit 320 may be connected to the control unit 310 by wire and/or wirelessly. The storage unit 320 may basically be configured based on the same concept as, for example, the storage unit 20 shown in FIG. 2.
The communication unit 330 functions as an interface for wireless and/or wired communication. The communication unit 330 may communicate wirelessly with, for example, the communication unit of another electronic device via an antenna. For example, the communication unit 330 may communicate wirelessly with the first electronic device 1 shown in FIG. 1; in this case, the communication unit 330 may communicate wirelessly with the communication unit 30 of the first electronic device 1. Thus, in one embodiment, the communication unit 330 has a function of communicating with the first electronic device 1. Similarly, the communication unit 330 may communicate wirelessly with the second electronic device 100 shown in FIG. 1; in this case, the communication unit 330 may communicate wirelessly with the communication unit 130 of the second electronic device 100. Thus, in one embodiment, the communication unit 330 may have a function of communicating with the second electronic device 100. As shown in FIG. 5, the communication unit 330 may be connected to the control unit 310 by wire and/or wirelessly. The communication unit 330 may basically be configured based on the same concept as, for example, the communication unit 30 shown in FIG. 2.
In one embodiment, the third electronic device 300 may be, for example, a specially designed device. Alternatively, in one embodiment, the third electronic device 300 may include only some of the functional units shown in FIG. 5. In this case, the third electronic device 300 may be connected to another electronic device to supplement at least some of the functions of the remaining functional units shown in FIG. 5. Here, the other electronic device may be, for example, a general-purpose computer or a server. In one embodiment, the third electronic device 300 may be, for example, a relay server, a web server, or an application server.
Next, the basic operation of the first electronic device 1 and the second electronic device 100 according to one embodiment will be described. The following description assumes a situation in which participant Mg participates from his/her home RL in a remote conference held in the conference room MR, as shown in FIG. 1.
That is, the first electronic device 1 according to one embodiment is installed in the conference room MR and acquires video and/or audio of at least one of participants Ma, Mb, Mc, and Md. The video and/or audio acquired by the first electronic device 1 is transmitted to the second electronic device 100 installed in the home RL of participant Mg. The second electronic device 100 outputs the video and/or audio of at least one of participants Ma, Mb, Mc, and Md acquired by the first electronic device 1. This allows participant Mg to perceive the video and/or audio of at least one of participants Ma, Mb, Mc, and Md.
Meanwhile, the second electronic device 100 according to one embodiment is installed in the home RL of participant Mg and acquires the voice of participant Mg. The second electronic device 100 also acquires information on the gaze of participant Mg. The voice and/or gaze information acquired by the second electronic device 100 is transmitted to the first electronic device 1 installed in the conference room MR. The first electronic device 1 outputs the voice of participant Mg received from the second electronic device 100. As a result, at least one of participants Ma, Mb, Mc, and Md can hear the voice of participant Mg. The first electronic device 1 also expresses the gaze of participant Mg based on the gaze information of participant Mg received from the second electronic device 100. As a result, at least one of participants Ma, Mb, Mc, and Md can visually recognize where participant Mg is looking. Furthermore, the second electronic device 100 according to one embodiment may acquire video of participant Mg. The video acquired by the second electronic device 100 may be transmitted to the first electronic device 1 installed in the conference room MR. In this case, the first electronic device 1 may output the video of participant Mg received from the second electronic device 100.
FIG. 6 is a sequence diagram explaining the basic operation of the system according to the embodiment described above. FIG. 6 shows the exchange of data and the like among the first electronic device 1, the second electronic device 100, and the third electronic device 300. The basic operation when a remote conference or video conference is held using the system according to one embodiment is described below with reference to FIG. 6.
In the operation shown in FIG. 6, the first electronic device 1, used locally, may be used by a first user. Here, the first user may be, for example, at least one of participants Ma, Mb, Mc, and Md shown in FIG. 1 (hereinafter also referred to as a local user). The second electronic device 100, used remotely, may be used by a second user. Here, the second user may be, for example, participant Mg shown in FIG. 1 (hereinafter also referred to as a remote user). In the following, an operation performed by the first electronic device 1 may, more precisely, be performed by, for example, the control unit 10 of the first electronic device 1; in this specification, an operation performed by the control unit 10 of the first electronic device 1 may be described as an operation performed by the first electronic device 1. Similarly, an operation performed by the second electronic device 100 may, more precisely, be performed by, for example, the control unit 110 of the second electronic device 100, and may be described as an operation performed by the second electronic device 100. Likewise, an operation performed by the third electronic device 300 may, more precisely, be performed by, for example, the control unit 310 of the third electronic device 300, and may be described as an operation performed by the third electronic device 300.
When the operation shown in FIG. 6 starts, the first electronic device 1 acquires at least one of video and audio of the first user (e.g., at least one of participants Ma, Mb, Mc, and Md) (step S1). Specifically, in step S1, the first electronic device 1 may capture video of the first user with the imaging unit 40 and acquire (or detect) audio of the first user with the audio input unit 50. Next, the first electronic device 1 encodes at least one of the video and audio of the first user (step S2). In step S2, encoding may mean compressing the video and/or audio data according to a predetermined rule and converting it into a format suited to the purpose, which may include encryption. The first electronic device 1 may perform any of various known types of encoding, such as software encoding or hardware encoding.
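The disclosure leaves the actual codec open. Purely to illustrate "compressing according to a predetermined rule and converting the format", the following Python sketch uses zlib compression together with a toy XOR scramble standing in for encryption; a real system would use proper audio/video codecs and cryptography, and every name here is an illustrative assumption.

    import zlib

    def encode_frame(raw: bytes, key: int = 0x5A) -> bytes:
        """Toy encode step (cf. step S2): compress, then XOR-scramble each byte.

        zlib stands in for a real audio/video codec, and the XOR pass is only
        a placeholder for the encryption the text mentions; neither is
        prescribed by the disclosure.
        """
        compressed = zlib.compress(raw)
        return bytes(b ^ key for b in compressed)

    def decode_frame(encoded: bytes, key: int = 0x5A) -> bytes:
        """Inverse of encode_frame: unscramble, then decompress (cf. step S5)."""
        return zlib.decompress(bytes(b ^ key for b in encoded))

    frame = b"PCM samples or pixel data" * 10
    assert decode_frame(encode_frame(frame)) == frame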
Next, the first electronic device 1 transmits the encoded video and/or audio data to the third electronic device 300 (step S3). Specifically, in step S3, the first electronic device 1 transmits the video and/or audio data from the communication unit 30 to the communication unit 330 of the third electronic device 300. Also in step S3, the third electronic device 300 receives, via the communication unit 330, the video and/or audio data transmitted from the communication unit 30 of the first electronic device 1.
Next, the third electronic device 300 transmits the encoded video and/or audio data received from the communication unit 30 to the second electronic device 100 (step S4). Specifically, in step S4, the third electronic device 300 transmits the video and/or audio data from the communication unit 330 to the communication unit 130 of the second electronic device 100. Also in step S4, the second electronic device 100 receives, via the communication unit 130, the video and/or audio data transmitted from the communication unit 330 of the third electronic device 300.
Next, the second electronic device 100 decodes the encoded video and/or audio data received from the communication unit 330 (step S5). In step S5, decoding may mean restoring the encoded video and/or audio data to its original format. The second electronic device 100 may perform any of various known types of decoding, such as software decoding or hardware decoding.
Next, the second electronic device 100 presents at least one of the video and audio of the first user (e.g., at least one of participants Ma, Mb, Mc, and Md) to the second user (e.g., participant Mg) (step S6). Specifically, in step S6, the second electronic device 100 may display the video of the first user on the display unit 170 and output the audio of the first user from the audio output unit 160.
Through the operations of steps S1 to S6, the second user (e.g., participant Mg) at, for example, the home RL can perceive the video and/or audio of the first user (e.g., at least one of participants Ma, Mb, Mc, and Md) in, for example, the conference room MR.
The above describes how the first electronic device 1 transmits video and/or audio of the first user to the second electronic device 100 via the third electronic device 300. By the reverse procedure, the second electronic device 100 can transmit voice and/or gaze information of the second user to the first electronic device 1 via the third electronic device 300.
That is, the second electronic device 100 acquires at least one of the voice and gaze information of the second user (e.g., participant Mg) (step S11). Specifically, in step S11, the second electronic device 100 may acquire (or detect) the voice of the second user with the voice input unit 150. Also in step S11, the second electronic device 100 may acquire gaze information of the second user with the gaze information acquisition unit 200. Next, the second electronic device 100 encodes at least one of the voice and gaze information of the second user (step S12).
Next, the second electronic device 100 transmits the encoded voice and/or gaze data to the third electronic device 300 (step S13). Specifically, in step S13, the second electronic device 100 transmits the voice and/or gaze data from the communication unit 130 to the communication unit 330 of the third electronic device 300. Also in step S13, the third electronic device 300 receives, via the communication unit 330, the voice and/or gaze data transmitted from the communication unit 130 of the second electronic device 100.
Next, the third electronic device 300 transmits the encoded voice and/or gaze data received from the communication unit 130 to the first electronic device 1 (step S14). Specifically, in step S14, the third electronic device 300 transmits the voice and/or gaze data from the communication unit 330 to the communication unit 30 of the first electronic device 1. Also in step S14, the first electronic device 1 receives, via the communication unit 30, the voice and/or gaze data transmitted from the communication unit 330 of the third electronic device 300.
Next, the first electronic device 1 decodes the encoded voice and/or gaze data received from the communication unit 330 (step S15).
Next, the first electronic device 1 presents at least one of the voice and gaze of the second user (e.g., participant Mg) to the first user (e.g., at least one of participants Ma, Mb, Mc, and Md) (step S16). Specifically, in step S16, the first electronic device 1 may output the voice of the second user from the audio output unit 60. Also in step S16, the first electronic device 1 may express the gaze of the second user by driving the drive unit 80.
Through the operations of steps S11 to S16, the first user (e.g., at least one of participants Ma, Mb, Mc, and Md) in, for example, the conference room MR can perceive the voice and/or gaze of the second user (e.g., participant Mg) at, for example, the home RL.
The operations from step S1 to step S6 and the operations from step S11 to step S16 may be executed in the reverse order. That is, the operations from step S11 to step S16 may be executed first, followed by the operations from step S1 to step S6. Furthermore, the operations from step S1 to step S6 and the operations from step S11 to step S16 may be executed simultaneously, or may be executed so that they at least partially overlap.
Here, issues that can arise in a remote conference or video conference realized as described above will be described.
For example, control can be envisaged in which the result of eye tracking the gaze of the user of the second electronic device 100 (participant Mg) by the gaze information acquisition unit 200 is constantly reflected in the gaze expressed by the drive unit 80 of the first electronic device 1. Even with such control, however, the acquisition of gaze information by the gaze information acquisition unit 200 of the second electronic device 100 and/or the expression of the gaze by the drive unit 80 of the first electronic device 1 may fail to keep up with the actual gaze movement of participant Mg. Moreover, as described above, if the gaze expressed by the drive unit 80 of the first electronic device 1 is made to constantly follow the actual gaze movement of participant Mg, the gaze expressed by the drive unit 80 of the first electronic device 1 may move too frequently. If it moves too frequently, participants Ma, Mb, Mc, Md, and others who watch the movement of the first electronic device 1 may find it unnatural.
It is also conceivable to have the drive unit 80 of the first electronic device 1 express the gaze only after the actual gaze of participant Mg has remained fixed for a predetermined time, such as three seconds. With such control, however, the drive unit 80 of the first electronic device 1 does not express the gaze until the predetermined time, such as three seconds, has elapsed, making it difficult to improve the real-time responsiveness of the gaze expression.
Furthermore, instead of acquiring gaze information of the user of the second electronic device 100 with the gaze information acquisition unit 200, the first electronic device 1 could express a gaze automatically. For example, the first electronic device 1 could identify participants Ma, Mb, Mc, Md, and so on, and drive the drive unit 80 so that its gaze turns toward an identified participant. With such control, however, the gaze of the user of the second electronic device 100 (participant Mg) is not reflected, and participants Ma, Mb, Mc, Md, and others cannot visually recognize the gaze movement of participant Mg.
For smooth communication in a remote conference, it is desirable that the gaze of a participant joining the conference from a remote location be properly recognized by the other participants. The system according to one embodiment therefore realizes a situation in which the gaze of the user of an electronic device used remotely is properly recognized by the users of an electronic device used locally.
Next, a characteristic operation of the system according to one embodiment will be described. FIG. 7 is a flowchart illustrating a characteristic operation of the system according to one embodiment. The operation shown in FIG. 7 may be executed by at least one of the first electronic device 1, the second electronic device 100, and the third electronic device 300 included in the system according to one embodiment. In the following, the operation shown in FIG. 7 is described as being executed by the control unit 310 of the third electronic device 300. However, in the system according to one embodiment, the operation shown in FIG. 7 may instead be executed by the control unit 10 of the first electronic device 1 or by the control unit 110 of the second electronic device 100.
The operation shown in FIG. 7 may be executed in parallel with the operation shown in FIG. 6. The operation shown in FIG. 7 may also be executed so as to interrupt the operation shown in FIG. 6 while it is in progress. Conversely, the operation shown in FIG. 6 may be executed so as to interrupt the operation shown in FIG. 7 while it is in progress.
The characteristic operation when a remote conference or video conference is held using the system according to one embodiment is described below with reference to FIG. 7. The encoding and decoding of data described with reference to FIG. 6 may use known techniques; descriptions of data encoding and decoding are therefore omitted for FIG. 7. In the following, descriptions of content that is the same as or similar to what has already been described for FIG. 6 may be simplified or omitted as appropriate.
At the time the operation shown in FIG. 7 starts, the first electronic device 1 is assumed to be ready to acquire at least one of video and audio of the first user (e.g., at least one of participants Ma, Mb, Mc, and Md). The first electronic device 1 is also assumed to be ready to transmit the acquired video and/or audio of the first user to the third electronic device 300. Furthermore, the first electronic device 1 is assumed to be ready to receive various types of information transmitted from the third electronic device 300.
Similarly, at the time the operation shown in FIG. 7 starts, the second electronic device 100 is assumed to be ready to acquire at least one of the voice and gaze information of the second user (e.g., participant Mg). The second electronic device 100 is also assumed to be ready to transmit the acquired voice and/or gaze information of the second user to the third electronic device 300. Furthermore, the second electronic device 100 is assumed to be ready to receive various types of information transmitted from the third electronic device 300.
When the operation shown in FIG. 7 starts, the control unit 310 determines whether or not a voice spoken by any of the first users has been acquired by the first electronic device 1 (step S101). Here, the first users may be, for example, at least some of participants Ma, Mb, Mc, and Md, and the speaking first user may be, for example, participant Mc. The first electronic device 1 may acquire the voice of one of the first users (here, participant Mc) when, for example, that user starts a conversation, and transmit it to the third electronic device 300.
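Step S101 presupposes some way of deciding that speech has been acquired. One common minimal approach, sketched below under the assumption that the audio arrives as frames of 16-bit PCM samples, is a short-term energy threshold; the threshold value is an arbitrary illustrative assumption.

    def speech_detected(samples, threshold=500.0):
        """Very simple energy-based voice activity check (cf. step S101).

        samples: iterable of 16-bit PCM amplitudes for one short audio frame.
        Returns True when the root-mean-square energy exceeds the (assumed)
        threshold, i.e. when a first user is likely speaking.
        """
        samples = list(samples)
        if not samples:
            return False
        rms = (sum(s * s for s in samples) / len(samples)) ** 0.5
        return rms > threshold

    print(speech_detected([3, -5, 2, -4]))          # False: near silence
    print(speech_detected([4000, -3900, 4100]))     # True: strong signal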
Next, the control unit 310 identifies the speaker who is currently speaking among the first users, based on at least one of the video and audio of the first users acquired by the first electronic device 1 (step S102). That is, in step S102, the control unit 310 may identify the speaker (e.g., one or more speakers) among the possibly multiple first users. Here, the control unit 310 may identify participant Mc as the speaker based on the video of the multiple participants, including participant Mc, and the audio of participant Mc acquired by the first electronic device 1.
In step S102, the control unit 310 may use various techniques to identify the speaker among the first users based on at least one of the video and audio of the first users. For example, the control unit 310 may identify the speaker by performing person detection on the video (images) of the first users and estimating the direction of the sound source from the audio of the first users. In this case, the direction of the sound source may be estimated over the interval in which the audio of the first users is detected. The control unit 310 may also perform person detection on the video (images) of the first users and determine, from the video (images) of a first user's mouth, whether that first user is speaking. In this case, the control unit 310 may also perform processing such as lip reading on the video (images) of the first user's mouth as appropriate. The control unit 310 may also perform person detection on the video (images) of the first users and identify the speaker by detecting the body movement (behavior) of the first users. As described above, the control unit 310 may identify the speaker among the first users by any processing based on at least one of the video and audio of the first users, for example as in the sketch below.
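As one concrete (non-limiting) way of combining these cues, the following sketch assumes that person detection yields each first user's horizontal bounding-box extent in the image, and that the sound-source direction has already been converted to an image x-coordinate using the (assumed) camera and microphone geometry; it then picks the detected person who best matches the sound source.

    def identify_speaker(person_boxes, source_x):
        """Pick the detected person whose horizontal extent best matches the
        estimated sound-source position (cf. step S102).

        person_boxes: dict mapping a person id to (x_left, x_right) in pixels,
                      e.g. from a person detector (assumed).
        source_x: sound-source azimuth converted to an image x-coordinate
                  (the conversion depends on the camera/mic geometry, assumed).
        Returns the id of the best-matching person, or None.
        """
        best_id, best_dist = None, float("inf")
        for pid, (x_left, x_right) in person_boxes.items():
            if x_left <= source_x <= x_right:
                return pid  # source falls inside this person's box
            center = (x_left + x_right) / 2.0
            dist = abs(center - source_x)
            if dist < best_dist:
                best_id, best_dist = pid, dist
        return best_id

    boxes = {"Ma": (0, 150), "Mb": (200, 350), "Mc": (400, 550), "Md": (600, 750)}
    print(identify_speaker(boxes, 470))  # 'Mc'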
Next, the control unit 310 identifies the position of the speaker in the video of the first users acquired by the first electronic device 1 (step S103). In step S102, the control unit 310 identified the speaker among the possibly multiple first users. In step S103, the control unit 310 then identifies the position of the speaker (here, participant Mc) in the image containing the possibly multiple first users. Typically, the control unit 310 may identify the coordinates of the position of the speaker (here, participant Mc) in the video of the first users.
The processing of steps S102 and S103 may be executed in the control unit 310 by, for example, the identification unit 312.
Next, the control unit 310 determines whether or not gaze information of the second user (participant Mg) has been acquired by the second electronic device 100 (step S104). The processing performed when gaze information of the second user is not acquired in step S104 is described later.
When gaze information of the second user is acquired in step S104, the control unit 310 estimates where in the video of the first users the gaze of the second user is directed (step S105). That is, in step S105, the control unit 310 estimates (acquires) the position in the video of the first users toward which the gaze of the second user is directed, based on the gaze information of the second user acquired by the second electronic device 100.
To execute the processing of step S105, positions (coordinates) in the video of the first users and positions toward which the gaze of the second user is directed may be associated with each other. For this association, for example, a position (two-dimensional coordinates) in the video of the first users may be converted into a position (three-dimensional coordinates) in the real space toward which the gaze of the second user is directed. Alternatively, for example, a position (three-dimensional coordinates) in the real space toward which the gaze of the second user is directed may be converted into a position (two-dimensional coordinates) in the video of the first users.
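In the simplest case, where the first users' video is shown in a known rectangle on the second electronic device's display, this association can be a linear rescaling between display coordinates and video-frame coordinates. The sketch below assumes exactly that and ignores lens distortion and the full 2D-to-3D conversion mentioned above; all parameter names are illustrative assumptions.

    def display_to_video_coords(gaze_xy, video_rect, video_size):
        """Map a gaze point on the display to coordinates in the video frame.

        gaze_xy:    (x, y) gaze point in display pixels (from eye tracking).
        video_rect: (left, top, width, height) rectangle where the first
                    users' video is drawn on the display (assumed known).
        video_size: (width, height) of the video frame in pixels.
        Returns (x, y) in video-frame pixels, or None if the gaze is outside
        the video rectangle.
        """
        gx, gy = gaze_xy
        left, top, w, h = video_rect
        if not (left <= gx < left + w and top <= gy < top + h):
            return None  # second user is not looking at the video at all
        vw, vh = video_size
        return ((gx - left) * vw / w, (gy - top) * vh / h)

    # Gaze at the center of a 960x540 window showing a 1920x1080 frame:
    print(display_to_video_coords((580, 320), (100, 50, 960, 540),
                                  (1920, 1080)))  # (960.0, 540.0)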
The processing of step S105 may be executed in the control unit 310 by, for example, the estimation unit 314.
Next, the control unit 310 determines whether or not the position identified in step S103 and the position estimated in step S105 are within a predetermined distance of each other (step S106). That is, the control unit 310 determines whether or not the position of the speaker in the video of the first users and the position in that video toward which the gaze of the second user is directed are within a predetermined distance of each other. When there are multiple speakers, the control unit 310 may determine, for each speaker, whether or not the position of that speaker in the video of the first users and the position in that video toward which the gaze of the second user is directed are within the predetermined distance of each other.
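The distance test of step S106 can be as simple as a Euclidean comparison in video-frame coordinates, as in this minimal sketch; the threshold of 80 pixels is an arbitrary illustrative assumption.

    import math

    def gaze_on_speaker(speaker_xy, gaze_xy, max_dist=80.0):
        """Return True when the second user's estimated gaze target lies
        within the (assumed) threshold distance of the speaker's position,
        both given in video-frame pixels (cf. step S106)."""
        return math.dist(speaker_xy, gaze_xy) <= max_dist

    print(gaze_on_speaker((475, 300), (470, 310)))   # True: looking at speaker
    print(gaze_on_speaker((475, 300), (100, 310)))   # False: looking elsewhere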
An affirmative determination in step S106 (YES in step S106) means that the position toward which the gaze of the second user is directed is relatively close to the position of the speaker. That is, in this case, it may be determined that the second user is directing his/her gaze at the speaker. Accordingly, the control unit 310 may control the first electronic device 1 so that it indicates that the gaze of the second user is directed at the speaker (step S107). That is, in this case, the control unit 310 may, by controlling the first electronic device 1, drive the drive unit 80 so that the gaze of the first electronic device 1 turns toward (directly faces) the speaker. When there are multiple speakers within the predetermined distance of the position in the video of the first users toward which the gaze of the second user is directed, the control unit 310 may control the first electronic device 1 so that it indicates that the gaze of the second user is directed at the speaker closest to that position. In this way, in the system according to one embodiment, the control unit 310 may control the first electronic device 1 so that it indicates the position in the video of the first users toward which the gaze of the second user is most directed.
A negative determination in step S106 (NO in step S106), on the other hand, means that the position toward which the gaze of the second user is directed is relatively far from the position of the speaker. That is, in this case, it may be determined that the second user is not directing his/her gaze at the speaker. Accordingly, the control unit 310 may control the first electronic device 1 so that it indicates that the gaze of the second user is not directed at the speaker (step S108). That is, in this case, the control unit 310 may, by controlling the first electronic device 1, drive the drive unit 80 so that the gaze of the first electronic device 1 does not turn toward (does not directly face) the speaker.
Further, when gaze information of the second user is not acquired in step S104, the gaze of the second user cannot be reflected in the first electronic device 1. Accordingly, in this case as well, the control unit 310 may perform the operation of step S108.
The operation described above can be illustrated concretely with the remote conference shown in FIG. 1 as follows. For example, when participant Mc in the conference room MR speaks, participant Mg at the home RL can see, via the display unit 170 of the second electronic device 100, that participant Mc is speaking. Suppose that participant Mg at the home RL then directs his/her gaze at participant Mc, who is speaking, on the display unit 170 of the second electronic device 100. In this case, in the conference room MR, the gaze of the first electronic device 1 is directed at participant Mc. Participant Mc can therefore recognize that participant Mg at the home RL is directing his/her gaze at participant Mc. Other participants in the conference room MR, such as participant Ma, participant Mb, and/or participant Md, can also recognize that participant Mg at the home RL is directing his/her gaze at participant Mc.
As described above, the system according to one embodiment can control the direction of the gaze of the first electronic device 1 using the speaker's speech as a trigger. The system according to one embodiment can therefore reflect the gaze of a participant joining a remote conference from home or elsewhere in the first electronic device 1 while controlling that gaze to an extent that does not feel unnatural to the other participants. The system according to one embodiment can also control the first electronic device 1 so that it directs its gaze immediately in response to the speaker's speech. Accordingly, the system according to one embodiment can facilitate communication between multiple locations.
In step S103 of FIG. 7, the control unit 310 (identification unit 312) may identify the position in real space of the speaker among the first users (e.g., participant Mc), using the position of the first electronic device 1 in real space (e.g., the conference room MR) as a reference. In this way, the position of the speaker among the first users can be identified more accurately. Then, when the determination in step S106 is affirmative (YES in step S106), the control unit 310 may control the gaze of the second user (participant Mg) expressed by the first electronic device 1 so that it turns toward the position of the speaker (participant Mc) in real space.
Also, in step S103 of FIG. 7, when identifying the position of the speaker among the first users, the control unit 310 (identification unit 312) may treat the position of each speaker candidate in the video of the first users as a region having a predetermined area. That is, the identification unit 312 may identify the speaker among the first users depending on whether or not the speaker position estimated from the audio of the first users falls within the respective regions set for the first users based on their respective positions in the video of the first users, for example as in the sketch below.
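This region-based identification can be sketched as a point-in-rectangle test, with each first user's region obtained by padding his or her detected bounding box; the padding amount, like the other names below, is an illustrative assumption.

    def speaker_by_region(user_boxes, est_xy, pad=40):
        """Identify the speaking first user by testing whether the position
        estimated from audio falls inside a padded region around each user.

        user_boxes: dict of user id -> (x, y, w, h) detection box in the video.
        est_xy:     (x, y) speaker position estimated from the audio (assumed
                    already projected into video coordinates).
        pad:        padding in pixels that turns a tight box into the 'region
                    having a predetermined area' mentioned in the text.
        """
        ex, ey = est_xy
        for uid, (x, y, w, h) in user_boxes.items():
            if x - pad <= ex <= x + w + pad and y - pad <= ey <= y + h + pad:
                return uid
        return None

    boxes = {"Mc": (400, 200, 120, 260), "Md": (600, 210, 120, 250)}
    print(speaker_by_region(boxes, (395, 230)))  # 'Mc' (inside padded region)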
Next, another characteristic operation of the system according to one embodiment will be described. FIG. 8 is a flowchart illustrating this characteristic operation of the system according to one embodiment. The operation shown in FIG. 8 partially modifies the operation shown in FIG. 7. Accordingly, descriptions that are the same as or similar to those already given for FIG. 7 are omitted as appropriate.
When the operation shown in FIG. 8 starts, the control unit 310 identifies each of the first users based on the video of the first users acquired by the first electronic device 1 (step S201). That is, in step S201, the control unit 310 may identify each of the possibly multiple first users. Here, the control unit 310 may identify, for example, participant Ma, participant Mb, participant Mc, and participant Md, based on the video of the multiple participants, including participant Mc, acquired by the first electronic device 1.
In step S201, the control unit 310 may use various techniques to identify each of the first users based on the video of the first users. For example, the control unit 310 may perform person detection on the video (images) of the first users and thereby identify each of the first users. The control unit 310 may identify each of the first users by any processing based on the video of the first users. Furthermore, when the audio of the first users is available, the control unit 310 may identify each of the first users by additionally taking into account an estimate of the sound-source direction derived from that audio. The control unit 310 may identify each of the first users by any processing based on at least one of the video and audio of the first users.
Next, the control unit 310 identifies the position of each first user in the video of the first users acquired by the first electronic device 1 (step S202). In step S201, the control unit 310 identified each of the possibly multiple first users. In step S202, the control unit 310 then identifies the position of each first user in the image containing the possibly multiple first users. Typically, the control unit 310 may identify the coordinates of the position of each first user (here, for example, participant Ma, participant Mb, participant Mc, and participant Md) in the video of the first users.
The processing of steps S201 and S202 may be executed in the control unit 310 by, for example, the identification unit 312.
Next, the control unit 310 determines whether or not gaze information of the second user (participant Mg) has been acquired by the second electronic device 100 (step S104). The processing performed in step S104 may be the same as in step S104 shown in FIG. 7. The processing performed when gaze information of the second user is not acquired is described later.
When gaze information of the second user is acquired in step S104, the control unit 310 estimates where in the video of the first users the gaze of the second user is directed (step S105). That is, in step S105, the control unit 310 estimates (acquires) the position in the video of the first users toward which the gaze of the second user is directed, based on the gaze information of the second user acquired by the second electronic device 100. The processing performed in step S105 may be the same as in step S105 shown in FIG. 7.
Next, the control unit 310 determines whether or not any of the positions identified in step S202 and the position estimated in step S105 are within a predetermined distance of each other (step S203). That is, the control unit 310 determines whether or not the position of any of the first users in the video of the first users and the position in that video toward which the gaze of the second user is directed are within a predetermined distance of each other, for example as in the sketch below.
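With several first users as candidates, the check of step S203 naturally becomes a nearest-neighbor search with a distance cutoff. The sketch below makes that explicit; the threshold is again an arbitrary illustrative assumption.

    import math

    def gazed_user(user_positions, gaze_xy, max_dist=80.0):
        """Return the id of the first user closest to the gaze target,
        provided the distance is within the (assumed) threshold; otherwise
        None (cf. step S203).

        user_positions: dict of user id -> (x, y) position in the video frame.
        gaze_xy:        (x, y) estimated gaze target in the same frame.
        """
        best = min(user_positions.items(),
                   key=lambda item: math.dist(item[1], gaze_xy),
                   default=None)
        if best is None or math.dist(best[1], gaze_xy) > max_dist:
            return None
        return best[0]

    users = {"Ma": (75, 300), "Mb": (275, 300), "Mc": (475, 300), "Md": (675, 300)}
    print(gazed_user(users, (460, 320)))  # 'Mc'
    print(gazed_user(users, (900, 600)))  # None: gaze is far from everyone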
An affirmative determination in step S203 (YES in step S203) means that the position toward which the gaze of the second user is directed is relatively close to the position of one of the first users. That is, in this case, it may be determined that the second user is directing his/her gaze at one of the first users.
When the determination in step S203 is affirmative (YES in step S203), the control unit 310 determines whether or not the voice of the second user has been acquired by the second electronic device 100 (step S204). The second electronic device 100 may acquire the voice of the second user when, for example, the second user starts a conversation, and transmit it to the third electronic device 300.
When the voice of the second user is acquired in step S204, the control unit 310 may, based on the voice of the second user, control the first electronic device 1 so that it indicates that the gaze of the second user is directed at one of the first users (step S205). That is, in this case, the control unit 310 may, by controlling the first electronic device 1, drive the drive unit 80 so that the gaze of the first electronic device 1 turns toward (directly faces) that first user.
A negative determination in step S203 (NO in step S203), on the other hand, means that the position toward which the gaze of the second user is directed is relatively far from the positions of all of the first users. That is, in this case, it may be determined that the second user is not directing his/her gaze at any of the first users. Accordingly, in this case, the control unit 310 may control the first electronic device 1 so that it indicates that the gaze of the second user is not directed at any of the first users (step S206). That is, in this case, the control unit 310 may, by controlling the first electronic device 1, drive the drive unit 80 so that the gaze of the first electronic device 1 does not turn toward (does not directly face) any of the first users.
Further, when gaze information of the second user is not acquired in step S104, the gaze of the second user cannot be reflected in the first electronic device 1. Accordingly, in this case as well, the control unit 310 may perform the operation of step S206. Likewise, when the voice of the second user is not acquired in step S204, the gaze of the second user need not be reflected in the first electronic device 1, and in this case as well, the control unit 310 may perform the operation of step S206.
The operation described above can be illustrated concretely with the remote conference shown in FIG. 1 as follows. For example, when a remote conference starts in the conference room MR, participant Mg at the home RL can see participant Ma, participant Mb, participant Mc, participant Md, and so on via the display unit 170 of the second electronic device 100. Suppose that participant Mg at the home RL directs his/her gaze at participant Mc on the display unit 170 of the second electronic device 100 and at the same time starts speaking to participant Mc. In this case, in the conference room MR, the gaze of the first electronic device 1 is directed at participant Mc in response to the speech of participant Mg at the home RL. Participant Mc can therefore recognize that participant Mg at the home RL is speaking with his/her gaze directed at participant Mc. Other participants in the conference room MR, such as participant Ma, participant Mb, and/or participant Md, can also recognize that participant Mg at the home RL is speaking with his/her gaze directed at participant Mc.
Thus, the system according to one embodiment may include, for example, the first electronic device 1, the second electronic device 100, and the control unit 310. The first electronic device 1 acquires video of at least one first user. The second electronic device 100 outputs the video of the first user to the second user and acquires gaze information of the second user. The control unit 310 controls the first electronic device 1 so that it indicates the position in the video of the first user toward which the gaze of the second user is directed. As described above, the system according to one embodiment can control the direction of the gaze of the first electronic device 1 using a speaker's speech as a trigger. The system according to one embodiment can therefore reflect the gaze of a participant joining a remote conference from home or elsewhere in the first electronic device 1 while controlling that gaze to an extent that does not feel unnatural to the other participants. The system according to one embodiment can also control the first electronic device 1 so that it directs its gaze immediately in response to the speaker's speech. Accordingly, the system according to one embodiment can facilitate communication between multiple locations.
In step S202 of FIG. 8, the control unit 310 (identification unit 312) may identify the position of each first user in real space, using the position of the first electronic device 1 in real space (e.g., the conference room MR) as a reference. In this way, the position of each first user can be identified more accurately. Then, when the determination in step S203 is affirmative (YES in step S203), the control unit 310 may control the gaze of the second user (participant Mg) expressed by the first electronic device 1 so that it turns toward the position of one of the first users in real space.
Although embodiments according to the present disclosure have been described based on the drawings and examples, it should be noted that a person skilled in the art could easily make various variations or modifications based on the present disclosure. It should therefore be noted that such variations or modifications fall within the scope of the present disclosure. For example, the functions included in each component, each step, and the like can be rearranged so as not to cause logical inconsistencies, and multiple components, steps, and the like can be combined into one or divided. Although the embodiments according to the present disclosure have been described mainly in terms of devices, the embodiments according to the present disclosure can also be realized as a method including the steps executed by the components of a device. The embodiments according to the present disclosure can also be realized as a method or program executed by a processor or the like provided in a device, or as a storage medium or recording medium on which such a program is recorded. It should be understood that these are also included within the scope of the present disclosure.
For example, the control unit 310 may execute the processing of step S205 without executing the processing of step S204 shown in FIG. 8. In this case, even when the second user is not speaking, the control unit 310 can, by controlling the first electronic device 1, drive the drive unit 80 so that the gaze of the first electronic device 1 turns toward (directly faces) one of the first users.
In the embodiments described above, the control unit 310 makes the determinations in step S106 shown in FIG. 7 and step S203 shown in FIG. 8 using a predetermined distance. However, in one embodiment, the control unit 310 may instead identify, as the position toward which the gaze of the second user is directed, a speaker or a position (coordinates) in the video of the first users that satisfies some other condition.
 For example, in order for the control unit 310 to execute the process of step S105, it may perform further control in addition to associating positions (coordinates) in the video of the first user with the positions toward which the second user's line of sight is directed. In this case, the control unit 310 may, for example, further associate the length of time, or the number of times, that the second user's line of sight was directed at each position (coordinates) in the video of the first user. Then, instead of the determination using a predetermined distance in step S106 or step S203, the control unit 310 may execute the following process: it may determine the position (coordinates) in the video of the first user at which the second user's line of sight was directed for the longest time, or the greatest number of times, during a predetermined period preceding the time at which the process is executed. In this case, the control unit 310 may execute the process of step S107 or step S205 using the position determined in this way as the position used to control the first electronic device.
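 A minimal sketch of this dwell-time bookkeeping follows; the class and its granularity (quantizing coordinates into coarse grid cells) are illustrative assumptions, not part of the disclosure.

```python
import time
from collections import defaultdict

class GazeHistory:
    """Record where the second user's gaze landed in the video and report
    the position watched longest within a recent time window."""

    def __init__(self, window_s: float = 10.0, cell_px: int = 40):
        self.window_s = window_s   # length of the past period considered
        self.cell_px = cell_px     # grid cell size used to group coordinates
        self.samples = []          # list of (timestamp, cell, dwell_s)

    def add(self, x: float, y: float, dwell_s: float) -> None:
        cell = (int(x) // self.cell_px, int(y) // self.cell_px)
        self.samples.append((time.monotonic(), cell, dwell_s))

    def most_watched(self):
        """Return the centre (x, y) of the most-watched cell, or None."""
        cutoff = time.monotonic() - self.window_s
        totals = defaultdict(float)
        for t, cell, dwell_s in self.samples:
            if t >= cutoff:
                totals[cell] += dwell_s
        if not totals:
            return None
        cx, cy = max(totals, key=totals.get)
        return ((cx + 0.5) * self.cell_px, (cy + 0.5) * self.cell_px)
```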
 For example, the control unit 310 may execute the following process in order to execute the process of step S105. That is, in addition to associating positions (coordinates) in the video of the first user with the positions toward which the second user's line of sight is directed, the control unit 310 may further associate an evaluation value corresponding to the length of time, or the number of times, that the second user's line of sight was directed at each position (coordinates) in the video of the first user. Then, instead of the determination using a predetermined distance in step S106 or step S203, the control unit 310 may execute the following process: it may determine the position (coordinates) in the video of the first user that has the highest evaluation value among the evaluation values associated with positions (coordinates) in the video of the first user during a predetermined period preceding the time at which the process is executed. In this case, the control unit 310 may execute the process of step S107 or step S205 using this position as the position used to control the first electronic device. The evaluation value may be set highest at the position (coordinates) in the video of the first user corresponding to the position of the second user's line of sight, with the coordinates surrounding that position assigned evaluation values that gradually decrease with distance from it. Alternatively, rather than associating an evaluation value with each coordinate in the video of the first user, the video of the first user may be divided into multiple regions, with an evaluation value associated with each divided region.
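 One way to realize an evaluation value that peaks at the gazed position and decreases with distance is a score map over the video, sketched below. The Gaussian fall-off and the parameter values are assumptions made for illustration; the passage above leaves the exact decay, and the choice between per-coordinate and per-region scoring, open.

```python
import numpy as np

def gaze_score_map(frame_hw, fixations, sigma_px=60.0):
    """Build an evaluation-value map over a frame of shape (h, w): each
    fixation (x, y, dwell_s) deposits a peak that is highest at the gazed
    position and decays gradually with distance from it."""
    h, w = frame_hw
    ys, xs = np.mgrid[0:h, 0:w]
    score = np.zeros((h, w))
    for x, y, dwell_s in fixations:
        d2 = (xs - x) ** 2 + (ys - y) ** 2
        score += dwell_s * np.exp(-d2 / (2 * sigma_px ** 2))
    return score

# Region-based scoring, as also described above, can be approximated by
# downsampling this map (e.g., averaging over blocks) before comparison.
```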
 Furthermore, for example, in order for the control unit 310 to execute the process of step S105, an evaluation value corresponding to the position of the first user, and/or to an action (movement) of the first user that attracts the second user's attention, may be added to the evaluation values described above. For example, evaluation values may be set in advance for each of: the position of the first user or of a speaker, a speaker's speech volume, a physical movement of the first user, the direction of the first user's eyes, and/or a movement of the first user's face. The evaluation value based on the first user's position and/or behavior may then be added to the evaluation value at the corresponding position (coordinates) in the video of the first user. Then, instead of the determination using a predetermined distance in step S106 or step S203, the control unit 310 may execute the following process: it may determine the position (coordinates) in the video of the first user that has the highest evaluation value among the evaluation values associated with positions (coordinates) in the video of the first user during a predetermined period preceding the time at which the process is executed. In this case, the control unit 310 may execute the process of step S107 or step S205 using this position as the position used to control the first electronic device. The evaluation value may be set highest at the position (coordinates) in the video of the first user corresponding to the first user's position or behavior, with the coordinates surrounding that position assigned evaluation values that gradually decrease with distance from it. In this way, the control unit 310 may control the first electronic device 1 so that it indicates the position of the second user's line of sight based on various conditions. For example, the control unit 310 may control the first electronic device 1 so that it indicates the position of the second user's line of sight based on at least one of the position of the first user in the video of the first user and an action of the first user, together with the position in the video of the first user toward which the second user's line of sight is most directed.
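 Continuing the sketch above, behavior-based bonuses can be deposited into the same map before taking the maximum. The event kinds and weights below are invented for illustration; the disclosure says only that such values may be preset per position or per action.

```python
import numpy as np

# Hypothetical preset weights per attention-drawing action.
EVENT_WEIGHTS = {"speech": 3.0, "body_movement": 2.0, "face_movement": 1.0}

def add_behavior_bonuses(score, events, sigma_px=60.0):
    """events: iterable of (x, y, kind) in video coordinates. Adds a
    distance-decaying bonus around each event position."""
    h, w = score.shape
    ys, xs = np.mgrid[0:h, 0:w]
    for x, y, kind in events:
        d2 = (xs - x) ** 2 + (ys - y) ** 2
        score += EVENT_WEIGHTS.get(kind, 1.0) * np.exp(-d2 / (2 * sigma_px ** 2))
    return score

def pick_target(score):
    """Return the (x, y) position with the highest evaluation value, i.e.
    the position that would be used in step S107 or step S205."""
    y, x = np.unravel_index(np.argmax(score), score.shape)
    return float(x), float(y)
```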
 The embodiments described above are not limited to implementation as a system. For example, the embodiments described above may be implemented as a method for controlling a system, or as a program executed in a system. The embodiments described above may also be implemented as a device such as at least one of the first electronic device 1, the second electronic device 100, and the third electronic device 300, or as a method for controlling such a device. Furthermore, the embodiments described above may be implemented as a program executed by a device such as at least one of the first electronic device 1, the second electronic device 100, and the third electronic device 300, or as a storage medium or recording medium on which the program is recorded.
 For example, the embodiments described above may be implemented as the first electronic device 1. In this case, the first electronic device 1 may be configured to be capable of communicating with the second electronic device 100. The first electronic device 1 may include an acquisition unit, an identification unit, an estimation unit, and a control unit. The acquisition unit acquires video of at least one first user. The identification unit identifies each first user based on the video of the first user acquired by the acquisition unit, and identifies the position of each first user in that video. The estimation unit estimates the position, in the video of the first user, toward which the second user's line of sight is directed, based on the information on the second user's line of sight acquired by the second electronic device 100. The control unit determines whether the position of one of the first users in the video of the first user and the position in that video toward which the second user's line of sight is directed are within a predetermined distance of each other. Depending on the determination result, the control unit controls the first electronic device 1 so that it indicates, based on the second user's voice acquired by the second electronic device 100, that the second user's line of sight is directed toward one of the first users.
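 A skeletal rendering of this decision logic follows. The function name, the pixel threshold, and the nearest-user search are assumptions made for illustration; the passage above specifies only the inputs (user positions, estimated gaze position, the second user's voice) and the thresholded outcome.

```python
import math

def decide_gaze_target(user_positions, gaze_xy, second_user_speaking,
                       threshold_px=80.0):
    """user_positions: {user_id: (x, y)} in the video of the first users;
    gaze_xy: estimated position toward which the second user's gaze is
    directed, or None if no line-of-sight information is available.
    Returns the user the device should face, or None to indicate that the
    gaze is not directed at any first user."""
    if gaze_xy is None:
        return None  # no line-of-sight information available
    nearest_id, nearest_d = None, float("inf")
    for uid, (ux, uy) in user_positions.items():
        d = math.hypot(ux - gaze_xy[0], uy - gaze_xy[1])
        if d < nearest_d:
            nearest_id, nearest_d = uid, d
    if nearest_d <= threshold_px and second_user_speaking:
        return nearest_id  # direct the device's gaze toward this user
    return None
```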
LIST OF SYMBOLS

1 First electronic device
10 Control unit
12 Identification unit
14 Estimation unit
20 Memory unit
30 Communication unit
40 Imaging unit
50 Audio input unit
60 Audio output unit
70 Display unit
80 Drive unit
100 Second electronic device
110 Control unit
112 Identification unit
114 Estimation unit
120 Memory unit
130 Communication unit
140 Imaging unit
150 Audio input unit
160 Audio output unit
170 Display unit
200 Line-of-sight information acquisition unit
300 Third electronic device
310 Control unit
312 Identification unit
314 Estimation unit
320 Memory unit
330 Communication unit
N Network

Claims (18)

  1.  A system comprising:
     a first electronic device that acquires video of at least one first user;
     a second electronic device that outputs the video of the first user to a second user and acquires information on a line of sight of the second user; and
     a control unit that controls the first electronic device so that the first electronic device indicates a position, in the video of the first user, toward which the line of sight of the second user is directed.
  2.  The system according to claim 1, wherein
     the first electronic device acquires video and audio of the at least one first user,
     the second electronic device outputs the video and audio of the first user acquired from the first electronic device to the second user and acquires the information on the line of sight of the second user, and
     the control unit:
      identifies a speaker who is speaking among the first users based on at least one of the video and audio of the first user, and identifies a position of the speaker in the video of the first user;
      acquires, based on the information on the line of sight of the second user, a position in the video of the first user toward which the line of sight of the second user is directed; and
      controls the first electronic device so that it indicates that the line of sight of the second user is directed toward the speaker, based on the position of the speaker in the video of the first user and the position in the video of the first user toward which the line of sight of the second user is directed.
  3.  The system according to claim 2, wherein the control unit:
      identifies a position of the speaker in real space with reference to a position of the first electronic device in real space; and
      when the position of the speaker in the video of the first user and the position in the video of the first user toward which the line of sight of the second user is directed are within a predetermined distance, controls the line of sight of the second user expressed by the first electronic device so that it is directed toward the position of the speaker in real space.
  4.  The system according to claim 2, wherein the control unit identifies the speaker among the first users according to whether a position of a speaker identified based on the audio of the first user is included in a region of each first user that is set based on the position of that first user in the video of the first user.
  5.  The system according to claim 2, wherein the control unit controls the first electronic device so that it indicates that the line of sight of the second user is not directed toward the speaker when the position of the speaker in the video of the first user and the position in the video of the first user toward which the line of sight of the second user is directed are not within a predetermined distance.
  6.  The system according to claim 2, wherein the control unit controls the first electronic device so that it indicates that the line of sight of the second user is not directed toward the speaker when the information on the line of sight of the second user cannot be obtained.
  7.  The system according to claim 1, wherein
     the second electronic device receives the video of the first user acquired from the first electronic device, outputs it to the second user, and acquires the information on the line of sight of the second user, and
     the control unit:
      identifies each first user based on the video of the first user, and identifies a position of each first user in the video of the first user acquired by the first electronic device;
      acquires, based on the information on the line of sight of the second user, a position in the video of the first user toward which the line of sight of the second user is directed; and
      controls the line of sight indicated by the first electronic device based on a position of one of the first users in the video of the first user and the position in the video of the first user toward which the line of sight of the second user is directed.
  8.  The system according to claim 7, wherein
     the second electronic device further acquires a voice of the second user, and
     the control unit controls the first electronic device so that it indicates, based on the voice of the second user, that the line of sight of the second user is directed toward one of the first users when a position of one of the first users in the video of the first user and the position in the video of the first user toward which the line of sight of the second user is directed are within a predetermined distance.
  9.  The system according to claim 8, wherein the control unit:
      identifies a position of each first user in real space with reference to a position of the first electronic device in real space; and
      when a position of one of the first users in the video of the first user and the position in the video of the first user toward which the line of sight of the second user is directed are within a predetermined distance, controls the line of sight of the second user expressed by the first electronic device so that it is directed toward the position of that first user in real space.
  10.  The system according to claim 7, wherein the control unit controls the first electronic device so that it indicates that the line of sight of the second user is not directed toward any of the first users when a position of one of the first users in the video of the first user and the position in the video of the first user toward which the line of sight of the second user is directed are not within a predetermined distance.
  11.  The system according to claim 7, wherein the control unit controls the first electronic device so that it indicates the position in the video of the first user toward which the line of sight of the second user is most directed.
  12.  The system according to claim 7, wherein the control unit controls the first electronic device so that it indicates the position of the line of sight of the second user based on at least one of the position of the first user in the video of the first user and an action of that first user, and on the position in the video of the first user toward which the line of sight of the second user is most directed.
  13.  The system according to claim 7, wherein the control unit controls the first electronic device so that it indicates that the line of sight of the second user is not directed toward any of the first users when the information on the line of sight of the second user cannot be obtained.
  14.  The system according to any one of claims 1 to 13, wherein the first electronic device comprises a display unit that expresses the line of sight of the second user and/or the direction of that line of sight by means of an image.
  15.  The system according to any one of claims 1 to 13, wherein the first electronic device comprises a drive unit that expresses the line of sight of the second user and/or the direction of that line of sight by driving a mechanical structure.
  16.  An electronic device configured to be capable of communicating with another electronic device, the electronic device comprising:
      an acquisition unit that acquires video of at least one first user; and
      a control unit that controls the electronic device so that it indicates a position, in the video of the first user, toward which a line of sight of a second user using the other electronic device is directed.
  17.  A method for controlling a system, comprising:
      acquiring, by a first electronic device, video of at least one first user;
      outputting, by a second electronic device, the video of the first user to a second user;
      acquiring, by the second electronic device, information on a line of sight of the second user; and
      controlling the first electronic device so that it indicates a position, in the video of the first user, toward which the line of sight of the second user is directed.
  18.  A program causing a computer to execute:
      acquiring, by a first electronic device, video of at least one first user;
      outputting, by a second electronic device, the video of the first user to a second user;
      acquiring, by the second electronic device, information on a line of sight of the second user; and
      controlling the first electronic device so that it indicates a position, in the video of the first user, toward which the line of sight of the second user is directed.
PCT/JP2023/035965 2022-10-07 2023-10-02 System, electronic device, method for controlling system, and program WO2024075707A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2022-162777 2022-10-07
JP2022162777 2022-10-07

Publications (1)

Publication Number Publication Date
WO2024075707A1

Family

ID=90608214

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/035965 WO2024075707A1 (en) 2022-10-07 2023-10-02 System, electronic device, method for controlling system, and program

Country Status (1)

Country Link
WO (1) WO2024075707A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010206307A (en) * 2009-02-27 2010-09-16 Toshiba Corp Information processor, information processing method, information processing program, and network conference system
JP2011152593A (en) * 2010-01-26 2011-08-11 Nec Corp Robot operation device
JP2015220534A (en) * 2014-05-15 2015-12-07 株式会社リコー Auxiliary apparatus, auxiliary system and auxiliary method for communication, and program
JP2016181856A (en) * 2015-03-25 2016-10-13 株式会社アルブレイン Conference system
JP2017201742A (en) * 2016-05-02 2017-11-09 株式会社ソニー・インタラクティブエンタテインメント Processing device, and image determining method


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23874834

Country of ref document: EP

Kind code of ref document: A1