WO2024070550A1 - System, electronic device, system control method, and program - Google Patents

System, electronic device, system control method, and program

Info

Publication number
WO2024070550A1
Authority
WO
WIPO (PCT)
Prior art keywords
electronic device
user
unit
video
response timing
Prior art date
Application number
PCT/JP2023/032576
Other languages
French (fr)
Japanese (ja)
Inventor
石田 華子 (Hanako Ishida)
瀬戸 隆行 (Takayuki Seto)
荒川 (Arakawa)
Original Assignee
京セラ株式会社 (Kyocera Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 京セラ株式会社 (Kyocera Corporation)
Publication of WO2024070550A1 publication Critical patent/WO2024070550A1/en

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M 3/00: Automatic or semi-automatic exchanges
    • H04M 3/42: Systems providing special services or facilities to subscribers
    • H04M 3/56: Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/442: Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00: Television systems
    • H04N 7/14: Systems for two-way working

Definitions

  • This disclosure relates to a system, an electronic device, a method for controlling the system, and a program.
  • remote conferences such as web conferences or video conferences are held using electronic devices or systems including electronic devices.
  • audio and/or video of the conference in the office is acquired by, for example, an electronic device installed in the office, and transmitted to, for example, an electronic device installed in the participant's home.
  • audio and/or video at the participant's home is acquired by, for example, an electronic device installed in the participant's home, and transmitted to, for example, an electronic device installed in the office.
  • Such electronic devices allow a conference to be held without all participants gathering in the same place.
  • Patent Document 1 discloses a device that displays a graphic that represents the output range of directional sound output by a speaker, superimposed on an image captured by a camera. This device makes it possible to visually grasp the output range of directional sound.
  • Patent Document 2 discloses a system in which, when a speaker and a listener in separate locations are engaged in a conversation, a listener robot is attached to the speaker's side, and a speaker robot is attached to the listener's side.
  • the system includes: a first electronic device that acquires at least one of video and audio of a first user; a second electronic device configured to be able to communicate with the first electronic device and configured to output the video and/or audio of the first user acquired by the first electronic device to a second user who responds to an utterance of the first user; an estimation unit that estimates, based on at least one of the video and audio of the first user, a response timing at which the second user responds to the utterance of the first user; and a control unit that causes the second electronic device to acquire information indicating the response timing estimated by the estimation unit.
  • the electronic device is configured to be able to communicate with another electronic device and includes: an acquisition unit that acquires at least one of video and audio of a user of the other electronic device; an output unit that outputs the acquired video and/or audio of the user of the other electronic device to a user who responds to an utterance of the user of the other electronic device; an estimation unit that estimates, based on at least one of the video and audio of the user of the other electronic device, a response timing at which the responding user responds to the utterance; and a presentation unit that presents information indicating the response timing estimated by the estimation unit.
  • a method for controlling a system includes the steps of: acquiring, by a first electronic device, at least one of video and audio of a first user; outputting, by a second electronic device configured to be able to communicate with the first electronic device, the video and/or audio of the first user acquired by the first electronic device to a second user who responds to an utterance of the first user; estimating, based on at least one of the video and audio of the first user, a response timing at which the second user responds to the utterance of the first user; and causing the second electronic device to acquire information indicating the response timing.
  • a program causes a computer to execute the steps of: acquiring, by a first electronic device, at least one of video and audio of a first user; outputting, by a second electronic device configured to be able to communicate with the first electronic device, the video and/or audio of the first user acquired by the first electronic device to a second user who responds to an utterance of the first user; estimating, based on at least one of the video and audio of the first user, a response timing at which the second user responds to the utterance of the first user; and causing the second electronic device to acquire information indicating the response timing.
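  • As a loose illustration of the claimed flow, the following minimal Python sketch estimates a response timing from the first user's audio and pushes it to the second electronic device. The class name, the notify_response_timing() call, and the silence-based heuristic are illustrative assumptions, not part of the disclosure.

```python
# Hypothetical sketch of the claimed flow: estimate when the second user
# could respond, based on the first user's audio, then notify the second
# electronic device. Names and the silence heuristic are assumptions.
import time

class ResponseTimingEstimator:
    """Flags a response timing when the first user's audio goes quiet."""

    def __init__(self, silence_threshold=0.01, silence_duration=0.6):
        self.silence_threshold = silence_threshold  # RMS level treated as silence
        self.silence_duration = silence_duration    # seconds of silence = utterance end
        self._silent_since = None

    def update(self, rms_level, now=None):
        """Feed one audio frame's RMS level; return True at a response timing."""
        now = time.monotonic() if now is None else now
        if rms_level < self.silence_threshold:
            if self._silent_since is None:
                self._silent_since = now
            elif now - self._silent_since >= self.silence_duration:
                self._silent_since = None
                return True  # the first user's utterance appears to have ended
        else:
            self._silent_since = None
        return False

def control_loop(audio_frame_levels, second_device):
    """Control-unit role: deliver estimated response timings to the second device."""
    estimator = ResponseTimingEstimator()
    for rms in audio_frame_levels:
        if estimator.update(rms):
            second_device.notify_response_timing()  # hypothetical API
```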
  • FIG. 1 is a diagram illustrating an example of a usage mode of a system according to an embodiment.
  • FIG. 2 is a functional block diagram illustrating a schematic configuration of a first electronic device according to an embodiment.
  • FIG. 3 is a functional block diagram illustrating a schematic configuration of a second electronic device according to an embodiment.
  • FIG. 4 is a functional block diagram illustrating a schematic configuration of a third electronic device according to an embodiment.
  • FIG. 5 is a sequence diagram illustrating a basic operation of a system according to an embodiment.
  • FIG. 6 is a diagram illustrating response timing according to an embodiment.
  • FIG. 7 is a sequence diagram illustrating the operation of a system according to an embodiment.
  • FIG. 8 is a sequence diagram illustrating the operation of a system according to an embodiment.
  • an "electronic device” may be, for example, a device that is powered by power supplied from a power system or a battery.
  • a “system” may be, for example, a device that includes at least an electronic device.
  • a "user” may be a person who uses or may use an electronic device according to an embodiment (typically a human), and a person who uses or may use a system including an electronic device according to an embodiment.
  • a conference in which at least one participant participates by communication from a different location than the other participants is collectively referred to as a "remote conference.”
  • FIG. 1 is a diagram showing an example of how a system according to an embodiment is used.
  • participant Mg remotely participates in a conference held in conference room MR from his/her home RL, as shown in FIG. 1.
  • participants Ma, Mb, Mc, and Md participate in the conference in conference room MR.
  • the participants of the conference are not limited to participants Ma, Mb, Mc, and Md, and may include, for example, other participants.
  • participants other than participant Mg may also remotely participate in the conference from their respective homes.
  • the system according to an embodiment may include, for example, a first electronic device 1, a second electronic device 100, and a third electronic device 300.
  • the first electronic device 1, the second electronic device 100, and the third electronic device 300 are shown only in schematic form.
  • the system according to an embodiment need not include at least one of the first electronic device 1, the second electronic device 100, and the third electronic device 300, and may include devices other than the electronic devices mentioned above.
  • the first electronic device 1 may be installed in the conference room MR.
  • the second electronic device 100 capable of communicating with the first electronic device 1 may be installed in the home RL of the participant Mg.
  • the location of the home RL of the participant Mg may be a location different from the location of the conference room MR.
  • the location of the home RL of the participant Mg may be far away from the location of the conference room MR, or may be close to the location of the conference room MR.
  • the first electronic device 1 according to an embodiment is connected to the second electronic device 100 according to an embodiment, for example, via a network N.
  • the third electronic device 300 according to an embodiment may be connected to at least one of the first electronic device 1 and the second electronic device 100, for example, via a network N.
  • the first electronic device 1 according to an embodiment may be connected to the second electronic device 100 according to an embodiment, at least one of wirelessly and wired.
  • the third electronic device 300 according to an embodiment may be connected to at least one of the first electronic device 1 and the second electronic device 100, at least one of wirelessly and wired.
  • the first electronic device 1, the second electronic device 100, and the third electronic device 300 are connected wirelessly and/or wired via the network N, as shown by dashed lines.
  • the first electronic device 1 and the second electronic device 100 may be included in a remote conference system according to an embodiment.
  • the third electronic device 300 may be included in a remote conference system according to an embodiment.
  • the network N as shown in FIG. 1 may include various electronic devices and/or devices such as a server as appropriate.
  • the network N as shown in FIG. 1 may also include devices such as a base station and/or a repeater as appropriate.
  • the first electronic device 1 and the second electronic device 100 may communicate directly.
  • the first electronic device 1 and the second electronic device 100 may communicate via at least one of other devices such as the third electronic device 300 and/or a base station.
  • the communication unit of the first electronic device 1 and the communication unit of the second electronic device 100 may communicate.
  • the above-mentioned notation covers not only the case where the first electronic device 1 and the second electronic device 100 "communicate" with each other, but also the case where one "transmits" information to the other and/or the other "receives" information transmitted by the one. The same applies not only to the first electronic device 1 and the second electronic device 100, but also when any electronic device, including the third electronic device 300, communicates with any other electronic device.
  • the first electronic device 1 may be arranged in the conference room MR, for example as shown in FIG. 1.
  • the first electronic device 1 may be arranged in a position where it can acquire the voice and/or video of at least one of the conference participants Ma, Mb, Mc, and Md.
  • the first electronic device 1 outputs the voice and/or video of participant Mg, as described below. Therefore, the first electronic device 1 may be arranged so that the voice and/or video of participant Mg output from the first electronic device 1 reaches at least one of the conference participants Ma, Mb, Mc, and Md.
  • the second electronic device 100 may be arranged in the home RL of the participant Mg, for example, in a manner as shown in FIG. 1.
  • the second electronic device 100 may be arranged in a position where it is possible to acquire the voice and/or image of the participant Mg.
  • the second electronic device 100 may acquire the voice and/or image of the participant Mg by a microphone or a headset and/or a camera connected to the second electronic device 100.
  • the second electronic device 100 also outputs audio and/or video of at least one of the participants Ma, Mb, Mc, and Md of the conference in the conference room MR, as described below. For this reason, the second electronic device 100 may be positioned so that the audio and/or video output from the second electronic device 100 reaches the participant Mg. The audio output from the second electronic device 100 may reach the ears of the participant Mg via, for example, headphones, earphones, speakers, or a headset.
  • the third electronic device 300 may be, for example, a server-like device that relays between the first electronic device 1 and the second electronic device 100. Also, the system according to one embodiment does not need to include the third electronic device 300.
  • FIG. 1 shows only one example of a usage mode of the first electronic device 1, the second electronic device 100, and the third electronic device 300 according to an embodiment.
  • the first electronic device 1, the second electronic device 100, and the third electronic device 300 according to an embodiment may be used in various other modes.
  • the remote conference system including the first electronic device 1 and the second electronic device 100 shown in FIG. 1 allows the participant Mg to behave as if he or she is participating in a conference held in the conference room MR while staying at home RL. Also, the remote conference system including the first electronic device 1 and the second electronic device 100 shown in FIG. 1 allows the conference participants Ma, Mb, Mc, and Md to feel as if the participant Mg is actually participating in the conference held in the conference room MR. That is, in the remote conference system including the first electronic device 1 and the second electronic device 100, the first electronic device 1 arranged in the conference room MR can play a role like an avatar of the participant Mg.
  • the first electronic device 1 may function as a physical avatar (such as a telepresence robot) that resembles the participant Mg. Also, the first electronic device 1 may function as a virtual avatar that displays an image of the participant Mg or an image that resembles, for example, a character of the participant Mg on the first electronic device 1.
  • the first electronic device 1 may be used in the conference room MR by participants Ma, Mb, Mc, Md, etc., for example.
  • the second electronic device 100 described later has a function of outputting the voice and/or video of the participant Mg acquired by the second electronic device 100 to the first electronic device 1 when the participant Mg speaks.
  • the first electronic device 1 also has a function of outputting the voice and/or video of the participants Ma, Mb, Mc, Md, etc. acquired by the first electronic device 1 to the second electronic device 100 when the participants Ma, Mb, Mc, Md, etc. speak.
  • the first electronic device 1 allows the participants Ma, Mb, Mc, Md, etc. to hold a remote conference or video conference in the conference room MR even if the participant Mg is in a remote location. Therefore, the first electronic device 1 is also referred to as an electronic device "used locally" as appropriate.
  • the first electronic device 1 can be various devices, but may be, for example, a specially designed device.
  • the first electronic device 1 may have a housing with an exterior on which an illustration of a human or the like is drawn, or may have a housing that is shaped to resemble at least a part of a human or the like, or a robot.
  • the first electronic device 1 according to one embodiment may be, for example, a general-purpose smartphone, tablet, phablet, notebook computer (notebook PC or laptop), or computer (desktop).
  • the first electronic device 1 according to one embodiment may have at least a part of a human or robot drawn on the display of a notebook PC, for example.
  • the first electronic device 1 may include a control unit 10, a storage unit 20, a communication unit 30, an imaging unit 40, an audio input unit 50, an audio output unit 60, a display unit 70, and a power unit 80.
  • the control unit 10 may also include, for example, a determination unit 12, an estimation unit 14, and an adjustment unit 16.
  • the first electronic device 1 may not include at least some of the functional units shown in FIG. 2, or may include components other than the functional units shown in FIG. 2.
  • the control unit 10 controls and/or manages the entire first electronic device 1, including each functional unit constituting the first electronic device 1.
  • the control unit 10 may include at least one processor, such as a CPU (Central Processing Unit) or a DSP (Digital Signal Processor), to provide control and processing power for executing various functions.
  • the control unit 10 may be realized as a single processor, as a number of processors, or as individual processors.
  • the processor may be realized as a single integrated circuit (IC).
  • the processor may be realized as a number of communicatively connected integrated circuits and discrete circuits.
  • the processor may be realized based on various other known technologies.
  • the control unit 10 may include one or more processors and memories.
  • the processor may include a general-purpose processor that loads a specific program to execute a specific function, and a dedicated processor specialized for a specific process.
  • the dedicated processor may include an application specific integrated circuit (ASIC).
  • the processor may include a programmable logic device (PLD).
  • the PLD may include a field-programmable gate array (FPGA).
  • the control unit 10 may be either a system-on-a-chip (SoC) or a system in a package (SiP) in which one or more processors work together.
  • the control unit 10 may be configured to include, for example, at least one of software and hardware resources. Furthermore, in the first electronic device 1 according to one embodiment, the control unit 10 may be configured by specific means in which software and hardware resources work together. Furthermore, in the first electronic device 1 according to one embodiment, at least one of the other functional units may also be configured by specific means in which software and hardware resources work together.
  • control unit 10 performs various types of control and other operations, which will be described later.
  • determination unit 12 of the control unit 10 can perform various types of determination processing.
  • the estimation unit 14 can perform various types of estimation processing.
  • the adjustment unit 16 can perform various types of adjustment processing.
  • the storage unit 20 may function as a memory that stores various information.
  • the storage unit 20 may store, for example, a program executed in the control unit 10 and the results of processing executed in the control unit 10.
  • the storage unit 20 may also function as a work memory for the control unit 10.
  • the storage unit 20 may be connected to the control unit 10 by wire and/or wirelessly.
  • the storage unit 20 may include, for example, at least one of a RAM (Random Access Memory) and a ROM (Read Only Memory).
  • the storage unit 20 may be configured, for example, by a semiconductor memory or the like, but is not limited to this, and may be any storage device.
  • the storage unit 20 may be a storage medium such as a memory card inserted into the first electronic device 1 according to one embodiment.
  • the storage unit 20 may also be an internal memory of a CPU used as the control unit 10, or may be connected to the control unit 10 as a separate unit.
  • the communication unit 30 has an interface function for wireless and/or wired communication with, for example, an external device.
  • the communication method used by the communication unit 30 in one embodiment may conform to a wireless communication standard.
  • the wireless communication standard includes cellular phone communication standards such as 2G, 3G, 4G, and 5G.
  • the cellular phone communication standards include LTE (Long Term Evolution), W-CDMA (Wideband Code Division Multiple Access), CDMA2000, PDC (Personal Digital Cellular), GSM (Registered Trademark) (Global System for Mobile communications), and PHS (Personal Handy-phone System), etc.
  • wireless communication standards include WiMAX (Worldwide Interoperability for Microwave Access), IEEE 802.11, WiFi, Bluetooth (registered trademark), IrDA (Infrared Data Association), and NFC (Near Field Communication).
  • the communication unit 30 may include, for example, a modem whose communication method is standardized by ITU-T (International Telecommunication Union Telecommunication Standardization Sector).
  • the communication unit 30 may be configured to include, for example, an antenna for transmitting and receiving radio waves and an appropriate RF unit.
  • the communication unit 30 may wirelessly communicate with, for example, a communication unit of another electronic device via an antenna.
  • the communication unit 30 may have a function of transmitting any information from the first electronic device 1 to another device, and/or a function of receiving any information from another device in the first electronic device 1.
  • the communication unit 30 may wirelessly communicate with the second electronic device 100 shown in FIG. 1.
  • the communication unit 30 may wirelessly communicate with a communication unit 130 (described later) of the second electronic device 100.
  • the communication unit 30 has a function of communicating with the second electronic device 100.
  • the communication unit 30 may wirelessly communicate with the third electronic device 300 shown in FIG. 1.
  • the communication unit 30 may wirelessly communicate with a communication unit 330 (described later) of the third electronic device 300.
  • the communication unit 30 may have a function of communicating with the third electronic device 300.
  • the communication unit 30 may also be configured as an interface such as a connector for wired connection to the outside.
  • the communication unit 30 can be configured using known technology for wireless communication, so a detailed description of the hardware and the like is omitted.
  • the communication unit 30 may be connected to the control unit 10 via a wired and/or wireless connection.
  • Various pieces of information received by the communication unit 30 may be supplied to, for example, the storage unit 20 and/or the control unit 10.
  • Various pieces of information received by the communication unit 30 may be stored in, for example, a memory built into the control unit 10.
  • the communication unit 30 may transmit, for example, the results of processing by the control unit 10 and/or information stored in the storage unit 20 to the outside.
  • the imaging unit 40 may be configured to include an image sensor that captures images electronically, such as a digital camera.
  • the imaging unit 40 may be configured to include an imaging element that performs photoelectric conversion, such as a CCD (Charge Coupled Device Image Sensor) or a CMOS (Complementary Metal Oxide Semiconductor) sensor.
  • the imaging unit 40 can capture an image of the surroundings of the first electronic device 1, for example.
  • the imaging unit 40 may capture an image of the inside of the conference room MR shown in FIG. 1, for example.
  • the imaging unit 40 may capture images of participants Ma, Mb, Mc, and Md of a conference held in the conference room MR shown in FIG. 1, for example.
  • the imaging unit 40 may convert the captured image into a signal and transmit it to the control unit 10. For this reason, the imaging unit 40 may be connected to the control unit 10 via a wired and/or wireless connection. Furthermore, a signal based on the image captured by the imaging unit 40 may be supplied to any functional unit of the first electronic device 1, such as the storage unit 20 and/or the display unit 70.
  • the imaging unit 40 is not limited to an imaging device such as a digital camera, and may be any device that captures an image of the state inside the conference room MR shown in FIG. 1.
  • the imaging unit 40 may capture images of the state inside the conference room MR as still images at predetermined time intervals (e.g., 15 frames per second). Also, in one embodiment, the imaging unit 40 may capture images of the state inside the conference room MR as a continuous video. Furthermore, the imaging unit 40 may be configured to include a fixed camera, or may be configured to include a movable camera.
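  • As an illustration of capturing still images of the room at a fixed interval (e.g., about 15 frames per second), here is a minimal sketch using OpenCV; the camera device index and the crude sleep-based pacing are assumptions made for brevity.

```python
# Minimal sketch: grab still frames at roughly 15 frames per second, as the
# imaging unit 40 might. Device index 0 and sleep-based pacing are assumptions.
import time
import cv2

def capture_frames(device_index=0, fps=15.0):
    cap = cv2.VideoCapture(device_index)
    interval = 1.0 / fps
    try:
        while True:
            ok, frame = cap.read()  # one still image of the conference room
            if not ok:
                break  # camera unavailable or stream ended
            yield frame
            time.sleep(interval)  # crude pacing to the target frame rate
    finally:
        cap.release()
```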
  • the audio input unit 50 detects (acquires) sounds or voices around the first electronic device 1, including human voices.
  • the audio input unit 50 may detect sounds or voices as air vibrations, for example, with a diaphragm, and convert them into an electrical signal.
  • the audio input unit 50 may include an acoustic device that converts sounds into an electrical signal, such as a microphone.
  • the audio input unit 50 may detect (acquire) the voices of at least one of the participants Ma, Mb, Mc, and Md in the conference room MR shown in FIG. 1, for example.
  • the voices (electrical signals) detected by the audio input unit 50 may be input to the control unit 10, for example. For this reason, the audio input unit 50 may be connected to the control unit 10 by wire and/or wirelessly.
  • the audio input unit 50 may convert the acquired sound or voice into an electrical signal and supply it to the control unit 10.
  • the audio input unit 50 may also supply the electrical signal (audio signal) into which the sound or voice has been converted to a functional unit of the first electronic device 1, such as the storage unit 20.
  • the audio input unit 50 may be any device that detects (acquires) sound or voice within the conference room MR shown in FIG. 1.
  • the audio output unit 60 converts an electrical signal (audio signal) of sound or voice supplied from the control unit 10 into sound, and outputs it as sound or voice.
  • the audio output unit 60 may be connected to the control unit 10 by wire and/or wirelessly.
  • the audio output unit 60 may be configured to include a device having a function of outputting sound, such as an arbitrary speaker (loudspeaker).
  • the audio output unit 60 may be configured to include a directional speaker that transmits sound in a specific direction.
  • the audio output unit 60 may also be configured to be able to change the directionality of the sound.
  • the audio output unit 60 may include an amplifier or an amplification circuit that appropriately amplifies the electrical signal (audio signal).
  • the audio output unit 60 may amplify the audio signal that the communication unit 30 receives from the second electronic device 100.
  • the audio signal received from the second electronic device 100 may be, for example, the audio signal of a speaker (e.g., participant Mg shown in FIG. 1) that is received by the communication unit 30 from the second electronic device 100 of that speaker.
  • the audio output unit 60 may output the audio signal of a speaker (e.g., participant Mg shown in FIG. 1) as the voice of that speaker.
  • the display unit 70 may be any display device, such as a Liquid Crystal Display (LCD), an Organic Electro-Luminescence panel, or an Inorganic Electro-Luminescence panel.
  • the display unit 70 may display various types of information, such as characters, figures, or symbols.
  • the display unit 70 may also display objects and icon images constituting various GUIs, for example, to prompt the user to operate the first electronic device 1.
  • the display unit 70 may be connected to the control unit 10 or the like by wire and/or wirelessly.
  • the display unit 70 may be configured to include a backlight, etc., as appropriate.
  • the display unit 70 may display an image based on the video signal transmitted from the second electronic device 100.
  • the display unit 70 may display, for example, an image of participant Mg captured by the second electronic device 100 as an image based on the video signal transmitted from the second electronic device 100.
  • participants Ma, Mb, Mc, and Md shown in FIG. 1 can visually know the state of participant Mg who is in a location away from the conference room MR.
  • the display unit 70 may display, for example, the image of the participant Mg captured by the second electronic device 100 as is. On the other hand, the display unit 70 may display, for example, an image of the participant Mg as a character (for example, an avatar or a robot).
  • the power unit 80 generates power to drive any moving part in the first electronic device 1.
  • the power unit 80 may be configured to include a power source such as a servo motor that drives the moving part in the first electronic device 1.
  • the power unit 80 may drive any moving part in the first electronic device 1 under the control of the control unit 10. For this reason, the power unit 80 may be connected to the control unit 10 by wire and/or wirelessly.
  • the power unit 80 may drive, for example, at least a part of the housing of the first electronic device 1. Furthermore, for example, if the first electronic device 1 has a housing shaped to resemble at least a part of a human or robot, the power unit 80 may drive at least a part of the human or robot shape.
  • the first electronic device 1 may be driven by the power unit 80 to perform an action that expresses, for example, the emotion and/or behavior of the participant Mg.
  • the first electronic device 1 may be driven by the power unit 80 to perform an action that expresses the response of the participant Mg.
  • the "response” may include a short interjection such as "yes” and/or "ah” made by the listener during the speaker's speech or between speeches.
  • the "response” may also include a head movement such as a nod indicating a positive action not involving speech or a head shake indicating a negative action, or a hand movement such as a hand gesture, or a movement of the entire upper body indicating a large change in emotion such as surprise or deep agreement.
  • the "response” may include a change in facial expression that moves a part or multiple parts of the face.
  • the above-mentioned responses are made, consciously or unconsciously, to show that the listener understands or agrees with the content of the speaker's speech, or to set a rhythm that makes it easier for the speaker to speak. Therefore, the first electronic device 1 may perform an action such as a nod and/or a head shake of the participant Mg by driving at least a portion of a component that imitates the head of the participant Mg.
  • the first electronic device 1 may perform an action such as a hand gesture of the participant Mg by driving at least a portion of a component that imitates the hand of the participant Mg.
  • the first electronic device 1 may perform an action expressing an emotion of the participant Mg, such as surprise or deep agreement, by driving at least a portion of a component that imitates one or more parts of the face of the participant Mg.
  • the first electronic device 1 may perform an action expressing a facial expression of the participant Mg by driving at least a portion of a component that imitates one or more parts of the face of the participant Mg.
  • the first electronic device 1 may output, for example, a pre-recorded response of the participant Mg such as "Yes" and/or "Eh" from the audio output unit 60.
  • the first electronic device 1 may perform an action to express an emotion such as joy, anger, sadness, or happiness of the participant Mg by driving the power unit 80.
  • the power unit 80 may perform an action that expresses emotions such as joy, anger, sadness, and happiness of the participant Mg, for example, by driving at least a part of a component that imitates the face (expression) of the participant Mg.
  • the first electronic device 1 may perform an action such as a human shrugging the shoulders, a polite human bow, or an action that shows an apology, by driving the power unit 80.
  • the operation of expressing the emotions and/or behavior of a human being, such as participant Mg, by displaying using the display unit 70 and/or driving the power unit 80 may use various known technologies. For this reason, a detailed explanation of the operation of expressing the emotions and/or behavior of a human being, such as participant Mg, by displaying using the display unit 70 and/or driving the power unit 80 will be omitted.
  • the first electronic device 1 according to one embodiment can perform an operation of expressing the emotions and/or behavior of participant Mg by displaying using the display unit 70 and/or driving the power unit 80.
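  • As a loose sketch of how a detected response of the participant Mg might be translated into motion of the power unit 80, consider the following; the servo.sweep() API and channel assignments are invented for illustration and do not appear in the disclosure.

```python
# Hypothetical sketch: map a detected response type of participant Mg to a
# movement of the corresponding imitated body part. The servo.sweep() API
# and channel layout are illustrative assumptions.
from enum import Enum, auto

class Response(Enum):
    NOD = auto()          # affirmative head movement
    HEAD_SHAKE = auto()   # negative head movement
    HAND_GESTURE = auto()
    SURPRISE = auto()     # large emotional change

def express(response, servo):
    """Drive the part of the housing that imitates the relevant body part."""
    if response is Response.NOD:
        servo.sweep(channel=0, degrees=(-15, 0))   # tilt head part forward and back
    elif response is Response.HEAD_SHAKE:
        servo.sweep(channel=1, degrees=(-20, 20))  # pan head part left and right
    elif response is Response.HAND_GESTURE:
        servo.sweep(channel=2, degrees=(0, 45))    # raise the hand part
    elif response is Response.SURPRISE:
        servo.sweep(channel=3, degrees=(0, 30))    # move facial / upper-body parts
```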
  • the first electronic device 1 may be a specially designed device as described above.
  • the first electronic device 1 may include, for example, the audio output unit 60 and the power unit 80 among the functional units shown in FIG. 2.
  • the first electronic device 1 may be connected to another electronic device to supplement at least some of the functions of the other functional units shown in FIG. 2.
  • the other electronic device may be, for example, a general-purpose smartphone, tablet, phablet, notebook computer (notebook PC or laptop), or computer (desktop).
  • the second electronic device 100 may be, for example, an electronic device used by the participant Mg at his/her home RL.
  • the above-mentioned first electronic device 1 has a function of outputting to the second electronic device 100 the voice and/or video of the participants Ma, Mb, Mc, Md, etc. acquired by the first electronic device 1 when those participants speak.
  • the second electronic device 100 has a function of outputting the voice and/or video of the participant Mg acquired by the second electronic device 100 to the first electronic device 1 when the participant Mg speaks.
  • the second electronic device 100 allows the participant Mg to hold a remote conference or video conference even at a location away from the conference room MR. Therefore, the second electronic device 100 is also referred to as an electronic device "used remotely" as appropriate.
  • the second electronic device 100 may include a control unit 110, a storage unit 120, a communication unit 130, an imaging unit 140, an audio input unit 150, an audio output unit 160, a display unit 170, a tactile sensation presentation unit 190, and an acquisition unit 200.
  • the control unit 110 may also include, for example, a determination unit 112, an estimation unit 114, and an adjustment unit 116.
  • the second electronic device 100 may not include at least some of the functional units shown in FIG. 3, or may include components other than the functional units shown in FIG. 3.
  • the control unit 110 controls and/or manages the entire second electronic device 100, including each functional unit constituting the second electronic device 100.
  • the control unit 110 may basically be configured based on the same concept as the control unit 10 shown in FIG. 2, for example.
  • the determination unit 112, estimation unit 114, and adjustment unit 116 of the control unit 110 may also be configured based on the same concept as the determination unit 12, estimation unit 14, and adjustment unit 16 of the control unit 10 shown in FIG. 2, for example.
  • the storage unit 120 may function as a memory that stores various types of information.
  • the storage unit 120 may store, for example, programs executed in the control unit 110 and results of processing executed in the control unit 110.
  • the storage unit 120 may also function as a work memory for the control unit 110.
  • the storage unit 120 may be connected to the control unit 110 via a wired and/or wireless connection.
  • the storage unit 120 may basically be configured based on the same concept as the storage unit 20 shown in FIG. 2, for example.
  • the communication unit 130 has an interface function for wireless and/or wired communication.
  • the communication unit 130 may wirelessly communicate with, for example, a communication unit of another electronic device, for example, via an antenna.
  • the communication unit 130 may wirelessly communicate with the first electronic device 1 shown in FIG. 1.
  • the communication unit 130 may wirelessly communicate with the communication unit 30 of the first electronic device 1.
  • the communication unit 130 has a function of communicating with the first electronic device 1.
  • the communication unit 130 may wirelessly communicate with the third electronic device 300 shown in FIG. 1.
  • the communication unit 130 may wirelessly communicate with the communication unit 330 (described later) of the third electronic device 300.
  • the communication unit 130 may have a function of communicating with the third electronic device 300.
  • the communication unit 130 may be connected to the control unit 110 in a wired and/or wireless manner.
  • the communication unit 130 may basically be configured based on the same idea as the communication unit 30 shown in FIG. 2.
  • the imaging unit 140 may be configured to include an image sensor that captures images electronically, such as a digital camera.
  • the imaging unit 140 may capture images of the interior of the home RL shown in FIG. 1, for example.
  • the imaging unit 140 may capture images of participants Mg who join a conference from the home RL shown in FIG. 1, for example.
  • the imaging unit 140 may convert the captured images into signals and transmit them to the control unit 110. For this reason, the imaging unit 140 may be connected to the control unit 110 by wire and/or wirelessly.
  • the imaging unit 140 may basically be configured based on the same concept as the imaging unit 40 shown in FIG. 2, for example.
  • the audio input unit 150 detects (acquires) sounds or voices around the second electronic device 100, including human voices.
  • the audio input unit 150 may detect sounds or voices as air vibrations, for example, with a diaphragm, and convert them into an electrical signal.
  • the audio input unit 150 may include an acoustic device that converts sounds into an electrical signal, such as an arbitrary microphone.
  • the audio input unit 150 may detect (acquire) the voice of the participant Mg in the home RL shown in FIG. 1, for example.
  • the voice (electrical signal) detected by the audio input unit 150 may be input to the control unit 110, for example. For this reason, the audio input unit 150 may be connected to the control unit 110 by wire and/or wirelessly.
  • the audio input unit 150 may basically be configured based on the same concept as the audio input unit 50 shown in FIG. 2, for example.
  • the audio output unit 160 converts an electrical signal (audio signal) supplied from the control unit 110 into sound, and outputs the audio signal as sound or voice.
  • the audio output unit 160 may be connected to the control unit 110 by wire and/or wirelessly.
  • the audio output unit 160 may be configured to include a device having a function of outputting sound, such as an arbitrary speaker (loudspeaker).
  • the audio output unit 160 may output a sound detected by the audio input unit 50 of the first electronic device 1.
  • the sound detected by the audio input unit 50 of the first electronic device 1 may be at least one of the voices of the participants Ma, Mb, Mc, and Md in the conference room MR shown in FIG. 1.
  • the audio output unit 160 may basically be configured based on the same idea as the audio output unit 60 shown in FIG. 2, for example.
  • the display unit 170 may be any display device, such as a Liquid Crystal Display (LCD), an Organic Electro-Luminescence panel, or an Inorganic Electro-Luminescence panel.
  • the display unit 170 may basically be configured based on the same concept as the display unit 70 shown in FIG. 2, for example.
  • Various data required for display on the display unit 170 may be supplied from, for example, the control unit 110 or the storage unit 120. For this reason, the display unit 170 may be connected to the control unit 110, etc., via a wired and/or wireless connection.
  • the display unit 170 may be, for example, a touch screen display equipped with a touch panel function that detects input by contact with the participant Mg's finger or stylus.
  • the display unit 170 may display an image based on the video signal transmitted from the first electronic device 1.
  • the display unit 170 may display images of participants Ma, Mb, Mc, Md, etc. captured by the first electronic device 1 (its imaging unit 40) as an image based on the video signal transmitted from the first electronic device 1.
  • participant Mg shown in FIG. 1 can visually know the state of participants Ma, Mb, Mc, Md, etc. in a conference room MR away from his/her home RL.
  • the display unit 170 may directly display images of the participants Ma, Mb, Mc, Md, etc. captured by the first electronic device 1. On the other hand, the display unit 170 may display images (e.g., avatars) that characterize the participants Ma, Mb, Mc, Md, etc.
  • the display unit 170 may have a function of notifying, for example, the participant Mg of the response timing, which will be described later. In other words, the participant Mg can know the response timing by visually checking the display unit 170. Also, in one embodiment, the display unit 170 may be an indicator, such as an LED, that notifies the response timing.
  • the tactile sensation presentation unit 190 may have a function of presenting a tactile sensation such as vibration to the fingers of the participant Mg, for example.
  • the tactile sensation presentation unit 190 may be configured in combination with a display unit 170 having a touch screen display function. In such a configuration, for example, when the participant Mg touches the display unit 170 to operate the second electronic device 100, he or she can recognize the presentation of a tactile sensation by the tactile sensation presentation unit 190.
  • the tactile sensation presentation unit 190 may have a function of notifying the participant Mg, for example, of the response timing described below.
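  • As a sketch of how the response timing might be presented to the participant Mg through any of these channels (screen, LED indicator, vibration), consider the following hypothetical dispatcher; the display, LED, and vibrator interfaces are invented for illustration.

```python
# Hypothetical sketch: the second electronic device 100 presents a response
# timing through whichever output channels are available. The show_banner(),
# blink(), and pulse() interfaces are illustrative assumptions.
class PresentationUnit:
    def __init__(self, display=None, led=None, vibrator=None):
        self.display = display
        self.led = led
        self.vibrator = vibrator

    def present_response_timing(self):
        """Notify the remote user that now is a natural moment to respond."""
        if self.display is not None:
            self.display.show_banner("Response timing")  # visual cue on screen
        if self.led is not None:
            self.led.blink(times=3)                      # indicator cue
        if self.vibrator is not None:
            self.vibrator.pulse(duration_ms=200)         # tactile cue
```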
  • the acquisition unit 200 may be various functional units that acquire the second user's response to the first user's utterance. The second user's response will be described in more detail below.
  • the acquisition unit 200 of the second electronic device 100 may acquire input to at least one of the imaging unit 140 and the audio input unit 150 shown in FIG. 3, for example.
  • the acquisition unit 200 may also be configured to include at least one of the imaging unit 140 and the audio input unit 150 shown in FIG. 3, for example.
  • the acquisition unit 200 may acquire a mouse click or touch input by the user, or may acquire input to a motion sensor and/or a foot pedal.
  • the acquisition unit 200 may also include an input device that detects a mouse click or touch input by the user, or may include a motion sensor and/or a foot pedal.
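  • The following hypothetical sketch shows how the acquisition unit 200 might normalize these heterogeneous inputs (touch, mouse click, motion sensor, foot pedal) into a single stream of response events; all names are illustrative assumptions.

```python
# Hypothetical sketch: the acquisition unit 200 as a thin layer that turns
# heterogeneous inputs into normalized response events. Names are assumptions.
import time
from dataclasses import dataclass

@dataclass
class ResponseEvent:
    source: str       # "touch", "mouse", "motion", or "pedal"
    timestamp: float  # seconds on a monotonic clock

class AcquisitionUnit:
    def __init__(self):
        self._events = []

    def on_input(self, source):
        """Called from any input-device callback when the second user responds."""
        self._events.append(ResponseEvent(source, time.monotonic()))

    def poll(self):
        """Drain and return the responses acquired since the last poll."""
        events, self._events = self._events, []
        return events
```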
  • the second electronic device 100 may be a dedicated device as described above. Meanwhile, in one embodiment, the second electronic device 100 may include some of the functional units shown in FIG. 3, for example. In this case, the second electronic device 100 may be connected to another electronic device to supplement at least some of the functions of the other functional units shown in FIG. 3.
  • the other electronic device may be, for example, a general-purpose smartphone, tablet, phablet, notebook computer (notebook PC or laptop), or computer (desktop), etc.
  • the second electronic device 100 may be a smartphone or a laptop computer.
  • the second electronic device 100 may be a smartphone or a laptop computer with an application (program) installed for linking with the first electronic device 1.
  • FIG. 4 is a block diagram showing a schematic configuration of the third electronic device 300 shown in FIG. 1. An example of the configuration of the third electronic device 300 according to one embodiment will be described below.
  • the third electronic device 300 may be installed in a location other than the participant Mg's home RL and the conference room MR, as shown in FIG. 1.
  • the third electronic device 300 may be installed in the participant Mg's home RL or nearby, or in the conference room MR or nearby.
  • the first electronic device 1 has a function of transmitting the audio and/or video data of the participants Ma, Mb, Mc, Md, etc. acquired by the first electronic device 1 to the third electronic device 300 when the participants Ma, Mb, Mc, Md, etc. speak.
  • the third electronic device 300 may transmit the audio and/or video data received from the first electronic device 1 to the second electronic device 100.
  • the second electronic device 100 also has a function of transmitting the audio and/or video data of the participant Mg acquired by the second electronic device 100 to the third electronic device 300 when the participant Mg speaks.
  • the third electronic device 300 may transmit the audio and/or video data received from the second electronic device 100 to the first electronic device 1. In this way, the third electronic device 300 may have a function of relaying between the first electronic device 1 and the second electronic device 100.
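  • A minimal sketch of this relay role, using asyncio TCP streams as a stand-in for whatever transport the server actually uses (connection setup and device pairing are omitted; the framing is an illustrative assumption):

```python
# Minimal sketch of the relay role of the third electronic device 300:
# bytes arriving from one device are forwarded unchanged to the other.
# The asyncio/TCP transport is an illustrative assumption.
import asyncio

async def pipe(reader, writer):
    """Copy encoded audio/video bytes from one peer to the other."""
    try:
        while data := await reader.read(4096):
            writer.write(data)
            await writer.drain()
    finally:
        writer.close()

async def relay_pair(first_device, second_device):
    """Relay in both directions; each argument is a (reader, writer) pair."""
    await asyncio.gather(
        pipe(first_device[0], second_device[1]),  # conference room -> home
        pipe(second_device[0], first_device[1]),  # home -> conference room
    )
```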
  • the third electronic device 300 is also referred to as a "server" as appropriate.
  • the third electronic device 300 may include a control unit 310, a storage unit 320, and a communication unit 330.
  • the control unit 310 may also include, for example, a determination unit 312, an estimation unit 314, and an adjustment unit 316.
  • the third electronic device 300 may not include at least some of the functional units shown in FIG. 4, or may include components other than the functional units shown in FIG. 4.
  • the control unit 310 controls and/or manages the entire third electronic device 300, including each functional unit constituting the third electronic device 300.
  • the control unit 310 may basically be configured based on the same concept as the control unit 10 shown in FIG. 2, for example.
  • the determination unit 312, estimation unit 314, and adjustment unit 316 of the control unit 310 may also be configured based on the same concept as the determination unit 12, estimation unit 14, and adjustment unit 16 of the control unit 10 shown in FIG. 2, for example.
  • the storage unit 320 may function as a memory that stores various types of information.
  • the storage unit 320 may store, for example, programs executed in the control unit 310 and results of processing executed in the control unit 310.
  • the storage unit 320 may also function as a work memory for the control unit 310.
  • the storage unit 320 may be connected to the control unit 310 via a wired and/or wireless connection.
  • the storage unit 320 may basically be configured based on the same concept as the storage unit 20 shown in FIG. 2, for example.
  • the communication unit 330 has an interface function for wireless and/or wired communication.
  • the communication unit 330 may wirelessly communicate with, for example, a communication unit of another electronic device, for example, via an antenna.
  • the communication unit 330 may wirelessly communicate with the first electronic device 1 shown in FIG. 1.
  • the communication unit 330 may wirelessly communicate with the communication unit 30 of the first electronic device 1.
  • the communication unit 330 has a function of communicating with the first electronic device 1.
  • the communication unit 330 may wirelessly communicate with the second electronic device 100 shown in FIG. 1.
  • the communication unit 330 may wirelessly communicate with the communication unit 130 of the second electronic device 100.
  • the communication unit 330 may have a function of communicating with the second electronic device 100. As shown in FIG. 4, the communication unit 330 may be connected to the control unit 310 in a wired and/or wireless manner. The communication unit 330 may basically be configured based on the same idea as the communication unit 30 shown in FIG. 2.
  • the third electronic device 300 may be, for example, a specially designed device.
  • the third electronic device 300 may include, for example, some of the functional units shown in FIG. 4.
  • the third electronic device 300 may be connected to other electronic devices to supplement at least some of the functions of the other functional units shown in FIG. 4.
  • the other electronic devices may be, for example, devices such as a general-purpose computer or server.
  • the third electronic device 300 may be, for example, a relay server, a web server, or an application server.
  • the first electronic device 1 is installed in the conference room MR and acquires video and/or audio of at least one of the participants Ma, Mb, Mc, and Md.
  • the video and/or audio acquired by the first electronic device 1 is transmitted to the second electronic device 100 installed in the home RL of the participant Mg.
  • the second electronic device 100 outputs the video and/or audio of at least one of the participants Ma, Mb, Mc, and Md acquired by the first electronic device 1. This allows the participant Mg to recognize the video and/or audio of at least one of the participants Ma, Mb, Mc, and Md.
  • the second electronic device 100 is installed in the home RL of the participant Mg and acquires video and/or audio of the participant Mg.
  • the video and/or audio acquired by the second electronic device 100 is transmitted to the first electronic device 1 installed in the conference room MR.
  • the first electronic device 1 outputs the video and/or audio of the participant Mg received from the second electronic device 100. This allows at least one of the participants Ma, Mb, Mc, and Md to recognize the video and/or audio of the participant Mg.
  • FIG. 5 is a sequence diagram explaining the basic operation of the system according to the embodiment as described above.
  • FIG. 5 is a diagram showing the exchange of data etc. between the first electronic device 1, the second electronic device 100, and the third electronic device 300.
  • the basic operation when a remote conference or video conference is held using the system according to the embodiment will be explained with reference to FIG. 5.
  • the first electronic device 1 used locally may be used by the first user.
  • the first user may be, for example, at least one of the participants Ma, Mb, Mc, and Md shown in FIG. 1 (hereinafter also referred to as a local user).
  • the second electronic device 100 used remotely may be used by the second user.
  • the second user may be, for example, the participant Mg shown in FIG. 1 (hereinafter also referred to as a remote user).
  • the operation performed by the first electronic device 1 may be, in more detail, performed by, for example, the control unit 10 of the first electronic device 1.
  • the operation performed by the control unit 10 of the first electronic device 1 may be referred to as the operation performed by the first electronic device 1.
  • the operation performed by the second electronic device 100 may be, in more detail, performed by, for example, the control unit 110 of the second electronic device 100.
  • the operation performed by the control unit 110 of the second electronic device 100 may be referred to as the operation performed by the second electronic device 100.
  • the operations performed by the third electronic device 300 may be more specifically performed by, for example, the control unit 310 of the third electronic device 300.
  • the operations performed by the control unit 310 of the third electronic device 300 may be referred to as operations performed by the third electronic device 300.
  • the first electronic device 1 acquires at least one of the video and audio of the first user (e.g., at least one of the participants Ma, Mb, Mc, and Md) (step S1). Specifically, in step S1, the first electronic device 1 may capture the video of the first user using the imaging unit 40 and acquire (or detect) the audio of the first user using the audio input unit 50. Next, the first electronic device 1 encodes at least one of the video and audio of the first user (step S2). In step S2, encoding may mean compressing the video and/or audio data according to a predetermined rule and converting it into a format according to the purpose, including encryption. The first electronic device 1 may perform various known encoding methods, such as software encoding or hardware encoding.
  • the first electronic device 1 transmits the encoded video and/or audio data to the third electronic device 300 (step S3). Specifically, in step S3, the first electronic device 1 transmits the video and/or audio data from the communication unit 30 to the communication unit 330 of the third electronic device 300. Also in step S3, the third electronic device 300 receives the video and/or audio data transmitted from the communication unit 30 of the first electronic device 1 via the communication unit 330.
  • the third electronic device 300 transmits the encoded video and/or audio data received from the communication unit 30 to the second electronic device 100 (step S4). Specifically, in step S4, the third electronic device 300 transmits the video and/or audio data from the communication unit 330 to the communication unit 130 of the second electronic device 100. Also, in step S4, the second electronic device 100 receives the video and/or audio data transmitted from the communication unit 330 of the third electronic device 300 via the communication unit 130.
  • the second electronic device 100 decodes the encoded video and/or audio data received from the communication unit 330 (step S5).
  • decoding may mean returning the format of the encoded video and/or audio data to its original format.
  • the second electronic device 100 may use various known decoding methods, such as software decoding or hardware decoding.
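  • A minimal sketch of the encode/decode round trip of steps S2/S12 and S5/S15, using zlib compression as a stand-in for a real audio/video codec (in practice a codec such as H.264 or Opus would be used); this is an illustration, not the disclosed implementation.

```python
# Minimal sketch of steps S2/S12 (encode) and S5/S15 (decode). zlib stands in
# for a real A/V codec; the raw byte payload is an illustrative assumption.
import zlib

def encode(raw_media: bytes) -> bytes:
    """Compress captured video/audio data into a transmission format."""
    return zlib.compress(raw_media)

def decode(payload: bytes) -> bytes:
    """Restore the encoded data to its original format."""
    return zlib.decompress(payload)

# Round trip: what the receiving device decodes equals what was captured.
captured = b"\x00\x01" * 1000  # stand-in for raw audio/frame bytes
assert decode(encode(captured)) == captured
```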
  • the second electronic device 100 presents at least one of the video and audio of the first user (e.g., at least one of participants Ma, Mb, Mc, and Md) to the second user (e.g., participant Mg) (step S6).
  • the second electronic device 100 may display the video of the first user on the display unit 170 and output the audio of the first user from the audio output unit 160.
  • In this way, the second user (e.g., participant Mg) can recognize at least one of the video and audio of the first user (e.g., at least one of participants Ma, Mb, Mc, and Md).
  • the above describes a manner in which the first electronic device 1 transmits video and/or audio of the first user to the second electronic device 100 via the third electronic device 300.
  • the second electronic device 100 can transmit video and/or audio of the second user to the first electronic device 1 via the third electronic device 300.
  • the second electronic device 100 acquires at least one of the video and audio of the second user (e.g., participant Mg) (step S11). Specifically, in step S11, the second electronic device 100 may capture the video of the second user using the imaging unit 140 and acquire (or detect) the audio of the second user using the audio input unit 150. Next, the second electronic device 100 encodes at least one of the video and audio of the second user (step S12).
  • the second electronic device 100 transmits the encoded video and/or audio data to the third electronic device 300 (step S13). Specifically, in step S13, the second electronic device 100 transmits the video and/or audio data from the communication unit 130 to the communication unit 330 of the third electronic device 300. Also in step S13, the third electronic device 300 receives the video and/or audio data transmitted from the communication unit 130 of the second electronic device 100 via the communication unit 330.
  • the third electronic device 300 transmits the encoded video and/or audio data received from the communication unit 130 to the first electronic device 1 (step S14). Specifically, in step S14, the third electronic device 300 transmits the video and/or audio data from the communication unit 330 to the communication unit 30 of the first electronic device 1. Also, in step S14, the first electronic device 1 receives the video and/or audio data transmitted from the communication unit 330 of the third electronic device 300 via the communication unit 30.
  • the first electronic device 1 decodes the encoded video and/or audio data received from the communication unit 330 (step S15).
  • the first electronic device 1 presents at least one of the video and audio of the second user (e.g., participant Mg) to the first user (e.g., at least one of participants Ma, Mb, Mc, and Md) (step S16). Specifically, in step S16, the first electronic device 1 may display the video of the second user on the display unit 70 and output the audio of the second user from the audio output unit 60.
  • the operations from step S1 to step S6 and the operations from step S11 to step S16 may be executed in the reverse order. That is, the operations from step S11 to step S16 may be executed first, and then the operations from step S1 to step S6. Furthermore, the operations from step S1 to step S6 and the operations from step S11 to step S16 may be executed simultaneously, or may be executed so that they at least partially overlap.
  • in communication over a network such as the Internet, the communication speed of the line is usually not guaranteed and is often provided under a best-effort contract. A certain level of communication speed could be ensured by establishing a dedicated line between the conference room MR and the home RL of the participant Mg shown in FIG. 1.
  • however, the establishment of a dedicated line tends to be cost-prohibitive. For this reason, recent remote conferences or video conferences are typically realized with a configuration in which at least a part of the network N includes an Internet line or the like, and there is often no choice but to accept some communication delay.
  • the encoding and/or decoding operations also require a certain amount of time. For example, if a first user asks a second user a question, encoding and decoding must be performed twice before the second user's response is returned to the first user who asked the question. Even if one encoding or decoding takes only a short amount of time, if such processing goes back and forth between the first and second users, it is conceivable that a non-negligible time delay will occur during the conversation.
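A back-of-the-envelope sketch of this cumulative delay follows; every figure used is an assumed value for illustration, not a measurement.

```python
# Hypothetical one-way budget for a question-and-answer exchange; every number
# here is an assumed figure for illustration, not a measured value.
encode_ms = 30      # one encoding pass
decode_ms = 30      # one decoding pass
network_ms = 80     # one traversal of the network N

one_way_ms = encode_ms + network_ms + decode_ms   # question reaches the listener
round_trip_ms = 2 * one_way_ms                    # answer comes back: 2 encodes + 2 decodes in total
print(f"one-way: {one_way_ms} ms, question-to-answer transport: {round_trip_ms} ms")
# one-way: 140 ms, question-to-answer transport: 280 ms
```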
  • as a result, the timing at which the video and/or audio indicating the end of the first user's speech actually reaches the second user may be delayed.
  • if the timing of the second user's response is delayed in turn, it is expected that the first user will grow impatient waiting for the second user's response, or that the second user's response will overlap with the first user's next utterance.
  • in such cases, the amount and/or quality of information conveyed to participants decreases. Therefore, in a remote conference or video conference, it is desirable to be able to appropriately convey and share the listener's response to the speaker's speech in order to facilitate smooth communication.
  • to address this, the system according to one embodiment estimates the response timing of the second user based on the first user's speech, and notifies the second user of the arrival of the response timing.
  • the system according to one embodiment may also estimate the response timing of the second user based on the first user's speech at a point before the first user's speech ends.
  • Figure 6 is a diagram explaining how the system according to one embodiment estimates response timing.
  • the upper part of Figure 6 shows the waveform of the voice of the first user acquired (detected) by the voice input unit 50 when the first user of the first electronic device 1 is talking.
  • in this graph, the vertical axis indicates the level of the voice of the first user, and the horizontal axis indicates time.
  • the vertical axis of the graph at the top of Figure 6 may represent, for example, the sound pressure of the voice of the first user acquired by the voice input unit 50, converted into a voltage and then amplified, or it may simply represent the sound pressure or volume of that voice.
  • the graph at the bottom of FIG. 6 illustrates an example of response timing.
  • the time period during which the first user makes (almost) no sound may be set as the response timing of the second user.
  • the timing at which the first user, who has been making sound, (almost) stops making sound may be set as the start point of the response timing of the second user.
  • the timing at which the first user, who has not been making sound, next makes sound may be set as the end point of the response timing of the second user.
  • in the graph at the bottom of FIG. 6, the state in which the response timing is on is indicated by a value of +1, and the state in which the response timing is off is indicated by a value of -1.
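As a minimal sketch of how such an on/off signal could be derived from the voice level, assuming a simple silence-detection rule with illustrative threshold and hold-time values:

```python
# Derive a +1/-1 response-timing signal from an audio level sequence by
# treating sustained low level as "response timing on". The threshold and
# hold time are illustrative assumptions.
def response_timing_signal(levels, threshold=0.05, hold_frames=5):
    """levels: per-frame voice level of the first user, e.g. RMS in [0, 1]."""
    signal, quiet_run = [], 0
    for level in levels:
        quiet_run = quiet_run + 1 if level < threshold else 0
        # Turn the timing on only after the speaker has stayed quiet a while,
        # so short pauses inside an utterance are not treated as turn ends.
        signal.append(+1 if quiet_run >= hold_frames else -1)
    return signal

levels = [0.4, 0.5, 0.3, 0.02, 0.01, 0.02, 0.01, 0.02, 0.01, 0.6]
# -1 while speaking, +1 once the pause has persisted, -1 again when speech resumes
print(response_timing_signal(levels))
```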
  • the system according to one embodiment may estimate the response timing as shown in FIG. 6 while acquiring the voice of the first user, rather than determining the response timing by analyzing the voice of the first user after acquisition. That is, the system according to one embodiment may estimate the start time of each response timing as shown in FIG. 6 before the end of each utterance of the first user. In this case, the system according to one embodiment may estimate the start time of the response timing based on the features of the voice of the first user acquired by the voice input unit 50 and/or the features of the language.
  • the system may also estimate the start time of the response timing based on the image of the first user captured by the imaging unit 40, i.e., the face, facial expressions, gestures, and/or body movements, instead of or in addition to the voice of the first user.
  • a system may estimate the start point of the response timing based on the timing when the volume of the voice of the first user decreases or the tone of voice becomes lower, as a feature of the voice of the first user.
  • a system may estimate the start point of the response timing based on the timing when the ending of the sentence becomes "desu" or "masu", as a feature of the language of the first user.
  • a system may estimate the start point of the response timing based on the timing when the first user returns their gaze to the first electronic device 1 after looking away from the first electronic device 1, as a feature of the video of the first user.
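A minimal heuristic sketch combining these voice, language, and video cues follows; the field names and the combination rule are assumptions for illustration, not the estimation logic of the disclosure.

```python
# Heuristic start-point detector for the response timing, combining the voice,
# language, and video cues mentioned above. All field names and the decision
# rule are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Cues:
    volume_drop: bool        # voice: volume decreased / tone lowered
    sentence_final: bool     # language: utterance ended with "desu" / "masu"
    gaze_returned: bool      # video: gaze came back to the first electronic device 1

def response_start_detected(cues: Cues) -> bool:
    # Require the voice cue plus at least one corroborating cue, so a mere
    # pause for breath does not trigger a response-timing start.
    return cues.volume_drop and (cues.sentence_final or cues.gaze_returned)

print(response_start_detected(Cues(True, True, False)))   # True
print(response_start_detected(Cues(True, False, False)))  # False
```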
  • the system may estimate the end point of the response timing based on the timing when the volume of speech decreases and then increases, as a feature of the first user's voice.
  • the system according to one embodiment may estimate the timing when the volume increases, as a feature of the first user's voice, as the timing of a question, and estimate the end point of the response timing based on the average response time to the question.
  • the system according to one embodiment may determine, as a feature of the first user's language, whether the content of the utterance is an open-ended question or a closed-ended question.
  • the system according to one embodiment may set, for example, a response timing after an open-ended question that is longer than the response timing after a closed-ended question.
  • the system according to one embodiment may also set, for example, a response timing after a closed-ended question that is shorter than the response timing after an open-ended question.
  • the system according to one embodiment may also determine, as a feature of the first user's voice and/or language, whether or not the conversation is lively, or the degree to which it is lively. In this case, the system according to one embodiment may set the response timing relatively short when it determines that the first user's conversation is lively.
  • the system according to one embodiment may determine whether the content of the conversation is positive or negative based on the voice and/or language features of the first user. In this case, the system according to one embodiment may set the response timing to be relatively short when the content of the conversation of the first user is determined to be relatively positive, and set the response timing to be relatively long when the content of the conversation of the first user is determined to be relatively negative.
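The window-length rules described above can be sketched as follows; the base durations and scaling factors are assumed values, not parameters defined in this disclosure.

```python
# Choose an assumed response-window length from the conversational features
# described above; every duration and factor is an illustrative assumption.
def response_window_seconds(question: str, lively: bool, positive: bool) -> float:
    base = {"open": 4.0, "closed": 1.5, "none": 2.5}[question]  # open-ended > closed-ended
    if lively:
        base *= 0.7                     # lively conversation: shorter window
    base *= 0.8 if positive else 1.2    # positive content: shorter; negative: longer
    return base

print(response_window_seconds("open", lively=False, positive=False))   # 4.8
print(response_window_seconds("closed", lively=True, positive=True))   # 0.84
```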
  • the system according to one embodiment may estimate the response timing or correct the estimated response timing by analyzing the first user's past audio and/or video history.
  • the system according to one embodiment may estimate the response timing based on, for example, AI (Artificial Intelligence) technology.
  • the system according to one embodiment may estimate the response timing based on, for example, machine learning (and even deep learning) technology.
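As one non-authoritative sketch of such a learned estimator, a simple classifier could be trained to predict, frame by frame, whether the response timing is on. The feature set and the toy training data below are assumptions for illustration, not the trained model of the disclosure.

```python
# Sketch of a machine-learned response-timing estimator: a logistic regression
# over per-frame features. The feature set and the toy training data are
# illustrative assumptions.
from sklearn.linear_model import LogisticRegression

# features per frame: [voice level, seconds since the level dropped, sentence-final flag]
X = [[0.6, 0.0, 0], [0.5, 0.0, 0], [0.05, 0.4, 1], [0.02, 0.8, 1],
     [0.7, 0.0, 0], [0.03, 0.6, 0], [0.04, 1.0, 1], [0.65, 0.0, 0]]
y = [0, 0, 1, 1, 0, 1, 1, 0]   # 1 = response timing on

model = LogisticRegression().fit(X, y)
print(model.predict_proba([[0.03, 0.7, 1]])[0][1])  # probability the timing is on
```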
  • the second electronic device 100 may indicate to the second user that it is time to respond at the time of the estimated response timing.
  • the second electronic device 100 may present the arrival of the response timing to the second user as at least one of visual information, auditory information, and tactile information.
  • the second electronic device 100 may notify the second user of the response timing by displaying "You have been asked a question" or "It's your turn” on the display unit 170.
  • the second electronic device 100 may also notify the second user of the response timing by turning on or blinking the display unit 170 configured as an indicator such as an LED.
  • the second electronic device 100 may also notify the second user of the response timing by outputting a sound such as "You have been asked a question” or "It's your turn” from the audio output unit 160.
  • the second electronic device 100 may also notify the second user of the response timing by outputting a predetermined notification sound or the like from the audio output unit 160.
  • the second electronic device 100 may notify the second user of the response timing by outputting haptic information, such as a predetermined vibration, from the haptic sensation providing unit 190.
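A sketch of such a multimodal notification follows; the three callables are placeholder hooks standing in for the display unit 170, the audio output unit 160, and the tactile sensation providing unit 190, and are not actual device APIs.

```python
# Notify the second user that the response timing has arrived, through any
# combination of visual, auditory, and tactile channels. The three callables
# are placeholder hooks, not real device interfaces.
def notify_response_timing(show_text, play_sound, vibrate,
                           channels=("visual", "auditory", "tactile")):
    if "visual" in channels:
        show_text("It's your turn")          # or turn on / blink an LED indicator
    if "auditory" in channels:
        play_sound("notification_chime")     # or speak "You have been asked a question"
    if "tactile" in channels:
        vibrate(pattern_ms=[200, 100, 200])  # a predetermined vibration

notify_response_timing(print,
                       lambda s: print("sound:", s),
                       lambda pattern_ms: print("vibrate:", pattern_ms))
```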
  • the system according to one embodiment may transmit the response timing by prioritizing it over normal audio and/or video communication, for example. Because the transmission of the response timing is merely a notification of timing, it is considered that even if the response timing is prioritized over audio and/or video communication, it will have little effect on the audio and/or video communication. Furthermore, in the system according to one embodiment, the transmission of the response timing may be performed using, for example, a publish/subscribe server. Furthermore, in the system according to one embodiment, the transmission of the response timing may use a line separate from the line for normal audio and/or video communication.
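A minimal in-process sketch of carrying the timing notification over a publish/subscribe channel separate from the audio/video stream follows; the broker here is an in-memory stand-in, not any particular publish/subscribe server product.

```python
# In-memory publish/subscribe broker used to carry response-timing
# notifications on a channel separate from the audio/video stream.
# This is a stand-in for a real publish/subscribe server.
from collections import defaultdict

class PubSubBroker:
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self.subscribers[topic]:
            callback(message)

broker = PubSubBroker()
# The second electronic device 100 subscribes to timing notifications.
broker.subscribe("response_timing", lambda m: print("notify second user:", m))
# The third electronic device 300 publishes when the timing arrives.
broker.publish("response_timing", {"state": "on", "t": 12.3})
```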
  • the system according to one embodiment can inform the second user of the original response timing even if, for example, there is a delay in audio and/or video. Therefore, the second user can respond to the first user's comment at an appropriate time.
  • the system according to one embodiment reduces the number of cases where the first user becomes impatient for the second user's response, and also reduces the number of cases where the second user's response overlaps with the first user's next comment. Therefore, the system according to one embodiment can facilitate communication between multiple locations.
  • the above-mentioned response timing estimation may be performed by the estimation unit 14 of the first electronic device 1, the estimation unit 314 of the third electronic device 300, or the estimation unit 114 of the second electronic device 100.
  • the response timing estimation may be performed by at least one of the estimation unit 14 of the first electronic device 1, the estimation unit 114 of the second electronic device 100, and the estimation unit 314 of the third electronic device 300. In this case, among the estimation unit 14, the estimation unit 114, and the estimation unit 314, those that do not estimate the response timing may not be required components.
  • various determination processes related to the above-mentioned response timing estimation may be performed by the determination unit 12 of the first electronic device 1, the determination unit 312 of the third electronic device 300, or the determination unit 112 of the second electronic device 100.
  • the process related to correcting the estimated response timing may be performed by the adjustment unit 16 of the first electronic device 1, by the adjustment unit 316 of the third electronic device 300, or by the adjustment unit 116 of the second electronic device 100.
  • FIG. 7 is a sequence diagram illustrating the characteristic operations of the system according to one embodiment.
  • FIG. 7 is a diagram illustrating the exchange of data and the like between the first electronic device 1, the second electronic device 100, and the third electronic device 300.
  • the encoding and decoding of data described in FIG. 5 may use known technology. For this reason, the description of the encoding and decoding of data will be omitted in FIG. 7. Below, the description of the same or similar content as that already described in FIG. 5 may be simplified or omitted as appropriate.
  • first, the first electronic device 1 acquires at least one of the video and audio of the first user (e.g., at least one of participants Ma, Mb, Mc, and Md) (step S101).
  • the operation of step S101 may be the same as step S1 in FIG. 5.
  • the first electronic device 1 transmits video and/or audio data of the first user to the third electronic device 300 (step S102).
  • the operation of step S102 may be similar to step S3 in FIG. 5.
  • the third electronic device 300 transmits the video and/or audio data of the first user received from the first electronic device 1 to the second electronic device 100 (step S103).
  • the operation of step S103 may be similar to step S4 in FIG. 5.
  • when the second electronic device 100 receives the video and/or audio data of the first user from the third electronic device 300 in step S103, it presents at least one of the video and audio of the first user to the second user (e.g., participant Mg) (step S104).
  • the operation of step S104 may be the same as step S6 in FIG. 5.
  • when the third electronic device 300 receives the video and/or audio data of the first user from the first electronic device 1 in step S102, it estimates the response timing based on the video and/or audio data of the first user (step S105).
  • the response timing estimation performed in step S105 can be performed as described above.
  • the third electronic device 300 determines whether the time of the estimated response timing has arrived (step S106). If the time of the response timing has not arrived in step S106, the third electronic device 300 may wait until the time of the response timing arrives or may execute other processing. If the time of the response timing has arrived in step S106, the third electronic device 300 transmits information indicating the estimated response timing to the second electronic device 100 (steps S107 and S108).
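A sketch of this gating in steps S106 and S107 follows; the polling loop is a simplification, and a real implementation would more likely be event-driven.

```python
# Server-side gating on the third electronic device 300: wait until the
# estimated response timing arrives, then send the notification
# (steps S106-S107). Simplified polling loop for illustration.
import time

def wait_and_notify(estimated_timing_ts: float, send_to_second_device) -> None:
    while time.time() < estimated_timing_ts:              # step S106: time arrived yet?
        time.sleep(0.01)                                  # wait (or do other processing)
    send_to_second_device({"event": "response_timing"})   # step S107

wait_and_notify(time.time() + 0.05, print)
```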
  • when the second electronic device 100 receives the information indicating the response timing from the third electronic device 300 in step S108, it notifies the second user that it is time to respond (step S109). In step S109, the second electronic device 100 may present the response timing to the second user as at least one of visual information, auditory information, and tactile information, as described above. By presenting the response timing in this way, the second user can respond to the first user's speech at an appropriate time.
  • the second electronic device 100 acquires a response of the second user (e.g., participant Mg) to the speech of the first user (step S110).
  • the acquisition unit 200 of the second electronic device 100 may acquire the response of the second user to the speech of the first user.
  • the acquisition unit 200 of the second electronic device 100 may acquire, for example, input to at least one of the imaging unit 140 and the voice input unit 150 shown in FIG. 3.
  • the acquisition unit 200 may also acquire a mouse click or touch input by the user, or may acquire input to a motion sensor and/or a foot pedal.
  • the response of the second user may include, for example, moving the head back and forth or up and down (nodding), moving the head left and right (shaking), a hand gesture, a movement of the upper body, a facial expression, or a short utterance such as "yes," "no," or "ah."
  • the responses acquired by the second electronic device 100 are not limited to those described above.
  • the second electronic device 100 may acquire a combination of the above as the response of the second user.
  • the second electronic device 100 may acquire at least one of the video and the voice of the second user when acquiring the response of the second user.
  • the second electronic device 100 may acquire the response of the second user by, for example, performing image recognition on the acquired video and voice recognition on the acquired voice.
  • what the second electronic device 100 acquires is not limited to at least one of the video and the voice of the second user.
  • the response of the second user may be acquired by acquiring (detecting) the motion of the body of the second user, such as a nod.
  • the second electronic device 100 may be provided with a motion sensor.
  • the second electronic device 100 may be a wearable terminal worn by the second user, or a device held in the second user's hand, such as a mouse or a touch pen.
  • the second electronic device 100 may acquire the response of the second user by connecting to a smartphone, a tablet terminal, a foot pedal, or the like held by the second user in the hand, by wire or wirelessly.
  • the acquisition of the response of the second user is not limited to the above, and the response of the second user may be acquired by combining these.
  • an example of the correspondence between the information such as video and audio acquired by the second electronic device 100 and the detection method of the response of the second user is shown in Table 1 below.
  • the second electronic device 100 may acquire a head nod or a head shake as the response of the second user, for example, by performing image recognition on the acquired video.
  • the second electronic device 100 may acquire a head nod action as the response of the second user when a positive word is detected by performing voice recognition on the acquired voice, for example.
  • the second electronic device 100 may acquire a head shake action as the response of the second user when a negative word is detected by performing voice recognition on the acquired voice, for example.
  • the second electronic device 100 may connect to a wearable terminal such as headphones equipped with a motion sensor worn by the second user. In this case, the second electronic device 100 may acquire a head nod or a head shake action of the second user detected by the wearable terminal as the response of the second user.
  • the second electronic device 100 may be, for example, a handheld device such as a smartphone or tablet equipped with a motion sensor.
  • the second electronic device 100 may be tilted back and forth by the second user, and may acquire a head nod action associated with this action as the second user's response.
  • the second electronic device 100 may detect a head nod, for example, by clicking a mouse.
  • the second electronic device 100 may display, on the display unit 170, a GUI in which a button corresponding to a head nod is set, and may acquire the second user's response by the second user clicking a mouse button.
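The correspondences described above can be sketched as a lookup from input modality to detected response; the entries paraphrase the examples above for illustration and are not a reproduction of Table 1.

```python
# Map each input modality of the second electronic device 100 to the response
# it yields, paraphrasing the examples above (illustrative, not Table 1 itself).
def detect_response(modality: str, observation: str) -> str | None:
    rules = {
        ("video", "head moved up and down"): "nod",
        ("video", "head moved left and right"): "head shake",
        ("voice", "positive word"): "nod",
        ("voice", "negative word"): "head shake",
        ("motion_sensor", "device tilted back and forth"): "nod",
        ("mouse", "nod button clicked"): "nod",
    }
    return rules.get((modality, observation))   # None when nothing matches

print(detect_response("voice", "positive word"))   # nod
```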
  • the second electronic device 100 transmits the acquired data such as video and/or audio of the second user to the third electronic device 300 (step S111).
  • the operation of step S111 may be the same as step S13 in FIG. 5.
  • the data transmitted from the second electronic device 100 to the third electronic device 300 may include data indicating body movements such as nodding that correspond to the response of the second user.
  • the third electronic device 300 transmits data such as video and/or audio of the second user received from the second electronic device 100 to the first electronic device 1 (step S112).
  • the operation of step S112 may be the same as step S14 in FIG. 5.
  • when the first electronic device 1 receives data such as the video and/or audio of the second user from the third electronic device 300 in step S112, it presents at least one of the video and audio of the second user to the first user (e.g., participant Ma) (step S113).
  • the operation of step S113 may be the same as step S16 in FIG. 5.
  • if the first electronic device 1 receives data indicating a body movement of the second user, such as a nod, in step S113, it may reproduce the body movement of the second user by, for example, driving the power unit 80, or by displaying it on the display unit 70.
  • in this way, the first user can receive the second user's response to his/her own speech at an appropriate time. Therefore, according to the system of one embodiment, communication between multiple locations can be facilitated.
  • the first electronic device 1 and the second electronic device 100 communicate with each other via the third electronic device 300.
  • the above-mentioned operation may be performed without the third electronic device 300.
  • the first electronic device 1 and the second electronic device 100 may be configured to be able to communicate with each other directly or indirectly.
  • the third electronic device 300 estimates the response timing in advance in step S105, and when the time of the response timing arrives, transmits information indicating the response timing to the second electronic device 100.
  • the system according to an embodiment may not be limited to such a configuration.
  • a system according to a modified example of an embodiment will be further described below.
  • FIG. 8 is a sequence diagram that explains the characteristic operations of a system according to a modified example of the embodiment shown in FIG. 7. Below, only the differences from the operations shown in FIG. 7 will be explained.
  • the operations from step S101 to step S105 may be the same as those in FIG. 7.
  • the third electronic device 300 may transmit information indicating the response timing to the second electronic device 100 even before the time of the response timing arrives (steps S121 and S122).
  • the second electronic device 100 that has received the information indicating the response timing determines whether the time of the estimated response timing has arrived (step S123). If the time of the response timing has not arrived in step S123, the second electronic device 100 may wait until the time of the response timing arrives or may execute other processing. If the time of the response timing arrives in step S123, the second electronic device 100 notifies the second user that it is time to respond (step S109).
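In this modified example the gating moves to the client side; the following sketch mirrors the server-side loop shown earlier, with the same simplifications.

```python
# Client-side gating on the second electronic device 100: the timing info
# arrives early (steps S121-S122); the device itself waits for the estimated
# time before notifying the user (steps S123 and S109). Simplified polling.
import time

def schedule_notification(estimated_timing_ts: float, notify_user) -> None:
    while time.time() < estimated_timing_ts:   # step S123: time arrived yet?
        time.sleep(0.01)                       # wait (or do other processing)
    notify_user("It's your turn")              # step S109

schedule_notification(time.time() + 0.05, print)
```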
  • the operations from step S110 to step S113 may be the same as those in FIG. 7.
  • in this modified example as well, the first user can receive the second user's response to his/her own speech at an appropriate time.
  • in the above description, the second electronic device 100 detects the response of the second user.
  • however, the first electronic device 1 and/or the third electronic device 300 may detect the response of the second user.
  • the system may include, for example, a first electronic device 1, a second electronic device 100, and a third electronic device 300.
  • the first electronic device 1 acquires at least one of a video and a voice of the first user.
  • the second electronic device 100 may be configured to be able to communicate with the first electronic device 1.
  • the second electronic device 100 outputs at least one of a video and a voice of the first user acquired by the first electronic device 1 to a second user who responds to the speech of the first user.
  • the third electronic device 300 may include a control unit 310 and an estimation unit 314.
  • the estimation unit 314 may estimate the response timing of the second user who responds to the speech of the first user based on at least one of a video and a voice of the first user.
  • the control unit 310 may control the second electronic device 100 to acquire information indicating the response timing estimated by the estimation unit 314.
  • the second electronic device 100 may include a presentation unit.
  • the presentation unit of the second electronic device 100 may present the response timing to the second user as at least one of visual information, auditory information, and tactile information.
  • the presentation unit of the second electronic device 100 may be, for example, at least one of the display unit 170, the audio output unit 160, and the tactile sensation presentation unit 190 shown in FIG. 3.
  • the presentation unit of the second electronic device 100 may present the response timing to the second user when the response timing is reached.
  • the second electronic device 100 may include an acquisition unit 200.
  • the acquisition unit 200 of the second electronic device 100 may acquire the response of the second user as at least one of video and audio.
  • the acquisition unit 200 of the second electronic device 100 may be, for example, at least one of the imaging unit 140 and the audio input unit 150 shown in FIG. 3.
  • the acquisition unit 200 of the second electronic device 100 may acquire input to at least one of the imaging unit 140 and the audio input unit 150 shown in FIG. 3.
  • the acquisition unit 200 may acquire a mouse click or touch input by the user, or may acquire input to a motion sensor and/or a foot pedal, etc.
  • the second electronic device 100 may also include a communication unit 130.
  • the communication unit 130 may transmit at least one of the video and audio acquired by the acquisition unit to the first electronic device 1.
  • the control unit 310 of the third electronic device 300 may, for example, perform control so as to transmit information indicating the response timing estimated by the estimation unit 314 to the second electronic device 100 before the response timing is reached (i.e., in advance).
  • the second electronic device 100 may include an acquisition unit 200 that acquires a response of the second user corresponding to a predetermined action of the second user.
  • the second electronic device 100 may also include a communication unit 130 that transmits data indicated by the response of the second user to the first electronic device 1.
  • the first electronic device 1 may also include a power unit 80 that drives at least a part of the housing of the first electronic device 1 based on the data indicating the response of the second user.
  • the estimation unit 314 of the third electronic device 300 may estimate the response timing.
  • the estimation unit 314 may estimate the response timing based on at least one of the voice characteristics of the first user and the language characteristics of the first user extracted from at least one of the video and audio of the first user acquired by the first electronic device 1.
  • the estimation unit 314 may also estimate the response timing based on at least one of the facial expression characteristics of the first user and the gestures of the first user extracted from at least one of the video and audio of the first user acquired by the first electronic device 1.
  • the estimation unit 314 of the third electronic device 300 may estimate the response timing by predicting the timing at which the first user's current utterance will end and the timing at which the first user's next utterance will start.
  • if the presentation of the second user's response is delayed, the response may overlap with the start of the first user's next speech. In that case, the first user, who has started the next utterance, may break off mid-speech, which may impair communication. Therefore, for example, in step S113 of FIG. 7 or FIG. 8, if the remaining time until the end of the response timing is short when presenting the video and/or audio of the second user, the first electronic device 1 may refrain from presenting the video and/or audio of the second user. Similarly, the first electronic device 1 may refrain from causing the power unit 80 to drive at least a part of the housing of the first electronic device 1 based on data indicating the response of the second user.
  • the third electronic device 300 may also transmit the response timing estimated by the estimation unit 314 to the first electronic device 1. Then, for example, in step S113 of FIG. 7 or FIG. 8, the determination unit 12 of the first electronic device 1 may determine whether the remaining time of the response timing is shorter than a predetermined time. When the remaining response time is shorter than a predetermined time, the first electronic device 1 may be configured not to present the video and/or audio of the second user. Furthermore, when the remaining response time is shorter than a predetermined time, the first electronic device 1 may be configured not to cause the power unit 80 to drive at least a part of the housing of the first electronic device 1 based on data indicating the response of the second user.
  • the first electronic device 1 may include a determination unit 12.
  • the determination unit 12 may determine whether or not to present at least one of the video and audio of the second user acquired from the second electronic device 100 to the first user based on the remaining time until the end of the response timing.
  • the determination unit 12 may determine whether or not to cause the power unit 80 to drive at least a part of the housing of the first electronic device 1 based on the remaining time until the end of the response timing.
  • in this way, the response is prevented from overlapping with the start of the first user's next utterance. Therefore, the first user who has started the next utterance is not interrupted, allowing for smooth communication.
  • alternatively, the first electronic device 1 may refrain from presenting the image and/or voice of the second user while the first user is speaking, rather than when the remaining response time is shorter than the predetermined time.
  • the determination unit 12 of the first electronic device 1 may determine whether or not the speech of the first user is detected by the voice input unit 50 in step S113 of FIG. 7 or FIG. 8. If the speech of the first user is detected, the first electronic device 1 may not present the image and/or voice of the second user.
  • similarly, the first electronic device 1 may refrain from causing the power unit 80 to drive at least a part of the housing of the first electronic device 1 based on the data indicating the response of the second user while the first user is speaking, rather than when the remaining response time is shorter than the predetermined time.
  • the determination unit 12 of the first electronic device 1 may determine whether or not to present at least one of the video and audio of the second user acquired from the second electronic device 100 to the first user, depending on whether or not the first electronic device 1 detects the audio of the first user. Furthermore, the determination unit 12 may determine whether or not to cause the power unit 80 to drive at least a part of the housing of the first electronic device 1, depending on whether or not the first electronic device 1 detects the audio of the first user.
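A sketch of this gating decision by the determination unit 12 follows, covering both the remaining-time condition and the speaking condition; the threshold value is an assumption.

```python
# Gate the presentation of the second user's response by the time remaining
# in the response window and by whether the first user is speaking
# (determination unit 12). The threshold is an illustrative assumption.
def should_present_response(remaining_s: float, first_user_speaking: bool,
                            min_remaining_s: float = 0.5) -> bool:
    if first_user_speaking:                  # variant: suppress while the first user talks
        return False
    return remaining_s >= min_remaining_s    # suppress when the window is nearly over

print(should_present_response(2.0, False))  # True: enough time left
print(should_present_response(0.2, False))  # False: window nearly closed
print(should_present_response(2.0, True))   # False: first user is speaking
```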
  • the first electronic device 1 may execute an operation that suggests to the first user that the response timing will be extended, instead of simply not presenting the image and/or voice of the second user. For example, if the remaining time until the end of the response timing is short when presenting the image and/or voice of the second user, the first electronic device 1 may notify the first user that the second user is about to speak. Likewise, if the remaining time until the end of the response timing is short when the power unit 80 is to drive at least a part of the housing of the first electronic device 1 based on data indicating the response of the second user, the first electronic device 1 may notify the first user that the second user is about to speak.
  • in this case, the first electronic device 1 may output a voice such as a filler word of the second user, for example "hmm" or "um," from the voice output unit 60. The first electronic device 1 may also use characters or images on the display unit 70 to show that the second user is about to speak, or may express this by driving the power unit 80. By suggesting to the first user that the response timing will be extended before presenting the image and/or audio of the second user, the risk that the second user's response will overlap with the first user's next utterance is reduced.
  • the first electronic device 1 may include a control unit 10.
  • the control unit 10 may control the first electronic device 1 to perform an operation suggesting to the first user that the response timing be extended.
  • the control unit 10 may perform such control when the remaining time until the end of the response timing is less than or equal to a predetermined time when at least one of the video and audio of the second user acquired from the second electronic device 100 is presented to the first user.
  • the control unit 10 may also perform such control when the remaining time until the end of the response timing is less than or equal to a predetermined time when the power unit 80 drives at least a part of the housing of the first electronic device 1 based on data indicating the response of the second user.
  • the third electronic device 300 may adjust the response timing estimated by the estimation unit 314 based on the timing when the second user responded to the first user's speech in the past.
  • the adjustment unit 316 of the third electronic device 300 may adjust the response timing estimated by the estimation unit 314 based on the timing when the second user responded to the first user's speech in the past. For example, if the response timing estimated by the estimation unit 314 is too early, the adjustment unit 316 may delay the response timing depending on the degree to which the response timing is determined to be too early. Also, if the response timing estimated by the estimation unit 314 is too late, the adjustment unit 316 may advance the response timing depending on the degree to which the response timing is determined to be too late.
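A sketch of this correction follows, shifting the estimate by the mean signed offset between past estimated timings and the second user's actual response times; the averaging window is an assumption.

```python
# Adjust the estimated response timing using the signed offsets between past
# estimates and the second user's actual past response times (adjustment
# unit 316). The averaging window is an illustrative assumption.
def adjusted_timing(estimated_ts: float, past_offsets_s: list[float]) -> float:
    if not past_offsets_s:
        return estimated_ts
    recent = past_offsets_s[-10:]             # look at the last few exchanges
    mean_offset = sum(recent) / len(recent)   # > 0 means estimates were too early
    return estimated_ts + mean_offset         # delay or advance accordingly

# Past responses arrived on average 0.4 s after the estimated timing,
# so the estimate was too early; push the next one later.
print(adjusted_timing(100.0, [0.5, 0.3, 0.4]))    # 100.4
```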
  • the system according to the modified example of the embodiment may include an adjustment unit 316.
  • the adjustment unit 316 may adjust the response timing estimated by the estimation unit 314 based on the timing at which the second user responded to the speech of the first user in the past.
  • the function of the adjustment unit 316 provided in the third electronic device 300 may be realized, for example, by the adjustment unit 116 provided in the second electronic device 100, or by the adjustment unit 16 provided in the first electronic device 1.
  • the time at which the response timing is presented may be adjusted, rather than adjusting the estimated response timing.
  • the adjustment unit 116 of the second electronic device 100 may adjust the time at which the response timing estimated by the estimation unit 314 is presented, based on the timing at which the second user responded to the speech of the first user in the past. For example, if the response timing estimated by the estimation unit 314 is too early, the adjustment unit 116 may delay the time at which the response timing is presented, depending on the degree to which the response timing is determined to be too early. Also, if the response timing estimated by the estimation unit 314 is too late, the adjustment unit 116 may advance the time at which the response timing is presented, depending on the degree to which the response timing is determined to be too late.
  • the second electronic device 100 may include an adjustment unit 116.
  • the adjustment unit 116 may adjust the time at which the response timing is presented to the second user based on the timing at which the second user responded to the speech of the first user in the past.
  • the function of the adjustment unit 116 included in the second electronic device 100 may be realized, for example, by an adjustment unit 316 included in the third electronic device 300, or may be realized by an adjustment unit 16 included in the first electronic device 1.
  • the embodiments of the present disclosure can also be realized as a method, as a program executed by a processor or the like included in the device, or as a storage medium or recording medium on which such a program is recorded. It should be understood that these are also included in the scope of the present disclosure.
  • the above-described embodiments are not limited to implementation as a system.
  • the above-described embodiments may be implemented as a control method for a system, or as a program executed in a system.
  • the above-described embodiments may be implemented as at least one of the first electronic device 1, the second electronic device 100, and the third electronic device 300.
  • the above-described embodiments may be implemented as a control method for at least one of the first electronic device 1, the second electronic device 100, and the third electronic device 300.
  • the above-described embodiments may be implemented as a program executed by at least one of the first electronic device 1, the second electronic device 100, and the third electronic device 300, or as a storage medium or recording medium on which the program is recorded.
  • the above-described embodiment may be implemented as the second electronic device 100.
  • the second electronic device 100 may be configured to be able to communicate with the first electronic device 1.
  • the second electronic device 100 may include an acquisition unit, an output unit, an estimation unit, and a presentation unit.
  • the acquisition unit may acquire at least one of an image and a voice of the user of the first electronic device 1.
  • the acquisition unit may be, for example, at least one of the imaging unit 140 and the voice input unit 150 shown in FIG. 3.
  • the output unit may output at least one of an image and a voice of the user of the first electronic device 1 to the user of the second electronic device 100 who responds to the speech of the user of the first electronic device 1.
  • the output unit may be, for example, at least one of the voice output unit 160 and the display unit 170 shown in FIG. 3.
  • the estimation unit may estimate the response timing of the user of the second electronic device 100 who responds to the speech of the user of the first electronic device 1 based on at least one of the image and the voice of the user of the first electronic device 1.
  • the estimation unit may be, for example, the estimation unit 114 shown in FIG. 3.
  • the presentation unit may present information indicating the response timing estimated by the estimation unit.
  • the presentation unit may be, for example, at least one of the audio output unit 160, the display unit 170, and the haptic sensation providing unit 190 shown in FIG. 3.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Databases & Information Systems (AREA)
  • Telephonic Communication Services (AREA)

Abstract

This system includes a first electronic device, a second electronic device, an estimation unit, and a control unit. The first electronic device acquires at least one from among video and audio of a first user. The second electronic device is configured so as to be capable of communicating with the first electronic device, and the at least one from among video and audio of the first user acquired by the first electronic device is output to a second user responding to speech of the first user. The estimation unit estimates, on the basis of the at least one from among video and audio of the first user, the response timing of the second user responding to speech of the first user. The control unit causes information indicating the response timing estimated by the estimation unit to be acquired by the second electronic device.

Description

SYSTEM, ELECTRONIC DEVICE, SYSTEM CONTROL METHOD, AND PROGRAM - Patent application

CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to patent application No. 2022-156837, filed in Japan on September 29, 2022, the entire disclosure of which is incorporated herein by reference.
This disclosure relates to a system, an electronic device, a method for controlling the system, and a program.
In recent years, so-called remote conferences, such as web conferences or video conferences, have become more common. In remote conferences, electronic devices (or systems including electronic devices) are used to enable communication between participants in multiple locations. For example, consider a situation in which a conference is held in an office, and at least one of the conference participants joins the conference remotely from his or her home. In this case, audio and/or video of the conference in the office is acquired by, for example, an electronic device installed in the office, and transmitted to, for example, an electronic device installed in the participant's home. Also, audio and/or video at the participant's home is acquired by, for example, an electronic device installed in the participant's home, and transmitted to, for example, an electronic device installed in the office. Such electronic devices allow a conference to be held without all participants gathering in the same place.
Various technologies that can be applied to remote conferences such as those described above have been proposed. For example, Patent Document 1 discloses a device that displays a graphic representing the output range of directional sound output by a speaker, superimposed on an image captured by a camera. This device makes it possible to visually grasp the output range of directional sound. Furthermore, for example, Patent Document 2 discloses a system in which, when a speaker and a listener in separate locations are engaged in a conversation, a listener robot is attached to the speaker's side, and a speaker robot is attached to the listener's side.
Patent Document 1: JP 2010-21705 A
Patent Document 2: JP 2000-349920 A
A system according to one embodiment includes:
a first electronic device that acquires at least one of video and audio of a first user;
a second electronic device that is configured to be able to communicate with the first electronic device and that outputs at least one of the video and audio of the first user acquired by the first electronic device to a second user who responds to speech of the first user;
an estimation unit that estimates a response timing of the second user responding to the speech of the first user based on at least one of the video and audio of the first user; and
a control unit that causes the second electronic device to acquire information indicating the response timing estimated by the estimation unit.
An electronic device according to one embodiment is an electronic device configured to be able to communicate with another electronic device, and includes:
an acquisition unit that acquires at least one of video and audio of a user of the other electronic device;
an output unit that outputs at least one of the video and audio of the user of the other electronic device to a user of the electronic device who responds to speech of the user of the other electronic device;
an estimation unit that estimates a response timing of the user of the electronic device responding to the speech of the user of the other electronic device based on at least one of the video and audio of the user of the other electronic device; and
a presentation unit that presents information indicating the response timing estimated by the estimation unit.
A method for controlling a system according to one embodiment includes the steps of:
acquiring, by a first electronic device, at least one of video and audio of a first user;
outputting, by a second electronic device configured to be able to communicate with the first electronic device, at least one of the video and audio of the first user acquired by the first electronic device to a second user who responds to speech of the first user;
estimating a response timing of the second user responding to the speech of the first user based on at least one of the video and audio of the first user; and
causing the second electronic device to acquire information indicating the response timing.
A program according to one embodiment causes a computer to execute the steps of:
acquiring, by a first electronic device, at least one of video and audio of a first user;
outputting, by a second electronic device configured to be able to communicate with the first electronic device, at least one of the video and audio of the first user acquired by the first electronic device to a second user who responds to speech of the first user;
estimating a response timing of the second user responding to the speech of the first user based on at least one of the video and audio of the first user; and
causing the second electronic device to acquire information indicating the response timing.
FIG. 1 is a diagram illustrating an example of a usage mode of a system according to an embodiment.
FIG. 2 is a functional block diagram schematically illustrating the configuration of a first electronic device according to an embodiment.
FIG. 3 is a functional block diagram schematically illustrating the configuration of a second electronic device according to an embodiment.
FIG. 4 is a functional block diagram schematically illustrating the configuration of a third electronic device according to an embodiment.
FIG. 5 is a sequence diagram illustrating a basic operation of a system according to an embodiment.
FIG. 6 is a diagram illustrating response timing according to an embodiment.
FIG. 7 is a sequence diagram illustrating the operation of a system according to an embodiment.
FIG. 8 is a sequence diagram illustrating the operation of a system according to an embodiment.
In this disclosure, an "electronic device" may be, for example, a device that is driven by power supplied from a power system or a battery. In this disclosure, a "system" may include, for example, at least an electronic device. In this disclosure, a "user" may be a person (typically a human) who uses or may use an electronic device according to an embodiment, or a person who uses or may use a system including an electronic device according to an embodiment. In addition, in this disclosure, a conference in which at least one participant participates by communication from a different location than the other participants, such as a web conference or video conference, is collectively referred to as a "remote conference."
Further improvements in functionality are desired for electronic devices that enable communication between multiple locations in remote conferences and the like, for example to facilitate communication. The purpose of the present disclosure is to provide a system, an electronic device, a system control method, and a program that facilitate communication between multiple locations. According to one embodiment, it is possible to provide a system, an electronic device, a system control method, and a program that facilitate communication between multiple locations. Below, a system including an electronic device according to one embodiment is described in detail with reference to the drawings.
FIG. 1 is a diagram showing an example of how a system according to an embodiment is used. The following description assumes a situation in which participant Mg remotely participates in a conference held in conference room MR from his/her home RL, as shown in FIG. 1. As shown in FIG. 1, participants Ma, Mb, Mc, and Md participate in the conference in conference room MR. The participants of the conference in conference room MR are not limited to participants Ma, Mb, Mc, and Md, and may include, for example, other participants. Furthermore, participants other than participant Mg may also remotely participate in the conference from their respective homes.
As shown in FIG. 1, the system according to an embodiment may include, for example, a first electronic device 1, a second electronic device 100, and a third electronic device 300. In FIG. 1, the first electronic device 1, the second electronic device 100, and the third electronic device 300 are each shown only in schematic form. The system according to an embodiment need not include at least one of the first electronic device 1, the second electronic device 100, and the third electronic device 300, and may include devices other than the electronic devices mentioned above.
The first electronic device 1 according to one embodiment may be installed in the conference room MR. Meanwhile, the second electronic device 100 capable of communicating with the first electronic device 1 according to one embodiment may be installed in the home RL of the participant Mg. The location of the home RL of the participant Mg may be different from the location of the conference room MR. The home RL of the participant Mg may be far away from the conference room MR, or may be close to it.
As shown in FIG. 1, the first electronic device 1 according to an embodiment is connected to the second electronic device 100 according to an embodiment, for example via a network N. Also, as shown in FIG. 1, the third electronic device 300 according to an embodiment may be connected to at least one of the first electronic device 1 and the second electronic device 100, for example via the network N. The first electronic device 1 according to an embodiment may be connected to the second electronic device 100 by at least one of a wireless and a wired connection. The third electronic device 300 according to an embodiment may be connected to at least one of the first electronic device 1 and the second electronic device 100 by at least one of a wireless and a wired connection. In FIG. 1, the manner in which the first electronic device 1, the second electronic device 100, and the third electronic device 300 are connected wirelessly and/or by wire via the network N is indicated by dashed lines. In one embodiment, the first electronic device 1 and the second electronic device 100 may be included in a remote conference system according to the embodiment. The third electronic device 300 may also be included in the remote conference system according to the embodiment.
In the present disclosure, the network N shown in FIG. 1 may include various electronic devices and/or devices such as servers, as appropriate. The network N shown in FIG. 1 may also include devices such as base stations and/or repeaters, as appropriate. In the present disclosure, when, for example, the first electronic device 1 and the second electronic device 100 "communicate", the first electronic device 1 and the second electronic device 100 may communicate directly, or may communicate via at least one of another device, such as the third electronic device 300, and/or a base station. More specifically, when the first electronic device 1 and the second electronic device 100 "communicate", the communication unit of the first electronic device 1 and the communication unit of the second electronic device 100 may communicate with each other.
The above notation carries the same intent not only when the first electronic device 1 and the second electronic device 100 "communicate" with each other, but also when one "transmits" information to the other and/or when the other "receives" information transmitted by the one. Furthermore, it carries the same intent not only when the first electronic device 1 and the second electronic device 100 "communicate" with each other, but also when any electronic device, including the third electronic device 300, communicates with any other electronic device.
The first electronic device 1 according to one embodiment may be arranged in the conference room MR, for example as shown in FIG. 1. In this case, the first electronic device 1 may be arranged in a position where it can acquire the voice and/or video of at least one of the conference participants Ma, Mb, Mc, and Md. Furthermore, the first electronic device 1 outputs the voice and/or video of participant Mg, as described below. Therefore, the first electronic device 1 may be arranged so that the voice and/or video of participant Mg output from the first electronic device 1 reaches at least one of the conference participants Ma, Mb, Mc, and Md.
The second electronic device 100 according to one embodiment may be arranged in the home RL of participant Mg, for example in the manner shown in FIG. 1. In this case, the second electronic device 100 may be arranged in a position where it can acquire the voice and/or video of participant Mg. The second electronic device 100 may acquire the voice and/or video of participant Mg through a microphone or headset and/or a camera connected to the second electronic device 100.
The second electronic device 100 also outputs the audio and/or video of at least one of the participants Ma, Mb, Mc, and Md in the conference room MR, as described below. For this reason, the second electronic device 100 may be positioned so that the audio and/or video output from the second electronic device 100 reaches participant Mg. The audio output from the second electronic device 100 may be delivered to the ears of participant Mg via, for example, headphones, earphones, a speaker, or a headset.
The third electronic device 300 may be, for example, a device such as a server that relays between the first electronic device 1 and the second electronic device 100. The system according to one embodiment need not include the third electronic device 300.
FIG. 1 shows merely one example of how the first electronic device 1, the second electronic device 100, and the third electronic device 300 according to an embodiment may be used. The first electronic device 1, the second electronic device 100, and the third electronic device 300 according to an embodiment may be used in various other ways.
The remote conference system including the first electronic device 1 and the second electronic device 100 shown in FIG. 1 allows participant Mg to behave, while staying at home RL, as if he or she were participating in the conference held in the conference room MR. It also allows the conference participants Ma, Mb, Mc, and Md to feel as if participant Mg were actually present at the conference held in the conference room MR. That is, in the remote conference system including the first electronic device 1 and the second electronic device 100, the first electronic device 1 arranged in the conference room MR can play a role like an avatar of participant Mg. In this case, the first electronic device 1 may function as a physical avatar representing participant Mg (for example, a telepresence robot). The first electronic device 1 may also function as a virtual avatar that displays an image of participant Mg, or an image of participant Mg rendered as a character, on the first electronic device 1.
Next, the functional configurations of the first electronic device 1, the second electronic device 100, and the third electronic device 300 according to one embodiment will be described.
FIG. 2 is a block diagram schematically showing the functional configuration of the first electronic device 1 shown in FIG. 1. An example of the configuration of the first electronic device 1 according to an embodiment will be described below. As shown in FIG. 1, the first electronic device 1 may be a device used in the conference room MR by, for example, participants Ma, Mb, Mc, and Md. The second electronic device 100 described later has a function of outputting the voice and/or video of participant Mg, acquired by the second electronic device 100 when participant Mg speaks, to the first electronic device 1. The first electronic device 1 likewise has a function of outputting the voice and/or video of participants Ma, Mb, Mc, Md, and so on, acquired by the first electronic device 1 when they speak, to the second electronic device 100. The first electronic device 1 allows participants Ma, Mb, Mc, and Md to hold a remote conference or video conference in the conference room MR even if participant Mg is in a distant location. Accordingly, the first electronic device 1 is also referred to, as appropriate, as the electronic device "used locally".
Various devices can be envisaged as the first electronic device 1 according to one embodiment; for example, it may be a specially designed device. For example, the first electronic device 1 according to one embodiment may have a housing whose exterior bears an illustration of a person or the like, or a housing shaped to resemble at least part of a person or shaped like a robot. The first electronic device 1 according to one embodiment may also be a device such as a general-purpose smartphone, tablet, phablet, notebook computer (notebook PC or laptop), or desktop computer. The first electronic device 1 according to one embodiment may, for example, render at least part of a person or a robot on the display of a notebook PC.
As shown in FIG. 2, the first electronic device 1 according to one embodiment may include a control unit 10, a storage unit 20, a communication unit 30, an imaging unit 40, an audio input unit 50, an audio output unit 60, a display unit 70, and a power unit 80. The control unit 10 may include, for example, a determination unit 12, an estimation unit 14, and an adjustment unit 16. In one embodiment, the first electronic device 1 need not include at least some of the functional units shown in FIG. 2, and may include components other than the functional units shown in FIG. 2.
The control unit 10 controls and/or manages the first electronic device 1 as a whole, including the functional units constituting the first electronic device 1. The control unit 10 may include at least one processor, such as a CPU (Central Processing Unit) or a DSP (Digital Signal Processor), to provide the control and processing power for executing various functions. The control unit 10 may be realized collectively by a single processor, by several processors, or by individual processors for each function. The processor may be realized as a single integrated circuit (IC), or as a plurality of communicatively connected integrated circuits and discrete circuits. The processor may be realized based on various other known technologies.
The control unit 10 may include one or more processors and memories. The processors may include a general-purpose processor that loads a specific program to execute a specific function, and a dedicated processor specialized for specific processing. The dedicated processor may include an application-specific integrated circuit (ASIC). The processors may include a programmable logic device (PLD), and the PLD may include a field-programmable gate array (FPGA). The control unit 10 may be either an SoC (System-on-a-Chip) or a SiP (System In a Package) in which one or more processors cooperate. The control unit 10 controls the operation of each component of the first electronic device 1.
The control unit 10 may be configured to include, for example, at least one of software and hardware resources. In the first electronic device 1 according to one embodiment, the control unit 10 may be realized by concrete means in which software and hardware resources cooperate. Furthermore, in the first electronic device 1 according to one embodiment, at least one of the other functional units may likewise be realized by concrete means in which software and hardware resources cooperate.
Operations such as the control performed by the control unit 10 in the first electronic device 1 according to one embodiment will be described further below. The determination unit 12 of the control unit 10 can perform various determination processes, the estimation unit 14 can perform various estimation processes, and the adjustment unit 16 can perform various adjustment processes.
The storage unit 20 may function as a memory that stores various types of information. The storage unit 20 may store, for example, programs executed by the control unit 10 and the results of processes executed by the control unit 10. The storage unit 20 may also function as a work memory for the control unit 10. As shown in FIG. 2, the storage unit 20 may be connected to the control unit 10 by wire and/or wirelessly. The storage unit 20 may include, for example, at least one of a RAM (Random Access Memory) and a ROM (Read Only Memory). The storage unit 20 can be configured by, for example, a semiconductor memory, but is not limited thereto and can be any storage device. For example, the storage unit 20 may be a storage medium such as a memory card inserted into the first electronic device 1 according to one embodiment. The storage unit 20 may also be an internal memory of the CPU used as the control unit 10, or may be connected to the control unit 10 as a separate unit.
The communication unit 30 has an interface function for communicating with, for example, external devices wirelessly and/or by wire. The communication method used by the communication unit 30 in one embodiment may be a wireless communication standard. For example, wireless communication standards include cellular phone communication standards such as 2G, 3G, 4G, and 5G. Cellular phone communication standards include, for example, LTE (Long Term Evolution), W-CDMA (Wideband Code Division Multiple Access), CDMA2000, PDC (Personal Digital Cellular), GSM (registered trademark) (Global System for Mobile communications), and PHS (Personal Handy-phone System). Wireless communication standards also include, for example, WiMAX (Worldwide Interoperability for Microwave Access), IEEE 802.11, WiFi, Bluetooth (registered trademark), IrDA (Infrared Data Association), and NFC (Near Field Communication). The communication unit 30 may include, for example, a modem whose communication method is standardized by the ITU-T (International Telecommunication Union Telecommunication Standardization Sector). The communication unit 30 can support one or more of the above communication standards.
The communication unit 30 may be configured to include, for example, an antenna for transmitting and receiving radio waves and a suitable RF unit. The communication unit 30 may wirelessly communicate with the communication unit of another electronic device, for example via the antenna. The communication unit 30 may have a function of transmitting arbitrary information from the first electronic device 1 to other devices and/or a function of receiving, at the first electronic device 1, arbitrary information from other devices. For example, the communication unit 30 may wirelessly communicate with the second electronic device 100 shown in FIG. 1; in this case, it may wirelessly communicate with a communication unit 130 (described later) of the second electronic device 100. Thus, in one embodiment, the communication unit 30 has a function of communicating with the second electronic device 100. The communication unit 30 may also wirelessly communicate with the third electronic device 300 shown in FIG. 1; in this case, it may wirelessly communicate with a communication unit 330 (described later) of the third electronic device 300. Thus, in one embodiment, the communication unit 30 may have a function of communicating with the third electronic device 300. The communication unit 30 may also be configured as an interface, such as a connector, for a wired connection to the outside. Since the communication unit 30 can be configured using known wireless communication technology, a more detailed description of the hardware and the like is omitted.
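By way of illustration only, the transmitting and receiving roles of the communication unit described above might be pictured as in the following sketch. The disclosure specifies no transport, framing, or programming interface, so the use of TCP, the length-prefixed JSON framing, and all names here (CommunicationUnit, send, receive) are assumptions introduced solely for illustration.

```python
import json
import socket
import struct

class CommunicationUnit:
    """Illustrative stand-in for communication units such as 30, 130, and 330.

    Exchanges length-prefixed JSON messages over a TCP socket. This is a
    sketch only; the disclosure specifies no concrete protocol.
    """

    def __init__(self, sock: socket.socket):
        self.sock = sock

    def send(self, message: dict) -> None:
        # Serialize the message and prepend a 4-byte big-endian length header.
        payload = json.dumps(message).encode("utf-8")
        self.sock.sendall(struct.pack(">I", len(payload)) + payload)

    def receive(self) -> dict:
        # Read the length header, then exactly that many payload bytes.
        (length,) = struct.unpack(">I", self._read_exactly(4))
        return json.loads(self._read_exactly(length).decode("utf-8"))

    def _read_exactly(self, n: int) -> bytes:
        buf = b""
        while len(buf) < n:
            chunk = self.sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("peer closed the connection")
            buf += chunk
        return buf
```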
As shown in FIG. 2, the communication unit 30 may be connected to the control unit 10 by wire and/or wirelessly. Various pieces of information received by the communication unit 30 may be supplied to, for example, the storage unit 20 and/or the control unit 10, or may be stored in, for example, a memory built into the control unit 10. The communication unit 30 may also transmit, for example, the results of processing by the control unit 10 and/or information stored in the storage unit 20 to the outside.
The imaging unit 40 may be configured to include an image sensor that electronically captures images, such as a digital camera. The imaging unit 40 may include an imaging element that performs photoelectric conversion, such as a CCD (Charge Coupled Device) image sensor or a CMOS (Complementary Metal Oxide Semiconductor) sensor. The imaging unit 40 can capture, for example, images of the surroundings of the first electronic device 1, such as the interior of the conference room MR shown in FIG. 1. In one embodiment, the imaging unit 40 may capture images of the participants Ma, Mb, Mc, and Md of the conference held in the conference room MR shown in FIG. 1.
The imaging unit 40 may convert a captured image into a signal and transmit it to the control unit 10. For this reason, the imaging unit 40 may be connected to the control unit 10 by wire and/or wirelessly. A signal based on the image captured by the imaging unit 40 may also be supplied to any functional unit of the first electronic device 1, such as the storage unit 20 and/or the display unit 70. The imaging unit 40 is not limited to an imaging device such as a digital camera, and may be any device that captures the interior of the conference room MR shown in FIG. 1.
In one embodiment, the imaging unit 40 may capture the interior of the conference room MR as still images at predetermined time intervals (for example, 15 frames per second). In one embodiment, the imaging unit 40 may instead capture the interior of the conference room MR as continuous video. Furthermore, the imaging unit 40 may be configured to include a fixed camera or a movable camera.
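As a rough sketch of capturing still images at a predetermined interval such as 15 frames per second, the following code uses OpenCV as an assumed camera backend; the library choice, camera index, and frame rate are illustrative assumptions and not part of the disclosure.

```python
import time

import cv2  # assumption: OpenCV serves as the camera backend for illustration

def capture_frames(camera_index: int = 0, fps: float = 15.0):
    """Yield still frames at a fixed interval, in the spirit of the imaging
    unit 40 capturing the conference room at predetermined time intervals."""
    cap = cv2.VideoCapture(camera_index)
    interval = 1.0 / fps
    try:
        while True:
            start = time.monotonic()
            ok, frame = cap.read()
            if not ok:  # camera disconnected or no frame available
                break
            yield frame  # in the system, this would be encoded and transmitted
            # Sleep off whatever remains of the frame interval.
            time.sleep(max(0.0, interval - (time.monotonic() - start)))
    finally:
        cap.release()
```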
The audio input unit 50 detects (acquires) sounds or voices around the first electronic device 1, including voices uttered by people. For example, the audio input unit 50 may detect sound or voice as air vibrations, for example with a diaphragm, and convert it into an electrical signal. Specifically, the audio input unit 50 may include an acoustic device that converts sound into an electrical signal, such as any microphone. In one embodiment, the audio input unit 50 may detect (acquire) the voice of at least one of the participants Ma, Mb, Mc, and Md in the conference room MR shown in FIG. 1. The voice (electrical signal) detected by the audio input unit 50 may be input to, for example, the control unit 10. For this reason, the audio input unit 50 may be connected to the control unit 10 by wire and/or wirelessly.
The audio input unit 50 may convert the acquired sound or voice into an electrical signal and supply it to the control unit 10. The audio input unit 50 may also supply the electrical signal (audio signal) obtained from the sound or voice to a functional unit of the first electronic device 1, such as the storage unit 20. The audio input unit 50 may be any device that detects (acquires) sound or voice in the conference room MR shown in FIG. 1.
The audio output unit 60 converts an electrical signal (audio signal) of sound or voice supplied from the control unit 10 into sound, thereby outputting the audio signal as sound or voice. The audio output unit 60 may be connected to the control unit 10 by wire and/or wirelessly. The audio output unit 60 may be configured to include a device having a sound output function, such as any speaker (loudspeaker). In one embodiment, the audio output unit 60 may include a directional speaker that transmits sound in a specific direction, and may be configured so that the directionality of the sound can be changed. The audio output unit 60 may include an amplifier or an amplification circuit that appropriately amplifies the electrical signal (audio signal).
In one embodiment, the audio output unit 60 may amplify the audio signal that the communication unit 30 receives from the second electronic device 100. Here, the audio signal received from the second electronic device 100 may be, for example, the audio signal of a speaker (for example, participant Mg shown in FIG. 1) that the communication unit 30 receives from that speaker's second electronic device 100. That is, the audio output unit 60 may output the audio signal of the speaker (for example, participant Mg shown in FIG. 1) as that speaker's voice.
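A minimal numerical sketch of such amplification is shown below; representing the voice as a 16-bit numpy array and expressing the gain in decibels are assumptions for illustration only.

```python
import numpy as np

def amplify(voice: np.ndarray, gain_db: float = 6.0) -> np.ndarray:
    """Apply a fixed gain to a received 16-bit voice signal, clipping the
    result so it stays within the valid int16 sample range."""
    gain = 10.0 ** (gain_db / 20.0)  # convert decibels to a linear factor
    boosted = voice.astype(np.float32) * gain
    return np.clip(boosted, -32768, 32767).astype(np.int16)
```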
The display unit 70 may be any display device, such as a liquid crystal display (LCD), an organic EL display (organic electro-luminescence panel), or an inorganic EL display (inorganic electro-luminescence panel). The display unit 70 may display various types of information, such as characters, figures, or symbols. The display unit 70 may also display objects constituting various GUIs, icon images, and the like, for example to prompt the user to operate the first electronic device 1.
Various data necessary for display on the display unit 70 may be supplied from, for example, the control unit 10 or the storage unit 20. For this reason, the display unit 70 may be connected to the control unit 10 and the like by wire and/or wirelessly. When the display unit 70 includes, for example, an LCD, it may also include a backlight and the like as appropriate.
In one embodiment, the display unit 70 may display video based on a video signal transmitted from the second electronic device 100, for example video of participant Mg captured by the second electronic device 100. By displaying the video of participant Mg on the display unit 70 of the first electronic device 1, the participants Ma, Mb, Mc, and Md shown in FIG. 1, for example, can visually observe participant Mg, who is in a location away from the conference room MR.
The display unit 70 may display, for example, the video of participant Mg captured by the second electronic device 100 as it is. Alternatively, the display unit 70 may display an image in which participant Mg is rendered as a character (for example, as an avatar or a robot).
The power unit 80 generates power for driving any movable part of the first electronic device 1. The power unit 80 may include a power source, such as a servo motor, that drives the movable parts of the first electronic device 1. The power unit 80 may drive any movable part of the first electronic device 1 under the control of the control unit 10. For this reason, the power unit 80 may be connected to the control unit 10 by wire and/or wirelessly.
In one embodiment, the power unit 80 may drive, for example, at least part of the housing of the first electronic device 1. When the first electronic device 1 has a housing shaped to resemble at least part of a person or shaped like a robot, the power unit 80 may drive at least part of that human-like or robot-like shape.
Driven by the power unit 80, the first electronic device 1 may perform actions that express, for example, the emotions and/or behavior of participant Mg. For example, the first electronic device 1 may perform actions that represent a response by participant Mg. Here, a "response" may include backchannel cues, including short utterances such as "yes" and/or "uh-huh" made by the listener during or between the speaker's utterances. A "response" may also include head movements such as a nod indicating affirmation or a head shake indicating negation without any utterance, hand movements such as gestures, or movements of the entire upper body expressing a large emotional change such as surprise or deep agreement. Furthermore, a "response" may include changes of facial expression that move one or more parts of the face. Such responses are made consciously or unconsciously, either to show that the listener understands or agrees with what the speaker is saying, or to set a rhythm for the speech that makes it easier for the speaker to talk. Accordingly, the first electronic device 1 may, for example, drive at least part of a component imitating the head of participant Mg to perform an action representing a nod and/or head shake by participant Mg. For example, the first electronic device 1 may drive at least part of a component imitating the hand of participant Mg to perform an action such as a hand gesture by participant Mg, may drive at least part of a component imitating the upper body of participant Mg to perform an action expressing an emotion such as surprise or deep agreement, and may drive at least part of a component imitating one or more parts of the face of participant Mg to perform an action expressing a facial expression of participant Mg. The first electronic device 1 may also output, from the audio output unit 60, prerecorded backchannel utterances of participant Mg such as "yes" and/or "uh-huh". Furthermore, the first electronic device 1 may, driven by the power unit 80, perform actions expressing emotions of participant Mg such as joy, anger, sadness, and pleasure. In this case, the power unit 80 may, for example, drive at least part of a component imitating the face (facial expression) of participant Mg to perform actions expressing such emotions. The first electronic device 1 may also, driven by the power unit 80, perform actions such as a human shrug of the shoulders, a courtesy such as a human bow, or an action indicating an apology.
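The mapping from a detected response of participant Mg to an avatar-side action could be sketched as below. The response categories follow the text, but the dispatch structure and the interfaces device.power_unit.move and device.audio_output.play are hypothetical names introduced only for illustration.

```python
from enum import Enum, auto

class Response(Enum):
    BACKCHANNEL = auto()   # short utterances such as "yes" / "uh-huh"
    NOD = auto()           # affirmative head movement
    HEAD_SHAKE = auto()    # negative head movement
    HAND_GESTURE = auto()  # hand movement such as a wave
    SURPRISE = auto()      # large upper-body movement

def perform_response(device, response: Response) -> None:
    """Dispatch a detected response of participant Mg to the avatar-side
    device. The device attributes below are hypothetical stand-ins for the
    power unit 80 and the audio output unit 60."""
    if response is Response.BACKCHANNEL:
        device.audio_output.play("prerecorded_yes.wav")  # e.g., a recorded "yes"
    elif response is Response.NOD:
        device.power_unit.move(part="head", motion="nod")
    elif response is Response.HEAD_SHAKE:
        device.power_unit.move(part="head", motion="shake")
    elif response is Response.HAND_GESTURE:
        device.power_unit.move(part="hand", motion="wave")
    elif response is Response.SURPRISE:
        device.power_unit.move(part="torso", motion="recoil")
```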
Various known techniques may be used for the actions that express the emotions and/or behavior of a person such as participant Mg through display on the display unit 70 and/or driving of the power unit 80; a more detailed description of such actions is therefore omitted. The first electronic device 1 according to one embodiment can perform actions expressing the emotions and/or behavior of participant Mg through display on the display unit 70 and/or driving of the power unit 80.
In one embodiment, the first electronic device 1 may be a specially designed device, as described above. Alternatively, in one embodiment, the first electronic device 1 may include, for example, only the audio output unit 60 and the power unit 80 among the functional units shown in FIG. 2. In this case, the first electronic device 1 may be connected to another electronic device to supplement at least some of the functions of the other functional units shown in FIG. 2. Here, the other electronic device may be, for example, a device such as a general-purpose smartphone, tablet, phablet, notebook computer (notebook PC or laptop), or desktop computer.
FIG. 3 is a block diagram schematically showing the configuration of the second electronic device 100 shown in FIG. 1. An example of the configuration of the second electronic device 100 according to an embodiment will be described below. As shown in FIG. 1, the second electronic device 100 may be a device used by, for example, participant Mg at his/her home RL. The first electronic device 1 described above has a function of outputting the voice and/or video of participants Ma, Mb, Mc, Md, and so on, acquired by the first electronic device 1 when they speak, to the second electronic device 100. The second electronic device 100 in turn has a function of outputting the voice and/or video of participant Mg, acquired by the second electronic device 100 when participant Mg speaks, to the first electronic device 1. The second electronic device 100 allows participant Mg to hold a remote conference or video conference even at a location away from the conference room MR. Accordingly, the second electronic device 100 is also referred to, as appropriate, as the electronic device "used remotely".
As shown in FIG. 3, the second electronic device 100 according to one embodiment may include a control unit 110, a storage unit 120, a communication unit 130, an imaging unit 140, an audio input unit 150, an audio output unit 160, a display unit 170, a tactile sensation providing unit 190, and an acquisition unit 200. The control unit 110 may include, for example, a determination unit 112, an estimation unit 114, and an adjustment unit 116. In one embodiment, the second electronic device 100 need not include at least some of the functional units shown in FIG. 3, and may include components other than the functional units shown in FIG. 3.
The control unit 110 controls and/or manages the second electronic device 100 as a whole, including the functional units constituting the second electronic device 100. The control unit 110 may basically be configured based on the same concept as, for example, the control unit 10 shown in FIG. 2. The determination unit 112, estimation unit 114, and adjustment unit 116 of the control unit 110 may likewise be configured based on the same concepts as the determination unit 12, estimation unit 14, and adjustment unit 16 of the control unit 10 shown in FIG. 2, respectively.
The storage unit 120 may function as a memory that stores various types of information. The storage unit 120 may store, for example, programs executed by the control unit 110 and the results of processes executed by the control unit 110. The storage unit 120 may also function as a work memory for the control unit 110. As shown in FIG. 3, the storage unit 120 may be connected to the control unit 110 by wire and/or wirelessly. The storage unit 120 may basically be configured based on the same concept as, for example, the storage unit 20 shown in FIG. 2.
The communication unit 130 has an interface function for communicating wirelessly and/or by wire. The communication unit 130 may wirelessly communicate with the communication unit of another electronic device, for example via an antenna. For example, the communication unit 130 may wirelessly communicate with the first electronic device 1 shown in FIG. 1; in this case, it may wirelessly communicate with the communication unit 30 of the first electronic device 1. Thus, in one embodiment, the communication unit 130 has a function of communicating with the first electronic device 1. The communication unit 130 may also wirelessly communicate with the third electronic device 300 shown in FIG. 1; in this case, it may wirelessly communicate with the communication unit 330 (described later) of the third electronic device 300. Thus, in one embodiment, the communication unit 130 may have a function of communicating with the third electronic device 300. As shown in FIG. 3, the communication unit 130 may be connected to the control unit 110 by wire and/or wirelessly. The communication unit 130 may basically be configured based on the same concept as the communication unit 30 shown in FIG. 2.
The imaging unit 140 may be configured to include an image sensor that electronically captures images, such as a digital camera. The imaging unit 140 may capture, for example, the interior of the home RL shown in FIG. 1. In one embodiment, the imaging unit 140 may capture, for example, participant Mg joining the conference from the home RL shown in FIG. 1. The imaging unit 140 may convert a captured image into a signal and transmit it to the control unit 110, and for this reason may be connected to the control unit 110 by wire and/or wirelessly. The imaging unit 140 may basically be configured based on the same concept as the imaging unit 40 shown in FIG. 2.
The audio input unit 150 detects (acquires) sounds or voices around the second electronic device 100, including voices uttered by people. For example, the audio input unit 150 may detect sound or voice as air vibrations, for example with a diaphragm, and convert it into an electrical signal. Specifically, the audio input unit 150 may include an acoustic device that converts sound into an electrical signal, such as any microphone. In one embodiment, the audio input unit 150 may detect (acquire) the voice of participant Mg in the home RL shown in FIG. 1. The voice (electrical signal) detected by the audio input unit 150 may be input to, for example, the control unit 110, and for this reason the audio input unit 150 may be connected to the control unit 110 by wire and/or wirelessly. The audio input unit 150 may basically be configured based on the same concept as the audio input unit 50 shown in FIG. 2.
The audio output unit 160 converts an electrical signal (audio signal) supplied from the control unit 110 into sound, thereby outputting the audio signal as sound or voice. The audio output unit 160 may be connected to the control unit 110 by wire and/or wirelessly. The audio output unit 160 may be configured to include a device having a sound output function, such as any speaker (loudspeaker). In one embodiment, the audio output unit 160 may output the voice detected by the audio input unit 50 of the first electronic device 1, that is, the voice of at least one of the participants Ma, Mb, Mc, and Md in the conference room MR shown in FIG. 1. The audio output unit 160 may basically be configured based on the same concept as the audio output unit 60 shown in FIG. 2.
The display unit 170 may be any display device, such as a liquid crystal display (LCD), an organic EL display, or an inorganic EL display. The display unit 170 may basically be configured based on the same concept as the display unit 70 shown in FIG. 2. Various data necessary for display on the display unit 170 may be supplied from, for example, the control unit 110 or the storage unit 120, and for this reason the display unit 170 may be connected to the control unit 110 and the like by wire and/or wirelessly.
The display unit 170 may be a touch screen display having a touch panel function that detects input made by contact of, for example, participant Mg's finger or a stylus.
In one embodiment, the display unit 170 may display video based on a video signal transmitted from the first electronic device 1, for example video of participants Ma, Mb, Mc, Md, and so on captured by the imaging unit 40 of the first electronic device 1. By displaying the video of participants Ma, Mb, Mc, and Md on the display unit 170 of the second electronic device 100, participant Mg shown in FIG. 1, for example, can visually observe participants Ma, Mb, Mc, and Md in the conference room MR away from his/her home RL.
The display unit 170 may display, for example, the video of participants Ma, Mb, Mc, Md, and so on captured by the first electronic device 1 as it is. Alternatively, the display unit 170 may display images in which participants Ma, Mb, Mc, Md, and so on are rendered as characters (for example, avatars).
In one embodiment, the display unit 170 may have a function of notifying, for example, participant Mg of the response timing described later. That is, participant Mg can learn the response timing by looking at the display unit 170. In one embodiment, the notification of the response timing may instead be given by an indicator such as an LED.
The tactile sensation providing unit 190 may have a function of presenting a tactile sensation, such as vibration, to, for example, a finger of participant Mg. In one embodiment, the tactile sensation providing unit 190 may be combined with the display unit 170 having a touch screen display function. In such a configuration, for example, when participant Mg touches the display unit 170 to operate the second electronic device 100, he or she can perceive the tactile sensation presented by the tactile sensation providing unit 190. In one embodiment, the tactile sensation providing unit 190 may also have a function of notifying, for example, participant Mg of the response timing described later.
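How the response timing described later might be surfaced to participant Mg through the display unit 170, an LED indicator, and/or the tactile sensation providing unit 190 could be sketched as follows; all device interfaces here (show_banner, blink, pulse) are hypothetical names used only for illustration.

```python
def notify_response_timing(device, channels=("display", "led", "vibration")) -> None:
    """Surface a detected response timing to participant Mg on whichever
    output channels are available. The attribute names are illustrative."""
    if "display" in channels:
        device.display.show_banner("response timing")  # visual cue on unit 170
    if "led" in channels:
        device.led.blink(times=2)                      # indicator cue
    if "vibration" in channels:
        device.haptics.pulse(duration_ms=150)          # tactile cue via unit 190
```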
The acquisition unit 200 may be any of various functional units that acquire a second user's response to a first user's utterance. The second user's response is described further below. The acquisition unit 200 of the second electronic device 100 may, for example, acquire input to at least one of the imaging unit 140 and the audio input unit 150 shown in FIG. 3, and may be configured to include at least one of them. The acquisition unit 200 may also acquire a mouse click or touch input by the user, or input to a motion sensor and/or a foot pedal. Furthermore, the acquisition unit 200 may include an input device that detects a mouse click or touch input by the user, and may include a motion sensor and/or a foot pedal.
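One way to picture the acquisition unit 200 is as a funnel that collects response events from the several input modalities named above into a single queue for the control unit; the event fields and class names below are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ResponseEvent:
    source: str       # e.g., "voice", "camera", "mouse", "touch", "motion", "pedal"
    kind: str         # e.g., "backchannel", "nod", "click"
    timestamp: float  # seconds since some reference point

class AcquisitionUnit:
    """Sketch of acquisition unit 200: it performs no sensing itself here,
    but gathers events produced by the audio input unit 150, the imaging
    unit 140, and other input devices into one queue."""

    def __init__(self) -> None:
        self._events: List[ResponseEvent] = []

    def push(self, event: ResponseEvent) -> None:
        self._events.append(event)

    def poll(self) -> Optional[ResponseEvent]:
        # Return the oldest pending response event, if any.
        return self._events.pop(0) if self._events else None
```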
In one embodiment, the second electronic device 100 may be a specially designed device, as described above. Alternatively, in one embodiment, the second electronic device 100 may include only some of the functional units shown in FIG. 3. In this case, the second electronic device 100 may be connected to another electronic device to supplement at least some of the functions of the other functional units shown in FIG. 3. Here, the other electronic device may be, for example, a device such as a general-purpose smartphone, tablet, phablet, notebook computer (notebook PC or laptop), or desktop computer.
In particular, a smartphone or a notebook computer often includes almost all of the functional units shown in FIG. 3. For this reason, in one embodiment, the second electronic device 100 may be a smartphone, a notebook computer, or the like. In this case, the second electronic device 100 may be a smartphone or notebook computer on which an application (program) for cooperating with the first electronic device 1 has been installed.
FIG. 4 is a block diagram schematically showing the configuration of the third electronic device 300 shown in FIG. 1. An example of the configuration of the third electronic device 300 according to an embodiment will be described below. As shown in FIG. 1, the third electronic device 300 may be installed in a location different from, for example, the home RL of participant Mg and the conference room MR. The third electronic device 300 may also be installed in or near the home RL of participant Mg, or in or near the conference room MR.
The first electronic device 1 has a function of transmitting the audio and/or video data of participants Ma, Mb, Mc, Md, and so on, acquired by the first electronic device 1 when they speak, to the third electronic device 300. The third electronic device 300 may transmit the audio and/or video data received from the first electronic device 1 to the second electronic device 100. Likewise, the second electronic device 100 has a function of transmitting the audio and/or video data of participant Mg, acquired by the second electronic device 100 when participant Mg speaks, to the third electronic device 300. The third electronic device 300 may transmit the audio and/or video data received from the second electronic device 100 to the first electronic device 1. In this way, the third electronic device 300 may have a function of relaying between the first electronic device 1 and the second electronic device 100. The third electronic device 300 is also referred to, as appropriate, as the "server".
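Under the assumption of a simple byte-stream transport, the relaying function of the third electronic device 300 can be pictured as two concurrent one-way copies; the use of asyncio and the 4096-byte read size are illustrative choices, not part of the disclosure.

```python
import asyncio

async def pump(src: asyncio.StreamReader, dst: asyncio.StreamWriter) -> None:
    # Forward one direction of the stream until the sender closes it.
    while data := await src.read(4096):
        dst.write(data)
        await dst.drain()
    dst.close()

async def relay(reader1: asyncio.StreamReader, writer1: asyncio.StreamWriter,
                reader2: asyncio.StreamReader, writer2: asyncio.StreamWriter) -> None:
    """Forward bytes from peer 1 to peer 2 and vice versa, as the third
    electronic device 300 relays audio/video data between the first
    electronic device 1 and the second electronic device 100."""
    await asyncio.gather(pump(reader1, writer2), pump(reader2, writer1))
```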
As shown in FIG. 4, the third electronic device 300 according to one embodiment may include a control unit 310, a storage unit 320, and a communication unit 330. The control unit 310 may include, for example, a determination unit 312, an estimation unit 314, and an adjustment unit 316. In one embodiment, the third electronic device 300 need not include at least some of the functional units shown in FIG. 4, and may include components other than the functional units shown in FIG. 4.
The control unit 310 controls and/or manages the third electronic device 300 as a whole, including the functional units constituting the third electronic device 300. The control unit 310 may basically be configured based on the same concept as, for example, the control unit 10 shown in FIG. 2. The determination unit 312, estimation unit 314, and adjustment unit 316 of the control unit 310 may likewise be configured based on the same concepts as the determination unit 12, estimation unit 14, and adjustment unit 16 of the control unit 10 shown in FIG. 2, respectively.
The storage unit 320 may function as a memory that stores various types of information. The storage unit 320 may store, for example, programs executed by the control unit 310 and the results of processes executed by the control unit 310. The storage unit 320 may also function as a work memory for the control unit 310. As shown in FIG. 4, the storage unit 320 may be connected to the control unit 310 by wire and/or wirelessly. The storage unit 320 may basically be configured based on the same concept as, for example, the storage unit 20 shown in FIG. 2.
The communication unit 330 has an interface function for communicating wirelessly and/or by wire. The communication unit 330 may wirelessly communicate with the communication unit of another electronic device, for example via an antenna. For example, the communication unit 330 may wirelessly communicate with the first electronic device 1 shown in FIG. 1; in this case, it may wirelessly communicate with the communication unit 30 of the first electronic device 1. Thus, in one embodiment, the communication unit 330 has a function of communicating with the first electronic device 1. The communication unit 330 may also wirelessly communicate with the second electronic device 100 shown in FIG. 1; in this case, it may wirelessly communicate with the communication unit 130 of the second electronic device 100. Thus, in one embodiment, the communication unit 330 may have a function of communicating with the second electronic device 100. As shown in FIG. 4, the communication unit 330 may be connected to the control unit 310 by wire and/or wirelessly. The communication unit 330 may basically be configured based on the same concept as the communication unit 30 shown in FIG. 2.
 一実施形態において、第3電子機器300は、例えば専用に設計された機器としてもよい。一方、一実施形態において、第3電子機器300は、例えば図4に示す機能部のうち一部を備えてもよい。この場合、第3電子機器300は、図4に示す他の機能部の機能の少なくとも一部を補うために、他の電子機器に接続されてもよい。ここで、他の電子機器とは、例えば、汎用のコンピュータ又はサーバなどの機器としてもよい。一実施形態において、第3電子機器300は、例えば中継サーバ、ウェブサーバ、又はアプリケーションサーバなどとしてもよい。 In one embodiment, the third electronic device 300 may be, for example, a specially designed device. On the other hand, in one embodiment, the third electronic device 300 may include, for example, some of the functional units shown in FIG. 4. In this case, the third electronic device 300 may be connected to other electronic devices to supplement at least some of the functions of the other functional units shown in FIG. 4. Here, the other electronic devices may be, for example, devices such as a general-purpose computer or server. In one embodiment, the third electronic device 300 may be, for example, a relay server, a web server, or an application server.
 Next, the basic operation of the first electronic device 1 and the second electronic device 100 according to one embodiment will be described. The following description assumes a situation in which, as shown in FIG. 1, a participant Mg takes part from his or her home RL in a remote conference held in a conference room MR.
 That is, the first electronic device 1 according to one embodiment is installed in the conference room MR and acquires video and/or audio of at least one of the participants Ma, Mb, Mc, and Md. The video and/or audio acquired by the first electronic device 1 is transmitted to the second electronic device 100 installed in the home RL of the participant Mg. The second electronic device 100 outputs the video and/or audio of at least one of the participants Ma, Mb, Mc, and Md acquired by the first electronic device 1. This allows the participant Mg to recognize the video and/or audio of at least one of the participants Ma, Mb, Mc, and Md.
 Meanwhile, the second electronic device 100 according to one embodiment is installed in the home RL of the participant Mg and acquires video and/or audio of the participant Mg. The video and/or audio acquired by the second electronic device 100 is transmitted to the first electronic device 1 installed in the conference room MR. The first electronic device 1 outputs the video and/or audio of the participant Mg received from the second electronic device 100. This allows at least one of the participants Ma, Mb, Mc, and Md to recognize the video and/or audio of the participant Mg.
 FIG. 5 is a sequence diagram explaining the basic operation of the system according to the embodiment described above. FIG. 5 shows the exchange of data and the like among the first electronic device 1, the second electronic device 100, and the third electronic device 300. The basic operation when a remote conference or video conference is held using the system according to the embodiment will be described below with reference to FIG. 5.
 In the operation shown in FIG. 5, the first electronic device 1 used locally may be used by a first user. Here, the first user may be, for example, at least one of the participants Ma, Mb, Mc, and Md shown in FIG. 1 (hereinafter also referred to as the local user). The second electronic device 100 used remotely may be used by a second user. Here, the second user may be, for example, the participant Mg shown in FIG. 1 (hereinafter also referred to as the remote user). In the following, an operation performed by the first electronic device 1 may, more specifically, be performed by, for example, the control unit 10 of the first electronic device 1; in this specification, an operation performed by the control unit 10 of the first electronic device 1 may be described as an operation performed by the first electronic device 1. Similarly, an operation performed by the second electronic device 100 may, more specifically, be performed by, for example, the control unit 110 of the second electronic device 100, and may be described as an operation performed by the second electronic device 100. Likewise, an operation performed by the third electronic device 300 may, more specifically, be performed by, for example, the control unit 310 of the third electronic device 300, and may be described as an operation performed by the third electronic device 300.
 When the operation shown in FIG. 5 starts, the first electronic device 1 acquires at least one of video and audio of the first user (e.g., at least one of the participants Ma, Mb, Mc, and Md) (step S1). Specifically, in step S1, the first electronic device 1 may capture video of the first user with the imaging unit 40 and acquire (or detect) the voice of the first user with the audio input unit 50. Next, the first electronic device 1 encodes at least one of the video and audio of the first user (step S2). In step S2, encoding may mean compressing the video and/or audio data according to a predetermined rule and converting it into a format suited to the purpose, which may include encryption. The first electronic device 1 may perform any of various known types of encoding, such as software encoding or hardware encoding.
 Next, the first electronic device 1 transmits the encoded video and/or audio data to the third electronic device 300 (step S3). Specifically, in step S3, the first electronic device 1 transmits the video and/or audio data from the communication unit 30 to the communication unit 330 of the third electronic device 300. Also in step S3, the third electronic device 300 receives, via the communication unit 330, the video and/or audio data transmitted from the communication unit 30 of the first electronic device 1.
 Next, the third electronic device 300 transmits the encoded video and/or audio data received from the communication unit 30 to the second electronic device 100 (step S4). Specifically, in step S4, the third electronic device 300 transmits the video and/or audio data from the communication unit 330 to the communication unit 130 of the second electronic device 100. Also in step S4, the second electronic device 100 receives, via the communication unit 130, the video and/or audio data transmitted from the communication unit 330 of the third electronic device 300.
 Next, the second electronic device 100 decodes the encoded video and/or audio data received from the communication unit 330 (step S5). In step S5, decoding may mean returning the format of the encoded video and/or audio data to its original format. The second electronic device 100 may perform any of various known types of decoding, such as software decoding or hardware decoding.
 Next, the second electronic device 100 presents at least one of the video and audio of the first user (e.g., at least one of the participants Ma, Mb, Mc, and Md) to the second user (e.g., the participant Mg) (step S6). Specifically, in step S6, the second electronic device 100 may display the video of the first user on the display unit 170 and output the voice of the first user from the audio output unit 160.
 Through the operations of steps S1 to S6, the second user (e.g., the participant Mg) at home RL, for example, can recognize the video and/or audio of the first user (e.g., at least one of the participants Ma, Mb, Mc, and Md) in the conference room MR.
 The above describes how the first electronic device 1 transmits video and/or audio of the first user to the second electronic device 100 via the third electronic device 300. By the reverse procedure, the second electronic device 100 can transmit video and/or audio of the second user to the first electronic device 1 via the third electronic device 300.
 That is, the second electronic device 100 acquires at least one of video and audio of the second user (e.g., the participant Mg) (step S11). Specifically, in step S11, the second electronic device 100 may capture video of the second user with the imaging unit 140 and acquire (or detect) the voice of the second user with the audio input unit 150. Next, the second electronic device 100 encodes at least one of the video and audio of the second user (step S12).
 Next, the second electronic device 100 transmits the encoded video and/or audio data to the third electronic device 300 (step S13). Specifically, in step S13, the second electronic device 100 transmits the video and/or audio data from the communication unit 130 to the communication unit 330 of the third electronic device 300. Also in step S13, the third electronic device 300 receives, via the communication unit 330, the video and/or audio data transmitted from the communication unit 130 of the second electronic device 100.
 Next, the third electronic device 300 transmits the encoded video and/or audio data received from the communication unit 130 to the first electronic device 1 (step S14). Specifically, in step S14, the third electronic device 300 transmits the video and/or audio data from the communication unit 330 to the communication unit 30 of the first electronic device 1. Also in step S14, the first electronic device 1 receives, via the communication unit 30, the video and/or audio data transmitted from the communication unit 330 of the third electronic device 300.
 Next, the first electronic device 1 decodes the encoded video and/or audio data received from the communication unit 330 (step S15).
 Next, the first electronic device 1 presents at least one of the video and audio of the second user (e.g., the participant Mg) to the first user (e.g., at least one of the participants Ma, Mb, Mc, and Md) (step S16). Specifically, in step S16, the first electronic device 1 may display the video of the second user on the display unit 70 and output the voice of the second user from the audio output unit 60.
 Through the operations of steps S11 to S16, the first user (e.g., at least one of the participants Ma, Mb, Mc, and Md) in the conference room MR, for example, can recognize the video and/or audio of the second user (e.g., the participant Mg) at his or her home RL.
 The operations from steps S1 to S6 and the operations from steps S11 to S16 may be executed in the reverse order; that is, the operations from steps S11 to S16 may be executed first, followed by the operations from steps S1 to S6. The two sets of operations may also be executed simultaneously, or so that they at least partially overlap.
 Here, issues that can arise in a remote conference or video conference realized as described above will be explained.
 For example, as shown in FIG. 1, when at least a part of the network N includes an Internet line or the like, the communication speed of that line is usually not guaranteed, and the service is often provided under a best-effort contract. If a dedicated line were laid between the conference room MR and the home RL of the participant Mg shown in FIG. 1, a certain communication speed could be expected to be secured. However, laying a dedicated line tends to present a high cost hurdle. For this reason, recent remote conferences or video conferences are typically realized with a configuration in which at least a part of the network N includes an Internet line or the like, and communication delays often simply have to be accepted.
 Furthermore, in the operation shown in FIG. 5, the encoding and/or decoding operations also require a certain amount of time. For example, when the first user asks the second user a question, encoding and decoding must each be performed twice before the second user's response reaches the first user who asked the question. Even if a single encode or decode takes only a short time, when such processing goes back and forth between the first user and the second user, a non-negligible delay can accumulate in the conversation.
 When the conversation between the first user and the second user is delayed in this way, the remote conference or video conference may no longer proceed smoothly. For example, when the first user requests a response from the second user, such as confirmation of intent, the timing at which the video and/or audio indicating the end of the first user's utterance actually reaches the second user may be delayed. In that case, the timing of the second user's response is delayed further, so the first user may become unable to wait for the second user's response, or the second user's response may overlap with the first user's next utterance. If such situations occur, the amount and/or quality of the information conveyed to the participants may also decline. Therefore, in a remote conference or video conference, it is desirable, for smooth communication, that the listener's response to the speaker's utterance be conveyed and shared appropriately.
 Accordingly, the system according to one embodiment estimates, based on the first user's utterance, the response timing at which the second user should respond to that utterance, and presents the arrival of that response timing to the second user. The system according to one embodiment may estimate the second user's response timing at a point before the first user's utterance ends.
 Next, estimation of the response timing by the system according to one embodiment will be described. FIG. 6 is a diagram explaining estimation of the response timing by the system according to one embodiment.
 The upper part of FIG. 6 shows the waveform of the first user's voice acquired (detected) by the audio input unit 50 while the first user of the first electronic device 1 is talking. In the upper graph of FIG. 6, the vertical axis indicates the level of the first user's voice, and the horizontal axis indicates time (clock time). The vertical axis of the upper graph of FIG. 6 may represent, for example, the sound pressure of the first user's voice acquired by the audio input unit 50, converted into a voltage and then amplified. The vertical axis may also represent the sound pressure or the volume of the first user's voice.
 In general, a human conversation progresses through an exchange between oneself and the other party. For this reason, in a typical conversation, as shown in the upper graph of FIG. 6, there are times when one speaks and times when one waits for the other party's speech, reply, or a reaction such as a back-channel response. While waiting for the other party's speech, reply, or back-channel response, one tends to emit little or no voice. Therefore, the silent or nearly silent intervals shown in the upper graph of FIG. 6 may be regarded as desirable timings for the second user to respond. Such a timing at which it is desirable for the second user to respond to the first user's conversation is referred to as the (second user's) "response timing".
 The lower graph of FIG. 6 illustrates the response timing. As shown in the lower graph of FIG. 6, an interval in which the first user emits little or no voice may be regarded as the second user's response timing. The timing at which the first user, who has been speaking, stops emitting (almost all) voice may be taken as the start of the second user's response timing, and the timing at which the first user, who has not been speaking, next emits voice may be taken as the end of the second user's response timing. In the lower graph of FIG. 6, the state in which the response timing is on is indicated by a value of +1, and the state in which it is off by a value of -1.
 The system according to one embodiment may estimate the response timing shown in FIG. 6 while the first user's voice is being acquired, rather than determining it by analysis after the voice has been acquired. That is, the system according to one embodiment may estimate the start of each response timing shown in FIG. 6 before the corresponding utterance of the first user ends. In this case, the system according to one embodiment may estimate the start of the response timing based on, for example, acoustic features and/or linguistic features of the first user's voice acquired by the audio input unit 50. The system according to one embodiment may also estimate the start of the response timing based on the video of the first user captured by the imaging unit 40, that is, the first user's face, facial expressions, gestures, and/or body movements, instead of or in addition to the first user's voice.
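 As a concrete illustration of the on/off signal in the lower graph of FIG. 6, the following minimal Python sketch derives a ±1 response-timing signal from an audio waveform by thresholding short-frame RMS energy. The frame length, level threshold, and hangover count are illustrative assumptions, not values taken from this disclosure, and the sketch assumes samples normalized to [-1, 1].

```python
import numpy as np

def response_timing_signal(samples: np.ndarray, rate: int,
                           frame_ms: int = 20,
                           level_threshold: float = 0.02,
                           hangover_frames: int = 15) -> np.ndarray:
    """Return +1 (response timing on) / -1 (off) for each audio frame.

    A frame counts as voiced when its RMS level exceeds level_threshold.
    The response timing turns on only after `hangover_frames` consecutive
    non-voiced frames, so short pauses inside an utterance are not
    mistaken for response timings.
    """
    frame_len = rate * frame_ms // 1000
    n_frames = len(samples) // frame_len
    signal = np.full(n_frames, -1, dtype=int)
    silent_run = 0
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len].astype(float)
        rms = np.sqrt(np.mean(frame ** 2))
        silent_run = silent_run + 1 if rms < level_threshold else 0
        if silent_run >= hangover_frames:
            signal[i] = +1  # little or no voice: response timing is on
    return signal
```

Because the decision for each frame depends only on past frames, the same loop can run online, frame by frame, while the voice is being acquired, matching the "estimate during acquisition" behavior described above.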
 Various methods can be envisaged for estimating the response timing from the first user's voice and/or video. For example, the system according to one embodiment may estimate the start of the response timing based on, as an acoustic feature of the first user's voice, the timing at which the volume drops or the intonation falls. The system according to one embodiment may also estimate the start of the response timing based on, as a linguistic feature of the first user's speech, the timing at which a sentence ends with a polite sentence-final form such as "-desu" or "-masu" (in Japanese). The system according to one embodiment may also estimate the start of the response timing based on, as a feature of the first user's video, the timing at which the first user returns his or her gaze to the first electronic device 1 after having looked away from it.
 The system according to one embodiment may also estimate the end of the response timing based on, as an acoustic feature of the first user's voice, the timing at which the volume of speech next rises after having dropped. The system according to one embodiment may also, as an acoustic feature of the first user's voice, regard the point at which the volume rises as the timing of a question, and estimate the end of the response timing based on the average response time for such a question.
 The system according to one embodiment may also determine, as a linguistic feature of the first user's speech, whether the content of the utterance is an open-ended question or a closed-ended (answer-selection) question. In this case, the system according to one embodiment may set, for example, a longer response timing after an open-ended question than after a closed-ended question, and conversely a shorter response timing after a closed-ended question than after an open-ended question.
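 Purely as an illustration of these linguistic heuristics, the sketch below classifies the tail of an utterance transcript and picks a response-window length. The cue lists, the romanized Japanese endings, and the durations are hypothetical choices for demonstration, not values specified by this disclosure.

```python
OPEN_CUES = ("how", "why", "what do you think")          # hypothetical open-question cues
CLOSED_CUES = ("is it", "can you", "do you", "right")    # hypothetical closed-question cues
POLITE_ENDINGS = ("desu", "masu")  # romanized Japanese polite sentence-final forms

def estimate_response_window(utterance: str) -> float:
    """Return an assumed response-window length in seconds for an utterance."""
    text = utterance.strip().lower().rstrip("?")
    # A polite sentence-final form often marks a turn boundary.
    base = 1.5 if text.endswith(POLITE_ENDINGS) else 1.0
    if any(cue in text for cue in OPEN_CUES):
        return base + 3.0   # open-ended question: allow a longer response timing
    if any(cue in text for cue in CLOSED_CUES):
        return base + 1.0   # closed-ended question: a shorter window suffices
    return base
```

A deployed system would more plausibly obtain this classification from a trained model, as the later discussion of machine learning suggests; the point here is only the mapping from question type to window length.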
 The system according to one embodiment may also determine, as an acoustic and/or linguistic feature of the first user's speech, whether or not the conversation is lively, or the degree to which it is lively. In this case, the system according to one embodiment may set a relatively short response timing when the first user's conversation is determined to be lively.
 The system according to one embodiment may also determine, as an acoustic and/or linguistic feature of the first user's speech, whether the content of the conversation is positive or negative. In this case, the system according to one embodiment may set a relatively short response timing when the first user's conversation is determined to be relatively positive in content, and a relatively long response timing when it is determined to be relatively negative in content.
 The system according to one embodiment may also estimate the response timing, or correct an estimated response timing, by analyzing the history of the first user's past audio and/or video. The system according to one embodiment may estimate the response timing based on, for example, AI (Artificial Intelligence) technology, or based on machine learning (or indeed deep learning) techniques.
 Once the response timing has been estimated as described above, the system according to one embodiment may indicate to the second user, on the second electronic device 100, that it is the response timing when the estimated time arrives. The second electronic device 100 may present the arrival of the response timing to the second user as at least one of visual information, auditory information, and tactile information. For example, the second electronic device 100 may notify the second user of the response timing by displaying a message such as "You have been asked a question" or "It is your turn" on the display unit 170. The second electronic device 100 may also notify the second user of the response timing by lighting or blinking the display unit 170 configured as an indicator such as an LED. The second electronic device 100 may also notify the second user of the response timing by outputting audio such as "You have been asked a question" or "It is your turn" from the audio output unit 160, or by outputting a predetermined notification sound from the audio output unit 160. The second electronic device 100 may also notify the second user of the response timing by outputting tactile information, such as a predetermined vibration, from the tactile sensation providing unit 190.
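 One way to picture how such a notification might fan out to whichever output units are present is the minimal sketch below. The argument names mirror the reference numerals above, but the method interfaces (show_message, play_notification_tone, vibrate) are hypothetical stand-ins, not an API defined by this disclosure.

```python
def present_response_timing(display=None, speaker=None, haptics=None) -> None:
    """Notify the second user that the response timing has arrived.

    Each argument stands in for an output unit of the second electronic
    device 100 (display unit 170, audio output unit 160, tactile
    sensation providing unit 190); any unit may be absent.
    """
    if display is not None:
        display.show_message("It is your turn")   # visual information
    if speaker is not None:
        speaker.play_notification_tone()          # auditory information
    if haptics is not None:
        haptics.vibrate(duration_ms=300)          # tactile information
```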
 In this case, the system according to one embodiment may transmit the response timing with priority over, for example, the normal audio and/or video communication. Since conveying the response timing is merely a notification of a point in time, prioritizing it over the audio and/or video communication is considered to have almost no effect on that communication. In the system according to one embodiment, the response timing may be conveyed using, for example, a publish/subscribe server. Furthermore, in the system according to one embodiment, the response timing may be conveyed over a line separate from that used for the normal audio and/or video communication.
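 The disclosure mentions a publish/subscribe server or a separate line but does not fix a transport. Purely as an illustration of sending the timing notification on a channel separate from the audio/video stream, the following sketch sends a tiny JSON datagram over UDP; the port number and message fields are illustrative assumptions.

```python
import json
import socket
import time

TIMING_PORT = 50007  # illustrative port, distinct from the A/V stream

def send_response_timing(host: str, start_epoch: float, duration_s: float) -> None:
    """Publish an estimated response timing as a small JSON datagram.

    The payload is only a few dozen bytes, so delivering it ahead of the
    audio/video data adds effectively no load to the connection.
    """
    message = json.dumps({
        "type": "response_timing",
        "start": start_epoch,      # when the response timing begins
        "duration": duration_s,    # expected length of the window
        "sent_at": time.time(),
    }).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, (host, TIMING_PORT))
```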
 Thus, according to the system of one embodiment, the second user can be informed of the original response timing even when, for example, the audio and/or video is delayed. The second user can therefore respond to the first user's utterance at the appropriate timing. In other words, the system according to one embodiment reduces situations in which the first user becomes unable to wait for the second user's response, and also reduces situations in which the second user's response overlaps with the first user's next utterance. The system according to one embodiment can therefore facilitate communication between multiple locations.
 The estimation of the response timing described above may be executed by at least one of the estimation unit 14 of the first electronic device 1, the estimation unit 114 of the second electronic device 100, and the estimation unit 314 of the third electronic device 300. In this case, whichever of the estimation units 14, 114, and 314 does not perform the estimation need not be an essential component. Likewise, the various determination processes related to the estimation of the response timing described above may be executed by the determination unit 12 of the first electronic device 1, the determination unit 112 of the second electronic device 100, or the determination unit 312 of the third electronic device 300. Furthermore, the processing related to correcting an estimated response timing may be executed by the adjustment unit 16 of the first electronic device 1, the adjustment unit 116 of the second electronic device 100, or the adjustment unit 316 of the third electronic device 300.
 Next, characteristic operations of the system according to one embodiment will be further described. FIG. 7 is a sequence diagram explaining characteristic operations of the system according to one embodiment. Like FIG. 5, FIG. 7 shows the exchange of data and the like among the first electronic device 1, the second electronic device 100, and the third electronic device 300. The characteristic operations when a remote conference or video conference is held using the system according to one embodiment will be described below with reference to FIG. 7. The encoding and decoding of data described with reference to FIG. 5 may use known techniques, so their description is omitted in FIG. 7. In the following, descriptions of content that is the same as or similar to what has already been described with reference to FIG. 5 may be simplified or omitted as appropriate.
 When the operation shown in FIG. 7 starts, the first electronic device 1 acquires at least one of video and audio of the first user (e.g., at least one of the participants Ma, Mb, Mc, and Md) (step S101). The operation of step S101 may be the same as step S1 in FIG. 5.
 Next, the first electronic device 1 transmits the video and/or audio data of the first user to the third electronic device 300 (step S102). The operation of step S102 may be the same as step S3 in FIG. 5. The third electronic device 300 transmits the video and/or audio data of the first user received from the first electronic device 1 to the second electronic device 100 (step S103). The operation of step S103 may be the same as step S4 in FIG. 5.
 When the second electronic device 100 receives the video and/or audio data of the first user from the third electronic device 300 in step S103, it presents at least one of the video and audio of the first user to the second user (e.g., the participant Mg) (step S104). The operation of step S104 may be the same as step S6 in FIG. 5.
 When the third electronic device 300 receives the video and/or audio data of the first user from the first electronic device 1 in step S102, it estimates the response timing based on the video and/or audio data of the first user (step S105). The estimation of the response timing executed in step S105 can be performed as described above.
 Next, the third electronic device 300 determines whether the time of the estimated response timing has arrived (step S106). If the time of the response timing has not yet arrived in step S106, the third electronic device 300 may wait until it arrives or may execute other processing. When the time of the response timing arrives in step S106, the third electronic device 300 transmits information indicating the estimated response timing to the second electronic device 100 (steps S107 and S108).
 When the second electronic device 100 receives the information indicating the response timing from the third electronic device 300 in step S108, it presents to the second user that it is the response timing (step S109). In step S109, the second electronic device 100 may present the response timing to the second user as at least one of visual information, auditory information, and tactile information, as described above. By presenting the response timing in this way, the second user can respond to the first user's conversation at the appropriate timing.
 Next, the second electronic device 100 acquires the response of the second user (e.g., the participant Mg) to the first user's utterance (step S110). For example, the acquisition unit 200 of the second electronic device 100 may acquire the second user's response to the first user's utterance. The acquisition unit 200 of the second electronic device 100 may acquire, for example, input to at least one of the imaging unit 140 and the audio input unit 150 shown in FIG. 3. The acquisition unit 200 may also acquire a mouse click or touch input by the user, or input to a motion sensor and/or a foot pedal. The second user's response may include, for example, back-and-forth or up-and-down movements of the head (nodding), left-and-right movements of the head (head shaking), hand gestures, movements of the upper body, facial expressions, or back-channel responses including short utterances such as "yes", "no", or "uh-huh". The responses acquired by the second electronic device 100 are not limited to those described above, and the second electronic device 100 may acquire a combination of them as the second user's response.
 In acquiring the second user's response, the second electronic device 100 may acquire at least one of video and audio of the second user. The second electronic device 100 may then acquire the second user's response by, for example, performing image recognition on the acquired video and speech recognition on the acquired audio. However, what the second electronic device 100 acquires is not limited to at least one of video and audio of the second user. For example, when the second electronic device 100 includes a motion sensor that captures the movement of a human body, it may acquire the second user's response by acquiring (detecting) body movements of the second user such as a nod.
 For example, the second electronic device 100 may include a motion sensor. In this case, the second electronic device 100 may be a device worn by the second user, such as a wearable terminal, or held in the second user's hand, such as a mouse or a touch pen. In such cases, the second electronic device 100 may also acquire the second user's response by connecting, by wire or wirelessly, to a smartphone, a tablet terminal, a foot pedal, or the like held by the second user. The acquisition of the second user's response is not limited to the above, and these methods may be combined to acquire the second user's response.
 Here, Table 1 below shows examples of the correspondence between information such as video and audio acquired by the second electronic device 100 and the method of detecting the second user's response. The second electronic device 100 may acquire a head nod or a head shake as the second user's response by, for example, performing image recognition on the acquired video. The second electronic device 100 may also, for example, perform speech recognition on the acquired audio and, when a positive word is detected, acquire a head-nodding action as the second user's response; likewise, when a negative word is detected, it may acquire a head-shaking action as the second user's response. The second electronic device 100 may also connect to a wearable terminal worn by the second user, such as headphones equipped with a motion sensor; in this case, it may acquire the nodding or head-shaking action of the second user detected by the wearable terminal as the second user's response. The second electronic device 100 may also be a handheld device equipped with a motion sensor, such as a smartphone or a tablet; in this case, when tilted back and forth by the second user, it may acquire the head-nodding action associated with that motion as the second user's response. The second electronic device 100 may also detect a head nod by a mouse click; in this case, the second electronic device 100 may display on the display unit 170 a GUI with a button corresponding to a head nod, and acquire the second user's response when the second user clicks the mouse button.
 [Table 1: example correspondence between the information acquired by the second electronic device 100 (video, audio, motion-sensor input, mouse input) and the method of detecting the second user's response]
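 Purely as an illustration of the Table 1 correspondences described above, the sketch below maps recognized events to a nod or head-shake response. The event names and the positive/negative word lists are hypothetical stand-ins for the outputs of actual image recognition, speech recognition, or motion sensing.

```python
from typing import Optional

# Hypothetical word lists for speech-recognition results.
POSITIVE_WORDS = {"yes", "sure", "agreed"}
NEGATIVE_WORDS = {"no", "disagree"}

def detect_response(event_type: str, payload: str) -> Optional[str]:
    """Map an input event to the second user's response ('nod' or 'shake').

    event_type identifies the acquisition channel (image recognition,
    speech recognition, motion sensor, or GUI/mouse), mirroring Table 1.
    """
    if event_type == "image":    # image recognition on captured video
        return payload if payload in ("nod", "shake") else None
    if event_type == "speech":   # speech recognition on captured audio
        words = set(payload.lower().split())
        if words & POSITIVE_WORDS:
            return "nod"         # positive word detected: treat as a nod
        if words & NEGATIVE_WORDS:
            return "shake"       # negative word detected: treat as a head shake
        return None
    if event_type == "motion":   # wearable or handheld motion sensor
        return {"pitch": "nod", "yaw": "shake"}.get(payload)
    if event_type == "mouse":    # GUI button assigned to a nod
        return "nod" if payload == "nod_button" else None
    return None
```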
 Next, the second electronic device 100 transmits the acquired data, such as the video and/or audio of the second user, to the third electronic device 300 (step S111). The operation of step S111 may be the same as step S13 in FIG. 5. Here, the data transmitted from the second electronic device 100 to the third electronic device 300 may include data indicating a body movement, such as a nod, corresponding to the second user's response.
 The third electronic device 300 transmits the data, such as the video and/or audio of the second user, received from the second electronic device 100 to the first electronic device 1 (step S112). The operation of step S112 may be the same as step S14 in FIG. 5.
 When the first electronic device 1 receives data such as the video and/or audio of the second user from the third electronic device 300 in step S112, it presents at least one of the video and audio of the second user to the first user (e.g., the participant Ma) (step S113). The operation of step S113 may be the same as step S16 in FIG. 5. Furthermore, when the first electronic device 1 has received data indicating a body movement of the second user, such as a nod, it may reproduce the second user's body movement in step S113 by, for example, driving the power unit 80, or by displaying that movement on, for example, the display unit 70.
 Thus, according to the system of one embodiment, the first user can receive the second user's response to his or her own utterance at the appropriate timing. Therefore, the system according to one embodiment can facilitate communication between multiple locations.
(Other Embodiments)
 In FIG. 7, the first electronic device 1 and the second electronic device 100 communicate via the third electronic device 300. However, in one embodiment, the operations described above may be performed without going through the third electronic device 300. In this case, the first electronic device 1 and the second electronic device 100 may be configured to communicate with each other directly or indirectly.
(Other Embodiments)
 In FIG. 7, the third electronic device 300 estimates the response timing in advance in step S105 and, when the time of that response timing arrives, transmits information indicating the response timing to the second electronic device 100. However, the system according to one embodiment is not limited to such a configuration. A system according to a modification of one embodiment is further described below.
 FIG. 8 is a sequence diagram explaining characteristic operations of a system according to a modification of the embodiment shown in FIG. 7. Only the points that differ from the characteristic operations described with reference to FIG. 7 are explained below.
 As shown in FIG. 8, the operations from step S101 to step S105 may be the same as in FIG. 7. On the other hand, once the third electronic device 300 has estimated the response timing (in advance) in step S105, it may transmit information indicating the response timing to the second electronic device 100 even before the time of that response timing arrives (steps S121 and S122). In this case, the second electronic device 100 that has received the information indicating the response timing determines whether the time of the estimated response timing has arrived (step S123). If the time of the response timing has not yet arrived in step S123, the second electronic device 100 may wait until it arrives or may execute other processing. When the time of the response timing arrives in step S123, the second electronic device 100 presents to the second user that it is the response timing (step S109). The operations from step S110 to step S113 may be the same as in FIG. 7.
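 As a minimal sketch of this variant on the receiving side, assuming the timing information arrives as an epoch timestamp, the notification can be scheduled locally; the function and field names are illustrative, not part of this disclosure.

```python
import threading
import time

def schedule_response_timing(start_epoch: float, notify) -> None:
    """Arrange for `notify()` to run when the estimated response timing arrives.

    start_epoch is the estimated start of the response timing, received in
    advance from the third electronic device 300 (steps S121/S122). If the
    time has already arrived, notify immediately (step S123).
    """
    delay = start_epoch - time.time()
    if delay <= 0:
        notify()  # the response timing has already arrived
    else:
        threading.Timer(delay, notify).start()  # wait, then present (step S109)

# Illustrative use: present the timing 1.5 seconds from now.
schedule_response_timing(time.time() + 1.5, lambda: print("It is your turn"))
```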
 Thus, also with the system according to the modification of the embodiment shown in FIG. 8, the first user can receive the second user's response to his or her own utterance at the appropriate timing.
(Other Embodiments)
 In the embodiments described above, the second electronic device 100 detects the second user's response. However, in other embodiments, the first electronic device 1 and/or the third electronic device 300 may detect the second user's response.
 As described above, the system according to one embodiment may include, for example, the first electronic device 1, the second electronic device 100, and the third electronic device 300. The first electronic device 1 acquires at least one of video and audio of the first user. The second electronic device 100 may be configured to be able to communicate with the first electronic device 1. The second electronic device 100 outputs at least one of the video and audio of the first user acquired by the first electronic device 1 to the second user, who responds to the first user's utterance. The third electronic device 300 may include a control unit 310 and an estimation unit 314. The estimation unit 314 may estimate, based on at least one of the video and audio of the first user, the response timing at which the second user responds to the first user's utterance. The control unit 310 may perform control so that information indicating the response timing estimated by the estimation unit 314 is acquired by the second electronic device 100.
 The second electronic device 100 according to one embodiment may include a presentation unit. In this case, the presentation unit of the second electronic device 100 may present the response timing to the second user as at least one of visual information, auditory information, and tactile information. Here, the presentation unit of the second electronic device 100 may be, for example, at least one of the display unit 170, the audio output unit 160, and the tactile sensation providing unit 190 shown in FIG. 3. The presentation unit of the second electronic device 100 may present the response timing to the second user at the point when the response timing is reached.
 The second electronic device 100 according to one embodiment may include an acquisition unit 200. In this case, the acquisition unit 200 of the second electronic device 100 may acquire the second user's response as at least one of video and audio. Here, the acquisition unit 200 of the second electronic device 100 may be, for example, at least one of the imaging unit 140 and the audio input unit 150 shown in FIG. 3, or may acquire input to at least one of them. Furthermore, the acquisition unit 200 may acquire a mouse click or touch input by the user, or input to a motion sensor and/or a foot pedal.
 The second electronic device 100 according to one embodiment may also include a communication unit 130. In this case, the communication unit 130 may transmit at least one of the video and audio acquired by the above-described acquisition unit to the first electronic device 1.
 The control unit 310 of the third electronic device 300 according to one embodiment may perform control so that, for example, information indicating the response timing estimated by the estimation unit 314 is transmitted to the second electronic device 100 before that response timing is reached (i.e., in advance).
 In one embodiment, the second electronic device 100 may include an acquisition unit 200 that acquires the second user's response corresponding to a predetermined action of the second user. In one embodiment, the second electronic device 100 may also include a communication unit 130 that transmits data indicating the second user's response to the first electronic device 1. In one embodiment, the first electronic device 1 may also include a power unit 80 that drives at least a part of the housing of the first electronic device 1 based on the data indicating the second user's response.
 The estimation unit 314 of the third electronic device 300 according to one embodiment may estimate the response timing. In this case, the estimation unit 314 may estimate the response timing based on at least one of acoustic features of the first user's voice and linguistic features of the first user's speech, extracted from at least one of the video and audio of the first user acquired by the first electronic device 1. The estimation unit 314 may also estimate the response timing based on at least one of features of the first user's facial expressions and the first user's gestures, extracted from at least one of the video and audio of the first user acquired by the first electronic device 1.
 The estimation unit 314 of the third electronic device 300 according to one embodiment may also estimate the response timing by predicting the timing at which the first user's current utterance will end and the timing at which the first user's next utterance will start.
 Next, modifications of the system according to one embodiment are further described.
(Modification of an Embodiment)
 For example, in the embodiments described above, if the second user responds close to the point at which the response timing ends, that response may overlap with the start of the first user's next utterance. If that happens, the first user, having started the next utterance, may break it off, and communication may no longer be smooth. Therefore, in step S113 of FIG. 7 or FIG. 8, for example, when presenting the video and/or audio of the second user, the first electronic device 1 may refrain from presenting the video and/or audio of the second user if the time remaining until the end of the response timing is short. The first electronic device 1 may likewise refrain from causing the power unit 80 to drive at least a part of the housing of the first electronic device 1 based on the data indicating the second user's response.
 In these cases, the third electronic device 300 may, for example, also transmit the response timing estimated by the estimation unit 314 to the first electronic device 1. Then, for example, in step S113 of FIG. 7 or FIG. 8, the determination unit 12 of the first electronic device 1 may determine whether the remainder of the response timing is shorter than a predetermined time. If the remainder of the response timing is shorter than the predetermined time, the first electronic device 1 may refrain from presenting the video and/or audio of the second user, and may refrain from causing the power unit 80 to drive at least a part of the housing of the first electronic device 1 based on the data indicating the second user's response.
 In this way, the first electronic device 1 may include a determination unit 12. In this case, the determination unit 12 may determine, based on the remaining time until the end of the response timing, whether to present to the first user at least one of the second user's video and audio acquired from the second electronic device 100. The determination unit 12 may likewise determine, based on that remaining time, whether to cause the power unit 80 to drive at least a part of the housing of the first electronic device 1. A minimal sketch of this check follows.
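 A minimal sketch of the remaining-time check; the threshold value and function name are illustrative assumptions, not taken from the disclosure.

```python
SUPPRESS_THRESHOLD_S = 0.5  # assumed minimum usable remainder of the window

def should_present_response(now_s: float, window_end_s: float,
                            threshold_s: float = SUPPRESS_THRESHOLD_S) -> bool:
    # Determination unit 12: present the second user's video/audio (or let
    # power unit 80 drive the housing) only if enough of the response
    # timing remains; otherwise hold back to avoid colliding with the
    # first user's next utterance.
    return (window_end_s - now_s) >= threshold_s

print(should_present_response(now_s=9.8, window_end_s=10.1))  # -> False
```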
 According to this modification, even if the second user responds near the end of the response timing window, the response is prevented from overlapping with the start of the first user's next utterance. The first user who has begun the next utterance is therefore not interrupted, and communication can proceed smoothly.
 In a further variation of the above modification, the first electronic device 1 may refrain from presenting the second user's video and/or audio while the first user is speaking, rather than when the remainder of the response timing is shorter than the predetermined time. For example, in step S113 of FIG. 7 or FIG. 8, when presenting the second user's video and/or audio, the first electronic device 1 may refrain from presenting them if the first user is speaking. In this case, the determination unit 12 of the first electronic device 1 may determine, in step S113 of FIG. 7 or FIG. 8, whether the audio input unit 50 is detecting the first user's speech. If the first user's speech is detected, the first electronic device 1 may refrain from presenting the second user's video and/or audio. Similarly, while the first user is speaking, the first electronic device 1 may refrain from causing the power unit 80 to drive at least a part of its housing based on the data indicating the second user's response.
 In this way, the determination unit 12 of the first electronic device 1 may determine whether to present to the first user at least one of the second user's video and audio acquired from the second electronic device 100, depending on whether the first electronic device 1 is detecting the first user's voice. The determination unit 12 may likewise determine whether to cause the power unit 80 to drive at least a part of the housing of the first electronic device 1, depending on whether the first electronic device 1 is detecting the first user's voice. A sketch of this variant follows.
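 The sketch below assumes a toy energy-based voice-activity check in place of whatever detector the audio input unit 50 actually uses; the threshold is an illustrative assumption.

```python
def is_speech(samples: list[float], energy_threshold: float = 0.01) -> bool:
    # Toy voice-activity detection: mean energy above a threshold counts
    # as speech; a real device would use a proper VAD.
    if not samples:
        return False
    return sum(s * s for s in samples) / len(samples) > energy_threshold

def should_present_response(mic_samples: list[float]) -> bool:
    # Hold back the second user's video/audio while the first user talks.
    return not is_speech(mic_samples)

print(should_present_response([0.0, 0.001, -0.002]))  # near-silence -> True
```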
 In another variation of the above modification, rather than withholding the second user's video and/or audio, the first electronic device 1 may perform an operation that suggests to the first user that the response timing is being extended. For example, when presenting the second user's video and/or audio, if the remaining time until the end of the response timing is short, the first electronic device 1 may indicate to the first user that the second user is about to speak. Likewise, when the power unit 80 drives at least a part of the housing of the first electronic device 1 based on the data indicating the second user's response, if the remaining time until the end of the response timing is short, the first electronic device 1 may indicate to the first user that the second user is about to speak. In such cases, the first electronic device 1 may output, from the audio output unit 60, a sound resembling a filler word of the second user, such as "hmm" or "um". The first electronic device 1 may also display, on the display unit 70, text or video showing that the second user is about to speak, or may express this by driving the power unit 80. By suggesting to the first user that the response timing is being extended before presenting the second user's video and/or audio, the risk that the second user's response overlaps with the first user's next utterance is reduced.
 In this way, in a system according to this modification, the first electronic device 1 may include a control unit 10. In this case, the control unit 10 may control the first electronic device 1 to perform an operation suggesting the extension of the response timing to the first user. The control unit 10 may perform such control when at least one of the second user's video and audio acquired from the second electronic device 100 is presented to the first user and the remaining time until the end of the response timing is at or below a predetermined value. The control unit 10 may likewise perform such control when the power unit 80 drives at least a part of the housing of the first electronic device 1 based on the data indicating the second user's response and the remaining time until the end of the response timing is at or below a predetermined value. A sketch of this behavior follows.
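 A sketch of the extension hint, using stub callbacks; the filler file name, the threshold, and the callback signatures are illustrative assumptions.

```python
EXTENSION_THRESHOLD_S = 0.5  # assumed: below this, hint before presenting

def handle_incoming_response(remaining_s: float, play_audio, present_response):
    # Control unit 10: if the response timing is almost over when the
    # second user's response arrives, first hint that the second user is
    # about to speak (e.g., a filler sound from audio output unit 60),
    # then present the response.
    if remaining_s <= EXTENSION_THRESHOLD_S:
        play_audio("filler_um.wav")
    present_response()

handle_incoming_response(
    remaining_s=0.3,
    play_audio=lambda f: print(f"audio output unit 60: playing {f}"),
    present_response=lambda: print("presenting second user's response"),
)
```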
(Modification of one embodiment)
 For example, in the embodiment described above, even if the response timing is presented to the second user, the moment at which the second user actually responds varies from person to person. If the interval from when the response timing is presented to the second user to when the second user actually responds is extremely short or extremely long, the first user may feel something is off, and communication may no longer be smooth.
 The third electronic device 300 may therefore adjust the response timing estimated by the estimation unit 314 based on the timings at which the second user responded to the first user's utterances in the past. In this case, the adjustment unit 316 of the third electronic device 300 may perform this adjustment. For example, if the response timing estimated by the estimation unit 314 is too early, the adjustment unit 316 may delay it according to the degree to which it is determined to be too early. Conversely, if the estimated response timing is too late, the adjustment unit 316 may advance it according to the degree to which it is determined to be too late.
 In this way, a system according to this modification may include an adjustment unit 316. In this case, the adjustment unit 316 may adjust the response timing estimated by the estimation unit 314 based on the timings at which the second user responded to the first user's utterances in the past. The function of the adjustment unit 316 of the third electronic device 300 may also be realized by, for example, the adjustment unit 116 of the second electronic device 100 or the adjustment unit 16 of the first electronic device 1. A minimal sketch of such an adjustment follows.
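 A minimal sketch, assuming the bias is the mean offset between the user's actual past response times and the estimated timings; the averaging rule is an illustrative assumption, as the disclosure only requires that past response timings inform the adjustment.

```python
from statistics import mean

def adjust_response_timing(estimated_s: float,
                           past_offsets_s: list[float]) -> float:
    # past_offsets_s: for each past exchange, (actual response time -
    # estimated response timing); positive means the second user tends
    # to respond later than the estimate.
    if not past_offsets_s:
        return estimated_s
    # Shift the estimate toward the user's habit: delay it if the user
    # tends to respond late, advance it if the user tends to respond
    # early. The same rule could instead shift the point at which the
    # timing is presented (adjustment unit 116).
    return estimated_s + mean(past_offsets_s)

print(adjust_response_timing(10.0, [0.4, 0.6, 0.5]))  # -> 10.5
```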
 Alternatively, in a system according to this modification, the time at which the response timing is presented may be adjusted instead of the estimated response timing itself. In this case, the adjustment unit 116 of the second electronic device 100 may adjust the time at which the response timing estimated by the estimation unit 314 is presented, based on the timings at which the second user responded to the first user's utterances in the past. For example, if the response timing estimated by the estimation unit 314 is too early, the adjustment unit 116 may delay the point at which it is presented according to the degree to which it is determined to be too early. Conversely, if the estimated response timing is too late, the adjustment unit 116 may advance the point at which it is presented according to the degree to which it is determined to be too late.
 In this way, in a system according to this modification, the second electronic device 100 may include, for example, an adjustment unit 116. Here, the adjustment unit 116 may adjust the time at which the response timing is presented to the second user, based on the timings at which the second user responded to the first user's utterances in the past. The function of the adjustment unit 116 of the second electronic device 100 may also be realized by, for example, the adjustment unit 316 of the third electronic device 300 or the adjustment unit 16 of the first electronic device 1.
 Although the embodiments of the present disclosure have been described with reference to the drawings and examples, it should be noted that those skilled in the art can easily make various variations or modifications based on the present disclosure, and that such variations and modifications fall within the scope of the present disclosure. For example, the functions included in each component or each step can be rearranged so as not to cause logical inconsistency, and multiple components or steps can be combined into one or divided. Although the embodiments of the present disclosure have been described mainly in terms of a device, they can also be realized as a method including the steps executed by the components of the device. The embodiments of the present disclosure can also be realized as a method or program executed by a processor provided in the device, or as a storage medium or recording medium on which the program is recorded. It should be understood that these are also included within the scope of the present disclosure.
 The embodiments described above are not limited to implementation as a system. For example, they may be implemented as a method for controlling a system, or as a program executed in a system. They may also be implemented as a device such as at least one of the first electronic device 1, the second electronic device 100, and the third electronic device 300, or as a method for controlling such a device. Furthermore, they may be implemented as a program executed by such a device, or as a storage medium or recording medium on which that program is recorded.
 For example, the embodiments described above may be implemented as the second electronic device 100. In this case, the second electronic device 100 may be configured to be capable of communicating with the first electronic device 1, and may include an acquisition unit, an output unit, an estimation unit, and a presentation unit. The acquisition unit may acquire at least one of the video and audio of the user of the first electronic device 1, and may be, for example, at least one of the imaging unit 140 and the audio input unit 150 shown in FIG. 3. The output unit may output at least one of the video and audio of the user of the first electronic device 1 to the user of the second electronic device 100 who responds to the utterance of the user of the first electronic device 1, and may be, for example, at least one of the audio output unit 160 and the display unit 170 shown in FIG. 3. The estimation unit may estimate the response timing of the user of the second electronic device 100 responding to the utterance of the user of the first electronic device 1, based on at least one of the video and audio of the user of the first electronic device 1; it may be, for example, the estimation unit 114 shown in FIG. 3. The presentation unit may present information indicating the response timing estimated by the estimation unit, and may be, for example, at least one of the audio output unit 160, the display unit 170, and the tactile sensation providing unit 190 shown in FIG. 3.
LIST OF SYMBOLS
 1 First electronic device
 10 Control unit
 12 Determination unit
 14 Estimation unit
 16 Adjustment unit
 20 Memory unit
 30 Communication unit
 40 Imaging unit
 50 Audio input unit
 60 Audio output unit
 70 Display unit
 80 Power unit
 100 Second electronic device
 110 Control unit
 112 Determination unit
 114 Estimation unit
 116 Adjustment unit
 120 Memory unit
 130 Communication unit
 140 Imaging unit
 150 Audio input unit
 160 Audio output unit
 170 Display unit
 190 Tactile sensation providing unit
 200 Acquisition unit
 300 Third electronic device
 310 Control unit
 312 Determination unit
 314 Estimation unit
 316 Adjustment unit
 320 Memory unit
 330 Communication unit
 N Network

Claims (19)

  1.  A system comprising:
     a first electronic device that acquires at least one of video and audio of a first user;
     a second electronic device configured to be capable of communicating with the first electronic device and to output at least one of the video and audio of the first user acquired by the first electronic device to a second user who responds to an utterance of the first user;
     an estimation unit that estimates a response timing of the second user responding to the utterance of the first user, based on at least one of the video and audio of the first user; and
     a control unit that causes the second electronic device to acquire information indicating the response timing estimated by the estimation unit.
  2.  The system according to claim 1, wherein the second electronic device comprises a presentation unit that presents the response timing to the second user as at least one of visual information, auditory information, and tactile information.
  3.  The system according to claim 2, wherein the presentation unit presents the response timing to the second user at the point at which the response timing is reached.
  4.  The system according to claim 1, wherein the second electronic device comprises an acquisition unit that acquires the response of the second user as at least one of video and audio, and a communication unit that transmits at least one of the video and audio acquired by the acquisition unit to the first electronic device.
  5.  The system according to claim 4, wherein the first electronic device comprises a determination unit that determines, based on the remaining time until the end of the response timing, whether to present to the first user at least one of the video and audio of the second user acquired from the second electronic device.
  6.  The system according to claim 4, comprising a determination unit that determines whether to present to the first user at least one of the video and audio of the second user acquired from the second electronic device, depending on whether the first electronic device is detecting the voice of the first user.
  7.  The system according to claim 1, wherein
     the second electronic device comprises an acquisition unit that acquires a response of the second user corresponding to a predetermined action of the second user, and a communication unit that transmits data indicating the response of the second user to the first electronic device, and
     the first electronic device comprises a power unit that drives at least a part of a housing of the first electronic device based on the data indicating the response of the second user.
  8.  The system according to claim 7, wherein the first electronic device comprises a determination unit that determines, based on the remaining time until the end of the response timing, whether to cause the power unit to drive at least a part of the housing of the first electronic device.
  9.  The system according to claim 7, comprising a determination unit that determines whether to cause the power unit to drive at least a part of the housing of the first electronic device, depending on whether the first electronic device is detecting the voice of the first user.
  10.  The system according to claim 1, wherein the control unit transmits information indicating the response timing estimated by the estimation unit to the second electronic device before the response timing is reached.
  11.  The system according to claim 1, wherein the estimation unit estimates the response timing based on at least one of features of the first user's voice, features of the first user's language, features of the first user's facial expression, and the first user's gestures, extracted from at least one of the video and audio of the first user acquired by the first electronic device.
  12.  The system according to claim 1 or 11, wherein the estimation unit estimates the response timing by predicting the timing at which the first user's current utterance ends and the timing at which the first user's next utterance starts.
  13.  The system according to claim 1, wherein the second electronic device comprises an adjustment unit that adjusts the time at which the response timing is presented to the second user, based on the timing at which the second user responded to an utterance of the first user in the past.
  14.  The system according to claim 1, comprising an adjustment unit that adjusts the response timing estimated by the estimation unit, based on the timing at which the second user responded to an utterance of the first user in the past.
  15.  The system according to claim 4, wherein the first electronic device comprises a control unit that, when presenting to the first user at least one of the video and audio of the second user acquired from the second electronic device, performs an operation suggesting an extension of the response timing to the first user if the remaining time until the end of the response timing is at or below a predetermined value.
  16.  The system according to claim 7, wherein the first electronic device comprises a control unit that, when the power unit drives at least a part of the housing of the first electronic device, performs an operation suggesting an extension of the response timing to the first user if the remaining time until the end of the response timing is at or below a predetermined value.
  17.  An electronic device configured to be capable of communicating with another electronic device, the electronic device comprising:
     an acquisition unit that acquires at least one of video and audio of a user of the other electronic device;
     an output unit that outputs at least one of the video and audio of the user of the other electronic device to a user of the electronic device who responds to an utterance of the user of the other electronic device;
     an estimation unit that estimates a response timing of the user of the electronic device responding to the utterance of the user of the other electronic device, based on at least one of the video and audio of the user of the other electronic device; and
     a presentation unit that presents information indicating the response timing estimated by the estimation unit.
  18.  A method for controlling a system, comprising:
     acquiring, by a first electronic device, at least one of video and audio of a first user;
     outputting, by a second electronic device configured to be capable of communicating with the first electronic device, at least one of the video and audio of the first user acquired by the first electronic device to a second user who responds to an utterance of the first user;
     estimating a response timing of the second user responding to the utterance of the first user, based on at least one of the video and audio of the first user; and
     causing the second electronic device to acquire information indicating the response timing.
  19.  A program for causing a computer to execute:
     acquiring, by a first electronic device, at least one of video and audio of a first user;
     outputting, by a second electronic device configured to be capable of communicating with the first electronic device, at least one of the video and audio of the first user acquired by the first electronic device to a second user who responds to an utterance of the first user;
     estimating a response timing of the second user responding to the utterance of the first user, based on at least one of the video and audio of the first user; and
     causing the second electronic device to acquire information indicating the response timing.
PCT/JP2023/032576 2022-09-29 2023-09-06 System, electronic device, system control method, and program WO2024070550A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2022156837 2022-09-29
JP2022-156837 2022-09-29

Publications (1)

Publication Number Publication Date
WO2024070550A1 true WO2024070550A1 (en) 2024-04-04

Family

ID=90477430

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/032576 WO2024070550A1 (en) 2022-09-29 2023-09-06 System, electronic device, system control method, and program

Country Status (1)

Country Link
WO (1) WO2024070550A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006243980A (en) * 2005-03-01 2006-09-14 Fuji Xerox Co Ltd Information processing system, information processing method, and computer program
JP2006304009A (en) * 2005-04-21 2006-11-02 Fuji Xerox Co Ltd Electronic conference system
JP2012146072A (en) * 2011-01-11 2012-08-02 Nippon Telegr & Teleph Corp <Ntt> Next speaker guidance system, next speaker guidance method and next speaker guidance program
JP2017118364A (en) * 2015-12-24 2017-06-29 日本電信電話株式会社 Communication system, communication device, and communication program
JP2021140240A (en) * 2020-03-02 2021-09-16 コニカミノルタ株式会社 Interaction support system, interaction support method, and interaction support program
JP2022113138A (en) * 2021-01-22 2022-08-03 富士フイルムビジネスイノベーション株式会社 Information processing device and program

Similar Documents

Publication Publication Date Title
US9253303B2 (en) Signal processing apparatus and storage medium
US11032675B2 (en) Electronic accessory incorporating dynamic user-controlled audio muting capabilities, related methods and communications terminal
US10567314B1 (en) Programmable intelligent agents for human-chatbot communication
EP2856742A1 (en) System and methods for managing concurrent audio messages
WO2020026850A1 (en) Information processing device, information processing method, and program
CN110035250A (en) Audio-frequency processing method, processing equipment, terminal and computer readable storage medium
CN106982286B (en) Recording method, recording equipment and computer readable storage medium
KR102447381B1 (en) Method for providing intelligent agent service while calling and electronic device thereof
EP2698787A2 (en) Method for providing voice call using text data and electronic device thereof
US20210090548A1 (en) Translation system
KR101609585B1 (en) Mobile terminal for hearing impaired person
CN111108491B (en) Conference system
EP3968619A1 (en) Three-party call terminal for use in mobile man-machine collaborative calling robot
KR20230133864A (en) Systems and methods for handling speech audio stream interruptions
WO2021244135A1 (en) Translation method and apparatus, and headset
WO2024070550A1 (en) System, electronic device, system control method, and program
CN105229997A (en) Communication terminal and communication means
JP2015011651A (en) Information processing device, information processing method, and program
WO2024154626A1 (en) Electronic apparatus and program
WO2024075707A1 (en) System, electronic device, method for controlling system, and program
CN105306656B (en) Call message leaving method, apparatus and system
JP2006139138A (en) Information terminal and base station
WO2023286680A1 (en) Electronic device, program, and system
JP2015115926A (en) Portable terminal device, lip-reading communication method, and program
US20180123731A1 (en) Reduced Latency for Initial Connection to Local Wireless Networks

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23871790

Country of ref document: EP

Kind code of ref document: A1