WO2018061173A1 - Television conference system, television conference method, and program - Google Patents

Television conference system, television conference method, and program

Info

Publication number
WO2018061173A1
Authority
WO
WIPO (PCT)
Prior art keywords
face
participant
conference
image
image analysis
Prior art date
Application number
PCT/JP2016/078992
Other languages
English (en)
Japanese (ja)
Inventor
俊二 菅谷
Original Assignee
株式会社オプティム
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社オプティム
Priority to PCT/JP2016/078992
Publication of WO2018061173A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/15Conference systems

Definitions

  • The present invention relates to a TV conference system, a TV conference method, and a program that make it easy to understand who made what remarks.
  • A video conference apparatus that can be adjusted according to circumstances has been disclosed (Patent Document 1).
  • the present invention provides the following solutions.
  • The invention according to the first feature is a TV conference system in which participants conduct a TV conference, comprising: image analysis means for performing image analysis on an image of the TV conference in which the participants are shown; face detection means for detecting, from the result of the image analysis, a part including a participant's face as a face portion; and face list display means for displaying a list of images of the detected face portions of a plurality of participants.
  • The system comprises an image analysis unit that performs image analysis on an image of the TV conference in which the participants appear, face detection means for detecting a part including a participant's face as a face portion, and face list display means for displaying a list of images of the detected face portions of the plurality of participants.
  • Although the invention according to the first feature is in the category of a TV conference system, the TV conference method and the TV conference program exhibit the same operations and effects.
  • The invention according to the second feature is a TV conference system according to the first feature, wherein the face list display means arranges each detected face portion in the center of a display area and makes a list display by arranging the display areas in which the face portions are centered.
  • The face list display means arranges each detected face portion in the center of a display area, and makes a list display by arranging the display areas with the face portions centered.
  • The invention according to the third feature is a TV conference system according to the first or second feature, wherein the face list display means, when displaying a list of images of the face portions, replaces and displays the background portion other than the detected faces.
  • When displaying a list of images of the face portions, the face list display means replaces and displays the background portion other than the detected faces.
  • An invention according to a fourth feature is a TV conference system according to any one of the first to third features, wherein, after the list display starts, when a detected face portion satisfies a predetermined condition, the face list display means replaces that image with another image and displays it.
  • After starting the list display, when a detected face portion satisfies a predetermined condition, the face list display means replaces it with another image and displays it.
  • The invention according to the fifth feature is a TV conference system according to any one of the first to fourth features, wherein the face detection means further detects whether or not each participant of the detected face portions is speaking, and the face list display means, when displaying a list of images of the face portions, changes the attention level of a participant who is detected to be speaking.
  • The face detection means further detects whether each participant of the detected face portions is speaking, and the face list display means changes the degree of attention of a participant detected to be speaking when the face images are displayed as a list.
  • An invention according to a sixth feature is a TV conference system according to any one of the first to fifth features, further comprising: speech detection means for detecting a participant's speech; speaker determination means for determining the speaker of the detected speech; and speech history display means for displaying the speech history of a participant selected from the list display.
  • The system comprises speech detection means for detecting a participant's speech, a speaker determination unit that determines the speaker of the detected speech, and a speech history display unit that displays the speech history of a participant selected from the list display.
  • The invention according to a seventh feature is a TV conference system in which participants conduct a TV conference, comprising: image analysis means for performing image analysis on an image of the TV conference in which the participants are shown; face detection means for detecting, from the result of the image analysis, a part including a participant's face as a face portion, and for further detecting whether or not each participant of the detected face portions is speaking; and face list display means for displaying a list of face images of the detected participants and for changing the attention level of a participant detected to be speaking when displaying the images of the face portions.
  • The system comprises image analysis means for performing image analysis on an image of the TV conference in which the participants appear; face detection means for detecting, from the result of the image analysis, a part including a participant's face as a face portion and for detecting whether each participant of the detected face portions is speaking; face list display means for displaying a list of the detected face images of the plurality of participants and for changing the attention level of a participant detected to be speaking when the face images are displayed as a list; speech detection means for detecting the participants' speech; speaker determination means for determining the speaker of the detected speech; and speech history display means for displaying a participant's speech history.
  • The invention according to an eighth feature is a video conference method in which participants conduct a video conference, comprising: performing image analysis on an image of the TV conference in which the participants are shown; detecting, from the result of the image analysis, a part including a participant's face as a face portion; and displaying a list of images of the detected face portions of a plurality of participants.
  • The invention according to the ninth feature provides a program for a computer system in which participants conduct a TV conference.
  • According to the present invention, even when there are a plurality of participants at one site, the faces of all members can be appropriately displayed at the connection destination and the participants' facial expressions can be read. It is possible to provide a TV conference system, a TV conference method, and a TV conference program that make it easy to understand who made what remarks.
  • FIG. 1 is a schematic diagram of a preferred embodiment of the present invention.
  • FIG. 2 is a diagram illustrating the relationship between the functional blocks of the TV conference apparatus 100 and the functions.
  • FIG. 3 is a flowchart of face list display processing in the TV conference apparatus 100.
  • FIG. 4 is a diagram showing a relationship between functional blocks and functions when image analysis processing and face detection processing are performed by the transmitting-side TV conference device 100a and face list display processing is performed by the receiving-side TV conference device 100b.
  • FIG. 5 is a flowchart when image analysis processing and face detection processing are performed by the transmitting-side TV conference device 100a, and face list display processing is performed by the receiving-side TV conference device 100b.
  • FIG. 6 is a diagram illustrating a relationship between functional blocks and functions when image analysis processing and face detection processing are performed by the computer 200 and face list display processing is performed by the TV conference device 100b on the receiving side.
  • FIG. 7 is a flowchart when the image analysis process and the face detection process are performed by the computer 200 and the face list display process is performed by the TV conference device 100b on the receiving side.
  • FIG. 8 is a diagram illustrating the relationship between the function blocks and the functions when the face list display process and the speech history display process are performed in the TV conference apparatus 100.
  • FIG. 9 is a flowchart of face list display processing and speech history display processing in the TV conference apparatus 100.
  • FIG. 10 is a diagram illustrating an example of a display of a general TV conference.
  • FIG. 11 is a diagram illustrating an example of the display of the face list display process.
  • FIG. 12 is a diagram illustrating an example of a display in which the background is replaced in the face list display process.
  • FIG. 13 is a diagram illustrating an example of a display in which a face portion satisfying a predetermined condition is replaced in the face list display process.
  • FIG. 14 is a diagram illustrating an example of a display that changes the attention level of a participant detected to be speaking in the face list display process.
  • FIG. 15 is a diagram illustrating an example of the display of the face list display process and the speech history display process.
  • FIG. 1 is a schematic diagram of a preferred embodiment of the present invention. The outline of the present invention will be described with reference to FIG.
  • the TV conference apparatus 100 includes a camera unit 10, a control unit 110, a communication unit 120, a storage unit 130, an input unit 140, and an output unit 150.
  • the control unit 110 implements the image analysis module 111 in cooperation with the communication unit 120 and the storage unit 130. Further, the control unit 110 implements the face detection module 112 in cooperation with the storage unit 130. Further, the output unit 150 implements the face list display module 151 in cooperation with the control unit 110 and the storage unit 130.
  • The TV conference apparatus 100 need only include the above-described components as a whole; each component may be provided as an internal device or an external device.
  • The TV conference device 100 may be a mobile phone, a portable information terminal, a tablet terminal, a personal computer, an electronic appliance such as a netbook terminal, a slate terminal, an electronic book terminal, or a portable music player, a wearable terminal such as smart glasses or a head-mounted display, or another such device.
  • the smartphone illustrated as the TV conference apparatus 100a in FIG. 2 and the personal computer, the display, and the WEB camera illustrated as the TV conference apparatus 100b are merely examples.
  • the image analysis module 111 of the TV conference device 100 receives a captured image from the connected TV conference device 100 via the communication unit 120 (step S01).
  • The captured image here is a high-definition image having the amount of information necessary for image analysis, and the number of pixels and the image quality can be specified. Audio data is also received along with the captured image. The received captured image and audio data are saved in the storage unit 130.
  • the image analysis module 111 of the TV conference apparatus 100 performs image analysis of the received captured image (step S02).
  • the image analysis here is an analysis of the positions and number of participants in the conference.
  • gender and age may be analyzed, or analysis may be performed to identify individual participants using an employee database or the like.
  • the image analysis module 111 specifies that there are four participants and their positions by image analysis.
  • the face detection module 112 of the TV conference device 100 detects a part including each participant's face as a face part based on the image analysis result of step S02 (step S03).
  • The face detection here is for specifying the participant's head position. For example, even when a participant is not facing the camera and parts such as the eyes and mouth cannot be found, the temporal region or the back of the head may be detected as a face.
  • the face detection module 112 detects four faces of a participant 1001, a participant 1002, a participant 1003, and a participant 1004.
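As a rough illustration of step S03, the sketch below turns detector bounding boxes into "face parts". It would sit downstream of any real detector (a cascade classifier, a CNN, etc.); the `expand_face_part` helper and its margin values are illustrative assumptions, not the patent's algorithm.

```python
def expand_face_part(box, frame_w, frame_h, margin=0.4, chest=0.8):
    """Expand a detected face box (x, y, w, h) into a 'face part' that
    adds some margin around the head and extends downward toward the
    chest, clipped to the frame boundaries."""
    x, y, w, h = box
    nx = max(0, int(x - margin * w))
    ny = max(0, int(y - margin * h))
    nx2 = min(frame_w, int(x + w + margin * w))
    ny2 = min(frame_h, int(y + h + chest * h))  # head-to-chest extension
    return (nx, ny, nx2 - nx, ny2 - ny)

# Four detected participants in a 1280x720 frame (illustrative boxes).
faces = [(100, 80, 120, 120), (400, 90, 110, 110),
         (700, 100, 115, 115), (1000, 85, 125, 125)]
parts = [expand_face_part(b, 1280, 720) for b in faces]
```

Each resulting rectangle is what the face list display would crop out of the captured image for one participant.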
  • For this analysis, supervised learning by humans may be performed, or machine learning or deep learning may be used.
  • the learned image analysis module 111 and face detection module 112 may be acquired from the outside via the communication unit 120.
  • The present invention is not limited in this respect, and existing techniques can be used.
  • the face list display module 151 of the TV conference apparatus 100 displays a list of detected images of the face portions of the plurality of participants on the output unit 150 (step S04).
  • the output unit 150 is divided into areas equal to or more than the number of detected participants, and the detected face portion is arranged and displayed at the center of each divided display area.
  • the display area with the face portion arranged in the center is arranged as a face list display.
  • The face portion displayed here is not limited to the head; it may extend from the head to the chest. In the example of FIG. 1, the participant 1001 is placed in the area 101, the participant 1002 in the area 102, the participant 1003 in the area 103, and the participant 1004 in the area 104, and the face list is displayed. Further, as shown in FIG. 1, a captured image showing the overall scene may be displayed in the empty area 105 where no face portion is displayed.
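The layout in step S04 — dividing the output into at least as many areas as participants and centering one face per area — can be sketched as follows. The grid heuristic and the extra cell reserved for the overview image (area 105 in FIG. 1) are assumptions for illustration only.

```python
import math

def grid_for(n_tiles):
    """Smallest rows x cols grid with at least n_tiles cells."""
    cols = math.ceil(math.sqrt(n_tiles))
    rows = math.ceil(n_tiles / cols)
    return rows, cols

def layout(participants, screen_w, screen_h, overview=True):
    """Assign one cell per participant; optionally keep one spare cell
    for a captured image of the whole scene (like area 105)."""
    n = len(participants) + (1 if overview else 0)
    rows, cols = grid_for(n)
    cw, ch = screen_w // cols, screen_h // rows
    cells = []
    for i, p in enumerate(participants):
        r, c = divmod(i, cols)
        # each face part is then drawn centered inside its cell
        cells.append({"participant": p, "x": c * cw, "y": r * ch,
                      "w": cw, "h": ch})
    return cells

tiles = layout(["1001", "1002", "1003", "1004"], 1920, 1080)
```

With four participants plus the overview cell this yields a 2x3 grid, matching the five areas shown in FIG. 1.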
  • FIG. 10 is a diagram showing an example of a general video conference display. Since the captured image is displayed as it is on the output unit 150, the overall atmosphere is conveyed, but the face portions are displayed small, so the facial expressions of the participants 1001, 1002, 1003, and 1004 are difficult to read.
  • FIG. 11 is a diagram showing an example of the display of the face list display process of the present invention. Since the participant 1001 is displayed in the area 1101 of the output unit 150, the participant 1002 in the area 1102, the participant 1003 in the area 1103, and the participant 1004 in the area 1104, each participant's face is displayed large and the facial expressions are easy to read.
  • In the area 1105, "TV conference system 2016/9/9 15:07:19 <<Connecting to XX office>> Call start: 2016/9/9 14:05:33 Destination participants: 4" is displayed; in this example, the system name, the date and time, the connection destination, the call start time, and the number of participants at the destination are shown.
  • information may be displayed in the empty area, or a captured image representing the entire state may be displayed. Further, when the area is divided, the entire output unit 150 may be divided into a face list display area without creating an empty area. At that time, the size of each participant's area may be different.
  • In this way, it is possible to provide a TV conference system, a TV conference method, and a TV conference program in which, even when there are a plurality of participants at one site, the faces of all members are appropriately displayed at the connection destination and the participants' facial expressions can be read.
  • FIG. 2 is a diagram illustrating the relationship between the functional blocks of the TV conference apparatus 100 and the functions.
  • the video conference apparatus 100 includes a camera unit 10, a control unit 110, a communication unit 120, a storage unit 130, an input unit 140, and an output unit 150.
  • the control unit 110 implements the image analysis module 111 in cooperation with the communication unit 120 and the storage unit 130. Further, the control unit 110 implements the face detection module 112 in cooperation with the storage unit 130. Further, the output unit 150 implements the face list display module 151 in cooperation with the control unit 110 and the storage unit 130.
  • the communication network 300 may be a public communication network such as the Internet or a dedicated communication network, and enables communication between the TV conference apparatus 100a and the TV conference apparatus 100b.
  • The TV conference apparatus 100 need only include the above-described components as a whole; each component may be provided as an internal device or an external device.
  • The TV conference device 100 may be a mobile phone, a portable information terminal, a tablet terminal, a personal computer, an electronic appliance such as a netbook terminal, a slate terminal, an electronic book terminal, or a portable music player, a wearable terminal such as smart glasses or a head-mounted display, or another such device.
  • the smartphone illustrated as the TV conference apparatus 100a in FIG. 2 and the personal computer, the display, and the WEB camera illustrated as the TV conference apparatus 100b are merely examples.
  • The TV conference apparatus 100 includes, as the camera unit 10, an imaging device having a lens, an image sensor, various buttons, and a flash, and captures moving images and still images as captured images.
  • An image obtained by imaging is a high-definition image having the amount of information necessary for image analysis, and the number of pixels and the image quality can be specified.
  • the camera unit 10 includes a microphone for acquiring audio data combined with moving image capturing or can use the microphone function of the input unit 140.
  • the control unit 110 includes a CPU (Central Processing Unit), a RAM (Random Access Memory), a ROM (Read Only Memory), and the like.
  • the control unit 110 implements the image analysis module 111 in cooperation with the communication unit 120 and the storage unit 130. Further, the control unit 110 implements the face detection module 112 in cooperation with the storage unit 130.
  • The communication unit 120 includes a device for enabling communication with other devices, for example, a Wi-Fi (Wireless Fidelity) compatible device compliant with IEEE 802.11, or a wireless device compliant with an IMT-2000 standard such as a third- or fourth-generation mobile communication system. A wired LAN connection may also be used.
  • the storage unit 130 includes a data storage unit such as a hard disk or a semiconductor memory, and stores data necessary for processing of captured images, image analysis results, face detection results, and the like.
  • the input unit 140 has functions necessary for using the TV conference system.
  • a liquid crystal display that realizes a touch panel function, a keyboard, a mouse, a pen tablet, a hardware button on the apparatus, a microphone for performing voice recognition, and the like can be provided.
  • the function of the present invention is not particularly limited by the input method.
  • the output unit 150 has functions necessary for using the TV conference system.
  • the output unit 150 implements the face list display module 151 in cooperation with the control unit 110 and the storage unit 130.
  • forms such as a liquid crystal display, a PC display, a projection on a projector, and an audio output can be considered.
  • the function of the present invention is not particularly limited by the output method.
  • FIG. 3 is a flowchart of face list display processing in the TV conference apparatus 100. Processing executed by each module described above will be described in accordance with this processing.
  • Here, a flowchart is shown for the case where the image analysis process, the face detection process, and the face list display process are all performed by the TV conference device 100a on the captured-image receiving side.
  • control unit 110 of the TV conference device 100a notifies the connection destination TV conference device 100b of the start of the TV conference via the communication unit 120 (step S301).
  • the control unit 110 of the TV conference apparatus 100b starts imaging with the camera unit 10 (step S302).
  • The captured image is a high-definition image having the amount of information necessary for image analysis in the TV conference apparatus 100a, and the number of pixels and the image quality can be specified.
  • audio data is acquired together with moving image capturing.
  • control unit 110 of the TV conference device 100b transmits the captured image to the TV conference device 100a via the communication unit 120 (step S303).
  • When the captured image is a moving image, audio data is also transmitted.
  • the image analysis module 111 of the TV conference apparatus 100a receives a captured image from the TV conference apparatus 100b via the communication unit 120 (step S304). Audio data is also received along with the captured image. The received captured image and audio data are saved in the storage unit 130.
  • the image analysis module 111 of the TV conference apparatus 100a performs image analysis of the received captured image (step S305).
  • the image analysis is an analysis of the positions and number of participants in the conference.
  • gender and age may be analyzed, or analysis may be performed to identify individual participants using an employee database or the like.
  • the face detection module 112 of the TV conference device 100a detects a part including each participant's face as a face part based on the image analysis result of step S305 (step S306).
  • The face detection is for specifying the participant's head position. For example, even when a participant is not facing the camera and parts such as the eyes and mouth cannot be found, the temporal region or the back of the head may be detected as a face.
  • For this analysis, supervised learning by humans may be performed, or machine learning or deep learning may be used.
  • the learned image analysis module 111 and face detection module 112 may be acquired from the outside via the communication unit 120.
  • The present invention is not limited in this respect, and existing techniques can be used.
  • the face list display module 151 of the TV conference device 100a displays a list of detected face part images of the plurality of participants on the output unit 150 (step S307).
  • the output unit 150 is divided into areas equal to or more than the number of detected participants, and the detected face portion is arranged and displayed at the center of each divided display area.
  • the face portion to be displayed here is not limited to the head, but may be from the head to the chest.
  • the display area with the face portion arranged in the center is arranged as a face list display.
  • FIG. 10 is a diagram showing an example of a general video conference display. Since the captured image is displayed as it is on the output unit 150, the overall atmosphere is conveyed, but the face portions are displayed small, so the facial expressions of the participants 1001, 1002, 1003, and 1004 are difficult to read.
  • FIG. 11 is a diagram showing an example of the display of the face list display process of the present invention. Since the participant 1001 is displayed in the area 1101 of the output unit 150, the participant 1002 in the area 1102, the participant 1003 in the area 1103, and the participant 1004 in the area 1104, each participant's face is displayed large and the facial expressions are easy to read.
  • In the area 1105, "TV conference system 2016/9/9 15:07:19 <<Connecting to XX office>> Call start: 2016/9/9 14:05:33 Destination participants: 4" is displayed; in this example, the system name, the date and time, the connection destination, the call start time, and the number of participants at the destination are shown.
  • information may be displayed in the empty area, or a captured image representing the entire state may be displayed. Further, when the area is divided, the entire output unit 150 may be divided into a face list display area without creating an empty area. At that time, the size of each participant's area may be different.
  • the control unit 110 of the TV conference device 100a confirms whether or not to end the TV conference (step S308). It is assumed that the user can designate the end of the video conference via the input unit 140. If the video conference is to be terminated, the process proceeds to the next step S309. If the video conference is not to be terminated, the process returns to step S304 to continue the processing.
  • control unit 110 of the TV conference device 100a notifies the TV conference device 100b of the end of the TV conference via the communication unit 120 (step S309).
  • the TV conference device 100b confirms whether or not to end the TV conference (step S310). If the TV conference apparatus 100b does not end, the process returns to step S302 to continue the process, and if it ends, the TV conference ends.
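Steps S304 to S310 on the device 100a form a simple receive-analyze-display loop. A minimal sketch of that control flow follows; the network and analysis calls are stubbed out, and every function name here is a placeholder rather than the patent's API.

```python
def run_conference(receive_frame, analyze, detect_faces, show_list,
                   should_end):
    """Receiver-side loop of FIG. 3: repeat S304-S307 until the user
    ends the conference (S308); the caller then notifies the peer (S309)."""
    frames = 0
    while not should_end():
        frame = receive_frame()        # S304: captured image + audio
        result = analyze(frame)        # S305: positions / number of people
        faces = detect_faces(result)   # S306: face parts
        show_list(faces)               # S307: face list display
        frames += 1
    return frames

# Scripted three-frame session for illustration.
state = {"left": 3}
def should_end():
    state["left"] -= 1
    return state["left"] < 0

shown = []
n = run_conference(
    receive_frame=lambda: "frame",
    analyze=lambda f: {"participants": 4},
    detect_faces=lambda r: ["f"] * r["participants"],
    show_list=shown.append,
    should_end=should_end,
)
```

The loop processes three frames and displays a four-face list for each before the end condition fires.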
  • In this way, it is possible to provide a TV conference system, a TV conference method, and a TV conference program in which, even when there are a plurality of participants at one site, the faces of all members are appropriately displayed at the connection destination and the participants' facial expressions can be read.
  • FIG. 12 is a diagram illustrating an example of a display in which the background is replaced in the face list display process. Compared to FIG. 11, it can be seen that the backgrounds of the participants in the areas 1201, 1202, 1203, and 1204 of the output unit 150 are replaced with uniform backgrounds.
  • the background replacement process may be performed when the face list display module 151 performs the face list display process of step S307 in the flowchart of FIG. In this way, by replacing the background portion and displaying it, extra information is eliminated and the facial expressions of each participant are easier to read.
  • the same background is used for all of the participant 1001, the participant 1002, the participant 1003, and the participant 1004, but the background may be changed depending on the participant.
  • For example, the background may be changed for each participant's location, making it easy to see which participants are in the same space.
  • a captured image representing the overall state is displayed in the area 1205.
  • the date and time, the connection destination, the call start time, the number of participants of the other party, and the like may be displayed.
  • information such as “background: being replaced” may be displayed somewhere on the output unit 150 as shown in the display 1206. Whether or not to use the background replacement function may be set by the user at a desired timing, or the setting may be saved.
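One plain way to realize the replacement shown in FIG. 12 is to keep only pixels inside the detected face region and overwrite everything else with a uniform background. The toy sketch below works on a small pixel grid and uses a rectangle as the face region; a real system would more likely use a segmentation mask, so this is an illustrative assumption.

```python
def replace_background(frame, face_box, bg=0):
    """Return a copy of `frame` (a list of rows of pixel values) where
    every pixel outside the face rectangle is replaced by `bg`."""
    x, y, w, h = face_box
    out = []
    for r, row in enumerate(frame):
        new_row = []
        for c, px in enumerate(row):
            inside = x <= c < x + w and y <= r < y + h
            new_row.append(px if inside else bg)
        out.append(new_row)
    return out

frame = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
# face rectangle covers columns 1-2 of rows 1-2
tile = replace_background(frame, face_box=(1, 1, 2, 2), bg=0)
```

Using a different `bg` per location would realize the per-site background variation described above.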
  • FIG. 13 is a diagram illustrating an example of a display in which a face portion satisfying a predetermined condition is replaced in the face list display process.
  • the face part replacement process may be performed when the face list display module 151 performs the face list display process of step S307 in the flowchart of FIG.
  • the face list display module 151 determines a predetermined condition, and replaces the face portion when the condition is satisfied.
  • The predetermined condition here may be, for example, that a participant no longer appears in the captured image, that is, that the participant has moved out of the camera's shooting range. Comparing FIG. 13 with FIG. 12, it can be seen that an image 1307 is displayed in the area 1302 instead of the participant 1002, because the participant 1002 is no longer in view.
  • the participant 1002 is absent. Since the other three participants are present, the participant 1001 is displayed in the area 1301, the participant 1003 is displayed in the area 1303, and the participant 1004 is displayed in the area 1304 as in FIG.
  • The image 1307 may be a still image of the participant 1002, a favorite illustration, an avatar of the participant 1002, or the like, and can be set by each participant according to preference.
  • the area 1305 may display date and time, connection destination, call start time, number of participants at the other end, and the like. Further, information such as “Away” may be displayed somewhere in the area 1302 as in the display 1306.
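The predetermined-condition handling of FIG. 13 reduces to choosing what to draw in each display area: the live face part when the participant is in frame, otherwise the substitute image plus a status label. A sketch follows; the function and its arguments are illustrative assumptions.

```python
def tile_image(detected_face, substitute, label="Away"):
    """Pick the content for one display area: the live face part if the
    participant is in frame, otherwise the participant's substitute
    image (still photo, illustration, or avatar) plus a status label."""
    if detected_face is not None:
        return detected_face, None
    return substitute, label

# Participant 1002 has stepped out of camera range (FIG. 13).
img, note = tile_image(None, substitute="avatar_1002.png")
# Participant 1002 is back in frame.
live, live_note = tile_image("face_1002_crop", substitute="avatar_1002.png")
```

The returned label corresponds to the "Away" indicator of display 1306.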
  • FIG. 14 is a diagram illustrating an example of a display that changes the attention level of a participant detected to be speaking in the face list display process.
  • When the face detection module 112 performs the face detection process of step S306 in the flowchart of FIG. 3, it detects whether or not each participant is speaking, and the face list display module 151 changes the attention level of a participant detected to be speaking.
  • The attention level is changed in order to make the speaker stand out and attract attention. Specifically, as can be seen by comparing FIG. 13 and FIG. 14, the participant 1001 is displayed in the area 1401, the participant 1002 in the area 1402, and the participant 1003 in the area 1403, as in FIG. 13.
  • information “speaking” may be displayed somewhere in the area 1404 like a display 1406.
  • the area 1404 where the speaking participant 1004 is displayed may be placed in a conspicuous part such as the center of the output unit 150 (not shown).
  • the position of the speaker may be indicated by a mark or the like as shown in the display 1407 in conjunction with the attention level changing process described above.
  • Here, an example in which only the participant 1004 is speaking is illustrated, but when a plurality of participants speak, the attention-level change process may be performed for all of them.
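The attention-level change just described can be sketched as follows. This is a hedged illustration, not the patented method: the scale factor and the `apply_attention` name are assumptions, and emphasis in a real system could instead mean repositioning the area or adding a mark as in the display 1407.

```python
# Illustrative sketch: areas of participants detected as speaking are
# enlarged and labelled; more than one speaker can be emphasised at a time.

def apply_attention(areas, speaking_ids, scale=1.5):
    """Return a new layout in which each speaker's area is emphasised."""
    result = []
    for area in areas:
        emphasised = area["participant"] in speaking_ids
        result.append({
            **area,
            "scale": scale if emphasised else 1.0,
            "label": "Speaking" if emphasised else None,
        })
    return result

areas = [{"participant": pid} for pid in (1001, 1002, 1003, 1004)]
layout = apply_attention(areas, speaking_ids={1004})
print([a["scale"] for a in layout])    # [1.0, 1.0, 1.0, 1.5]
```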
  • FIG. 4 is a diagram showing a relationship between functional blocks and functions when image analysis processing and face detection processing are performed by the transmitting-side TV conference device 100a and face list display processing is performed by the receiving-side TV conference device 100b.
  • the video conference apparatus 100 includes a camera unit 10, a control unit 110, a communication unit 120, a storage unit 130, an input unit 140, and an output unit 150.
  • the control unit 110 implements the image analysis module 113 in cooperation with the camera unit 10 and the storage unit 130. Further, the control unit 110 implements the face detection module 112 in cooperation with the storage unit 130. Further, the output unit 150 implements the face list display module 151 in cooperation with the control unit 110 and the storage unit 130.
  • the communication network 300 may be a public communication network such as the Internet or a dedicated communication network, and enables communication between the TV conference apparatus 100a and the TV conference apparatus 100b.
  • FIG. 5 is a flowchart when the image analysis process and the face detection process are performed by the transmitting-side TV conference apparatus 100a, and the face list display process is performed by the receiving-side TV conference apparatus 100b. The processing executed by each of the modules described above will be explained along this flowchart.
  • control unit 110 of the TV conference device 100a notifies the connection destination TV conference device 100b of the start of the TV conference via the communication unit 120 (step S501).
  • the control unit 110 of the TV conference device 100a starts imaging with the camera unit 10 (step S502).
  • the captured image is a precise image having an amount of information necessary for image analysis in the TV conference apparatus 100a, and the number of pixels and the image quality can be designated.
  • audio data is acquired together with moving image capturing.
  • the captured image and audio data are stored in the storage unit 130.
  • the image analysis module 113 of the TV conference apparatus 100a performs image analysis of the captured image (step S503).
  • the image analysis is an analysis of the positions and number of participants in the conference.
  • gender and age may be analyzed, or analysis may be performed to identify individual participants using an employee database or the like.
  • the face detection module 112 of the TV conference device 100a detects a part including each participant's face as a face part based on the image analysis result of step S503 (step S504).
  • The face detection here is for specifying the participant's head position. For example, even when the participant is not facing the camera and parts such as the eyes and mouth cannot be found, the temporal region or the back of the head may be detected as a face.
  • For the learning of the image analysis module 113 and the face detection module 112, supervised learning with human-provided labels may be performed, or machine learning or deep learning may be used. Further, since a large amount of data is required for learning, the trained image analysis module 113 and face detection module 112 may be acquired from the outside via the communication unit 120.
  • The present invention is not limited to a particular method here; existing techniques can be used.
  • the control unit 110 of the TV conference device 100a transmits the analysis image to the TV conference device 100b via the communication unit 120 (step S505).
  • When the captured image is a moving image, audio data is also transmitted.
  • The analysis image transmitted here may be only the data determined to be necessary for the face list display as a result of the image analysis and face detection, the captured image itself that was captured by the TV conference device 100a and used for the image analysis, or the captured image with its resolution changed.
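As a rough back-of-the-envelope illustration of why transmitting only the face-list data can suppress the amount of communication data, the uncompressed size of cropped face rectangles can be compared with that of the full frame. The resolutions used below are assumptions for illustration, not values from the specification.

```python
# Rough sketch: instead of the full captured frame, only the cropped face
# rectangles needed for the face list display are transmitted.

def payload_bytes(width, height, bytes_per_pixel=3):
    """Uncompressed RGB payload size of one rectangle, in bytes."""
    return width * height * bytes_per_pixel

full_frame = payload_bytes(1920, 1080)                 # whole captured image
face_rects = [(320, 320), (320, 320), (320, 320), (320, 320)]
faces_only = sum(payload_bytes(w, h) for w, h in face_rects)

print(faces_only < full_frame)   # face crops are far smaller than the frame
```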
  • the TV conference device 100b receives an analysis image from the TV conference device 100a via the communication unit 120 (step S506). Audio data is also received along with the analysis image. The received analysis image and audio data are saved in the storage unit 130.
  • the face list display module 151 of the TV conference device 100b displays a list of detected face part images of the plurality of participants on the output unit 150 (step S507).
  • Specifically, the output unit 150 is divided into at least as many areas as the number of detected participants, and each detected face portion is arranged and displayed at the center of its divided display area.
  • The display areas with the face portions centered in this way are arranged as the face list display.
  • the face portion to be displayed here is not limited to the head, but may be from the head to the chest.
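The area-division rule above (at least as many areas as detected participants, each face centred in its area) can be sketched as follows. This assumes a simple near-square grid; the actual layout rule is not fixed by the specification, and the helper names are illustrative.

```python
# Minimal sketch of the face list layout: divide the output into at least
# n_faces equal areas and centre each detected face part in its area.
import math

def grid_dims(n_faces):
    """Smallest rows x cols grid with rows * cols >= n_faces."""
    cols = math.ceil(math.sqrt(n_faces))
    rows = math.ceil(n_faces / cols)
    return rows, cols

def face_centers(n_faces, screen_w, screen_h):
    """Centre coordinates of each participant's display area."""
    rows, cols = grid_dims(n_faces)
    cell_w, cell_h = screen_w / cols, screen_h / rows
    centers = []
    for i in range(n_faces):
        r, c = divmod(i, cols)
        centers.append((c * cell_w + cell_w / 2, r * cell_h + cell_h / 2))
    return centers

print(grid_dims(4))                     # (2, 2)
print(face_centers(4, 1920, 1080)[0])   # (480.0, 270.0)
```

With four participants this reproduces the 2x2 arrangement of FIG. 11; with five it would leave one spare area, which, as noted above, may hold other information or the whole captured image.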
  • FIG. 10 is a diagram showing an example of a general video conference display. Since the captured image is displayed as it is on the output unit 150, the overall atmosphere is conveyed, but the display of each face portion is small, so the facial expressions of the participant 1001, the participant 1002, the participant 1003, and the participant 1004 are difficult to read.
  • FIG. 11 is a diagram showing an example of the display of the face list display process of the present invention. Since the participant 1001 is displayed in the area 1101 of the output unit 150, the participant 1002 in the area 1102, the participant 1003 in the area 1103, and the participant 1004 in the area 1104, it can be seen that each participant's face is displayed large and the facial expressions are easy to read.
  • In the area 1105, "TV conference system 2016/9/9 15:07:19 <<Connecting to XX office>> Call start: 2016/9/9 14:05:33 Destination participants: 4" is displayed; in this example, the system name, the date and time, the connection destination, the call start time, and the number of destination participants are displayed.
  • information may be displayed in the empty area, or a captured image representing the entire state may be displayed. Further, when the area is divided, the entire output unit 150 may be divided into a face list display area without creating an empty area. At that time, the size of each participant's area may be different.
  • the control unit 110 of the TV conference device 100a confirms whether or not to end the TV conference (step S508). It is assumed that the user can designate the end of the video conference via the input unit 140. If the video conference is to be ended, the process proceeds to the next step S509. If the video conference is not to be ended, the process returns to step S502 and the process is continued.
  • control unit 110 of the TV conference device 100a notifies the TV conference device 100b of the end of the TV conference via the communication unit 120 (step S509).
  • While FIGS. 2 and 3 are configured to perform the image analysis processing, the face detection processing, and the face list display processing on the receiving side, FIGS. 4 and 5 perform the image analysis processing and the face detection processing on the transmitting side and the face list display processing on the receiving side.
  • In the configuration of FIGS. 4 and 5, only the analysis image needs to be transmitted, so an effect of suppressing the amount of communication data can be expected, particularly when the background has been replaced.
  • As described above, in a TV conference system in which participants conduct a TV conference, even when there are a plurality of participants at one site, the faces of all the members are appropriately displayed at the connection destination, and it is possible to provide a TV conference system, a TV conference method, and a TV conference program with which facial expressions can be read.
  • FIG. 6 is a diagram illustrating a relationship between functional blocks and functions when image analysis processing and face detection processing are performed by the computer 200 and face list display processing is performed by the TV conference device 100b on the receiving side.
  • the video conference apparatus 100 includes a camera unit 10, a control unit 110, a communication unit 120, a storage unit 130, an input unit 140, and an output unit 150.
  • the output unit 150 implements the face list display module 151 in cooperation with the control unit 110 and the storage unit 130.
  • the computer 200 includes a control unit 210, a communication unit 220, and a storage unit 230.
  • the control unit 210 implements the image analysis module 211 in cooperation with the communication unit 220 and the storage unit 230.
  • the control unit 210 implements the face detection module 212 in cooperation with the storage unit 230.
  • the communication network 300 may be a public communication network such as the Internet or a dedicated communication network, and enables communication between the TV conference apparatus 100a and the computer 200 and between the TV conference apparatus 100b and the computer 200.
  • It is sufficient that the TV conference apparatus 100 has the above-described components as a whole; each component may be an internal device or an externally attached device.
  • The TV conference device 100 may be a mobile phone, a portable information terminal, a tablet terminal, a personal computer, an electronic product such as a netbook terminal, a slate terminal, an electronic book terminal, or a portable music player, a wearable terminal such as smart glasses or a head-mounted display, or another such device.
  • the smartphone illustrated as the TV conference apparatus 100a in FIG. 6 and the personal computer, the display, and the WEB camera illustrated as the TV conference apparatus 100b are merely examples.
  • the TV conference apparatus 100 includes, as the camera unit 10, an imaging device such as a lens, an imaging device, various buttons, and a flash, and captures images as captured images such as moving images and still images.
  • An image obtained by imaging is a precise image having an amount of information necessary for image analysis, and the number of pixels and the image quality can be designated.
  • the camera unit 10 includes a microphone for acquiring audio data combined with moving image capturing or can use the microphone function of the input unit 140.
  • the control unit 110 includes a CPU (Central Processing Unit), a RAM (Random Access Memory), a ROM (Read Only Memory), and the like.
  • The communication unit 120 includes a device for enabling communication with other devices, for example, a WiFi (Wireless Fidelity) compatible device compliant with IEEE 802.11 or a wireless device compliant with the IMT-2000 standard such as a third- or fourth-generation mobile communication system. It may also be a wired LAN connection.
  • the storage unit 130 includes a data storage unit such as a hard disk or a semiconductor memory, and stores data necessary for processing of captured images, image analysis results, face detection results, and the like.
  • the input unit 140 has functions necessary for using the TV conference system.
  • a liquid crystal display that realizes a touch panel function, a keyboard, a mouse, a pen tablet, a hardware button on the apparatus, a microphone for performing voice recognition, and the like can be provided.
  • the function of the present invention is not particularly limited by the input method.
  • the output unit 150 has functions necessary for using the TV conference system.
  • the output unit 150 implements the face list display module 151 in cooperation with the control unit 110 and the storage unit 130.
  • forms such as a liquid crystal display, a PC display, a projection on a projector, and an audio output can be considered.
  • the function of the present invention is not particularly limited by the output method.
  • the computer 200 may be a general computer having the functions described below. Although not described here, an input unit and an output unit may be provided as necessary.
  • the computer 200 includes a CPU, RAM, ROM and the like as the control unit 210.
  • the control unit 210 implements the image analysis module 211 in cooperation with the communication unit 220 and the storage unit 230.
  • the control unit 210 implements the face detection module 212 in cooperation with the storage unit 230.
  • The communication unit 220 includes a device for enabling communication with other devices, for example, a WiFi compatible device compliant with IEEE 802.11 or a wireless device compliant with the IMT-2000 standard such as a third- or fourth-generation mobile communication system. It may also be a wired LAN connection.
  • the storage unit 230 includes a data storage unit using a hard disk or a semiconductor memory.
  • the storage unit 230 holds data such as acquired captured images, image analysis results, and face detection results.
  • FIG. 7 is a flowchart when the image analysis process and the face detection process are performed by the computer 200 and the face list display process is performed by the TV conference device 100b on the receiving side. The processing executed by each of the modules described above will be explained along this flowchart.
  • control unit 110 of the TV conference apparatus 100a notifies the start of the TV conference to the computer 200 and the connected TV conference apparatus 100b via the communication unit 120 (step S701).
  • The computer 200 that has received the notification of the start of the TV conference from the TV conference device 100a notifies the TV conference device 100b of the start of the TV conference.
  • the control unit 110 of the TV conference device 100a starts imaging with the camera unit 10 (step S702).
  • the captured image is a precise image having an amount of information necessary for image analysis by the computer 200, and the number of pixels and the image quality can be designated.
  • audio data is acquired together with moving image capturing.
  • the captured image and audio data are stored in the storage unit 130.
  • control unit 110 of the TV conference device 100a transmits the captured image to the computer 200 via the communication unit 120 (step S703).
  • When the captured image is a moving image, audio data is also transmitted.
  • the image analysis module 211 of the computer 200 receives a captured image from the TV conference device 100a via the communication unit 220 (step S704). Audio data is also received along with the captured image. The received captured image and audio data are stored in the storage unit 230.
  • the image analysis module 211 of the computer 200 performs image analysis of the received captured image (step S705).
  • the image analysis here is an analysis of the positions and number of participants in the conference.
  • gender and age may be analyzed, or analysis may be performed to identify individual participants using an employee database or the like.
  • the face detection module 212 of the computer 200 detects a part including the face of each participant as a face part based on the image analysis result of step S705 (step S706).
  • The face detection here is for specifying the participant's head position. For example, even when the participant is not facing the camera and parts such as the eyes and mouth cannot be found, the temporal region or the back of the head may be detected as a face.
  • For the learning of the image analysis module 211 and the face detection module 212, supervised learning with human-provided labels may be performed, or machine learning or deep learning may be used.
  • The trained image analysis module 211 and face detection module 212 may also be acquired from the outside.
  • The present invention is not limited to a particular method here; existing techniques can be used.
  • the control unit 210 of the computer 200 transmits the analysis image to the TV conference device 100b via the communication unit 220 (step S707).
  • the captured image is a moving image
  • audio data is also transmitted.
  • The analysis image transmitted here may be only the data determined to be necessary for the face list display as a result of the image analysis and face detection, the captured image itself that was captured by the TV conference apparatus 100a and used for the image analysis by the computer 200, or the captured image with its resolution changed.
  • the TV conference apparatus 100b receives an analysis image from the computer 200 via the communication unit 120 (step S708). Audio data is also received along with the analysis image. The received analysis image and audio data are saved in the storage unit 130.
  • the face list display module 151 of the TV conference apparatus 100b displays a list of detected face part images on the output unit 150 (step S709).
  • Specifically, the output unit 150 is divided into at least as many areas as the number of detected participants, and each detected face portion is arranged and displayed at the center of its divided display area.
  • The display areas with the face portions centered in this way are arranged as the face list display.
  • the face portion to be displayed here is not limited to the head, but may be from the head to the chest.
  • FIG. 10 is a diagram showing an example of a general video conference display. Since the captured image is displayed as it is on the output unit 150, the overall atmosphere is conveyed, but the display of each face portion is small, so the facial expressions of the participant 1001, the participant 1002, the participant 1003, and the participant 1004 are difficult to read.
  • FIG. 11 is a diagram showing an example of the display of the face list display process of the present invention. Since the participant 1001 is displayed in the area 1101 of the output unit 150, the participant 1002 in the area 1102, the participant 1003 in the area 1103, and the participant 1004 in the area 1104, it can be seen that each participant's face is displayed large and the facial expressions are easy to read.
  • In the area 1105, "TV conference system 2016/9/9 15:07:19 <<Connecting to XX office>> Call start: 2016/9/9 14:05:33 Destination participants: 4" is displayed; in this example, the system name, the date and time, the connection destination, the call start time, and the number of destination participants are displayed.
  • information may be displayed in the empty area, or a captured image representing the entire state may be displayed. Further, when the area is divided, the entire output unit 150 may be divided into a face list display area without creating an empty area. At that time, the size of each participant's area may be different.
  • the control unit 110 of the TV conference device 100a confirms whether or not to end the TV conference (step S710). It is assumed that the user can designate the end of the video conference via the input unit 140. If the TV conference is to be ended, the process proceeds to the next step S711. If the TV conference is not to be ended, the process returns to step S702 to continue the processing.
  • the control unit 110 of the TV conference device 100a notifies the computer 200 and the TV conference device 100b of the end of the TV conference via the communication unit 120 (step S711).
  • The computer 200 that has received the TV conference end notification from the TV conference apparatus 100a notifies the TV conference apparatus 100b of the end of the TV conference.
  • In the latter configuration, an advantage is that the image analysis module 211 and the face detection module 212 can be easily updated. Further, since a large amount of data is required for machine learning and deep learning, the computer 200 also has the advantage that a large-capacity storage is easily provided in the storage unit 230.
  • As described above, in a TV conference system in which participants conduct a TV conference, even when there are a plurality of participants at one site, the faces of all the members are appropriately displayed at the connection destination, and it is possible to provide a TV conference system, a TV conference method, and a TV conference program with which facial expressions can be read.
  • FIG. 8 is a diagram illustrating the relationship between the function blocks and the functions when the face list display process and the speech history display process are performed in the TV conference apparatus 100.
  • the control unit 110 implements a speech detection module 114 and a speaker determination module 115 in cooperation with the communication unit 120 and the storage unit 130.
  • the output unit 150 implements the message history display module 152 in cooperation with the control unit 110 and the storage unit 130.
  • It is sufficient that the TV conference apparatus 100 has the above-described components as a whole; each component may be an internal device or an externally attached device.
  • The TV conference device 100 may be a mobile phone, a portable information terminal, a tablet terminal, a personal computer, an electronic product such as a netbook terminal, a slate terminal, an electronic book terminal, or a portable music player, a wearable terminal such as smart glasses or a head-mounted display, or another such device.
  • the smartphone illustrated as the TV conference apparatus 100a in FIG. 8, the personal computer, the display, and the WEB camera illustrated as the TV conference apparatus 100b are merely examples.
  • FIG. 9 is a flowchart of the face list display process and the speech history display process in the TV conference apparatus 100. The processing executed by each of the modules described above will be explained along this flowchart.
  • an example in which a series of processing from image analysis processing to speech history display processing is performed by the video conference apparatus 100 on the captured image receiving side is shown.
  • Alternatively, as described above, the image analysis process, the face detection process, the speech detection process, and the speaker determination process may be performed by the TV conference apparatus 100 or the computer 200 on the captured image transmission side,
  • and the TV conference device 100 on the receiving side may be configured to perform only the face list display process and the speech history display process.
  • the image analysis module 111 of the TV conference device 100 on the captured image receiving side receives a captured image via the communication unit 120 (step S901). Audio data is also received along with the captured image. The received captured image and audio data are saved in the storage unit 130.
  • the TV conference start notification is not described, but it is assumed that the TV conference start notification is performed before step S901.
  • the image analysis module 111 performs image analysis of the received captured image (step S902).
  • the image analysis here is an analysis of the positions and number of participants in the conference.
  • gender and age may be analyzed, or analysis may be performed to identify individual participants using an employee database or the like.
  • the face detection module 112 detects a part including each participant's face as a face part based on the image analysis result of step S902 (step S903).
  • The face detection here is for specifying the participant's head position. For example, even when the participant is not facing the camera and parts such as the eyes and mouth cannot be found, the temporal region or the back of the head may be detected as a face.
  • For the learning of the image analysis module 111 and the face detection module 112, supervised learning with human-provided labels may be performed, or machine learning or deep learning may be used.
  • The trained image analysis module 111 and face detection module 112 may be acquired from the outside via the communication unit 120.
  • The present invention is not limited to a particular method here; existing techniques can be used.
  • the speech detection module 114 detects the speech of each participant based on the received audio data (step S904).
  • Specifically, the content of the received voice data is analyzed and converted into text by voice recognition. If multiple people are speaking at the same time and it is difficult to isolate each speech, the accuracy may be improved by using the image analysis result of step S902, the face detection result of step S903, and the like.
  • The present invention is not limited to a particular method here; existing techniques can be used.
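The speech detection step above can be sketched as follows. This is a hypothetical illustration: the recognizer itself is assumed and mocked out as pre-recognized text segments, and the per-participant mouth-movement scores stand in for the image analysis and face detection cues mentioned in the text.

```python
# Hedged sketch of speech detection: recognised text segments are attributed
# to participants; when isolation is ambiguous, image-analysis cues (here,
# mouth-movement scores per time slot) help separate overlapping speech.

def detect_speech(audio_segments, mouth_scores):
    """Attach the most plausible source to each recognised segment."""
    utterances = []
    for seg in audio_segments:
        t = seg["time"]
        # Prefer the participant whose mouth moved most during the segment.
        speaker = max(mouth_scores[t], key=mouth_scores[t].get)
        utterances.append({"time": t, "speaker": speaker,
                           "text": seg["text"]})
    return utterances

segments = [{"time": 0, "text": "Hello"}, {"time": 1, "text": "Agreed"}]
scores = {0: {1001: 0.9, 1002: 0.1}, 1: {1001: 0.2, 1002: 0.8}}
print(detect_speech(segments, scores))   # speaker 1001, then 1002
```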
  • the speaker determination module 115 determines a speaker based on the image analysis result in step S902, the face detection result in step S903, the speech detection result in step S904, and the like (step S905).
  • The speaker determination here specifies which participant is speaking, using the mouth movements based on the captured and analysed images, the pitch of the voice, the input direction, and the like, and links that participant to the content of the speech detected in step S904.
  • The results of these processes are stored in the storage unit 130 as data indicating which participant said what and when.
  • For the learning of these modules, supervised learning with human-provided labels may be performed, or machine learning or deep learning may be used.
  • The trained speech detection module 114 and speaker determination module 115 may be acquired from the outside via the communication unit 120.
  • The present invention is not limited to a particular method here; existing techniques can be used.
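One way to picture the speaker determination and the stored who/when/what record is a weighted combination of the cues named above (mouth movement, voice pitch, input direction). The cue weights and all names here are assumptions for illustration, not the patented method.

```python
# Hedged sketch of speaker determination: combine several cue scores, pick
# the highest-scoring participant, and store a who/when/what record (which,
# per the text, would be kept in the storage unit 130).

def determine_speaker(cues, weights=(0.5, 0.25, 0.25)):
    """cues: {participant_id: (mouth, pitch, direction)} scores in [0, 1]."""
    def combined(scores):
        return sum(w * s for w, s in zip(weights, scores))
    return max(cues, key=lambda pid: combined(cues[pid]))

speech_history = []                      # persisted who/when/what records

def record_utterance(cues, time, text):
    pid = determine_speaker(cues)
    speech_history.append({"speaker": pid, "time": time, "text": text})
    return pid

cues = {1003: (0.9, 0.7, 0.8), 1004: (0.2, 0.6, 0.1)}
print(record_utterance(cues, "14:05:33", "Let's start the review."))  # 1003
```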
  • the face list display module 151 displays a list of detected face part images of a plurality of participants on the output unit 150 (step S906).
  • Specifically, the output unit 150 is divided into at least as many areas as the number of detected participants, and each detected face portion is arranged and displayed at the center of its divided display area.
  • The display areas with the face portions centered in this way are arranged as the face list display.
  • the face portion to be displayed here is not limited to the head, but may be from the head to the chest.
  • Next, the speech history display module 152 confirms whether or not to display the speech history (step S907). It is assumed that the user can specify the speech history display via the input unit 140. If the speech history is to be displayed, the process proceeds to the next step S908; if not, the process ends.
  • When displaying the speech history, the speech history display module 152 causes the user to select, via the input unit 140, the participants whose speech history is to be displayed (step S908).
  • the number of participants to be selected is not limited, and one, a plurality of people, or all of the participants may be selected.
  • the speech history display module 152 displays the speech history of the selected participant on the output unit 150 (step S909).
  • In this flowchart, the TV conference end notification is not described; however, when the TV conference ends, it is assumed that an end notification is sent to the destination TV conference apparatus 100.
  • FIG. 15 is a diagram showing an example of the display of the face list display process and the speech history display process. Since the participant 1001 is displayed in the area 1501 of the output unit 150, the participant 1002 in the area 1502, the participant 1003 in the area 1503, and the participant 1004 in the area 1504, it can be seen that each participant's face is displayed large and the facial expressions are easy to read. Also, here, the participant 1001 is displayed on the output unit 150 as "Participant A" 1506, the participant 1002 as "Participant B" 1507, the participant 1003 as "Participant C" 1508, and the participant 1004 as "Participant D".
  • the region 1503 or display 1508 of the participant C may be selected with the pointer 1510.
  • the fact that “participant C” is selected may be displayed on the output unit 150 in an easily understandable manner.
  • Here, an example is illustrated in which the content of the speech of "Participant C" is displayed in the area 1505 by the speech history display in step S909. If there are more utterances than can be displayed, a scroll bar 1511 or the like may be provided in the speech history display area 1505 so that past utterances can also be viewed.
  • the participant who made the speech is also displayed together with the content of the speech.
  • In the speech history, it is easier to understand if the time of each speech is also displayed.
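The filtering and formatting of the speech history selected in steps S908 and S909 can be sketched as follows; the record layout and helper name are illustrative assumptions, and the speaker and time are shown next to each line as suggested above.

```python
# Minimal sketch of the speech history display: the stored who/when/what
# records are filtered by the participants the user selected (one, several,
# or all), and formatted with speaker and time for the display area.

def speech_history_lines(history, selected_ids=None):
    """Format history entries for the selected participants."""
    rows = [u for u in history
            if selected_ids is None or u["speaker"] in selected_ids]
    return ["{time} {speaker}: {text}".format(**u) for u in rows]

history = [
    {"time": "14:06", "speaker": "Participant C", "text": "One question."},
    {"time": "14:07", "speaker": "Participant D", "text": "Go ahead."},
    {"time": "14:08", "speaker": "Participant C", "text": "About the spec."},
]
print(speech_history_lines(history, {"Participant C"}))
```

Passing `None` (the default) corresponds to selecting all participants.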
  • As described above, in a TV conference system in which participants conduct a TV conference, even when there are a plurality of participants at one site, the faces of all the members are appropriately displayed at the connection destination, and it is possible to provide a TV conference system, a TV conference method, and a TV conference program with which facial expressions can be read and it is easy to understand who said what.
  • the means and functions described above are realized by a computer (including a CPU, an information processing apparatus, and various terminals) reading and executing a predetermined program.
  • The program may be provided, for example, in a form (SaaS: Software as a Service) delivered from a computer via a network, or in a form recorded on a computer-readable recording medium such as a flexible disk, a CD (CD-ROM, etc.), a DVD (DVD-ROM, DVD-RAM, etc.), or a compact memory.
  • the computer reads the program from the recording medium, transfers it to the internal storage device or the external storage device, stores it, and executes it.
  • the program may be recorded in advance in a storage device (recording medium) such as a magnetic disk, an optical disk, or a magneto-optical disk, and provided from the storage device to a computer via a communication line.

Abstract

The problem addressed by the present invention is to display the appearance of the participants and to show who is speaking in an easily understandable manner, in a TV conference system in which participants conduct a TV conference, even when there are a plurality of participants at one site. The solution according to the invention is a TV conference system in which participants conduct a TV conference, the system comprising an image analysis module (111) configured to analyze an image of the TV conference in which participants appear, and a face detection module (112) configured to detect a part including a participant's face as a face part, the face parts detected for a plurality of participants being displayed as a list by a face list display module (151).
PCT/JP2016/078992 2016-09-30 2016-09-30 TV conference system, TV conference method, and program WO2018061173A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2016/078992 WO2018061173A1 (fr) 2016-09-30 2016-09-30 TV conference system, TV conference method, and program

Publications (1)

Publication Number Publication Date
WO2018061173A1 (fr) 2018-04-05

Family

ID=61760351

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2016/078992 WO2018061173A1 (fr) 2016-09-30 2016-09-30 TV conference system, TV conference method, and program

Country Status (1)

Country Link
WO (1) WO2018061173A1 (fr)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3627832A1 (fr) 2018-09-21 2020-03-25 Yamaha Corporation Image processing apparatus, camera apparatus, and image processing method
WO2020151443A1 (fr) * 2019-01-23 2020-07-30 广州视源电子科技股份有限公司 Video image transmission method and device, interactive smart tablet, and storage medium
JP2021521497A (ja) * 2018-05-04 2021-08-26 Google LLC Adaptation of an automated assistant based on detected mouth movement and/or gaze
US11493992B2 (en) 2018-05-04 2022-11-08 Google Llc Invoking automated assistant function(s) based on detected gesture and gaze
US11688417B2 (en) 2018-05-04 2023-06-27 Google Llc Hot-word free adaptation of automated assistant function(s)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004153674A (ja) * 2002-10-31 2004-05-27 Sony Corp Camera device
JP2009194857A (ja) * 2008-02-18 2009-08-27 Sharp Corp Communication conference system, communication device, communication conference method, and computer program
JP2009206924A (ja) * 2008-02-28 2009-09-10 Fuji Xerox Co Ltd Information processing device, information processing system, and information processing program
JP2012054897A (ja) * 2010-09-03 2012-03-15 Sharp Corp Conference system, information processing device, and information processing method
JP2014175866A (ja) * 2013-03-08 2014-09-22 Ricoh Co Ltd Video conference system
JP2015019162A (ja) * 2013-07-09 2015-01-29 Dai Nippon Printing Co Ltd Conference support system
JP2016134781A (ja) * 2015-01-20 2016-07-25 Ricoh Co Ltd Information processing device, audio output method, program, and communication system

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2021521497A (ja) * 2018-05-04 2021-08-26 Google LLC Adaptation of an automated assistant based on detected mouth movement and/or gaze
US11493992B2 (en) 2018-05-04 2022-11-08 Google Llc Invoking automated assistant function(s) based on detected gesture and gaze
US11614794B2 (en) 2018-05-04 2023-03-28 Google Llc Adapting automated assistant based on detected mouth movement and/or gaze
US11688417B2 (en) 2018-05-04 2023-06-27 Google Llc Hot-word free adaptation of automated assistant function(s)
JP7471279B2 (ja) 2018-05-04 2024-04-19 Google LLC Adaptation of an automated assistant based on detected mouth movement and/or gaze
EP3627832A1 (fr) 2018-09-21 2020-03-25 Yamaha Corporation Image processing apparatus, camera apparatus, and image processing method
US10965909B2 (en) 2018-09-21 2021-03-30 Yamaha Corporation Image processing apparatus, camera apparatus, and image processing method
WO2020151443A1 (fr) * 2019-01-23 2020-07-30 广州视源电子科技股份有限公司 Video image transmission method and device, interactive smart tablet, and storage medium

Similar Documents

Publication Publication Date Title
WO2018061173A1 (fr) TV conference system, TV conference method, and program
KR20140100704A (ko) Mobile terminal having voice dialogue function and voice dialogue method thereof
JP7100824B2 (ja) Data processing device, data processing method, and program
JP7283384B2 (ja) Information processing terminal, information processing device, and information processing method
US9247206B2 (en) Information processing device, information processing system, and information processing method
US20170185365A1 (en) System and method for screen sharing
US20220224735A1 (en) Information processing apparatus, non-transitory computer readable medium storing program, and method
JP2014220619A (ja) Conference information recording system, information processing device, control method, and computer program
JP2011061450A (ja) Conference communication system, conference communication method, and program
WO2018158852A1 (fr) Telephone call system and communication system
CN114531564A (zh) Processing method and electronic device
JP2020136921A (ja) Video call system and computer program
AU2013222959A1 (en) Method and apparatus for processing information of image including a face
JP4973908B2 (ja) Communication terminal and display method thereof
US20230093298A1 (en) Voice conference apparatus, voice conference system and voice conference method
WO2019026395A1 (fr) Information processing device, information processing method, and program
JP5432805B2 (ja) Speech opportunity equalization method, speech opportunity equalization device, and speech opportunity equalization program
JP2004112511A (ja) Display control device and method
US11928253B2 (en) Virtual space control system, method for controlling the same, and control program
KR101562901B1 (ko) System and method for providing conversation support service
US11949727B2 (en) Organic conversations in a virtual group setting
JP5613102B2 (ja) Conference device, conference method, and conference program
WO2006106671A1 (fr) Image processing and display device, reception and transmission device, communication system, image processing method and program, and recording medium containing the image processing program
JP2005091463A (ja) Information processing device
JP2007150917A (ja) Communication terminal and display method thereof

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16917723

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16917723

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP