WO2015131520A1 - Method and device for displaying layout in telepresence conferencing system

Info

Publication number: WO2015131520A1
Application number: PCT/CN2014/087606
Authority: WO
Inventors: 马铮
Original assignee: 中兴通讯股份有限公司
Priority date: 2014-03-05
Filing date: 2014-09-26
Publication date: 2015-09-11
Also published as: CN104902217B; CN104902217A

Abstract

The invention relates to the field of multimedia telepresence communication. Disclosed are a method and device for displaying the layout in a telepresence conferencing system, the method comprising: a multipoint processing unit receives the local video layout information containing each seat image sent by each telepresence terminal; by analyzing the local video layout information of each telepresence terminal, the multipoint processing unit obtains attendee seating information on each telepresence terminal; according to the attendee seating information on each telepresence terminal, the multipoint processing unit determines whether each telepresence terminal has only one seat with an attendee seated; when the multipoint processing unit determines that each telepresence terminal has only one seat with an attendee seated, sending the image of the seat with the attendee seated on the telepresence terminal to the corresponding telepresence terminal. The invention uses a layout display method to enable reasonable display of the layout of the attendees of all telepresence terminals except the present telepresence terminal, thus enhancing the face to face experience.

Description

Method and device for displaying layout in telepresence conference system

Technical field

The present invention relates to the field of multimedia telepresence communications, and in particular, to a display and layout method and apparatus for video viewed by a terminal in a telepresence system.

Background art

Telepresence is a teleconferencing technology that combines video communication with the communication experience. It is characterized by life-size images, ultra-high definition and low latency, and it aims at real face-to-face communication. Its implementation involves the network, communications, the conference environment, functional applications and other aspects. What is finally presented to the conference participants is an integrated, lifelike communication experience combined with transactional applications.

In a four-party telepresence conference in which only one seat is occupied at each telepresence terminal, there are in fact only four occupied seats in the entire conference. According to the conventional remote video layout method, the speaking terminal is usually displayed as the large picture in the far-end video, and the videos of the other two participating terminals are displayed as small pictures on certain seat screens.

FIG. 1 is a conventional layout display diagram of displaying a layout in a telepresence conference system according to an embodiment of the present invention. FIG. 1 shows the remote video layout of a telepresence terminal, in which the large picture in the far-end video is the video of the speaking terminal, and the small pictures in the left and right screens are the videos of the other two telepresence terminals.

In a multipoint telepresence conference, therefore, there are telepresence terminals (assuming that all telepresence terminals are three-seat, that is, three-screen, telepresence terminals) at which not every seat has participants. When the video of such a terminal is output as the remote video of other telepresence terminals, some of the far-end seats are shown empty. As a result, some seats in the far-end video are empty, while the participants of some terminals have no free position in the far-end video in which to be displayed.

It can be seen that, in the prior art, telepresence technology can easily provide a realistic communication experience in point-to-point communication. In multipoint communication, however, how to display the images of all participants across the multiple displays while preserving as much as possible of the experience of real face-to-face communication is a key issue for improving the user experience.

Summary of the invention

It is an object of the present invention to provide a method and apparatus for displaying a layout in a telepresence conference system, which can solve the problem that participants are displayed in an unreasonable layout in a telepresence conference system in the case of multipoint communication.

According to an embodiment of the present invention, a method for displaying a layout in a telepresence conference system is provided, including:

A multipoint processing unit receives local video layout information that is sent by each telepresence terminal and contains each seat image;

The multipoint processing unit analyzes the local video layout information of each telepresence terminal to obtain seat occupancy information for each telepresence terminal;

The multipoint processing unit determines, according to the seat occupancy information of each telepresence terminal, whether each telepresence terminal has only one seat at which a person is seated;

When the multipoint processing unit determines that each telepresence terminal has only one seat at which a person is seated, the seat image of the occupied seat of each telepresence terminal is sent to the corresponding telepresence terminal.

Preferably, face recognition is performed separately on each seat image in the local video layout information of each telepresence terminal;

Whether a person is seated at each seat of each telepresence terminal is obtained according to whether the recognition result for each seat image is a face or no face.

Preferably, the seat image of the occupied seat is extracted from the local video layout information of each telepresence terminal;

The remote video layout information to be sent to each telepresence terminal is generated by separately combining the extracted seat images;

The remote video layout information to be sent to any one telepresence terminal contains the seat images of the telepresence terminals other than that telepresence terminal.

Preferably, each seat image in the remote video layout information is formed into a corresponding video code stream containing a display position identifier, and is then sent to the corresponding telepresence terminal.

Preferably, the method further includes: each telepresence terminal performing image display according to the corresponding video code stream.

Preferably, each telepresence terminal displays each seat image on the corresponding display screen according to the display position identifier in the corresponding video code stream.

According to another embodiment of the present invention, an apparatus for displaying a layout in a telepresence conference system is provided, including:

The receiving module is located in the multipoint processing unit and is configured to receive local video layout information that is sent by each telepresence terminal and contains each seat image;

The analysis module is located in the multipoint processing unit and is configured to analyze the local video layout information of each telepresence terminal to obtain seat occupancy information for each telepresence terminal;

The judging module is located in the multipoint processing unit and is configured to determine, according to the seat occupancy information of each telepresence terminal, whether each telepresence terminal has only one seat at which a person is seated;

The sending module is located in the multipoint processing unit and is configured to send the seat image of the occupied seat of each telepresence terminal to the corresponding telepresence terminal when it is determined that each telepresence terminal has only one seat at which a person is seated.

Preferably, the analyzing module further comprises:

The identification sub-module is configured to perform face recognition separately on each seat image in the local video layout information of each telepresence terminal;

The determining sub-module is configured to obtain, according to whether the recognition result for each seat image is a face or no face, whether a person is seated at each seat of each telepresence terminal.

Preferably, the sending module further includes:

The extraction sub-module is configured to extract the seat image of the occupied seat from the local video layout information of each telepresence terminal;

The combination sub-module is configured to generate the remote video layout information to be sent to each telepresence terminal by separately combining the extracted seat images.

Preferably, the sending module further includes:

The code stream sub-module is configured to form each seat image in the remote video layout information into a corresponding video code stream containing a display position identifier, and then send the video code stream to the corresponding telepresence terminal.

Compared with the prior art, the beneficial effect of the present invention is that, by using a remote video layout display method, the participants of all telepresence terminals other than the local telepresence terminal can be displayed in a reasonable layout in a specific scenario, which enhances the sensory experience of a face-to-face meeting between each party's participants and the other participants.

Description of the drawings

FIG. 1 is a conventional layout display diagram of displaying a layout in a telepresence conference system according to an embodiment of the present invention;

FIG. 2 is a flowchart of a method for displaying a layout in a telepresence conference system according to an embodiment of the present invention;

FIG. 3 is a structural diagram of an apparatus for displaying a layout in a telepresence conference system according to an embodiment of the present invention;

FIG. 4 is a four-party conference scene diagram for displaying a layout in a telepresence conference system according to an embodiment of the present invention;

FIG. 5 is a local video layout diagram of a first telepresence terminal TerA for displaying a layout in a telepresence conference system according to an embodiment of the present invention;

FIG. 6 is a local video layout diagram of a second telepresence terminal TerB for displaying a layout in a telepresence conference system according to an embodiment of the present invention;

FIG. 7 is a local video layout diagram of a third telepresence terminal TerC for displaying a layout in a telepresence conference system according to an embodiment of the present invention;

FIG. 8 is a local video layout diagram of a fourth telepresence terminal TerD for displaying a layout in a telepresence conference system according to an embodiment of the present invention;

FIG. 9 is a remote video layout diagram seen by the first telepresence terminal TerA for displaying a layout in a telepresence conference system according to an embodiment of the present invention;

FIG. 10 is a remote video layout diagram seen by the second telepresence terminal TerB for displaying a layout in a telepresence conference system according to an embodiment of the present invention;

FIG. 11 is a remote video layout diagram seen by the third telepresence terminal TerC for displaying a layout in a telepresence conference system according to an embodiment of the present invention;

FIG. 12 is a remote video layout diagram seen by the fourth telepresence terminal TerD for displaying a layout in a telepresence conference system according to an embodiment of the present invention;

FIG. 13 is a flowchart of the processing performed by a telepresence terminal for displaying a layout in a telepresence conference system according to an embodiment of the present invention;

FIG. 14 is a flowchart of the processing performed by an MCU for displaying a layout in a telepresence conference system according to an embodiment of the present invention.

Detailed description

The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

FIG. 2 is a flowchart of a method for displaying a layout in a telepresence conference system according to an embodiment of the present invention. As shown in FIG. 2, the application scenario is a four-party telepresence conference in which all four participating telepresence terminals are three-screen telepresence terminals and the participants of each party are concentrated at one seat. The steps are as follows:

Step S1: The multipoint processing unit receives local video layout information that is sent by each telepresence terminal and contains each seat image. Each telepresence terminal may use biometric identification technology, such as face recognition, to identify whether participants are seated at each local seat, and send this information to the MCU. This process is optional, and a participating telepresence terminal may skip it. The MCU collects and saves the local video layout information of each participating telepresence terminal (that is, the information on whether participants are seated at each seat of the terminal).

Step S2: The multipoint processing unit analyzes the local video layout information of each telepresence terminal to obtain seat occupancy information for each telepresence terminal.

In step S2, face recognition is performed separately on each seat image in the collected local video layout information of each telepresence terminal, and whether a person is seated at each seat is obtained according to whether the recognition result for that seat image is a face or no face.

Step S3: The multipoint processing unit determines, according to the seat occupancy information of each telepresence terminal, whether each telepresence terminal has only one seat at which a person is seated. Specifically, it is determined whether the conference is a four-party telepresence conference and whether, in the local video layout information of each participating telepresence terminal, all participants are seated at a single seat. Alternatively, it can be determined manually at the MCU that only one seat of each participating telepresence terminal has participants seated at it.

Step S4: When the multipoint processing unit determines that each telepresence terminal has only one seat at which a person is seated, the seat image of the occupied seat of each telepresence terminal is sent to the corresponding telepresence terminal.

In step S4, the seat image of the occupied seat is extracted from the local video layout information of each telepresence terminal;

The remote video layout information to be sent to each telepresence terminal is generated by separately combining the extracted seat images.

The remote video layout information to be sent to any one telepresence terminal contains the seat images of the telepresence terminals other than that telepresence terminal. The MCU automatically organizes, for each participating telepresence terminal, the remote video layout that the terminal views: its left, middle and right screens show the occupied seats of the other three parties, excluding the local end.

In addition, the MCU can organize the remote video layout viewed by each participating telepresence terminal under manual control. That is, the video for each seat of a telepresence terminal is selected manually, and video switching is performed separately for its left, middle and right seats, with the video sources being the occupied seats of the other three telepresence terminals, excluding the local end.
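The automatic organization of the remote layouts described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_remote_layouts`, the dictionary shapes and the rule that the other terminals are assigned to the left, middle and right screens in conference order are assumptions, although that order matches the four-party example with TerA, TerB, TerC and TerD in this document.

```python
from typing import Dict, Tuple

SCREENS = ("L", "C", "R")  # left, middle and right screens of the viewer


def build_remote_layouts(
    occupied_seat: Dict[str, str],
) -> Dict[str, Dict[str, Tuple[str, str]]]:
    """Build the remote layout seen by each terminal.

    occupied_seat maps each terminal to its single occupied seat.
    The result maps viewer -> screen -> (source terminal, source seat).
    Assumption: the other terminals fill L, C, R in conference order.
    """
    terminals = list(occupied_seat)  # dict order is the conference order
    layouts: Dict[str, Dict[str, Tuple[str, str]]] = {}
    for viewer in terminals:
        others = [t for t in terminals if t != viewer]
        if len(others) != len(SCREENS):
            raise ValueError("sketch assumes a four-party conference")
        layouts[viewer] = {
            screen: (source, occupied_seat[source])
            for screen, source in zip(SCREENS, others)
        }
    return layouts


# Occupied seats as in the four-party example of this document.
layouts = build_remote_layouts(
    {"TerA": "L", "TerB": "C", "TerC": "R", "TerD": "C"}
)
for viewer, layout in layouts.items():
    print(viewer, layout)
```

Manual control, as described in the preceding paragraph, would amount to replacing the computed `layouts[viewer]` entry with an operator's choice of source for each screen.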

Further, the method also includes:

Each seat image in the remote video layout information is formed into a corresponding video code stream containing a display position identifier, and is then sent to the corresponding telepresence terminal.

Further, the method also includes: each telepresence terminal performing image display according to the corresponding video code stream.

Further, each telepresence terminal displays each seat image on the corresponding display screen according to the display position identifier in the corresponding video code stream. That is, the final remote video layout of each of the four participating telepresence terminals consists of the occupied seats of the other three telepresence terminals, displayed respectively on the three seat screens of the local end.
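As a rough sketch of how a display position identifier could travel with each stream, the following example tags each seat image with the screen it is meant for and lets the receiving terminal route it accordingly. The `SeatStream` record, its field names and the use of raw bytes as a stand-in for an encoded video code stream are all hypothetical; the source does not specify a stream format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SeatStream:
    """Hypothetical container for one seat image sent by the MCU."""
    target_terminal: str   # terminal that will display the stream
    display_position: str  # display position identifier: "L", "C" or "R"
    source_terminal: str   # terminal whose occupied seat is shown
    payload: bytes         # placeholder for the encoded video code stream


def package_streams(
    viewer: str, layout: Dict[str, str], encoded: Dict[str, bytes]
) -> List[SeatStream]:
    """MCU side: one stream per screen, each carrying its position."""
    return [
        SeatStream(viewer, position, source, encoded[source])
        for position, source in layout.items()
    ]


def route_to_screens(streams: List[SeatStream]) -> Dict[str, SeatStream]:
    """Terminal side: pick the screen from the display position identifier."""
    screens: Dict[str, SeatStream] = {}
    for stream in streams:
        if stream.display_position in screens:
            raise ValueError(f"duplicate position {stream.display_position}")
        screens[stream.display_position] = stream
    return screens


# TerA's remote layout from the four-party example of this document.
encoded = {"TerB": b"video-B", "TerC": b"video-C", "TerD": b"video-D"}
streams = package_streams("TerA", {"L": "TerB", "C": "TerC", "R": "TerD"}, encoded)
screens = route_to_screens(streams)
print({position: s.source_terminal for position, s in screens.items()})
```

Carrying the position in the stream itself means the terminal needs no knowledge of the conference roster; it only maps identifiers to its own screens.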

FIG. 3 is a structural diagram of an apparatus for displaying a layout in a telepresence conference system according to an embodiment of the present invention. As shown in FIG. 3, the apparatus includes: a receiving module, an analysis module, a judging module, and a sending module.

The receiving module is located in the multipoint processing unit and is configured to receive local video layout information that is sent by each telepresence terminal and contains each seat image.

The analysis module is located in the multipoint processing unit and is configured to analyze the local video layout information of each telepresence terminal to obtain seat occupancy information for each telepresence terminal. The identification sub-module of the analysis module is configured to perform face recognition separately on each seat image in the local video layout information of each telepresence terminal. The determining sub-module of the analysis module is configured to obtain, according to whether the recognition result for each seat image is a face or no face, whether a person is seated at each seat of each telepresence terminal.

The judging module is located in the multipoint processing unit and is configured to determine, according to the seat occupancy information of each telepresence terminal, whether each telepresence terminal has only one seat at which a person is seated.

The sending module is located in the multipoint processing unit and is configured to send the seat image of the occupied seat of each telepresence terminal to the corresponding telepresence terminal when it is determined that each telepresence terminal has only one seat at which a person is seated. The extraction sub-module of the sending module is configured to extract the seat image of the occupied seat from the local video layout information of each telepresence terminal. The combination sub-module of the sending module is configured to generate the remote video layout information to be sent to each telepresence terminal by separately combining the extracted seat images. The code stream sub-module of the sending module is configured to form each seat image in the remote video layout information into a corresponding video code stream containing a display position identifier, and then send the video code stream to the corresponding telepresence terminal.

FIG. 4 is a diagram of a four-party conference scene in which a layout is displayed in a telepresence conference system according to an embodiment of the present invention. As shown in FIG. 4, four telepresence terminals, TerA, TerB, TerC, and TerD, jointly participate in a telepresence conference held on a Multipoint Control Unit (MCU). All four are three-screen telepresence terminals with a left (L), a middle (C), and a right (R) screen.

FIG. 5 is a local video layout diagram of the first telepresence terminal TerA for displaying a layout in a telepresence conference system according to an embodiment of the present invention. As shown in FIG. 5, two participants are seated at the left screen (L) position of TerA, and no participants are seated at the other two positions, the middle screen (C) and the right screen (R).

FIG. 6 is a local video layout diagram of the second telepresence terminal TerB for displaying a layout in a telepresence conference system according to an embodiment of the present invention. As shown in FIG. 6, two participants are seated at the middle screen (C) position of TerB, and no participants are seated at the other two positions, the left screen (L) and the right screen (R).

FIG. 7 is a local video layout diagram of the third telepresence terminal TerC for displaying a layout in a telepresence conference system according to an embodiment of the present invention. As shown in FIG. 7, two participants are seated at the right screen (R) position of TerC, and no participants are seated at the other two positions, the left screen (L) and the middle screen (C).

FIG. 8 is a local video layout diagram of the fourth telepresence terminal TerD for displaying a layout in a telepresence conference system according to an embodiment of the present invention. As shown in FIG. 8, one participant is seated at the middle screen (C) position of TerD, and no participants are seated at the other two positions, the left screen (L) and the right screen (R).

FIG. 9 is a remote video layout diagram seen by the first telepresence terminal TerA for displaying a layout in a telepresence conference system according to an embodiment of the present invention. As shown in FIG. 9, the left screen (L) of TerA shows the video of the two participants of TerB, the middle screen (C) shows the video of the two participants of TerC, and the right screen (R) shows the video of the one participant of TerD.

FIG. 10 is a remote video layout diagram seen by the second telepresence terminal TerB for displaying a layout in a telepresence conference system according to an embodiment of the present invention. As shown in FIG. 10, the left screen (L) of TerB shows the video of the two participants of TerA, the middle screen (C) shows the video of the two participants of TerC, and the right screen (R) shows the video of the one participant of TerD.

FIG. 11 is a remote video layout diagram seen by the third telepresence terminal TerC for displaying a layout in a telepresence conference system according to an embodiment of the present invention. As shown in FIG. 11, the left screen (L) of TerC shows the video of the two participants of TerA, the middle screen (C) shows the video of the two participants of TerB, and the right screen (R) shows the video of the one participant of TerD.

FIG. 12 is a remote video layout diagram seen by the fourth telepresence terminal TerD for displaying a layout in a telepresence conference system according to an embodiment of the present invention. As shown in FIG. 12, the left screen (L) of TerD shows the video of the two participants of TerA, the middle screen (C) shows the video of the two participants of TerB, and the right screen (R) shows the video of the two participants of TerC.

FIG. 13 is a flowchart of the processing performed by a telepresence terminal for displaying a layout in a telepresence conference system according to an embodiment of the present invention. As shown in FIG. 13, the terminal can use face recognition technology to determine at which seats in the currently captured local video participants are seated, save this information, and send it to the MCU.
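The terminal-side processing can be illustrated with a small sketch. The face detector is deliberately left as a pluggable callable, because the source names face recognition but no particular algorithm or library; the `count_faces` parameter, the report structure and the JSON encoding are assumptions made only for this example.

```python
import json
from typing import Callable, Dict

# Hypothetical detector interface: takes one seat image, returns the
# number of faces found in it.
FaceCounter = Callable[[bytes], int]


def build_occupancy_report(
    terminal_id: str, seat_frames: Dict[str, bytes], count_faces: FaceCounter
) -> dict:
    """Mark each seat as occupied when at least one face is detected."""
    return {
        "terminal": terminal_id,
        "seats": {
            seat: count_faces(frame) > 0 for seat, frame in seat_frames.items()
        },
    }


def fake_face_counter(frame: bytes) -> int:
    """Stand-in detector for the demo: counts the marker b'face'."""
    return frame.count(b"face")


# TerA in the four-party example: two participants at the left seat.
frames = {"L": b"face face", "C": b"", "R": b""}
report = build_occupancy_report("TerA", frames, fake_face_counter)
message = json.dumps(report, sort_keys=True)  # what would be sent to the MCU
print(message)
```

In a real terminal the stand-in detector would be replaced by an actual face recognition component, and the report would be sent over whatever signalling channel the system uses.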

FIG. 14 is a flowchart of the processing performed by the MCU for displaying a layout in a telepresence conference system according to an embodiment of the present invention. As shown in FIG. 14, the MCU collects and saves the local video layout information of each participating telepresence terminal. After the local video layout information of all participating telepresence terminals has been collected, the MCU analyzes it and determines whether the current conference is a four-party conference (that is, four telepresence terminals participate in the conference) and whether only one seat in the local video layout of each participating telepresence terminal has participants seated at it. When this condition is met, the MCU organizes, for each participating terminal, the far-end video layout that the terminal views. This layout is composed of the occupied seats of the other three participating telepresence terminals, excluding the local end, and the participants of those three terminals are displayed in the left, middle and right seats respectively.

In summary, the present invention has the following technical effect: in a four-party telepresence conference, when only one of the three seats at each participating telepresence terminal has participants, each party can, through the remote video of its telepresence terminal, see the participants of the other three parties at the same time, as if all participants were in one conference room. In this particular scenario, the participants at every telepresence site are shown on a large screen, achieving the best face-to-face sensory effect with all other participants.
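The collect-then-judge part of the MCU flow can be sketched as a small state holder. The class name `LayoutCoordinator`, the `Status` values and the report format are hypothetical, and the sketch does not say what the MCU does when the condition is not met, since the source does not describe that case.

```python
from enum import Enum
from typing import Dict, Iterable, Mapping


class Status(Enum):
    WAITING = "waiting for reports"
    CONDITION_MET = "four parties, one occupied seat each"
    CONDITION_NOT_MET = "condition not met"


class LayoutCoordinator:
    """Saves per-terminal seat occupancy and judges the condition once
    every expected terminal has reported."""

    def __init__(self, expected_terminals: Iterable[str]) -> None:
        self.expected = list(expected_terminals)
        self.reports: Dict[str, Dict[str, bool]] = {}

    def submit(self, terminal: str, seats: Mapping[str, bool]) -> Status:
        if terminal not in self.expected:
            raise KeyError(f"unknown terminal: {terminal}")
        self.reports[terminal] = dict(seats)  # save or refresh the report
        return self.status()

    def status(self) -> Status:
        if len(self.reports) < len(self.expected):
            return Status.WAITING
        four_party = len(self.expected) == 4
        one_seat_each = all(
            sum(1 for occupied in seats.values() if occupied) == 1
            for seats in self.reports.values()
        )
        if four_party and one_seat_each:
            return Status.CONDITION_MET
        return Status.CONDITION_NOT_MET


mcu = LayoutCoordinator(["TerA", "TerB", "TerC", "TerD"])
print(mcu.submit("TerA", {"L": True, "C": False, "R": False}))
print(mcu.submit("TerB", {"L": False, "C": True, "R": False}))
print(mcu.submit("TerC", {"L": False, "C": False, "R": True}))
print(mcu.submit("TerD", {"L": False, "C": True, "R": False}))
```

Only once `CONDITION_MET` is reached would the MCU go on to organize the far-end layouts for each terminal.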

Although the invention has been described in detail above, the invention is not limited thereto, and various modifications may be made by those skilled in the art in accordance with the principles of the invention. Therefore, modifications made in accordance with the principles of the invention are to be understood as falling within the scope of the invention.

Industrial applicability

The technical solution provided by the present invention uses a remote video layout display method so that, in a specific scenario, the participants of all telepresence terminals other than the local telepresence terminal are displayed in a reasonable layout, which enhances the sensory experience of a face-to-face meeting between each party's participants and the other participants.

Claims

1. A method for displaying a layout in a telepresence conference system, comprising:

A multipoint processing unit receives local video layout information that is sent by each telepresence terminal and contains each seat image;

The multipoint processing unit analyzes the local video layout information of each telepresence terminal to obtain seat occupancy information for each telepresence terminal;

The multipoint processing unit determines, according to the seat occupancy information of each telepresence terminal, whether each telepresence terminal has only one seat at which a person is seated;

When the multipoint processing unit determines that each telepresence terminal has only one seat at which a person is seated, the seat image of the occupied seat of each telepresence terminal is sent to the corresponding telepresence terminal.

2. The method according to claim 1, wherein the step in which the multipoint processing unit analyzes the local video layout information of each telepresence terminal to obtain the seat occupancy information of each telepresence terminal comprises:

Performing face recognition separately on each seat image in the local video layout information of each telepresence terminal;

Obtaining, according to whether the recognition result for each seat image is a face or no face, whether a person is seated at each seat of each telepresence terminal.

3. The method according to claim 1, wherein the step of sending the seat image of the occupied seat of each telepresence terminal to the corresponding telepresence terminal comprises:

Extracting the seat image of the occupied seat from the local video layout information of each telepresence terminal;

Generating the remote video layout information to be sent to each telepresence terminal by separately combining the extracted seat images;

The remote video layout information to be sent to any one telepresence terminal contains the seat images of the telepresence terminals other than that telepresence terminal.

4. The method according to claim 3, wherein the step of sending the seat image of the occupied seat of each telepresence terminal to the corresponding telepresence terminal further comprises:

Forming each seat image in the remote video layout information into a corresponding video code stream containing a display position identifier, and then sending it to the corresponding telepresence terminal.

5. The method according to claim 3 or 4, further comprising: each telepresence terminal performing image display according to the corresponding video code stream.

6. The method according to claim 5, wherein each telepresence terminal displays each seat image in full screen on the corresponding display screen according to the display position identifier in the corresponding video code stream.

7. An apparatus for displaying a layout in a telepresence conference system, comprising:

A receiving module, located in a multipoint processing unit and configured to receive local video layout information that is sent by each telepresence terminal and contains each seat image;

An analysis module, located in the multipoint processing unit and configured to analyze the local video layout information of each telepresence terminal to obtain seat occupancy information for each telepresence terminal;

A judging module, located in the multipoint processing unit and configured to determine, according to the seat occupancy information of each telepresence terminal, whether each telepresence terminal has only one seat at which a person is seated;

A sending module, located in the multipoint processing unit and configured to send the seat image of the occupied seat of each telepresence terminal to the corresponding telepresence terminal when it is determined that each telepresence terminal has only one seat at which a person is seated.

8. The apparatus according to claim 7, wherein the analysis module further comprises:

An identification sub-module, configured to perform face recognition separately on each seat image in the local video layout information of each telepresence terminal;

A determining sub-module, configured to obtain, according to whether the recognition result for each seat image is a face or no face, whether a person is seated at each seat of each telepresence terminal.

9. The apparatus according to claim 7, wherein the sending module further comprises:

An extraction sub-module, configured to extract the seat image of the occupied seat from the local video layout information of each telepresence terminal;

A combination sub-module, configured to generate the remote video layout information to be sent to each telepresence terminal by separately combining the extracted seat images.

10. The apparatus according to claim 9, wherein the sending module further comprises:

A code stream sub-module, configured to form each seat image in the remote video layout information into a corresponding video code stream containing a display position identifier, and then send the video code stream to the corresponding telepresence terminal.