WO2021093882A1 - Video meeting method, meeting terminal, server, and storage medium - Google Patents

Video meeting method, meeting terminal, server, and storage medium

Info

Publication number
WO2021093882A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
code stream
conference
server
decoding
Prior art date
Application number
PCT/CN2020/129049
Other languages
French (fr)
Chinese (zh)
Inventor
曹泊
Original Assignee
ZTE Corporation (中兴通讯股份有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corporation (中兴通讯股份有限公司)
Publication of WO2021093882A1 publication Critical patent/WO2021093882A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/02 Details
    • H04L12/16 Arrangements for providing special services to substations
    • H04L12/18 Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
    • H04L12/1813 Arrangements for providing special services to substations for broadcast or conference, e.g. multicast for computer conferences, e.g. chat rooms
    • H04L12/1827 Network arrangements for conference optimisation or adaptation
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/2343 Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234309 Processing of video elementary streams involving reformatting operations by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4 or from Quicktime to Realvideo
    • H04N21/234345 Processing of video elementary streams where the reformatting operation is performed only on part of the stream, e.g. a region of the image or a time segment
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431 Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4312 Generation of visual interfaces involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/4402 Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440218 Processing of video elementary streams involving reformatting operations by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4

Definitions

  • This application relates to the technical field of video conferencing, such as a video conferencing method, a conference terminal, a server, and a storage medium.
  • In a video conference in a cloud conference system, multiple conference terminals encode their local video data and send it to the MCU (Multipoint Control Unit) server.
  • The MCU server receives the video code streams from the multiple conference terminals, decodes them, performs multi-picture synthesis, and then encodes the video data corresponding to the multi-picture layout and sends it to every conference terminal participating in the video conference.
  • Each conference terminal receives the video code stream from the MCU server, decodes the data, and then displays the picture.
  • If many conference terminals participate in a video conference, the MCU server needs to process the video code streams of multiple video sources at the same time during the conference: it decodes the video code streams received from the multiple video sources, synthesizes the video pictures, and then encodes the synthesized picture.
  • Therefore, this processing requires the MCU server to have high processing capability; once the MCU server's performance is insufficient, the video picture on the conference terminal side is delayed, which affects the conference experience of the users participating in the conference.
  • The video conference method, conference terminal, server, and storage medium provided by the embodiments of the present application avoid the situation in which insufficient processing performance of the MCU server causes severe video picture delay on the conference terminal side and a poor conference experience for the users participating in the conference.
  • the embodiment of the present application provides a video conference method, which is applied to a conference terminal, and includes:
  • receiving a composite code stream and at least one independent code stream sent by the server, where the composite code stream is formed by the server performing decoding, picture synthesis, and re-encoding on the video code streams of some of the video sources in the video conference, and each independent code stream is a video code stream that the server received from a video source other than those video sources and forwarded to the conference terminal;
  • decoding the composite code stream and the at least one independent code stream respectively; and displaying the video pictures corresponding to the composite code stream and the at least one independent code stream.
  • The embodiment of the present application also provides a video conference method, including: receiving the video code streams sent by the video sources in the video conference; performing decoding, picture synthesis, and re-encoding on the video code streams of some of the video sources to obtain a composite code stream and sending it to each conference terminal in the video conference; and forwarding the video code streams of the video sources other than those video sources to each conference terminal in the video conference as independent code streams.
  • An embodiment of the present application also provides a conference terminal, which includes a first processor, a first memory, and a first communication bus;
  • the first communication bus is configured to enable connection and communication between the first processor and the first memory;
  • the first processor is configured to execute at least one program stored in the first memory, so as to implement the above-mentioned first video conference method.
  • An embodiment of the present application also provides a server, which includes a second processor, a second memory, and a second communication bus;
  • the second communication bus is configured to enable connection and communication between the second processor and the second memory;
  • the second processor is configured to execute at least one program stored in the second memory to implement the second video conference method described above.
  • An embodiment of the present application also provides a storage medium.
  • the storage medium stores at least one of a first video conference program and a second video conference program;
  • the first video conference program can be executed by at least one processor to implement the first video conference method described above;
  • the second video conference program can be executed by at least one processor to implement the second video conference method described above.
  • According to the video conference method, conference terminal, server, and storage medium provided by the embodiments of the present application, during the video conference, after the server receives the video code streams sent by multiple video sources in the video conference, it performs decoding, picture synthesis, and re-encoding only on the video code streams of some of the video sources.
  • The result is a composite code stream that is sent to each conference terminal in the video conference.
  • At the same time, the server sends the video code streams of the video sources other than those video sources to each conference terminal in the video conference as independent code streams, so that the conference terminals decode and display the composite code stream and the at least one independent code stream.
  • In this conference solution, the server does not need to perform decoding, picture synthesis, and re-encoding on the video code streams of all video sources, which reduces the requirement on server-side encoding and decoding capability.
  • The video code streams beyond the server's processing capacity are sent directly to the conference terminals, which makes full use of the processing resources on the conference terminal side, reduces the delay of the video picture on the conference terminal side, improves the smoothness of the video conference, and enhances the user experience.
  • FIG. 1 is an interactive flowchart of the video conference method provided in Embodiment 1 of this application;
  • FIG. 3 is a schematic diagram of the conference terminal provided in the second embodiment of the application displaying video images of multiple video sources;
  • FIG. 4 is another schematic diagram of the conference terminal provided in the second embodiment of the application displaying video images of multiple video sources;
  • FIG. 5 is a flow chart of the server shown in the second embodiment of the application processing the video code streams of some video sources to form a composite code stream;
  • FIG. 6 is an interactive flowchart of the video conference method provided in Embodiment 3 of this application.
  • FIG. 7 is a schematic diagram of the video conference screen layout of the conference terminal provided in the third embodiment of the application.
  • FIG. 8 is another schematic diagram of the video conference screen layout of the conference terminal provided in the third embodiment of the application.
  • FIG. 9 is a schematic diagram of a hardware structure of a conference terminal provided in Embodiment 4 of this application.
  • FIG. 10 is a schematic diagram of a hardware structure of the server provided in the fourth embodiment of the application.
  • FIG. 11 is a schematic diagram of a video conference system provided in Embodiment 4 of this application.
  • This embodiment provides a video conference method. Please refer to an interactive flowchart of the video conference method shown in FIG. 1.
  • S102 The server receives the video code stream sent by the video source in the video conference.
  • the server may be an MCU server.
  • The MCU server is essentially a multimedia information switch: it performs multipoint calling and connection, realizes functions such as video broadcasting, video selection, audio mixing, and data broadcasting, and completes the concatenation and switching of the signals of multiple terminals.
  • The server notifies the corresponding conference terminals to allow these conference terminals to join the video conference.
  • A media collection device on the side of each conference terminal, such as a camera, can collect image information on that conference terminal's side to form a video.
  • After the conference terminal encodes the video collected by the media collection device, the video code stream of that conference terminal is formed and then sent to the server.
  • In some examples, the media collection device on the conference terminal side may include, in addition to a camera, a microphone or the like, and the microphone is configured to collect the audio information of the users participating in the conference.
  • In this case, the video collected by the conference terminal includes both image information and audio information.
  • The server therefore receives the video code streams sent by multiple conference terminals in the video conference.
  • These conference terminals that provide video code streams are the video sources.
  • Of course, some participants may turn off their cameras, that is, they do not provide the image information of their conference terminal during the conference; in this case, that conference terminal is not considered a video source.
  • S104 The server performs decoding, picture synthesis, and re-encoding on the video code streams of some of the video sources to obtain a composite code stream and sends it to each conference terminal, and forwards the video code streams of the video sources other than those video sources to each conference terminal as independent code streams.
  • After the server receives the video code streams sent by the video sources, it can perform decoding, picture synthesis, and re-encoding on only part of the video code streams; for the video code streams of the video sources other than that part, the server performs no decoding, picture synthesis, or other processing. For example, assuming that a video conference contains four video sources, the server can decode, synthesize, and re-encode the video code streams of only three of the video sources to form a composite code stream, while the remaining video code stream continues to stand on its own. Of course, in other examples, the server may decode, synthesize, and re-encode the video code streams of only two of the video sources, so that the remaining two video code streams continue to stand on their own.
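As a rough illustration of the split just described, the following Python sketch models the server's handling of received streams. It is not the application's implementation: the helper names (decode, synthesize_pictures, encode) and the four-source setup are assumptions introduced only to show the data flow.

```python
# Minimal sketch: some sources are decoded, picture-synthesized and
# re-encoded into one composite stream; the rest are forwarded untouched
# as independent streams. The helpers below are placeholders, not a real codec.

def decode(stream: bytes) -> str:
    return f"picture({stream.decode()})"

def synthesize_pictures(pictures: list) -> str:
    return "+".join(pictures)

def encode(picture: str) -> bytes:
    return picture.encode()

def handle_streams(streams: dict, synthesized_sources: set):
    """streams maps source id -> received video code stream (bytes)."""
    decoded = [decode(s) for src, s in streams.items() if src in synthesized_sources]
    composite_stream = encode(synthesize_pictures(decoded))
    independent_streams = {src: s for src, s in streams.items()
                           if src not in synthesized_sources}
    # Both the composite stream and every independent stream are then
    # sent to every conference terminal.
    return composite_stream, independent_streams

received = {"a1": b"A", "b1": b"B", "c1": b"C", "d1": b"D"}
composite, independent = handle_streams(received, {"a1", "c1", "d1"})
print(composite, independent)  # b'picture(A)+picture(C)+picture(D)' {'b1': b'B'}
```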
  • A video code stream that has not been synthesized by the server contains only the video picture of a single conference terminal side, whereas the composite code stream, having undergone picture synthesis, contains the video pictures of at least two conference terminal sides.
  • A video code stream that has not been decoded, picture-synthesized, and re-encoded by the server can therefore be called an "independent code stream".
  • After processing the received video code streams to obtain the composite code stream, the server can send the composite code stream to each conference terminal so that each conference terminal displays the video picture corresponding to the composite code stream.
  • For each independent code stream, the server also needs to send it to each conference terminal so that each conference terminal can display the video picture corresponding to that independent code stream.
  • Based on the composite code stream and the independent code streams, the conference terminal can display the image information of multiple video sources. It is worth noting that there is no strict timing relationship between the server's action of sending the composite code stream and its action of sending the independent code streams: in some scenarios the server sends the composite code stream first and then the independent code streams, in other scenarios the independent code streams are transmitted before the composite code stream, and in some examples the composite code stream and the independent code streams are even transmitted to the conference terminal side at the same time.
  • In order to reduce the picture delay on the conference terminal side and allow the conference terminal to process the video code streams in time and display the pictures as soon as possible, once the server receives a video code stream that is to be forwarded as an independent code stream, it can transmit it immediately without waiting for the other video code streams.
  • For example, while the server is decoding, synthesizing, and re-encoding the received video code streams a1, c1, and d1, it also receives the video code stream b1; because the server does not need to perform additional processing on b1, it can directly send the video code stream b1 to each conference terminal side as an independent code stream, and then transmit the composite code stream to the conference terminals after it has been generated. For another example, five parties a2, b2, c2, d2, and e2 conduct a video conference, and according to the settings the server decodes, synthesizes, and re-encodes the video code streams of the three parties b2, c2, and d2, while the video code streams of the other parties are treated as independent code streams. If the first video code stream received by the server is that of a2, the server can directly send that video code stream to each conference terminal side as an independent code stream. Subsequently, the server receives the video code streams of b2, c2, e2, and d2 in turn. After receiving the video code streams of b2 and c2, the server can decode these two video code streams first; after receiving the video code stream of e2, it forwards that video code stream to the conference terminals; and after receiving the video code stream of d2, the server decodes it, performs picture synthesis with the decoding results of the b2 and c2 video code streams, re-encodes the synthesized picture to obtain the composite code stream, and sends the composite code stream to the conference terminal side.
  • the conference terminal respectively decodes the composite code stream and at least one independent code stream.
  • After receiving the video code streams sent by the server, the conference terminal can decode the received video code streams. It should be understood that a conference terminal in the related art only needs to decode one video code stream, namely the composite code stream containing the video pictures of all video sources, so its decoding method simply corresponds to the encoding method on the server side and is fixed and unique. In this embodiment, the conference terminal needs to decode at least two video code streams (one composite code stream and at least one independent code stream), and the independent code streams and the composite code stream are encoded by different entities: the composite code stream is encoded by the server, while each independent code stream is encoded by the corresponding conference terminal.
  • If the conference terminals and the server use the same encoding method, the conference terminal can use the same decoding method when decoding the independent code streams and the composite code stream received from the server; that is, the conference terminal does not need to distinguish between different video code streams and adopt different decoding methods.
  • However, in some examples the encoding methods of the conference terminals and the server may differ, and even the encoding methods adopted by different conference terminals may not be exactly the same.
  • In this case, when the conference terminal decodes the video code streams it receives, it needs to adopt different decoding methods for different video code streams.
  • In some examples, the server and the multiple conference terminals can agree in advance on the encoding and decoding method used for each video code stream.
  • For example, the server and the conference terminals agree to use decoding method one for decoding the composite code stream and decoding method two for decoding an independent code stream.
  • When the server sends the composite code stream to a conference terminal, it only needs to carry the corresponding identification information in the video code stream.
  • After the conference terminal receives the video code stream, it can determine from the identification information carried in it that, for example, the video code stream needs to be decoded using the first decoding method.
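To make this identification-based lookup concrete, here is a minimal sketch of a pre-agreed table from stream identifier to decoding method. The identifiers and method names are illustrative assumptions, not values mandated by the application.

```python
# Hypothetical pre-agreed mapping from the identification information carried
# in a received stream to the decoding method the terminal should apply.
DECODE_METHOD_BY_ID = {
    "1": "decoding method one",   # e.g. agreed for the composite code stream
    "2": "decoding method two",   # e.g. agreed for an independent code stream
}

def pick_decoding_method(stream_id: str) -> str:
    if stream_id not in DECODE_METHOD_BY_ID:
        raise ValueError(f"no decoding method agreed for stream id {stream_id!r}")
    return DECODE_METHOD_BY_ID[stream_id]

print(pick_decoding_method("1"))  # decoding method one
```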
  • the conference terminal displays the video picture corresponding to the composite code stream and the at least one independent code stream.
  • After the conference terminal completes the decoding of a video code stream it has received, it can display the corresponding video picture on the screen.
  • In some examples, the conference terminal can display the video picture corresponding to a video code stream immediately after that stream is decoded; it does not need to wait until both the independent code streams and the composite code stream have all been decoded before displaying.
  • In the video conference method provided in this embodiment, the server performs decoding, picture synthesis, and re-encoding only on the video code streams of some of the video sources and does not need to perform this processing on the video code streams of all video sources.
  • The video code streams that are not processed can be sent directly to each conference terminal, which decodes and displays them itself; this makes full use of the processing resources on the conference terminal side, reduces the processing burden of the server, reduces the delay of the video picture, and improves the quality of the video conference.
  • As described above, the server in the embodiments of the present application may perform decoding, picture synthesis, and re-encoding on the video code streams of only some of the video sources to form a composite code stream, and then send the composite code stream to each conference terminal.
  • For the video code streams of the other video sources, the server sends them directly to each conference terminal.
  • Which video sources are processed in this way, and how the streams are decoded and displayed, can be determined in several ways: if the server and the conference terminals belong to the same manufacturer, the designer can build these settings into the conference terminals and the server before the equipment leaves the factory.
  • Alternatively, programmers can write these settings into an upgrade package and push it to the server side and the conference terminal side respectively by way of an equipment upgrade.
  • The above settings can also be determined by the server itself according to the situation of the current video conference.
  • the following describes this video conference method with reference to the flowchart shown in FIG. 2.
  • S202 The conference terminal sends the video codec capability parameter of the conference terminal to the server.
  • After the conference terminal enters the video conference, it can send its own video codec capability parameters to the server. It should be understood that the conference terminal can either actively report the video codec capability parameters to the server or send them to the server after receiving the server's request.
  • The video codec capability parameters may represent the video encoding and decoding capability of the conference terminal.
  • In some examples, the video codec capability parameters include encoding parameters and decoding parameters.
  • The encoding parameters include the encoding capability of the conference terminal.
  • The decoding parameters include at least one of the decoding capability of the conference terminal, the conference rate, the frame rate, and format information.
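The application only loosely enumerates these fields, so the record below is an assumed shape used purely to visualize what a terminal might report in this step; the field names and sample values are not taken from the source.

```python
from dataclasses import dataclass

@dataclass
class CodecCapability:
    """Assumed layout of a conference terminal's capability report."""
    encoding_capability: str    # what the terminal can encode (illustrative)
    decoding_capability: str    # what the terminal can decode
    conference_rate_kbps: int   # conference (bit) rate
    frame_rate_fps: int
    video_format: str           # e.g. a resolution label

report = CodecCapability(
    encoding_capability="1 stream up to 1080p30",
    decoding_capability="2 streams up to 1080p30",
    conference_rate_kbps=2048,
    frame_rate_fps=30,
    video_format="1080p",
)
print(report)
```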
  • the server determines the decoding and display strategy of the video conference according to the video codec capability parameter of each conference terminal and the video codec capability of the server.
  • After the server obtains the video codec capability parameters of each conference terminal, it can determine the decoding and display strategy of the video conference based on the codec capabilities of these conference terminals and the video codec capability of the server itself. Based on the number of conference terminals in this video conference, the way each conference terminal encodes its local video, and so on, the server can determine the processing requirement for processing the video code streams of all video sources in this video conference into a composite code stream. Exemplarily, the server can determine whether its own video codec capability meets this processing requirement. If the server determines that its own video codec capability meets the processing requirement, this indicates that the server can perform decoding, picture synthesis, and re-encoding on the video code streams of all video sources, that is, the server can process the video code streams of all video sources into a composite code stream.
  • However, if the server determines that its video codec capability is lower than the processing requirement for processing the video code streams of all video sources in the video conference into a composite code stream, that is, the server's processing capacity is not enough to decode, picture-synthesize, and re-encode the video code streams of all video sources, the server determines that in the subsequent course of the video conference it will process only part of the video code streams into the composite code stream.
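A compact way to express this decision is to compare an estimated processing load with the server's capacity, as in the sketch below. The load model (one decode unit per source plus one re-encode of the synthesized picture) is an assumption made only for illustration.

```python
def must_offload(num_sources: int, server_capacity_units: int) -> bool:
    """Return True when the server cannot synthesize all sources itself.

    Assumed load model: one decode unit per video source plus one unit
    for re-encoding the synthesized picture.
    """
    required_units = num_sources + 1
    return required_units > server_capacity_units

# With six sources and capacity for five units, some streams must be
# forwarded to the terminals as independent code streams.
print(must_offload(6, 5))  # True
print(must_offload(3, 5))  # False
```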
  • What the server needs to ensure is that, for the video code streams it sends to each conference terminal, including the composite code stream and the at least one independent code stream, the encoding methods used are supported by that conference terminal; otherwise, if a conference terminal cannot decode a video code stream it receives, the conference terminal will be unable to display at least one video picture on its side.
  • The decoding and display strategy determined by the server includes a decoding instruction, and the decoding instruction is used to indicate to the conference terminal how to decode the multiple video code streams it will receive, that is, the composite code stream and the at least one independent code stream.
  • For example, the server instructs the conference terminal to decode the video code stream carrying the identification information "1" using decoding method one, the video code stream carrying the identification information "2" using decoding method two, and the video code stream carrying the identification information "3" using decoding method three.
  • In some examples, the decoding and display strategy also includes a display indication, which is used to inform the conference terminal of the mapping between each of the code streams sent by the server, that is, the composite code stream and each of the at least one independent code stream, and a display area.
  • In some examples, the display indication is not necessary for the decoding and display strategy, because in these examples the conference terminal can set a corresponding number of display areas on its display screen according to the number of video code streams it will receive. For example, suppose that after negotiating with the server the conference terminal determines that the number of video code streams it will receive in this conference is k; then the conference terminal side can set k display areas, and whenever a video code stream is received and decoded, the conference terminal randomly selects, from the display areas not yet filled with a picture, one area in which to display the video picture of that video code stream.
  • However, because the video picture corresponding to the composite code stream contains the video pictures of at least two conference terminal sides at the same time, if the composite code stream is not guaranteed to be displayed in a large display area, the people and other details in the picture will be very small and the user will struggle to see them.
  • For example, as shown in FIG. 3, there are three video sources a3, b3, and c3 in a video conference.
  • The server processes the video code streams of a3 and b3 to form a composite code stream, and c3 continues to be an independent code stream.
  • In this case, only two display areas need to be set on each conference terminal side.
  • If the composite code stream occupies one display area and the independent code stream occupies the other, the video pictures corresponding to a3 and b3 have to share one display area, which makes the video pictures of the a3 and b3 sides only half the size of the video picture of the c3 side; this not only makes it difficult for users to see the pictures of the a3 and b3 sides, but also does not conform to users' video conference viewing habits.
  • Therefore, in some examples the decoding and display strategy includes the mapping relationship between the multiple video code streams and multiple display areas.
  • Through this mapping, the server can control how the video pictures of the multiple video sources are laid out on the conference terminal side, for example ensuring that each video source is displayed in a display area of the same size, or ensuring that the video pictures of the multiple video sources can be spliced together and displayed in one contiguous region. As shown in FIG. 4, six conference terminals a4, b4, c4, d4, e4, and f4 participate in the conference, and each conference terminal has its camera turned on, so there are six video sources in total.
  • The server processes the video code streams of a4, b4, c4, and d4 into a composite code stream, and treats the video code streams of the other two video sources as independent code streams.
  • The server notifies each conference terminal of the display area corresponding to each video code stream.
  • The first area 401 is used to display the video picture corresponding to the composite code stream, whose internal arrangement of the a4, b4, c4, and d4 pictures is decided by the server; the second area 402 is used to display the video picture corresponding to e4, and the third area 403 is used to display the video picture corresponding to f4.
  • In this way, the pictures of the six video sources are displayed in one contiguous region and are not scattered across the screen.
  • In addition, the video picture dimensions of the multiple video sources are consistent, which conforms to users' viewing habits.
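The display indication for the FIG. 4 example can be pictured as a table from stream to screen region, as in the sketch below. Only the idea that the composite stream occupies the large area 401 while e4 and f4 occupy areas 402 and 403 comes from the description; the canvas size and rectangle coordinates are invented for illustration.

```python
# Hypothetical display indication for the FIG. 4 layout: each entry maps a
# stream to a rectangle (x, y, width, height) on an assumed 1920x1080 canvas.
DISPLAY_MAP = {
    "composite": (0,    0,   1280, 1080),  # area 401: synthesized a4+b4+c4+d4 picture
    "e4":        (1280, 0,    640,  540),  # area 402
    "f4":        (1280, 540,  640,  540),  # area 403
}

def region_for(stream_id: str):
    """Look up the display rectangle assigned to a stream, if any."""
    return DISPLAY_MAP.get(stream_id)

print(region_for("e4"))  # (1280, 0, 640, 540)
```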
  • S206 The server sends the decoding and display strategy to each conference terminal.
  • After the server determines the decoding and display strategy, it can send the strategy to each conference terminal so that each conference terminal knows how this video conference will be carried out.
  • The server decodes and picture-synthesizes the video code streams of m of the n video sources in the video conference, and re-encodes the synthesized picture according to the encoding method corresponding to the decoding method in the decoding and display strategy to form a composite code stream.
  • In some examples, the server decides during negotiation with the conference terminals that it will process the video code streams of m video sources to form the composite code stream, and that the video code stream of each remaining video source will be sent to each conference terminal as an independent code stream, where m is less than n.
  • After the server receives the video code streams used to form the composite code stream, it can perform decoding, picture synthesis, and re-encoding on these video code streams to form the composite code stream.
  • In some examples, the server and the multiple conference terminals determine during the negotiation phase (that is, the phase in which the decoding and display strategy is determined) which video sources' code streams will form the composite code stream; when generating the composite code stream, the server must wait until the video code streams of those video sources have all been received.
  • In other examples, the decoding and display strategy does not specify which video sources' code streams constitute the composite code stream, so the server can decide on the fly, according to the actual situation of the video conference, which video sources' code streams to synthesize together.
  • For example, the server may select the first m video code streams to form the composite code stream according to the order in which it receives video code streams from the multiple video sources. The flowchart shown in FIG. 5 illustrates how the server decodes, picture-synthesizes, and re-encodes the video code streams of some video sources to obtain the composite code stream.
  • S502 Obtain video code streams of the first m video sources according to the sequence of receiving video code streams from multiple video sources.
  • S504 Decode video code streams of m video sources.
  • S506 Perform picture synthesis on the decoding results corresponding to the m video code streams to obtain a synthesized picture.
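Steps S502 to S506, together with the re-encoding step described above, can be sketched as follows. The helper functions stand in for a real codec and, like the arrival-ordered list of streams, are assumptions used only to show the order of operations.

```python
# Sketch of S502-S506 plus the final re-encoding: take the first m streams
# in arrival order, decode them, synthesize one picture, re-encode it.

def decode(stream: bytes) -> str:
    return stream.decode()

def synthesize(pictures: list) -> str:
    return "|".join(pictures)

def encode(picture: str) -> bytes:
    return picture.encode()

def build_composite(arrived_streams: list, m: int) -> bytes:
    first_m = arrived_streams[:m]            # S502: first m streams by arrival order
    decoded = [decode(s) for s in first_m]   # S504: decode the m streams
    synthesized = synthesize(decoded)        # S506: synthesize one picture
    return encode(synthesized)               # re-encode into the composite code stream

print(build_composite([b"b2", b"c2", b"d2", b"e2"], m=3))  # b'b2|c2|d2'
```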
  • S210 The server sends the composite code stream, and the remaining n-m video code streams received from the video sources as independent code streams, to each conference terminal.
  • the server can send the composite code stream generated by its own processing to each conference terminal.
  • the server can also send the video code stream that it receives as an independent code stream to each conference terminal.
  • the sending sequence has been described in more detail in the foregoing embodiment, and will not be repeated here.
  • S212 The conference terminal decodes and displays the video code stream according to the decoding and display strategy.
  • After receiving the video code streams sent by the server, the conference terminal can decode the composite code stream and the at least one independent code stream according to the decoding methods indicated by the decoding instruction in the decoding and display strategy. Subsequently, the conference terminal fills the video pictures of the composite code stream and the at least one independent code stream into the corresponding display areas for display according to the display indication in the decoding and display strategy.
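The terminal-side behaviour in S212 can be sketched as below: each received stream is decoded with the method named in the decoding instruction and its picture is placed in the area named in the display indication. The decoder functions and the strategy layout are illustrative assumptions, not the application's data format.

```python
# Illustrative decoders keyed by the method name in the decoding instruction.
DECODERS = {
    "method one": lambda data: f"decoded1({data})",
    "method two": lambda data: f"decoded2({data})",
}

def render(strategy: dict, received: dict) -> dict:
    """Decode every received stream and map its picture to a display area."""
    screen = {}
    for stream_id, data in received.items():
        method = strategy["decode"][stream_id]   # decoding instruction
        area = strategy["display"][stream_id]    # display indication
        screen[area] = DECODERS[method](data)
    return screen

strategy = {
    "decode": {"composite": "method one", "e4": "method two"},
    "display": {"composite": "area 401", "e4": "area 402"},
}
print(render(strategy, {"composite": "C", "e4": "E"}))
# {'area 401': 'decoded1(C)', 'area 402': 'decoded2(E)'}
```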
  • In the video conference method provided in this embodiment, the server can obtain the video codec capability parameters of each conference terminal and then, based on the server's own video codec capability and the video codec capabilities of the multiple conference terminals, determine whether it can process the video code streams of all video sources in the video conference to form a composite code stream.
  • If the server determines that it can process the video code streams of all video sources in the video conference to form a composite code stream, then in the subsequent course of the conference it can still process the video code streams of the multiple video sources according to the video conference solution provided in the related art; but if the server determines that its own video codec capability is lower than what is required to process the video code streams of all video sources in the video conference into a composite code stream, it processes only the video code streams of some of the video sources, and at the same time makes full use of the decoding resources on the side of each conference terminal, thereby reducing the delay of the video picture on the side of each conference terminal and enhancing the conference experience of the users participating in the video conference.
  • the conference initiator creates a conference on the MCU management platform.
  • the conference initiator is actually a conference terminal in the video conference, and the conference terminal is usually held by the conference host.
  • the conference initiator creates a conference on the MCU management platform, which is equivalent to opening a network "meeting room" on the management platform.
  • S604 The MCU notifies the conference terminal that needs to participate in the conference to enter the video conference.
  • the conference terminal reports its own video coding and decoding capability parameters to the MCU.
  • S608 The MCU determines its own optimal decoding capability and the optimal decoding capability of the conference terminal, and notifies the conference terminal.
  • the MCU will select the screen layout by default.
  • the MCU will tell the conference terminal the screen layout.
  • In some examples, the control message sent by the MCU to the conference terminal includes the number of pictures in the multi-picture layout, the multi-picture layout mode, and the content of each sub-picture (such as the main video source, the auxiliary video source, the first channel of main video decoding, the second channel of main video decoding, the first channel of auxiliary video decoding, and so on).
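The fields listed in this control message could be carried in a structure like the one below; the field names and concrete values are assumptions used only to make the list tangible, not a format defined by the application.

```python
# Hypothetical shape of the MCU's multi-picture layout control message.
layout_control_message = {
    "sub_picture_count": 6,
    "layout_mode": "2x3 grid",
    "sub_pictures": [
        {"slot": 1, "content": "main video source"},
        {"slot": 2, "content": "auxiliary video source"},
        {"slot": 3, "content": "main video decoding, channel 1"},
        {"slot": 4, "content": "main video decoding, channel 2"},
        {"slot": 5, "content": "auxiliary video decoding, channel 1"},
        {"slot": 6, "content": "reserved"},
    ],
}
print(layout_control_message["layout_mode"])  # 2x3 grid
```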
  • the conference terminal encodes the local video according to the negotiated encoding method and sends it to the MCU.
  • S612 The MCU processes the received video code stream and sends it to each conference terminal.
  • After receiving the video code streams, the MCU decodes them, performs picture synthesis, re-encodes the result, and then sends it to the conference terminals. As shown in FIG. 7, when there are only three far-end (Far) video code streams and one local (Near) video code stream, the decoding capability of the MCU is entirely sufficient; therefore, the MCU can be responsible for decoding all the video code streams, performing picture synthesis, and re-encoding to form a composite code stream, which is then sent to each conference terminal. In this case the video code stream sent by the server consists only of a composite code stream, without any independent code stream.
  • In other cases, the MCU selects, according to the negotiated decoding capabilities and by default in the order in which it receives the streams, some of the video code streams to form the composite code stream: it decodes them, performs picture synthesis after the decoding is completed, and then encodes the synthesized picture data to form a composite code stream, which it sends to each conference terminal. For the remaining video code streams, the MCU simply attaches the identification tags and sends them to each conference terminal. As shown in FIG. 8, a video conference with a multi-picture layout includes six video sources; after negotiation, the MCU processes the video code streams of four of the video sources, and each conference terminal processes the video code streams of the remaining two video sources.
  • The video code streams of Far1, Far2, Far3, and Far4 are decoded by the MCU, synthesized, and then encoded to form a composite code stream that is sent to the conference terminals; the two code streams of Far5 and Near are received by the MCU, labeled with the corresponding identification tags, and then sent to each conference terminal.
  • S614 The conference terminal decodes and displays the received video code stream.
  • On the one hand, the conference terminal decodes the composite code stream sent by the MCU and displays it in the corresponding area; on the other hand, the conference terminal decodes each independent code stream after receiving it, and then displays it in the corresponding area according to the corresponding identification tag.
  • the conference terminal fills the video pictures corresponding to the composite code stream into the designated area, and fills the video pictures corresponding to other independent code streams into the designated area in the multi-picture layout according to the corresponding tags.
  • the storage medium can store at least one computer program that can be read, compiled, and executed by at least one processor.
  • In this embodiment, the storage medium can store at least one of a first video conference program and a second video conference program, where the first video conference program can be used by at least one processor to execute the conference-terminal-side process that implements any one of the video conference methods introduced in the foregoing embodiments.
  • the second video conference program can be used by at least one processor to execute the server-side process for implementing any of the video conference methods introduced in the foregoing embodiments.
  • The conference terminal 90 includes a first processor 91, a first memory 92, and a first communication bus 93 configured to connect the first processor 91 and the first memory 92.
  • The first memory 92 may be the aforementioned storage medium storing the first video conference program, and the first processor 91 can read, compile, and execute the first video conference program to carry out the conference-terminal-side process of the video conference method introduced in the foregoing embodiments.
  • The first processor 91 receives the composite code stream and at least one independent code stream sent by the server, where the composite code stream is formed by the server performing decoding, picture synthesis, and re-encoding on the video code streams of some of the video sources in the video conference, and each independent code stream is a video code stream that the server received from a video source other than those video sources and forwarded to the conference terminal.
  • The first processor 91 decodes the composite code stream and the at least one independent code stream respectively, and displays the video pictures corresponding to the composite code stream and the at least one independent code stream.
  • In some examples, before receiving the composite code stream and the at least one independent code stream sent by the server, the first processor 91 first sends the video codec capability parameters of the conference terminal to the server and receives the decoding and display strategy sent by the server.
  • The decoding and display strategy is determined by the server according to the server's own video codec capability and the video codec capabilities of each conference terminal in the video conference.
  • The decoding and display strategy is used to indicate to the conference terminal the decoding and display mode for the composite code stream and the at least one independent code stream.
  • the video encoding and decoding capability parameters sent to the server include encoding parameters and decoding parameters.
  • The encoding parameters include the encoding capability of the conference terminal; the decoding parameters include at least one of the decoding capability of the conference terminal, the conference rate, the frame rate, and format information.
  • In some examples, the decoding and display strategy includes a decoding instruction and a display indication, where the decoding instruction is used to indicate how the conference terminal decodes the composite code stream and the at least one independent code stream, and the display indication is used to indicate the mapping between each of the composite code stream and the at least one independent code stream sent by the server and a display area.
  • the first processor 91 may decode the composite code stream and the at least one independent code stream according to the decoding mode indicated by the decoding instruction.
  • The first processor 91 fills the video pictures of the composite code stream and the at least one independent code stream into the corresponding display areas for display according to the display indication.
  • the server 100 includes a second processor 101, a second memory 102, and a second communication bus 103 configured to connect the second processor 101 and the second memory 102
  • The second memory 102 may be the aforementioned storage medium storing the second video conference program, and the second processor 101 can read, compile, and execute the second video conference program to carry out the server-side process of the video conference method introduced in the foregoing embodiments.
  • The second processor 101 receives the video code streams sent by the video sources in the video conference, then performs decoding, picture synthesis, and re-encoding on the video code streams of some of the video sources to obtain a composite code stream and sends it to each conference terminal, and forwards the video code streams of the video sources other than those video sources to each conference terminal as independent code streams.
  • In some examples, the second processor 101 may also first obtain the video codec capability parameters of each conference terminal in the video conference, and then determine the decoding and display strategy of the video conference according to the video codec capability parameters of each conference terminal and the video codec capability of the server, where the decoding and display strategy is used to indicate the decoding and display mode of each conference terminal for the composite code stream and the at least one independent code stream.
  • Then the second processor 101 sends the decoding and display strategy to each conference terminal.
  • In some examples, before the second processor 101 performs decoding, picture synthesis, and re-encoding on the video code streams of some video sources, it also first determines, according to the video codec capability parameters of each conference terminal and the video codec capability of the server, that the codec capability of this server is lower than the processing requirement for processing the video code streams of all video sources in the video conference into a composite code stream.
  • the composite code stream is formed by video code streams of m video sources, and m is greater than or equal to 2.
  • When the second processor 101 performs decoding, picture synthesis, and re-encoding on the video code streams of some video sources to obtain the composite code stream, it obtains the video code streams of the first m video sources in the order in which the video code streams are received from the multiple video sources, then decodes the video code streams of the m video sources, and then performs picture synthesis on the decoding results corresponding to the m video code streams to obtain a synthesized picture.
  • the second processor 101 encodes the composite picture to obtain a composite code stream in accordance with the encoding method corresponding to the decoding method in the decoding display strategy.
  • the video conference system 11 includes a server 100 and a plurality of conference terminals 90.
  • the server 100 may be an MCU server, and the conference terminal may be implemented in various forms.
  • For example, the conference terminal may include mobile terminals such as mobile phones, tablet computers, notebook computers, personal digital assistants (PDAs), navigation devices, wearable devices, smart bracelets, and pedometers, as well as fixed terminals such as digital TVs and desktop computers.
  • According to the video conference system provided by the embodiments of the present application, during the video conference, after the server receives the video code streams sent by multiple video sources in the video conference, it performs decoding, picture synthesis, and re-encoding only on the video code streams of some of the video sources, forming a composite code stream that it sends to the conference terminals.
  • The server sends the video code streams of the video sources other than those video sources to each conference terminal as independent code streams, so that the conference terminals can decode and display the composite code stream and the at least one independent code stream.
  • In this solution the server does not need to perform decoding, picture synthesis, and re-encoding on the video code streams of all video sources; therefore, the requirement on server-side codec capability is reduced.
  • A computer-readable medium may include a computer storage medium (or non-transitory medium) and a communication medium (or transitory medium).
  • The term computer storage medium includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data).
  • Computer storage media include random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer.
  • In addition, communication media usually contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media. Therefore, this application is not limited to any specific combination of hardware and software.

Abstract

Provided in the embodiments of the present application are a video meeting method, a meeting terminal, a server, and a storage medium. In the process of a video meeting, a server receives the video code streams sent by a plurality of video sources in the video meeting, performs decoding, picture synthesis, and re-encoding on the code streams of only some of the video sources to form a merged code stream, and sends it to the meeting terminals; the server also sends the video code streams of the video sources other than said video sources to each meeting terminal as independent code streams; and the meeting terminal decodes and displays the merged code stream and the at least one independent code stream.

Description

Video conference method, conference terminal, server and storage medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on November 14, 2019 with application number 201911115565.3, the entire content of which is incorporated into this application by reference.
Technical field
This application relates to the technical field of video conferencing, such as a video conference method, a conference terminal, a server, and a storage medium.
Background
In a video conference in a cloud conference system, multiple conference terminals first encode their local video data and send it to the MCU (Multipoint Control Unit) server. After receiving the video code streams from the multiple conference terminals, the MCU server first decodes these video code streams, then performs multi-picture synthesis, and finally encodes the video data corresponding to the multi-picture layout and sends it to every conference terminal participating in the video conference. Each conference terminal receives the video code stream from the MCU server, decodes the data, and then displays the picture.
If many conference terminals participate in a single video conference, the MCU server needs to process the video code streams of multiple video sources at the same time during the conference: it decodes the video code streams received from the multiple video sources, synthesizes the video pictures, and then encodes the synthesized picture. This process requires the MCU server to have high processing capability; once the MCU server's performance is insufficient, the video picture on the conference terminal side is delayed, which affects the conference experience of the users participating in the conference.
Summary of the invention
The video conference method, conference terminal, server, and storage medium provided by the embodiments of the present application avoid the situation in which insufficient MCU server processing performance causes severe video picture delay on the conference terminal side and a poor conference experience for the participating users.
An embodiment of the present application provides a video conference method, applied to a conference terminal, including:
receiving a composite code stream and at least one independent code stream sent by a server, where the composite code stream is formed by the server performing decoding, picture synthesis, and re-encoding on the video code streams of some of the video sources in the video conference, and each independent code stream is a video code stream that the server received from a video source other than those video sources and forwarded to the conference terminal;
decoding the composite code stream and the at least one independent code stream respectively; and
displaying the video pictures corresponding to the composite code stream and the at least one independent code stream.
An embodiment of the present application further provides a video conference method, including:
receiving the video code streams sent by the video sources in a video conference; and
performing decoding, picture synthesis, and re-encoding on the video code streams of some of the video sources to obtain a composite code stream, sending the composite code stream to each conference terminal in the video conference, and forwarding the video code streams of the video sources other than those video sources to each conference terminal in the video conference as independent code streams.
An embodiment of the present application further provides a conference terminal. The conference terminal includes a first processor, a first memory, and a first communication bus;
the first communication bus is configured to implement connection and communication between the first processor and the first memory; and
the first processor is configured to execute at least one program stored in the first memory, so as to implement the first video conference method described above.
An embodiment of the present application further provides a server. The server includes a second processor, a second memory, and a second communication bus;
the second communication bus is configured to implement connection and communication between the second processor and the second memory; and
the second processor is configured to execute at least one program stored in the second memory, so as to implement the second video conference method described above.
An embodiment of the present application further provides a storage medium. The storage medium stores at least one of a first video conference program and a second video conference program. The first video conference program can be executed by at least one processor to implement the first video conference method described above, and the second video conference program can be executed by at least one processor to implement the second video conference method described above.
According to the video conference method, conference terminal, server, and storage medium provided by the embodiments of the present application, during a video conference, after the server receives the video code streams sent by multiple video sources in the video conference, it decodes, composites, and re-encodes the video code streams of only a part of the video sources to form a composite code stream and sends it to each conference terminal in the video conference; at the same time, the server also sends the video code streams of the video sources other than the part of the video sources to each conference terminal in the video conference as independent code streams, and the conference terminal decodes and displays the composite code stream and the at least one independent code stream. In this conference solution, the server does not need to decode, composite, and re-encode the video code streams of all the video sources, which reduces the requirement on the encoding and decoding capability of the server side. The video code streams beyond the processing capability of the server are sent directly to the conference terminals, which makes full use of the processing resources on the conference terminal side, reduces the delay of the video pictures on the conference terminal side, improves the smoothness of the video conference, and enhances the user experience.
Brief Description of the Drawings
FIG. 1 is an interaction flowchart of the video conference method provided in Embodiment 1 of this application;
FIG. 2 is an interaction flowchart of the video conference method provided in Embodiment 2 of this application;
FIG. 3 is a schematic diagram of a conference terminal provided in Embodiment 2 of this application displaying video pictures of multiple video sources;
FIG. 4 is another schematic diagram of a conference terminal provided in Embodiment 2 of this application displaying video pictures of multiple video sources;
FIG. 5 is a flowchart of the server described in Embodiment 2 of this application processing the video code streams of a part of the video sources to form a composite code stream;
FIG. 6 is an interaction flowchart of the video conference method provided in Embodiment 3 of this application;
FIG. 7 is a schematic diagram of a video conference picture layout of a conference terminal provided in Embodiment 3 of this application;
FIG. 8 is another schematic diagram of a video conference picture layout of a conference terminal provided in Embodiment 3 of this application;
FIG. 9 is a schematic diagram of a hardware structure of the conference terminal provided in Embodiment 4 of this application;
FIG. 10 is a schematic diagram of a hardware structure of the server provided in Embodiment 4 of this application;
FIG. 11 is a schematic diagram of the video conference system provided in Embodiment 4 of this application.
Detailed Description of the Embodiments
Embodiment 1:
In the related art, insufficient processing performance of the MCU server makes the MCU server inefficient when decoding, compositing, and re-encoding the video code streams from multiple video sources, which in turn causes severe delay of the video pictures displayed on the multiple conference terminals in a video conference and degrades the conference experience of the participating users. To avoid this situation, this embodiment provides a video conference method; refer to the interaction flowchart of the video conference method shown in FIG. 1.
S102: A server receives video code streams sent by video sources in a video conference.
In this embodiment, the server may be an MCU server. The MCU server is essentially a multimedia information switch that performs multipoint calls and connections, implements functions such as video broadcasting, video selection, audio mixing, and data broadcasting, and completes the convergence and switching of signals from multiple terminals.
It can be understood that after the initiator of the video conference initiates the video conference to the server through its own conference terminal, the server notifies the corresponding conference terminals so that these conference terminals join the video conference. During the video conference, a media capture device on the side of each conference terminal, for example a camera, can capture image information on that conference terminal side to form a video. After the conference terminal encodes the video captured by the media capture device, the video code stream of this conference terminal is formed, and the video code stream is then sent to the server.
It should be noted that, in general, the media capture device on the conference terminal side may include a microphone and the like in addition to the camera, and the microphone is configured to capture audio information of the participating user. The video captured on the conference terminal side therefore contains both image information and audio information.
In a video conference there are usually multiple participating users, so the server receives video code streams sent by multiple conference terminals in the video conference. From the perspective of the server, the conference terminals that provide video code streams are the video sources. In some conference scenarios, some participating users may turn off their cameras, that is, they do not provide image information of their conference terminals during the conference; in this case, such a conference terminal can be regarded as not being a video source.
S104: The server decodes, composites, and re-encodes the video code streams of a part of the video sources to obtain a composite code stream, sends the composite code stream to each conference terminal, and forwards the video code streams of the video sources other than the part of the video sources to each conference terminal as independent code streams.
After receiving the video code streams sent by the video sources, the server may decode, composite, and re-encode only a part of these video code streams; for the video code streams of the remaining video sources, the server does not perform decoding, picture composition, or other processing. For example, assuming a video conference contains four video sources, the server may decode, composite, and re-encode the video code streams of only three of them to form the composite code stream, while the remaining video code stream stays independent. Of course, in other examples, the server may decode, composite, and re-encode the video code streams of only two of the video sources, leaving the remaining two video code streams independent.
A video code stream that has not undergone picture composition at the server contains only the video picture of a single conference terminal side, whereas the composite code stream has undergone picture composition and contains the video pictures of at least two conference terminal sides. Relative to the composite code stream, a video code stream that is not decoded, composited, and re-encoded by the server is therefore referred to here as an "independent code stream".
After processing the received video code streams to obtain the composite code stream, the server can send the composite code stream to each conference terminal so that each conference terminal displays the video picture corresponding to the composite code stream. On the other hand, the server also needs to send the independent code streams to each conference terminal so that each conference terminal displays the video pictures corresponding to the independent code streams. Based on the composite code stream and the independent code streams, a conference terminal can display the image information of multiple video sources. It is worth noting that there is no strict timing relationship between the server sending the composite code stream and sending the independent code streams: in some scenarios the server sends the composite code stream first and then the independent code streams; in other scenarios the independent code streams are transmitted before the composite code stream; and in some examples the composite code stream and the independent code streams are transmitted to the conference terminal side at the same time. In fact, to reduce the picture delay on the conference terminal side and allow the conference terminal to process the video code streams in time and display the pictures as early as possible, the server can transmit a video code stream as soon as it is ready for transmission to the conference terminal side, without waiting for other video code streams.
For example, in one example, while the server is decoding, compositing, and re-encoding the received video code streams a1, c1, and d1, the server also receives the video code stream b1. Because the server does not need to perform additional processing on b1, it can directly send b1 to each conference terminal side as an independent code stream; then, after the composite code stream has been generated, the composite code stream is transmitted to the conference terminals. As another example, five parties a2, b2, c2, d2, and e2 hold a video conference, and it is configured that the server decodes, composites, and re-encodes the video code streams of the three parties b2, c2, and d2, while the video code streams of a2 and e2 are kept as independent code streams. If the first video code stream received by the server is that of a2, the server can directly send this video code stream to each conference terminal side as an independent code stream. Subsequently, the server receives the video code streams of b2, c2, e2, and d2 in turn: after receiving the video code streams of b2 and c2, the server may first decode these two streams; after receiving the video code stream of e2, it forwards this stream to the conference terminals; and after receiving the video code stream of d2, the server decodes it, composites the decoding result with the decoding results of the video code streams of b2 and c2, re-encodes the composite picture to obtain the composite code stream, and sends the composite code stream to the conference terminal side.
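By way of a non-limiting illustration only (this sketch is not part of the original disclosure), the forwarding and compositing split described above could look roughly as follows; the Server class, the decode/compose/encode placeholders, and the stream labels are assumptions introduced for this example.

```python
# Minimal sketch of the server-side split: streams selected for the composite
# picture are decoded and held until all of them have arrived, while the other
# streams are forwarded to every terminal as soon as they are received.
# decode/compose/encode are placeholders standing in for a real codec library.

def decode(bitstream):
    return {"frame_of": bitstream}            # placeholder "decoded frame"

def compose(frames):
    return {"composite_of": sorted(frames)}   # placeholder composite picture

def encode(picture):
    return f"encoded({picture})"              # placeholder re-encoded stream

class Server:
    def __init__(self, composited_sources, terminals):
        self.composited_sources = set(composited_sources)
        self.terminals = terminals
        self.decoded = {}

    def on_stream(self, source_id, bitstream):
        if source_id in self.composited_sources:
            self.decoded[source_id] = decode(bitstream)
            if set(self.decoded) == self.composited_sources:
                # All selected sources have arrived: composite, re-encode, send.
                composite = encode(compose(self.decoded))
                self.send_to_all("composite", composite)
        else:
            # Independent streams are forwarded immediately, untouched.
            self.send_to_all(source_id, bitstream)

    def send_to_all(self, label, payload):
        for terminal in self.terminals:
            terminal.append((label, payload))

# Usage matching the example above: b2/c2/d2 are composited, a2/e2 stay independent.
terminals = [[], []]
server = Server({"b2", "c2", "d2"}, terminals)
for src in ["a2", "b2", "c2", "e2", "d2"]:
    server.on_stream(src, f"stream_{src}")
```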
S106: The conference terminals decode the composite code stream and the at least one independent code stream respectively.
After receiving the video code streams sent by the server, a conference terminal can decode the received video code streams. It can be understood that a conference terminal in the related art only needs to decode one kind of video code stream, namely the composite code stream containing the video pictures of all the video sources, so its decoding method only has to correspond to the encoding method on the server side and is fixed and unique. In this embodiment, however, the conference terminal needs to decode at least two video code streams (one composite code stream and at least one independent code stream), and the independent code streams and the composite code stream are encoded by different entities: the composite code stream is encoded by the server, while each independent code stream is encoded by its corresponding conference terminal. In some examples, the conference terminals and the server use the same encoding method, so a conference terminal can use the same decoding method when decoding the independent code streams and the composite code stream received from the server; that is, the conference terminal does not need to distinguish between different video code streams and use different decoding methods.
In more cases, however, the encoding methods of the conference terminals and the server may differ, and the encoding methods used by different conference terminals may not even be exactly the same as one another. In these cases, when a conference terminal decodes the video code streams it receives, it needs to use different decoding methods for different video code streams.
In some examples of this embodiment, the server and the multiple conference terminals may agree in advance on the encoding and decoding method for each video code stream. In this way, when the server transmits a video code stream to a conference terminal, it only needs to carry a piece of identification information in that video code stream to indicate to the conference terminal which video code stream it is. For example, the server and the conference terminals agree that the composite code stream is decoded with decoding method 1 and the independent code streams are decoded with decoding method 2. Then, when the server sends the composite code stream to a conference terminal, it only needs to carry the corresponding identification information in that video code stream; after receiving the video code stream, the conference terminal can look up, according to the carried identification information, that this video code stream needs to be decoded with decoding method 1.
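As a minimal sketch of such a pre-agreed mapping (the label values and decoder names below are illustrative assumptions, not taken from the embodiments), the terminal-side lookup could be written as:

```python
# Sketch of label-based decoder selection: the server and the terminals agree
# in advance on which decoding method each label maps to, so the terminal only
# needs to read the label carried in the received stream.

DECODER_BY_LABEL = {
    "1": "decoder_method_1",   # e.g. the composite stream encoded by the server
    "2": "decoder_method_2",   # e.g. an independent stream from one terminal
    "3": "decoder_method_3",   # e.g. an independent stream from another terminal
}

def decode_received_stream(label, bitstream):
    decoder = DECODER_BY_LABEL[label]
    # A real implementation would invoke the corresponding codec here.
    return f"{decoder}({bitstream})"

print(decode_received_stream("1", "composite_bitstream"))
```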
S108: The conference terminals display the video pictures corresponding to the composite code stream and the at least one independent code stream.
After finishing decoding a received video code stream, a conference terminal can display the corresponding video picture on its screen. In some examples of this embodiment, the conference terminal can display the video picture corresponding to a video code stream immediately after that stream has been decoded, without waiting for the decoding of both the independent code streams and the composite code stream to be completed.
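A minimal sketch of this per-stream decode-and-display behaviour, with function and label names that are purely assumptions of this example, might look as follows:

```python
# Sketch only: each received stream is decoded and rendered right away,
# without waiting for the other streams of the same conference.

def decode(bitstream):
    return f"picture_of_{bitstream}"

def render(region, picture):
    print(f"region {region}: {picture}")

def on_stream_received(label, bitstream, region_by_label):
    # Decode and display this stream immediately; do not wait for other streams.
    render(region_by_label.get(label, "default"), decode(bitstream))

regions = {"composite": "area-1", "far5": "area-2", "near": "area-3"}
for label, data in [("composite", "mix_bits"), ("far5", "f5_bits"), ("near", "n_bits")]:
    on_stream_received(label, data, regions)
```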
In the video conference method provided by this embodiment, the server may decode, composite, and re-encode the video code streams of only a part of the video sources instead of performing this processing on the video code streams of all the video sources; the video code streams it does not process can be sent directly to each conference terminal for the conference terminal to decode and display. This makes full use of the processing resources on the conference terminal side, reduces the processing burden of the server, reduces the delay of the video pictures, and improves the quality of the video conference.
Embodiment 2:
As described in the foregoing embodiment, in the embodiments of the present application the server may decode, composite, and re-encode the video code streams of only a part of the video sources to form a composite code stream and then send the composite code stream to each conference terminal; the video code streams other than those of the part of the video sources are sent by the server directly to each conference terminal.
It can be understood that how many video code streams are included in the composite code stream generated by the server, and the encoding and decoding methods of the composite code stream and the independent code streams, can all be set in advance; even which video sources' video code streams are included in the composite code stream generated by the server can be set in advance. For example, in some examples the server and the conference terminals belong to the same manufacturer, and designers can fix these settings in the conference terminals and the server before the devices leave the factory; alternatively, programmers can write these settings into an upgrade program by way of a device upgrade and push them to the server side and the conference terminal side respectively.
Of course, in some examples of this embodiment, the above can also be decided by the server itself according to the situation of the current video conference. This video conference method is described below with reference to the flowchart shown in FIG. 2.
S202: A conference terminal sends the video encoding and decoding capability parameters of this conference terminal to the server.
In this embodiment, after a conference terminal enters the video conference, it can send its own video encoding and decoding capability parameters to the server. It can be understood that the conference terminal may report the video encoding and decoding capability parameters to the server on its own initiative, or may send them to the server after receiving a request from the server.
The video encoding and decoding capability parameters can characterize the conference terminal's own video encoding and decoding capability. In an example of this embodiment, the video encoding and decoding capability parameters include encoding parameters and decoding parameters; the encoding parameters include the encoding capability of this conference terminal, and the decoding parameters include at least one of the decoding capability of this conference terminal, the speed of joining the conference, the frame rate, and format information.
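A possible, purely illustrative representation of such capability parameters (the field names and values are assumptions, not a defined message format) is sketched below:

```python
# Sketch of the capability parameters a terminal might report in S202, based on
# the fields listed above (encoding capability; decoding capability, speed of
# joining the conference, frame rate, format information).

from dataclasses import dataclass

@dataclass
class VideoCodecCapability:
    encode_capability: str   # e.g. "H.264 up to 1080p30"
    decode_capability: str   # e.g. "H.264/H.265, up to 4 streams of 720p"
    join_speed: str          # how quickly the terminal can join the conference
    frame_rate: int          # maximum supported frame rate, in fps
    formats: tuple           # picture formats the terminal can handle

capability = VideoCodecCapability(
    encode_capability="H.264@1080p30",
    decode_capability="H.264@720p x4",
    join_speed="fast",
    frame_rate=30,
    formats=("720p", "1080p"),
)
```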
S204: The server determines a decoding and display strategy for the video conference according to the video encoding and decoding capability parameters of each conference terminal and the video encoding and decoding capability of the server itself.
After obtaining the video encoding and decoding capability parameters of each conference terminal, the server can determine the decoding and display strategy for this video conference based on the encoding and decoding capabilities of these conference terminals and the server's own video encoding and decoding capability. Based on the number of conference terminals in this video conference, the way each conference terminal encodes its local video, and so on, the server can determine the processing requirement for processing the video code streams of all the video sources in this video conference into a composite code stream. Exemplarily, the server can judge whether its own video encoding and decoding capability meets this processing requirement. If the server judges that its own capability meets the requirement, the server can decode, composite, and re-encode the video code streams of all the video sources, that is, it can process the video code streams of all the video sources into a composite code stream. However, if the server determines that its video encoding and decoding capability is lower than the processing requirement for processing the video code streams of all the video sources in the video conference into a composite code stream, that is, the processing capability of the server is insufficient to decode, composite, and re-encode the video code streams of all the video sources, the server will determine that in the subsequent course of the video conference it will process only a part of the video code streams into the composite code stream. How many and which video code streams are processed, and the encoding method of the resulting composite code stream, need to be determined in combination with the video encoding and decoding capability of each conference terminal. What the server needs to ensure is that the encoding methods used for the video code streams it sends to each conference terminal, including the composite code stream and the at least one independent code stream, are all supported by the conference terminal; otherwise, if a conference terminal cannot decode a video code stream it receives, that conference terminal will be unable to display the video picture of at least one conference terminal side.
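By way of illustration only, the capability check could be approximated with a simplified cost model as sketched below; the cost figures and the rule of keeping at least one stream independent are assumptions of this sketch, not requirements of the embodiments:

```python
# Sketch of the decision described above: the server estimates the cost of
# compositing all n sources and, if that exceeds its own capability, composites
# only as many sources as it can handle, leaving the rest as independent streams.

def plan_composite(n_sources, per_stream_cost, server_capacity):
    total_cost = n_sources * per_stream_cost
    if total_cost <= server_capacity:
        return n_sources              # composite everything, as in the related art
    # Otherwise composite only the part the server can afford (at least two
    # streams are needed for a composite picture to make sense).
    m = max(2, server_capacity // per_stream_cost)
    return min(m, n_sources - 1)      # keep at least one stream independent

print(plan_composite(n_sources=6, per_stream_cost=10, server_capacity=45))  # -> 4
```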
In some examples of this embodiment, the decoding and display strategy determined by the server includes a decoding indication, and the decoding indication is used to indicate how a conference terminal decodes the multiple video code streams it receives, that is, the composite code stream and the at least one independent code stream. For example, the server instructs the conference terminals to decode a video code stream carrying the identification information "1" with decoding method 1, to decode a video code stream carrying the identification information "2" with decoding method 2, and to decode a video code stream carrying the identification information "3" with decoding method 3. Then, when the server sends a video code stream to each conference terminal in the subsequent process, it only needs to carry the corresponding identification information, and the conference terminal can determine the decoding method for this video code stream in combination with the decoding and display strategy.
In some other examples of this embodiment, the decoding and display strategy further includes a display indication, and the display indication is used to inform the conference terminals of the mapping relationship between each of the composite code stream and the at least one independent code stream sent by the server and a display area. In this way, after a conference terminal receives a video code stream and decodes it, it can determine, based on the display indication in the decoding and display strategy, in which area of the screen the corresponding video picture should be displayed.
In some examples, the display indication is not a mandatory part of the decoding and display strategy, because in these examples the conference terminal can set up a corresponding number of display areas on its display screen according to the number of video code streams it will receive. For example, assuming that after negotiating with the server the conference terminal determines that the number of video code streams it will receive in this conference is k, the conference terminal side can set up k display areas; whenever one video code stream has been received and decoded, the conference terminal can randomly select, from the display areas not yet filled with a picture, one area in which to display the video picture of that video code stream. However, since the video picture corresponding to the composite code stream contains the video pictures of at least two conference terminal sides at the same time, if it cannot be guaranteed that the composite code stream is displayed in a display area with a relatively large area, the persons and other content in that video picture will be very small and the user will struggle to see the details. As shown in FIG. 3, there are three video sources a3, b3, and c3 in a video conference; the server processes the video code streams of a3 and b3 to form the composite code stream, while c3 remains an independent code stream. Under the display scheme just described, each conference terminal side only needs to set up two display areas, with the composite code stream occupying one display area and the independent code stream occupying the other. In this way, the video pictures corresponding to a3 and b3 have to share one display area, which makes the video pictures of a3 and b3 only half the size of the video picture of c3. This not only makes it difficult for the user to see the pictures of a3 and b3 clearly, but also does not conform to the user's video conferencing habits.
In more examples of this embodiment, the decoding and display strategy includes the mapping relationship between the multiple video code streams and multiple display areas. In this way, the server can guarantee the display effect of the video pictures of the multiple video sources on the conference terminal side, for example, guarantee that each video source is displayed in a display area of the same size, or guarantee that the video pictures of the multiple video sources can be spliced and displayed within one overall area, as shown in FIG. 4. Six conference terminals a4, b4, c4, d4, e4, and f4 participate in the video conference, and each conference terminal has turned on its own camera, so there are six video sources in total. The server processes the video code streams of a4, b4, c4, and d4 into the composite code stream, and the video code streams of the other two video sources serve as independent code streams. The server specifies for each conference terminal the display area corresponding to each video code stream: the first area 401 is used to display the video picture corresponding to the composite code stream, with the arrangement of the sub-pictures inside that video picture decided by the server; the second area 402 is used to display the video picture corresponding to e4; and the third area 403 is used to display the video picture corresponding to f4. Through this spliced display, the pictures of the six video sources are displayed together in one area instead of being scattered across the screen, and in FIG. 4 the video pictures of the multiple video sources have the same size, which conforms to the user's viewing habits for video.
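A minimal sketch of such a stream-to-region mapping for the FIG. 4 example (the region identifiers are illustrative assumptions) could be:

```python
# Sketch of the display mapping carried by the decoding and display strategy:
# the composite stream is assigned the large first region, and the two
# independent streams are assigned the second and third regions.

DISPLAY_MAPPING = {
    "composite(a4,b4,c4,d4)": "region-401",  # composite picture area
    "e4": "region-402",                      # first independent stream
    "f4": "region-403",                      # second independent stream
}

def region_for(stream_label):
    return DISPLAY_MAPPING[stream_label]

print(region_for("e4"))  # -> region-402
```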
S206: The server sends the decoding and display strategy to each conference terminal.
After determining the decoding and display strategy, the server can send the decoding and display strategy to each conference terminal, so that each conference terminal knows how this video conference will be implemented.
S208: The server decodes and composites the video code streams of m of the n video sources in the video conference, and re-encodes the composite picture according to the encoding method corresponding to the decoding method in the decoding and display strategy to form the composite code stream.
In this embodiment, it is assumed that there are n video sources in total and that, during the negotiation with the conference terminals, the server decides that it will process the video code streams of m of them to form the composite code stream, while the remaining n-m video code streams will be sent to each conference terminal as independent code streams, where m is less than n.
Therefore, after receiving the video code streams used to form the composite code stream, the server can decode, composite, and re-encode these video code streams to form the composite code stream. In some examples of this embodiment, the server and the multiple conference terminals determine in the negotiation stage (that is, the stage in which the decoding and display strategy is determined) which video sources' video code streams form the composite code stream, in which case the server must wait until the video code streams of all these video sources have been received before generating the composite code stream. In some other examples of this embodiment, however, the decoding and display strategy does not specify which video sources' video code streams constitute the composite code stream, so the server can decide on the fly, according to the actual situation of the video conference, which video sources' video code streams are composited together. For example, in some examples of this embodiment, the server can select the first m video code streams, in the order in which it receives video code streams from the multiple video sources, to form the composite code stream. Refer to the flowchart shown in FIG. 5 of the server decoding, compositing, and re-encoding the video code streams of a part of the video sources to obtain the composite code stream.
S502: Obtain the video code streams of the first m video sources according to the order in which the video code streams are received from the multiple video sources.
S504: Decode the video code streams of the m video sources.
S506: Perform picture composition on the decoding results corresponding to the m video code streams to obtain a composite picture.
S508: Encode the composite picture according to the encoding method corresponding to the decoding method in the decoding and display strategy to obtain the composite code stream.
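For illustration only, the four steps above can be sketched as a small pipeline; the decode/compose/encode helpers are placeholders standing in for a real codec, and the codec name is an assumption:

```python
# Sketch of S502-S508: the first m streams, in arrival order, are decoded,
# their pictures composited, and the composite picture re-encoded with the
# encoding method that matches the negotiated decoding method.

def decode(bitstream):
    return f"frame({bitstream})"

def compose(frames):
    return " | ".join(frames)

def encode(picture, codec):
    return f"{codec}:{picture}"

def build_composite(arrived_streams, m, negotiated_codec):
    selected = arrived_streams[:m]             # S502: first m by arrival order
    frames = [decode(s) for s in selected]     # S504: decode the m streams
    picture = compose(frames)                  # S506: composite the pictures
    return encode(picture, negotiated_codec)   # S508: re-encode the composite

print(build_composite(["b2", "c2", "d2", "e2"], m=3, negotiated_codec="H.264"))
```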
S210: The server sends the composite code stream and the remaining n-m independent code streams received from the video sources to each conference terminal.
The server can send the composite code stream generated by its own processing to each conference terminal; on the other hand, the server can also send the video code streams it has received that serve as independent code streams to each conference terminal. The sending order of these video code streams has already been described in detail in the foregoing embodiment and will not be repeated here.
S212: The conference terminals decode and display the video code streams according to the decoding and display strategy.
After receiving the video code streams sent by the server, a conference terminal can decode the composite code stream and the at least one independent code stream according to the decoding methods indicated by the decoding indication in the decoding and display strategy. Subsequently, the conference terminal fills the video pictures of the composite code stream and the at least one independent code stream into the corresponding display areas for display according to the display indication in the decoding and display strategy.
In the video conference method provided by this embodiment, before the video conference formally starts, the server can obtain the video encoding and decoding capability parameters of each conference terminal and then, based on the server's own video encoding and decoding capability and the video encoding and decoding capabilities of the multiple conference terminals, determine whether it can process the video code streams of all the video sources in the video conference to form the composite code stream. If the server determines that it can process the video code streams of all the video sources in the video conference to form the composite code stream, the server can still process the video code streams of the multiple video sources in the subsequent process according to the video conference solution provided in the related art. However, if the server determines that its video encoding and decoding capability is lower than the processing requirement for processing the video code streams of all the video sources in this video conference into a composite code stream, it can process only the video code streams of a part of the video sources and at the same time make full use of the decoding resources on the side of each conference terminal, thereby reducing the delay of the video pictures on the side of each conference terminal and improving the conference experience of the users participating in the video conference.
Embodiment 3:
To give those skilled in the art a clearer understanding of the features and details of the video conference method provided by the embodiments of the present application, this embodiment describes the video conference method in detail with reference to an example; see FIG. 6.
S602: A conference initiating end creates a conference on an MCU management platform.
It should be understood that the conference initiating end is actually also a conference terminal in the video conference, and this conference terminal is usually held by the conference host. The conference initiating end creating a conference on the MCU management platform is equivalent to opening a network "conference room" on the management platform.
S604: The MCU notifies the conference terminals that need to participate in the conference to enter the video conference.
S606: The conference terminals report their own video encoding and decoding capability parameters to the MCU.
S608: The MCU determines its own optimal decoding capability and the optimal decoding capabilities of the conference terminals, and notifies the conference terminals.
When the conference starts, the MCU selects a picture layout by default and tells the conference terminals the picture layout. The control message sent by the MCU to a conference terminal contains the number of sub-pictures in the multi-picture layout, the multi-picture layout mode, and the content of each sub-picture (such as the main video source, the auxiliary video source, the first main video decoding, the second main video decoding, the first auxiliary video decoding, and so on).
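A purely illustrative sketch of such a control message (the field names are assumptions; the actual message format is not specified here) could be:

```python
# Sketch of the layout control message described above: number of sub-pictures,
# layout mode, and the content assigned to each sub-picture.

layout_control_message = {
    "picture_count": 6,
    "layout_mode": "grid-3x2",
    "sub_pictures": [
        {"position": 1, "content": "main video source"},
        {"position": 2, "content": "auxiliary video source"},
        {"position": 3, "content": "first main video decoding"},
        {"position": 4, "content": "second main video decoding"},
        {"position": 5, "content": "first auxiliary video decoding"},
        {"position": 6, "content": "local (Near) video"},
    ],
}
```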
S610: The conference terminals encode their local video according to the negotiated encoding method and send it to the MCU.
S612: The MCU processes the received video code streams and sends them to each conference terminal.
If the layout is a single picture, or a multi-picture layout (two pictures, three pictures) that the decoding capability of the MCU can fully satisfy, then after receiving the video code streams the MCU decodes the video code streams, composites the pictures, re-encodes them, and sends the result to the conference terminals. As shown in FIG. 7, there are only three far-end (Far) video code streams and one local (Near) video code stream, and the decoding capability of the MCU is entirely sufficient; the MCU can therefore be responsible for decoding, compositing, and re-encoding all the video code streams to form the composite code stream and then send it to each conference terminal. In this case, the video code streams sent out by the server consist of the composite code stream only and contain no independent code streams.
If the picture layout is a multi-picture layout that exceeds the decoding capability of the MCU, the MCU selects, according to the negotiated decoding capability and by default in the order in which it receives the code streams, a part of the video code streams to form the composite code stream: the MCU first decodes these video code streams, composites the pictures after decoding is completed, then encodes the composite picture data to form the composite code stream and sends it to each conference terminal. The remaining video code streams are sent by the MCU directly to each conference terminal with identification labels attached. FIG. 8 shows a multi-picture layout video conference that includes six video sources; after negotiation, the MCU processes the video code streams of four of the video sources, and each conference terminal processes the video of the remaining two video sources. The video code streams of Far1, Far2, Far3, and Far4 are decoded by the MCU, composited, and re-encoded to form the composite code stream, which is then sent to the conference terminals; after receiving the two code streams Far5 and Near, the MCU attaches the corresponding identification labels to them and then sends them to each conference terminal.
S614: The conference terminals decode and display the received video code streams.
A conference terminal decodes the composite code stream sent by the MCU after receiving it and displays it in the corresponding area; on the other hand, the conference terminal also decodes the independent code streams after receiving them and then displays them in the corresponding areas according to the corresponding identification labels.
In the multi-picture mode scenario, according to the code stream information, the conference terminal fills the video picture corresponding to the composite code stream into the designated area and fills the video pictures corresponding to the other independent code streams into the designated areas of the multi-picture layout according to the corresponding labels.
Embodiment 4:
This embodiment provides a storage medium. The storage medium can store at least one computer program that can be read, compiled, and executed by at least one processor. In this embodiment, the storage medium can store at least one of a first video conference program and a second video conference program, where the first video conference program can be executed by at least one processor to implement the conference-terminal-side procedure of any of the video conference methods introduced in the foregoing embodiments, and the second video conference program can be executed by at least one processor to implement the server-side procedure of any of the video conference methods introduced in the foregoing embodiments.
This embodiment also provides a conference terminal, as shown in FIG. 9. The conference terminal 90 includes a first processor 91, a first memory 92, and a first communication bus 93 configured to connect the first processor 91 and the first memory 92, where the first memory 92 may be the aforementioned storage medium storing the first video conference program, and the first processor 91 can read the first video conference program, compile it, and execute it to implement the conference-terminal-side procedure of the video conference methods introduced in the foregoing embodiments.
The first processor 91 receives the composite code stream and at least one independent code stream sent by the server, where the composite code stream is formed by the server by decoding, compositing, and re-encoding the video code streams of a part of the video sources in the video conference, and the at least one independent code stream is a video code stream received by the server from a video source other than the part of the video sources and forwarded to this conference terminal. Subsequently, the first processor 91 decodes the composite code stream and the at least one independent code stream respectively, and displays the video pictures corresponding to the composite code stream and the at least one independent code stream.
In an example of this embodiment, before receiving the composite code stream and the at least one independent code stream sent by the server, the first processor 91 also first sends the video encoding and decoding capability parameters of this conference terminal to the server and receives the decoding and display strategy sent by the server, where the decoding and display strategy is determined by the server according to the server's own video encoding and decoding capability and the video encoding and decoding capability parameters of each conference terminal in the video conference, and the decoding and display strategy is used to indicate how this conference terminal decodes and displays the composite code stream and the at least one independent code stream.
In an example of this embodiment, the video encoding and decoding capability parameters sent to the server include encoding parameters and decoding parameters; the encoding parameters include the encoding capability of this conference terminal, and the decoding parameters include at least one of the decoding capability of this conference terminal, the speed of joining the conference, the frame rate, and format information.
Optionally, the decoding and display strategy includes a decoding indication and a display indication, where the decoding indication is used to indicate how this conference terminal decodes the composite code stream and the at least one independent code stream, and the display indication is used to indicate the mapping relationship between each of the composite code stream and the at least one independent code stream sent by the server and a display area. When decoding the composite code stream and the at least one independent code stream respectively, the first processor 91 can decode the composite code stream and the at least one independent code stream according to the decoding methods indicated by the decoding indication. When displaying the video pictures corresponding to the composite code stream and the at least one independent code stream, the first processor 91 fills the video pictures of the composite code stream and the at least one independent code stream into the corresponding display areas for display according to the display indication.
This embodiment also provides a server, as shown in FIG. 10. The server 100 includes a second processor 101, a second memory 102, and a second communication bus 103 configured to connect the second processor 101 and the second memory 102, where the second memory 102 may be the aforementioned storage medium storing the second video conference program, and the second processor 101 can read the second video conference program, compile it, and execute it to implement the server-side procedure of the video conference methods introduced in the foregoing embodiments.
The second processor 101 receives the video code streams sent by the video sources in the video conference, then decodes, composites, and re-encodes the video code streams of a part of the video sources to obtain the composite code stream and sends it to each conference terminal, and forwards the video code streams of the video sources other than the part of the video sources to each conference terminal as independent code streams.
Optionally, before receiving the video code streams sent by the video sources in the video conference, the second processor 101 can also first obtain the video encoding and decoding capability parameters of each conference terminal in the video conference, and then determine the decoding and display strategy for the video conference according to the video encoding and decoding capability parameters of each conference terminal and the video encoding and decoding capability of this server, where the decoding and display strategy is used to indicate how each conference terminal decodes and displays the composite code stream and the at least one independent code stream. Subsequently, the second processor 101 sends the decoding and display strategy to each conference terminal.
In an example of this embodiment, before decoding, compositing, and re-encoding the video code streams of the part of the video sources, the second processor 101 also first determines, according to the video encoding and decoding capability parameters of each conference terminal and the video encoding and decoding capability of this server, that the encoding and decoding capability of this server is lower than the processing requirement for processing the video code streams of all the video sources in the video conference into a composite code stream.
In an example of this embodiment, the composite code stream is formed from the video code streams of m video sources, where m is greater than or equal to 2. When decoding, compositing, and re-encoding the video code streams of the part of the video sources to obtain the composite code stream, the second processor 101 obtains the video code streams of the first m video sources according to the order in which the video code streams are received from the multiple video sources, then decodes the video code streams of the m video sources, and performs picture composition on the decoding results corresponding to the m video code streams to obtain a composite picture. Subsequently, the second processor 101 encodes the composite picture according to the encoding method corresponding to the decoding method in the decoding and display strategy to obtain the composite code stream.
本实施例还提供一种视频会议系统,请参见图11所示,在该视频会议系统11当中,包括服务器100以及多个会议终端90。服务器100可以为MCU服务器,会议终端可以以各种形式来实施。例如,可以包括诸如手机、平板电脑、笔记本电脑、掌上电脑、PDA(Personal Digital Assistant,个人数字助理)、导航装置、可穿戴设备、智能手环、计步器等移动终端,以及诸如数字TV、台式计算机等固定终端。This embodiment also provides a video conference system. As shown in FIG. 11, the video conference system 11 includes a server 100 and a plurality of conference terminals 90. The server 100 may be an MCU server, and the conference terminal may be implemented in various forms. For example, it may include mobile terminals such as mobile phones, tablet computers, notebook computers, PDAs, PDAs (Personal Digital Assistants), navigation devices, wearable devices, smart bracelets, pedometers, etc., as well as mobile terminals such as digital TV, Fixed terminals such as desktop computers.
本实施例提供的会议终端、服务器,在视频会议过程中,服务器接收到视频会议中多个视频源发送的视频码流后,仅对部分视频源的视频码流进行解码、画面合成及重新编码,形成合成码流并发送给会议终端,同时服务器将除所述部分视频源之外的视频源的视频码流作为独立码流也发送给每个会议终端,让会议终端对合成码流与至少一个独立码流进行解码显示,这种会议方案中服务器因为不需要对全部视频源的视频码流进行解码、画面合成以及重新编码,因此,降低了对服务器侧编解码能力的要求。对于服务器处理能力范围之外的其他视频码流,直接发送给会议终端,从而充分利用了会议终端侧的处理资源,降低了会议终端侧视频画面的延时,提升了视频会议的流畅度,增强了用户体验。In the conference terminal and server provided in this embodiment, during the video conference, after the server receives the video code streams sent by multiple video sources in the video conference, it only decodes, synthesizes and re-encodes the video code streams of some video sources , Forming a composite code stream and sending it to the conference terminal. At the same time, the server sends the video code stream of the video source other than the part of the video source as an independent code stream to each conference terminal, so that the conference terminal can compare the composite code stream with at least An independent code stream is decoded and displayed. In this kind of conference solution, the server does not need to decode, synthesize and re-encode the video code streams of all video sources. Therefore, the requirement on the server-side codec capability is reduced. For other video streams outside the processing capacity of the server, they are directly sent to the conference terminal, thereby making full use of the processing resources on the conference terminal side, reducing the delay of the video screen on the conference terminal side, improving the smoothness of the video conference, and enhancing Improve the user experience.
Those skilled in the art will appreciate that all or some of the steps of the methods disclosed above, and the functional modules/units in the systems and apparatuses, may be implemented as software (which can be realized as program code executable by a computing device), firmware, hardware, or an appropriate combination thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, a digital signal processor, or a microprocessor, or as hardware, or as an integrated circuit such as an application-specific integrated circuit. Such software may be distributed on computer-readable media and executed by a computing device, and in some cases the steps shown or described may be performed in an order different from that described here. Computer-readable media may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include Random Access Memory (RAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other memory technology, Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media. Therefore, this application is not limited to any particular combination of hardware and software.

Claims (11)

  1. A video conference method, applied to a conference terminal, the method comprising:
    receiving a composite code stream and at least one independent code stream sent by a server, wherein the composite code stream is formed by the server decoding, compositing, and re-encoding the video code streams of some of the video sources in a video conference, and the at least one independent code stream is a video code stream that the server receives from a video source other than those video sources and forwards to the conference terminal;
    decoding the composite code stream and the at least one independent code stream respectively; and
    displaying the video pictures corresponding to the composite code stream and the at least one independent code stream.
  2. The method according to claim 1, wherein before the receiving the composite code stream and the at least one independent code stream sent by the server, the method further comprises:
    sending the video codec capability parameters of the conference terminal to the server; and
    receiving a decoding display strategy sent by the server, wherein the decoding display strategy is determined by the server according to the server's own video codec capability and the video codec capability parameters of each conference terminal in the video conference, and the decoding display strategy is used to instruct the conference terminal how to decode and display the composite code stream and the at least one independent code stream.
  3. The method according to claim 2, wherein the video codec capability parameters comprise encoding parameters and decoding parameters, the encoding parameters comprise the encoding capability of the conference terminal, and the decoding parameters comprise at least one of the decoding capability of the conference terminal, the conference access speed, the frame rate, and format information.
  4. The method according to claim 2, wherein the decoding display strategy comprises a decoding instruction and a display instruction, the decoding instruction is used to instruct the conference terminal how to decode the composite code stream and the at least one independent code stream, and the display instruction is used to indicate a mapping relationship between each of the composite code stream and the at least one independent code stream sent by the server and a display area;
    the decoding the composite code stream and the at least one independent code stream respectively comprises:
    decoding the composite code stream and the at least one independent code stream in the decoding manner indicated by the decoding instruction; and
    the displaying the video pictures corresponding to the composite code stream and the at least one independent code stream comprises:
    filling the video pictures of the composite code stream and the at least one independent code stream into the corresponding display areas for display according to the display instruction.
  5. A video conference method, applied to a server, the method comprising:
    receiving video code streams sent by video sources in a video conference; and
    decoding, compositing, and re-encoding the video code streams of some of the video sources to obtain a composite code stream, sending the composite code stream to each conference terminal in the video conference, and forwarding the video code streams of the video sources other than those video sources to each conference terminal in the video conference as independent code streams.
  6. The method according to claim 5, wherein before the receiving the video code streams sent by the video sources in the video conference, the method further comprises:
    acquiring the video codec capability parameters of each conference terminal in the video conference;
    determining a decoding display strategy of the video conference according to the video codec capability parameters of each conference terminal and the video codec capability of the server, wherein the decoding display strategy is used to instruct each conference terminal how to decode and display the composite code stream and the at least one independent code stream; and
    sending the decoding display strategy to each conference terminal.
  7. The method according to claim 6, wherein before the decoding, compositing, and re-encoding the video code streams of some of the video sources, the method further comprises:
    determining, according to the video codec capability parameters of each conference terminal and the video codec capability of the server, that the codec capability of the server itself is below the processing requirement for processing the video code streams of all video sources in the video conference into a composite code stream.
  8. The method according to either claim 6 or claim 7, wherein the composite code stream is formed from the video code streams of m video sources, m being greater than or equal to 2, and the decoding, compositing, and re-encoding the video code streams of some of the video sources to obtain the composite code stream comprises:
    acquiring the video code streams of the first m video sources in the order in which the video code streams are received from the multiple video sources;
    decoding the video code streams of the m video sources;
    compositing the decoded results corresponding to the m video code streams to obtain a composite picture; and
    encoding the composite picture in the encoding manner corresponding to the decoding manner in the decoding display strategy to obtain the composite code stream.
  9. A conference terminal, comprising a first processor, a first memory, and a first communication bus;
    wherein the first communication bus is configured to implement connection and communication between the first processor and the first memory; and
    the first processor is configured to execute at least one program stored in the first memory to implement the video conference method according to any one of claims 1 to 4.
  10. A server, comprising a second processor, a second memory, and a second communication bus;
    wherein the second communication bus is configured to implement connection and communication between the second processor and the second memory; and
    the second processor is configured to execute at least one program stored in the second memory to implement the video conference method according to any one of claims 5 to 8.
  11. A storage medium storing at least one of a first video conference program and a second video conference program, wherein the first video conference program is executable by at least one processor to implement the video conference method according to any one of claims 1 to 4, and the second video conference program is executable by at least one processor to implement the video conference method according to any one of claims 5 to 8.
PCT/CN2020/129049 2019-11-14 2020-11-16 Video meeting method, meeting terminal, server, and storage medium WO2021093882A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911115565.3A CN112804471A (en) 2019-11-14 2019-11-14 Video conference method, conference terminal, server and storage medium
CN201911115565.3 2019-11-14

Publications (1)

Publication Number Publication Date
WO2021093882A1 (en)

Family

ID=75803917

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/129049 WO2021093882A1 (en) 2019-11-14 2020-11-16 Video meeting method, meeting terminal, server, and storage medium

Country Status (2)

Country Link
CN (1) CN112804471A (en)
WO (1) WO2021093882A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114679548A (en) * 2022-03-25 2022-06-28 航天国盛科技有限公司 Multi-picture synthesis method, system and device based on AVC framework

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101257607A (en) * 2008-03-12 2008-09-03 中兴通讯股份有限公司 Multiple-picture processing system and method for video conference
CN101860715A (en) * 2010-05-14 2010-10-13 中兴通讯股份有限公司 Multi-picture synthesis method and system and media processing device
CN102572368A (en) * 2010-12-16 2012-07-11 中兴通讯股份有限公司 Processing method and system of distributed video and multipoint control unit
US20130093835A1 (en) * 2011-10-18 2013-04-18 Avaya Inc. Defining active zones in a traditional multi-party video conference and associating metadata with each zone
US20130106988A1 (en) * 2011-10-28 2013-05-02 Joseph Davis Compositing of videoconferencing streams
US20130282820A1 (en) * 2012-04-23 2013-10-24 Onmobile Global Limited Method and System for an Optimized Multimedia Communications System

Also Published As

Publication number Publication date
CN112804471A (en) 2021-05-14

Similar Documents

Publication Publication Date Title
EP3562163B1 (en) Audio-video synthesis method and system
KR101224097B1 (en) Controlling method and device of multi-point meeting
JP4585479B2 (en) Server apparatus and video distribution method
JP5216303B2 (en) Composite video distribution apparatus and method and program thereof
US20220279028A1 (en) Segmented video codec for high resolution and high frame rate video
WO2017080175A1 (en) Multi-camera used video player, playing system and playing method
US11089343B2 (en) Capability advertisement, configuration and control for video coding and decoding
JP2002330440A (en) Image transmission method, program for the image transmission method, recording medium for recording the program for the image transmission method, and image transmitter
US10666903B1 (en) Combining encoded video streams
WO2021168649A1 (en) Multifunctional receiving device and conference system
WO2021093882A1 (en) Video meeting method, meeting terminal, server, and storage medium
US20190141352A1 (en) Tile-based 360 vr video encoding method and tile-based 360 vr video decoding method
CN112752058A (en) Method and device for adjusting attribute of video stream
CN113141352B (en) Multimedia data transmission method and device, computer equipment and storage medium
CN112817913B (en) Data transmission method and device, electronic equipment and storage medium
KR20160087225A (en) System for cloud streaming service, method of image cloud streaming service to provide a multi-view screen, and apparatus for the same
CN105812922A (en) Multimedia file data processing method, system, player and client
US20220303596A1 (en) System and method for dynamic bitrate switching of media streams in a media broadcast production
CN112738056B (en) Encoding and decoding method and system
TWI531244B (en) Method and system for processing video data of meeting
CN113038183B (en) Video processing method, system, device and medium based on multiprocessor system
US11967345B2 (en) System and method for rendering key and fill video streams for video processing
CN112738565B (en) Interactive bandwidth optimization method, device, computer equipment and storage medium
US20220335976A1 (en) System and method for rendering key and fill video streams for video processing
CN117859321A (en) Method and apparatus for zoom-in and zoom-out during video conferencing using Artificial Intelligence (AI)

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20888166

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20888166

Country of ref document: EP

Kind code of ref document: A1