WO2022100528A1 - Audio and video forwarding method, apparatus, terminal, and system - Google Patents

Audio and video forwarding method, apparatus, terminal, and system

Info

Publication number
WO2022100528A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
stream
target
streams
video
Prior art date
Application number
PCT/CN2021/129041
Other languages
English (en)
French (fr)
Inventor
朱景升
梅君君
赵志东
官丹
孟天亮
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Priority to EP21891053.7A (EP4243407A4)
Publication of WO2022100528A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234363 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by altering the spatial resolution, e.g. for clients with a lower screen resolution
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/24 Monitoring of processes or resources, e.g. monitoring of server load, available bandwidth, upstream requests
    • H04N21/2402 Monitoring of the downstream path of the transmission network, e.g. bandwidth available
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25 Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/258 Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, processing of multiple end-users preferences to derive collaborative data
    • H04N21/25808 Management of client data
    • H04N21/25825 Management of client data involving client display capabilities, e.g. screen resolution of a mobile phone
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302 Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/478 Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788 Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting

Definitions

  • the embodiments of the present application relate to the field of communications, and in particular, to an audio and video forwarding method, device, terminal, and system.
  • a multi-party conference is usually involved, and the forwarding server needs to forward the video content of each party to other participants.
  • the video may involve a variety of different formats.
  • The forwarding server needs to forward audio and video of different formats or resolutions to different participants. In this process, a participant may not support a certain video format, or a participant's network conditions may be too unstable to receive high-resolution video. In either case the participant cannot play the video during the video conference, which affects the stability of the conference.
  • the embodiments of the present application provide an audio and video forwarding method, device, terminal, and system.
  • An audio and video forwarding method includes: during a session between a first sending terminal and a first receiving terminal, acquiring multiple first video streams and multiple first audio streams sent by the first sending terminal, wherein different first video streams differ in format or resolution, the multiple first video streams correspond to the same content, different first audio streams differ in format, and the multiple first audio streams correspond to the same content; determining, according to the terminal parameters of the first receiving terminal and the network bandwidth used by the first receiving terminal, a first target video stream and a first target audio stream from the multiple first video streams and the multiple first audio streams, wherein the first target video stream is the video stream with the highest resolution that the first receiving terminal supports playing among the multiple first video streams, and the first target audio stream is the audio stream with the best sound quality that the first receiving terminal supports playing among the multiple first audio streams; and sending the first target video stream and the first target audio stream to the first receiving terminal.
  • An audio and video forwarding method includes: during a session between a first sending terminal and a first receiving terminal, acquiring multiple first video streams and multiple first audio streams, wherein different first video streams differ in format or resolution, the multiple first video streams correspond to the same content, different first audio streams differ in format, and the multiple first audio streams correspond to the same content; and sending the multiple first video streams to the server, so that the server determines a first target video stream and a first target audio stream from the multiple first video streams and the multiple first audio streams and sends them to the first receiving terminal, wherein the first target video stream is the video stream with the highest resolution that the first receiving terminal supports playing among the multiple first video streams, and the first target audio stream is the audio stream with the best sound quality that the first receiving terminal supports playing among the multiple first audio streams.
  • An audio and video forwarding method includes: during a session between a first sending terminal and a first receiving terminal, receiving, by the first receiving terminal, a first target video stream and a first target audio stream sent by a server, wherein the first target video stream is a video stream determined by the server from multiple first video streams and the first target audio stream is an audio stream determined by the server from multiple first audio streams; the first target video stream is the video stream with the highest resolution that the first receiving terminal supports playing among the multiple first video streams, and the first target audio stream is the audio stream with the best sound quality that the first receiving terminal supports playing among the multiple first audio streams; different first video streams differ in format or resolution, and the multiple first video streams correspond to the same content; the multiple first audio streams are audio streams sent by the first sending terminal to the server, different first audio streams differ in format, and the multiple first audio streams correspond to the same content; and playing, by the first receiving terminal, the first target video stream and the first target audio stream.
  • An audio and video forwarding apparatus includes: a first obtaining unit, configured to obtain, during a session between a first sending terminal and a first receiving terminal, multiple first video streams and multiple first audio streams sent by the first sending terminal, wherein different first video streams differ in format or resolution, the multiple first video streams correspond to the same content, different first audio streams differ in format, and the multiple first audio streams correspond to the same content; a first determining unit, configured to determine, according to the terminal parameters of the first receiving terminal and the network bandwidth used by the first receiving terminal, a first target video stream and a first target audio stream from the multiple first video streams and the multiple first audio streams, wherein the first target video stream is the video stream with the highest resolution that the first receiving terminal supports playing among the multiple first video streams, and the first target audio stream is the audio stream with the best sound quality that the first receiving terminal supports playing among the multiple first audio streams; and a first sending unit, configured to send the first target video stream and the first target audio stream to the first receiving terminal.
  • An audio and video forwarding apparatus includes: an obtaining unit, configured to obtain multiple first video streams and multiple first audio streams during a session between a first sending terminal and a first receiving terminal, wherein different first video streams differ in format or resolution, the multiple first video streams correspond to the same content, different first audio streams differ in format, and the multiple first audio streams correspond to the same content; and a sending unit, configured to send the multiple first video streams to the server, so that the server determines a first target video stream and a first target audio stream from the multiple first video streams and the multiple first audio streams and sends them to the first receiving terminal, wherein the first target video stream is the video stream with the highest resolution that the first receiving terminal supports playing among the multiple first video streams, and the first target audio stream is the audio stream with the best sound quality that the first receiving terminal supports playing among the multiple first audio streams.
  • An audio and video forwarding apparatus includes: a first receiving unit, configured to receive, during a session between a first sending terminal and a first receiving terminal, a first target video stream and a first target audio stream sent by a server, wherein the first target video stream is a video stream determined by the server from multiple first video streams and the first target audio stream is an audio stream determined by the server from multiple first audio streams; the first target video stream is the video stream with the highest resolution that the first receiving terminal supports playing among the multiple first video streams, and the first target audio stream is the audio stream with the best sound quality that the first receiving terminal supports playing among the multiple first audio streams; different first video streams differ in format or resolution, and the multiple first video streams correspond to the same content; the multiple first audio streams are audio streams sent by the first sending terminal to the server, different first audio streams differ in format, and the multiple first audio streams correspond to the same content; and a first playing unit, configured to play the first target video stream and the first target audio stream on the first receiving terminal.
  • An audio and video forwarding system includes: a first obtaining unit, configured to obtain, during a session between a first sending terminal and a first receiving terminal, multiple first video streams and multiple first audio streams sent by the first sending terminal, wherein different first video streams differ in format or resolution, the multiple first video streams correspond to the same content, different first audio streams differ in format, and the multiple first audio streams correspond to the same content; a first determining unit, configured to determine, according to the terminal parameters of the first receiving terminal and the network bandwidth used by the first receiving terminal, a first target video stream and a first target audio stream from the multiple first video streams and the multiple first audio streams, wherein the first target video stream is the video stream with the highest resolution that the first receiving terminal supports playing among the multiple first video streams, and the first target audio stream is the audio stream with the best sound quality that the first receiving terminal supports playing among the multiple first audio streams; and a first sending unit, configured to send the first target video stream and the first target audio stream to the first receiving terminal.
  • An audio and video forwarding terminal includes: an acquisition unit, configured to acquire multiple first video streams and multiple first audio streams, wherein different first video streams differ in format or resolution, the multiple first video streams correspond to the same content, different first audio streams differ in format, and the multiple first audio streams correspond to the same content; and a sending unit, configured to send the multiple first video streams to the server, so that the server determines a first target video stream and a first target audio stream from the multiple first video streams and the multiple first audio streams and sends them to the first receiving terminal, wherein the first target video stream is the video stream with the highest resolution that the first receiving terminal supports playing among the multiple first video streams, and the first target audio stream is the audio stream with the best sound quality that the first receiving terminal supports playing among the multiple first audio streams.
  • An audio and video forwarding terminal includes: a first receiving unit, configured to receive, during a session between a first sending terminal and a first receiving terminal, a first target video stream and a first target audio stream sent by a server, wherein the first target video stream is a video stream determined by the server from multiple first video streams and the first target audio stream is an audio stream determined by the server from multiple first audio streams; the first target video stream is the video stream with the highest resolution that the first receiving terminal supports playing among the multiple first video streams, and the first target audio stream is the audio stream with the best sound quality that the first receiving terminal supports playing among the multiple first audio streams; different first video streams differ in format or resolution, and the multiple first video streams correspond to the same content; the multiple first audio streams are audio streams sent by the first sending terminal to the server, different first audio streams differ in format, and the multiple first audio streams correspond to the same content; and a first playing unit, configured to play the first target video stream and the first target audio stream on the first receiving terminal.
  • A computer-readable storage medium is also provided, in which a computer program is stored, wherein the computer program is configured to execute, when run, the steps in any one of the above method embodiments.
  • An electronic device is also provided, including a memory and a processor, wherein the memory stores a computer program and the processor is configured to run the computer program to execute the steps in any one of the above method embodiments.
  • FIG. 1 is a hardware structural block diagram of a mobile terminal for an audio and video forwarding method according to an embodiment of the present application;
  • FIG. 2 is a network architecture diagram of an audio and video forwarding method according to an embodiment of the present application;
  • FIG. 3 is a flowchart of an audio and video forwarding method according to an embodiment of the present application;
  • FIG. 4 is a structural block diagram of a terminal uplink module for an audio and video forwarding method according to an embodiment of the present application;
  • FIG. 5 is a structural block diagram of a terminal downlink module for an audio and video forwarding method according to an embodiment of the present application;
  • FIG. 6 is a structural block diagram of a conference media forwarding server for an audio and video forwarding method according to an embodiment of the present application;
  • FIG. 7 is a block diagram of a forwarding model in which a terminal code stream is sent to a conference media forwarding server, for an audio and video forwarding method according to an embodiment of the present application;
  • FIG. 8 is a block diagram of a model in which a conference media forwarding server forwards a code stream to a terminal, for an audio and video forwarding method according to an embodiment of the present application;
  • FIG. 9 is a flowchart of sorting and forwarding by terminal audio volume value, for an audio and video forwarding method according to an embodiment of the present application;
  • FIG. 10 is a flowchart of preferred-audio active streaming by a conference media forwarding server, for an audio and video forwarding method according to an embodiment of the present application;
  • FIG. 11 is a flowchart of preferred-video active streaming by a conference media forwarding server, for an audio and video forwarding method according to an embodiment of the present application;
  • FIG. 12 is a flowchart of media source switching and forwarding, for an audio and video forwarding method according to an embodiment of the present application;
  • FIG. 13 is a flowchart of another audio and video forwarding method according to an embodiment of the present application;
  • FIG. 15 is a structural block diagram of an audio and video forwarding apparatus according to an embodiment of the present application;
  • FIG. 16 is a structural block diagram of another audio and video forwarding apparatus according to an embodiment of the present application;
  • FIG. 17 is a structural block diagram of another audio and video forwarding apparatus according to an embodiment of the present application;
  • FIG. 18 is a structural block diagram of an audio and video forwarding system according to an embodiment of the present application;
  • FIG. 19 is a structural block diagram of an audio and video forwarding terminal according to an embodiment of the present application;
  • FIG. 20 is a structural block diagram of another audio and video forwarding terminal according to an embodiment of the present application.
  • FIG. 1 is a hardware structural block diagram of a mobile terminal according to an audio and video forwarding method according to an embodiment of the present application.
  • The mobile terminal may include one or more processors 102 (only one is shown in FIG. 1; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing data. The mobile terminal may also include a transmission device 106 for communication functions and an input and output device 108.
  • FIG. 1 is only a schematic diagram, which does not limit the structure of the above-mentioned mobile terminal.
  • the mobile terminal may also include more or fewer components than those shown in FIG. 1 , or have a different configuration than that shown in FIG. 1 .
  • The memory 104 can be used to store computer programs, for example, software programs and modules of application software, such as the computer programs corresponding to the audio and video forwarding methods in the embodiments of the present application; the processor 102 carries out those methods by running the computer programs stored in the memory 104.
  • Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
  • memory 104 may include memory located remotely from processor 102, which may be connected to the mobile terminal through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
  • The transmission device 106 is used to receive or transmit data via a network.
  • the specific example of the above-mentioned network may include a wireless network provided by a communication provider of the mobile terminal.
  • the transmission device 106 includes a network adapter (Network Interface Controller, NIC for short), which can be connected to other network devices through a base station so as to communicate with the Internet.
  • the transmission device 106 may be a radio frequency (Radio Frequency, RF for short) module, which is used to communicate with the Internet in a wireless manner.
  • the network architecture includes: a first sending terminal, a forwarding server, and a first receiving terminal.
  • The first sending terminal encodes the video into multiple first video streams of different formats or resolutions and the audio into multiple first audio streams of different formats, and sends them to the forwarding server. The forwarding server obtains the terminal parameters of the first receiving terminal and the network parameters used by the first receiving terminal, determines from the multiple first video streams the first target video stream with the highest resolution that the first receiving terminal supports, determines from the multiple first audio streams the first target audio stream with the best sound quality, and forwards the first target video stream and the first target audio stream to the first receiving terminal.
  • FIG. 3 is a flowchart of an audio and video forwarding method according to an embodiment of the present application. As shown in FIG. 3, the process includes the following steps:
  • S304: According to the terminal parameters of the first receiving terminal and the network bandwidth used by the first receiving terminal, determine the first target video stream and the first target audio stream from the plurality of first video streams and the plurality of first audio streams, wherein the first target video stream is the video stream with the highest resolution supported by the first receiving terminal among the plurality of first video streams, and the first target audio stream is the audio stream with the best sound quality supported by the first receiving terminal among the plurality of first audio streams.
  • S306 Send the first target video stream and the first target audio stream to the first receiving terminal.
  • In this embodiment, multiple first video streams of different formats or resolutions and multiple first audio streams of different formats sent by the first sending terminal can be obtained, and which video stream and which audio stream to send to the first receiving terminal is determined according to the terminal parameters of the first receiving terminal and the network bandwidth used by the first receiving terminal. This ensures that the first receiving terminal plays the best video stream and audio stream it can support, which solves the problem that video cannot be played in a video conference and improves the stability of the video conference.
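The selection step can be illustrated with a minimal Python sketch. It is not the patent's implementation: the class names, the `quality_score` field, and the rule of choosing audio first and giving video the remaining bandwidth are all assumptions made for illustration, since the text only states that the choice depends on the receiving terminal's parameters and bandwidth.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class VideoStream:
    codec: str            # e.g. "H264"
    width: int
    height: int
    bitrate_kbps: int


@dataclass(frozen=True)
class AudioStream:
    codec: str            # e.g. "OPUS"
    bitrate_kbps: int
    quality_score: float  # hypothetical: higher means better sound quality


@dataclass(frozen=True)
class ReceiverProfile:
    video_codecs: frozenset  # video formats the terminal can decode
    audio_codecs: frozenset  # audio formats the terminal can decode
    max_width: int
    max_height: int
    bandwidth_kbps: int      # bandwidth currently available to the terminal


def select_targets(
    videos: Sequence[VideoStream],
    audios: Sequence[AudioStream],
    rx: ReceiverProfile,
) -> Tuple[Optional[VideoStream], Optional[AudioStream]]:
    """Return (target video, target audio); either may be None if nothing fits."""
    # Assumption: audio is chosen first, video gets the remaining bandwidth.
    playable_audio = [
        a for a in audios
        if a.codec in rx.audio_codecs and a.bitrate_kbps <= rx.bandwidth_kbps
    ]
    audio = max(playable_audio, key=lambda a: a.quality_score, default=None)

    remaining = rx.bandwidth_kbps - (audio.bitrate_kbps if audio else 0)
    playable_video = [
        v for v in videos
        if v.codec in rx.video_codecs
        and v.width <= rx.max_width
        and v.height <= rx.max_height
        and v.bitrate_kbps <= remaining
    ]
    # Highest resolution wins; bitrate only breaks ties between equal resolutions.
    video = max(
        playable_video,
        key=lambda v: (v.width * v.height, v.bitrate_kbps),
        default=None,
    )
    return video, audio
```

A forwarding server following this sketch would call `select_targets` once per receiving terminal, and again whenever its bandwidth estimate for that terminal changes.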
  • the execution subject of the above steps may be a forwarding server.
  • each participant can be the first sending terminal, sending video and audio streams, and each participant can be the first receiving terminal, receiving video and audio streams.
  • the server plays the role of forwarding data.
  • new participants can join in the middle, and one or some participants can also leave the meeting halfway, and can rejoin after exiting.
  • The audio and video forwarding method in this embodiment of the present application may adopt an SFU (Selective Forwarding Unit) architecture: the terminal sends its supported audio and video media capabilities (including supported formats, resolutions, terminal parameters, and network parameters) to the media forwarding system (the forwarding server), and the media forwarding system selects the best-quality code stream that fits the terminal's bandwidth and capabilities and forwards it to the terminal.
  • the terminal has the capability of adaptive decoding and encoding in multiple formats, so that the media forwarding system only needs to select the capabilities supported by the terminal.
  • The embodiment of the present application can switch media sources quickly and smoothly in a large-capacity conference. For example, a participant can switch from the video of the participant currently being watched to the video of another participant.
  • the terminal in this embodiment of the present application may be a terminal that sends data or may also be a terminal that receives data. It should be noted that the terminal may have both the capabilities of the first sending terminal (or the second sending terminal and other sending terminals) and the capabilities of the first receiving terminal (or the second receiving terminal and other receiving terminals). In a multi-person conference, a terminal needs to send its own video to other terminals, and also needs to obtain video content of other terminals from other terminals.
  • Sound quality in the embodiments of the present application can be scored on aspects such as timbre, sound field, layering, positioning, transparency, resolving power, overall balance, imaging, and body; the weighted sum of the per-aspect scores is the sound quality score. The higher the score, the better the sound quality.
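As a minimal sketch of such a weighted sum, assuming per-aspect scores and weights are supplied as dictionaries (the aspect keys, weight values, and scoring scale below are hypothetical and not given in the text):

```python
def sound_quality_score(scores: dict, weights: dict) -> float:
    """Weighted sum of per-aspect scores.

    Every aspect named in `weights` must have a score; a missing score
    raises KeyError rather than being silently treated as zero.
    """
    return sum(weight * scores[aspect] for aspect, weight in weights.items())


# Hypothetical example: two aspects, equally weighted, scored on a 0-10 scale.
example = sound_quality_score(
    {"timbre": 8.0, "sound_field": 6.0},
    {"timbre": 0.5, "sound_field": 0.5},
)
```

With scores computed this way for each audio format, the audio stream with the best sound quality is simply the playable stream with the largest score.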
  • the above-mentioned terminal includes an uplink module and a downlink module.
  • the above-mentioned uplink module includes:
  • the acquisition module is used to collect audio and video raw data.
  • Audio encoding module for encoding audio data into multiple specified audio formats.
  • the video encoding module is used to encode video data into multiple specified video formats.
  • The packaging and sending module multiplexes the multiple audio streams and sends them through one port, and multiplexes the multiple video streams and sends them through one port.
  • The encoding control module receives control from the conference media forwarding server and notifies the audio and video encoding modules to start encoding in the required audio and video formats; for video, the code stream package is also converted.
  • A code stream package mainly includes information such as resolution, bit rate, and frame rate.
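Sending several encoded streams through one port requires some way to tell the streams apart on the wire. The sketch below uses a toy framing (a one-byte stream id plus a two-byte length before each payload) purely to illustrate multiplexing and demultiplexing; the framing, the function names, and the id values are assumptions, and the text does not describe the actual packet layout.

```python
import struct

# Toy frame header: stream id (1 byte) and payload length (2 bytes), big-endian.
HEADER = struct.Struct("!BH")


def mux(packets):
    """Concatenate (stream_id, payload) pairs into one byte string."""
    out = bytearray()
    for stream_id, payload in packets:
        out += HEADER.pack(stream_id, len(payload))
        out += payload
    return bytes(out)


def demux(data):
    """Split a muxed byte string back into {stream_id: [payload, ...]}."""
    streams = {}
    offset = 0
    while offset < len(data):
        stream_id, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        streams.setdefault(stream_id, []).append(data[offset:offset + length])
        offset += length
    return streams
```

Payload order within each stream is preserved, so a receiver can feed each list straight to the decoder for that format.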
  • the above-mentioned downlink modules include:
  • the data receiving module is used to receive audio and video data, split the audio data into code streams of multiple formats, and split the video data into code streams of multiple formats and resolutions.
  • the decoding adaptive module is used to check the format of the audio code stream, and start audio decoding according to the detected format; check the format of the video code stream, and start video decoding according to the detected format.
  • The audio decoding module is used to decode the audio stream into linear PCM, and supports simultaneous decoding of multiple formats.
  • the video decoding module is used to parse the video stream into YUV data, and supports simultaneous decoding of multiple formats and resolutions.
  • the output module mixes the audio stream and outputs it to the sound card, lays out the video data and outputs it to the graphics card, and synchronizes the audio and video.
  • The above-mentioned conference media forwarding server includes:
  • the code stream receiving module receives the audio and video code stream sent by the terminal, and demultiplexes the multiple streams.
  • the code stream sending module multiplexes the code streams of multiple terminals and sends them to the terminal.
  • The audio forwarding module includes: a volume value acquisition module, which obtains the volume value from the Real-time Transport Protocol (RTP) extension of the selected terminal's code stream; for compatibility with older terminals, the volume value can instead be obtained by decoding.
  • The volume value sorting module sorts all the terminals participating in the conference by the obtained volume values, selects the three loudest terminals, and forwards their audio streams to the corresponding terminals.
  • the audio code stream optimization module selects the optimal audio format code stream through network bandwidth prediction and terminal audio capability according to the set audio optimization strategy, and sends the code stream with the best sound quality to the terminal.
  • The audio stream control module requests the terminal to start or stop sending the code stream of a specified audio format, according to the preferred audio format.
  • The above-mentioned video forwarding module includes: a bit rate adaptive module, which adjusts the bit rate according to the configured bit rate strategy, based on network bandwidth prediction and terminal video capabilities; where bit rate adjustment cannot meet the requirements, the video stream selection module takes over.
  • the video stream selection module re-selects the best encoding format and best stream package according to network bandwidth prediction and terminal video capabilities.
  • the video stream control module requests or stops the sending of the specified stream to the terminal according to the preferred video format and stream package.
  • the video stream switching module is used in conference scenarios. When everyone is watching the same broadcast source (one or more terminals), it can quickly switch the broadcast source.
  • The video terminal and the conference media forwarding server negotiate multiple audio and video codec capabilities, and the conference media forwarding server selects the best audio and video formats and stream packages to send to the video terminal according to the negotiated terminal capabilities and network conditions.
  • The above-mentioned video terminal, under the control of the conference media forwarding server, generates code streams in different audio and video formats and different packages. It multiplexes the generated audio code streams and video code streams separately and sends them to the conference media forwarding server, which demultiplexes them. The media forwarding server then sorts the audio code streams by volume value and selects the 3 loudest terminals for forwarding.
  • the receiving terminal receives multiple audio streams in different formats, performs adaptive decoding, and mixes the sound; receives multiple video streams in different formats or code stream packages, performs adaptive decoding, and performs screen layout.
  • Volume sorting at the conference media forwarding server. When a terminal sends an audio stream, it measures the volume value and places it in the RTP extension. When the conference media forwarding server parses the RTP packet, it reads the volume value from the extension. If the extension carries no volume value, the stream must be decoded to obtain it. In some embodiments, to reduce the performance cost of decoding, an audio packet can be decoded at intervals (for example, every 1 second) to obtain the volume value.
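The volume acquisition just described can be sketched as follows, assuming a hypothetical `AudioPacket` type whose `ext_volume` field holds the value carried in the RTP extension (or `None` for older terminals) and a caller-supplied `decode_volume` function standing in for a real audio decoder. None of these names come from the source.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class AudioPacket:
    terminal_id: str
    arrival_time: float               # seconds, any monotonic clock
    payload: bytes
    ext_volume: Optional[int] = None  # volume from the RTP extension, if present


class VolumeTracker:
    """Keeps the latest known volume value per terminal."""

    def __init__(self, decode_volume: Callable[[bytes], int],
                 interval: float = 1.0) -> None:
        self._decode_volume = decode_volume  # hypothetical decoder hook
        self._interval = interval            # minimum spacing between decodes
        self._last_decode: Dict[str, float] = {}
        self.volumes: Dict[str, int] = {}

    def on_packet(self, pkt: AudioPacket) -> None:
        # Preferred path: the terminal already reported its volume.
        if pkt.ext_volume is not None:
            self.volumes[pkt.terminal_id] = pkt.ext_volume
            return
        # Fallback path: decode, but only once per interval per terminal.
        last = self._last_decode.get(pkt.terminal_id)
        if last is None or pkt.arrival_time - last >= self._interval:
            self.volumes[pkt.terminal_id] = self._decode_volume(pkt.payload)
            self._last_decode[pkt.terminal_id] = pkt.arrival_time
```

Between decodes the tracker simply keeps the last decoded value, which trades freshness for lower CPU load on the server.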
  • Media switching at the conference media forwarding server. The conference media server configures the input sources and output destinations of the switcher; there can be multiple of each. A received terminal code stream is fed to the switcher as an input source, and the switcher sends the input source's code stream to all output destination terminals. To switch the broadcast source, only the input source needs to be switched. For a smooth switch, the switcher generally waits until an I frame of the new input source has been received before switching to the new source.
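A minimal sketch of such a switcher is shown below, assuming a hypothetical `VideoPacket` type with an `is_keyframe` flag that marks I frames. The class and method names are illustrative and not taken from the source; a single active input source is modeled for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class VideoPacket:
    source: str        # sending terminal, e.g. "UEA"
    is_keyframe: bool  # True when the packet starts an I frame
    data: bytes = b""


class Switcher:
    """Forwards one input source to all registered output destinations."""

    def __init__(self, source: str) -> None:
        self.active = source
        self.pending: Optional[str] = None
        self.outputs: List[Callable[[VideoPacket], None]] = []

    def add_output(self, sink: Callable[[VideoPacket], None]) -> None:
        self.outputs.append(sink)

    def request_switch(self, new_source: str) -> None:
        # The switch is deferred until the new source delivers an I frame.
        self.pending = new_source if new_source != self.active else None

    def on_packet(self, pkt: VideoPacket) -> None:
        if (self.pending is not None and pkt.source == self.pending
                and pkt.is_keyframe):
            self.active, self.pending = self.pending, None
        if pkt.source == self.active:
            for sink in self.outputs:
                sink(pkt)
```

Until the I frame arrives, the old source keeps flowing to the outputs, so receivers never get a new stream they cannot start decoding.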
  • the uplink module of the terminal in FIG. 4 includes a collection module, an audio coding module, a video coding module, a coding control module and a packet sending module.
  • the acquisition module collects the audio and video of the terminal equipment, and sends them to the audio coding module and the video coding module for coding respectively.
  • The coding control module is controlled by the conference media forwarding server; the encoded code streams are passed to the packaging and sending module and sent to the conference media forwarding server.
  • the packaging and sending module packages the audio and video streams, and then multiplexes the audio and video streams respectively, and sends them to the conference media forwarding server.
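The source does not specify the wire format used to carry several streams over one port. Purely as an illustration of multiplexing and demultiplexing, the sketch below tags each packet with a one-byte stream identifier and a two-byte length; this framing is an assumption, and a real implementation would more likely rely on fields of the RTP header itself.

```python
import struct
from typing import Dict, Iterable, List, Tuple

# Hypothetical frame header: stream id (0-255) and payload length (0-65535),
# both in network byte order.
HEADER = struct.Struct("!BH")


def mux(packets: Iterable[Tuple[int, bytes]]) -> bytes:
    """Interleave (stream_id, payload) pairs into one byte sequence."""
    return b"".join(HEADER.pack(stream_id, len(payload)) + payload
                    for stream_id, payload in packets)


def demux(data: bytes) -> Dict[int, List[bytes]]:
    """Split a multiplexed byte sequence back into per-stream payload lists."""
    streams: Dict[int, List[bytes]] = {}
    offset = 0
    while offset < len(data):
        stream_id, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        streams.setdefault(stream_id, []).append(data[offset:offset + length])
        offset += length
    return streams
```

The terminal side would call `mux` before sending, and the code stream receiving module of the server would call `demux` on arrival.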
  • the downlink module of the terminal in FIG. 5 includes a data receiving module, an adaptive decoding module, an audio decoding module, a video decoding module and an output module.
  • The data receiving module receives the code stream from the conference media forwarding server, demultiplexes the audio and video code streams, adaptively identifies the audio and video formats, and notifies the audio decoding module and the video decoding module to decode; the decoded streams are sent to the output module.
  • the output module mixes the decoded audio stream, makes a multi-screen layout of the decoded video data, and then presents the audio and video synchronously.
  • the conference media forwarding server in FIG. 6 includes a code stream receiving module, an audio forwarding module, a video forwarding module and a code stream sending module.
  • The code stream receiving module receives the upstream code streams of the terminals, demultiplexes the audio and video code streams, and parses them into streams of different formats and different code stream packages. The video streams are sent to the video forwarding module.
  • The audio forwarding module obtains the volume value of each received audio stream, sorts the volume values of all streams in the conference, identifies the three loudest terminals, and sends their streams to the code stream sending module. According to the feedback of the terminal receiving module and the network conditions, it selects the best audio format to send to the terminal and notifies the source terminal of the code stream through the audio code stream control module.
  • The video forwarding module forwards the received video stream either through the code stream switching module to the code stream sending module or directly to the code stream sending module. According to the feedback of the terminal receiving module and the network conditions, it selects the best video format and stream package to send to the terminal and notifies the source terminal of the code stream through the video stream control module.
  • the code stream sending module multiplexes multiple streams of different formats and different code stream packages, and sends them to the terminal.
  • FIG. 7 is a forwarding model when a terminal code stream is sent to a conference media forwarding server, including four video terminals and one conference media forwarding server.
  • The UE1 terminal encodes its audio into 2 audio streams, in G711 and EVS formats, encodes its video into H264 180P, H264 720P, H265 1080P and H265 4K package video streams, and sends them to the conference media forwarding server. According to the capabilities of the terminals, the audio forwarding module of the conference media forwarding server forwards the G711 code stream to UE2 and the EVS code stream to UE3 and UE4; the video forwarding module forwards H264 720P to UE2 and H265 4K to UE3 and UE4. UE2, UE3 and UE4 adaptively decode the audio and video streams.
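The per-receiver choice in this example can be sketched as a filter followed by a maximum. The bit rates, the quality ranking that places EVS above G711, the receiver capability sets, and the rule of reserving bandwidth for audio before choosing video are all assumptions made for illustration; the source only states that selection depends on terminal capabilities and network conditions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence, Tuple


@dataclass(frozen=True)
class VideoStream:
    codec: str         # e.g. "H264" or "H265"
    height: int        # vertical resolution in pixels
    bitrate_kbps: int  # hypothetical bit rate of the stream package


@dataclass(frozen=True)
class AudioStream:
    codec: str
    quality: int       # hypothetical sound quality score, higher is better
    bitrate_kbps: int


@dataclass(frozen=True)
class Receiver:
    video_codecs: FrozenSet[str]
    audio_codecs: FrozenSet[str]
    max_height: int
    bandwidth_kbps: int


def select_targets(videos: Sequence[VideoStream], audios: Sequence[AudioStream],
                   rx: Receiver) -> Tuple[Optional[VideoStream], Optional[AudioStream]]:
    """Pick the best audio, then the highest-resolution video that still fits."""
    audio = max((a for a in audios
                 if a.codec in rx.audio_codecs
                 and a.bitrate_kbps <= rx.bandwidth_kbps),
                key=lambda a: a.quality, default=None)
    remaining = rx.bandwidth_kbps - (audio.bitrate_kbps if audio else 0)
    video = max((v for v in videos
                 if v.codec in rx.video_codecs
                 and v.height <= rx.max_height
                 and v.bitrate_kbps <= remaining),
                key=lambda v: v.height, default=None)
    return video, audio


# Streams offered by UE1 in the example; bit rates are made up.
UE1_VIDEO = [VideoStream("H264", 180, 200), VideoStream("H264", 720, 1500),
             VideoStream("H265", 1080, 3000), VideoStream("H265", 2160, 12000)]
UE1_AUDIO = [AudioStream("G711", 1, 64), AudioStream("EVS", 2, 48)]

ue2 = Receiver(frozenset({"H264"}), frozenset({"G711"}), 720, 4000)
ue3 = Receiver(frozenset({"H264", "H265"}), frozenset({"G711", "EVS"}), 2160, 20000)
```

With these assumed capabilities, `select_targets(UE1_VIDEO, UE1_AUDIO, ue2)` yields the H264 720P and G711 streams, and the same call for `ue3` yields H265 4K and EVS, matching the forwarding described above.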
  • Figure 8 is a model of a conference media forwarding server forwarding a code stream to a terminal, including four video terminals and one conference media forwarding server.
  • The UE1 terminal encodes its audio into audio streams in EVS and G711 formats, and encodes its video into video streams in H265 4K, H264 180P and H264 720P packages. The conference media forwarding server forwards code streams according to the capabilities of the terminals and the network conditions: the audio forwarding module forwards UE1's EVS code stream, UE2's G711 code stream and UE3's G711 code stream to UE4;
  • the video forwarding module forwards UE1's H265 4K, UE2's H264 180
  • FIG. 9 shows the sorting and forwarding process for terminal audio volume values, including 6 video terminals and 1 conference media forwarding server.
  • The UE1, UE3 and UE5 terminals carry the volume value in the RTP extension, and the UE2, UE4 and UE6 terminals do not carry the volume value.
  • The audio forwarding module receives the audio streams of the 6 terminals, obtains the volume values of UE1, UE3 and UE5 from the RTP extension, and obtains the volume values of UE2, UE4 and UE6 by decoding their code streams. After the volume values of the 6 terminals are sorted, the 4 loudest terminals are, in order of volume value, UE2, UE4, UE3 and UE1. The streams of the 3 loudest terminals are forwarded to each of UE1 to UE4, excluding the terminal's own stream: even if the audio stream sent by a terminal has the largest volume value, that terminal does not receive it, because it sent the stream itself.
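The selection rule in this example (three loudest streams per receiver, never the receiver's own) can be sketched as follows. The numeric volume values are made up; only their ordering (UE2, UE4, UE3, UE1, then UE5 and UE6) follows the example.

```python
from typing import Dict, List


def loudest_for(receiver: str, volumes: Dict[str, int], n: int = 3) -> List[str]:
    """Return the n loudest terminals to forward to `receiver`, excluding itself."""
    candidates = [t for t in volumes if t != receiver]
    candidates.sort(key=lambda t: volumes[t], reverse=True)
    return candidates[:n]


# Hypothetical volume values consistent with the ordering in the example.
VOLUMES = {"UE1": 60, "UE2": 90, "UE3": 70, "UE4": 80, "UE5": 20, "UE6": 10}

for_ue1 = loudest_for("UE1", VOLUMES)  # UE1 is not among its own top three
for_ue2 = loudest_for("UE2", VOLUMES)  # UE2 is loudest, so UE1 moves up
```

This is why four loudest terminals are listed in the example: each of UE1 to UE4 receives the other three.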
  • The process by which the conference media forwarding server actively pushes the preferred audio stream is as follows.
  • The terminal negotiates its supported audio media capabilities and then joins the conference; the conference media forwarding server obtains the three loudest terminals; then, according to the terminal's capabilities and the network conditions, it selects the best audio stream of each of the three loudest terminals and forwards it to the terminal.
  • FIG. 11 shows the process by which the conference media forwarding server actively pushes the preferred video stream.
  • The conference media forwarding server first actively pushes the default picture to the terminal; then, according to the capabilities and network conditions of each terminal, it selects the best video stream and forwards it to the terminal. When the number of terminals in the conference changes, the best code stream to push also changes.
  • Figure 12 shows the media source switching and forwarding process, including 4 video terminals, UEA-UED, 1 service control and 1 conference media forwarding server.
  • Through the HTTP REST interface, the service control instructs the conference media forwarding server to receive code streams.
  • The code stream receiving modules for UEA and UEB each register a stream receiving queue, and each terminal's upstream code stream is delivered to the media stream pool;
  • After UEC and UED join the conference, the service control instructs the conference media forwarding server, through the HTTP REST interface, to send code streams, and the code stream sending modules for UEC and UED each register a code stream receiving queue;
  • Through the HTTP REST interface, the service control instructs the switching module to receive the code stream of UEA and send it to UEC and UED.
  • The switching module registers a receiving queue to receive the code stream of UEA and registers a sending queue that delivers the code stream to the receiving queues of UEC and UED; the code stream sending modules for UEC and UED then forward the code stream to UEC and UED.
  • Through the HTTP REST interface, the service control switches the input of the switching module from UEA to UEB, and then notifies the code stream receiving module to stop sending the code stream of UEA and start sending the code stream of UEB.
  • the receive queue S1 receives A1 and B1, and the transmit queue S2 transmits A1 and B1.
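The queue registration and source change in this flow can be sketched with in-memory queues. The class below is a simplified stand-in for the switching module: the method names are hypothetical, the HTTP REST control calls are replaced by direct method calls, and packets are plain strings such as "A1" and "B1".

```python
from collections import deque
from typing import Deque, Dict, Optional


class SwitchingModule:
    """Copies packets from the selected source into every destination queue."""

    def __init__(self) -> None:
        self.source: Optional[str] = None
        self.send_queues: Dict[str, Deque[str]] = {}

    def register_destination(self, terminal: str) -> None:
        # Corresponds to a sending module registering its receiving queue.
        self.send_queues[terminal] = deque()

    def set_source(self, terminal: str) -> None:
        # Corresponds to the service control selecting or switching the source.
        self.source = terminal

    def on_upstream(self, terminal: str, packet: str) -> None:
        if terminal != self.source:
            return  # streams of unselected sources are not forwarded
        for queue in self.send_queues.values():
            queue.append(packet)


switch = SwitchingModule()
switch.register_destination("UEC")
switch.register_destination("UED")
switch.set_source("UEA")
switch.on_upstream("UEA", "A1")
switch.on_upstream("UEB", "B0")  # ignored while UEA is the source
switch.set_source("UEB")
switch.on_upstream("UEB", "B1")
```

After these calls both destination queues hold A1 followed by B1, mirroring the queue contents described above.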
  • The method further includes: monitoring the terminal parameters of the first receiving terminal and the network bandwidth used by the first receiving terminal; and, when the network bandwidth or the terminal parameters change, re-determining the first target video stream and the first target audio stream.
  • the method further includes: sending the re-determined first target video stream and the first target audio stream to the first receiving terminal.
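The monitor, re-determine and re-send steps can be sketched as a small state holder that re-runs a selection function only when its inputs change. The selection function, the use of a single `max_height` value as the terminal parameter, and the thresholds in the example are assumptions for illustration.

```python
from typing import Callable, Optional, Tuple


class Reselector:
    """Re-determines the target stream when bandwidth or terminal parameters change."""

    def __init__(self, select: Callable[[int, int], str]) -> None:
        self._select = select  # hypothetical selection function
        self._last_inputs: Optional[Tuple[int, int]] = None
        self.current: Optional[str] = None

    def update(self, bandwidth_kbps: int, max_height: int) -> bool:
        """Return True when a new target was chosen and must be sent."""
        inputs = (bandwidth_kbps, max_height)
        if inputs == self._last_inputs:
            return False  # nothing changed, keep the current target
        self._last_inputs = inputs
        target = self._select(bandwidth_kbps, max_height)
        changed = target != self.current
        self.current = target
        return changed


def pick(bandwidth_kbps: int, max_height: int) -> str:
    # Made-up rule: 720P needs at least 2000 kbit/s and a capable terminal.
    return "720P" if bandwidth_kbps >= 2000 and max_height >= 720 else "180P"


monitor = Reselector(pick)
first = monitor.update(3000, 1080)   # initial determination
same = monitor.update(3000, 1080)    # no change in inputs
dropped = monitor.update(1000, 1080) # bandwidth fell, target is re-determined
```

Returning a flag lets the caller send the re-determined stream only when the target actually differs from the one already being forwarded.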
  • The above method further includes: determining a second target video stream and a second target audio stream from the plurality of first video streams and the plurality of first audio streams according to the network bandwidth used by the second receiving terminal and the terminal parameters of the second receiving terminal, wherein the second target video stream is the video stream with the highest resolution that the second receiving terminal supports playing among the plurality of first video streams, the second target audio stream is the audio stream with the best sound quality that the second receiving terminal supports playing among the plurality of first audio streams, and the second receiving terminal is a terminal in the session with the first sending terminal and the first receiving terminal; and sending the second target video stream and the second target audio stream to the second receiving terminal.
  • The method further includes: monitoring the network bandwidth used by the second receiving terminal and the terminal parameters of the second receiving terminal; and, when that network bandwidth or those terminal parameters change, re-determining the second target video stream and the second target audio stream.
  • The method further includes: sending the re-determined second target video stream and second target audio stream to the second receiving terminal.
  • The above method further includes: acquiring multiple second video streams and multiple second audio streams sent by the second sending terminal, wherein different second video streams differ in format or resolution, the multiple second video streams correspond to the same content, different second audio streams differ in format, the multiple second audio streams correspond to the same content, and the second sending terminal is a terminal in the session with the first sending terminal and the first receiving terminal; determining a third target video stream and a third target audio stream from the plurality of second video streams and the plurality of second audio streams according to the terminal parameters of the first receiving terminal and the network bandwidth used by the first receiving terminal, wherein the third target video stream is the video stream with the highest resolution that the first receiving terminal supports playing among the plurality of second video streams, and the third target audio stream is the audio stream with the best sound quality that the first receiving terminal supports playing among the plurality of second audio streams; and sending the third target video stream and the third target audio stream to the first receiving terminal.
  • The method further includes: monitoring the network bandwidth used by the first receiving terminal and the terminal parameters of the first receiving terminal; and, when that network bandwidth or those terminal parameters change, re-determining the third target video stream and the third target audio stream.
  • the method further includes: sending the re-determined third target video stream and the third target audio stream to the first receiving terminal.
  • The above method further includes: determining a fourth target video stream and a fourth target audio stream from the plurality of second video streams and the plurality of second audio streams according to the network bandwidth used by the second receiving terminal and the terminal parameters of the second receiving terminal, wherein the fourth target video stream is the video stream with the highest resolution that the second receiving terminal supports playing among the plurality of second video streams, the fourth target audio stream is the audio stream with the best sound quality that the second receiving terminal supports playing among the plurality of second audio streams, and the second receiving terminal is a terminal in the session with the first sending terminal, the first receiving terminal and the second sending terminal; and sending the fourth target video stream and the fourth target audio stream to the second receiving terminal.
  • The method further includes: monitoring the network bandwidth used by the second receiving terminal and the terminal parameters of the second receiving terminal; and, when that network bandwidth or those terminal parameters change, re-determining the fourth target video stream and the fourth target audio stream.
  • the method further includes: sending the re-determined fourth target video stream and the fourth target audio stream to the second receiving terminal.
  • The above method further includes: in the case of acquiring multiple target video streams and multiple target audio streams, sending the multiple target video streams to each receiving terminal in the session, and sending the N loudest target audio streams among the multiple target audio streams to each receiving terminal, wherein N is a positive integer determined according to the number of target audio streams, the target video streams include the first target video stream, the target audio streams include the first target audio stream, and the receiving terminals include the first receiving terminal.
  • an audio and video forwarding method is also provided, as shown in FIG. 13 , including:
  • S1304: Send the multiple first video streams to the server, so that the server determines the first target video stream and the first target audio stream from the multiple first video streams and the multiple first audio streams and sends the first target video stream and the first target audio stream to the first receiving terminal, wherein the first target video stream is the video stream with the highest resolution that the first receiving terminal supports playing among the plurality of first video streams, and the first target audio stream is the audio stream with the best sound quality that the first receiving terminal supports playing among the plurality of first audio streams.
  • With this embodiment, a plurality of first video streams with different formats and resolutions and a plurality of first audio streams with different formats, all sent by the first sending terminal, can be obtained, and which video stream and which audio stream to send to the first receiving terminal is determined according to the terminal parameters of the first receiving terminal and the network bandwidth used by the first receiving terminal. This ensures that the first receiving terminal plays the best video stream and audio stream it can support, which solves the problem that video cannot be played in a video conference and improves the stability of the video conference.
  • For other examples of this embodiment, please refer to the above examples, which are not repeated here.
  • Before obtaining the plurality of first video streams and the plurality of first audio streams, the above method further includes: obtaining a video coding format, a video coding resolution and an audio coding format from the server; and encoding the original video stream and the original audio stream according to that video coding format, video coding resolution and audio coding format to obtain the plurality of first video streams and the plurality of first audio streams.
  • an audio and video forwarding method is also provided, as shown in FIG. 14 , including:
  • The first receiving terminal receives the first target video stream and the first target audio stream sent by the server, wherein the first target video stream is a video stream determined by the server from a plurality of first video streams, the first target audio stream is an audio stream determined from a plurality of first audio streams, the first target video stream is the video stream with the highest resolution that the first receiving terminal supports playing among the plurality of first video streams, the first target audio stream is the audio stream with the best sound quality that the first receiving terminal supports playing among the plurality of first audio streams, different first video streams differ in format or resolution, the multiple first video streams correspond to the same content, the multiple first audio streams are audio streams sent by the first sending terminal to the server, different first audio streams differ in format, and the multiple first audio streams correspond to the same content;
  • the first receiving terminal plays the first target video stream and the first target audio stream.
  • With this embodiment, a plurality of first video streams with different formats and resolutions and a plurality of first audio streams with different formats, all sent by the first sending terminal, can be obtained, and which video stream and which audio stream to send to the first receiving terminal is determined according to the terminal parameters of the first receiving terminal and the network bandwidth used by the first receiving terminal. This ensures that the first receiving terminal plays the best video stream and audio stream it can support, which solves the problem that video cannot be played in a video conference and improves the stability of the video conference.
  • For other examples of this embodiment, please refer to the above examples, which are not repeated here.
  • The above-mentioned method further includes: the first receiving terminal receives a third target video stream and a third target audio stream sent by the server, wherein the third target video stream is a video stream determined by the server from a plurality of second video streams, the third target audio stream is an audio stream determined from a plurality of second audio streams, the third target video stream is the video stream with the highest resolution that the first receiving terminal supports playing among the plurality of second video streams, the third target audio stream is the audio stream with the best sound quality that the first receiving terminal supports playing among the plurality of second audio streams, different second video streams differ in format or resolution, the multiple first video streams correspond to the same content, the multiple first audio streams are audio streams sent by the first sending terminal to the server, different first audio streams differ in format, and the multiple first audio streams correspond to the same content; the first receiving terminal plays the third target video stream and the third target audio stream.
  • the above-mentioned first receiving terminal playing the above-mentioned third target video stream includes: the above-mentioned first receiving terminal stops playing the above-mentioned first target video stream and the above-mentioned first receiving terminal plays the above-mentioned third target video stream.
  • the above-mentioned first receiving terminal playing the above-mentioned third target video stream includes: the above-mentioned first receiving terminal playing the above-mentioned first target video stream in a first playing area and the above-mentioned first receiving terminal in a second playing area. Play the above-mentioned third target video stream.
  • the above-mentioned method further includes: when the above-mentioned first receiving terminal receives a plurality of target audio streams, mixing the above-mentioned plurality of target audio streams into one audio stream, wherein the above-mentioned target audio stream includes The above-mentioned first target audio stream; the above-mentioned first receiving terminal plays the above-mentioned mixed audio stream.
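As a minimal sketch of mixing several decoded audio streams into one, the function below sums 16-bit PCM samples position by position and clips the result. It assumes all inputs are already decoded to signed 16-bit samples at the same sample rate; the source does not prescribe a mixing algorithm, and a production mixer would typically also normalize levels.

```python
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def mix_pcm16(streams: Sequence[Sequence[int]]) -> List[int]:
    """Mix signed 16-bit PCM streams by summing samples and clipping."""
    if not streams:
        return []
    length = max(len(s) for s in streams)
    mixed: List[int] = []
    for i in range(length):
        # Streams shorter than the longest one contribute silence at the end.
        total = sum(s[i] for s in streams if i < len(s))
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed


mixed_example = mix_pcm16([[1000, 30000], [2000, 30000, -5]])
```

Clipping keeps the mixed samples inside the 16-bit range when several loud speakers overlap.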
  • Playing the first target video stream and the first target audio stream by the first receiving terminal includes: the first receiving terminal performing a synchronization operation on the first target video stream and the first target audio stream; and the first receiving terminal playing the synchronized first target video stream and first target audio stream.
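The source does not say how the synchronization operation is performed. One common approach, shown here only as an assumed illustration, treats the audio playback position as the master clock and displays the latest video frame whose presentation timestamp does not exceed it.

```python
import bisect
from typing import Optional, Sequence


def frame_to_show(video_pts: Sequence[float], audio_clock: float) -> Optional[int]:
    """Return the index of the latest frame with pts <= audio_clock.

    `video_pts` must be sorted in ascending order (seconds). Returns None
    when no frame is due yet.
    """
    index = bisect.bisect_right(video_pts, audio_clock) - 1
    return index if index >= 0 else None


# Hypothetical timestamps for a 25 frames-per-second stream.
PTS = [0.0, 0.04, 0.08]
due = frame_to_show(PTS, 0.05)
```

Driving video from the audio clock keeps lip sync stable because small video frame drops are less noticeable than audio glitches.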
  • an audio and video forwarding device is also provided, as shown in FIG. 15 , including:
  • The first obtaining unit 1502 is configured to obtain multiple first video streams and multiple first audio streams sent by the first sending terminal during the session between the first sending terminal and the first receiving terminal, wherein different first video streams differ in format or resolution, the multiple first video streams correspond to the same content, different first audio streams differ in format, and the multiple first audio streams correspond to the same content;
  • The first determining unit 1504 is configured to determine a first target video stream and a first target audio stream from the plurality of first video streams and the plurality of first audio streams according to the terminal parameters of the first receiving terminal and the network bandwidth used by the first receiving terminal, wherein the first target video stream is the video stream with the highest resolution that the first receiving terminal supports playing among the plurality of first video streams, and the first target audio stream is the audio stream with the best sound quality that the first receiving terminal supports playing among the plurality of first audio streams;
  • the first sending unit 1506 is configured to send the first target video stream and the first target audio stream to the first receiving terminal.
  • With this embodiment, a plurality of first video streams with different formats and resolutions and a plurality of first audio streams with different formats, all sent by the first sending terminal, can be obtained, and which video stream and which audio stream to send to the first receiving terminal is determined according to the terminal parameters of the first receiving terminal and the network bandwidth used by the first receiving terminal. This ensures that the first receiving terminal plays the best video stream and audio stream it can support, which solves the problem that video cannot be played in a video conference and improves the stability of the video conference.
  • For other examples of this embodiment, please refer to the above examples, which are not repeated here.
  • The above-mentioned apparatus further includes: a first monitoring unit configured to monitor the terminal parameters of the first receiving terminal and the network bandwidth used by the first receiving terminal after the first target video stream and the first target audio stream are sent to the first receiving terminal; and a second determining unit configured to re-determine the first target video stream and the first target audio stream when the network bandwidth or the terminal parameters change.
  • The above-mentioned apparatus further includes: a second sending unit configured to, after the first target video stream and the first target audio stream are re-determined, send the re-determined first target video stream and first target audio stream to the first receiving terminal.
  • The above-mentioned apparatus further includes: a third determining unit configured to determine a second target video stream and a second target audio stream from the plurality of first video streams and the plurality of first audio streams according to the network bandwidth used by the second receiving terminal and the terminal parameters of the second receiving terminal, wherein the second target video stream is the video stream with the highest resolution that the second receiving terminal supports playing among the plurality of first video streams, the second target audio stream is the audio stream with the best sound quality that the second receiving terminal supports playing among the plurality of first audio streams, and the second receiving terminal is a terminal in the session with the first sending terminal and the first receiving terminal; and a third sending unit configured to send the second target video stream and the second target audio stream to the second receiving terminal.
  • The apparatus further includes: a second monitoring unit, configured to monitor the network bandwidth used by the second receiving terminal and the terminal parameters of the second receiving terminal after the second target video stream and the second target audio stream are sent to the second receiving terminal; and a fourth determining unit, configured to re-determine the second target video stream and the second target audio stream when the network bandwidth used by the second receiving terminal or the terminal parameters change.
  • The apparatus further includes: a fourth sending unit, configured to send the re-determined second target video stream and the re-determined second target audio stream to the second receiving terminal after the second target video stream and the second target audio stream are re-determined.
  • The apparatus further includes: a second obtaining unit, configured to obtain a plurality of second video streams and a plurality of second audio streams sent by a second sending terminal, wherein different second video streams differ in format or resolution, the plurality of second video streams correspond to the same content, different second audio streams differ in format, the plurality of second audio streams correspond to the same content, and the second sending terminal is a terminal in the session with the first sending terminal and the first receiving terminal; a fifth determining unit, configured to determine a third target video stream and a third target audio stream from the plurality of second video streams and the plurality of second audio streams according to the terminal parameters of the first receiving terminal and the network bandwidth used by the first receiving terminal, wherein the third target video stream is the video stream with the highest resolution that the first receiving terminal supports playing among the plurality of second video streams, and the third target audio stream is the audio stream with the best sound quality that the first receiving terminal supports playing among the plurality of second audio streams; and a fifth sending unit, configured to send the third target video stream and the third target audio stream to the first receiving terminal.
  • The apparatus further includes: a third monitoring unit, configured to monitor the network bandwidth used by the first receiving terminal and the terminal parameters of the first receiving terminal after the third target video stream and the third target audio stream are sent to the first receiving terminal; and a sixth determining unit, configured to re-determine the third target video stream and the third target audio stream when the network bandwidth used by the first receiving terminal or the terminal parameters change.
  • The apparatus further includes: a sixth sending unit, configured to send the re-determined third target video stream and the re-determined third target audio stream to the first receiving terminal after the third target video stream and the third target audio stream are re-determined.
  • The apparatus further includes: a seventh determining unit, configured to determine a fourth target video stream and a fourth target audio stream from the plurality of second video streams and the plurality of second audio streams according to the terminal parameters of the second receiving terminal and the network bandwidth used by the second receiving terminal, wherein the fourth target video stream is the video stream with the highest resolution that the second receiving terminal supports playing among the plurality of second video streams, the fourth target audio stream is the audio stream with the best sound quality that the second receiving terminal supports playing among the plurality of second audio streams, and the second receiving terminal is a terminal in the session with the first sending terminal, the first receiving terminal, and the second sending terminal; and a seventh sending unit, configured to send the fourth target video stream and the fourth target audio stream to the second receiving terminal.
  • The apparatus further includes: a fourth monitoring unit, configured to monitor the network bandwidth used by the second receiving terminal and the terminal parameters of the second receiving terminal after the fourth target video stream and the fourth target audio stream are sent to the second receiving terminal; and an eighth determining unit, configured to re-determine the fourth target video stream and the fourth target audio stream when the network bandwidth used by the second receiving terminal or the terminal parameters change.
  • The apparatus further includes: an eighth sending unit, configured to send the re-determined fourth target video stream and the re-determined fourth target audio stream to the second receiving terminal after the fourth target video stream and the fourth target audio stream are re-determined.
  • The apparatus further includes: a ninth sending unit, configured to, when a plurality of target video streams and a plurality of target audio streams are acquired, send the plurality of target video streams to each receiving terminal in the session and send the N loudest target audio streams among the plurality of target audio streams to each receiving terminal, wherein N is a positive integer determined according to the number of target audio streams, the target video streams include the first target video stream, the target audio streams include the first target audio stream, and the receiving terminals include the first receiving terminal.
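The selection of the N loudest target audio streams can be sketched as follows. This is only an illustrative sketch: the text states that N depends on the number of target audio streams but does not give the rule, so the cap `max_mixed` and the function `choose_n` are assumptions, as are the `TargetAudio` type and its `volume` field.

```python
import heapq
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class TargetAudio:
    """One target audio stream on the forwarding side (hypothetical type)."""
    sender_id: str
    volume: float  # measured loudness; the unit is an assumption


def choose_n(stream_count: int, max_mixed: int = 3) -> int:
    """Assumed rule: forward every stream until the cap is reached."""
    return min(stream_count, max_mixed)


def loudest_streams(streams: Sequence[TargetAudio],
                    max_mixed: int = 3) -> List[TargetAudio]:
    """Return the N loudest streams, loudest first."""
    n = choose_n(len(streams), max_mixed)
    return heapq.nlargest(n, streams, key=lambda s: s.volume)
```

With four active senders and the assumed cap of three, the quietest stream is simply not forwarded, which keeps the downlink audio load bounded as the conference grows.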
  • An audio and video forwarding apparatus is also provided, as shown in FIG. 16, including:
  • an obtaining unit 1602, configured to obtain a plurality of first video streams and a plurality of first audio streams during a session between the first sending terminal and the first receiving terminal, wherein different first video streams differ in format or resolution, the plurality of first video streams correspond to the same content, different first audio streams differ in format, and the plurality of first audio streams correspond to the same content;
  • a sending unit 1604, configured to send the plurality of first video streams to the server, so that the server determines a first target video stream and a first target audio stream from the plurality of first video streams and the plurality of first audio streams and sends the first target video stream and the first target audio stream to the first receiving terminal, wherein the first target video stream is the video stream with the highest resolution that the first receiving terminal supports playing among the plurality of first video streams, and the first target audio stream is the audio stream with the best sound quality that the first receiving terminal supports playing among the plurality of first audio streams.
  • In the process of forwarding the video of the first sending terminal, a plurality of first video streams with different formats and resolutions and a plurality of first audio streams with different formats, all sent by the first sending terminal, can be obtained, and which video stream and which audio stream to send to the first receiving terminal is decided according to the terminal parameters of the first receiving terminal and the network bandwidth used by the first receiving terminal. It can therefore be ensured that the first receiving terminal plays the best video stream and audio stream that it supports, which solves the problem that video cannot be played in a video conference and improves the stability of the video conference.
  • For other examples of this embodiment, refer to the examples above; they are not repeated here.
  • The apparatus further includes: a receiving unit, configured to obtain a video encoding format, a video encoding resolution, and an audio encoding format from the server before the plurality of first video streams and the plurality of first audio streams are obtained; and an encoding unit, configured to encode an original video stream and an original audio stream according to the video encoding format, the video encoding resolution, and the audio encoding format to obtain the plurality of first video streams and the plurality of first audio streams.
  • An audio and video forwarding apparatus is also provided, as shown in FIG. 17, including:
  • a first receiving unit 1702, configured to receive, during a session between the first sending terminal and the first receiving terminal, a first target video stream and a first target audio stream sent by the server, wherein the first target video stream is a video stream determined by the server from a plurality of first video streams, the first target audio stream is an audio stream determined from a plurality of first audio streams, the first target video stream is the video stream with the highest resolution that the first receiving terminal supports playing among the plurality of first video streams, the first target audio stream is the audio stream with the best sound quality that the first receiving terminal supports playing among the plurality of first audio streams, different first video streams differ in format or resolution, the plurality of first video streams correspond to the same content, the plurality of first audio streams are audio streams sent by the first sending terminal to the server, different first audio streams differ in format, and the plurality of first audio streams correspond to the same content;
  • a first playing unit 1704, configured to play the first target video stream and the first target audio stream on the first receiving terminal.
  • In the process of forwarding the video of the first sending terminal, a plurality of first video streams with different formats and resolutions and a plurality of first audio streams with different formats, all sent by the first sending terminal, can be obtained, and which video stream and which audio stream to send to the first receiving terminal is decided according to the terminal parameters of the first receiving terminal and the network bandwidth used by the first receiving terminal. It can therefore be ensured that the first receiving terminal plays the best video stream and audio stream that it supports, which solves the problem that video cannot be played in a video conference and improves the stability of the video conference.
  • For other examples of this embodiment, refer to the examples above; they are not repeated here.
  • The apparatus further includes: a second receiving unit, configured to receive a third target video stream and a third target audio stream sent by the server, wherein the third target video stream is a video stream determined by the server from a plurality of second video streams, the third target audio stream is an audio stream determined from a plurality of second audio streams, the third target video stream is the video stream with the highest resolution that the first receiving terminal supports playing among the plurality of second video streams, the third target audio stream is the audio stream with the best sound quality that the first receiving terminal supports playing among the plurality of second audio streams, different second video streams differ in format or resolution, the plurality of second video streams correspond to the same content, the plurality of second audio streams are audio streams sent by the second sending terminal to the server, different second audio streams differ in format, and the plurality of second audio streams correspond to the same content; and a second playing unit, configured to cause the first receiving terminal to play the third target video stream and the third target audio stream.
  • The second playing unit includes: a first playing module, configured to cause the first receiving terminal to stop playing the first target video stream and to play the third target video stream.
  • The second playing unit includes: a second playing module, configured to cause the first receiving terminal to play the first target video stream in a first playback area and the third target video stream in a second playback area.
  • The apparatus further includes: a mixing unit, configured to mix a plurality of target audio streams into one audio stream when the first receiving terminal receives the plurality of target audio streams, wherein the target audio streams include the first target audio stream; and a third playing unit, configured to play the mixed audio stream.
  • The first playing unit includes: a synchronization module, configured to synchronize the first target video stream and the first target audio stream; and a third playing module, configured to play the synchronized first target video stream and first target audio stream.
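As an illustration of what the mixing unit described above might do, the sketch below mixes several already-decoded tracks by summing samples and clipping. The text does not say how mixing is performed; the assumptions here are that every target audio stream has been decoded to signed 16-bit PCM at the same sample rate and that a plain clipped sum is acceptable. The function name `mix_pcm16` is hypothetical.

```python
from typing import List, Sequence

PCM16_MIN, PCM16_MAX = -32768, 32767


def mix_pcm16(tracks: Sequence[Sequence[int]]) -> List[int]:
    """Mix decoded 16-bit PCM tracks into one track by clipped summation.

    Shorter tracks are treated as silent once they run out of samples.
    """
    if not tracks:
        return []
    length = max(len(track) for track in tracks)
    mixed = []
    for i in range(length):
        total = sum(track[i] for track in tracks if i < len(track))
        # Clamp to the 16-bit range so loud passages saturate instead of wrapping.
        mixed.append(max(PCM16_MIN, min(PCM16_MAX, total)))
    return mixed
```

A production mixer would more likely apply gain control before summing; the clipped sum is used here only because it is the simplest correct form.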
  • An audio and video forwarding system is also provided, as shown in FIG. 18, including:
  • a first obtaining unit 1802, configured to obtain, during a session between the first sending terminal and the first receiving terminal, a plurality of first video streams and a plurality of first audio streams sent by the first sending terminal, wherein different first video streams differ in format or resolution, the plurality of first video streams correspond to the same content, different first audio streams differ in format, and the plurality of first audio streams correspond to the same content;
  • a first determining unit 1804, configured to determine a first target video stream and a first target audio stream from the plurality of first video streams and the plurality of first audio streams according to the terminal parameters of the first receiving terminal and the network bandwidth used by the first receiving terminal, wherein the first target video stream is the video stream with the highest resolution that the first receiving terminal supports playing among the plurality of first video streams, and the first target audio stream is the audio stream with the best sound quality that the first receiving terminal supports playing among the plurality of first audio streams;
  • a first sending unit 1806, configured to send the first target video stream and the first target audio stream to the first receiving terminal.
  • In the process of forwarding the video of the first sending terminal, a plurality of first video streams with different formats and resolutions and a plurality of first audio streams with different formats, all sent by the first sending terminal, can be obtained, and which video stream and which audio stream to send to the first receiving terminal is decided according to the terminal parameters of the first receiving terminal and the network bandwidth used by the first receiving terminal. It can therefore be ensured that the first receiving terminal plays the best video stream and audio stream that it supports, which solves the problem that video cannot be played in a video conference and improves the stability of the video conference.
  • For other examples of this embodiment, refer to the examples above; they are not repeated here.
  • An audio and video forwarding terminal is also provided, as shown in FIG. 19, including:
  • an obtaining unit 1902, configured to obtain a plurality of first video streams and a plurality of first audio streams during a session between the first sending terminal and the first receiving terminal, wherein different first video streams differ in format or resolution, the plurality of first video streams correspond to the same content, different first audio streams differ in format, and the plurality of first audio streams correspond to the same content;
  • a sending unit 1904, configured to send the plurality of first video streams to the server, so that the server determines a first target video stream and a first target audio stream from the plurality of first video streams and the plurality of first audio streams and sends the first target video stream and the first target audio stream to the first receiving terminal, wherein the first target video stream is the video stream with the highest resolution that the first receiving terminal supports playing among the plurality of first video streams, and the first target audio stream is the audio stream with the best sound quality that the first receiving terminal supports playing among the plurality of first audio streams.
  • In the process of forwarding the video of the first sending terminal, a plurality of first video streams with different formats and resolutions and a plurality of first audio streams with different formats, all sent by the first sending terminal, can be obtained, and which video stream and which audio stream to send to the first receiving terminal is decided according to the terminal parameters of the first receiving terminal and the network bandwidth used by the first receiving terminal. It can therefore be ensured that the first receiving terminal plays the best video stream and audio stream that it supports, which solves the problem that video cannot be played in a video conference and improves the stability of the video conference.
  • For other examples of this embodiment, refer to the examples above; they are not repeated here.
  • An audio and video forwarding terminal is also provided, as shown in FIG. 20, including:
  • a first receiving unit 2002, configured to receive, during a session between the first sending terminal and the first receiving terminal, a first target video stream and a first target audio stream sent by the server, wherein the first target video stream is a video stream determined by the server from a plurality of first video streams, the first target audio stream is an audio stream determined from a plurality of first audio streams, the first target video stream is the video stream with the highest resolution that the first receiving terminal supports playing among the plurality of first video streams, the first target audio stream is the audio stream with the best sound quality that the first receiving terminal supports playing among the plurality of first audio streams, different first video streams differ in format or resolution, the plurality of first video streams correspond to the same content, the plurality of first audio streams are audio streams sent by the first sending terminal to the server, different first audio streams differ in format, and the plurality of first audio streams correspond to the same content;
  • a first playing unit 2004, configured to play the first target video stream and the first target audio stream on the first receiving terminal.
  • In the process of forwarding the video of the first sending terminal, a plurality of first video streams with different formats and resolutions and a plurality of first audio streams with different formats, all sent by the first sending terminal, can be obtained, and which video stream and which audio stream to send to the first receiving terminal is decided according to the terminal parameters of the first receiving terminal and the network bandwidth used by the first receiving terminal. It can therefore be ensured that the first receiving terminal plays the best video stream and audio stream that it supports, which solves the problem that video cannot be played in a video conference and improves the stability of the video conference.
  • For other examples of this embodiment, refer to the examples above; they are not repeated here.
  • Embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, wherein the computer program is configured to execute the steps in any one of the above method embodiments when running.
  • The computer-readable storage medium may include, but is not limited to, various media that can store a computer program, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
  • Embodiments of the present application further provide an electronic device, including a memory and a processor, where a computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps in any one of the above method embodiments.
  • The electronic device may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
  • In the process of forwarding the video of the first sending terminal, a plurality of first video streams with different formats and resolutions and a plurality of first audio streams with different formats, all sent by the first sending terminal, can be obtained, and which video stream and which audio stream to send to the first receiving terminal is decided according to the terminal parameters of the first receiving terminal and the network bandwidth used by the first receiving terminal. It can therefore be ensured that the first receiving terminal plays the best video stream and audio stream that it supports, which solves the problem that video cannot be played in a video conference and improves the stability of the video conference.
  • Each module or step of the above embodiments of the present application can be implemented by a general-purpose computing device. The modules or steps can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. They can be implemented in program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases, the steps shown or described can be performed in an order different from the one given here; alternatively, the modules or steps can be fabricated separately as individual integrated circuit modules, or several of them can be fabricated as a single integrated circuit module. As such, the present application is not limited to any particular combination of hardware and software.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Graphics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

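An audio and video forwarding method, apparatus, terminal, and system. The method includes: during a session between a first sending terminal and a first receiving terminal, obtaining a plurality of first video streams and a plurality of first audio streams sent by the first sending terminal (S302); determining a first target video stream and a first target audio stream from the plurality of first video streams and the plurality of first audio streams according to the terminal parameters of the first receiving terminal and the network bandwidth used by the first receiving terminal (S304), wherein the first target video stream is the video stream with the highest resolution that the first receiving terminal supports playing among the plurality of first video streams, and the first target audio stream is the audio stream with the best sound quality that the first receiving terminal supports playing among the plurality of first audio streams; and sending the first target video stream and the first target audio stream to the first receiving terminal (S306).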

Description

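Audio and Video Forwarding Method, Apparatus, Terminal, and System
Cross-Reference to Related Applications
This application is based on, and claims priority to, Chinese patent application No. 202011257786.7, filed on November 11, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present application relate to the field of communications, and in particular to an audio and video forwarding method, apparatus, terminal, and system.
Background
In some cases, a video conference involves multiple parties, and a forwarding server needs to forward the video content of each party to the other participants. Because the participants use different hardware, the video may come in several different formats. The forwarding server needs to forward audio and video of different formats or resolutions to different participants. In this process, a participant may not support a given video format, or a participant's network may be too unstable to receive high-resolution video. As a result, the participant cannot play the video during the conference, which affects the stability of the conference.
Summary
In view of this, embodiments of the present application provide an audio and video forwarding method, apparatus, terminal, and system.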
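According to one embodiment of the present application, an audio and video forwarding method is provided, including: during a session between a first sending terminal and a first receiving terminal, obtaining a plurality of first video streams and a plurality of first audio streams sent by the first sending terminal, wherein different first video streams differ in format or resolution, the plurality of first video streams correspond to the same content, different first audio streams differ in format, and the plurality of first audio streams correspond to the same content; determining a first target video stream and a first target audio stream from the plurality of first video streams and the plurality of first audio streams according to the terminal parameters of the first receiving terminal and the network bandwidth used by the first receiving terminal, wherein the first target video stream is the video stream with the highest resolution that the first receiving terminal supports playing among the plurality of first video streams, and the first target audio stream is the audio stream with the best sound quality that the first receiving terminal supports playing among the plurality of first audio streams; and sending the first target video stream and the first target audio stream to the first receiving terminal.
According to another embodiment of the present application, an audio and video forwarding method is provided, including: during a session between a first sending terminal and a first receiving terminal, obtaining a plurality of first video streams and a plurality of first audio streams, wherein different first video streams differ in format or resolution, the plurality of first video streams correspond to the same content, different first audio streams differ in format, and the plurality of first audio streams correspond to the same content; and sending the plurality of first video streams to a server, so that the server determines a first target video stream and a first target audio stream from the plurality of first video streams and the plurality of first audio streams and sends the first target video stream and the first target audio stream to the first receiving terminal, wherein the first target video stream is the video stream with the highest resolution that the first receiving terminal supports playing among the plurality of first video streams, and the first target audio stream is the audio stream with the best sound quality that the first receiving terminal supports playing among the plurality of first audio streams.
According to another embodiment of the present application, an audio and video forwarding method is provided, including: during a session between a first sending terminal and a first receiving terminal, receiving, by the first receiving terminal, a first target video stream and a first target audio stream sent by a server, wherein the first target video stream is a video stream determined by the server from a plurality of first video streams, the first target audio stream is an audio stream determined from a plurality of first audio streams, the first target video stream is the video stream with the highest resolution that the first receiving terminal supports playing among the plurality of first video streams, the first target audio stream is the audio stream with the best sound quality that the first receiving terminal supports playing among the plurality of first audio streams, different first video streams differ in format or resolution, the plurality of first video streams correspond to the same content, the plurality of first audio streams are audio streams sent by the first sending terminal to the server, different first audio streams differ in format, and the plurality of first audio streams correspond to the same content; and playing, by the first receiving terminal, the first target video stream and the first target audio stream.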
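According to another embodiment of the present application, an audio and video forwarding apparatus is provided, including: a first obtaining unit, configured to obtain, during a session between a first sending terminal and a first receiving terminal, a plurality of first video streams and a plurality of first audio streams sent by the first sending terminal, wherein different first video streams differ in format or resolution, the plurality of first video streams correspond to the same content, different first audio streams differ in format, and the plurality of first audio streams correspond to the same content; a first determining unit, configured to determine a first target video stream and a first target audio stream from the plurality of first video streams and the plurality of first audio streams according to the terminal parameters of the first receiving terminal and the network bandwidth used by the first receiving terminal, wherein the first target video stream is the video stream with the highest resolution that the first receiving terminal supports playing among the plurality of first video streams, and the first target audio stream is the audio stream with the best sound quality that the first receiving terminal supports playing among the plurality of first audio streams; and a first sending unit, configured to send the first target video stream and the first target audio stream to the first receiving terminal.
According to another embodiment of the present application, an audio and video forwarding apparatus is provided, including: an obtaining unit, configured to obtain a plurality of first video streams and a plurality of first audio streams during a session between a first sending terminal and a first receiving terminal, wherein different first video streams differ in format or resolution, the plurality of first video streams correspond to the same content, different first audio streams differ in format, and the plurality of first audio streams correspond to the same content; and a sending unit, configured to send the plurality of first video streams to a server, so that the server determines a first target video stream and a first target audio stream from the plurality of first video streams and the plurality of first audio streams and sends the first target video stream and the first target audio stream to the first receiving terminal, wherein the first target video stream is the video stream with the highest resolution that the first receiving terminal supports playing among the plurality of first video streams, and the first target audio stream is the audio stream with the best sound quality that the first receiving terminal supports playing among the plurality of first audio streams.
According to another embodiment of the present application, an audio and video forwarding apparatus is provided, including: a first receiving unit, configured to receive, during a session between a first sending terminal and a first receiving terminal, a first target video stream and a first target audio stream sent by a server, wherein the first target video stream is a video stream determined by the server from a plurality of first video streams, the first target audio stream is an audio stream determined from a plurality of first audio streams, the first target video stream is the video stream with the highest resolution that the first receiving terminal supports playing among the plurality of first video streams, the first target audio stream is the audio stream with the best sound quality that the first receiving terminal supports playing among the plurality of first audio streams, different first video streams differ in format or resolution, the plurality of first video streams correspond to the same content, the plurality of first audio streams are audio streams sent by the first sending terminal to the server, different first audio streams differ in format, and the plurality of first audio streams correspond to the same content; and a first playing unit, configured to cause the first receiving terminal to play the first target video stream and the first target audio stream.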
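According to another embodiment of the present application, an audio and video forwarding system is provided, including: a first obtaining unit, configured to obtain, during a session between a first sending terminal and a first receiving terminal, a plurality of first video streams and a plurality of first audio streams sent by the first sending terminal, wherein different first video streams differ in format or resolution, the plurality of first video streams correspond to the same content, different first audio streams differ in format, and the plurality of first audio streams correspond to the same content; a first determining unit, configured to determine a first target video stream and a first target audio stream from the plurality of first video streams and the plurality of first audio streams according to the terminal parameters of the first receiving terminal and the network bandwidth used by the first receiving terminal, wherein the first target video stream is the video stream with the highest resolution that the first receiving terminal supports playing among the plurality of first video streams, and the first target audio stream is the audio stream with the best sound quality that the first receiving terminal supports playing among the plurality of first audio streams; and a first sending unit, configured to send the first target video stream and the first target audio stream to the first receiving terminal.
According to another embodiment of the present application, an audio and video forwarding terminal is provided, including: an obtaining unit, configured to obtain a plurality of first video streams and a plurality of first audio streams during a session between a first sending terminal and a first receiving terminal, wherein different first video streams differ in format or resolution, the plurality of first video streams correspond to the same content, different first audio streams differ in format, and the plurality of first audio streams correspond to the same content; and a sending unit, configured to send the plurality of first video streams to a server, so that the server determines a first target video stream and a first target audio stream from the plurality of first video streams and the plurality of first audio streams and sends the first target video stream and the first target audio stream to the first receiving terminal, wherein the first target video stream is the video stream with the highest resolution that the first receiving terminal supports playing among the plurality of first video streams, and the first target audio stream is the audio stream with the best sound quality that the first receiving terminal supports playing among the plurality of first audio streams.
According to another embodiment of the present application, an audio and video forwarding terminal is provided, including: a first receiving unit, configured to receive, during a session between a first sending terminal and a first receiving terminal, a first target video stream and a first target audio stream sent by a server, wherein the first target video stream is a video stream determined by the server from a plurality of first video streams, the first target audio stream is an audio stream determined from a plurality of first audio streams, the first target video stream is the video stream with the highest resolution that the first receiving terminal supports playing among the plurality of first video streams, the first target audio stream is the audio stream with the best sound quality that the first receiving terminal supports playing among the plurality of first audio streams, different first video streams differ in format or resolution, the plurality of first video streams correspond to the same content, the plurality of first audio streams are audio streams sent by the first sending terminal to the server, different first audio streams differ in format, and the plurality of first audio streams correspond to the same content; and a first playing unit, configured to cause the first receiving terminal to play the first target video stream and the first target audio stream.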
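According to yet another embodiment of the present application, a computer-readable storage medium is further provided. A computer program is stored in the computer-readable storage medium, and the computer program is configured to execute, when run, the steps in any one of the above method embodiments.
According to yet another embodiment of the present application, an electronic device is further provided, including a memory and a processor. A computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps in any one of the above method embodiments.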
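Brief Description of the Drawings
FIG. 1 is a block diagram of the hardware structure of a mobile terminal for an audio and video forwarding method according to an embodiment of the present application;
FIG. 2 is a network architecture diagram of an audio and video forwarding method according to an embodiment of the present application;
FIG. 3 is a flowchart of an audio and video forwarding method according to an embodiment of the present application;
FIG. 4 is a structural block diagram of the terminal uplink module of an audio and video forwarding method according to an embodiment of the present application;
FIG. 5 is a structural block diagram of the terminal downlink module of an audio and video forwarding method according to an embodiment of the present application;
FIG. 6 is a structural block diagram of the conference media forwarding server of an audio and video forwarding method according to an embodiment of the present application;
FIG. 7 is a block diagram of the model in which terminal bitstreams are sent to the conference media forwarding server for forwarding, in an audio and video forwarding method according to an embodiment of the present application;
FIG. 8 is a block diagram of the model in which the conference media forwarding server forwards bitstreams to terminals, in an audio and video forwarding method according to an embodiment of the present application;
FIG. 9 is a flowchart of sorting and forwarding by terminal audio volume value in an audio and video forwarding method according to an embodiment of the present application;
FIG. 10 is a flowchart of the conference media forwarding server actively pushing the preferred audio stream in an audio and video forwarding method according to an embodiment of the present application;
FIG. 11 is a flowchart of the conference media forwarding server actively pushing the preferred video stream in an audio and video forwarding method according to an embodiment of the present application;
FIG. 12 is a flowchart of media source switching and forwarding in an audio and video forwarding method according to an embodiment of the present application;
FIG. 13 is a flowchart of another audio and video forwarding method according to an embodiment of the present application;
FIG. 14 is a flowchart of yet another audio and video forwarding method according to an embodiment of the present application;
FIG. 15 is a structural block diagram of an audio and video forwarding apparatus according to an embodiment of the present application;
FIG. 16 is a structural block diagram of another audio and video forwarding apparatus according to an embodiment of the present application;
FIG. 17 is a structural block diagram of yet another audio and video forwarding apparatus according to an embodiment of the present application;
FIG. 18 is a structural block diagram of an audio and video forwarding system according to an embodiment of the present application;
FIG. 19 is a structural block diagram of an audio and video forwarding terminal according to an embodiment of the present application;
FIG. 20 is a structural block diagram of another audio and video forwarding terminal according to an embodiment of the present application.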
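Detailed Description
Embodiments of the present application are described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings of the present application are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence.
The method embodiments provided in the embodiments of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Taking execution on a mobile terminal as an example, FIG. 1 is a block diagram of the hardware structure of a mobile terminal for an audio and video forwarding method according to an embodiment of the present application. As shown in FIG. 1, the mobile terminal may include one or more processors 102 (only one is shown in FIG. 1; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing data. The mobile terminal may further include a transmission device 106 for communication functions and an input/output device 108. A person of ordinary skill in the art will understand that the structure shown in FIG. 1 is only illustrative and does not limit the structure of the mobile terminal. For example, the mobile terminal may include more or fewer components than shown in FIG. 1, or have a configuration different from that shown in FIG. 1.
The memory 104 may be used to store computer programs, for example software programs and modules of application software, such as the computer program corresponding to the audio and video forwarding method in the embodiments of the present application. The processor 102 runs the computer program stored in the memory 104 to execute various functional applications and data processing, that is, to implement the above method. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may include memory located remotely from the processor 102, and such remote memory may be connected to the mobile terminal through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The transmission device 106 is used to receive or send data via a network. A specific example of the network may include a wireless network provided by the communication provider of the mobile terminal. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices through a base station so as to communicate with the Internet. In one example, the transmission device 106 may be a radio frequency (RF) module, which is used to communicate with the Internet wirelessly.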
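The embodiments of the present application may run on the network architecture shown in Fig. 2. As shown in Fig. 2, the network architecture includes a first sending terminal, a forwarding server and a first receiving terminal. The first sending terminal encodes the video into different formats or resolutions and encodes the audio into different formats, and then sends the resulting multiple first video streams (differing in format or resolution) and multiple first audio streams (differing in format) to the forwarding server. The forwarding server obtains the terminal parameters of the first receiving terminal and the parameters of the network that terminal uses, then determines, from the multiple first video streams, the first target video stream with the highest resolution supported by the first receiving terminal and, from the multiple first audio streams, the first target audio stream with the best sound quality. The forwarding server forwards the first target video stream and the first target audio stream to the first receiving terminal.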
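This embodiment provides an audio/video forwarding method that runs on the above network architecture. Fig. 3 is a flowchart of the audio/video forwarding method according to an embodiment of the present application. As shown in Fig. 3, the flow includes the following steps: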
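S302: during a session between a first sending terminal and a first receiving terminal, acquire multiple first video streams and multiple first audio streams sent by the first sending terminal, where different first video streams differ in format or resolution, the multiple first video streams correspond to the same content, different first audio streams differ in format, and the multiple first audio streams correspond to the same content;
S304: according to the terminal parameters of the first receiving terminal and the network bandwidth used by the first receiving terminal, determine a first target video stream and a first target audio stream from the multiple first video streams and the multiple first audio streams, where the first target video stream is the highest-resolution video stream among the multiple first video streams that the first receiving terminal supports playing, and the first target audio stream is the audio stream with the best sound quality among the multiple first audio streams that the first receiving terminal supports playing;
S306: send the first target video stream and the first target audio stream to the first receiving terminal.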
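Steps S302 to S306 can be sketched as a small selection routine. This is a minimal illustration rather than the method as claimed: the names `VideoStream`, `AudioStream`, `Receiver` and `select_streams` are hypothetical, the per-stream bitrates and quality scores are assumed inputs, and reserving bandwidth for audio before choosing video is an assumption that the source does not state.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence, Tuple


@dataclass(frozen=True)
class VideoStream:
    codec: str         # e.g. "H264" or "H265"
    height: int        # vertical resolution in pixels, e.g. 720
    bitrate_kbps: int  # assumed bitrate of this encoding


@dataclass(frozen=True)
class AudioStream:
    codec: str
    quality_score: float  # assumed sound-quality score, higher is better
    bitrate_kbps: int


@dataclass(frozen=True)
class Receiver:
    video_codecs: FrozenSet[str]  # terminal parameter: decodable video formats
    audio_codecs: FrozenSet[str]  # terminal parameter: decodable audio formats
    max_height: int               # terminal parameter: highest playable resolution
    bandwidth_kbps: int           # network bandwidth available to the receiver


def select_streams(
    videos: Sequence[VideoStream],
    audios: Sequence[AudioStream],
    rx: Receiver,
) -> Tuple[Optional[VideoStream], Optional[AudioStream]]:
    """Pick the best audio and highest-resolution video the receiver can play."""
    # Assumption: audio is chosen first and video uses the remaining bandwidth.
    audio_ok = [a for a in audios
                if a.codec in rx.audio_codecs and a.bitrate_kbps <= rx.bandwidth_kbps]
    audio = max(audio_ok, key=lambda a: a.quality_score, default=None)
    remaining = rx.bandwidth_kbps - (audio.bitrate_kbps if audio else 0)
    video_ok = [v for v in videos
                if v.codec in rx.video_codecs
                and v.height <= rx.max_height
                and v.bitrate_kbps <= remaining]
    video = max(video_ok, key=lambda v: v.height, default=None)
    return video, audio
```

With a receiver that decodes only H264 and G711 and has about 2 Mbit/s available, the routine returns the 720-line H264 stream and the G711 stream, and it returns `None` for a stream type when nothing fits.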
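With the present application, in the course of forwarding the video of the first sending terminal, the multiple first video streams of different formats and resolutions and the multiple first audio streams of different formats sent by the first sending terminal can be acquired, and which video stream and which audio stream to send to the first receiving terminal is decided according to the terminal parameters of the first receiving terminal and the network bandwidth it uses. This ensures that the first receiving terminal plays the best video stream and audio stream it can support, which solves the problem that video cannot be played in a video conference and improves the stability of video conferences.
The above steps may be executed by the forwarding server.
The above method may be applied to multi-party cloud conferences. In a multi-party cloud conference, every participant can act as a first sending terminal that sends video and audio streams, and at the same time every participant can act as a first receiving terminal that receives video and audio streams. The server forwards the data. During the conference, new participants may join midway, and one or more participants may leave midway and rejoin later.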
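The following description uses examples. The audio/video forwarding method in the embodiments of the present application may adopt an SFU (Selective Forwarding Unit) architecture. The terminal sends the audio/video media capabilities it supports (including supported formats, resolutions, terminal parameters and network parameters) to the media forwarding system (the forwarding server), and the media forwarding system selects the best-quality stream to forward to the terminal according to the terminal's bandwidth and capabilities. The terminal is capable of adaptive decoding and of encoding in multiple formats, so the media forwarding system only has to choose among the capabilities the terminal supports. In large-capacity conferences, the embodiments of the present application can switch media sources quickly and smoothly, for example switching the currently viewed video from one participant to another.
A terminal in the embodiments of the present application may be a terminal that sends data or a terminal that receives data. Note that a terminal may have both the capabilities of the first sending terminal (or the second sending terminal or other sending terminals) and the capabilities of the first receiving terminal (or the second receiving terminal or other receiving terminals). In a multi-party conference, a terminal needs to send its own video to the other terminals and also needs to obtain the other terminals' video content from them.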
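Sound quality in the embodiments of the present application may be scored on aspects such as timbre, sound field, layering, localization, transparency, resolving power, overall balance, imaging and sense of body, and the weighted sum of these scores is the sound quality score. The higher the score, the better the sound quality.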
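The weighted sum described above can be written out directly. The sketch below is illustrative only: the function name `audio_quality_score`, the aspect keys and the example weights are assumptions, since the source does not give concrete weights or a scoring scale.

```python
from typing import Mapping


def audio_quality_score(scores: Mapping[str, float],
                        weights: Mapping[str, float]) -> float:
    """Return the weighted sum of per-aspect scores.

    Every aspect that has a weight must also have a score; aspects without
    a weight are ignored.
    """
    missing = [aspect for aspect in weights if aspect not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    return sum(weights[aspect] * scores[aspect] for aspect in weights)
```

For example, with hypothetical weights of 0.5 for `timbre` and 0.5 for `resolution` and scores of 8 and 6, the result is 7.0, and a format scoring higher on this sum would be treated as having better sound quality.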
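As an embodiment, the terminal includes an uplink module and a downlink module. As shown in Fig. 4, the uplink module includes:
A capture module, used to capture raw audio and video data.
An audio encoding module, used to encode the audio data into multiple specified audio formats.
A video encoding module, used to encode the video data into multiple specified video formats.
A packing and sending module, which packs the multiple audio streams into one multiplexed stream sent through one port, and packs the multiple video streams into one multiplexed stream sent through one port.
An encoding control module, used to accept control from the conference media forwarding server and notify the audio and video encoding modules to start encoding in the required audio/video formats; for video, the stream profile is also changed. A stream profile mainly comprises information such as resolution, bitrate and frame rate.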
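The idea of packing several encoded streams into one multiplexed stream on a single port, and splitting them again on the receiving side, can be sketched with a toy framing. The framing below (a 1-byte stream identifier plus a 2-byte length before each payload) and the names `mux` and `demux` are purely hypothetical; the source does not specify a wire format, and a real deployment would more likely distinguish streams by RTP fields.

```python
import struct
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical frame header: stream id (1 byte) + payload length (2 bytes, big-endian).
_HEADER = struct.Struct("!BH")


def mux(packets: Iterable[Tuple[int, bytes]]) -> bytes:
    """Concatenate (stream_id, payload) packets into one byte stream."""
    out = bytearray()
    for stream_id, payload in packets:
        out += _HEADER.pack(stream_id, len(payload))
        out += payload
    return bytes(out)


def demux(data: bytes) -> Dict[int, List[bytes]]:
    """Split a multiplexed byte stream back into per-stream payload lists."""
    streams: Dict[int, List[bytes]] = defaultdict(list)
    offset = 0
    while offset < len(data):
        if offset + _HEADER.size > len(data):
            raise ValueError("truncated header")
        stream_id, length = _HEADER.unpack_from(data, offset)
        offset += _HEADER.size
        payload = data[offset:offset + length]
        if len(payload) != length:
            raise ValueError("truncated payload")
        streams[stream_id].append(payload)
        offset += length
    return dict(streams)
```

Here stream id 1 might stand for the G711 encoding and stream id 2 for the EVS encoding of the same captured audio; the receiver recovers each encoding's packets in order.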
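As shown in Fig. 5, the downlink module includes:
A data receiving module, used to receive audio/video data, split the audio data into streams of multiple formats, and split the video data into streams of multiple formats and multiple resolutions.
A decoding adaptation module, used to check the format of each audio stream and start audio decoding according to the detected format, and to check the format of each video stream and start video decoding according to the detected format.
An audio decoding module, used to decode audio streams into linear PCM; it supports decoding multiple formats at the same time.
A video decoding module, used to decode video streams into YUV data; it supports decoding multiple formats and resolutions at the same time.
An output module, which mixes the decoded audio and outputs it to the sound card, lays out the video data and outputs it to the graphics card, and performs audio/video synchronization.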
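As shown in Fig. 6, the conference media forwarding structure (server) includes:
A stream receiving module, which receives the audio/video streams sent by terminals and demultiplexes the multiplexed streams.
A stream sending module, which multiplexes the streams of multiple terminals and sends them to a terminal.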
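The audio forwarding module includes:
A volume value acquisition module, which extracts the volume value from the Real-time Transport Protocol (RTP) extension of the selected terminal's stream; for compatibility with legacy terminals, the volume value can instead be obtained by decoding.
A volume value sorting module, which sorts all terminals participating in the conference by the acquired volume values, selects the three loudest terminals, and sends their audio to the corresponding terminals.
An audio stream selection module, which, based on network bandwidth prediction and terminal audio capability and according to a configured audio selection policy, selects the optimal audio-format stream so that the terminal is sent the stream with the best sound quality.
An audio stream control module, which, according to the selected audio format, requests the terminal to start or stop sending the stream of a specified audio format.
The video forwarding module includes:
A bitrate adaptation module, which adjusts the bitrate according to a configured bitrate policy, based on network bandwidth prediction and terminal video capability, and hands over to the video stream selection module in scenarios where the bitrate cannot meet the requirements.
A video stream selection module, which reselects the best encoding format and the best stream profile according to network bandwidth prediction and terminal video capability.
A video stream control module, which, according to the selected video format and stream profile, requests the terminal to start or stop sending a specified stream.
A video stream switching module, used in conference scenarios where everyone watches the same broadcast source (one or more terminals), to switch the broadcast source quickly.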
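Before or during a conference, the video terminal and the conference media forwarding server negotiate multiple audio/video codec capabilities, and the conference media forwarding server selects the best audio/video format and stream profile to send to the video terminal according to the negotiated terminal capabilities and the network conditions. The video terminal accepts control from the conference media forwarding server and generates streams in different audio/video formats and different profiles. The generated audio-format streams and video-format streams are multiplexed separately and sent to the conference media forwarding server, which demultiplexes them. The media forwarding server then sorts the audio streams by volume value and forwards the streams of the 3 loudest terminals to the terminals that need them; it forwards the video streams to the terminals that need them according to codec type and profile information. A receiving terminal receives multiple audio streams of different formats, decodes them adaptively and mixes the sound; it receives multiple video streams of different formats or stream profiles, decodes them adaptively and lays out the pictures.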
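Volume sorting on the conference media forwarding server: when sending an audio stream, a terminal measures the volume value and places it in the RTP extension, and the conference media forwarding server obtains the volume value from the extension when parsing RTP. For legacy terminals, whose RTP extension carries no volume value, the server has to decode the audio to obtain the volume value; in some embodiments, to reduce the performance cost of decoding, one audio packet may be decoded at intervals (for example, every 1 second) to obtain the volume value.
Media switching on the conference media forwarding server: the conference media server configures the input sources and output destinations of a switcher, and there may be multiple of each. Received terminal streams are fed into the switcher as input sources, and the switcher sends the input source's stream to all output destination terminals. When the broadcast source needs to change, only the input source has to be switched. For smoother switching, the switch to the new source is generally made only after an I-frame from the new input source has been received.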
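The switcher behaviour, where the broadcast source changes only once an I-frame from the new source arrives, can be sketched as a small state machine. This is an illustrative sketch under assumptions: the class name `BroadcastSwitcher`, its methods, and the `is_keyframe` flag supplied by the caller are all hypothetical, and how a real server detects I-frames is outside this example.

```python
from typing import List, Optional, Sequence, Tuple


class BroadcastSwitcher:
    """Forward one active source to all outputs; switch sources on a keyframe."""

    def __init__(self, source: str, outputs: Sequence[str]) -> None:
        self.active = source
        self.pending: Optional[str] = None
        self.outputs = list(outputs)

    def request_switch(self, new_source: str) -> None:
        # The switch is deferred until the new source delivers an I-frame.
        self.pending = new_source if new_source != self.active else None

    def on_packet(self, source: str, payload: bytes,
                  is_keyframe: bool) -> List[Tuple[str, bytes]]:
        """Return (destination, payload) pairs to send for this packet."""
        if self.pending is not None and source == self.pending and is_keyframe:
            self.active, self.pending = self.pending, None
        if source != self.active:
            return []  # packets from non-active sources are not broadcast
        return [(dst, payload) for dst in self.outputs]
```

Until the new source produces a keyframe, viewers keep receiving the old source, so decoders never start on a frame they cannot decode on its own.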
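In Fig. 4, the terminal uplink module includes a capture module, an audio encoding module, a video encoding module, an encoding control module, and a packing and sending module. The capture module captures the terminal device's audio and video and passes them to the audio encoding module and the video encoding module respectively for encoding. Which formats and stream profiles to start is determined by the encoding control module, which accepts control from the conference media forwarding server. The encoded streams are passed to the packing and sending module, which packetizes the audio and video streams, multiplexes the audio streams and the video streams separately, and sends them to the conference media forwarding server.
In Fig. 5, the terminal downlink module includes a data receiving module, a decoding adaptation module, an audio decoding module, a video decoding module and an output module. The data receiving module receives the streams from the conference media forwarding server and demultiplexes the audio streams and the video streams separately. The audio and video formats are then identified adaptively, and the audio decoding module and the video decoding module are notified to decode. The decoded data is sent to the output module, which mixes the decoded audio, arranges the decoded video into a multi-picture layout, and then presents audio and video in sync.
In Fig. 6, the conference media forwarding server includes a stream receiving module, an audio forwarding module, a video forwarding module and a stream sending module. The stream receiving module receives the terminals' uplink streams, demultiplexes the audio streams and video streams, and parses them into streams of different formats and different stream profiles; the parsed audio streams are sent to the audio forwarding module and the parsed video streams to the video forwarding module. The audio forwarding module obtains the volume value of each received audio stream, sorts the volume values of all streams in the conference, obtains the 3 loudest terminals, and sends their streams to the stream sending module; based on feedback from the terminal's receiving module and the network conditions, it selects the best audio for that terminal and notifies the source terminal through the audio stream control module. The video forwarding module either forwards a received video stream to the stream switching module, which then sends it to the stream sending module, or sends it to the stream sending module directly; based on feedback from the terminal's receiving module and the network conditions, it selects the best video format and stream profile for that terminal and notifies the source terminal through the video stream control module. The stream sending module multiplexes the streams of different formats and different stream profiles and sends them to the terminal.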
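The mixing step performed by the output module can be illustrated on decoded linear PCM samples. This is a minimal sketch assuming 16-bit signed samples at a common sample rate; the function name `mix_pcm16` is hypothetical, and summing with hard clipping is just one simple mixing strategy, not necessarily the one the terminal uses.

```python
from typing import List, Sequence


def mix_pcm16(streams: Sequence[Sequence[int]]) -> List[int]:
    """Mix decoded 16-bit PCM streams by summing samples and clipping.

    Shorter streams are treated as silent once they run out of samples.
    """
    if not streams:
        return []
    length = max(len(s) for s in streams)
    mixed = []
    for i in range(length):
        total = sum(s[i] for s in streams if i < len(s))
        # Clip to the signed 16-bit range to avoid wrap-around distortion.
        mixed.append(max(-32768, min(32767, total)))
    return mixed
```

Because the server forwards at most a few loudest speakers to each terminal, the number of streams summed here stays small.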
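Fig. 7 shows the model in which a terminal's streams are sent to the conference media forwarding server for forwarding; it includes 4 video terminals and 1 conference media forwarding server. On UE1, audio is encoded into 2 audio streams in G711 and EVS formats, and video is encoded into H264 180P, H264 720P, H265 1080P and H265 4K profile video streams, which are sent to the conference media forwarding server. The conference media forwarding server forwards streams according to terminal capabilities and network conditions: the audio forwarding module forwards the G711 stream to UE2 and the EVS stream to UE3 and UE4; the video forwarding module forwards H264 720P to UE2 and H265 4K to UE3 and UE4. UE2, UE3 and UE4 adaptively decode the audio and video streams.
Fig. 8 shows the model in which the conference media forwarding server forwards streams to a terminal; it includes 4 video terminals and 1 conference media forwarding server. On UE1, audio is encoded into EVS and G711 audio streams, and video is encoded into H265 4K, H264 180P and H264 720P profile video streams. On UE2, audio is encoded into G711 and AMR audio streams, and video is encoded into H264 180P, H264 360P and H264 1080P profile video streams. On UE3, audio is encoded into G711 and EVS audio streams, and video is encoded into H264 180P, H264 1080P and H265 4K profile video streams. The conference media forwarding server forwards streams according to terminal capabilities and network conditions: the audio forwarding module forwards UE1's EVS stream, UE2's G711 stream and UE3's G711 stream to UE4; the video forwarding module forwards UE1's H265 4K stream, UE2's H264 180P stream and UE3's H264 180P stream to UE4. UE4 adaptively decodes the audio and video streams, then mixes the audio and displays the video as multiple pictures.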
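Fig. 9 shows the flow of forwarding terminal audio sorted by volume value; it includes 6 video terminals and 1 conference media forwarding server. UE1, UE3 and UE5 carry the volume value in the RTP extension, while UE2, UE4 and UE6 do not. The audio forwarding module receives the audio streams of the 6 terminals, obtains the volume values of UE1, UE3 and UE5 from RTP, and obtains the volume values of UE2, UE4 and UE6 by decoding their streams. After the volume values of the 6 terminals are sorted, the 4 loudest terminals, in descending order of volume, are UE2, UE4, UE3 and UE1. Each of UE1 to UE4 is forwarded the 3 loudest terminals other than itself; that is, even if the audio stream a terminal sends has the largest volume value, the terminal does not receive that stream, because it sent the stream itself. UE2, UE3 and UE4 are forwarded to UE5 and UE6.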
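The routing rule in the Fig. 9 example (each terminal receives the loudest three other terminals) can be sketched as follows. The function name `route_loudest` and the numeric volume values are made up for illustration, and the sketch assumes that a larger value means a louder signal.

```python
from typing import Dict, List, Mapping


def route_loudest(volumes: Mapping[str, float], n: int = 3) -> Dict[str, List[str]]:
    """For each receiver, list the n loudest senders, excluding the receiver itself."""
    # Assumption: a larger volume value means a louder terminal.
    ranked = sorted(volumes, key=lambda ue: volumes[ue], reverse=True)
    return {rx: [ue for ue in ranked if ue != rx][:n] for rx in volumes}


# Hypothetical volume values that reproduce the ordering UE2 > UE4 > UE3 > UE1.
example = {"UE1": 60, "UE2": 90, "UE3": 70, "UE4": 80, "UE5": 20, "UE6": 10}
routes = route_loudest(example)
```

With these values, UE2 receives UE4, UE3 and UE1 rather than its own stream, while UE5 and UE6 receive UE2, UE4 and UE3, matching the example in the text.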
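Fig. 10 shows the flow in which the conference media forwarding server proactively pushes the preferred audio stream. Each terminal negotiates the audio media capabilities it supports and then joins the conference. The conference media forwarding server obtains the 3 loudest terminals, then, according to terminal capabilities and network conditions, selects the best audio media capability for those 3 loudest terminals and forwards the streams to the terminal.
Fig. 11 shows the flow in which the conference media forwarding server proactively pushes the preferred video stream. After terminals join the conference, the conference media forwarding server proactively pushes default pictures to each terminal in the order of joining. It then selects the best video stream for each terminal according to that terminal's capabilities and network conditions and forwards it. The number of pushed pictures changes with the number of terminals in the conference, and the best stream to push changes accordingly.
Fig. 12 shows the flow of media source switching and forwarding; it includes 4 video terminals (UEA to UED), 1 service control entity and 1 conference media forwarding server. After UEA and UEB join the conference, service control uses the HTTP REST interface to instruct the conference media forwarding server to receive their streams; the stream receiving modules for UEA and UEB each register a stream receive queue, and the terminals' uplink streams flow into the media stream pool. After UEC and UED join the conference, service control uses the HTTP REST interface to instruct the conference media forwarding server to send streams, and the stream sending modules for UEC and UED each register a stream receive queue. Service control uses the HTTP REST interface to instruct the switching module to receive UEA's stream and send it to UEC and UED; the switching module registers a receive queue to receive UEA's stream and registers send queues to deliver the stream to the receive queues of UEC and UED. After receiving the stream, the stream sending modules for UEC and UED forward it to UEC and UED. When the stream sent to UEC and UED needs to change from UEA's to UEB's, service control uses the HTTP REST interface to change the switching module's input from UEA to UEB and then notifies the stream receiving module to stop sending UEA's stream and start sending UEB's stream. Receive queue S1 receives A1 and B1, and send queue S2 sends A1 and B1.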
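From the description of the above implementations, a person skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or alternatively by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that makes a contribution in some cases, may be embodied as a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods of the embodiments of the present application.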
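In an exemplary embodiment, after the first target video stream and the first target audio stream are sent to the first receiving terminal, the method further includes: monitoring the terminal parameters of the first receiving terminal and the network bandwidth used by the first receiving terminal; and, when the network bandwidth or the terminal parameters change, re-determining the first target video stream and the first target audio stream.
In an exemplary embodiment, after the first target video stream and the first target audio stream are re-determined, the method further includes: sending the re-determined first target video stream and first target audio stream to the first receiving terminal.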
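The monitor-and-re-determine behaviour can be sketched as a thin wrapper that reruns a selection function only when the observed terminal parameters or bandwidth differ from the last observation. The class name `ReselectOnChange` and the shape of the `select` callback are assumptions made for this illustration; how the server samples bandwidth is not covered.

```python
from typing import Any, Callable, Hashable, Optional, Tuple


class ReselectOnChange:
    """Rerun stream selection only when terminal parameters or bandwidth change."""

    def __init__(self, select: Callable[[Hashable, int], Any]) -> None:
        self._select = select          # hypothetical selection callback
        self._last_key: Optional[Tuple[Hashable, int]] = None
        self._current: Any = None

    def update(self, terminal_params: Hashable, bandwidth_kbps: int) -> Tuple[Any, bool]:
        """Return (current selection, whether it was re-determined just now)."""
        key = (terminal_params, bandwidth_kbps)
        changed = key != self._last_key
        if changed:
            self._last_key = key
            self._current = self._select(terminal_params, bandwidth_kbps)
        return self._current, changed
```

A caller would send the new target streams to the receiving terminal only when the second element of the result is true. In practice some tolerance around the bandwidth value would probably be needed so that small fluctuations do not trigger constant reselection; that refinement is left out here.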
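In an exemplary embodiment, the method further includes: according to the network bandwidth used by a second receiving terminal and the terminal parameters of the second receiving terminal, determining a second target video stream and a second target audio stream from the multiple first video streams and the multiple first audio streams, where the second target video stream is the highest-resolution video stream among the multiple first video streams that the second receiving terminal supports playing, the second target audio stream is the audio stream with the best sound quality among the multiple first audio streams that the second receiving terminal supports playing, and the second receiving terminal is a terminal in the session with the first sending terminal and the first receiving terminal; and sending the second target video stream and the second target audio stream to the second receiving terminal.
In an exemplary embodiment, after the second target video stream and the second target audio stream are sent to the second receiving terminal, the method further includes: monitoring the network bandwidth used by the second receiving terminal and its terminal parameters; and, when the network bandwidth used by the second receiving terminal or its terminal parameters change, re-determining the second target video stream and the second target audio stream.
In an exemplary embodiment, after the second target video stream and the second target audio stream are re-determined, the method further includes: sending the re-determined second target video stream and second target audio stream to the second receiving terminal.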
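In an exemplary embodiment, the method further includes: acquiring multiple second video streams and multiple second audio streams sent by a second sending terminal, where different second video streams differ in format or resolution, the multiple second video streams correspond to the same content, different second audio streams differ in format, the multiple second audio streams correspond to the same content, and the second sending terminal is a terminal in the session with the first sending terminal and the first receiving terminal; according to the terminal parameters of the first receiving terminal and the network bandwidth used by the first receiving terminal, determining a third target video stream and a third target audio stream from the multiple second video streams and the multiple second audio streams, where the third target video stream is the highest-resolution video stream among the multiple second video streams that the first receiving terminal supports playing, and the third target audio stream is the audio stream with the best sound quality among the multiple second audio streams that the first receiving terminal supports playing; and sending the third target video stream and the third target audio stream to the first receiving terminal.
In an exemplary embodiment, after the third target video stream and the third target audio stream are sent to the first receiving terminal, the method further includes: monitoring the network bandwidth used by the first receiving terminal and its terminal parameters; and, when the network bandwidth used by the first receiving terminal or its terminal parameters change, re-determining the third target video stream and the third target audio stream.
In an exemplary embodiment, after the third target video stream and the third target audio stream are re-determined, the method further includes: sending the re-determined third target video stream and third target audio stream to the first receiving terminal.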
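In an exemplary embodiment, the method further includes: according to the network bandwidth used by the second receiving terminal and the terminal parameters of the second receiving terminal, determining a fourth target video stream and a fourth target audio stream from the multiple second video streams and the multiple second audio streams, where the fourth target video stream is the highest-resolution video stream among the multiple second video streams that the second receiving terminal supports playing, the fourth target audio stream is the audio stream with the best sound quality among the multiple second audio streams that the second receiving terminal supports playing, and the second receiving terminal is a terminal in the session with the first sending terminal, the first receiving terminal and the second sending terminal; and sending the fourth target video stream and the fourth target audio stream to the second receiving terminal.
In an exemplary embodiment, after the fourth target video stream and the fourth target audio stream are sent to the second receiving terminal, the method further includes: monitoring the network bandwidth used by the second receiving terminal and its terminal parameters; and, when the network bandwidth used by the second receiving terminal or its terminal parameters change, re-determining the fourth target video stream and the fourth target audio stream.
In an exemplary embodiment, after the fourth target video stream and the fourth target audio stream are re-determined, the method further includes: sending the re-determined fourth target video stream and fourth target audio stream to the second receiving terminal.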
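In an exemplary embodiment, the method further includes: when multiple target video streams and multiple target audio streams have been acquired, sending the multiple target video streams to each receiving terminal in the session, and sending the N loudest of the multiple target audio streams to each receiving terminal, where N is a positive integer determined according to the number of target audio streams, the target video streams include the first target video stream, the target audio streams include the first target audio stream, and the receiving terminals include the first receiving terminal.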
在本实施例中还提供了一种音视频转发方法,如图13所示,包括:
S1302,在第一发送终端与第一接收终端进行会话的过程中,获取多个第一视频流与多个第一音频流,其中,不同的上述第一视频流的格式不同或者分辨率不同,上述多个第一视频流对应的内容相同,不同的上述第一音频流的格式不同,上述多个第一音频流对应的内容相同;
S1304,将上述多个第一视频流发送到服务器,以使上述服务器从上述多个第一视频流与上述多个第一音频流中确定出第一目标视频流与第一目标音频流,并将上述第一目标视频流与上述第一目标音频流发送给上述第一接收终端,其中,上述第一目标视频流为在上述多个第一视频流中上述第一接收终端支持播放的分辨率最高的视频流,上述第一目标音频流为在上述多个第一音频流中上述第一接收终端支持播放的音质最好的音频流。
通过本申请,由于在转发第一发送终端的视频的过程中,可以获取第一发送终端发送的格式与分辨率不同的多个第一视频流和第一发送终端发送的格式不同的多个第一音频流,以及按照第一接收终端的终端参数和第一接收终端的所用的网络带宽来决定向第一接收终端发送哪一个视频流与哪一个音频流,因此,可以保证第一接收终端播放能够支持的最佳的视频流与音频流,解决了视频会议中无法播放视频的问题,达到提高视频会议稳定性的效果。本实施例的其他示例请参见上述示例,在此不做赘述。
在一个示例性实施例中,在获取上述多个第一视频流与上述多个第一音频流之前,上述方法还包括:从上述服务器获取视频编码格式、视频编码分辨率与音频编码格式;按照上述视频编码格式、视频编码分辨率与音频编码格式对原始视频流与原始音频流进行编码,得到上述多个第一视频流与上述多个第一音频流。
在本实施例中还提供了一种音视频转发方法,如图14所示,包括:
S1402,在第一发送终端与第一接收终端进行会话的过程中,上述第一接收终端接收服务器发送的第一目标视频流与第一目标音频流,其中,上述第一目标视频流为上述服务器从多个第一视频流中确定出的视频流,上述第一目标音频流为从多个第一音频流中确定出的音频流,上述第一目标视频流为在上述多个第一视频流中上述第一接收终端支持播放的分辨率最高的视频流,上述第一目标音频流为在上述多个第一音频流中上述第一接收终端支持播放的音质最好的音频流,不同的上述第一视频流的格式不同或者分辨率不同,上述多个第一视频流对应的内容相同,上述多个第一音频流为第一发送终端发送给上述服务器的音频流,不同的上述第一音频流的格式不同,上述多个第一音频流对应的内容相同;
S1404,上述第一接收终端播放上述第一目标视频流与上述第一目标音频流。
通过本申请,由于在转发第一发送终端的视频的过程中,可以获取第一发送终端发送的格式与分辨率不同的多个第一视频流和第一发送终端发送的格式不同的多个第一音频流,以及按照第一接收终端的终端参数和第一接收终端的所用的网络带宽来决定向第一接收终端发送哪一个视频流与哪一个音频流,因此,可以保证第一接收终端播放能够支持的最佳的视频流与音频流,解决了视频会议中无法播放视频的问题,达到提高视频会议稳定性的效果。本实施例的其他示例请参见上述示例,在此不做赘述。
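In an exemplary embodiment, the method further includes: the first receiving terminal receiving a third target video stream and a third target audio stream sent by the server, where the third target video stream is a video stream determined by the server from multiple second video streams, the third target audio stream is an audio stream determined from multiple second audio streams, the third target video stream is the highest-resolution video stream among the multiple second video streams that the first receiving terminal supports playing, the third target audio stream is the audio stream with the best sound quality among the multiple second audio streams that the first receiving terminal supports playing, different second video streams differ in format or resolution, the multiple second video streams correspond to the same content, the multiple second audio streams are audio streams sent by the second sending terminal to the server, different second audio streams differ in format, and the multiple second audio streams correspond to the same content; and the first receiving terminal playing the third target video stream and the third target audio stream.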
在一个示例性实施例中,上述第一接收终端播放上述第三目标视频流包括:上述第一接收终端停止播放上述第一目标视频流且上述第一接收终端播放上述第三目标视频流。
在一个示例性实施例中,上述第一接收终端播放上述第三目标视频流包括:上述第一接收终端在第一播放区域播放上述第一目标视频流且上述第一接收终端在第二播放区域播放上述第三目标视频流。
在一个示例性实施例中,上述方法还包括:在上述第一接收终端接收到多个目标音频流的情况下,将上述多个目标音频流混合为一个音频流,其中,上述目标音频流包括上述第一目标音频流;上述第一接收终端播放混合得到的上述音频流。
在一个示例性实施例中,上述第一接收终端播放上述第一目标视频流与上述第一目标音频流包括:上述第一接收终端对上述第一目标视频流与上述第一目标音频流执行同步操作;上述第一接收终端播放同步后的上述第一目标视频流与上述第一目标音频流。
在本实施例中还提供了一种音视频转发装置,如图15所示,包括:
第一获取单元1502,被设置为在第一发送终端与第一接收终端进行会话的过程中,获取上述第一发送终端发送的多个第一视频流与多个第一音频流,其中,不同的上述第一视频流的格式不同或者分辨率不同,上述多个第一视频流对应的内容相同,不同的上述第一音频流的格式不同,上述多个第一音频流对应的内容相同;
第一确定单元1504,被设置为按照上述第一接收终端的终端参数与上述第一接收终端所使用的网络带宽,从上述多个第一视频流与上述多个第一音频流中确定出第一目标视频流与第一目标音频流,其中,上述第一目标视频流为在上述多个第一视频流中上述第一接收终端支持播放的分辨率最高的视频流,上述第一目标音频流为在上述多个第一音频流中上述第一接收终端支持播放的音质最好的音频流;
第一发送单元1506,被设置为将上述第一目标视频流与上述第一目标音频流发送给上述第一接收终端。
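With the present application, in the course of forwarding the video of the first sending terminal, the multiple first video streams of different formats and resolutions and the multiple first audio streams of different formats sent by the first sending terminal can be acquired, and which video stream and which audio stream to send to the first receiving terminal is decided according to the terminal parameters of the first receiving terminal and the network bandwidth it uses. This ensures that the first receiving terminal plays the best video stream and audio stream it can support, which solves the problem that video cannot be played in a video conference and improves the stability of video conferences. For other examples of this embodiment, refer to the examples above; they are not repeated here.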
在一个示例性实施例中,上述装置还包括:第一监控单元,被设置为在将上述第一目标视频流与上述第一目标音频流发送给上述第一接收终端之后,监控上述第一接收终端的上述终端参数与上述第一接收终端所使用的上述网络带宽;第二确定单元,被设置为在上述网络带宽或上述终端参数发生变化的情况下,重新确定上述第一目标视频流与上述第一目标音频流。
在一个示例性实施例中,上述装置还包括:第二发送单元,被设置为在重新确定上述第一目标视频流与上述第一目标音频流之后,向上述第一接收终端发送重新确定的上述第一目标视频流与上述第一目标音频流。
在一个示例性实施例中,上述装置还包括:第三确定单元,被设置为按照第二接收终端所使用的网络带宽与上述第二接收终端的终端参数,从上述多个第一视频流与上述多个第一音频流中确定出第二目标视频流与第二目标音频流,其中,上述第二目标视频流为在上述多个第一视频流中上述第二接收终端支持播放的分辨率最高的视频流,上述第二目标音频流为在上述多个第一音频流中上述第二接收终端支持播放的音质最好的音频流,上述第二接收终端为与上述第一发送终端、上述第一接收终端进行会话的终端;第三发送单元,被设置为将上述第二目标视频流与上述第二目标音频流发送给上述第二接收终端。
在一个示例性实施例中,上述装置还包括:第二监控单元,被设置为在将上述第二目标视频流与上述第二目标音频流发送给上述第二接收终端之后,监控上述第二接收终端所使用的上述网络带宽与上述终端参数;第四确定单元,被设置为在上述第二接收终端的所使用上述网络带宽或上述终端参数发生变化的情况下,重新确定上述第二目标视频流与上述第二目标音频流。
在一个示例性实施例中,上述装置还包括:第四发送单元,被设置为在重新确定上述第二目标视频流与上述第二目标音频流之后,向上述第二接收终端发送重新确定的上述第二目标视频流与上述第二目标音频流。
在一个示例性实施例中,上述装置还包括:第二获取单元,被设置为获取第二发送终端发送的多个第二视频流与多个第二音频流,其中,不同的上述第二视频流的格式不同或者分辨率不同,上述多个第二视频流对应的内容相同,不同的上述第二音频流的格式不同,上述多个第二音频流对应的内容相同,上述第二发送终端为与上述第一发送终端、第一接收终端进行会话的终端;第五确定单元,被设置为按照上述第一接收终端的终端参数与上述第一接收终端所使用的网络带宽,从上述多个第二视频流与上述多个第二音频流中确定出第三目标视频流与第三目标音频流,其中,上述第三目标视频流为在上述多个第二视频流中上述第一接收终端支持播放的分辨率最高的视频流,上述第三目标音频流为在上述多个第二音频流中上述第一接收终端支持播放的音质最好的音频流;第五发送单元,被设置为将上述第三目标视频流与上述第三目标音频流发送给上述第一接收终端。
在一个示例性实施例中,上述装置还包括:第三监控单元,被设置为在将上述第三目标视频流与上述第三目标音频流发送给上述第一接收终端之后,监控上述第一接收终端所使用的上述网络带宽与上述终端参数;第六确定单元,被设置为在上述第一接收终端所使用的上述网络带宽或上述终端参数发生变化的情况下,重新确定上述第三目标视频流与上述第三目标音频流。
在一个示例性实施例中,上述装置还包括:第六发送单元,被设置为在重新确定上述第三目标视频流与上述第三目标音频流之后,向上述第一接收终端发送重新确定的上述第三目标视频流与上述第三目标音频流。
在一个示例性实施例中,上述装置还包括:第七确定单元,被设置为按照第二接收终端所使用的网络带宽与上述第二接收终端的终端参数,从上述多个第二视频流与上述多个第二音频流中确定出第四目标视频流与第四目标音频流,其中,上述第四目标视频流为在上述多个第二视频流中上述第二接收终端支持播放的分辨率最高的视频流,上述第四目标音频流为在上述多个第二音频流中上述第二接收终端支持播放的音质最好的音频流,上述第二接收终端为与上述第一发送终端、第一接收终端、第二发送终端进行会话的终端;第七发送单元,被设置为将上述第四目标视频流与上述第四目标音频流发送给上述第二接收终端。
在一个示例性实施例中,上述装置还包括:第四监控单元,被设置为在将上述第四目标视频流与上述第四目标音频流发送给上述第二接收终端之后,监控上述第二接收终端所使用的上述网络带宽与上述终端参数;第八确定单元,被设置为在上述第二接收终端所使用的上述网络带宽或上述终端参数发生变化的情况下,重新确定上述第四目标视频流与上述第四目标音频流。
在一个示例性实施例中,上述装置还包括:第八发送单元,被设置为在重新确定上述第四目标视频流与上述第四目标音频流之后,向上述第二接收终端发送重新确定的上述第四目标视频流与上述第四目标音频流。
在一个示例性实施例中,上述装置还包括:第九发送单元,被设置为在获取到多个目标视频流与多个目标音频流的情况下,将上述多个目标视频流发送给会话过程中的每个接收终端,将上述多个目标音频流中声音最大的N个目标音频流发送给上述每个接收终端,其中,上述N为正整数,上述N根据上述目标音频流的数量确定,上述目标视频流包括上述第一目标视频流,上述目标音频流包括上述第一目标音频流,上述每个接收终端包括上述第一接收终端。
在本实施例中还提供了一种音视频转发装置,如图16所示,包括:
获取单元1602,被设置为在第一发送终端与第一接收终端进行会话的过程中,获取多个第一视频流与多个第一音频流,其中,不同的上述第一视频流的格式不同或者分辨率不同,上述多个第一视频流对应的内容相同,不同的上述第一音频流的格式不同,上述多个第一音频流对应的内容相同;
发送单元1604,被设置为将上述多个第一视频流发送到服务器,以使上述服务器从上述多个第一视频流与上述多个第一音频流中确定出第一目标视频流与第一目标音频流,并将上述第一目标视频流与上述第一目标音频流发送给上述第一接收终端,其中,上述第一目标视频流为在上述多个第一视频流中上述第一接收终端支持播放的分辨率最高的视频流,上述第一目标音频流为在上述多个第一音频流中上述第一接收终端支持播放的音质最好的音频流。
通过本申请,由于在转发第一发送终端的视频的过程中,可以获取第一发送终端发送的格式与分辨率不同的多个第一视频流和第一发送终端发送的格式不同的多个第一音频流,以及按照第一接收终端的终端参数和第一接收终端的所用的网络带宽来决定向第一接收终端发送哪一个视频流与哪一个音频流,因此,可以保证第一接收终端播放能够支持的最佳的视频流与音频流,解决了视频会议中无法播放视频的问题,达到提高视频会议稳定性的效果。本实施例的其他示例请参见上述示例,在此不做赘述。
在一个示例性实施例中,上述装置还包括:接收单元,被设置为在获取上述多个第一视频流与上述多个第一音频流之前,从上述服务器获取视频编码格式、视频编码分辨率与音频编码格式;编码单元,被设置为按照上述视频编码格式、视频编码分辨率与音频编码格式对原始视频流与原始音频流进行编码,得到上述多个第一视频流与上述多个第一音频流。
在本实施例中还提供了一种音视频转发装置,如图17所示,包括:
第一接收单元1702,被设置为在第一发送终端与第一接收终端进行会话的过程中,接收服务器发送的第一目标视频流与第一目标音频流,其中,上述第一目标视频流为上述服务器从多个第一视频流中确定出的视频流,上述第一目标音频流为从多个第一音频流中确定出的音频流,上述第一目标视频流为在上述多个第一视频流中上述第一接收终端支持播放的分辨率最高的视频流,上述第一目标音频流为在上述多个第一音频流中上述第一接收终端支持播放的音质最好的音频流,不同的上述第一视频流的格式不同或者分辨率不同,上述多个第一视频流对应的内容相同,上述多个第一音频流为第一发送终端发送给上述服务器的音频流,不同的上述第一音频流的格式不同,上述多个第一音频流对应的内容相同;
第一播放单元1704,被设置为上述第一接收终端播放上述第一目标视频流与上述第一目标音频流。
通过本申请,由于在转发第一发送终端的视频的过程中,可以获取第一发送终端发送的格式与分辨率不同的多个第一视频流和第一发送终端发送的格式不同的多个第一音频流,以及按照第一接收终端的终端参数和第一接收终端的所用的网络带宽来决定向第一接收终端发送哪一个视频流与哪一个音频流,因此,可以保证第一接收终端播放能够支持的最佳的视频流与音频流,解决了视频会议中无法播放视频的问题,达到提高视频会议稳定性的效果。本实施例的其他示例请参见上述示例,在此不做赘述。
在一个示例性实施例中,上述装置还包括:第二接收单元,被设置为接收服务器发送的第三目标视频流与第三目标音频流,其中,上述第三目标视频流为上述服务器从多个第二视频流中确定出的视频流,上述第三目标音频流为从多个第二音频流中确定出的音频流,上述第三目标视频流为在上述多个第二视频流中上述第一接收终端支持播放的分辨率最高的视频流,上述第三目标音频流为在上述多个第二音频流中上述第一接收终端支持播放的音质最好的音频流,不同的上述第二视频流的格式不同或者分辨率不同,上述多个第一视频流对应的内容相同,上述多个第一音频流为第一发送终端发送给上述服务器的音频流,不同的上述第一音频流的格式不同,上述多个第一音频流对应的内容相同;第二播放单元,被设置为上述第一接收终端播放上述第三目标视频流与上述第三目标音频流。
在一个示例性实施例中,上述第二播放单元包括:第一播放模块,被设置为停止播放上述第一目标视频流且上述第一接收终端播放上述第三目标视频流。
在一个示例性实施例中,上述第二播放单元包括:第二播放模块,被设置为在第一播放区域播放上述第一目标视频流且上述第一接收终端在第二播放区域播放上述第三目标视频流。
在一个示例性实施例中,上述装置还包括:混合单元,被设置为在上述第一接收终端接收到多个目标音频流的情况下,将上述多个目标音频流混合为一个音频流,其中,上述目标音频流包括上述第一目标音频流;第三播放单元,被设置为播放混合得到的上述音频流。
在一个示例性实施例中,上述第一播放单元包括:同步模块,被设置为对上述第一目标视频流与上述第一目标音频流执行同步操作;第三播放模块,被设置为播放同步后的上述第一目标视频流与上述第一目标音频流。
在本实施例中还提供了一种音视频转发系统,如图18所示,包括:
第一获取单元1802,被设置为在第一发送终端与第一接收终端进行会话的过程中,获取上述第一发送终端发送的多个第一视频流与多个第一音频流,其中,不同的上述第一视频流的格式不同或者分辨率不同,上述多个第一视频流对应的内容相同,不同的上述第一音频流的格式不同,上述多个第一音频流对应的内容相同;
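A first determining unit 1804, configured to determine, according to the terminal parameters of the first receiving terminal and the network bandwidth used by the first receiving terminal, a first target video stream and a first target audio stream from the multiple first video streams and the multiple first audio streams, where the first target video stream is the highest-resolution video stream among the multiple first video streams that the first receiving terminal supports playing, and the first target audio stream is the audio stream with the best sound quality among the multiple first audio streams that the first receiving terminal supports playing;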
第一发送单元1806,被设置为将上述第一目标视频流与上述第一目标音频流发送给上述第一接收终端。
通过本申请,由于在转发第一发送终端的视频的过程中,可以获取第一发送终端发送的格式与分辨率不同的多个第一视频流和第一发送终端发送的格式不同的多个第一音频流,以及按照第一接收终端的终端参数和第一接收终端的所用的网络带宽来决定向第一接收终端发送哪一个视频流与哪一个音频流,因此,可以保证第一接收终端播放能够支持的最佳的视频流与音频流,解决了视频会议中无法播放视频的问题,达到提高视频会议稳定性的效果。本实施例的其他示例请参见上述示例,在此不做赘述。
在本实施例中还提供了一种音视频转发终端,如图19所示,包括:
获取单元1902,被设置为在第一发送终端与第一接收终端进行会话的过程中,获取多个第一视频流与多个第一音频流,其中,不同的上述第一视频流的格式不同或者分辨率不同,上述多个第一视频流对应的内容相同,不同的上述第一音频流的格式不同,上述多个第一音频流对应的内容相同;
发送单元1904,被设置为将上述多个第一视频流发送到服务器,以使上述服务器从上述多个第一视频流与上述多个第一音频流中确定出第一目标视频流与第一目标音频流,并将上述第一目标视频流与上述第一目标音频流发送给上述第一接收终端,其中,上述第一目标视频流为在上述多个第一视频流中上述第一接收终端支持播放的分辨率最高的视频流,上述第一目标音频流为在上述多个第一音频流中上述第一接收终端支持播放的音质最好的音频流。
通过本申请,由于在转发第一发送终端的视频的过程中,可以获取第一发送终端发送的格式与分辨率不同的多个第一视频流和第一发送终端发送的格式不同的多个第一音频流,以及按照第一接收终端的终端参数和第一接收终端的所用的网络带宽来决定向第一接收终端发送哪一个视频流与哪一个音频流,因此,可以保证第一接收终端播放能够支持的最佳的视频流与音频流,解决了视频会议中无法播放视频的问题,达到提高视频会议稳定性的效果。本实施例的其他示例请参见上述示例,在此不做赘述。
在本实施例中还提供了一种音视频转发终端,如图20所示,包括:
第一接收单元2002,被设置为在第一发送终端与第一接收终端进行会话的过程中,接收服务器发送的第一目标视频流与第一目标音频流,其中,上述第一目标视频流为上述服务器从多个第一视频流中确定出的视频流,上述第一目标音频流为从多个第一音频流中确定出的音频流,上述第一目标视频流为在上述多个第一视频流中上述第一接收终端支持播放的分辨率最高的视频流,上述第一目标音频流为在上述多个第一音频流中上述第一接收终端支持播放的音质最好的音频流,不同的上述第一视频流的格式不同或者分辨率不同,上述多个第一视频流对应的内容相同,上述多个第一音频流为第一发送终端发送给上述服务器的音频流,不同的上述第一音频流的格式不同,上述多个第一音频流对应的内容相同;
第一播放单元2004,被设置为上述第一接收终端播放上述第一目标视频流与上述第一目标音频流。
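With the present application, in the course of forwarding the video of the first sending terminal, the multiple first video streams of different formats and resolutions and the multiple first audio streams of different formats sent by the first sending terminal can be acquired, and which video stream and which audio stream to send to the first receiving terminal is decided according to the terminal parameters of the first receiving terminal and the network bandwidth it uses. This ensures that the first receiving terminal plays the best video stream and audio stream it can support, which solves the problem that video cannot be played in a video conference and improves the stability of video conferences. For other examples of this embodiment, refer to the examples above; they are not repeated here.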
本申请的实施例还提供了一种计算机可读存储介质,该计算机可读存储介质中存储有计算机程序,其中,该计算机程序被设置为运行时执行上述任一项方法实施例中的步骤。
在一个示例性实施例中,上述计算机可读存储介质可以包括但不限于:U盘、只读存储器(Read-Only Memory,简称为ROM)、随机存取存储器(Random Access Memory,简称为RAM)、移动硬盘、磁碟或者光盘等各种可以存储计算机程序的介质。
本申请的实施例还提供了一种电子装置,包括存储器和处理器,该存储器中存储有计算机程序,该处理器被设置为运行计算机程序以执行上述任一项方法实施例中的步骤。
在一个示例性实施例中,上述电子装置还可以包括传输设备以及输入输出设备,其中,该传输设备和上述处理器连接,该输入输出设备和上述处理器连接。
本实施例中的具体示例可以参考上述实施例及示例性实施方式中所描述的示例,本实施例在此不再赘述。
通过本申请的实施例,由于在转发第一发送终端的视频的过程中,可以获取第一发送终端发送的格式与分辨率不同的多个第一视频流和第一发送终端发送的格式不同的多个第一音频流,以及按照第一接收终端的终端参数和第一接收终端的所用的网络带宽来决定向第一接收终端发送哪一个视频流与哪一个音频流,因此,可以保证第一接收终端播放能够支持的较佳的视频流与音频流,解决了视频会议中无法播放视频的问题,达到提高视频会议稳定性的效果。
显然,本领域的技术人员应该明白,上述的本申请实施例的各模块或各步骤可以用通用的计算装置来实现,它们可以集中在单个的计算装置上,或者分布在多个计算装置所组成的网络上,它们可以用计算装置可执行的程序代码来实现,从而,可以将它们存储在存储装置中由计算装置来执行,并且在某些情况下,可以以不同于此处的顺序执行所示出或描述的步骤,或者将它们分别制作成各个集成电路模块,或者将它们中的多个模块或步骤制作成单个集成电路模块来实现。这样,本申请不限制于任何特定的硬件和软件结合。
以上所述仅为本申请的若干实施例而已,并不用于限制本申请,对于本领域的技术人员来说,本申请可以有各种更改和变化。凡在本申请的原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。

Claims (29)

  1. 一种音视频转发方法,包括:
    在第一发送终端与第一接收终端进行会话的过程中,获取所述第一发送终端发送的多个第一视频流与多个第一音频流,其中,不同的所述第一视频流的格式不同或者分辨率不同,所述多个第一视频流对应的内容相同,不同的所述第一音频流的格式不同,所述多个第一音频流对应的内容相同;
    按照所述第一接收终端的终端参数与所述第一接收终端所使用的网络带宽,从所述多个第一视频流与所述多个第一音频流中确定出第一目标视频流与第一目标音频流,其中,所述第一目标视频流为在所述多个第一视频流中所述第一接收终端支持播放的分辨率最高的视频流,所述第一目标音频流为在所述多个第一音频流中所述第一接收终端支持播放的音质最好的音频流;
    将所述第一目标视频流与所述第一目标音频流发送给所述第一接收终端。
  2. 根据权利要求1所述的方法,其中,在将所述第一目标视频流与所述第一目标音频流发送给所述第一接收终端之后,所述方法还包括:
    监控所述第一接收终端的所述终端参数与所述第一接收终端所使用的所述网络带宽;
    在所述网络带宽或所述终端参数发生变化的情况下,重新确定所述第一目标视频流与所述第一目标音频流。
  3. 根据权利要求2所述的方法,其中,在重新确定所述第一目标视频流与所述第一目标音频流之后,所述方法还包括:
    向所述第一接收终端发送重新确定的所述第一目标视频流与所述第一目标音频流。
  4. 根据权利要求1所述的方法,还包括:
    按照第二接收终端所使用的网络带宽与所述第二接收终端的终端参数,从所述多个第一视频流与所述多个第一音频流中确定出第二目标视频流与第二目标音频流,其中,所述第二目标视频流为在所述多个第一视频流中所述第二接收终端支持播放的分辨率最高的视频流,所述第二目标音频流为在所述多个第一音频流中所述第二接收终端支持播放的音质最好的音频流,所述第二接收终端为与所述第一发送终端、所述第一接收终端进行会话的终端;
    将所述第二目标视频流与所述第二目标音频流发送给所述第二接收终端。
  5. 根据权利要求4所述的方法,其中,在将所述第二目标视频流与所述第二目标音频流发送给所述第二接收终端之后,所述方法还包括:
    监控所述第二接收终端所使用的所述网络带宽与所述终端参数;
    在所述第二接收终端的所使用所述网络带宽或所述终端参数发生变化的情况下,重新确定所述第二目标视频流与所述第二目标音频流。
  6. 根据权利要求5所述的方法,其中,在重新确定所述第二目标视频流与所述第二目标音频流之后,所述方法还包括:
    向所述第二接收终端发送重新确定的所述第二目标视频流与所述第二目标音频流。
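  7. The method according to claim 1, further comprising:
    acquiring multiple second video streams and multiple second audio streams sent by a second sending terminal, wherein different second video streams differ in format or resolution, the multiple second video streams correspond to the same content, different second audio streams differ in format, the multiple second audio streams correspond to the same content, and the second sending terminal is a terminal in the session with the first sending terminal and the first receiving terminal;
    determining, according to the terminal parameters of the first receiving terminal and the network bandwidth used by the first receiving terminal, a third target video stream and a third target audio stream from the multiple second video streams and the multiple second audio streams, wherein the third target video stream is the highest-resolution video stream among the multiple second video streams that the first receiving terminal supports playing, and the third target audio stream is the audio stream with the best sound quality among the multiple second audio streams that the first receiving terminal supports playing;
    sending the third target video stream and the third target audio stream to the first receiving terminal.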
  8. 根据权利要求7所述的方法,其中,在将所述第三目标视频流与所述第三目标音频流发送给所述第一接收终端之后,所述方法还包括:
    监控所述第一接收终端所使用的所述网络带宽与所述终端参数;
    在所述第一接收终端所使用的所述网络带宽或所述终端参数发生变化的情况下,重新确定所述第三目标视频流与所述第三目标音频流。
  9. 根据权利要求8所述的方法,其中,在重新确定所述第三目标视频流与所述第三目标音频流之后,所述方法还包括:
    向所述第一接收终端发送重新确定的所述第三目标视频流与所述第三目标音频流。
  10. 根据权利要求7所述的方法,还包括:
    按照第二接收终端所使用的网络带宽与所述第二接收终端的终端参数,从所述多个第二视频流与所述多个第二音频流中确定出第四目标视频流与第四目标音频流,其中,所述第四目标视频流为在所述多个第二视频流中所述第二接收终端支持播放的分辨率最高的视频流,所述第四目标音频流为在所述多个第二音频流中所述第二接收终端支持播放的音质最好的音频流,所述第二接收终端为与所述第一发送终端、第一接收终端、第二发送终端进行会话的终端;
    将所述第四目标视频流与所述第四目标音频流发送给所述第二接收终端。
  11. 根据权利要求10所述的方法,其中,在将所述第四目标视频流与所述第四目标音频流发送给所述第二接收终端之后,所述方法还包括:
    监控所述第二接收终端所使用的所述网络带宽与所述终端参数;
    在所述第二接收终端所使用的所述网络带宽或所述终端参数发生变化的情况下,重新确定所述第四目标视频流与所述第四目标音频流。
  12. 根据权利要求11所述的方法,其中,在重新确定所述第四目标视频流与所述第四目标音频流之后,所述方法还包括:
    向所述第二接收终端发送重新确定的所述第四目标视频流与所述第四目标音频流。
  13. 根据权利要求1至12任意一项所述的方法,还包括:
    在获取到多个目标视频流与多个目标音频流的情况下,将所述多个目标视频流发送给会话过程中的每个接收终端,将所述多个目标音频流中声音最大的N个目标音频流发送给所述每个接收终端,其中,所述N为正整数,所述N根据所述目标音频流的数量确定,所述目标视频流包括所述第一目标视频流,所述目标音频流包括所述第一目标音频流,所述每个接收终端包括所述第一接收终端。
  14. 一种音视频转发方法,包括:
    在第一发送终端与第一接收终端进行会话的过程中,获取多个第一视频流与多个第一音频流,其中,不同的所述第一视频流的格式不同或者分辨率不同,所述多个第一视频流对应的内容相同,不同的所述第一音频流的格式不同,所述多个第一音频流对应的内容相同;
    将所述多个第一视频流发送到服务器,以使所述服务器从所述多个第一视频流与所述多个第一音频流中确定出第一目标视频流与第一目标音频流,并将所述第一目标视频流与所述第一目标音频流发送给所述第一接收终端,其中,所述第一目标视频流为在所述多个第一视频流中所述第一接收终端支持播放的分辨率最高的视频流,所述第一目标音频流为在所述多个第一音频流中所述第一接收终端支持播放的音质最好的音频流。
  15. 根据权利要求14所述的方法,其中,在获取所述多个第一视频流与所述多个第一音频流之前,所述方法还包括:
    从所述服务器获取视频编码格式、视频编码分辨率与音频编码格式;
    按照所述视频编码格式、视频编码分辨率与音频编码格式对原始视频流与原始音频流进行编码,得到所述多个第一视频流与所述多个第一音频流。
  16. 一种音视频转发方法,包括:
    在第一发送终端与第一接收终端进行会话的过程中,所述第一接收终端接收服务器发送的第一目标视频流与第一目标音频流,其中,所述第一目标视频流为所述服务器从多个第一视频流中确定出的视频流,所述第一目标音频流为从多个第一音频流中确定出的音频流,所述第一目标视频流为在所述多个第一视频流中所述第一接收终端支持播放的分辨率最高的视频流,所述第一目标音频流为在所述多个第一音频流中所述第一接收终端支持播放的音质最好的音频流,不同的所述第一视频流的格式不同或者分辨率不同,所述多个第一视频流对应的内容相同,所述多个第一音频流为第一发送终端发送给所述服务器的音频流,不同的所述第一音频流的格式不同,所述多个第一音频流对应的内容相同;
    所述第一接收终端播放所述第一目标视频流与所述第一目标音频流。
  17. 根据权利要求16所述的方法,还包括:
    所述第一接收终端接收服务器发送的第三目标视频流与第三目标音频流,其中,所述第三目标视频流为所述服务器从多个第二视频流中确定出的视频流,所述第三目标音频流为从多个第二音频流中确定出的音频流,所述第三目标视频流为在所述多个第二视频流中所述第一接收终端支持播放的分辨率最高的视频流,所述第三目标音频流为在所述多个第二音频流中所述第一接收终端支持播放的音质最好的音频流,不同的所述第二视频流的格式不同或者分辨率不同,所述多个第一视频流对应的内容相同,所述多个第一音频流为第一发送终端发送给所述服务器的音频流,不同的所述第一音频流的格式不同,所述多个第一音频流对应的内容相同;
    所述第一接收终端播放所述第三目标视频流与所述第三目标音频流。
  18. 根据权利要求17所述的方法,其中,所述第一接收终端播放所述第三目标视频流包括:
    所述第一接收终端停止播放所述第一目标视频流且所述第一接收终端播放所述第三目标视频流。
  19. 根据权利要求17所述的方法,其中,所述第一接收终端播放所述第三目标视频流包括:
    所述第一接收终端在第一播放区域播放所述第一目标视频流且所述第一接收终端在第二播放区域播放所述第三目标视频流。
  20. 根据权利要求16所述的方法,还包括:
    在所述第一接收终端接收到多个目标音频流的情况下,将所述多个目标音频流混合为一个音频流,其中,所述目标音频流包括所述第一目标音频流;
    所述第一接收终端播放混合得到的所述音频流。
  21. 根据权利要求16所述的方法,其中,所述第一接收终端播放所述第一目标视频流与所述第一目标音频流包括:
    所述第一接收终端对所述第一目标视频流与所述第一目标音频流执行同步操作;
    所述第一接收终端播放同步后的所述第一目标视频流与所述第一目标音频流。
  22. 一种音视频转发装置,包括:
    第一获取单元,被设置为在第一发送终端与第一接收终端进行会话的过程中,获取所述第一发送终端发送的多个第一视频流与多个第一音频流,其中,不同的所述第一视频流的格式不同或者分辨率不同,所述多个第一视频流对应的内容相同,不同的所述第一音频流的格式不同,所述多个第一音频流对应的内容相同;
    第一确定单元,被设置为按照所述第一接收终端的终端参数与所述第一接收终端所使用的网络带宽,从所述多个第一视频流与所述多个第一音频流中确定出第一目标视频流与第一目标音频流,其中,所述第一目标视频流为在所述多个第一视频流中所述第一接收终端支持播放的分辨率最高的视频流,所述第一目标音频流为在所述多个第一音频流中所述第一接收终端支持播放的音质最好的音频流;
    第一发送单元,被设置为将所述第一目标视频流与所述第一目标音频流发送给所述第一接收终端。
  23. 一种音视频转发装置,包括:
    获取单元,被设置为在第一发送终端与第一接收终端进行会话的过程中,获取多个第一视频流与多个第一音频流,其中,不同的所述第一视频流的格式不同或者分辨率不同,所述多个第一视频流对应的内容相同,不同的所述第一音频流的格式不同,所述多个第一音频流对应的内容相同;
    发送单元,被设置为将所述多个第一视频流发送到服务器,以使所述服务器从所述多个第一视频流与所述多个第一音频流中确定出第一目标视频流与第一目标音频流,并将所述第一目标视频流与所述第一目标音频流发送给所述第一接收终端,其中,所述第一目标视频流为在所述多个第一视频流中所述第一接收终端支持播放的分辨率最高的视频流,所述第一目标音频流为在所述多个第一音频流中所述第一接收终端支持播放的音质最好的音频流。
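  24. An audio/video forwarding apparatus, comprising:
    a first receiving unit, configured to receive, during a session between a first sending terminal and a first receiving terminal, a first target video stream and a first target audio stream sent by a server, wherein the first target video stream is a video stream determined by the server from multiple first video streams, the first target audio stream is an audio stream determined from multiple first audio streams, the first target video stream is the highest-resolution video stream among the multiple first video streams that the first receiving terminal supports playing, the first target audio stream is the audio stream with the best sound quality among the multiple first audio streams that the first receiving terminal supports playing, different first video streams differ in format or resolution, the multiple first video streams correspond to the same content, the multiple first audio streams are audio streams sent by the first sending terminal to the server, different first audio streams differ in format, and the multiple first audio streams correspond to the same content;
    a first playing unit, configured to play the first target video stream and the first target audio stream on the first receiving terminal.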
  25. 一种音视频转发系统,包括:
    第一获取单元,被设置为在第一发送终端与第一接收终端进行会话的过程中,获取所述第一发送终端发送的多个第一视频流与多个第一音频流,其中,不同的所述第一视频流的格式不同或者分辨率不同,所述多个第一视频流对应的内容相同,不同的所述第一音频流的格式不同,所述多个第一音频流对应的内容相同;
    第一确定单元,被设置为按照所述第一接收终端的终端参数与所述第一接收终端所使用的网络带宽,从所述多个第一视频流与所述多个第一音频流中确定出第一目标视频流与第一目标音频流,其中,所述第一目标视频流为在所述多个第一视频流中所述第一接收终端支持播放的分辨率最高的视频流,所述第一目标音频流为在所述多个第一音频流中所述第一接收终端支持播放的音质最好的音频流;
    第一发送单元,被设置为将所述第一目标视频流与所述第一目标音频流发送给所述第一接收终端。
  26. 一种音视频转发终端,包括:
    获取单元,被设置为在第一发送终端与第一接收终端进行会话的过程中,获取多个第一视频流与多个第一音频流,其中,不同的所述第一视频流的格式不同或者分辨率不同,所述多个第一视频流对应的内容相同,不同的所述第一音频流的格式不同,所述多个第一音频流对应的内容相同;
    发送单元,被设置为将所述多个第一视频流发送到服务器,以使所述服务器从所述多个第一视频流与所述多个第一音频流中确定出第一目标视频流与第一目标音频流,并将所述第一目标视频流与所述第一目标音频流发送给所述第一接收终端,其中,所述第一目标视频流为在所述多个第一视频流中所述第一接收终端支持播放的分辨率最高的视频流,所述第一目标音频流为在所述多个第一音频流中所述第一接收终端支持播放的音质最好的音频流。
  27. 一种音视频转发终端,包括:
    第一接收单元,被设置为在第一发送终端与第一接收终端进行会话的过程中,接收服务器发送的第一目标视频流与第一目标音频流,其中,所述第一目标视频流为所述服务器从多个第一视频流中确定出的视频流,所述第一目标音频流为从多个第一音频流中确定出的音频流,所述第一目标视频流为在所述多个第一视频流中所述第一接收终端支持播放的分辨率最高的视频流,所述第一目标音频流为在所述多个第一音频流中所述第一接收终端支持播放的音质最好的音频流,不同的所述第一视频流的格式不同或者分辨率不同,所述多个第一视频流对应的内容相同,所述多个第一音频流为第一发送终端发送给所述服务器的音频流,不同的所述第一音频流的格式不同,所述多个第一音频流对应的内容相同;
    第一播放单元,被设置为所述第一接收终端播放所述第一目标视频流与所述第一目标音频流。
  28. 一种计算机可读存储介质,其中,所述计算机可读存储介质中存储有计算机程序,其中,所述计算机程序被处理器执行时实现所述权利要求1至13或14至15或16至21任一项中所述的方法的步骤。
  29. 一种电子装置,包括存储器、处理器以及存储在所述存储器上并可在所述处理器上运行的计算机程序,其中,所述处理器执行所述计算机程序时实现所述权利要求1至13或14至15或16至21任一项中所述的方法的步骤。
PCT/CN2021/129041 2020-11-11 2021-11-05 音视频转发方法、装置、终端与系统 WO2022100528A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP21891053.7A EP4243407A4 (en) 2020-11-11 2021-11-05 AUDIO/VIDEO TRANSFER METHOD AND APPARATUS, TERMINALS, AND SYSTEM

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011257786.7 2020-11-11
CN202011257786.7A CN114500914A (zh) 2020-11-11 2020-11-11 音视频转发方法、装置、终端与系统

Publications (1)

Publication Number Publication Date
WO2022100528A1 true WO2022100528A1 (zh) 2022-05-19

Family

ID=81490717

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/129041 WO2022100528A1 (zh) 2020-11-11 2021-11-05 音视频转发方法、装置、终端与系统

Country Status (3)

Country Link
EP (1) EP4243407A4 (zh)
CN (1) CN114500914A (zh)
WO (1) WO2022100528A1 (zh)

Also Published As

Publication number Publication date
EP4243407A4 (en) 2024-04-24
CN114500914A (zh) 2022-05-13
EP4243407A1 (en) 2023-09-13

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21891053

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2021891053

Country of ref document: EP

Effective date: 20230609