CN112135155A - Audio and video connecting and converging method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN112135155A
Authority
CN
China
Prior art keywords
video data
data stream
audio
merged
audio data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010953283.7A
Other languages
Chinese (zh)
Other versions
CN112135155B (en)
Inventor
葛余浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Qiniu Information Technology Co ltd
Original Assignee
Shanghai Qiniu Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Qiniu Information Technology Co ltd filed Critical Shanghai Qiniu Information Technology Co ltd
Priority to CN202010953283.7A priority Critical patent/CN112135155B/en
Publication of CN112135155A publication Critical patent/CN112135155A/en
Application granted granted Critical
Publication of CN112135155B publication Critical patent/CN112135155B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21Server components or server architectures
    • H04N21/218Source of audio or video content, e.g. local disk arrays
    • H04N21/2187Live feed
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/236Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
    • H04N21/2365Multiplexing of several video streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/434Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
    • H04N21/4347Demultiplexing of several video streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/478Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client 
    • H04N21/63Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
    • H04N21/643Communication protocols

Abstract

The invention relates to the technical field of video synthesis, and provides a method, a device, electronic equipment and a storage medium for audio and video mic-linking (co-hosting) and stream merging, wherein the method comprises the following steps: extracting the multiple streams of audio and video data received by the interactive server; separating the obtained audio and video data to obtain multiple audio data streams and multiple video data streams; merging the multiple video data streams in the interactive server to obtain a merged video data stream, sending the merged video data stream to a streaming media server, and simultaneously sending the multiple audio data streams directly to the streaming media server; sending, by the streaming media server, the multiple audio data streams and the merged video data stream to a client, and modifying merge parameters of the multiple audio data streams at the client before merging them, to obtain a merged audio data stream; and outputting the merged video data stream and the merged audio data stream at the client. The invention enables customized service for each user and removes the limitations on how users obtain audio and video data.

Description

Audio and video connecting and converging method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of video synthesis, and in particular to a method and a device for audio and video mic-linking and stream merging, electronic equipment and a storage medium.
Background
At present, in a live-broadcast scenario where multiple hosts are mic-linked (co-hosting), a common approach is to merge the hosts' audio and video streams at an interactive server to obtain a merged audio track and video track, then forward them as a single stream to a streaming media server over RTMP (Real-Time Messaging Protocol), and finally forward them to ordinary live-broadcast viewers.
Under this architecture, viewers can watch the audio and video without further processing, which greatly improves fluency and user experience. However, in some cases, due to personal preference or other factors, some viewers do not want to see the picture or hear the sound of one or more particular hosts, or want to see only one particular host and none of the others. Clearly, the current one-size-fits-all merging architecture cannot meet these viewers' needs. The prior art therefore has the problem that a client cannot, in a customized manner, mute the audio or close the picture of one or more hosts.
Disclosure of Invention
The embodiment of the invention provides an audio and video mic-linking and merging method which, in live-broadcast services based on a server-side merging architecture, can provide customized service for users when multiple hosts are mic-linked during an activity.
In a first aspect, an embodiment of the present invention provides an audio and video mic-linking and merging method, where the method includes the following steps:
after detecting that the interactive server has entered mic-linking mode, extracting the multiple streams of audio and video data received by the interactive server;
separating the obtained audio and video data to obtain multiple audio data streams and multiple video data streams;
merging the multiple video data streams in the interactive server to obtain a merged video data stream, sending the merged video data stream to a streaming media server, and simultaneously sending the multiple audio data streams directly to the streaming media server;
sending, by the streaming media server, the multiple audio data streams and the merged video data stream to a client, and modifying merge parameters of the multiple audio data streams at the client before merging them, to obtain a merged audio data stream;
and outputting the merged video data stream and the merged audio data stream at the client.
In a second aspect, an embodiment of the present invention further provides an audio and video mic-linking and merging device, including:
an acquisition module, used to extract the multiple streams of audio and video data received by the interactive server after detecting that the interactive server has entered mic-linking mode;
a separation module, used to separate the acquired audio and video data to obtain multiple audio data streams and multiple video data streams;
a first sending module, used to merge the multiple video data streams in the interactive server to obtain a merged video data stream, send the merged video data stream to a streaming media server, and simultaneously send the multiple audio data streams directly to the streaming media server;
a first merging module, used by the streaming media server to send the multiple audio data streams and the merged video data stream to a client, where the merge parameters of the multiple audio data streams are modified at the client before merging, obtaining a merged audio data stream;
and an output module, used to output the merged video data stream and the merged audio data stream at the client.
In a third aspect, an embodiment of the present invention further provides an electronic device, including: a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor, when executing the computer program, implements the steps of the audio and video mic-linking and merging method provided by the embodiment.
In a fourth aspect, a computer-readable storage medium has a computer program stored thereon, and the computer program, when executed by a processor, implements the steps of the audio and video mic-linking and merging method provided by the embodiment.
In the embodiment of the invention, after the interactive server is detected to have entered mic-linking mode, the multiple streams of audio and video data received by the interactive server are extracted; the obtained audio and video data are separated into multiple audio data streams and multiple video data streams; the multiple video data streams are merged in the interactive server to obtain a merged video data stream, which is sent to a streaming media server while the multiple audio data streams are sent directly to the streaming media server; the streaming media server sends the multiple audio data streams and the merged video data stream to a client, where the merge parameters of the multiple audio data streams are modified before merging, obtaining a merged audio data stream; and the merged video data stream and the merged audio data stream are output at the client. For an activity in which multiple hosts are mic-linked, the invention separates the audio data streams from the video data streams, performs video merging directly at the interactive server, and sends the audio data streams through the streaming media server to the client, where the user can modify the merge parameters of the audio data streams before merging them. The user can thus select among the audio data streams according to personal needs, without all host content having to be presented on the terminal interface, thereby providing customized service and removing the limitations on how users obtain audio and video data.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and that those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of an audio and video mic-linking and merging method provided by an embodiment of the present invention;
Fig. 2 is a structural diagram of an audio and video mic-linking and merging method provided by an embodiment of the present invention;
Fig. 3 is a flowchart of another audio and video mic-linking and merging method provided by an embodiment of the invention;
Fig. 4 is a schematic structural diagram of another audio and video mic-linking and merging method provided by an embodiment of the invention;
Fig. 5 is a schematic structural diagram of another audio and video mic-linking and merging method provided by an embodiment of the invention;
Fig. 6 is a schematic structural diagram of an audio and video mic-linking and merging device provided by an embodiment of the invention;
Fig. 7 is a schematic structural diagram of another audio and video mic-linking and merging device provided by an embodiment of the invention;
Fig. 8 is a schematic structural diagram of another audio and video mic-linking and merging device provided by an embodiment of the invention;
fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "comprising" and "having," and any variations thereof, in the description and claims of this application and the description of the figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or the accompanying drawings are used for distinguishing between different objects and not for describing a particular order. Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
As shown in fig. 1, fig. 1 is a flowchart of an audio and video mic-linking and merging method provided by an embodiment of the present invention, where the method includes the following steps:
S101, after detecting that the interactive server has entered mic-linking mode, extracting the multiple streams of audio and video data received by the interactive server.
In this embodiment, the audio and video mic-linking and merging method can be applied to a live-broadcast service platform and the like. The electronic equipment on which the method runs can acquire the multiple streams of audio and video data through a wired or wireless connection, and can transmit the various data involved in the embodiment of the invention. The wireless connection may include, but is not limited to, 3G/4G, WiFi (Wireless Fidelity), Bluetooth, WiMAX (Worldwide Interoperability for Microwave Access), ZigBee (a low-power local area network protocol), UWB (ultra-wideband), and other wireless connection methods now known or developed in the future.
Specifically, in a multi-host mic-linked live-broadcast scene, multiple hosts can connect to the interactive server through wired or wireless connections, and their audio and video can be merged in the interactive server. While the hosts are connecting to the interactive server, the connection state can be detected in real time, and when a successful mic-link is confirmed, a prompt (a voice prompt, subtitle prompt, light prompt, and so on) can be issued. When the mic-link state is detected, parameter information such as each host's live-room number and identifier can also be obtained.
The multiple streams of audio and video data may include multiple streams of audio data and multiple streams of video data, with each host inputting audio data and video data to the interactive server. The type of the interactive server is not limited to one kind and can be chosen as required.
S102, separating the acquired audio and video data to obtain multiple audio data streams and multiple video data streams.
When the extracted audio and video data is separated, it can be decoded by a decoder, and the audio and video can be separated according to the different pulse-modulation signals of the audio data stream and the video data stream, yielding multiple audio data streams and multiple video data streams respectively. The separated audio data streams can be fed into an audio encoder for audio encoding, obtaining multiple independent audio data streams; the video data streams can be fed into a video encoder for video encoding, obtaining multiple independent video data streams.
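The separation in step S102 can be illustrated with a minimal sketch. The packet format below is hypothetical, and the function names are illustrative only; a production system would use a real demuxer (for example, FFmpeg) rather than tagged dictionaries:

```python
# Minimal sketch of step S102: splitting an interleaved audio/video
# packet sequence into separate per-host audio and video streams.
# The packet format here is hypothetical, standing in for demuxed packets.

def separate_av(packets):
    """Split mixed packets into per-host audio and video streams."""
    audio_streams, video_streams = {}, {}
    for pkt in packets:
        target = audio_streams if pkt["kind"] == "audio" else video_streams
        target.setdefault(pkt["anchor"], []).append(pkt["payload"])
    return audio_streams, video_streams

mixed = [
    {"anchor": "A", "kind": "video", "payload": b"vA0"},
    {"anchor": "A", "kind": "audio", "payload": b"aA0"},
    {"anchor": "B", "kind": "video", "payload": b"vB0"},
    {"anchor": "B", "kind": "audio", "payload": b"aB0"},
]
audio, video = separate_av(mixed)
# audio == {"A": [b"aA0"], "B": [b"aB0"]}
```

Each resulting list then corresponds to one independent stream that can be re-encoded and forwarded on its own.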
S103, merging the multiple video data streams in the interactive server to obtain a merged video data stream, sending the merged video data stream to the streaming media server, and simultaneously sending the multiple audio data streams directly to the streaming media server.
Referring to fig. 2, fig. 2 is a schematic structural diagram of an audio and video mic-linking and merging method provided by an embodiment of the present invention. Because multiple audio data streams and multiple video data streams need to be merged, and the data volume, power consumption, merging time, and so on of the video data streams all exceed those of the audio data streams, the two kinds of stream can be handled differently. For example, in terms of file size, taking the mp4 and mp3 formats as examples, one minute of mp4 is generally about 4 MB to 5 MB, while one minute of mp3 is generally about 1.5 MB — roughly a third of the size — so transmitting 3 to 5 audio files costs about as much as transmitting one video file. For another example, in terms of merging speed and merged size, the time and power consumed by merging audio data streams are much smaller than for video data streams: after two audio data streams are merged, the total duration is unchanged, the number of channels, the sound channels, the frame rate, and so on do not change noticeably, and the total data volume after merging is comparable to, or even smaller than, that of one of the original audio data streams. For video data streams, the merged file is larger and the merging process consumes more, mainly because merging increases the number of pixels in each frame, which in turn increases the file size. There are thus many differences between merging video data streams and merging audio data streams, and in the embodiment of the invention the two can be merged separately.
The video data streams are merged at the interactive server. Merging the video data streams at the interactive server is a process in which each host sets merge parameters and the data is merged accordingly. The merge parameters of the merged video data streams may include the boundaries within the picture, the layering of the individual hosts' pictures, microphone parameters, camera parameters, live image quality, sharpness, pixels, and so on. After the video data streams are merged, a merged video data stream (video track) is obtained. The video track can then be forwarded as a stream to a streaming media server over RTMP (Real-Time Messaging Protocol). While the merged video data stream is being sent to the streaming media server, the unmerged multiple audio data streams can also be sent to the streaming media server as streams over RTMP.
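The server-side video merge can be sketched as follows. Frames are toy 2D grids of pixel values and the layout parameter is a simple left-to-right order; these are stand-ins, since real merging composites decoded images according to the full set of merge parameters:

```python
# Sketch of server-side video merging (step S103): tile each host's
# frame side by side according to a layout merge parameter. Frames are
# toy 2D grids of pixel values, not real decoded images.

def merge_video_frames(frames, order):
    """Compose one merged frame by horizontal tiling in the given order."""
    height = len(frames[order[0]])
    merged = []
    for row in range(height):
        merged_row = []
        for anchor in order:  # layout parameter: left-to-right order
            merged_row.extend(frames[anchor][row])
        merged.append(merged_row)
    return merged

frames = {
    "A": [[1, 1], [1, 1]],  # 2x2 frame from host A
    "B": [[2, 2], [2, 2]],  # 2x2 frame from host B
}
merged = merge_video_frames(frames, order=["A", "B"])
# merged is a 2x4 frame; each row is [1, 1, 2, 2]
```

Note how merging grows the pixel count per frame, which is the reason, stated above, that the merged video file is larger than its inputs.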
Streaming media, as mentioned above, refers to media whose audio, video, and multimedia files are delivered as streams over a network. Compared with the download-then-watch playback mode, streaming media is characterized by compressing continuous audio and video information, placing it on a network server, and letting the user watch while downloading instead of waiting for the whole file to finish downloading. The technology is widely applied in video-on-demand, video conferencing, distance education, telemedicine, and online live-broadcast systems.
S104, the streaming media server sends the multiple audio data streams and the merged video data stream to the client, and the merge parameters of the multiple audio data streams are modified at the client to obtain a merged audio data stream.
The merge parameters of the merged audio data streams may include parameters such as volume, gain multiple, and audio format. After receiving the multiple audio data streams and the merged video data stream, the streaming media server may forward them as streams to the client over RTMP or over HLS (HTTP Live Streaming). When the client receives the unmerged audio data streams, the viewer (user) can modify the merge parameters autonomously and then merge. Since each viewer sets different merge parameters, the audio presented at each client after merging is also different. Thus, although the same merged video data stream is pulled, modifying the merge parameters as required generates each viewer's own merged audio data stream, so clients can flexibly control the merging of the audio and video data according to their own needs, which increases the degree of freedom of merging.
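The client-side audio merge can be sketched as a per-stream gain followed by a sum. The sample values and parameter names are illustrative; a real client would mix decoded PCM at the audio frame rate. Setting one host's volume parameter to 0.0 mutes that host only, which is the customization described above:

```python
# Sketch of client-side audio merging (step S104): mix several decoded
# PCM streams after applying per-stream volume merge parameters chosen
# by the viewer. Samples are toy int16 values.

def mix_audio(streams, volumes):
    """Sum int16 samples with per-stream gain, clamped to the int16 range."""
    length = min(len(s) for s in streams.values())
    mixed = []
    for i in range(length):
        total = sum(int(streams[k][i] * volumes.get(k, 1.0)) for k in streams)
        mixed.append(max(-32768, min(32767, total)))
    return mixed

streams = {"A": [1000, -1000, 500], "B": [2000, 2000, 2000]}
# Viewer's merge parameters: keep host A at full volume, mute host B.
out = mix_audio(streams, volumes={"A": 1.0, "B": 0.0})
# out == [1000, -1000, 500]
```

Two viewers pulling the same streams but choosing different `volumes` obtain different merged audio, which is exactly why the merged audio differs per client.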
The client may also be referred to as a user terminal, terminal device, terminal, and so on. The user side may include a plurality of clients; that is, each user may have one or more clients. The client includes, but is not limited to, electronic devices with playback and display functions, such as mobile phones, tablets, and personal computers.
S105, outputting the merged video data stream and the merged audio data stream at the client.
After each user's client receives the merged video data stream and the merged audio data stream generated according to the modified merge parameters, the two streams can be combined by a synthesizer (muxer), so that the required audio and video are generated and output at each client.
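The final muxing step can be sketched as interleaving the merged video frames and merged audio chunks by timestamp into one output sequence. The tuple format is illustrative; a real muxer writes a container format such as mp4 or FLV:

```python
# Sketch of step S105: interleave merged video frames and merged audio
# chunks by timestamp into a single A/V sequence, as a muxer would
# before playback. Timestamps are in milliseconds; payloads are toy strings.

def mux(video, audio):
    """video/audio: lists of (timestamp, payload); returns one A/V sequence."""
    tagged = [(ts, 0, "video", p) for ts, p in video] + \
             [(ts, 1, "audio", p) for ts, p in audio]
    # Sort by timestamp; at equal timestamps, video precedes audio.
    return [(ts, kind, p) for ts, _, kind, p in sorted(tagged)]

video = [(0, "vf0"), (40, "vf1")]
audio = [(0, "ac0"), (20, "ac1"), (40, "ac2")]
out = mux(video, audio)
# out == [(0, "video", "vf0"), (0, "audio", "ac0"), (20, "audio", "ac1"),
#         (40, "video", "vf1"), (40, "audio", "ac2")]
```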
Specifically, viewers can freely realize personalized picture requirements at the client, so that the video picture seen by each viewer is different. The implementation may be provided by an SDK (Software Development Kit) of a real-time audio and video service provider, or autonomously by the client's app (application); the embodiment of the present invention does not limit this. The personalization mainly works as follows: when a viewer clicks "close picture" on a host in the video, a sticker is drawn over that host's region according to the merge parameters (the boundary of each host's picture within the merged picture, the layering of the host pictures, and so on) and labeled "this host's picture has been closed for you". It may further be supported that, after a viewer clicks a certain host, the client automatically crops and enlarges that host's picture and switches to a "single-host mode", and so on.
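The "close picture" behavior can be sketched as covering one host's bounding box, taken from the layout merge parameters, inside the merged frame. The frame, box coordinates, and sticker value are toy stand-ins for a real overlay drawn over decoded pixels:

```python
# Sketch of the "close picture" personalization: use the layout merge
# parameters (each host's bounding box inside the merged frame) to cover
# the chosen host's region with a sticker value. The frame is a toy 2D grid.

def cover_anchor(frame, box, sticker=0):
    """Fill box=(top, left, bottom, right) of the frame with the sticker."""
    top, left, bottom, right = box
    for r in range(top, bottom):
        for c in range(left, right):
            frame[r][c] = sticker
    return frame

merged = [[1, 1, 2, 2], [1, 1, 2, 2]]            # hosts A (left), B (right)
boxes = {"A": (0, 0, 2, 2), "B": (0, 2, 2, 4)}   # hypothetical merge params
cover_anchor(merged, boxes["B"])                 # viewer closed host B's picture
# merged == [[1, 1, 0, 0], [1, 1, 0, 0]]
```

The "single-host mode" mentioned above would instead crop the frame down to the chosen host's box rather than covering the others.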
In the embodiment of the invention, after the interactive server is detected to have entered mic-linking mode, the multiple streams of audio and video data received by the interactive server are extracted; the obtained audio and video data are separated into multiple audio data streams and multiple video data streams; the multiple video data streams are merged in the interactive server to obtain a merged video data stream, which is sent to a streaming media server while the multiple audio data streams are sent directly to the streaming media server; the streaming media server sends the multiple audio data streams and the merged video data stream to the client, where the merge parameters of the multiple audio data streams are modified before merging, obtaining a merged audio data stream; and the merged video data stream and the merged audio data stream are output at the client. For an activity in which multiple hosts are mic-linked, the invention separates the audio data streams from the video data streams, performs video merging directly at the interactive server, and sends the audio data streams through the streaming media server to the client, where the user can modify the merge parameters of the audio data streams before merging them. The user can thus select among the audio data streams according to personal needs, without all host content having to be presented on the terminal interface, thereby providing customized service and removing the limitations on how users obtain audio and video data.
As shown in fig. 3, fig. 3 is a flowchart of another method provided in the embodiment of the present invention, which specifically includes the following steps:
S201, after detecting that the interactive server has entered mic-linking mode, extracting the multiple streams of audio and video data received by the interactive server.
S202, separating the acquired audio and video data to obtain multiple audio data streams and multiple video data streams.
S203, merging the multiple audio data streams in the interactive server to obtain a merged audio data stream.
S204, merging the multiple video data streams in the interactive server to obtain a merged video data stream.
S205, sending the merged video data stream and the merged audio data stream directly to the streaming media server and forwarding them, via the streaming media server, to the client for output.
As a possible embodiment, the audio data streams and the video data streams may be merged simultaneously at the interactive server side. As shown in fig. 4, fig. 4 is a schematic structural diagram of another audio and video mic-linking and merging method provided by the embodiment of the present invention. When merging at the interactive server side, the merge parameters can be modified by each host and by the client. The merged video data stream and merged audio data stream generated directly at the interactive server can be forwarded to the streaming media server over RTMP, and the streaming media server stores the received merged audio and video data streams, performs background processing on them, and forwards them to each client over RTMP or HLS. In this way, every client obtains the same merged picture and sound, which makes it convenient for users to obtain the same information.
As another possible embodiment, as shown in fig. 5, fig. 5 is a schematic structural diagram of another audio and video mic-linking and merging method provided in the embodiment of the present invention. The multiple audio data streams and the multiple video data streams can all be forwarded to each client, where their merge parameters are modified as needed and the streams are merged by corresponding merging tools. The effects displayed at each client after merging include, but are not limited to, reception of multiple streams (the individual pictures and sounds of multiple hosts), so customized service for users can be realized. However, because the client receives multiple video data streams and audio data streams, its sensitivity to network jitter increases while the merge parameters are being modified, so the network capacity needs to be strengthened.
Optionally, after receiving the merged video data stream and the audio data streams, the streaming media server performs cloud storage and background data processing on the merged video data stream and the audio data streams, where the background data processing includes performing a security check on them.
In addition, when the streaming media server receives the merged video data stream and the merged audio data stream, cloud storage and background data processing can likewise be performed.
Cloud storage means that the received merged video data stream and/or merged audio data stream is stored in the cloud, so that the background and other devices can retrieve it at any time. In view of limited storage space, the stored data can be cleaned periodically, for example: setting three months as the retention period and deleting data stored for more than three months, regularly deleting temporary data, and so on.
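The periodic cleaning described above can be sketched as follows; the three-month retention period is taken from the text, while the object keys and the timestamp-based store are illustrative assumptions rather than part of the embodiment.

```python
import time

RETENTION_SECONDS = 90 * 24 * 3600  # roughly the three-month retention period

def expired_keys(objects, now=None):
    """Return keys of stored merged streams older than the retention window.

    `objects` maps an object key to its stored-at Unix timestamp; in the
    scheme above these would be merged audio/video objects in cloud storage
    (the key names here are illustrative, not from the patent).
    """
    now = time.time() if now is None else now
    return [key for key, stored_at in objects.items()
            if now - stored_at > RETENTION_SECONDS]

now = 10_000_000_000
store = {
    "room1/merged-old.flv": now - 100 * 24 * 3600,  # ~100 days old: expired
    "room1/merged-new.flv": now - 24 * 3600,        # 1 day old: kept
}
print(expired_keys(store, now=now))  # ['room1/merged-old.flv']
```

A periodic background job would call `expired_keys` and delete the returned objects, which matches the "delete data stored for more than 3 months" example in the text.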
The background processing may include performing real-time data monitoring on the merged video data stream and/or the merged audio data stream, monitoring whether they meet the specification, and performing multi-dimensional quality detection, crash analysis, and the like on them.
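A minimal sketch of such a specification check is given below; the bitrate band and the per-segment representation are illustrative assumptions, since the embodiment does not fix which quality dimensions are monitored.

```python
def check_segments(bitrates_kbps, low=500, high=8000):
    """Return indices of stream segments whose measured bitrate falls
    outside the allowed band (thresholds are illustrative)."""
    return [i for i, b in enumerate(bitrates_kbps) if not low <= b <= high]

# Segment 1 is suspiciously low (possible stall), segment 3 is too high.
print(check_segments([2500, 120, 3000, 9500]))  # [1, 3]
```

In a real deployment this check would run continuously on the merged streams and feed the multi-dimensional quality detection mentioned above.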
In the embodiment of the present invention, by modifying the merge parameters of the audio data streams and the video data streams at the interactive server, the merged audio data stream and the merged video data stream are formed directly at the interactive server, forwarded to the streaming media server for storage and monitoring, and forwarded at the same time to multiple clients for audio and video playback. In addition, the merge parameters of the audio data streams and the video data streams can be modified at each client, so that customized user services can be realized and the service limitations on users broken. Sending the audio data streams and the video data streams, merged or unmerged, to the streaming media server for storage and monitoring ensures the security and reliability of the data.
As shown in fig. 6, fig. 6 is a schematic structural diagram of an audio and video co-streaming merging device provided in an embodiment of the present invention, where the device 300 includes:
the acquisition module 301, configured to extract the multiple channels of audio and video data received by the interactive server after detecting that the interactive server has entered co-streaming mode;
the separation module 302, configured to separate the acquired audio and video data to obtain multiple audio data streams and multiple video data streams;
the first sending module 303, configured to merge the multiple video data streams in the interactive server to obtain a merged video data stream, send the merged video data stream to the streaming media server, and at the same time send the multiple audio data streams directly to the streaming media server;
the first merging module 304, configured for the streaming media server to send the multiple audio data streams and the merged video data stream to the client, where the merge parameters of the multiple audio data streams are modified at the client and the streams are then merged to obtain a merged audio data stream;
and the output module 305, configured to output the merged video data stream and the merged audio data stream at the client.
Optionally, as shown in fig. 7, fig. 7 is a schematic structural diagram of another audio and video co-streaming merging device provided in an embodiment of the present invention, and the device 300 further includes:
a second merging module 306, configured to merge the multiple audio data streams in the interactive server to obtain a merged audio data stream;
a third merging module 307, configured to merge the multiple video data streams in the interactive server to obtain a merged video data stream;
and a second sending module 308, configured to send the merged video data stream and the merged audio data stream directly to the streaming media server, and to forward the merged video data stream and the merged audio data stream to the client for output via the streaming media server.
Optionally, as shown in fig. 8, fig. 8 is a schematic structural diagram of another audio and video co-streaming merging device provided in an embodiment of the present invention, and the device 300 further includes:
the storage module 309 is configured to, after the streaming media server receives the merged video data stream and the audio data stream, perform cloud storage and background data processing on the merged video data stream and the audio data, where the background data processing includes performing security judgment on the merged video data stream and the audio data.
Optionally, the first sending module 303 is further configured to, after detecting that merging of the video data stream is complete, have the interactive server send the merged video data stream to the streaming media server through a real-time transport protocol.
Optionally, the first merging module 304 is further configured for the streaming media server to send the multiple audio data streams and the merged video data stream to the client through a real-time transport protocol or a fragment transport protocol.
As shown in fig. 9, fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, where the electronic device 600 includes: a processor 601, a memory 602, a network interface 603, and a computer program stored on the memory 602 and executable on the processor 601, where the processor 601, when executing the computer program, implements the steps in the audio and video co-streaming merging method provided by the embodiment.
Specifically, the processor 601 is configured to perform the following steps:
after detecting that the interactive server has entered co-streaming mode, extracting the multiple channels of audio and video data received by the interactive server;
separating the obtained audio and video data to obtain multiple audio data streams and multiple video data streams;
merging the multiple video data streams in the interactive server to obtain a merged video data stream, sending the merged video data stream to the streaming media server, and at the same time sending the multiple audio data streams directly to the streaming media server;
the streaming media server sending the multiple audio data streams and the merged video data stream to the client, where the merge parameters of the multiple audio data streams are modified at the client and the streams are then merged to obtain a merged audio data stream;
and outputting the merged video data stream and the merged audio data stream at the client.
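The steps the processor 601 performs can be sketched end to end with in-memory stand-ins for the interactive server and the streaming media server; all class names and payloads here are illustrative, and a real implementation would operate on encoded RTMP streams rather than strings.

```python
class InteractiveServer:
    def separate(self, av_feeds):
        # Each feed is an (audio payload, video payload) pair from one anchor.
        audios = [a for a, _ in av_feeds]
        videos = [v for _, v in av_feeds]
        return audios, videos

    def merge_video(self, videos):
        # Stand-in for composing the anchors' pictures into one merged frame.
        return "+".join(videos)

class StreamingServer:
    def __init__(self):
        self.stored = []  # stand-in for cloud storage

    def forward(self, audios, merged_video):
        self.stored.append(merged_video)  # store, then forward to clients
        return audios, merged_video

feeds = [("a1", "v1"), ("a2", "v2")]
interactive, streaming = InteractiveServer(), StreamingServer()
audios, videos = interactive.separate(feeds)
merged_video = interactive.merge_video(videos)
audios, merged_video = streaming.forward(audios, merged_video)
print(merged_video)  # v1+v2
print(audios)        # ['a1', 'a2']
```

The audio streams pass through unmerged, matching the scheme in which each client later merges them under its own parameters.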
Optionally, the processor 601 is further configured to merge the multiple audio data streams in the interactive server to obtain a merged audio data stream;
merge the multiple video data streams in the interactive server to obtain a merged video data stream;
and send the merged video data stream and the merged audio data stream directly to the streaming media server, and forward the merged video data stream and the merged audio data stream to the client for output via the streaming media server.
Optionally, the processor 601 is further configured to, after the streaming media server receives the merged video data stream and the audio data stream, perform cloud storage and background data processing on the merged video data stream and the audio data stream, where the background data processing includes performing a security check on them.
Optionally, the step, executed by the processor 601, of merging the multiple video data streams in the interactive server to obtain a merged video data stream and sending the merged video data stream to the streaming media server includes:
after detecting that merging of the video data stream is complete, the interactive server sending the merged video data stream to the streaming media server through a real-time transport protocol.
Optionally, the step, executed by the processor 601, of the streaming media server sending the multiple audio data streams and the merged video data stream to the client includes:
the streaming media server sending the multiple audio data streams and the merged video data stream to the client through a real-time transport protocol or a fragment transport protocol.
The electronic device 600 provided by the embodiment of the present invention can implement each implementation manner in the embodiment of the audio and video co-streaming merging method, with the corresponding beneficial effects; to avoid repetition, details are not repeated here.
It is noted that only components 601-603 are shown, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead. As will be understood by those skilled in the art, the electronic device 600 is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The electronic device 600 may be a desktop computer, a notebook computer, a palmtop computer, or another computing device. The electronic device 600 may interact with a user through a keyboard, a mouse, a remote control, a touch pad, or a voice-activated device.
The memory 602 includes at least one type of readable storage medium, including flash memory, a hard disk, a multimedia card, card-type memory (e.g., SD or DX memory), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Programmable Read-Only Memory (PROM), magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 602 may be an internal storage unit of the electronic device 600, such as a hard disk or memory of the electronic device 600. In other embodiments, the memory 602 may also be an external storage device of the electronic device 600, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the electronic device 600. Of course, the memory 602 may also include both an internal storage unit and an external storage device of the electronic device 600. In this embodiment, the memory 602 is generally used to store the operating system installed in the electronic device 600 and various types of application software, such as the program code of the audio and video co-streaming merging method. In addition, the memory 602 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 601 may, in some embodiments, be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 601 is generally used to control the overall operation of the electronic device 600. In this embodiment, the processor 601 is configured to run the program code stored in the memory 602 or process data, for example, to run the program code of the audio and video co-streaming merging method.
The network interface 603 may include a wireless network interface or a wired network interface, and the network interface 603 is generally used for establishing a communication connection between the electronic device 600 and other electronic devices.
The embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored; when executed by the processor 601, the computer program implements each process in the audio and video co-streaming merging method provided in the embodiment and can achieve the same technical effect, which is not described here again to avoid repetition.
It can be understood by those skilled in the art that all or part of the processes in the audio and video co-streaming merging method of the embodiments may be implemented by a computer program instructing the relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, may include processes such as those of the method embodiments. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The terms "first", "second", and the like mentioned in the embodiments of the present invention do not indicate order or importance but are used merely for convenience of description. The above disclosure is only a preferred embodiment of the present invention and certainly cannot limit the scope of the claims; equivalent changes made according to the claims of the present invention still fall within the scope of the present invention.

Claims (10)

1. An audio and video co-streaming merging method, characterized by comprising the following steps:
after detecting that an interactive server enters co-streaming mode, extracting multiple channels of audio and video data received by the interactive server;
separating the obtained audio and video data to obtain multiple audio data streams and multiple video data streams;
merging the multiple video data streams in the interactive server to obtain a merged video data stream, sending the merged video data stream to a streaming media server, and at the same time sending the multiple audio data streams directly to the streaming media server;
the streaming media server sending the multiple audio data streams and the merged video data stream to a client, where merge parameters of the multiple audio data streams are modified at the client and the streams are then merged to obtain a merged audio data stream;
and outputting the merged video data stream and the merged audio data stream at the client.
2. The audio and video co-streaming merging method as claimed in claim 1, wherein the method comprises:
merging the multiple audio data streams in the interactive server to obtain a merged audio data stream;
merging the multiple video data streams in the interactive server to obtain a merged video data stream;
and sending the merged video data stream and the merged audio data stream directly to the streaming media server, and forwarding the merged video data stream and the merged audio data stream to the client for output via the streaming media server.
3. The audio and video co-streaming merging method as claimed in claim 1, wherein the method further comprises:
after receiving the merged video data stream and the audio data stream, the streaming media server performing cloud storage and background data processing on the merged video data stream and the audio data stream, wherein the background data processing comprises performing a security check on them.
4. The audio and video co-streaming merging method according to claim 1, wherein the step of merging the multiple video data streams in the interactive server to obtain a merged video data stream and sending the merged video data stream to a streaming media server comprises:
after detecting that merging of the video data stream is complete, the interactive server sending the merged video data stream to the streaming media server through a real-time transport protocol.
5. The audio and video co-streaming merging method according to claim 4, wherein the step of the streaming media server sending the multiple audio data streams and the merged video data stream to the client comprises:
the streaming media server sending the multiple audio data streams and the merged video data stream to the client through the real-time transport protocol or a fragment transport protocol.
6. An audio and video co-streaming merging device, characterized by comprising:
an acquisition module, configured to extract multiple channels of audio and video data received by an interactive server after detecting that the interactive server enters co-streaming mode;
a separation module, configured to separate the acquired audio and video data to obtain multiple audio data streams and multiple video data streams;
a first sending module, configured to merge the multiple video data streams in the interactive server to obtain a merged video data stream, send the merged video data stream to a streaming media server, and at the same time send the multiple audio data streams directly to the streaming media server;
a first merging module, configured for the streaming media server to send the multiple audio data streams and the merged video data stream to a client, where merge parameters of the multiple audio data streams are modified at the client and the streams are then merged to obtain a merged audio data stream;
and an output module, configured to output the merged video data stream and the merged audio data stream at the client.
7. The audio and video co-streaming merging device as claimed in claim 6, wherein the device comprises:
a second merging module, configured to merge the multiple audio data streams in the interactive server to obtain a merged audio data stream;
a third merging module, configured to merge the multiple video data streams in the interactive server to obtain a merged video data stream;
and a second sending module, configured to send the merged video data stream and the merged audio data stream directly to the streaming media server, and to forward the merged video data stream and the merged audio data stream to the client for output via the streaming media server.
8. The audio and video co-streaming merging device as claimed in claim 7, wherein the device further comprises:
a storage module, configured to, after the streaming media server receives the merged video data stream and the audio data stream, perform cloud storage and background data processing on the merged video data stream and the audio data stream, wherein the background data processing comprises performing a security check on them.
9. An electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps in the audio and video co-streaming merging method according to any one of claims 1 to 5.
10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps in the audio and video co-streaming merging method according to any one of claims 1 to 5.
CN202010953283.7A 2020-09-11 2020-09-11 Audio and video connecting and converging method and device, electronic equipment and storage medium Active CN112135155B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010953283.7A CN112135155B (en) 2020-09-11 2020-09-11 Audio and video connecting and converging method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112135155A true CN112135155A (en) 2020-12-25
CN112135155B CN112135155B (en) 2022-07-19

Family

ID=73846358

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010953283.7A Active CN112135155B (en) 2020-09-11 2020-09-11 Audio and video connecting and converging method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112135155B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113542792A (en) * 2021-07-14 2021-10-22 北京字节跳动网络技术有限公司 Audio merging method, audio uploading method, device and program product
CN114760520A (en) * 2022-04-20 2022-07-15 广州方硅信息技术有限公司 Live small and medium video shooting interaction method, device, equipment and storage medium
CN115022665A (en) * 2022-06-27 2022-09-06 咪咕视讯科技有限公司 Live broadcast making method and device, multimedia processing equipment and multimedia processing system
WO2023165580A1 (en) * 2022-03-03 2023-09-07 北京字跳网络技术有限公司 Stream mixing method and device for co-streaming

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102325173A (en) * 2011-08-30 2012-01-18 重庆抛物线信息技术有限责任公司 Mixed audio and video sharing method and system
CN103384235A (en) * 2012-05-04 2013-11-06 腾讯科技(深圳)有限公司 Method, server and system used for data presentation during conversation of multiple persons
CN105657579A (en) * 2015-10-29 2016-06-08 乐视致新电子科技(天津)有限公司 Live broadcast audio switching method, stream media server and client
CN106131583A (en) * 2016-06-30 2016-11-16 北京小米移动软件有限公司 A kind of live processing method, device, terminal unit and system
CN207099275U (en) * 2017-07-05 2018-03-13 深圳开心聊吧科技有限公司 One kind chats multi-screen interactive device
CN107819833A (en) * 2017-10-20 2018-03-20 贵州白山云科技有限公司 A kind of method and device for accessing live even wheat
CN107846633A (en) * 2016-09-18 2018-03-27 腾讯科技(深圳)有限公司 A kind of live broadcasting method and system
CN108495141A (en) * 2018-03-05 2018-09-04 网宿科技股份有限公司 A kind of synthetic method and system of audio and video
CN108848106A (en) * 2018-06-30 2018-11-20 武汉斗鱼网络科技有限公司 Customized data method, device and readable storage medium storing program for executing are transmitted by audio stream
CN108848391A (en) * 2018-06-21 2018-11-20 深圳市思迪信息技术股份有限公司 The more people Lian Mai method and devices of net cast
CN108900920A (en) * 2018-07-20 2018-11-27 广州虎牙信息科技有限公司 A kind of live streaming processing method, device, equipment and storage medium
CN110299144A (en) * 2018-03-21 2019-10-01 腾讯科技(深圳)有限公司 Audio mixing method, server and client
US20190387263A1 (en) * 2015-12-22 2019-12-19 Youku Internet Technology (Beijing) Co., Ltd. Synchronously displaying and matching streaming media and subtitles
CN110677696A (en) * 2019-09-10 2020-01-10 南京清豆华创科技有限公司 Live broadcast interaction system and method, equipment and storage medium
CN111050185A (en) * 2018-10-15 2020-04-21 武汉斗鱼网络科技有限公司 Live broadcast room wheat-connected video mixing method, storage medium, electronic equipment and system
EP3698856A1 (en) * 2019-02-22 2020-08-26 Technogym S.p.A. Adaptive audio and video channels in a group exercise class

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
朱长利: "用Premiere 6网络直播的实现", 《桌面出版与设计》 *

Also Published As

Publication number Publication date
CN112135155B (en) 2022-07-19

Similar Documents

Publication Publication Date Title
CN112135155B (en) Audio and video connecting and converging method and device, electronic equipment and storage medium
CN111818359B (en) Processing method and device for live interactive video, electronic equipment and server
CN110798697B (en) Video display method, device and system and electronic equipment
CN107483460B (en) Method and system for multi-platform parallel broadcasting and stream pushing
CN112714330B (en) Gift presenting method and device based on live broadcast with wheat and electronic equipment
US10271009B2 (en) Method and apparatus for providing additional information of video using visible light communication
EP3562163B1 (en) Audio-video synthesis method and system
EP2940940B1 (en) Methods for sending and receiving video short message, apparatus and handheld electronic device thereof
CN110784730B (en) Live video data transmission method, device, equipment and storage medium
CN106331880B (en) Information processing method and system
WO2016150317A1 (en) Method, apparatus and system for synthesizing live video
US11227620B2 (en) Information processing apparatus and information processing method
KR20130138263A (en) Streaming digital video between video devices using a cable television system
US20130332952A1 (en) Method and Apparatus for Adding User Preferred Information To Video on TV
KR101915786B1 (en) Service System and Method for Connect to Inserting Broadcasting Program Using an Avata
US11451858B2 (en) Method and system of processing information flow and method of displaying comment information
CN106792155A (en) A kind of method and device of the net cast of multiple video strems
KR101915792B1 (en) System and Method for Inserting an Advertisement Using Face Recognition
US20130291011A1 (en) Transcoding server and method for overlaying image with additional information therein
US20200213631A1 (en) Transmission system for multi-channel image, control method therefor, and multi-channel image playback method and apparatus
CA2969721A1 (en) Location agnostic media control room and broadcasting facility
JP2024048339A (en) Server, terminal and computer program
CN116962742A (en) Live video image data transmission method, device and live video system
KR101067952B1 (en) Managing System for less traffic in video communication and Method thereof
CN112532719A (en) Information flow pushing method, device, equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant