CN114885198A - Mixed network-oriented accompanying sound and video collaborative presentation system - Google Patents

Mixed network-oriented accompanying sound and video collaborative presentation system

Info

Publication number
CN114885198A
CN114885198A
Authority
CN
China
Prior art keywords
audio
video
delay time
delay
stream
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210791260.XA
Other languages
Chinese (zh)
Other versions
CN114885198B (en)
Inventor
姜文波
顾军
刘玓
王振中
马健
赵旭
卢冠宇
孙剑
刘永强
顿子振
王汗青
李婵
田浩
喻庆杰
刘寒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Media Group
Original Assignee
China Media Group
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Media Group filed Critical China Media Group
Priority to CN202210791260.XA priority Critical patent/CN114885198B/en
Publication of CN114885198A publication Critical patent/CN114885198A/en
Application granted granted Critical
Publication of CN114885198B publication Critical patent/CN114885198B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N21/43072Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen of multiple content streams on the same device
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/8547Content authoring involving timestamps for synchronizing content

Abstract

The embodiment of the application provides a mixed network-oriented accompanying sound and video collaborative presentation system, which comprises: a video link module, used for acquiring a video stream, inserting a timestamp into the video stream, and calculating the delay time of the video full link; an audio link module, used for acquiring the audio stream corresponding to the video stream and generating multiple paths of delayed audio streams with different delay times from the audio stream, while inserting a visual time stamp into the audio stream to generate a delay measuring and calculating signal stream; and an application gateway module, used for calculating the delay time of the audio full link according to the delay measuring and calculating signal stream, determining one path of delayed audio stream according to the delay time of the video full link and the delay time of the audio full link, and controlling the audio link module to send the delayed audio stream to the mobile terminal. When the presentation system provided by the embodiment of the application distributes the television accompanying sound and the channel video main signals independently over different networks, near-real-time synchronized playback of sound and picture can be achieved by calculating and actively aligning the delay.

Description

Mixed network-oriented accompanying sound and video collaborative presentation system
Technical Field
The application relates to the technical field of mobile media, in particular to a mixed network-oriented accompanying audio and video collaborative presentation system.
Background
In recent years, with the rapid development of ultra-high-definition outdoor large screens and mobile media, new broadcast television transmission modes have emerged, and the presentation forms and distribution methods of content have diversified. On the one hand, video transmission over broadcast or dedicated networks such as digital television and IPTV is still the main means of delivering television programs; on the other hand, with the continuous improvement of the quality and speed of the internet/mobile internet, numerous low-delay transmission protocols such as WebRTC/QUIC/SRT have been developed, so the internet/mobile internet can be used to achieve video and audio distribution comparable in transmission quality to the dedicated broadcast network.
However, the prior art does not consider unified control at the distribution end, the distribution delays of different networks, or automatic switching between different channels. As the mobile terminals used by users diversify and network environments become more complex, the phenomenon of 'sound and picture out of sync' often occurs, which makes the user experience uncomfortable.
Disclosure of Invention
In order to solve one of the above technical defects, the embodiments of the present application provide a mixed network-oriented accompanying sound and video collaborative presentation system.
According to a first aspect of embodiments of the present application, there is provided a mixed network-oriented audio and video collaborative presentation system, including:
the video link module is used for acquiring a video stream, inserting a timestamp into the video stream and calculating the delay time of a video full link;
the audio link module is used for acquiring an audio stream corresponding to the video stream and generating a plurality of paths of delayed audio streams with different delay times according to the audio stream; meanwhile, a visual time stamp is inserted into the audio stream to generate a delay measuring and calculating signal stream;
and the application gateway module is used for calculating the delay time of the audio full link according to the delay measuring and calculating signal stream, determining a path of delayed audio stream according to the delay time of the video full link and the delay time of the audio full link, and controlling the audio link module to send the delayed audio stream to the mobile terminal.
Optionally, the video link module includes a video encoder, a transmission distribution server and a set-top box terminal, and the video encoder, the transmission distribution server and the set-top box terminal are connected in sequence;
the video encoder is used for calculating the delay time of the video stream transmitted to the video encoder and recording the delay time as a first video delay time;
the set-top box terminal is used for calculating the delay time of the video stream transmitted to the set-top box terminal by the video encoder based on the timestamp and recording the delay time as a second video delay time; calculating the delay time of the video stream transmitted from the set-top box terminal to an outdoor large screen, and recording as a third video delay time;
and the transmission and distribution server is used for obtaining the delay time of the video full link according to the first video delay time, the second video delay time and the third video delay time and sending the delay time to the application gateway module.
Optionally, the video encoder inserts a timestamp into the SEI information of the video stream; specifically, the SEI information is placed before the main coded image data, at the front of each key frame.
Optionally, the audio link module includes an audio coding server, an audio delay server, and an audio live broadcast server, where the audio coding server, the audio delay server, and the audio live broadcast server are connected in sequence;
the audio coding server is used for calculating the delay time of the audio stream transmitted to an audio coder and recording the delay time as a first audio delay time;
the audio delay server is used for calculating the delay time of the encoded audio stream transmitted to the audio delay server and recording the delay time as a second audio delay time;
and the audio live broadcast server is used for sending the first audio delay time and the second audio delay time to an application gateway module.
Optionally, the calculating, by the application gateway module, the delay time of the audio full link according to the delay measurement and calculation signal stream includes:
calculating the delay time from the audio delay server to the mobile terminal according to the delay measuring and calculating signal stream, and recording as a third audio delay time;
and obtaining the delay time of the audio full link according to the first audio delay time, the second audio delay time and the third audio delay time.
Optionally, the calculating the delay time from the audio delay server to the mobile terminal according to the delay measurement and calculation signal stream includes:
acquiring local time when the mobile terminal starts to play the delay measuring and calculating signal stream and a playing image of the mobile terminal playing the delay measuring and calculating signal stream;
and calculating the delay time from the audio delay server to the mobile terminal according to the time corresponding to the playing image and the local time.
Optionally, the audio delay server generates multiple delayed audio streams with different delay times, which are stepped by a set time.
Optionally, the set time is 200 ms.
Optionally, the application gateway module is further configured to obtain location information of the mobile terminal, and determine a corresponding delay time of the video full link according to the location information.
Optionally, the determining a path of delayed audio stream and controlling the audio link module to send to the mobile terminal according to the delay time of the video full link and the delay time of the audio full link includes:
and determining a path of delayed audio stream corresponding to the delay time according to the difference between the delay time of the video full link and the delay time of the audio full link, and controlling the audio link module to send the delayed audio stream to the mobile terminal.
By adopting the mixed network-oriented accompanying sound and video collaborative presentation system provided by the embodiment of the application, when accompanying sound signals and video signals are independently distributed through different networks, sound and picture quasi-real-time synchronous playing of the accompanying sound signals and the video signals can be realized through calculation and active alignment of delay.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a schematic block diagram of a hybrid network-oriented audio and video collaborative presentation system according to an embodiment of the present application;
fig. 2 is a schematic block diagram of a video link module according to an embodiment of the present disclosure;
fig. 3 is a schematic block diagram of an audio link module provided in an embodiment of the present application;
fig. 4 is a signal flow diagram of an audio and video collaborative presentation system facing a hybrid network according to an embodiment of the present application;
fig. 5 is a service flow diagram of a mixed network-oriented audio and video collaborative presentation system according to an embodiment of the present application;
fig. 6 is a flowchart of an audio delay function provided in an embodiment of the present application.
Detailed Description
In order to make the technical solutions and advantages of the embodiments of the present application more apparent, the following further detailed description of the exemplary embodiments of the present application with reference to the accompanying drawings makes it clear that the described embodiments are only a part of the embodiments of the present application, and are not exhaustive of all embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
In the process of implementing the application, the inventor found that, for the traditional outdoor large-screen scenario that has only video playback capability, using the internet/mobile internet to transmit ultra-high-definition video and television accompanying sound independently over a heterogeneous network, and presenting the large-screen video and the television accompanying sound on the user's mobile phone in synchronization at the terminal, has become a key link in innovating outdoor large-screen program forms and improving user experience.
Fig. 1 is a schematic block diagram of a hybrid network-oriented audio and video collaborative presentation system according to an embodiment of the present application; fig. 2 is a schematic block diagram of a video link module according to an embodiment of the present disclosure; fig. 3 is a schematic block diagram of an audio link module provided in an embodiment of the present application; fig. 4 is a signal flow diagram of an audio and video collaborative presentation system facing a hybrid network according to an embodiment of the present application; fig. 5 is a service flow diagram of a mixed network-oriented audio and video collaborative presentation system according to an embodiment of the present application; fig. 6 is a flowchart of an audio delay function provided in an embodiment of the present application.
As shown in fig. 1 to 6, in an embodiment of the present application, there is provided a mixed network-oriented audio and video collaborative presentation system, including:
the video link module 1 is used for acquiring a video stream, inserting a timestamp into the video stream, and calculating the delay time of a video full link; by inserting a timestamp into the video stream, the delay time from encoding to decoding of the video stream can be calculated, which is the most influential factor in the video full link.
The audio link module 2 is configured to obtain the audio stream corresponding to the video stream and to generate multiple paths of delayed audio streams with different delay times from the audio stream; meanwhile, a visual time stamp is inserted into the audio stream to generate a delay measuring and calculating signal stream. From the delay measuring and calculating signal stream, the delay of the audio stream from the delay processing to the mobile terminal can be calculated; this delay is the dominant contributor to the delay of the audio full link.
And the application gateway module 3 is used for calculating the delay time of the audio full link according to the delay measurement and calculation signal stream, determining a path of delay audio stream according to the delay time of the video full link and the delay time of the audio full link, and controlling the audio link module to send the delay audio stream to the mobile terminal.
It should be noted that the mobile terminal is a device for playing accompanying sound, and may be a mobile phone or a tablet computer.
In the video link module 1, the video stream may be in an ultra-high-definition 4K/8K format; this application takes the uncompressed SMPTE ST 2110 video stream output by the 8K broadcast system as an example.
The video full link refers to the full video transmission link from the 8K broadcast system to the outdoor large screens. Since the outdoor large screens are numerous and distributed at different locations, the delay time of the video full link differs from screen to screen.
In the audio link module 2, the audio stream is the lossless audio stream output by the 8K broadcast system.
The audio full link refers to an audio transmission full link from the 8K broadcasting system to the mobile terminal, and the delay time of the audio full link is different according to the position of the mobile terminal.
In the application gateway module 3, a video full link and an audio full link which need to be cooperatively presented are selected, and delay calculation and alignment are performed, so that quasi-real-time synchronous playing is realized.
When the sound signal and the video signal are independently distributed through different networks, the sound and picture quasi-real-time synchronous playing of the sound signal and the video signal can be realized through the calculation and the active alignment of the delay.
As an alternative embodiment, as shown in fig. 2, the video link module 1 includes a video encoder 110, a transmission distribution server 120, and a set-top terminal 130, where the video encoder 110, the transmission distribution server 120, and the set-top terminal 130 are connected in sequence.
The video encoder 110 is configured to calculate the delay time for transmitting the video stream to the video encoder, recorded as the first video delay time. The video encoder acquires the video stream, encodes it, and inserts a timestamp into the video stream. Specifically, the video encoder may be an AVS3 encoder or an encoder of another type, which is not limited in this embodiment.
The set-top box terminal 130 is configured to calculate, based on the timestamp, the delay time for the video stream to be transmitted from the video encoder to the set-top box terminal, recorded as the second video delay time, and to calculate the delay time for the video stream to be transmitted from the set-top box terminal to an outdoor large screen, recorded as the third video delay time. The set-top box terminal decodes the received encoded video stream and sends the decoded video stream to the outdoor large screen for playback.
The transmission distribution server 120 is configured to obtain the delay time of the video full link from the first video delay time, the second video delay time, and the third video delay time, and to send it to the application gateway module. The transmission distribution server is also used to transmit and distribute the encoded video stream to each set-top box terminal.
Optionally, the video encoder 110 inserts a timestamp into the SEI information of the video stream. In particular, the SEI information is placed before the main coded image data, at the front of each key frame. At the video encoder end, the timestamp is put into the H.264 access unit in the form of a character string for transmission; a key frame is inserted into the video stream so that the stream carries a continuous time code, which is transmitted and distributed over the link to the client, i.e. the set-top box terminal. When the video stream is transmitted and distributed to the set-top box terminal, the SEI information is decoded at the terminal, and the delay of the video stream up to the set-top box terminal can be calculated by combining the decoding time and the buffer state.
SEI stands for Supplemental Enhancement Information; it allows data to be carried inside the H.264 access unit for transmission. SEI has the following characteristics: it is independent of the transport protocol, so both RTSP and RTMP are supported, and other protocols can be used as long as the playback end supports SEI parsing; compatibility is good, because if the playback end does not support parsing of custom SEI data, the SEI data is simply handed to the H.264 decoder, which ignores it, and normal playback is not affected; and the SEI is carried with the video frame, so it is completely synchronized with the video.
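As an illustration only (the patent does not disclose encoder source code; the Python function, UUID, and timestamp format below are assumptions), a wall-clock timestamp string could be packed into an H.264 "user data unregistered" SEI NAL unit roughly as follows and emitted immediately before each key frame, ahead of the main coded image data:

```python
import time
import uuid

SEI_NAL_TYPE = 0x06                  # NAL unit type 6 = SEI (nal_ref_idc = 0)
USER_DATA_UNREGISTERED = 5           # SEI payload type 5 "user data unregistered"
TS_UUID = uuid.uuid5(uuid.NAMESPACE_DNS, "timestamp.sei.example")  # hypothetical 16-byte tag

def build_timestamp_sei(epoch_ms: int) -> bytes:
    """Build an Annex-B SEI NAL unit carrying the timestamp as a character string.
    Emulation-prevention bytes are omitted for brevity."""
    payload = TS_UUID.bytes + str(epoch_ms).encode("ascii")
    sei = bytearray([SEI_NAL_TYPE, USER_DATA_UNREGISTERED])
    size = len(payload)
    while size >= 255:               # payload size is coded as 0xFF runs plus a final byte
        sei.append(0xFF)
        size -= 255
    sei.append(size)
    sei += payload
    sei.append(0x80)                 # rbsp_trailing_bits
    return b"\x00\x00\x00\x01" + bytes(sei)

# emitted before each key frame; the set-top box terminal parses the timestamp back out,
# compares it with its own clock at decode time, and so obtains the encoder-to-terminal delay
sei_nal = build_timestamp_sei(int(time.time() * 1000))
```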
As an alternative embodiment, as shown in fig. 3, the audio link module 2 includes an audio coding server 210, an audio delay server 220, and an audio live broadcast server 230, where the audio coding server 210, the audio delay server 220, and the audio live broadcast server 230 are connected in sequence.
The audio coding server 210 is configured to calculate the delay time for transmitting the audio stream to the audio encoder, recorded as the first audio delay time. The audio coding server encodes the received unencoded audio stream and pushes the encoded audio stream to the audio delay server.
The audio delay server 220 is configured to calculate the delay time for transmitting the encoded audio stream to the audio delay server, recorded as the second audio delay time. Multiple paths of delayed audio streams with different delay times may be generated from the encoded audio stream; meanwhile, a visual time stamp is inserted into the encoded audio stream to generate a delay measuring and calculating signal stream. The multi-path delayed audio streams and the delay measuring and calculating signal stream are sent to the audio live broadcast server, which is responsible for distribution.
The audio live broadcast server 230 is configured to send the first audio delay time and the second audio delay time to an application gateway module.
From the audio stream, the audio delay server 220 generates multiple audio streams with different delay times, stepped by a set time. The set time is calculated independently according to the service scenario and may be 200 ms. Data-frame reference counting is performed when audio stream data frames are cached and the multiple audio streams are distributed simultaneously.
This realizes a one-input, multiple-output function. The output delay of each path is calculated independently according to the service scenario, and data-frame reference counting is performed in memory when audio source stream data frames are cached and the multiple outputs are distributed simultaneously, which lowers the memory peak, improves the utilization of computing resources, allows multiple audio streams with different delays to be output in real time, and enables a single process to support more output streams with different delays. Meanwhile, Demux optimization is applied to the particular case of a source single-track AAC audio RTMP stream: the extra data-frame buffering time in the probe stage of the single-track RTMP stream is handled properly, reducing the input delay. The whole transmission link passes the audio data through transparently, minimizing the delay introduced by live-stream processing.
AAC (Advanced Audio Coding) is a compression format designed specifically for audio data.
RTMP stands for Real-Time Messaging Protocol. It is a TCP-based protocol family comprising the basic RTMP protocol and its variants RTMPT/RTMPS/RTMPE. RTMP is a network protocol designed for real-time data communication, mainly used for audio, video, and data communication between the Flash/AIR platform and streaming-media/interaction servers that support the RTMP protocol. Software supporting the protocol includes Adobe Media Server, Ultrant Media Server, Red5, and others. Like HTTP, RTMP belongs to the application layer of the TCP/IP four-layer model.
Demux, short for demultiplexer, is the data distributor that splits a multiplexed stream into its separate elementary streams.
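The one-input, multiple-output delay function described above can be modelled with the following minimal sketch (a Python illustration under assumed class and parameter names, not the patent's server implementation): each output is a FIFO that holds references to the same frame objects, so a data frame is cached once and released only after the slowest output has consumed it.

```python
import collections
import time

class DelayFanOut:
    """One input, multiple delayed outputs that share the same frame objects."""

    def __init__(self, delays_ms):
        self.delays_ms = delays_ms
        # one FIFO per output; every FIFO only holds references to the shared frames,
        # so each data frame is kept in memory once regardless of the number of outputs
        self.queues = [collections.deque() for _ in delays_ms]

    def push(self, frame: bytes, now_ms: float = None):
        now_ms = time.time() * 1000 if now_ms is None else now_ms
        for delay, queue in zip(self.delays_ms, self.queues):
            queue.append((now_ms + delay, frame))   # per-output release time

    def pull(self, output_index: int, now_ms: float = None):
        """Return the frames whose release time has passed for this output."""
        now_ms = time.time() * 1000 if now_ms is None else now_ms
        queue, due = self.queues[output_index], []
        while queue and queue[0][0] <= now_ms:
            due.append(queue.popleft()[1])          # the slowest output's pop frees the frame
        return due

# e.g. a group of output streams stepped by 200 ms, as described above
fanout = DelayFanOut([0, 200, 400, 600, 800])
```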
On the basis of the above embodiment, the calculating, by the application gateway module, the delay time of the audio full link includes:
and calculating the delay time from the audio delay server to the mobile terminal according to the delay measuring and calculating signal stream, and recording the delay time as a third audio delay time.
And obtaining the delay time of the audio full link according to the first audio delay time, the second audio delay time and the third audio delay time.
Optionally, the calculating the delay time from the audio delay server to the mobile terminal according to the delay measurement and calculation signal stream includes:
and acquiring the local time when the mobile terminal starts to play the delay measuring and calculating signal stream and the playing image of the mobile terminal playing the delay measuring and calculating signal stream.
And calculating the delay time from the audio delay server to the mobile terminal according to the time corresponding to the playing image and the local time.
The playing image can be acquired by having the mobile terminal application pull the delay measuring and calculating signal stream: first, the video player is placed outside the visible area so that it is hidden; then the local player loads and decodes the video stream and plays it muted; at the same time, a screenshot of the designated area (i.e. the player) is taken and the local screenshot time is recorded.
The time corresponding to the playing image is recognized using OCR, and the delay time from the audio delay server to the mobile terminal is calculated from that time and the local time. The OCR recognition first preprocesses the image using noise reduction, binarization, character segmentation and normalization, and locates the character region to be recognized in the image. To improve recognition efficiency, feature extraction and dimensionality reduction are then applied to the characters, and the extracted features are fed to a classifier to determine the character class. Finally, post-processing yields the correct character information from the visible time stamp screenshot in the video stream; the difference between this time and the screenshot time is recorded and compared, providing the data basis for the audio full-link delay calculation.
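A minimal sketch of this measurement at the mobile terminal is shown below; it assumes the third-party pytesseract OCR package, a hypothetical grab_player_region() screenshot helper, and a timestamp rendered as "YYYY-MM-DD HH:MM:SS.mmm", none of which are specified by the patent:

```python
from datetime import datetime

import pytesseract                   # assumed OCR backend; any OCR engine would do

def measure_third_audio_delay(grab_player_region) -> float:
    """Return one sample of the audio-delay-server-to-terminal delay, in milliseconds."""
    capture_time = datetime.now()                        # local screenshot time
    image = grab_player_region()                         # screenshot of the hidden player area
    text = pytesseract.image_to_string(image).strip()    # e.g. "2022-07-07 12:00:01.320"
    shown_time = datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")
    return (capture_time - shown_time).total_seconds() * 1000

# Repeating the measurement and averaging several samples corrects for OCR and
# scheduling jitter before the result is reported as the third audio delay time.
```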
On the basis of the foregoing embodiment, as an optional embodiment, the application gateway module is further configured to obtain location information of the mobile terminal and determine the corresponding delay time of the video full link according to the location information. Specifically, GPS positioning can be used. First, the GPS position information of each outdoor large screen is entered into the gateway service and encoded with GeoHash, i.e. the two-dimensional longitude/latitude data is encoded into a character string, yielding a set of encoded large-screen GPS codes. Then, when the mobile terminal application starts, the positioning information of the user's mobile phone is acquired and uploaded to the gateway service; the gateway service encodes the phone's longitude and latitude with GeoHash, compares the phone's GeoHash code with the set of outdoor large-screen GeoHash codes, and determines the binding relationship between the phone and an outdoor large screen using the closest-absolute-distance algorithm.
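The binding step could be sketched as follows (an illustration only: the GeoHash encoder below is a textbook implementation, the screen coordinates and identifiers are hypothetical, and matching by the longest shared code prefix is a simplification of the closest-absolute-distance rule described above):

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat: float, lon: float, precision: int = 8) -> str:
    """Encode two-dimensional longitude/latitude data into a GeoHash character string."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    code, ch, bits, even = [], 0, 0, True
    while len(code) < precision:
        rng, val = (lon_rng, lon) if even else (lat_rng, lat)   # interleave lon/lat bits
        mid = (rng[0] + rng[1]) / 2
        if val > mid:
            ch, rng[0] = (ch << 1) | 1, mid
        else:
            ch, rng[1] = ch << 1, mid
        even, bits = not even, bits + 1
        if bits == 5:                                           # 5 bits per base-32 character
            code.append(BASE32[ch])
            ch, bits = 0, 0
    return "".join(code)

# GeoHash codes entered into the gateway service for each outdoor large screen (hypothetical)
SCREEN_CODES = {
    "screen_a": geohash_encode(39.9087, 116.3975),
    "screen_b": geohash_encode(31.2400, 121.4900),
}

def bind_screen(phone_lat: float, phone_lon: float) -> str:
    """Bind the phone to the large screen whose GeoHash code matches most closely."""
    phone_code = geohash_encode(phone_lat, phone_lon)

    def shared_prefix(code: str) -> int:
        return next((i for i, (a, b) in enumerate(zip(phone_code, code)) if a != b),
                    len(code))

    return max(SCREEN_CODES, key=lambda sid: shared_prefix(SCREEN_CODES[sid]))
```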
Optionally, the determining a path of delayed audio stream and controlling the audio link module to send to the mobile terminal according to the delay time of the video full link and the delay time of the audio full link includes:
and determining a path of delayed audio stream corresponding to the delay time according to the difference between the delay time of the video full link and the delay time of the audio full link, and controlling the audio link module to send the delayed audio stream to the mobile terminal.
The application gateway module prestores the position information of all outdoor large screens and the delay time of each video full link, and acquires the positioning information of the mobile terminal and the delay time of the audio full link. It determines the nearest outdoor large screen from the positioning information, selects, according to the delay time of the audio full link of the mobile terminal and the delay time of the video full link of that outdoor large screen, the path of audio stream whose delay corresponds to their difference, and controls the audio link module to send this audio stream to the mobile terminal for collaborative presentation.
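Putting the two delay figures together, the selection can be sketched as follows (the 200 ms step is taken from the description; the function itself and the example numbers are illustrative, not the gateway's actual code):

```python
def select_delayed_stream(video_full_link_ms: float,
                          audio_full_link_ms: float,
                          available_delays_ms=(0, 200, 400, 600, 800, 1000)) -> int:
    """Return the pre-generated stream delay (ms) closest to the required offset."""
    # the audio must be held back by the amount the video path is slower
    required_offset = max(0.0, video_full_link_ms - audio_full_link_ms)
    return min(available_delays_ms, key=lambda d: abs(d - required_offset))

# e.g. video full link 1900 ms, audio full link 1150 ms -> the 800 ms delayed stream is chosen
chosen_delay = select_delayed_stream(1900, 1150)
```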
The invention has the following advantages:
1. Service scenario: the system fills the gap in service scenarios where the audio of television and large-screen video is played cooperatively at the mobile terminal over a mixed network, and improves the sound-picture synchronization experienced by users when the same content is played cooperatively on different devices. The television live broadcast signal is multicast over a private network to outdoor large screens nationwide for playback, while an audio signal stream is separated synchronously from the broadcast and delivered over the internet, so that users can listen on their mobile phones, in sync, to the audio of the picture playing on the outdoor large screen. Intelligent, automatic sound-picture matching is realized: the audio stream corresponding to the nearest large-screen video picture is matched, and the user can also adjust the audio forward or backward at the mobile phone end to fine-tune it into synchronization with the picture.
2. Delay measurement: the video signal and the audio signal are distributed over different network links, and because of network delay and jitter, encoding/decoding efficiency, streaming-media services and other links, terminal playback of audio and video inevitably lags behind the actual source output. A video stream with a visible timestamp is generated; by controlling the image size, its bit rate simulates that of the stream (video or audio) to be measured, and it is pushed to the downstream live distribution system. A screen-capture program developed at the terminal records the capture time while capturing the screen, the timestamp in the video is recovered through text recognition, the delay is calculated, and the measurement is repeated and corrected, so that the delay from source stream output to playback is essentially obtained.
3. Delay compensation: because the delay error between the video signals of outdoor large screens in different regions and the audio signals distributed over the internet is large, sound-picture synchronization cannot be achieved merely through the buffering of a traditional terminal player. Therefore, delay control is added at the server: a group of audio streams stepped by 200 milliseconds is generated, and the mobile terminal selects and plays the audio stream whose delay is closest, according to the result of the delay-time calculation, achieving an approximately synchronous presentation.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The scheme in the embodiments of the present application can be implemented in various computer languages, for example, the C language, the object-oriented programming language Java, and the interpreted scripting language JavaScript.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or as implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A hybrid network-oriented audio and video collaborative presentation system, comprising:
the video link module is used for acquiring a video stream, inserting a timestamp into the video stream and calculating the delay time of a video full link;
the audio link module is used for acquiring an audio stream corresponding to the video stream and generating a plurality of paths of delayed audio streams with different delay times according to the audio stream; meanwhile, a visual time stamp is inserted into the audio stream to generate a delay measuring and calculating signal stream;
and the application gateway module is used for calculating the delay time of the audio full link according to the delay measuring and calculating signal stream, determining a path of delayed audio stream according to the delay time of the video full link and the delay time of the audio full link, and controlling the audio link module to send the delayed audio stream to the mobile terminal.
2. The hybrid network-oriented audio and video collaborative presentation system according to claim 1, wherein the video link module comprises a video encoder, a transmission distribution server and a set-top terminal, and the video encoder, the transmission distribution server and the set-top terminal are connected in sequence;
the video encoder is used for calculating the delay time of the video stream transmitted to the video encoder and recording the delay time as a first video delay time;
the set-top box terminal is used for calculating the delay time of the video stream transmitted to the set-top box terminal by the video encoder based on the timestamp and recording the delay time as a second video delay time; calculating the delay time of the video stream transmitted from the set-top box terminal to an outdoor large screen, and recording as a third video delay time;
and the transmission and distribution server is used for obtaining the delay time of the video full link according to the first video delay time, the second video delay time and the third video delay time and sending the delay time to the application gateway module.
3. The hybrid network-oriented audio and video collaborative presentation system according to claim 2, wherein the video encoder inserts a time stamp into the SEI information of the video stream; specifically, the SEI information is placed before the main coded image data, at the front of each key frame.
4. The hybrid network-oriented audio and video collaborative presentation system according to claim 1, wherein the audio link module comprises an audio coding server, an audio delay server, and an audio live broadcast server, and the audio coding server, the audio delay server, and the audio live broadcast server are connected in sequence;
the audio coding server is used for calculating the delay time of the audio stream transmitted to the audio coder and recording the delay time as a first audio delay time;
the audio delay server is used for calculating the delay time of the encoded audio stream transmitted to the audio delay server and recording the delay time as a second audio delay time;
and the audio live broadcast server is used for sending the first audio delay time and the second audio delay time to an application gateway module.
5. The hybrid network oriented audio and video collaborative presentation system according to claim 4, wherein the application gateway module calculates the delay time of the audio full link according to the delay measurement signal stream, and comprises:
calculating the delay time from the audio delay server to the mobile terminal according to the delay measuring and calculating signal stream, and recording as a third audio delay time;
and obtaining the delay time of the audio full link according to the first audio delay time, the second audio delay time and the third audio delay time.
6. The system of claim 5, wherein the calculating the delay time from the audio delay server to the mobile terminal according to the delay measurement signal stream comprises:
acquiring local time when the mobile terminal starts to play the delay measuring and calculating signal stream and a playing image of the mobile terminal playing the delay measuring and calculating signal stream;
and calculating the delay time from the audio delay server to the mobile terminal according to the time corresponding to the playing image and the local time.
7. The hybrid network oriented audio and video collaborative presentation system according to claim 4, wherein the audio delay server generates multiple delayed audio streams of different delay times stepped by a set time.
8. The hybrid network-oriented audio and video collaborative presentation system according to claim 7, wherein the set time is 200 ms.
9. The system of claim 1, wherein the application gateway module is further configured to obtain location information of the mobile terminal, and determine a delay time of the corresponding video full link according to the location information.
10. The system for cooperative presentation of audio and video towards a hybrid network according to claim 1 or 9, wherein the determining a path of delayed audio stream and controlling the audio link module to send to the mobile terminal according to the delay time of the video full link and the delay time of the audio full link comprises:
and according to the difference between the delay time of the video full link and the delay time of the audio full link, determining a path of delayed audio stream corresponding to the delay time according to the difference, and controlling the audio link module to send the delayed audio stream to the mobile terminal.
CN202210791260.XA 2022-07-07 2022-07-07 Mixed network-oriented accompanying sound and video collaborative presentation system Active CN114885198B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210791260.XA CN114885198B (en) 2022-07-07 2022-07-07 Mixed network-oriented accompanying sound and video collaborative presentation system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210791260.XA CN114885198B (en) 2022-07-07 2022-07-07 Mixed network-oriented accompanying sound and video collaborative presentation system

Publications (2)

Publication Number Publication Date
CN114885198A true CN114885198A (en) 2022-08-09
CN114885198B CN114885198B (en) 2022-10-21

Family

ID=82682715

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210791260.XA Active CN114885198B (en) 2022-07-07 2022-07-07 Mixed network-oriented accompanying sound and video collaborative presentation system

Country Status (1)

Country Link
CN (1) CN114885198B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116112722A (en) * 2023-02-17 2023-05-12 央广新媒体文化传媒(北京)有限公司 Audio playing method and device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103905879A (en) * 2014-03-13 2014-07-02 北京奇艺世纪科技有限公司 Video data and audio data synchronized playing method and device and equipment
CN104618798A (en) * 2015-02-12 2015-05-13 北京清源新创科技有限公司 Playing time control method and device for Internet live video
CN108965971A (en) * 2018-07-27 2018-12-07 北京数码视讯科技股份有限公司 MCVF multichannel voice frequency synchronisation control means, control device and electronic equipment
US20210044867A1 (en) * 2019-08-05 2021-02-11 Grass Valley Limited System and method of measuring delay between transmitted audio and video signals
CN114598825A (en) * 2022-02-22 2022-06-07 中央广播电视总台 Video and audio signal scheduling method and device, computer equipment and readable storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103905879A (en) * 2014-03-13 2014-07-02 北京奇艺世纪科技有限公司 Video data and audio data synchronized playing method and device and equipment
CN104618798A (en) * 2015-02-12 2015-05-13 北京清源新创科技有限公司 Playing time control method and device for Internet live video
CN108965971A (en) * 2018-07-27 2018-12-07 北京数码视讯科技股份有限公司 MCVF multichannel voice frequency synchronisation control means, control device and electronic equipment
US20210044867A1 (en) * 2019-08-05 2021-02-11 Grass Valley Limited System and method of measuring delay between transmitted audio and video signals
CN114598825A (en) * 2022-02-22 2022-06-07 中央广播电视总台 Video and audio signal scheduling method and device, computer equipment and readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
张娟等: "基于三维模型的播出系统网络拓扑构建方法研究", 《现代电视技术》 *
黄若宏;刘怀兰;陈永强: "音视频流和屏幕流的同步传输方法研究", 《计算机工程与设计》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116112722A (en) * 2023-02-17 2023-05-12 央广新媒体文化传媒(北京)有限公司 Audio playing method and device, electronic equipment and storage medium
CN116112722B (en) * 2023-02-17 2023-06-27 央广新媒体文化传媒(北京)有限公司 Audio playing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN114885198B (en) 2022-10-21

Similar Documents

Publication Publication Date Title
CN108566558B (en) Video stream processing method and device, computer equipment and storage medium
WO2019205872A1 (en) Video stream processing method and apparatus, computer device and storage medium
US10693936B2 (en) Transporting coded audio data
US10694264B2 (en) Correlating timeline information between media streams
CN104885473B (en) Live timing method for the dynamic self-adapting stream transmission (DASH) via HTTP
CN101917613B (en) Acquiring and coding service system of streaming media
US11622163B2 (en) System and method for synchronizing metadata with audiovisual content
KR100837720B1 (en) Method and Apparatus for synchronizing data service with video service in Digital Multimedia Broadcasting and Executing Method of Data Service
CN106134146A (en) Process continuous print multicycle content
CN104081785A (en) Streaming of multimedia data from multiple sources
US10015530B2 (en) Extracting data from advertisement files for ad insertion
US20230319371A1 (en) Distribution of Multiple Signals of Video Content Independently over a Network
CA3000847C (en) Gateway multi-view video stream processing for second-screen content overlay
CN108494792A (en) A kind of flash player plays the converting system and its working method of hls video flowings
CN115623264A (en) Live stream subtitle processing method and device and live stream playing method and device
CN114885198B (en) Mixed network-oriented accompanying sound and video collaborative presentation system
Tang et al. Audio and video mixing method to enhance WebRTC
KR101538114B1 (en) Video processing apparatus and method for seamless video playing in a mobile smart device based on multi-codec
KR20130056829A (en) Transmitter/receiver for 3dtv broadcasting, and method for controlling the same
CA2824708C (en) Video content generation
KR101999235B1 (en) Method and system for providing hybrid broadcast broadband service based on mmtp
KR101810883B1 (en) Live streaming system and streaming client thereof
KR101403969B1 (en) How to recognize the point of the subtitles of the video playback time code is lost
CN115665117A (en) Webpage-side video stream playing method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant