CN110602542B - Audio and video synchronization method, audio and video synchronization system, equipment and storage medium - Google Patents


Info

Publication number: CN110602542B (application CN201910745460.XA)
Authority: CN (China)
Prior art keywords: video, audio, stream data, data packet, module
Legal status: Active (the legal status is an assumption, not a legal conclusion)
Other languages: Chinese (zh)
Other versions: CN110602542A
Inventors: 乔金龙, 杨春晖, 王艳辉, 沈军
Current and original assignee: Visionvera Information Technology Co Ltd
Application filed by Visionvera Information Technology Co Ltd
Priority to CN201910745460.XA
Publication of CN110602542A, then of CN110602542B upon grant

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/4302 Content synchronisation processes, e.g. decoder synchronisation
    • H04N 21/4305 Synchronising client clock from received content stream, e.g. locking decoder clock with encoder clock, extraction of the PCR packets
    • H04N 21/4307 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N 21/60 Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
    • H04N 21/63 Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
    • H04N 21/643 Communication protocols
    • H04N 21/6437 Real-time Transport Protocol [RTP]
    • H04N 21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/85 Assembly of content; Generation of multimedia applications
    • H04N 21/854 Content authoring
    • H04N 21/8547 Content authoring involving timestamps for synchronizing content

Abstract

The application provides an audio and video synchronization method, an audio and video synchronization system, an audio and video synchronization device and a storage medium. The method is applied to an audio and video synchronization system that comprises a protocol conversion server; the protocol conversion server virtualizes a video networking virtual terminal comprising a video networking module, an audio and video synchronization module and a real-time transmission module. The audio and video synchronization module compares the audio encoding timestamp and the video encoding timestamp against the system timestamp, and the video networking module then uses the comparison results to control when the audio bare stream data packets and the video bare stream data packets are sent to the real-time transmission module. This keeps audio and video synchronized while they traverse the video network, so that the internet terminal no longer sees the picture lag behind the sound it hears.

Description

Audio and video synchronization method, audio and video synchronization system, equipment and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to an audio and video synchronization method, an audio and video synchronization system, an audio and video synchronization device, and a storage medium.
Background
Because an audio data packet is smaller than a video data packet, audio packets travel from the video networking terminal to the internet terminal faster than video packets. As a result, during a point-to-point video call or a video conference between a video networking terminal and an internet terminal, the audio captured by the video networking terminal reaches the internet terminal earlier than the video captured at the same moment, so the picture seen at the internet terminal lags behind the sound that is heard.
Disclosure of Invention
In view of the above, embodiments of the present application provide an audio and video synchronization method, an audio and video synchronization system, a device and a storage medium that overcome, or at least partially solve, the above problems.
In order to solve the above problem, an embodiment of the present application discloses an audio and video synchronization method, which is applied to an audio and video synchronization system; the audio and video synchronization system comprises a protocol conversion server; the protocol conversion server virtualizes a video networking virtual terminal comprising a video networking module, an audio and video synchronization module and a real-time transmission module; the method comprises the following steps:
the video networking module obtains an audio bare stream data packet and the audio encoding timestamp corresponding to the audio bare stream data packet, and a video bare stream data packet and the video encoding timestamp corresponding to the video bare stream data packet;
the video networking module transmits the audio coding time stamp and the video coding time stamp to the audio and video synchronization module;
the audio and video synchronization module returns the audio coding time stamp to the video networking module when the system time stamp is equal to the audio coding time stamp;
the video networking module transmits the returned audio bare stream data packet corresponding to the audio coding time stamp to the real-time transmission module;
the audio and video synchronization module returns the video coding time stamp closest to the system time stamp to the video networking module;
and the video networking module transmits the returned video bare stream data packet corresponding to the video coding timestamp to the real-time transmission module.
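The gating rule in the six steps above can be sketched as follows: an audio packet is released only when the system clock reaches its encoding timestamp, while the video packet released is whichever is stamped nearest the system clock. This is a minimal illustrative sketch; `Packet`, `pick_audio` and `pick_video` are invented names, not from the patent.

```python
from collections import namedtuple

# Hypothetical packet shape: an encoding timestamp plus the bare
# stream payload. Names are illustrative, not from the patent.
Packet = namedtuple("Packet", ["ts", "payload"])

def pick_audio(audio_queue, system_ts):
    """Release an audio packet only when the system timestamp
    equals its audio encoding timestamp."""
    for pkt in audio_queue:
        if pkt.ts == system_ts:
            return pkt
    return None  # no packet due yet

def pick_video(video_queue, system_ts):
    """Release the video packet whose encoding timestamp is
    closest to the system timestamp."""
    if not video_queue:
        return None
    return min(video_queue, key=lambda p: abs(p.ts - system_ts))
```

With `system_ts = 100`, an audio packet stamped 100 is released at that tick, while the released video packet is simply whichever one is stamped nearest 100.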
Optionally, the audio and video synchronization system further comprises a video networking terminal and a video networking server; the method further comprises the following steps:
the video network terminal collects the audio bare stream data packet and generates a corresponding audio coding time stamp for the audio bare stream data packet;
the video networking terminal encodes the audio bare stream data packet according to the audio encoding timestamp to obtain an audio encoding;
the video networking terminal collects the video bare stream data packet and generates a corresponding video coding timestamp for the video bare stream data packet;
the video networking terminal encodes the video bare stream data packet according to the video encoding timestamp to obtain a video code;
the video networking terminal respectively encapsulates the audio codes and the video codes to obtain video networking audio protocol packets and video networking video protocol packets, and sends the video networking audio protocol packets and the video networking video protocol packets to the video networking server;
and the video networking server sends the received video networking audio protocol packet and the received video networking video protocol packet to the protocol conversion server.
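The capture-side steps above (timestamp at capture, encode, encapsulate) can be sketched as below. The class name, function name and the identity "encoder" stub are assumptions for illustration; a real terminal would run an actual audio or video codec.

```python
from dataclasses import dataclass

@dataclass
class VnetProtocolPacket:
    media: str      # "audio" or "video"
    encode_ts: int  # encoding timestamp generated at capture time
    body: bytes     # encoded bare stream data

def capture_and_encapsulate(media, raw, clock_ts, encoder=lambda b: b):
    """Stamp the captured bare stream data with the clock value at
    capture, encode it, and wrap it in a video networking protocol
    packet (the encoder is stubbed as the identity function)."""
    encoded = encoder(raw)
    return VnetProtocolPacket(media, clock_ts, encoded)
```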
Optionally, the step in which the video networking module obtains the audio bare stream data packet and the audio encoding timestamp corresponding to the audio bare stream data packet, and the video bare stream data packet and the video encoding timestamp corresponding to the video bare stream data packet, comprises the following steps:
the video networking module receives the video networking audio protocol packet and the video networking video protocol packet;
the video networking module extracts the video networking audio protocol packet to obtain the audio bare stream data packet and the audio coding time stamp corresponding to the audio bare stream data packet;
and the video networking module extracts the video networking video protocol packet to obtain the video bare stream data packet and the video coding timestamp corresponding to the video bare stream data packet.
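Extraction amounts to splitting each protocol packet back into its encoding timestamp and bare stream payload. A sketch assuming a made-up 9-byte header (1-byte media type, 8-byte big-endian timestamp); the patent does not specify the actual video networking header layout:

```python
import struct

# Hypothetical header: 1-byte media type (0 = audio, 1 = video)
# followed by an 8-byte big-endian encoding timestamp. This layout
# is an assumption, not the real video networking protocol.
HEADER = struct.Struct(">BQ")

def extract(protocol_packet: bytes):
    """Split a protocol packet into (media, encoding timestamp,
    bare stream payload)."""
    media, ts = HEADER.unpack_from(protocol_packet)
    return media, ts, protocol_packet[HEADER.size:]
```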
Optionally, the audio and video synchronization system further includes an internet terminal, and the method further includes:
the real-time transmission module receives the audio bare stream data packet and resets the audio coding time stamp according to the current value of the system time stamp;
the real-time transmission module receives the video bare stream data packet and resets the video coding timestamp according to the current value of the system timestamp;
the real-time transmission module respectively encodes the audio bare stream data packet and the video bare stream data packet according to the reset audio encoding timestamp and the reset video encoding timestamp, obtaining an audio sending code and a video sending code;
and the real-time transmission module respectively encapsulates the audio sending code and the video sending code to obtain the internet audio protocol packet and the internet video protocol packet, and sends the internet audio protocol packet and the internet video protocol packet to the internet terminal.
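The timestamp reset described above can be sketched as rebasing each received packet onto the current system clock before re-encoding and internet (RTP) encapsulation. Field and function names here are assumptions:

```python
from dataclasses import dataclass, replace

@dataclass
class MediaPacket:
    encode_ts: int  # original encoding timestamp
    body: bytes     # bare stream data

def reset_timestamps(packets, system_now):
    """Overwrite each packet's encoding timestamp with the current
    system timestamp value, so both streams leave the real-time
    transmission module on a single clock."""
    return [replace(p, encode_ts=system_now) for p in packets]
```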
Optionally, the method further comprises:
setting a time difference threshold;
when the system time stamp is equal to the audio coding time stamp, the audio and video synchronization module returns the audio coding time stamp to the video networking module, and the method comprises the following steps:
if the difference between the audio encoding timestamp and the system timestamp is larger than the time difference threshold, the audio and video synchronization module feeds back a slow-down signal to the video networking module so that the video networking module slows down the transmission of the audio bare stream data packets;
the audio and video synchronization module returns the video coding time stamp closest to the system time stamp to the video networking module, and the audio and video synchronization module comprises:
if the difference between the video encoding timestamp and the system timestamp is larger than the time difference threshold, the audio and video synchronization module feeds back an acceleration signal to the video networking module so that the video networking module speeds up the transmission of the video bare stream data packets.
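One possible reading of the threshold check, sketched in Python. The sign conventions and signal names are assumptions, since the text only says "difference ... larger than the time difference threshold":

```python
def pacing_signal(media, encode_ts, system_ts, threshold):
    """Return a pacing signal for the video networking module.
    Interpretation (assumed): audio ahead of the system clock by
    more than the threshold is slowed; video behind the system
    clock by more than the threshold is accelerated."""
    if media == "audio" and encode_ts - system_ts > threshold:
        return "slow_down"   # audio running ahead: delay sending
    if media == "video" and system_ts - encode_ts > threshold:
        return "speed_up"    # video lagging: send faster
    return "keep"
```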
A second aspect of the embodiments of the present application provides an audio and video synchronization system, where the audio and video synchronization system includes a protocol conversion server; the protocol conversion server virtualizes a video networking virtual terminal comprising a video networking module, an audio and video synchronization module and a real-time transmission module; wherein:
the video networking module is used for obtaining an audio bare stream data packet and the audio encoding timestamp corresponding to the audio bare stream data packet, and a video bare stream data packet and the video encoding timestamp corresponding to the video bare stream data packet;
the video networking module is also used for transmitting the audio coding time stamp and the video coding time stamp to the audio and video synchronization module;
the audio and video synchronization module is used for returning the audio coding time stamp to the video networking module when the system time stamp is equal to the audio coding time stamp;
the video networking module is also used for transmitting the returned audio bare stream data packet corresponding to the audio coding time stamp to the real-time transmission module;
the audio and video synchronization module is also used for returning the video coding time stamp closest to the system time stamp to the video networking module;
and the video networking module is also used for transmitting the returned video bare stream data packet corresponding to the video coding timestamp to the real-time transmission module.
Optionally, the system further comprises a video networking terminal and a video networking server; wherein:
the video network terminal is used for collecting the audio bare stream data packet and generating a corresponding audio coding time stamp for the audio bare stream data packet;
the video networking terminal is also used for coding the audio bare stream data packet according to the audio coding time stamp to obtain an audio code;
the video networking terminal is further used for collecting the video bare stream data packet and generating a corresponding video coding timestamp for the video bare stream data packet;
the video networking terminal is further used for coding the video bare stream data packet according to the video coding timestamp to obtain a video code;
the video networking terminal is also used for respectively packaging the audio codes and the video codes to obtain video networking audio protocol packets and video networking video protocol packets, and sending the video networking audio protocol packets and the video networking video protocol packets to the video networking server;
the video networking server is used for sending the received video networking audio protocol packet and the received video networking video protocol packet to the protocol conversion server.
Optionally, the video networking module is configured to:
receiving the video networking audio protocol packet and the video networking video protocol packet;
extracting the video networking audio protocol packet to obtain the audio bare stream data packet and the audio coding time stamp corresponding to the audio bare stream data packet;
and extracting the video networking video protocol packet to obtain the video bare stream data packet and the video encoding timestamp corresponding to the video bare stream data packet.
Optionally, the system further comprises an internet terminal; wherein:
the real-time transmission module is used for receiving the audio bare stream data packet and resetting the audio coding time stamp according to the current value of the system time stamp;
the real-time transmission module is also used for receiving the video bare stream data packet and resetting the video coding timestamp according to the current value of the system timestamp;
the real-time transmission module is further used for respectively coding the audio bare stream data packet and the video bare stream data packet according to the reset audio coding time stamp and the reset video time stamp to obtain an audio sending code and a video sending code;
the real-time transmission module is further configured to encapsulate the audio transmission code and the video transmission code, respectively, obtain the internet audio protocol packet and the internet video protocol packet, and transmit the internet audio protocol packet and the internet video protocol packet to an internet terminal.
Optionally, the audio and video synchronization system is provided with a time difference threshold;
the audio and video synchronization module is used for feeding back a slow-down signal to the video networking module when the difference between the audio encoding timestamp and the system timestamp is larger than the time difference threshold, so that the video networking module slows down the transmission of the audio bare stream data packets;
the audio and video synchronization module is further used for feeding back an acceleration signal to the video networking module when the difference between the video encoding timestamp and the system timestamp is larger than the time difference threshold, so that the video networking module speeds up the transmission of the video bare stream data packets.
A third aspect of embodiments of the present application provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps in the method according to the first aspect of the present application.
A fourth aspect of the embodiments of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the steps of the method according to the first aspect of the present application.
In the embodiment of the application, an audio encoding timestamp and a video encoding timestamp are generated for the captured audio and video respectively. The audio and video synchronization module, added to the video networking virtual terminal virtualized by the protocol conversion server, compares the audio encoding timestamp with the system timestamp; when the two are equal, the audio bare stream data packet corresponding to that audio encoding timestamp is sent to the real-time transmission module for encapsulation into an internet protocol packet. Likewise, the video encoding timestamps are compared with the system timestamp, and the video bare stream data packet whose video encoding timestamp is closest to the system timestamp is sent to the real-time transmission module for the same encapsulation. The whole synchronization process takes system time as the reference: at the video networking end of the audio and video synchronization system, transmission of audio bare stream data packets is delayed and transmission of video bare stream data packets is accelerated, so that both reach the real-time transmission module synchronously for internet encapsulation. Audio and video are thus kept synchronized on the way from the video networking terminal to the internet terminal, and the internet terminal no longer experiences hearing the sound first and seeing the picture later.
Drawings
Fig. 1 is a networking schematic of a video network of the present application;
Fig. 2 is a schematic diagram of a hardware architecture of a node server according to the present application;
Fig. 3 is a schematic diagram of a hardware architecture of an access switch of the present application;
Fig. 4 is a schematic diagram of a hardware structure of an Ethernet protocol conversion gateway according to the present application;
Fig. 5 is a network architecture diagram of communication between the video networking end and the internet end of an audio and video synchronization system according to an embodiment of the present application;
Fig. 6 is a flowchart of the video networking terminal obtaining a video networking audio protocol packet and a video networking video protocol packet and sending them to the protocol conversion server according to an embodiment of the present application;
Fig. 7 is a flowchart of an audio and video synchronization method according to an embodiment of the present application;
Fig. 8 is a flowchart of the real-time transmission module obtaining an internet audio protocol packet and an internet video protocol packet according to an embodiment of the present application;
Fig. 9 is a schematic diagram of an audio and video synchronization system according to an embodiment of the present application.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, the present application is described in further detail with reference to the accompanying drawings and the detailed description.
The video network is an important milestone in network development. It is a real-time network that can transmit high-definition video in real time, pushing many internet applications toward high-definition, face-to-face video.
Using real-time high-definition video switching technology, the video network can integrate dozens of required services such as video, voice, pictures, text, communication and data on a single network platform: high-definition video conferencing, video surveillance, intelligent monitoring and analysis, emergency command, digital broadcast television, time-shifted television, network teaching, live broadcast, VOD on demand, television mail, Personal Video Recorder (PVR), intranet (self-office) channels, intelligent video broadcast control, information distribution and the like, delivering high-definition video through a television or a computer.
To better understand the embodiments of the present application, the video network is described below:
some of the technologies applied in the video networking are as follows:
network Technology (Network Technology)
Network technology innovation in the video network improves traditional Ethernet to handle the potentially huge video traffic on the network. Unlike pure network Packet Switching or network Circuit Switching, the video network uses packet switching to meet streaming requirements. Video networking technology combines the flexibility, simplicity and low cost of packet switching with the quality and security guarantees of circuit switching, realizing seamless, network-wide switched virtual circuits and a unified data format.
Switching Technology (Switching Technology)
The video network retains Ethernet's two advantages of asynchrony and packet switching, eliminates Ethernet's defects while remaining fully compatible, provides end-to-end seamless connectivity across the whole network, connects directly to user terminals, and carries IP data packets directly. User data requires no format conversion anywhere on the network. The video network is a higher-level form of Ethernet: a real-time switching platform that makes possible the network-wide, large-scale, real-time high-definition video transmission that the current internet cannot achieve, pushing many network video applications toward high definition and unification.
Server Technology (Server Technology)
The server technology of the video network and the unified video platform differs from that of a traditional server: its streaming media transmission is connection-oriented, its data processing capacity is independent of traffic and communication time, and a single network layer can carry both signaling and data transmission. For voice and video services, streaming media processing on the video network and the unified video platform is much simpler than general data processing, and efficiency is improved more than a hundredfold over a traditional server.
Storage Technology (Storage Technology)
To handle very large media content at very high throughput, the unified video platform's ultra-high-speed storage technology uses an advanced real-time operating system. Program information in a server instruction is mapped to specific hard disk space, so media content no longer passes through the server but is sent directly and instantly to the user terminal, with typical waiting times under 0.2 seconds. Optimized sector allocation greatly reduces the mechanical seek movement of the hard disk head; resource consumption is only 20% of an IP internet deployment of the same grade, while concurrent throughput is 3 times that of a traditional hard disk array and overall efficiency improves more than tenfold.
Network Security Technology (Network Security Technology)
The structural design of the video network eliminates, by construction, the network security problems that trouble the internet, through measures such as independent authorization of each service and complete isolation of equipment and user data. It generally needs no antivirus programs or firewalls, avoids hacker and virus attacks, and provides users with a structurally secure, worry-free network.
Service Innovation Technology (Service Innovation Technology)
The unified video platform integrates services and transmission: whether for a single user, a private-network user or a network aggregate, the connection is made automatically. User terminals, set-top boxes or PCs connect directly to the unified video platform to obtain multimedia video services in various forms. The unified video platform replaces traditional, complex application programming with a menu-style configuration table, so complex applications can be realized with very little code, enabling essentially unlimited new service innovation.
Networking of the video network is as follows:
the video network is a centralized control network structure, and the network can be a tree network, a star network, a ring network and the like, but on the basis of the centralized control node, the whole network is controlled by the centralized control node in the network.
As shown in fig. 1, the video network is divided into an access network and a metropolitan network.
The devices of the access network part can mainly be classified into 3 types: node servers, access switches, and terminals (including various set-top boxes, encoding boards, storage devices, etc.). The node server is connected to an access switch, which in turn may be connected to multiple terminals and to an Ethernet network.
The node server is a node which plays a centralized control function in the access network and can control the access switch and the terminal. The node server can be directly connected with the access switch or directly connected with the terminal.
Similarly, devices of the metropolitan network portion may also be classified into 3 types: a metropolitan area server, a node switch and a node server. The metro server is connected to a node switch, which may be connected to a plurality of node servers.
The node server here is the same node server as in the access network part; that is, the node server belongs to both the access network and the metropolitan area network.
The metropolitan area server is a node which plays a centralized control function in the metropolitan area network and can control a node switch and a node server. The metropolitan area server can be directly connected with the node switch or directly connected with the node server.
Therefore, the whole video network is a network structure with layered centralized control, and the network controlled by the node server and the metropolitan area server can be in various structures such as tree, star and ring.
The access network part can form a unified video platform (the part in the dotted circle), and a plurality of unified video platforms can form a video network; each unified video platform may be interconnected via metropolitan area and wide area video networking.
Video networking device classification
1.1 Devices in the video network of the embodiment of the present application can mainly be classified into 3 types: servers, switches (including Ethernet protocol conversion gateways), and terminals (including various set-top boxes, encoding boards, storage devices, etc.). The video network as a whole can be divided into a metropolitan area network (or national network, global network, etc.) and an access network.
1.2 The devices of the access network part can mainly be classified into 3 types: node servers, access switches (including Ethernet protocol conversion gateways), and terminals (including various set-top boxes, encoding boards, storage devices, etc.).
The specific hardware structure of each access network device is as follows:
a node server:
as shown in fig. 2, the system mainly includes a network interface module 201, a switching engine module 202, a CPU module 203, and a disk array module 204;
The network interface module 201, the CPU module 203, and the disk array module 204 all feed into the switching engine module 202. The switching engine module 202 looks up the address table 205 for each incoming packet to obtain its direction information, and stores the packet in the queue of the corresponding packet buffer 206 according to that direction information; if the queue of the packet buffer 206 is nearly full, the packet is discarded. The switching engine module 202 polls all packet buffer queues and forwards a packet if the following conditions are met: 1) the port send buffer is not full; 2) the queue packet counter is greater than zero. The disk array module 204 mainly implements control over the hard disks, including initialization, reading, writing, and other operations; the CPU module 203 is mainly responsible for protocol processing with the access switches and terminals (not shown in the figure), for configuring the address table 205 (including the downlink protocol packet address table, the uplink protocol packet address table, and the data packet address table), and for configuring the disk array module 204.
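As a rough sketch, the enqueue-or-discard and polling logic of the switching engine described above can be expressed as follows (class and field names are illustrative, and the queue depth is an assumed constant, not taken from the patent):

```python
from collections import deque

QUEUE_CAPACITY = 8  # assumed buffer depth, for illustration only

class SwitchingEngine:
    """Sketch of the switching engine: look up the address table for the
    packet's direction, buffer it per direction, and poll the queues."""

    def __init__(self, address_table):
        self.address_table = address_table  # destination address -> port
        self.queues = {}                    # port -> packet queue

    def receive(self, packet):
        # Look up the address table to obtain the packet's direction.
        port = self.address_table.get(packet["da"])
        if port is None:
            return False
        queue = self.queues.setdefault(port, deque())
        if len(queue) >= QUEUE_CAPACITY:    # queue nearly full: discard
            return False
        queue.append(packet)
        return True

    def poll(self, send_buffer_full):
        """Forward from each queue only if 1) the port send buffer is not
        full and 2) the queue packet counter is greater than zero."""
        forwarded = []
        for port, queue in self.queues.items():
            if not send_buffer_full.get(port, False) and len(queue) > 0:
                forwarded.append((port, queue.popleft()))
        return forwarded
```

The same two forwarding conditions reappear below for the access switch, with a third (token) condition added for uplink queues.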
The access switch:
as shown in fig. 3, the access switch mainly includes network interface modules (a downlink network interface module 301 and an uplink network interface module 302), a switching engine module 303, a CPU module 304, and a packet detection module 305;
The packet (uplink data) coming from the downlink network interface module 301 enters the packet detection module 305. The packet detection module 305 detects whether the Destination Address (DA), Source Address (SA), packet type, and packet length of the packet meet the requirements; if so, it allocates a corresponding stream identifier (stream-id) and the packet enters the switching engine module 303; otherwise, the packet is discarded. The packet (downlink data) coming from the uplink network interface module 302 enters the switching engine module 303, as do packets coming from the CPU module 304. The switching engine module 303 looks up the address table 306 for each incoming packet to obtain its direction information. If a packet entering the switching engine module 303 is going from the downlink network interface to the uplink network interface, it is stored in the queue of the corresponding packet buffer 307 in association with its stream-id; if that queue is nearly full, the packet is discarded. If a packet entering the switching engine module 303 is not going from the downlink network interface to the uplink network interface, it is stored in the queue of the corresponding packet buffer 307 according to its direction information; if that queue is nearly full, the packet is discarded.
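The packet detection step can be sketched as below; the concrete validity checks (address length, type set, maximum length) are assumptions, since the section only says the fields must "meet the requirements":

```python
import itertools

VALID_TYPES = {"protocol", "unicast", "multicast"}  # assumed type set
MAX_LEN = 1056                                      # assumed length bound

_stream_ids = itertools.count(1)  # simple monotonically increasing stream-ids

def detect_packet(packet):
    """Return a stream-id if the DA, SA, packet type and packet length meet
    the (assumed) requirements, or None to indicate the packet is discarded."""
    if len(packet.get("da", b"")) != 8:      # DA must be 8 bytes
        return None
    if len(packet.get("sa", b"")) != 8:      # SA must be 8 bytes
        return None
    if packet.get("type") not in VALID_TYPES:
        return None
    if not (0 < packet.get("length", 0) <= MAX_LEN):
        return None
    return next(_stream_ids)
```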
The switching engine module 303 polls all packet buffer queues and may include two cases:
if the queue is from the downlink network interface to the uplink network interface, the following conditions must be met for forwarding: 1) the port send buffer is not full; 2) the queue packet counter is greater than zero; 3) a token generated by the rate control module is obtained;
if the queue is not from the downlink network interface to the uplink network interface, the following conditions must be met for forwarding: 1) the port send buffer is not full; 2) the queue packet counter is greater than zero.
The rate control module 308 is configured by the CPU module 304, and generates tokens for packet buffer queues from all downstream network interfaces to upstream network interfaces at programmable intervals to control the rate of upstream forwarding.
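A minimal model of the rate control module's token mechanism, assuming one token granted per registered uplink queue every programmable interval (the tick granularity is hypothetical):

```python
class RateControlModule:
    """Token generator sketch: every `interval` ticks it grants one token to
    each registered uplink queue; a queue may forward only while it holds
    tokens, which caps the uplink forwarding rate."""

    def __init__(self, interval):
        self.interval = interval  # programmable interval, in ticks
        self.tokens = {}          # queue id -> token count
        self._tick = 0

    def register(self, queue_id):
        self.tokens.setdefault(queue_id, 0)

    def tick(self):
        self._tick += 1
        if self._tick % self.interval == 0:
            for queue_id in self.tokens:
                self.tokens[queue_id] += 1

    def consume(self, queue_id):
        """Called by the switching engine before forwarding upstream;
        returns True only if a token is available (condition 3 above)."""
        if self.tokens.get(queue_id, 0) > 0:
            self.tokens[queue_id] -= 1
            return True
        return False
```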
The CPU module 304 is mainly responsible for protocol processing with the node server, configuration of the address table 306, and configuration of the rate control module 308.
Ethernet protocol conversion gateway
As shown in fig. 4, the apparatus mainly includes a network interface module (a downlink network interface module 401 and an uplink network interface module 402), a switching engine module 403, a CPU module 404, a packet detection module 405, a rate control module 408, an address table 406, a packet buffer 407, a MAC adding module 409, and a MAC deleting module 410.
The data packet coming from the downlink network interface module 401 enters the packet detection module 405. The packet detection module 405 detects whether the Ethernet MAC DA, Ethernet MAC SA, Ethernet length or frame type, video networking destination address DA, video networking source address SA, video networking packet type, and packet length of the packet meet the requirements; if so, a corresponding stream identifier (stream-id) is allocated, the MAC deletion module 410 strips the MAC DA, MAC SA, and length or frame type (2 bytes), and the packet enters the corresponding receive buffer; otherwise, the packet is discarded.
The downlink network interface module 401 detects the send buffer of the port; if there is a packet, it obtains the Ethernet MAC DA of the corresponding terminal according to the video networking destination address DA of the packet, prepends the terminal's Ethernet MAC DA, the Ethernet protocol conversion gateway's MAC SA, and the Ethernet length or frame type, and sends the packet.
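The MAC deletion and MAC addition operations amount to stripping and prepending the 14-byte Ethernet header (6-byte MAC DA, 6-byte MAC SA, 2-byte length/frame type). A sketch, where the default frame type value is an assumption:

```python
import struct

def strip_mac_header(frame):
    """MAC deletion: remove the Ethernet MAC DA (6 bytes), MAC SA (6 bytes)
    and length/frame type (2 bytes), leaving the video networking packet."""
    if len(frame) < 14:
        raise ValueError("frame too short for an Ethernet header")
    return frame[14:]

def add_mac_header(packet, terminal_mac, gateway_mac, frame_type=0x0800):
    """MAC addition: prepend the terminal's MAC DA, the gateway's MAC SA and
    the length/frame type before sending on the downlink interface.
    The default frame_type value is an assumption for illustration."""
    return terminal_mac + gateway_mac + struct.pack("!H", frame_type) + packet
```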
The other modules in the Ethernet protocol conversion gateway function similarly to those in the access switch.
A terminal:
the terminal mainly includes a network interface module, a service processing module, and a CPU module. For example, a set-top box mainly includes a network interface module, a video/audio encoding and decoding engine module, and a CPU module; the encoding board mainly includes a network interface module, a video/audio encoding engine module, and a CPU module; the memory mainly includes a network interface module, a CPU module, and a disk array module.
1.3 The devices of the metropolitan area network part can be mainly classified into 3 types: node servers, node switches, and metropolitan area servers. The node switch mainly includes a network interface module, a switching engine module, and a CPU module; the metropolitan area server mainly includes a network interface module, a switching engine module, and a CPU module.
2. Video networking packet definition
2.1 Access network packet definition
The data packet of the access network mainly includes the following parts: Destination Address (DA), Source Address (SA), reserved bytes, payload (PDU), and CRC, laid out as shown in the following table:

DA | SA | Reserved | Payload | CRC
wherein:
the Destination Address (DA) is composed of 8 bytes: the first byte represents the type of the data packet (such as the various protocol packets, multicast data packets, unicast data packets, etc.), with at most 256 possibilities; the second to sixth bytes are the metropolitan area network address; and the seventh and eighth bytes are the access network address;
the Source Address (SA) is also composed of 8 bytes, defined the same as the Destination Address (DA);
the reserved byte consists of 2 bytes;
the payload has a different length depending on the type of the datagram: 64 bytes for the various protocol packets, and 32 + 1024 = 1056 bytes for a unicast data packet; of course, the length is not limited to these 2 types;
the CRC consists of 4 bytes and is calculated in accordance with the standard ethernet CRC algorithm.
2.2 metropolitan area network packet definition
The topology of the metropolitan area network is a graph, and there may be 2 or even more connections between two devices, i.e., more than 2 connections between a node switch and a node server, or between two node switches. However, the metropolitan area network address of a metropolitan area network device is unique; therefore, in order to accurately describe the connection relationship between metropolitan area network devices, a parameter is introduced in the embodiment of the present application: a label, to uniquely describe a connection of a metropolitan area network device.
In this specification, the definition of the label is similar to that of an MPLS (Multi-Protocol Label Switching) label: assuming there are two connections between device A and device B, a packet from device A to device B has 2 available labels, and a packet from device B to device A also has 2 available labels. Labels are classified into incoming labels and outgoing labels: assuming the label of a packet entering device A (the incoming label) is 0x0000, the label of the packet when it leaves device A (the outgoing label) may become 0x0001. The network access process of the metropolitan area network is a process under centralized control, that is, both address allocation and label allocation for the metropolitan area network are directed by the metropolitan area server, and the node switches and node servers execute passively. This differs from label allocation in MPLS, where labels are the result of mutual negotiation between the switch and the server.
As shown in the following table, the data packet of the metro network mainly includes the following parts:
DA | SA | Reserved | Label | Payload | CRC
That is: Destination Address (DA), Source Address (SA), Reserved bytes, label, payload (PDU), and CRC. The format of the label may be defined with reference to the following: the label is 32 bits, with the upper 16 bits reserved and only the lower 16 bits used; it is located between the reserved bytes and the payload of the data packet.
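A sketch of label insertion and label switching under this layout, with the 32-bit label (lower 16 bits used) sitting at byte offset 18, between the 2 reserved bytes and the payload:

```python
import struct

LABEL_OFFSET = 18  # DA(8) + SA(8) + Reserved(2)

def insert_label(access_body, label):
    """Insert the 32-bit label (upper 16 bits reserved, lower 16 bits used)
    between the reserved bytes and the payload of an access-format body."""
    assert 0 <= label <= 0xFFFF
    head, payload = access_body[:LABEL_OFFSET], access_body[LABEL_OFFSET:]
    return head + struct.pack("!I", label & 0xFFFF) + payload

def swap_label(metro_packet, out_label):
    """Label switching at a device: replace the incoming label with the
    outgoing label (e.g. 0x0000 in, 0x0001 out, as in the example above)."""
    return (metro_packet[:LABEL_OFFSET]
            + struct.pack("!I", out_label & 0xFFFF)
            + metro_packet[LABEL_OFFSET + 4:])

def read_label(metro_packet):
    word = struct.unpack("!I", metro_packet[LABEL_OFFSET:LABEL_OFFSET + 4])[0]
    return word & 0xFFFF  # upper 16 bits are reserved
```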
Taking as an example a video conference in which audio and video are collected by an aurora terminal and transmitted through the video network to a Huawei TE30 for playing, the audio and video synchronization method of the embodiment of the present application is described in detail below.
The aurora terminal has strong audio and video processing capability: it can be connected with a camera and a microphone, process the collected audio and video, convert them into video networking data, and ensure the security of the transmitted content through video networking transmission.
Referring to fig. 5, fig. 5 is a network architecture diagram of communication between the video networking end and the Internet end of the audio and video synchronization system according to an embodiment of the present application. First, the aurora terminal is connected to the video networking server; the video networking server is connected to the XMCU coordination server through a switch, and the XMCU coordination server is connected to the Huawei TE30 through a switch.
After the connection between the video networking system and the Huawei TE30 is established, the XMCU coordination server is started; it runs a program that virtualizes a video networking virtual terminal, and after the video networking virtual terminal accesses the network, the aurora terminal is used to call the Huawei TE30, thereby establishing communication between the video networking system and the Huawei TE30.
The video networking virtual terminal virtualized by the XMCU coordination server's program calls a dynamic library capable of completing audio and video synchronization, so that the virtual terminal has functional modules such as a video networking module, an audio and video synchronization module, and a real-time transmission module, providing the basis for synchronizing the audio and video data transmitted from the video network to the Internet.
The aurora terminal begins operation after establishing communication with the Huawei TE30. Referring to fig. 6, fig. 6 is a flowchart of the video networking terminal obtaining a video networking audio protocol packet and a video networking video protocol packet and sending them to the coordination server according to the embodiment of the present application. The audio and video synchronization system also includes a video networking terminal and a video networking server; the method further includes the following steps:
S601: the video networking terminal collects an audio bare stream data packet and generates a corresponding audio encoding timestamp for the audio bare stream data packet;
S602: the video networking terminal encodes the audio bare stream data packet according to the audio encoding timestamp to obtain an audio encoding;
S603: the video networking terminal collects a video bare stream data packet and generates a corresponding video encoding timestamp for the video bare stream data packet;
S604: the video networking terminal encodes the video bare stream data packet according to the video encoding timestamp to obtain a video encoding;
When the aurora terminal receives a video bare stream data packet collected by the camera at the current moment and an audio bare stream data packet collected by the microphone at the current moment, it stamps the video bare stream data packet and the audio bare stream data packet with a video timestamp and an audio timestamp, respectively, according to the system time.
Then, based on the audio timestamp, the audio encoding timestamp of the current audio bare stream data packet is generated by comparison with the number of standard-size data packets that can be transmitted in the system time, i.e., according to the transmission speed of a standard-size data packet in the system time and the difference in size and transmission speed of the data packets in the current audio frame; the audio encoding timestamp increases relatively fast compared with the system time. On the same principle, the video encoding timestamp of the current video bare stream data packet is generated; the video encoding timestamp increases relatively slowly compared with the system time.
S605: the video networking terminal encapsulates the audio encoding and the video encoding respectively to obtain a video networking audio protocol packet and a video networking video protocol packet, and sends them to the video networking server;
S606: the video networking server sends the received video networking audio protocol packet and video networking video protocol packet to the protocol conversion server.
The aurora terminal encodes the audio bare stream data packet collected by the microphone together with the audio encoding timestamp to form a G711 audio encoding; at the same time it encodes the video bare stream data packet collected by the camera together with the video encoding timestamp to form an H264 video encoding. The G711 encoding is then encapsulated into a 2002 video networking protocol packet, and the H264 encoding is encapsulated into a 2001 video networking protocol packet. The collected audio and video can be forwarded to the XMCU coordination server at the other end of the video conference through the video networking server based on the video networking V2V protocol; the whole process of transmitting the audio and video to the XMCU coordination server passes through neither an external network nor the Internet, so the transmitted content is protected from illegal attacks from external networks.
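The section states only that the audio encoding timestamp increases faster than the system clock and the video encoding timestamp increases slower. One hypothetical way to model this is a linear mapping, where `rate_factor` stands in for the packet-size and transmission-speed comparison described above:

```python
def encoding_timestamp(capture_ts, base_ts, rate_factor):
    """Advance the encoding timestamp from base_ts at rate_factor times the
    system clock: rate_factor > 1.0 makes it run fast (audio), and
    rate_factor < 1.0 makes it run slow (video). Purely illustrative; the
    patent derives the rate from packet sizes and transmission speeds."""
    return base_ts + (capture_ts - base_ts) * rate_factor
```

With this model, audio encoding timestamps are always greater than or equal to the system timestamp and video encoding timestamps less than or equal to it, which is the precondition the synchronization module relies on below.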
After the XMCU coordination server receives the 2001 video networking protocol packet and the 2002 video networking protocol packet, the video networking module of the virtualized video networking virtual terminal receives them, performs the audio separation operation and the video separation operation, extracts the video bare stream data packet and the video encoding timestamp from the 2001 video networking protocol packet, and extracts the audio bare stream data packet and the audio encoding timestamp from the 2002 video networking protocol packet.
The video networking module obtains the audio bare stream data packet and its corresponding audio encoding timestamp, and the video bare stream data packet and its corresponding video encoding timestamp; this includes the following steps:
the video networking module receives the video networking audio protocol packet and the video networking video protocol packet;
the video networking module extracts the video networking audio protocol packet to obtain the audio bare stream data packet and the audio coding time stamp corresponding to the audio bare stream data packet;
and the video networking module extracts the video networking video protocol packet to obtain the video bare stream data packet and the video coding timestamp corresponding to the video bare stream data packet.
Referring to fig. 7, fig. 7 is a flowchart of an audio and video synchronization method according to an embodiment of the present application. The method is applied to an audio and video synchronization system; the audio and video synchronization system comprises a protocol conversion server; the protocol conversion server virtualizes a video networking virtual terminal comprising a video networking module, an audio and video synchronization module and a real-time transmission module; the method comprises the following steps:
With continued reference to fig. 5, the video networking end of the audio and video synchronization method according to the embodiment of the present application includes: an aurora terminal, a video networking server, and an XMCU coordination server. The aurora terminal is responsible for collecting audio and video and encapsulating them into video networking protocol packets; the video networking server is responsible for transmitting these video networking protocol packets within the video network; and the video networking virtual terminal virtualized by the XMCU coordination server is responsible for receiving the video networking protocol packets and converting them into Internet protocol packets. The audio data packets and video data packets differ in size; consequently, the video networking protocol packets they are encapsulated into also differ in size, their transmission speeds in the video network differ, and the times at which the audio and video data packets reach the coordination server at the other end of the video conference also differ. The XMCU coordination server runs a program that executes the relevant dynamic library and adds an audio and video synchronization module to the virtualized video networking virtual terminal, which synchronously adjusts, before the audio and video data packets are transmitted to the Internet end, the inconsistency introduced during transmission through the video network.
The specific steps by which the video networking virtual terminal virtualized by the XMCU coordination server executes audio and video synchronization through the video networking module, the audio and video synchronization module, and the real-time transmission module are as follows:
S701: the video networking module obtains an audio bare stream data packet and the audio encoding timestamp corresponding to the audio bare stream data packet, and a video bare stream data packet and the video encoding timestamp corresponding to the video bare stream data packet;
S702: the video networking module transmits the audio encoding timestamp and the video encoding timestamp to the audio and video synchronization module;
S703: when the system timestamp is equal to the audio encoding timestamp, the audio and video synchronization module returns the audio encoding timestamp to the video networking module;
S704: the video networking module transmits the audio bare stream data packet corresponding to the returned audio encoding timestamp to the real-time transmission module;
S705: the audio and video synchronization module returns the video encoding timestamp closest to the system timestamp to the video networking module;
S706: the video networking module transmits the video bare stream data packet corresponding to the returned video encoding timestamp to the real-time transmission module.
The XMCU coordination server extracts the video bare stream data packet and the video encoding timestamp from the 2001 video networking protocol packet and the audio bare stream data packet and the audio encoding timestamp from the 2002 video networking protocol packet, and then transmits the video encoding timestamp and the audio encoding timestamp to the audio and video synchronization module. The audio and video synchronization module compares them with the system timestamp and feeds the timestamps back to the video networking module according to the comparison results; the video networking module then schedules the sending of the video bare stream data packets and audio bare stream data packets according to the fed-back video and audio encoding timestamps. It should be noted that the bare stream data packets and encoding timestamps extracted from the audio encoding and video encoding in the video networking protocol packets are the complete data as it existed before the encoding operation, and no longer carry the encoding.
When the audio and video synchronization module performs a current action (the current action may be: receiving an audio or video data packet, comparing an audio encoding timestamp with the system timestamp, or comparing a video encoding timestamp with the system timestamp), it executes a get-current-time function to obtain the system timestamp of the moment at which the current action is performed.
Because the audio encoding timestamp increases relatively fast compared with the system time, the audio encoding timestamp is greater than or equal to the system timestamp. When the system timestamp becomes equal to the audio encoding timestamp, the audio and video synchronization module returns the audio encoding timestamp to the video networking module; the video networking module receives the returned audio encoding timestamp and transmits the corresponding audio bare stream data packet to the real-time transmission module.
Because the video encoding timestamp increases slowly relative to the system timestamp, the video encoding timestamp is less than or equal to the system timestamp. After receiving a video encoding timestamp, the audio and video synchronization module either returns it directly, or, when the difference between the video encoding timestamp and the system timestamp is large and the next video encoding timestamp is still smaller than the system timestamp, directly discards the current video encoding timestamp and returns the next one, which is closer to the system timestamp, to the video networking module. The video networking module receives the returned video encoding timestamp and transmits the corresponding video bare stream data packet to the real-time transmission module.
In the embodiment of the present application, the audio encoding timestamps and video encoding timestamps are compared with the system timestamp; the audio and video synchronization module returns the audio encoding timestamp equal to the system timestamp and the video encoding timestamp closest to the system timestamp to the video networking module, and the video networking module schedules the sending of the audio and video bare stream data packets according to the returned timestamps, so that the encoding timestamps corresponding to the bare stream data packets scheduled for sending are equal to, or as close as possible to, the system time.
To address the phenomenon that video lags audio during transmission through the video network, caused by the different transmission speeds of the audio and video bare stream data packets, the adjustment is made at the transmission end point of the video networking end, i.e., the XMCU coordination server. In the audio and video synchronization module of the virtualized video networking virtual terminal, taking the system time as the reference, audio and video encoding timestamps that are as close as possible, together with their corresponding audio and video bare stream data packets, are sent synchronously to the real-time transmission module, thereby correcting the inconsistency of audio and video introduced during video networking transmission.
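The scheduling rules of S703 and S705 can be sketched as follows: the audio timestamp, which runs ahead of the system clock, is held until the system timestamp catches up; stale video timestamps, which lag, are discarded until the one closest to the system timestamp is found. Function names are illustrative:

```python
def schedule_audio(audio_ts, system_ts):
    """Audio timestamps run ahead of the system clock, so hold the packet
    until the system timestamp catches up; return True when the timestamp
    should be handed back to the video networking module (S703)."""
    return audio_ts <= system_ts

def pick_video_timestamp(video_ts_queue, system_ts):
    """Video timestamps lag the system clock: drop a stale timestamp while
    its successor is still below the system timestamp, then return the one
    closest to it (S705). The queue is an in-order list of timestamps."""
    while len(video_ts_queue) >= 2 and video_ts_queue[1] <= system_ts:
        video_ts_queue.pop(0)  # discard: the next timestamp is closer
    return video_ts_queue.pop(0) if video_ts_queue else None
```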
After the real-time transmission module receives the synchronized audio and video bare stream data packets, it resets the audio encoding timestamp and the video encoding timestamp according to the system time at which the packets were received. It then encodes the synchronized audio bare stream data packet with the reset audio encoding timestamp to form a G'711 audio encoding, and encodes the synchronized video bare stream data packet with the reset video encoding timestamp to form an H'264 video encoding. Finally, the G'711 audio encoding and the H'264 video encoding are encapsulated respectively to obtain a 2002 Internet protocol packet and a 2001 Internet protocol packet, which are transmitted to the Huawei TE30 for playing through the Internet connection.
Referring to fig. 8, fig. 8 is a flowchart of obtaining an internet audio protocol packet and an internet video protocol packet by a real-time transport module according to an embodiment of the present application. The audio and video synchronization system further comprises an internet terminal, and the method further comprises the following steps:
S801: the real-time transmission module receives the audio bare stream data packet and resets the audio encoding timestamp according to the current value of the system timestamp;
S802: the real-time transmission module receives the video bare stream data packet and resets the video encoding timestamp according to the current value of the system timestamp;
S803: the real-time transmission module encodes the audio bare stream data packet and the video bare stream data packet respectively according to the reset audio encoding timestamp and the reset video encoding timestamp to obtain an audio transmission encoding and a video transmission encoding;
S804: the real-time transmission module encapsulates the audio transmission encoding and the video transmission encoding respectively to obtain the Internet audio protocol packet and the Internet video protocol packet, and transmits them to the Internet terminal.
When the real-time transmission module encapsulates the G'711 audio encoding and the H'264 video encoding to obtain the Internet audio protocol packet and the Internet video protocol packet, the audio and video encoding timestamps have already been reset according to the system time. The audio data and video data in the Internet audio and video protocol packets are thus not only synchronized, but the TE30 can also read them according to the reset encoding timestamps, or use the reset timestamps to synchronize the audio and video data further. This completes the synchronization of audio and video both during transmission from the video networking terminal to the Internet terminal and at the playing end.
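A sketch of the timestamp reset in the real-time transmission module: both streams are stamped from the same system clock at receive time, so the Internet terminal can align them from the packets alone. The dictionary fields are illustrative, not an actual protocol layout:

```python
import time

def repacketize_synchronized(audio_bare, video_bare, clock=time.time):
    """Reset both encoding timestamps from the same system clock at receive
    time, then wrap the payloads for Internet-side encapsulation (S801-S804).
    The clock parameter is injectable for testing; field names are made up."""
    now = clock()
    audio_pkt = {"codec": "G711", "timestamp": now, "payload": audio_bare}
    video_pkt = {"codec": "H264", "timestamp": now, "payload": video_bare}
    return audio_pkt, video_pkt
```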
In this way, the audio and video synchronization module causes the audio bare stream data packets and video bare stream data packets to be sent according to the comparison results between the audio encoding timestamps and the system timestamp and between the video encoding timestamps and the system timestamp, so that the faster audio bare stream data packets and the slower video bare stream data packets are transmitted synchronously.
The method further comprises the following steps:
setting a time difference threshold;
when the system time stamp is equal to the audio coding time stamp, the audio and video synchronization module returns the audio coding time stamp to the video networking module, and the method comprises the following steps:
if the difference value between the audio coding time stamp and the system time stamp is larger than the time difference threshold value, the audio and video synchronization module feeds back a slowing signal to the video networking module so that the video networking module slows down the transmission speed of the audio bare stream data packet;
the audio and video synchronization module returns the video coding time stamp closest to the system time stamp to the video networking module, and the audio and video synchronization module comprises:
if the difference value between the video coding time stamp and the system time stamp is larger than the time difference threshold value, the audio and video synchronization module feeds back an acceleration signal to the video networking module so that the video networking module accelerates the transmission speed of the video naked stream data packet.
The time difference threshold is set according to the maximum difference between the audio encoding timestamp and the system timestamp obtained by the audio and video synchronization module, and how the system adjusts the audio bare stream data packets under that difference; and likewise according to the maximum difference between the video encoding timestamp and the system timestamp obtained by the audio and video synchronization module, and how the system adjusts the video bare stream data packets under that difference.
If the time difference between the audio encoding timestamp and the system time is greater than the time difference threshold, it indicates that, in addition to the video networking module controlling the sending time of the audio bare stream data packets, the transmission speed of the audio bare stream data packets needs to be slowed down to adjust the transmission of the audio data. Similarly, if the time difference between the video encoding timestamp and the system time is greater than the time difference threshold, it indicates that, in addition to the video networking module controlling the sending time of the video bare stream data packets, the transmission speed of the video bare stream data packets needs to be increased to adjust the transmission of the video data.
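The threshold feedback for the two cases above can be condensed into one hypothetical helper (the signal names are illustrative):

```python
def feedback_signal(encoding_ts, system_ts, threshold):
    """Return a control signal for the video networking module based on how
    far the encoding timestamp has drifted from the system timestamp."""
    if encoding_ts - system_ts > threshold:
        return "slow_down"  # audio too far ahead: slow its transmission
    if system_ts - encoding_ts > threshold:
        return "speed_up"   # video too far behind: speed up its transmission
    return "ok"             # drift within the threshold: no adjustment
```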
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the embodiments are not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the embodiments. Further, those skilled in the art will also appreciate that the embodiments described in the specification are presently preferred and that no particular act is required of the embodiments of the application.
Referring to fig. 9, fig. 9 is a schematic diagram of an audio and video synchronization system according to an embodiment of the present application. Based on the same inventive concept, an embodiment of the present application provides an audio and video synchronization system comprising a protocol conversion server; the protocol conversion server virtualizes a video networking virtual terminal comprising a video networking module, an audio and video synchronization module and a real-time transmission module; wherein:
the video networking module 901 is configured to obtain an audio bare stream data packet and an audio coding timestamp corresponding to the audio bare stream data packet, and a video coding timestamp corresponding to a video bare stream data packet and the video bare stream data packet;
the video networking module 901 is further configured to transmit the audio coding time stamp and the video coding time stamp to the audio and video synchronization module;
the audio and video synchronization module 902 is configured to return the audio encoding timestamp to the video networking module 901 when the system timestamp is equal to the audio encoding timestamp;
the video networking module 901 is further configured to transmit the audio bare stream data packet corresponding to the returned audio encoding timestamp to the real-time transmission module 903;
the audio and video synchronization module 902 is further configured to return the video encoding timestamp closest to the system timestamp to the video networking module 901;
the video networking module 901 is further configured to transmit the returned video bare stream data packet corresponding to the video coding timestamp to the real-time transmission module 903.
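The two release rules applied by the audio and video synchronization module above (an audio packet is released only when the system timestamp equals its encoding timestamp; a video packet is released on the encoding timestamp closest to the system timestamp) can be sketched as follows. This is an illustrative Python sketch; the data structures are assumptions, as the patent discloses no code.

```python
# Illustrative sketch of the synchronization module's two release rules.
# The containers (a set of pending audio timestamps, a list of pending
# video timestamps) are assumed for illustration.

def pick_audio_ts(pending_audio_ts: set, system_ts: int):
    """An audio timestamp is returned only when the system timestamp
    exactly equals one of the pending audio encoding timestamps."""
    return system_ts if system_ts in pending_audio_ts else None

def pick_video_ts(pending_video_ts: list, system_ts: int):
    """The video encoding timestamp closest to the current system
    timestamp is returned."""
    if not pending_video_ts:
        return None
    return min(pending_video_ts, key=lambda ts: abs(ts - system_ts))
```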
With continuing reference to FIG. 9, optionally, the system further comprises a video networking terminal and a video networking server; wherein:
the video network terminal 904 is configured to collect the audio bare stream data packet, and generate the corresponding audio encoding timestamp for the audio bare stream data packet;
the video networking terminal 904 is further configured to encode the audio bare stream data packet according to the audio encoding timestamp to obtain an audio encoding;
the video networking terminal 904 is further configured to collect the video bare stream data packet, and generate the corresponding video coding timestamp for the video bare stream data packet;
the video networking terminal 904 is further configured to encode the video bare stream data packet according to the video encoding timestamp to obtain a video code;
the video networking terminal 904 is further configured to respectively encapsulate the audio code and the video code to obtain a video networking audio protocol packet and a video networking video protocol packet, and send the video networking audio protocol packet and the video networking video protocol packet to the video networking server;
the video networking server 905 is configured to send the received video networking audio protocol packet and the received video networking video protocol packet to the protocol conversion server.
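The terminal-side pipeline above (collect, timestamp, encode, encapsulate, send) can be sketched as follows. In this illustrative Python sketch, `encode` and `encapsulate` are stand-ins: the patent does not disclose the actual codec or the video networking packet format, so the 8-byte timestamp header and dict-based packet are assumptions.

```python
import time

# Illustrative sketch of the terminal-side pipeline: stamp a frame,
# "encode" it, and encapsulate it into a protocol packet. encode() and
# encapsulate() are stand-ins for the real codec and the video
# networking protocol framing, which the patent does not disclose.

def encode(payload: bytes, ts: int) -> bytes:
    # stand-in encoder: prepend an 8-byte big-endian timestamp
    return ts.to_bytes(8, "big") + payload

def encapsulate(encoded: bytes, kind: str) -> dict:
    # stand-in for video networking protocol packaging
    return {"kind": kind, "body": encoded}

def capture_and_package(raw_audio: bytes, raw_video: bytes):
    audio_ts = int(time.monotonic() * 1000)  # illustrative ms timestamps
    video_ts = int(time.monotonic() * 1000)
    audio_pkt = encapsulate(encode(raw_audio, audio_ts), "audio")
    video_pkt = encapsulate(encode(raw_video, video_ts), "video")
    return audio_pkt, video_pkt
```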
With continuing reference to fig. 9, optionally, the video networking module 901 is configured to:
receiving the video networking audio protocol packet and the video networking video protocol packet;
extracting the video networking audio protocol packet to obtain the audio bare stream data packet and the audio coding time stamp corresponding to the audio bare stream data packet;
and extracting the video protocol packet of the video network to obtain the video bare stream data packet and the video coding time stamp corresponding to the video bare stream data packet.
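The extraction step above can be sketched as follows. This illustrative Python sketch assumes an 8-byte big-endian timestamp header; the patent does not disclose the actual protocol packet layout.

```python
# Illustrative sketch of extracting the bare stream payload and its
# encoding timestamp from a protocol packet. The 8-byte big-endian
# timestamp header is an assumed layout, not from the patent.

def extract(protocol_packet: bytes):
    ts = int.from_bytes(protocol_packet[:8], "big")
    payload = protocol_packet[8:]
    return ts, payload
```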
With continuing reference to fig. 9, optionally, the system further comprises an internet terminal; wherein:
the real-time transmission module 903 is configured to receive the audio bare-stream data packet, and reset the audio encoding timestamp according to the current value of the system timestamp;
the real-time transmission module 903 is further configured to receive the video bare stream data packet, and reset the video encoding timestamp according to the current value of the system timestamp;
the real-time transmission module 903 is further configured to encode the audio bare stream data packet and the video bare stream data packet respectively according to the reset audio encoding timestamp and the reset video encoding timestamp, so as to obtain an audio transmission code and a video transmission code;
the real-time transmission module 903 is further configured to encapsulate the audio transmission code and the video transmission code, respectively, obtain the internet audio protocol packet and the internet video protocol packet, and send the internet audio protocol packet and the internet video protocol packet to the internet terminal 906.
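The timestamp reset performed by the real-time transmission module can be sketched as follows. This illustrative Python sketch assumes a dict-based packet with a `ts` field; the patent does not disclose the real packet representation.

```python
# Illustrative sketch of the real-time transmission module's timestamp
# reset: the packet's encoding timestamp is replaced with the current
# system timestamp before re-encoding for internet transport. The dict
# layout is an assumption for illustration.

def reset_timestamp(pkt: dict, system_now_ms: int) -> dict:
    out = dict(pkt)            # leave the original packet untouched
    out["ts"] = system_now_ms  # rebase onto the local system clock
    return out
```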
With continued reference to fig. 9, optionally, the audio video synchronization system is provided with a time difference threshold;
the audio and video synchronization module 902 is configured to feed back a slowing signal to the video networking module when a difference between the audio encoding timestamp and the system timestamp is greater than the time difference threshold, so that the video networking module slows down a transmission speed of the audio bare stream data packet;
the audio and video synchronization module 902 is configured to feed back an acceleration signal to the video networking module when a difference between the video coding timestamp and the system timestamp is greater than the time difference threshold, so that the video networking module accelerates the transmission speed of the video bare stream data packet.
Since the audio and video synchronization system embodiment is substantially similar to the audio and video synchronization method embodiment, its description is relatively brief; for relevant details, refer to the corresponding parts of the method embodiment.
Based on the same inventive concept, another embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps in the method for audio and video synchronization according to any of the above embodiments of the present application.
Based on the same inventive concept, another embodiment of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and running on the processor, and when the processor executes the computer program, the steps of the method for audio and video synchronization described in any of the above embodiments of the present application are implemented.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one of skill in the art, embodiments of the present application may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the true scope of the embodiments of the application.
Finally, it should also be noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The method for audio and video synchronization, the audio and video synchronization system, the computer-readable storage medium and the electronic device provided by the present application are introduced in detail, and specific examples are applied in the present application to explain the principles and embodiments of the present application, and the descriptions of the above embodiments are only used to help understand the method and the core ideas of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (12)

1. A method for audio and video synchronization is characterized in that the method is applied to an audio and video synchronization system; the audio and video synchronization system comprises a protocol conversion server; the protocol conversion server operates a program to virtualize a video networking virtual terminal comprising a video networking module, an audio and video synchronization module and a real-time transmission module; the audio and video synchronization module is obtained by calling a dynamic library for realizing audio and video synchronization by the protocol conversion server; the virtual terminal of the video network virtualized by the protocol conversion server is used for receiving the video network protocol packet encapsulated by the audio and the video and converting the video network protocol packet encapsulated by the audio and the video into an internet protocol packet; the method comprises the following steps:
the video networking module obtains an audio bare stream data packet and the audio encoding timestamp corresponding to the audio bare stream data packet, and a video bare stream data packet and the video encoding timestamp corresponding to the video bare stream data packet; the audio bare stream data packet and the video bare stream data packet are respectively extracted from a video networking audio protocol packet and a video networking video protocol packet, and the video networking audio protocol packet and the video networking video protocol packet are obtained by the video networking terminal encapsulating collected audio and video;
the video networking module transmits the audio coding time stamp and the video coding time stamp to the audio and video synchronization module;
the audio and video synchronization module returns the audio coding time stamp to the video networking module when the system time stamp is equal to the audio coding time stamp;
the video networking module transmits the returned audio bare stream data packet corresponding to the audio coding time stamp to the real-time transmission module;
the audio and video synchronization module returns the video coding time stamp closest to the system time stamp to the video networking module;
the video networking module transmits the returned video bare stream data packet corresponding to the video coding timestamp to the real-time transmission module;
and the real-time transmission module encapsulates the acquired audio bare stream data packet and the acquired video bare stream data packet into an internet protocol packet and then transmits the internet protocol packet to an internet terminal.
2. The method of claim 1, wherein the audio video synchronization system further comprises a video networking terminal and a video networking server; the method further comprises the following steps:
the video network terminal collects the audio bare stream data packet and generates a corresponding audio coding time stamp for the audio bare stream data packet;
the video networking terminal encodes the audio bare stream data packet according to the audio encoding timestamp to obtain an audio encoding;
the video networking terminal collects the video bare stream data packet and generates a corresponding video coding timestamp for the video bare stream data packet;
the video networking terminal encodes the video bare stream data packet according to the video encoding timestamp to obtain a video code;
the video networking terminal respectively encapsulates the audio codes and the video codes to obtain video networking audio protocol packets and video networking video protocol packets, and sends the video networking audio protocol packets and the video networking video protocol packets to the video networking server;
and the video networking server sends the received video networking audio protocol packet and the received video networking video protocol packet to the protocol conversion server.
3. The method according to claim 1, wherein the step of the video networking module obtaining an audio bare stream data packet and the audio encoding timestamp corresponding to the audio bare stream data packet, and a video bare stream data packet and the video encoding timestamp corresponding to the video bare stream data packet comprises:
the video networking module receives the video networking audio protocol packet and the video networking video protocol packet;
the video networking module extracts the video networking audio protocol packet to obtain the audio bare stream data packet and the audio coding time stamp corresponding to the audio bare stream data packet;
and the video networking module extracts the video networking video protocol packet to obtain the video bare stream data packet and the video coding timestamp corresponding to the video bare stream data packet.
4. The method of claim 1, wherein the audio video synchronization system further comprises an internet terminal, the method further comprising:
the real-time transmission module receives the audio bare stream data packet and resets the audio coding time stamp according to the current value of the system time stamp;
the real-time transmission module receives the video bare stream data packet and resets the video coding timestamp according to the current value of the system timestamp;
the real-time transmission module respectively encodes the audio bare stream data packet and the video bare stream data packet according to the reset audio encoding timestamp and the reset video encoding timestamp to obtain an audio transmission code and a video transmission code;
and the real-time transmission module respectively encapsulates the audio sending code and the video sending code to obtain the internet audio protocol packet and the internet video protocol packet, and sends the internet audio protocol packet and the internet video protocol packet to the internet terminal.
5. The method of claim 1, further comprising:
setting a time difference threshold;
when the system time stamp is equal to the audio coding time stamp, the audio and video synchronization module returns the audio coding time stamp to the video networking module, and the method comprises the following steps:
if the difference value between the audio coding time stamp and the system time stamp is larger than the time difference threshold value, the audio and video synchronization module feeds back a slowing signal to the video networking module so that the video networking module slows down the transmission speed of the audio bare stream data packet;
the step of the audio and video synchronization module returning the video encoding timestamp closest to the system timestamp to the video networking module comprises:
if the difference between the video encoding timestamp and the system timestamp is greater than the time difference threshold, the audio and video synchronization module feeds back an acceleration signal to the video networking module so that the video networking module accelerates the transmission speed of the video bare stream data packets.
6. An audio and video synchronization system, characterized by comprising a protocol conversion server; the protocol conversion server operates a program to virtualize a video networking virtual terminal comprising a video networking module, an audio and video synchronization module and a real-time transmission module; the audio and video synchronization module is obtained by the protocol conversion server calling a dynamic library that implements audio and video synchronization; the video networking virtual terminal virtualized by the protocol conversion server is used for receiving the video networking protocol packets in which audio and video are encapsulated and converting them into internet protocol packets; wherein:
the video networking module is used for obtaining an audio bare stream data packet and the audio encoding timestamp corresponding to the audio bare stream data packet, and a video bare stream data packet and the video encoding timestamp corresponding to the video bare stream data packet; the audio bare stream data packet and the video bare stream data packet are respectively extracted from a video networking audio protocol packet and a video networking video protocol packet, and the video networking audio protocol packet and the video networking video protocol packet are obtained by the video networking terminal encapsulating collected audio and video;
the video networking module is also used for transmitting the audio coding time stamp and the video coding time stamp to the audio and video synchronization module;
the audio and video synchronization module is used for returning the audio coding time stamp to the video networking module when the system time stamp is equal to the audio coding time stamp;
the video networking module is also used for transmitting the returned audio bare stream data packet corresponding to the audio coding time stamp to the real-time transmission module;
the audio and video synchronization module is also used for returning the video coding time stamp closest to the system time stamp to the video networking module;
the video networking module is also used for transmitting the returned video bare stream data packet corresponding to the video coding timestamp to the real-time transmission module;
and the real-time transmission module is used for encapsulating the acquired audio bare stream data packet and the acquired video bare stream data packet into an internet protocol packet and transmitting the internet protocol packet to an internet terminal.
7. The audio-video synchronization system of claim 6, wherein the system further comprises a video networking terminal and a video networking server; wherein:
the video network terminal is used for collecting the audio bare stream data packet and generating a corresponding audio coding time stamp for the audio bare stream data packet;
the video networking terminal is also used for coding the audio bare stream data packet according to the audio coding time stamp to obtain an audio code;
the video networking terminal is further used for collecting the video bare stream data packet and generating a corresponding video coding timestamp for the video bare stream data packet;
the video networking terminal is further used for coding the video bare stream data packet according to the video coding timestamp to obtain a video code;
the video networking terminal is also used for respectively packaging the audio codes and the video codes to obtain video networking audio protocol packets and video networking video protocol packets, and sending the video networking audio protocol packets and the video networking video protocol packets to the video networking server;
the video networking server is used for sending the received video networking audio protocol packet and the received video networking video protocol packet to the protocol conversion server.
8. The audio-video synchronization system of claim 6, wherein the video networking module is configured to:
receiving the video networking audio protocol packet and the video networking video protocol packet;
extracting the video networking audio protocol packet to obtain the audio bare stream data packet and the audio coding time stamp corresponding to the audio bare stream data packet;
and extracting the video protocol packet of the video network to obtain the video bare stream data packet and the video coding time stamp corresponding to the video bare stream data packet.
9. The audio-video synchronization system of claim 6, wherein the system further comprises an internet terminal; wherein:
the real-time transmission module is used for receiving the audio bare stream data packet and resetting the audio coding time stamp according to the current value of the system time stamp;
the real-time transmission module is also used for receiving the video bare stream data packet and resetting the video coding timestamp according to the current value of the system timestamp;
the real-time transmission module is further used for respectively encoding the audio bare stream data packet and the video bare stream data packet according to the reset audio encoding timestamp and the reset video encoding timestamp to obtain an audio transmission code and a video transmission code;
the real-time transmission module is further configured to encapsulate the audio transmission code and the video transmission code, respectively, obtain the internet audio protocol packet and the internet video protocol packet, and transmit the internet audio protocol packet and the internet video protocol packet to an internet terminal.
10. The audio-video synchronization system of claim 6, wherein the audio-video synchronization system is provided with a time difference threshold;
the audio and video synchronization module is used for feeding back a slowing signal to the video networking module when the difference value between the audio coding time stamp and the system time stamp is larger than the time difference threshold value so as to slow down the transmission speed of the audio bare stream data packet by the video networking module;
the audio and video synchronization module is used for feeding back an acceleration signal to the video networking module when the difference between the video encoding timestamp and the system timestamp is greater than the time difference threshold, so that the video networking module accelerates the transmission speed of the video bare stream data packets.
11. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 5.
12. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1 to 5 are implemented when the computer program is executed by the processor.
CN201910745460.XA 2019-08-13 2019-08-13 Audio and video synchronization method, audio and video synchronization system, equipment and storage medium Active CN110602542B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910745460.XA CN110602542B (en) 2019-08-13 2019-08-13 Audio and video synchronization method, audio and video synchronization system, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110602542A CN110602542A (en) 2019-12-20
CN110602542B true CN110602542B (en) 2022-02-08

Family

ID=68854234

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910745460.XA Active CN110602542B (en) 2019-08-13 2019-08-13 Audio and video synchronization method, audio and video synchronization system, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110602542B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112165452A (en) * 2020-09-01 2021-01-01 广东九联科技股份有限公司 Control management system based on advertisement machine multi-screen display playing and equipment thereof
CN113365046B (en) * 2021-04-30 2023-08-01 厦门立林科技有限公司 High-performance audio and video data test transmitting method, application and storage medium thereof
CN113596550B (en) * 2021-08-31 2024-05-24 小帧科技(深圳)有限公司 Audio and video synchronous control method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6779041B1 (en) * 1999-05-21 2004-08-17 Thin Multimedia, Inc. Apparatus and method for streaming MPEG-1 data
CN105791939A (en) * 2016-03-14 2016-07-20 北京捷思锐科技股份有限公司 Audio and video synchronization method and apparatus
CN106550282A (en) * 2015-09-17 2017-03-29 北京视联动力国际信息技术有限公司 A kind of player method and system of video data
CN107509100A (en) * 2017-09-15 2017-12-22 深圳国微技术有限公司 Audio and video synchronization method, system, computer installation and computer-readable recording medium
CN108574816A (en) * 2017-09-06 2018-09-25 北京视联动力国际信息技术有限公司 It is a kind of to regard networked terminals and based on communication means, the device regarding networked terminals

Similar Documents

Publication Publication Date Title
CN109120946B (en) Method and device for watching live broadcast
CN108737768B (en) Monitoring method and monitoring device based on monitoring system
CN110149262B (en) Method and device for processing signaling message and storage medium
CN109194982B (en) Method and device for transmitting large file stream
CN109167960B (en) Method and system for processing video stream data
CN110022295B (en) Data transmission method and video networking system
CN109246486B (en) Method and device for framing
CN109547163B (en) Method and device for controlling data transmission rate
CN110035005B (en) Data processing method and device
CN110602542B (en) Audio and video synchronization method, audio and video synchronization system, equipment and storage medium
CN110049341B (en) Video processing method and device
CN110049273B (en) Video networking-based conference recording method and transfer server
CN109040656B (en) Video conference processing method and system
CN108574816B (en) Video networking terminal and communication method and device based on video networking terminal
CN108881958B (en) Multimedia data stream packaging method and device
CN110149305B (en) Video network-based multi-party audio and video playing method and transfer server
CN110113564B (en) Data acquisition method and video networking system
CN110138513B (en) Data transmission method and video networking system
CN109714568B (en) Video monitoring data synchronization method and device
CN109743284B (en) Video processing method and system based on video network
CN109547727B (en) Data caching method and device
CN110581846A (en) Monitoring video processing and system
CN110769297A (en) Audio and video data processing method and system
CN110086773B (en) Audio and video data processing method and system
CN109842630B (en) Video processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant