CN112995708A - Multi-video synchronization method and device - Google Patents
- Publication number
- CN112995708A (application number CN202110431407.XA)
- Authority
- CN
- China
- Prior art keywords
- audio
- video
- streams
- video code
- stream
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/233—Processing of audio elementary streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/242—Synchronization processes, e.g. processing of PCR [Program Clock References]
Abstract
The application provides a multi-video synchronization method and device. First, a primary audio/video code stream and secondary audio/video code streams are determined. Audio/video code streams uploaded by a plurality of terminal devices are received; an audio stream of preset length is intercepted from each audio/video code stream, and audio fingerprints are extracted per preset time slice, yielding the corresponding primary audio fingerprint and each secondary audio fingerprint. Each secondary audio fingerprint is compared with the primary audio fingerprint; if the comparison succeeds, the time offset between the corresponding terminal device and the standard machine position (the reference camera) is calculated and recorded. Finally, the video stream of each terminal device is played back with its corresponding time offset, so that the video streams of all terminal devices play synchronously. Through audio fingerprint identification, the method and device simplify the problem of multi-camera video synchronization under cloud-directing conditions, improve the playing quality of the cloud-directing technology, and reduce the complexity and cost of time synchronization in a cloud environment.
Description
Technical Field
The present application relates to the field of video live broadcast synchronization technologies, and in particular, to a multi-video synchronization method and apparatus.
Background
With the development of cloud technology, cloud directing currently requires a uniform client application: network time calibration is performed before shooting, a timestamp is embedded into each video frame as every terminal captures and encodes, and the video streams of the shooting terminals are then transmitted directly to the cloud over the public network for synthesis and switching.
However, because time calibration and data transmission take place over a public network with relatively low reliability, each video stream arrives with a different delay, and the deviation typically ranges from tens to hundreds of milliseconds, far beyond the roughly ±15 ms tolerance required for audio/video synchronization, so strict time synchronization cannot be achieved. In addition, because time calibration is required, the shooting terminals can only use a uniform protocol, that is, the recording programs of the clients must be kept consistent, which further raises the bar for shooting terminals and increases shooting complexity.
Therefore, how to implement and simplify multi-camera video synchronization under cloud-directing conditions is a problem that urgently needs to be solved by those skilled in the art.
Disclosure of Invention
The application provides a multi-video synchronization method and device for implementing and simplifying multi-camera video synchronization under cloud-directing conditions.
In order to achieve the above object, the present application provides the following technical solutions:
A multi-video synchronization method, in which the audio/video code stream whose delay time falls within a preset range is selected as the primary audio/video code stream, the corresponding terminal device is determined to be the standard machine position (the reference camera), and the remaining audio/video code streams are secondary audio/video code streams; the method comprises the following steps:
receiving audio/video code streams uploaded by a plurality of terminal devices, wherein each audio/video code stream is converted by an encoder from the video captured by the corresponding terminal device;
intercepting an audio stream of preset length from each audio/video code stream, and extracting audio fingerprints per preset time slice to obtain the corresponding primary audio fingerprint and each secondary audio fingerprint;
comparing each secondary audio fingerprint with the primary audio fingerprint;
if the comparison succeeds, calculating and recording the time offset between the corresponding terminal device and the standard machine position;
and playing back the video stream of each terminal device with its corresponding time offset, so that the video streams of all terminal devices play synchronously.
Further, intercepting an audio stream with a preset length from the audio/video code stream includes:
separating audio streams from the audio and video code streams according to a preset mode;
and intercepting an audio stream with a preset length from the audio stream.
Further, the separating the audio stream from each of the audio/video code streams according to a preset manner includes:
and separating audio streams from the audio and video code streams through a demultiplexer.
Further, the method also comprises the following steps:
if the comparison fails, outputting a comparison failure and marking the corresponding terminal device as at risk of being unavailable.
A multi-video synchronization device, in which the audio/video code stream whose delay time falls within a preset range is selected as the primary audio/video code stream, the corresponding terminal device is determined to be the standard machine position, and the remaining audio/video code streams are secondary audio/video code streams; the device comprises:
a first processing unit, configured to receive audio/video code streams uploaded by a plurality of terminal devices, wherein each audio/video code stream is converted by an encoder from the video captured by the corresponding terminal device;
a second processing unit, configured to intercept an audio stream of preset length from each audio/video code stream and extract audio fingerprints per preset time slice to obtain the corresponding primary audio fingerprint and each secondary audio fingerprint;
a third processing unit, configured to compare each secondary audio fingerprint with the primary audio fingerprint and, if the comparison succeeds, calculate and record the time offset between the corresponding terminal device and the standard machine position;
and a fourth processing unit, configured to play back the video stream of each terminal device with its corresponding time offset, so as to play the video streams of all terminal devices synchronously.
Further, the second processing unit is configured to:
separating audio streams from the audio and video code streams according to a preset mode;
and intercepting an audio stream with a preset length from the audio stream.
Further, the second processing unit is configured to:
and separating audio streams from the audio and video code streams through a demultiplexer.
Further, the third processing unit is further configured to:
if the comparison fails, outputting a comparison failure and marking the corresponding terminal device as at risk of being unavailable.
A storage medium comprising a stored program, wherein, when the program runs, a device on which the storage medium is located is controlled to perform the multi-video synchronization method described above.
An electronic device comprising at least one processor, at least one memory and a bus connected to the processor; the processor and the memory communicate with each other through the bus; and the processor is configured to invoke program instructions in the memory to perform the multi-video synchronization method described above.
In the multi-video synchronization method and device of the application, the audio/video code stream whose delay time falls within a preset range is first selected as the primary audio/video code stream, the corresponding terminal device is determined to be the standard machine position, and the remaining audio/video code streams are secondary audio/video code streams. Audio/video code streams uploaded by a plurality of terminal devices are then received, each converted by an encoder from the video captured by the corresponding terminal device; an audio stream of preset length is intercepted from each audio/video code stream, and audio fingerprints are extracted per preset time slice to obtain the corresponding primary audio fingerprint and each secondary audio fingerprint; each secondary audio fingerprint is compared with the primary audio fingerprint; if the comparison succeeds, the time offset between the corresponding terminal device and the standard machine position is calculated and recorded; finally, the video stream of each terminal device is played back with its corresponding time offset, so that the video streams of all terminal devices play synchronously. Through audio fingerprint identification, the embodiments of the application simplify the problem of multi-camera video synchronization under cloud-directing conditions; moreover, the shooting terminal devices need no advance time calibration and can join at any time, so the cloud-directing platform can synchronize audio and video transmitted over different networks in real time, improving the playing quality of the cloud-directing technology and reducing the complexity and cost of time synchronization in a cloud environment.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of a multi-video synchronization system according to an embodiment of the present application;
fig. 2 is a schematic flowchart illustrating a multi-video synchronization method according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an implementation manner of a multi-video synchronization system disclosed in the embodiment of the present application;
FIG. 4 is a schematic diagram of a multi-video synchronization method disclosed in an embodiment of the present application;
fig. 5 is a schematic diagram illustrating a multi-video synchronous playing principle disclosed in an embodiment of the present application;
fig. 6 is a schematic structural diagram of a multi-video synchronization apparatus according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an electronic device disclosed in an embodiment of the present application.
Detailed Description
The present application provides a multi-video synchronization method and apparatus, applied to the multi-video synchronization system shown in fig. 1. The multi-video synchronization system comprises a cloud server 10 and at least two terminal devices 20. As shown in fig. 1, in the application three terminal devices 20 shoot a scene from different positions and angles at the same time. A terminal device 20 may be a terminal camera or a mobile phone; it records video, converts the video into an audio/video code stream through an encoder, and pushes the stream to the cloud server 10 over a wireless or wired link via a private network or the internet. The cloud server 10 then realizes multi-camera video synchronization under cloud-directing conditions through the multi-video synchronization method.
The application provides a multi-video synchronization method and device whose aim is to implement and simplify multi-camera video synchronization under cloud-directing conditions.
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 2, a flow chart of a multi-video synchronization method according to an embodiment of the present application is shown. As shown in fig. 2, an embodiment of the present application provides a multi-video synchronization method, in which the audio/video code stream whose delay time falls within a preset range is selected as the primary audio/video code stream, the corresponding terminal device is determined to be the standard machine position, and the remaining audio/video code streams are secondary audio/video code streams; the method comprises the following steps:
s201: and receiving audio and video code streams uploaded by a plurality of terminal devices, wherein the audio and video code streams are converted into video streams shot by the terminal devices by an encoder.
Each terminal camera or mobile phone records video, converts it into an audio/video code stream through its encoder, and pushes the stream to the cloud server 10 over a wireless or wired link via a private network or the internet.
It should be noted that the cloud server 10 designates, according to a preset algorithm, the stream of the terminal camera or mobile phone with relatively small delay as the primary audio/video code stream, and the streams of the remaining terminals as secondary audio/video code streams.
S202: intercept an audio stream of preset length from each audio/video code stream, and extract audio fingerprints per preset time slice to obtain the corresponding primary audio fingerprint and each secondary audio fingerprint.
In this embodiment of the application, the above intercepting an audio stream with a preset length from the audio/video code stream includes: separating audio streams from the audio and video code streams according to a preset mode; and intercepting an audio stream with a preset length from the audio stream.
It should be noted that, the separating the audio stream from each of the audio/video code streams according to the preset mode includes: and separating audio streams from the audio and video code streams through a demultiplexer.
In this step, the preset length and the preset time slice may be freely set according to the requirement, and are not specifically limited herein.
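Since the disclosure fixes neither the audio decoding path nor the fingerprinting scheme, step S202 can be sketched with a deliberately simple illustrative fingerprint: the dominant FFT bin of each time slice. The function name `slice_fingerprint`, the 50 ms slice, and the ffmpeg extraction command in the comment are assumptions for illustration, not part of the patent; a production system would use a robust scheme such as spectral-peak constellation hashing.

```python
import numpy as np

def slice_fingerprint(samples: np.ndarray, sample_rate: int,
                      slice_ms: int = 50) -> list:
    """Toy audio fingerprint: for each non-overlapping time slice,
    record the index of the dominant frequency bin.

    `samples` is mono PCM, e.g. demuxed/decoded beforehand with
    something like `ffmpeg -i stream.ts -vn -ac 1 -ar 8000 -f s16le out.pcm`
    (an illustrative command, not mandated by the disclosure)."""
    slice_len = int(sample_rate * slice_ms / 1000)
    prints = []
    for start in range(0, len(samples) - slice_len + 1, slice_len):
        window = samples[start:start + slice_len]
        # Hann window reduces spectral leakage before the FFT.
        spectrum = np.abs(np.fft.rfft(window * np.hanning(slice_len)))
        prints.append(int(np.argmax(spectrum)))
    return prints
```

The same function is applied to the intercepted primary audio stream and to each secondary audio stream, yielding the fingerprint sequences that the next step compares.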
S203: comparing each secondary audio fingerprint with the primary audio fingerprint, if the comparison is successful, executing step S204, and if the comparison is unsuccessful, executing step S206.
S204: and calculating and recording the time offset of the corresponding terminal equipment and the standard machine position.
S205: and according to the time offset corresponding to each terminal device, the video stream corresponding to the terminal device is played in an offset manner, so that the video stream corresponding to each terminal device is played synchronously.
S206: output a comparison failure, and mark the corresponding terminal device as at risk of being unavailable.
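Steps S203-S206 can be sketched as a sliding comparison of fingerprint sequences. The exhaustive slide and the 0.9 match ratio below are illustrative assumptions (the disclosure specifies neither the comparison method nor a threshold), and this sketch only searches for a secondary stream that lags within the primary capture window; a fuller implementation would also search negative shifts.

```python
def match_offset(primary, secondary, slice_ms=50, min_ratio=0.9):
    """Slide the secondary fingerprint sequence along the primary one.
    Returns (offset_ms, match_ratio) for the best alignment, or None
    when no shift reaches min_ratio -- the comparison-failure branch,
    after which the terminal would be marked as at risk of being
    unavailable."""
    best = None
    n = len(secondary)
    for shift in range(len(primary) - n + 1):
        hits = sum(p == s for p, s in zip(primary[shift:shift + n],
                                          secondary))
        ratio = hits / n
        if best is None or ratio > best[1]:
            # Offset in slices converts to time via the slice length.
            best = (shift * slice_ms, ratio)
    if best is None or best[1] < min_ratio:
        return None  # comparison failed
    return best
```

On success, the returned `offset_ms` is the time offset between the secondary terminal and the standard machine position that step S204 records.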
It should be noted that, in the embodiment of the present application, network jitter may change the delay time of each secondary terminal device, and a secondary terminal device may go offline and later come online again; calibration therefore needs to be repeated after a period of time, that is, the multi-video synchronization is re-executed at a certain time interval.
for convenience of understanding, the embodiment of the present application assumes that three terminals shoot a scene at different positions and angles at the same time:
as shown in fig. 3, one of the terminals is used as a main shooting terminal device, the other two terminals are respectively used as an auxiliary shooting terminal device 1 and an auxiliary shooting terminal device 2, the three shooting terminal devices shoot a scene, a shot audio and video are converted into an audio and video code stream through an encoder in real time, and the audio and video code stream is linked through a private network or the internet in a wireless or wired manner and pushed to a specified cloud server 4.
As shown in fig. 4, the cloud server 4 separates the audio stream from the audio/video code stream sent by the main shooting terminal device through a demultiplexer (demux), intercepts part of the audio stream in real time, and extracts fingerprints from it per time slice to serve as the reference audio fingerprint. The same method extracts audio fingerprints from the auxiliary audio streams, which are then compared with the reference audio fingerprint of the main audio stream. If the comparison succeeds, the time offset between that terminal device and the standard terminal device is calculated and recorded; if the comparison fails, a comparison failure is displayed and the terminal device is marked as at risk of being unavailable.
As shown in fig. 5, the cloud server 4 plays back the video stream of each camera position with the offset measured for that secondary terminal device, thereby completing the synchronization of the video streams. Concretely, the maximum delay among the secondary terminals is taken as the backward offset of the primary terminal device, and each secondary terminal device is then aligned in time with the primary terminal. For example, if auxiliary shooting terminal 1 lags the main shooting terminal by 2 seconds and auxiliary shooting terminal 2 lags it by 1 second, the main shooting terminal is delayed by 2 seconds and auxiliary shooting terminal 2 by 1 second (auxiliary shooting terminal 1 needs no extra delay), so that the primary and secondary audio are synchronized.
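The alignment rule of fig. 5 reduces to simple arithmetic: hold every stream back to the slowest one. A minimal sketch (the function name and dictionary layout are assumptions for illustration, not part of the disclosure):

```python
def playout_delays(lags_ms):
    """Given each secondary camera's measured lag behind the primary
    stream (in ms), return the extra playout delay per stream: the
    primary is held back by the maximum lag, and each secondary by
    the difference between that maximum and its own lag."""
    max_lag = max(lags_ms.values(), default=0)
    delays = {cam: max_lag - lag for cam, lag in lags_ms.items()}
    delays["primary"] = max_lag
    return delays
```

With the lags from the example above (auxiliary terminal 1 lagging 2 s, auxiliary terminal 2 lagging 1 s), this yields a 2 s delay for the primary stream, 1 s for auxiliary terminal 2, and none for auxiliary terminal 1, reproducing the fig. 5 behaviour.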
The embodiment of the application provides a multi-video synchronization method: first, the audio/video code stream whose delay time falls within a preset range is selected as the primary audio/video code stream, the corresponding terminal device is determined to be the standard machine position, and the remaining audio/video code streams are secondary audio/video code streams; then audio/video code streams uploaded by a plurality of terminal devices are received, each converted by an encoder from the video captured by the corresponding terminal device; an audio stream of preset length is intercepted from each audio/video code stream, and audio fingerprints are extracted per preset time slice to obtain the corresponding primary audio fingerprint and each secondary audio fingerprint; each secondary audio fingerprint is compared with the primary audio fingerprint; if the comparison succeeds, the time offset between the corresponding terminal device and the standard machine position is calculated and recorded; finally, the video stream of each terminal device is played back with its corresponding time offset, so that the video streams of all terminal devices play synchronously. Through audio fingerprint identification, the embodiment of the application simplifies the problem of multi-camera video synchronization under cloud-directing conditions; moreover, the shooting terminal devices need no advance time calibration and can join at any time, so the cloud-directing platform can synchronize audio and video transmitted over different networks in real time, improving the playing quality of the cloud-directing technology and reducing the complexity and cost of time synchronization in a cloud environment.
Referring to fig. 6, based on the multi-video synchronization method disclosed in the foregoing embodiment, this embodiment correspondingly discloses a multi-video synchronization apparatus, in which the audio/video code stream whose delay time falls within a preset range is selected as the primary audio/video code stream, the corresponding terminal device is determined to be the standard machine position, and the remaining audio/video code streams are secondary audio/video code streams; the apparatus comprises:
a first processing unit 601, configured to receive audio/video code streams uploaded by a plurality of terminal devices, wherein each audio/video code stream is converted by an encoder from the video captured by the corresponding terminal device;
a second processing unit 602, configured to intercept an audio stream of preset length from each audio/video code stream and extract audio fingerprints per preset time slice to obtain the corresponding primary audio fingerprint and each secondary audio fingerprint;
a third processing unit 603, configured to compare each secondary audio fingerprint with the primary audio fingerprint and, if the comparison succeeds, calculate and record the time offset between the corresponding terminal device and the standard machine position;
and a fourth processing unit 604, configured to play back the video stream of each terminal device with its corresponding time offset, so as to play the video streams of all terminal devices synchronously.
Specifically, the second processing unit 602 is configured to:
separating audio streams from the audio and video code streams according to a preset mode;
and intercepting an audio stream with a preset length from the audio stream.
Specifically, the second processing unit 602 is configured to:
and separating audio streams from the audio and video code streams through a demultiplexer.
Specifically, the third processing unit 603 is further configured to:
if the comparison fails, outputting a comparison failure and marking the corresponding terminal device as at risk of being unavailable.
The multi-video synchronization device comprises a processor and a memory; the first processing unit, the second processing unit, the third processing unit and the fourth processing unit described above are stored in the memory as program units, and the processor executes the program units stored in the memory to implement the corresponding functions.
The processor comprises a kernel, and the kernel calls the corresponding program unit from the memory. One or more kernels can be provided, and multi-camera video synchronization under cloud-directing conditions is realized and simplified by adjusting kernel parameters.
An embodiment of the present application provides a storage medium having a program stored thereon, which when executed by a processor implements the multi-video synchronization method.
The embodiment of the application provides a processor, wherein the processor is used for running a program, and the multi-video synchronization method is executed when the program runs.
An embodiment of the present application provides an electronic device, as shown in fig. 7, the electronic device 70 includes at least one processor 701, and at least one memory 702 and a bus 703, which are connected to the processor; the processor 701 and the memory 702 complete communication with each other through the bus 703; the processor 701 is configured to call program instructions in the memory 702 to perform the multi-video synchronization method described above.
The electronic device herein may be a server, a PC, a PAD, a mobile phone, etc.
The present application further provides a computer program product which, when executed on a data processing device, is adapted to run a program that initializes the following method steps:
receiving audio/video code streams uploaded by a plurality of terminal devices, wherein each audio/video code stream is converted by an encoder from the video captured by the corresponding terminal device;
intercepting an audio stream of preset length from each audio/video code stream, and extracting audio fingerprints per preset time slice to obtain the corresponding primary audio fingerprint and each secondary audio fingerprint;
comparing each secondary audio fingerprint with the primary audio fingerprint;
if the comparison succeeds, calculating and recording the time offset between the corresponding terminal device and the standard machine position;
and playing back the video stream of each terminal device with its corresponding time offset, so that the video streams of all terminal devices play synchronously.
Further, intercepting an audio stream with a preset length from the audio/video code stream includes:
separating audio streams from the audio and video code streams according to a preset mode;
and intercepting an audio stream with a preset length from the audio stream.
Further, the separating the audio stream from each of the audio/video code streams according to a preset manner includes:
and separating audio streams from the audio and video code streams through a demultiplexer.
Further, the method also comprises the following steps:
if the comparison fails, outputting a comparison failure and marking the corresponding terminal device as at risk of being unavailable.
The present application is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, executed via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a device includes one or more processors (CPUs), memory, and a bus. The device may also include input/output interfaces, network interfaces, and the like.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip. The memory is an example of a computer-readable medium.
Computer-readable media, including permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.
Claims (10)
1. A multi-video synchronization method, characterized in that an audio/video code stream whose delay is within a preset range is selected as the primary audio/video code stream, the corresponding terminal device is designated as the reference camera position, and the remaining audio/video code streams are secondary audio/video code streams; the method comprises the following steps:
receiving audio/video code streams uploaded by a plurality of terminal devices, wherein each audio/video code stream is obtained by converting, through an encoder, the video stream captured by the corresponding terminal device;
intercepting an audio stream of preset length from each audio/video code stream, and extracting audio fingerprints over preset time slices to obtain the corresponding primary audio fingerprint and each secondary audio fingerprint;
comparing each secondary audio fingerprint with the primary audio fingerprint;
if the comparison succeeds, calculating and recording the time offset between the corresponding terminal device and the reference camera position;
and playing the video stream of each terminal device offset by that device's recorded time offset, so that the video streams of all the terminal devices play synchronously.
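The offset calculation in claim 1 can be illustrated with a minimal sketch. This is an assumption for illustration, not the patented implementation: the fingerprint here is a simple one-bit-per-slice energy-trend code, and the offset is found by sliding the secondary fingerprint over the primary one and keeping the best-matching shift. All function names and parameters are hypothetical.

```python
# Sketch: estimate the time offset between a secondary and a primary audio
# stream by comparing per-time-slice fingerprints. Illustrative only.
import numpy as np

def fingerprint(samples, slice_len=400):
    # One bit per time slice: did the slice's energy rise vs. the previous slice?
    n = len(samples) // slice_len
    energies = np.square(samples[: n * slice_len].reshape(n, slice_len)).sum(axis=1)
    return (np.diff(energies) > 0).astype(np.uint8)

def time_offset(primary, secondary, slice_len=400, max_shift=50):
    # Slide the secondary fingerprint along the primary one; the shift with
    # the highest bit agreement is the estimated offset, in slices.
    fp_p = fingerprint(primary, slice_len)
    fp_s = fingerprint(secondary, slice_len)
    best_shift, best_score = 0, -1
    for shift in range(max_shift):
        overlap = min(len(fp_p) - shift, len(fp_s))
        if overlap <= 0:
            break
        score = int(np.sum(fp_p[shift:shift + overlap] == fp_s[:overlap]))
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift * slice_len  # offset in samples

rng = np.random.default_rng(0)
audio = rng.standard_normal(48000)
# Simulate a secondary stream that lags the primary by 4000 samples.
print(time_offset(audio, audio[4000:]))  # → 4000
```

A production system would use a more robust spectral fingerprint, but the sliding-comparison structure is the same: a successful match yields the shift, which is then recorded as the device's time offset.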
2. The method of claim 1, wherein intercepting an audio stream of preset length from the audio/video code stream comprises:
separating an audio stream from each audio/video code stream in a preset manner;
and intercepting an audio stream of preset length from the separated audio stream.
3. The method of claim 2, wherein separating an audio stream from each audio/video code stream in a preset manner comprises:
separating the audio stream from each audio/video code stream through a demultiplexer.
4. The method of claim 1, further comprising:
if the comparison fails, outputting a comparison-failure result and marking the corresponding terminal device as carrying a risk of unavailability.
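The demultiplexing in claim 3 would typically be done by a container demuxer (for example, an ffmpeg-based one); the subsequent interception of a preset-length segment can be sketched with Python's standard-library `wave` module on an already-demuxed WAV stream. The helper names and the synthetic input are illustrative assumptions.

```python
# Sketch of claims 2-3's interception step: given an audio stream already
# separated from the A/V stream by a demultiplexer, keep only a segment of
# preset length from its start. Illustrative only.
import io
import wave

def make_wav(seconds, rate=8000):
    # Build a synthetic mono 16-bit WAV stream in memory (stand-in for the
    # demuxer's output).
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per frame
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * rate * seconds)
    buf.seek(0)
    return buf

def intercept(wav_file, preset_seconds):
    # Read only the first `preset_seconds` of audio frames.
    with wave.open(wav_file, "rb") as w:
        rate = w.getframerate()
        return w.readframes(rate * preset_seconds)

clip = intercept(make_wav(10), 3)
print(len(clip))  # 3 s × 8000 frames/s × 2 bytes/frame = 48000 bytes
```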
5. A multi-video synchronization device, characterized in that an audio/video code stream whose delay is within a preset range is selected as the primary audio/video code stream, the corresponding terminal device is designated as the reference camera position, and the remaining audio/video code streams are secondary audio/video code streams; the device comprises:
a first processing unit, configured to receive audio/video code streams uploaded by a plurality of terminal devices, wherein each audio/video code stream is obtained by converting, through an encoder, the video stream captured by the corresponding terminal device;
a second processing unit, configured to intercept an audio stream of preset length from each audio/video code stream and extract audio fingerprints over preset time slices to obtain the corresponding primary audio fingerprint and each secondary audio fingerprint;
a third processing unit, configured to compare each of the secondary audio fingerprints with the primary audio fingerprint,
and, if the comparison succeeds, to calculate and record the time offset between the corresponding terminal device and the reference camera position;
and a fourth processing unit, configured to play the video stream of each terminal device offset by that device's recorded time offset, so that the video streams of all the terminal devices play synchronously.
6. The apparatus of claim 5, wherein the second processing unit is configured to:
separate an audio stream from each audio/video code stream in a preset manner;
and intercept an audio stream of preset length from the separated audio stream.
7. The apparatus of claim 6, wherein the second processing unit is configured to:
separate the audio stream from each audio/video code stream through a demultiplexer.
8. The apparatus of claim 5, wherein the third processing unit is further configured to:
if the comparison fails, output a comparison-failure result and mark the corresponding terminal device as carrying a risk of unavailability.
9. A storage medium comprising a stored program, wherein the program, when executed, controls a device on which the storage medium resides to perform the multi-video synchronization method of any one of claims 1 to 4.
10. An electronic device, comprising at least one processor, at least one memory, and a bus connected to the processor; the processor and the memory communicate with each other through the bus; and the processor is configured to invoke program instructions in the memory to perform the multi-video synchronization method of any one of claims 1 to 4.
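The offset playback in the final step of claims 1 and 5 can be sketched as follows. This is an assumption about one simple scheduling scheme, not the patented implementation: given each device's recorded offset relative to the reference camera position, the most-lagging stream starts immediately and every other stream's start is delayed by its extra lead, so all streams present the same content instant together. Device names and units (integer milliseconds) are hypothetical.

```python
# Sketch: turn per-device time offsets into per-device playback start delays
# so that all video streams render in sync. Illustrative only.
def start_delays(offsets_ms):
    # offsets_ms: device -> milliseconds the stream's content leads the
    # most-lagging stream's reference timeline.
    base = min(offsets_ms.values())
    # The stream with the smallest offset starts at once; the rest wait out
    # their extra lead over it.
    return {device: off - base for device, off in offsets_ms.items()}

# B's stream runs 1300 ms ahead of A's, C's 500 ms ahead of A's.
print(start_delays({"A": 200, "B": 1500, "C": 700}))
# → {'A': 0, 'B': 1300, 'C': 500}
```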
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110431407.XA CN112995708A (en) | 2021-04-21 | 2021-04-21 | Multi-video synchronization method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112995708A true CN112995708A (en) | 2021-06-18 |
Family
ID=76341588
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110431407.XA Pending CN112995708A (en) | 2021-04-21 | 2021-04-21 | Multi-video synchronization method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112995708A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114339302A (en) * | 2021-12-31 | 2022-04-12 | 咪咕文化科技有限公司 | Broadcasting guide method, device, equipment and computer storage medium |
CN114827681A (en) * | 2022-04-24 | 2022-07-29 | 咪咕视讯科技有限公司 | Video synchronization method and device, electronic equipment, terminal equipment and storage medium |
CN115174960A (en) * | 2022-06-21 | 2022-10-11 | 咪咕文化科技有限公司 | Audio and video synchronization method and device, computing equipment and storage medium |
CN115474083A (en) * | 2022-11-02 | 2022-12-13 | 灵长智能科技(杭州)有限公司 | Multi-channel audio and video synchronous live broadcast method and system |
CN115499675A (en) * | 2022-09-16 | 2022-12-20 | 深圳市野草声学有限公司 | Multi-machine-bit audio and video synthesis method and system based on communication network live video |
CN117156125A (en) * | 2023-10-25 | 2023-12-01 | 帕科视讯科技(杭州)股份有限公司 | IPTV live stream real-time monitoring method and server based on artificial intelligence |
CN114339302B (en) * | 2021-12-31 | 2024-05-07 | 咪咕文化科技有限公司 | Method, device, equipment and computer storage medium for guiding broadcast |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101189661A (en) * | 2005-03-30 | 2008-05-28 | 弗劳恩霍夫应用研究促进协会 | Device and method for generating a data stream and for generating a multi-channel representation |
US20170034263A1 (en) * | 2015-07-30 | 2017-02-02 | Amp Me Inc. | Synchronized Playback of Streamed Audio Content by Multiple Internet-Capable Portable Devices |
CN107211078A (en) * | 2015-01-23 | 2017-09-26 | 瑞典爱立信有限公司 | Video frame synchronization based on VLC |
CN111093108A (en) * | 2019-12-18 | 2020-05-01 | 广州酷狗计算机科技有限公司 | Sound and picture synchronization judgment method and device, terminal and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210618 |