CN103648011A

CN103648011A - Audio and video synchronization device and method based on HLS protocol

Info

Publication number: CN103648011A
Application number: CN201310637560.3A
Authority: CN
Inventors: 苍鹏; 李强
Original assignee: Leshi Zhixin Electronic Technology Tianjin Co Ltd
Current assignee: Xinle Visual Intelligent Electronic Technology Tianjin Co ltd; Leshi Zhixin Electronic Technology Tianjin Co Ltd
Priority date: 2013-11-29
Filing date: 2013-11-29
Publication date: 2014-03-19
Anticipated expiration: 2033-11-29
Also published as: CN103648011B

Abstract

The invention provides an audio and video synchronization device and method based on an HLS protocol. The method comprises: after receiving a synchronization request, adding a synchronization mark to audio and video data obtained first; demultiplexing the audio and video data into audio data and video data, and updating timestamps of the audio data and / or video data based on the synchronization mark; decoding the audio data and video data and outputting an audio signal to be played and a video signal to be played; detecting whether the timestamps carried in the audio signal to be played and the video signal to be played are consistent, and sending the synchronization request when the timestamps carried in the audio signal to be played and the video signal to be played are not consistent; and controlling the audio signal to be played and the video signal to be played to be output synchronously based on the synchronization mark. Through the technical scheme in the invention, the problem of picture or sound retention due to continuous nonsynchronization of the audio and video is solved.

Description

Audio and video synchronization device and method based on the HLS protocol

Technical field

The present invention relates to the field of multimedia stream playback, and in particular to an audio and video synchronization device and method based on the HLS protocol.

Background art

HLS (HTTP Live Streaming) is a streaming media protocol implemented by Apple on top of the Hypertext Transfer Protocol (HTTP). HLS splits a large, continuous stream of media data into many small files for transmission, which suits the file-transfer model of web servers, and uses a lightweight, continuously updated index file to control the download and playback of the small media files. This supports both live and on-demand streaming. HLS supports automatic bitrate switching while giving priority to smooth playback, and is now widely used by major video websites.

Under the HLS protocol, multimedia data is processed into audio-video data ts files and an index m3u8 file, which are stored on a cloud server. An m3u8 file is in effect a way of organizing a group of files: it lists the audio-video data ts files as a playlist for the player to download and play. An m3u8 file generally contains the uniform resource locators (URLs) of multiple audio-video data ts files. After obtaining the m3u8 file, the player downloads the corresponding audio-video data according to the URLs in it. Referring to Fig. 1, downloaded audio-video data is first demultiplexed into audio data and video data; the audio data and video data are then decoded into an audio signal to be played and a video signal to be played; finally, these signals are output to play the audio-video data.
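As an illustration of how a player might read the index file, the following Python sketch extracts the duration and URL of each ts file from an m3u8 media playlist. The function name parse_media_playlist, the sample playlist and the example.com address are hypothetical and are not part of the invention; only the #EXTINF tag and the URL lines follow the HLS playlist format.

```python
from urllib.parse import urljoin


def parse_media_playlist(text, base_url):
    """Return a list of (duration_seconds, absolute_url), one per segment."""
    segments = []
    pending_duration = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#EXTINF:"):
            # Tag format: "#EXTINF:<duration>,<optional title>"
            pending_duration = float(line[len("#EXTINF:"):].split(",", 1)[0])
        elif line and not line.startswith("#"):
            # A non-comment line is the URL of the segment described above it.
            if pending_duration is not None:
                segments.append((pending_duration, urljoin(base_url, line)))
                pending_duration = None
    return segments


# Hypothetical playlist used only for illustration.
EXAMPLE_PLAYLIST = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXTINF:10.0,
seg0.ts
#EXTINF:9.5,
seg1.ts
#EXTINF:10.0,
seg2.ts
"""

segments = parse_media_playlist(
    EXAMPLE_PLAYLIST, "http://example.com/live/index.m3u8"
)
```

The per-segment durations collected here are the same values from which the timestamp of each piece of audio-video data can later be calculated.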

Because of the complexity of network transmission, the accuracy of audio and video synchronization during the above playback process is uncertain and strongly affected by abnormal conditions. As a result, the audio signal to be played and the video signal to be played can remain out of sync for a sustained period, causing the picture or the sound to lag.

In summary, an audio and video synchronization method and device based on the HLS protocol are urgently needed.

Summary of the invention

In view of this, the present invention provides an audio and video synchronization device and method based on the HLS protocol, which solve the picture or sound lag caused by the audio signal to be played and the video signal to be played remaining out of sync.

Specifically, the device comprises a marking module, a demultiplexing module, a decoding module, a detection module and a playback module arranged in sequence, wherein:

The marking module is configured to, after receiving a synchronization request, add a synchronization mark to the first audio-video data obtained, and to notify the playback module;

The demultiplexing module is configured to demultiplex the audio-video data into audio data and video data, and to update the timestamps of the audio data and/or video data according to the synchronization mark;

The decoding module is configured to decode the audio data and video data and output an audio signal to be played and a video signal to be played;

The detection module is configured to detect whether the timestamps carried in the audio signal to be played and the video signal to be played are consistent, and to send a synchronization request to the marking module when they are inconsistent;

The playback module is configured to control the audio signal to be played and the video signal to be played to be output synchronously according to the synchronization mark.

Further, the detection module is configured to send a synchronization request to the marking module if the timestamps carried in the audio signal to be played and the video signal to be played remain inconsistent throughout a predetermined time.

Further, the process by which the demultiplexing module updates the timestamps of the audio data and/or video data comprises: the demultiplexing module corrects the timestamp of the video data according to the timestamp of the audio data.

Further, the process by which the demultiplexing module updates the timestamps of the audio data and/or video data comprises: the demultiplexing module corrects the timestamps of the audio data and the video data according to the timestamp of the audio-video data.

Further, the process by which the playback module controls the audio signal to be played and the video signal to be played to be output synchronously according to the synchronization mark comprises: storing the audio signal to be played or the video signal to be played that carries the synchronization mark in an output buffer, and, after the video signal to be played or the audio signal to be played that carries the synchronization mark is received, controlling the audio signal to be played and the video signal to be played that carry the synchronization mark to be output synchronously.

The method comprises:

After receiving a synchronization request, adding a synchronization mark to the first audio-video data obtained;

Demultiplexing the audio-video data into audio data and video data, and updating the timestamps of the audio data and/or video data according to the synchronization mark;

Decoding the audio data and video data, and outputting an audio signal to be played and a video signal to be played;

Detecting whether the timestamps carried in the audio signal to be played and the video signal to be played are consistent, and sending a synchronization request when they are inconsistent;

Controlling the audio signal to be played and the video signal to be played to be output synchronously according to the synchronization mark.

Further, the method also comprises:

Sending a synchronization request if the timestamps carried in the audio signal to be played and the video signal to be played remain inconsistent throughout a predetermined time.

Further, the process of updating the timestamps of the audio data and/or video data comprises: correcting the timestamp of the video data according to the timestamp of the audio data.

Further, the process of updating the timestamps of the audio data and/or video data comprises: correcting the timestamps of the audio data and the video data according to the timestamp of the audio-video data.

Further, the process of controlling the audio signal to be played and the video signal to be played to be output synchronously according to the synchronization mark comprises: storing the audio signal to be played or the video signal to be played that carries the synchronization mark in an output buffer, and, after the video signal to be played or the audio signal to be played that carries the synchronization mark is received, controlling the audio signal to be played and the video signal to be played that carry the synchronization mark to be output synchronously.

As can be seen from the above description, the present invention checks the timestamps of the audio signal to be played and the video signal to be played to determine whether their output is synchronized, and, when it is not, adds a synchronization mark to the audio-video data that is about to be demultiplexed. The audio signal to be played and the video signal to be played are controlled to be output synchronously according to this synchronization mark. At the same time, the timestamps of the audio data and video data are updated during demultiplexing, so that the timestamps of subsequent audio and video signals to be played are synchronized and their output is therefore synchronized as well. The problem of picture or sound lag caused by the audio signal to be played and the video signal to be played remaining out of sync is thereby solved.

Brief description of the drawings

Fig. 1 is a flow chart of audio-video data playback in the prior art;

Fig. 2 is a structural diagram of the audio and video synchronization device in an embodiment of the present invention;

Fig. 3 is a structural diagram of a TS packet in an embodiment of the present invention;

Fig. 4 is a flow chart of the audio and video synchronization method in an embodiment of the present invention.

Detailed description

To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly below with reference to the accompanying drawings. Obviously, the described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

To address the above problems in the prior art, the present invention provides an audio and video synchronization device based on the HLS protocol. The following embodiments are provided to further illustrate the present invention:

Embodiment 1

Referring to Fig. 2, the audio and video synchronization device based on the HLS protocol in this embodiment comprises, arranged in sequence: a marking module 110, a demultiplexing module 120, a decoding module 130, a detection module 140 and a playback module 150.

The marking module 110 is configured to, after receiving a synchronization request, add a synchronization mark to the first audio-video data obtained, and to notify the playback module 150.

The demultiplexing module 120 is configured to demultiplex the audio-video data into audio data and video data, and to update the timestamps of the audio data and/or video data according to the synchronization mark.

The decoding module 130 is configured to decode the audio data and video data and output an audio signal to be played and a video signal to be played.

The detection module 140 is configured to detect whether the timestamps carried in the audio signal to be played and the video signal to be played are consistent, and to send a synchronization request to the marking module 110 when they are inconsistent.

The playback module 150 is configured to control the audio signal to be played and the video signal to be played to be output synchronously according to the synchronization mark.

The present invention provides the detection module 140 to judge, from the timestamps carried by the audio signal to be played and the video signal to be played, whether the two signals are synchronized. When they are not, the detection module 140 sends a synchronization request to the marking module 110. After receiving this request, the marking module 110 adds a synchronization mark to the next audio-video data it obtains. The synchronization mark prompts the demultiplexing module 120 and the playback module 150 to synchronize.

In a specific implementation of this embodiment, the demultiplexing module 120 updates the timestamps of the audio data and/or video data according to the synchronization mark, in order to correct any mismatch between the audio and video timestamps. The playback module 150 controls the audio signal to be played and the video signal to be played to be output synchronously according to the synchronization mark.

Embodiment 2

This embodiment provides a more complete audio and video synchronization device based on the HLS protocol. The device comprises, arranged in sequence, a marking module, a demultiplexing module, a decoding module, a detection module and a playback module, wherein:

The marking module is configured to, after receiving a synchronization request, add a synchronization mark to the first audio-video data obtained, and to notify the playback module.

The demultiplexing module is configured to demultiplex the audio-video data into audio data and video data, and to update the timestamps of the audio data and/or video data according to the synchronization mark.

The decoding module is configured to decode the audio data and video data and output an audio signal to be played and a video signal to be played.

The detection module is configured to detect whether the timestamps carried in the audio signal to be played and the video signal to be played are consistent, and to send a synchronization request to the marking module when they are inconsistent.

The playback module is configured to control the audio signal to be played and the video signal to be played to be output synchronously according to the synchronization mark.

Compared with the prior art, the present invention places a detection module between the decoding module and the playback module. It detects whether the timestamps carried in the audio signal to be played and the video signal to be played are consistent, and thereby whether the output of the two signals is synchronized; when the timestamps are inconsistent, the detection module sends a synchronization request to the marking module. To work with the detection module, the present invention also places a marking module before the demultiplexing module. It receives the synchronization request sent by the detection module and then adds a synchronization mark to the first audio-video data obtained afterwards. Further, the playback module controls the decoded audio signal to be played and video signal to be played to be output synchronously according to this synchronization mark, so that the audio and video output is synchronized. The demultiplexing module updates the timestamps of the audio data and/or video data according to the synchronization mark, so that the audio and video timestamps are synchronized, which further ensures that subsequent output stays synchronized.

Under the HLS protocol, for one complete piece of multimedia data, multiple pieces of audio-video data in TS format are downloaded according to the URLs in the m3u8 file. TS is an encapsulation format: audio data, video data and auxiliary information are all encapsulated into 188-byte TS packets. Referring to the TS packet structure shown in Fig. 3, a TS packet is formed by packing a TS header together with a PES stream. The PES stream in turn is formed by packing a PES header together with an ES stream, and the ES stream contains the audio data and video data.
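The packet layout described above can be illustrated with a short Python sketch that splits a ts file into 188-byte packets and reads the fixed 4-byte TS header. The field positions follow the MPEG-2 transport stream format rather than this description, and the names parse_ts_header and split_packets are hypothetical. Note that the 0x47 sync byte of the TS format is unrelated to the synchronization mark of the invention, whose header field is not specified here.

```python
TS_PACKET_SIZE = 188   # every TS packet is exactly 188 bytes
TS_SYNC_BYTE = 0x47    # first byte of every TS packet


def split_packets(data: bytes):
    """Yield consecutive 188-byte packets, ignoring a trailing partial one."""
    for offset in range(0, len(data) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        yield data[offset:offset + TS_PACKET_SIZE]


def parse_ts_header(packet: bytes) -> dict:
    """Decode the fixed 4-byte header at the start of one TS packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("a TS packet must be exactly 188 bytes")
    if packet[0] != TS_SYNC_BYTE:
        raise ValueError("missing 0x47 sync byte")
    return {
        # Set when a new PES packet begins in this TS packet's payload.
        "payload_unit_start": bool(packet[1] & 0x40),
        # 13-bit packet identifier that tells audio, video and tables apart.
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],
        # 2 bits: whether an adaptation field and/or a payload is present.
        "adaptation_field_control": (packet[3] >> 4) & 0x03,
        # 4-bit counter that increments per packet of the same PID.
        "continuity_counter": packet[3] & 0x0F,
    }


# Hypothetical packet: PID 0x100, payload only, continuity counter 7.
example_packet = bytes([0x47, 0x41, 0x00, 0x17]) + bytes(184)
header = parse_ts_header(example_packet)
```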

In a specific implementation of this embodiment, the marking module obtains the downloaded audio-video data. When no synchronization request has been received, the output of the audio signal to be played and the video signal to be played is currently synchronized, so the marking module performs no special processing and simply passes the audio-video data to the demultiplexing module. A received synchronization request indicates that the audio signal to be played and the video signal to be played are out of sync. To correct this as quickly as possible, the marking module adds a synchronization mark to the next audio-video data it obtains after receiving the request. The synchronization mark can be placed in the TS header, for example by adding a corresponding flag to a field of the TS header.
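The behaviour of the marking module can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the Segment wrapper and its sync_mark attribute stand in for the flag that the description places in a TS header field, the class and method names are hypothetical, and the notification of the playback module is omitted.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical wrapper around one piece of downloaded audio-video data."""
    data: bytes
    sync_mark: bool = False  # stands in for a flag in the TS header


class MarkingModule:
    def __init__(self):
        self._sync_requested = False

    def request_sync(self):
        """Called by the detection module when playback is out of sync."""
        self._sync_requested = True

    def process(self, segment: Segment) -> Segment:
        """Mark only the first segment obtained after a request, then pass it on."""
        if self._sync_requested:
            segment.sync_mark = True
            self._sync_requested = False
        return segment


marker = MarkingModule()
first = marker.process(Segment(b"seg0"))   # no request yet: passed through
marker.request_sync()
second = marker.process(Segment(b"seg1"))  # first segment after the request
third = marker.process(Segment(b"seg2"))   # request already consumed
```

Clearing the request after one segment reflects the statement that only the first audio-video data obtained after the request is marked.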

The demultiplexing module demultiplexes the audio-video data into audio data and video data. It parses the TS packets of the audio-video data and extracts the audio data and video data from them. If it finds the synchronization mark during parsing, it updates the timestamps of the audio data and/or video data. Specifically, each piece of audio data and video data contains a timestamp, which is its display time, for example the time shown on the progress bar at the bottom of the screen when watching a film. When the demultiplexing module finds the synchronization mark, it synchronously adjusts the timestamps of the audio data and video data.

Specifically, in an exemplary implementation of the present invention, the demultiplexing module has two modes for synchronously adjusting the timestamps of the audio data and/or video data. One mode corrects the timestamp of the video data according to the timestamp of the audio data: the audio timestamp is taken as the reference, and the video timestamp is corrected to be consistent with it. The other mode corrects the timestamps of the audio data and the video data according to the timestamp of the audio-video data. Under the HLS protocol, each piece of audio-video data has a corresponding duration; from the durations recorded in the m3u8 file, the position of a given piece of audio-video data within the whole multimedia data can be calculated, and this position is the timestamp of that audio-video data. When the timestamp of the audio data differs greatly from the timestamp of the audio-video data, the timestamps of both the audio data and the video data are corrected to the timestamp of the audio-video data, so that they remain relatively accurate.
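The two correction modes can be sketched in Python as follows. The function names are hypothetical, and the max_drift threshold of 1.0 second is an assumption made only for illustration, since the description says merely that the audio timestamp "differs greatly" without giving a value.

```python
def segment_start_times(durations):
    """Timestamp of each piece of audio-video data, from m3u8 durations.

    The start time of a segment is the sum of the durations before it.
    """
    starts, elapsed = [], 0.0
    for duration in durations:
        starts.append(elapsed)
        elapsed += duration
    return starts


def correct_timestamps(audio_ts, video_ts, segment_ts, max_drift=1.0):
    """Return corrected (audio_ts, video_ts) for a sync-marked segment.

    max_drift is an assumed threshold in seconds, not a value from the text.
    """
    if abs(audio_ts - segment_ts) > max_drift:
        # Second mode: the audio timestamp itself is far off, so both
        # timestamps are corrected to the segment timestamp.
        return segment_ts, segment_ts
    # First mode: audio is the reference, video is corrected to match it.
    return audio_ts, audio_ts


starts = segment_start_times([10.0, 9.5, 10.0])
small_drift = correct_timestamps(19.6, 19.2, starts[2])
large_drift = correct_timestamps(25.0, 19.2, starts[2])
```

With these inputs the third segment starts at 19.5 seconds, so the first call keeps the audio timestamp as the reference and the second call resets both timestamps to the segment timestamp.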

The decoding module decodes the audio data and video data, restoring the digital data into an audio signal to be played and a video signal to be played that can be output for playback.

The detection module checks the timestamps carried by the audio signal to be played and the video signal to be played before they are output. The timestamp is a specific time within the audio-video data; for example, 01:20:38 means 1 hour, 20 minutes and 38 seconds. In general, the time shown on the playback progress bar comes from this timestamp. When the timestamps carried in the audio signal to be played and the video signal to be played are inconsistent, the output of the two signals is out of sync, and the detection module sends a synchronization request to the marking module. Further, the detection module may send a synchronization request only when the timestamps remain inconsistent throughout a predetermined time, so that frequent requests do not overload the marking module. Preferably, the predetermined time is 3 seconds.
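A minimal sketch of this detection logic is given below. The 3-second hold time comes from the description; the class name, the tolerance parameter, the callback and the explicit now argument (used so the example does not depend on a real clock) are assumptions made for illustration.

```python
class DetectionModule:
    def __init__(self, send_sync_request, hold_seconds=3.0, tolerance=0.0):
        self._send = send_sync_request   # e.g. MarkingModule.request_sync
        self._hold = hold_seconds        # predetermined time from the text
        self._tolerance = tolerance      # assumed allowance, in seconds
        self._mismatch_since = None      # when the current mismatch began

    def check(self, audio_ts, video_ts, now):
        """Compare timestamps; return True if a request was sent."""
        if abs(audio_ts - video_ts) <= self._tolerance:
            # Timestamps agree again, so any mismatch window is discarded.
            self._mismatch_since = None
            return False
        if self._mismatch_since is None:
            self._mismatch_since = now
        if now - self._mismatch_since >= self._hold:
            self._send()
            # Start a fresh window so requests are not sent on every frame.
            self._mismatch_since = None
            return True
        return False


requests = []
detector = DetectionModule(lambda: requests.append("sync"))
results = [
    detector.check(1.0, 1.5, now=0.0),  # mismatch begins
    detector.check(2.0, 2.5, now=2.0),  # still within the 3-second window
    detector.check(3.0, 3.5, now=3.0),  # mismatch lasted 3 seconds
]
```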

The playback module outputs the decoded audio signal to be played and video signal to be played in order to achieve playback. When the output of the two signals is out of sync, the playback module outputs them synchronously according to the synchronization mark. Specifically, the playback module stores the audio signal to be played or the video signal to be played that carries the synchronization mark in an output buffer and, after receiving the video signal to be played or the audio signal to be played that carries the synchronization mark, controls the two marked signals to be output synchronously. Because the marking module stamps the synchronization mark on the audio-video data after receiving a synchronization request, the audio signal to be played and the video signal to be played that carry the mark come from the same audio-video data and need to be output together. If the playback module first receives the marked audio signal to be played, the audio output is running ahead of the video output; the audio signal is therefore held in the output buffer until the marked video signal to be played arrives, and the two marked signals are then output synchronously. If the playback module first receives the marked video signal to be played, the video output is running ahead of the audio output; the video signal is therefore held in the output buffer until the marked audio signal to be played arrives, and the two marked signals are then output synchronously. Audio and video are synchronized in this way.
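The buffering behaviour of the playback module can be sketched as follows. This is an illustrative model only: the class and method names, the "audio"/"video" kind labels and the output callback are hypothetical, and the sketch handles only the first marked signal of each stream.

```python
class PlaybackModule:
    def __init__(self, output):
        self._output = output  # callable(audio_signal, video_signal)
        self._held = None      # (kind, signal) waiting in the output buffer

    def on_marked_signal(self, kind, signal):
        """Handle a signal carrying the synchronization mark.

        kind is "audio" or "video". Returns True once both marked
        signals have been output together.
        """
        if self._held is None:
            # The faster stream arrived first: hold it in the output buffer.
            self._held = (kind, signal)
            return False
        held_kind, held_signal = self._held
        if held_kind == kind:
            # Same stream again; keep waiting for the other stream.
            return False
        self._held = None
        if kind == "audio":
            self._output(signal, held_signal)
        else:
            self._output(held_signal, signal)
        return True


played = []
player = PlaybackModule(lambda audio, video: played.append((audio, video)))
waiting = player.on_marked_signal("audio", "A1")  # audio is ahead: buffered
done = player.on_marked_signal("video", "V1")     # video arrives: output both
```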

Embodiment 3

Corresponding to the above device, the present invention provides an audio and video synchronization method based on the HLS protocol. Referring to Fig. 4, the method comprises:

After receiving a synchronization request, adding a synchronization mark to the first audio-video data obtained;

Demultiplexing the audio-video data into audio data and video data, and updating the timestamps of the audio data and/or video data according to the synchronization mark;

Decoding the audio data and video data, and outputting an audio signal to be played and a video signal to be played;

Detecting whether the timestamps carried in the audio signal to be played and the video signal to be played are consistent, and sending a synchronization request when they are inconsistent;

Controlling the audio signal to be played and the video signal to be played to be output synchronously according to the synchronization mark.

Further, the method also comprises:

Sending a synchronization request if the timestamps carried in the audio signal to be played and the video signal to be played remain inconsistent throughout a predetermined time, so that frequent requests do not overload the marking module. Preferably, the predetermined time is 3 seconds.

Further, the process of updating the timestamps of the audio data and/or video data comprises: correcting the timestamp of the video data according to the timestamp of the audio data.

Further, the process of updating the timestamps of the audio data and/or video data comprises: correcting the timestamps of the audio data and the video data according to the timestamp of the audio-video data.

Specifically, in an exemplary implementation of the present invention, there are two modes for synchronously adjusting the timestamps of the audio data and video data. One mode corrects the timestamp of the video data according to the timestamp of the audio data: the audio timestamp is taken as the reference, and the video timestamp is corrected to be consistent with it. In the other mode, when the timestamp of the audio data differs greatly from the timestamp of the audio-video data, the timestamps of the audio data and the video data are corrected according to the timestamp of the audio-video data; that is, both are corrected to the timestamp of the audio-video data, so that they remain relatively accurate.

Further, the process of controlling the audio signal to be played and the video signal to be played to be output synchronously according to the synchronization mark comprises: storing the audio signal to be played or the video signal to be played that carries the synchronization mark in an output buffer, and, after the video signal to be played or the audio signal to be played that carries the synchronization mark is received, controlling the audio signal to be played and the video signal to be played that carry the synchronization mark to be output synchronously.

Compared with the prior art, the present invention detects, after decoding and before playback, whether the timestamps carried in the audio signal to be played and the video signal to be played are consistent, and thereby whether the output of the two signals is synchronized; when the timestamps are inconsistent, a synchronization request is sent. On the one hand, after a synchronization request is received and before demultiplexing, the present invention adds a synchronization mark to the first audio-video data obtained afterwards, and the decoded audio signal to be played and video signal to be played are controlled to be output synchronously according to this mark, so that the audio and video output is synchronized. On the other hand, the timestamps of the audio data and/or video data are updated according to the synchronization mark, so that the audio and video timestamps are synchronized, which further ensures that subsequent output stays synchronized.

The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. An audio and video synchronization device based on the HLS protocol, characterized in that it comprises a marking module, a demultiplexing module, a decoding module, a detection module and a playback module arranged in sequence, wherein:

2. The device according to claim 1, characterized in that

the detection module is further configured to send a synchronization request to the marking module if the timestamps carried in the audio signal to be played and the video signal to be played remain inconsistent throughout a predetermined time.

3. The device according to claim 1, characterized in that

the process by which the demultiplexing module updates the timestamps of the audio data and/or video data comprises: the demultiplexing module corrects the timestamp of the video data according to the timestamp of the audio data.

4. The device according to claim 1, characterized in that

the process by which the demultiplexing module updates the timestamps of the audio data and/or video data comprises: the demultiplexing module corrects the timestamps of the audio data and the video data according to the timestamp of the audio-video data.

5. The device according to claim 1, characterized in that

the process by which the playback module controls the audio signal to be played and the video signal to be played to be output synchronously according to the synchronization mark comprises: storing the audio signal to be played or the video signal to be played that carries the synchronization mark in an output buffer, and, after the video signal to be played or the audio signal to be played that carries the synchronization mark is received, controlling the audio signal to be played and the video signal to be played that carry the synchronization mark to be output synchronously.

6. An audio and video synchronization method based on the HLS protocol, characterized in that the method comprises:

7. The method according to claim 6, characterized in that the method further comprises:

8. The method according to claim 6, characterized in that

the process of updating the timestamps of the audio data and/or video data comprises: correcting the timestamp of the video data according to the timestamp of the audio data.

9. The method according to claim 6, characterized in that

the process of updating the timestamps of the audio data and/or video data comprises: correcting the timestamps of the audio data and the video data according to the timestamp of the audio-video data.

10. The method according to claim 6, characterized in that

the process of controlling the audio signal to be played and the video signal to be played to be output synchronously according to the synchronization mark comprises: storing the audio signal to be played or the video signal to be played that carries the synchronization mark in an output buffer, and, after the video signal to be played or the audio signal to be played that carries the synchronization mark is received, controlling the audio signal to be played and the video signal to be played that carry the synchronization mark to be output synchronously.