CN103888815A

CN103888815A - Method and system for real-time separation treatment and synchronization of audio and video streams

Info

Publication number: CN103888815A
Application number: CN201410093765.4A
Authority: CN
Inventors: 徐永键; 徐广健; 林澍霖; 阮俊杰; 谭洪舟
Original assignee: SYSUNG ELECTRONICS AND TELECOMM RESEARCH INSTITUTE
Current assignee: SYSU HUADU INDUSTRIAL SCIENCE AND TECHNOLOGY INSTITUTE; Sun Yat Sen University
Priority date: 2014-03-13
Filing date: 2014-03-13
Publication date: 2014-06-25
Anticipated expiration: 2034-03-13
Also published as: CN103888815B

Abstract

The invention discloses a method for real-time separation processing and synchronization of audio and video streams. In the method, a host computer obtains and demultiplexes a media source file to obtain a video stream and an audio stream; the host computer plays the video; the audio stream undergoes format conversion and compression encoding and is then transmitted to a slave computer; the slave computer decodes and filters the audio stream and plays the audio; initial synchronization is performed when video and audio playback start, and synchronization correction is performed during playback. Compared with the prior art, the audio and video streams are processed separately, less network bandwidth is occupied when the audio stream is transmitted to the slave computer, and the host computer and the slave computer play the video and the audio, respectively, in accurate synchronization. The invention further discloses a system for real-time separation processing and synchronization of audio and video streams.

Description

Method and system for real-time separation processing and synchronization of audio and video streams

Technical field

The present invention relates to the technical field of wireless audio and video processing, and more specifically to a method and system for real-time separation processing and synchronization of audio and video streams.

Background technology

With the rapid development of wireless network technology and the rapid upgrading of mobile wireless terminal hardware, many new technologies and new applications have become possible. The expansion of network technology and the improved performance of hardware devices have led people to expect more from the multimedia audio-visual experience.

Most existing audio-video systems fall into two main types: (1) a smart device terminal is responsible for both audio/video decoding and playback; (2) a smart terminal is combined with high-fidelity audio equipment to provide audio and video playback.

In the first type of system, decoding and playback are both completed on the smart terminal, which ensures good real-time playback. Such systems are generally installed on smartphones or tablets; some also provide a karaoke feature that records the user, and allow sound-effect parameters to be adjusted in real time. However, ordinary smart terminals do not carry loudspeakers capable of high sound quality, and the integrated speakers can hardly meet the requirements of audio enthusiasts. Such audio-video systems therefore have significant application limitations.

The second type of system differs considerably from the first. Audio and video playback are separated: the video is decoded and played on the smart terminal, while the audio data is decoded and the resulting audio stream data is sent to high-fidelity audio equipment for playback. Likewise, some of these systems also offer a karaoke singing feature. This meets the demand for high sound quality, but such systems generally extract the audio data in real time and send it to the audio equipment for real-time playback, which makes it difficult to guarantee accurate audio-video synchronization between the host computer and the slave computer.

Summary of the invention

The technical problem to be solved by the present invention is to provide a method and system for real-time separation processing and synchronization of audio and video streams that occupy less network bandwidth when data is transmitted between the host computer and the slave computer, and that ensure accurate synchronization of audio and video playback between the host computer and the slave computer.

To solve the above technical problem, the technical solution adopted by the present invention is to provide a method for real-time separation processing and synchronization of audio and video streams, comprising:

S1: a host computer obtains and demultiplexes a media source file to obtain a video stream and an audio stream;

S2: the host computer performs video decoding and video filtering on the video stream to obtain video playback data, and plays the video playback data;

S3: the host computer performs format conversion and compression encoding on the audio stream and then transmits it to a slave computer;

S4: the slave computer calls a decoder to decode and filter the audio stream to obtain audio playback data, and plays the audio playback data;

S5: initial synchronization is performed when playback of the video playback data and the audio playback data starts, and synchronization correction is performed during playback of the video playback data and the audio playback data.

Compared with the prior art, in the method of the present invention the host computer demultiplexes the media source file to obtain a video stream and an audio stream, plays the data corresponding to the video stream itself, and transmits the audio stream to the slave computer for playback, thereby processing the audio and video streams separately. Before the audio stream is transmitted to the slave computer, it undergoes format conversion and compression encoding, which greatly reduces the network bandwidth occupied during transmission. In addition, the method performs initial synchronization when playback of the video playback data and the audio playback data starts, and performs synchronization correction during playback, so that the host computer and the slave computer remain accurately synchronized while playing the video and the audio respectively.

Correspondingly, the present invention also provides a system for real-time separation processing and synchronization of audio and video streams, comprising a host computer and a slave computer, the host computer comprising:

A demultiplexing module, for obtaining and demultiplexing a media source file to obtain a video stream and an audio stream;

A video decoding and filtering module, for performing video decoding and video filtering on the video stream to obtain video playback data, and playing the video playback data;

A first format conversion module, for performing format conversion and compression encoding on the audio stream;

A first transmission module, for transmitting the audio stream, after format conversion and compression encoding, to the slave computer;

The slave computer comprises:

An audio decoding and filtering module, for calling a decoder to decode and filter the audio stream to obtain audio playback data, and playing the audio playback data;

Characterized in that the host computer and the slave computer each include:

A synchronization module, for performing initial synchronization when playback of the video playback data and the audio playback data starts, and performing synchronization correction during playback of the video playback data and the audio playback data.

The present invention will become clearer from the following description taken in conjunction with the accompanying drawings, which illustrate embodiments of the invention.

Brief description of the drawings

Fig. 1 is a flow chart of the first embodiment of the method for real-time separation processing and synchronization of audio and video streams of the present invention.

Fig. 2 is a flow chart of the second embodiment of the method for real-time separation processing and synchronization of audio and video streams of the present invention.

Fig. 3 is a schematic diagram corresponding to Fig. 2.

Fig. 4 is a schematic diagram of the initial audio-video synchronization.

Fig. 5 is a schematic diagram of audio-video synchronization correction during playback.

Fig. 6 is a structural block diagram of the first embodiment of the system for real-time separation processing and synchronization of audio and video streams of the present invention.

Fig. 7 is a structural block diagram of the second embodiment of the system for real-time separation processing and synchronization of audio and video streams of the present invention.

Embodiment

Embodiments of the present invention are now described with reference to the accompanying drawings, in which like reference numerals denote like elements.

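Referring to Fig. 1, the present invention provides a method for real-time separation processing and synchronization of audio and video streams, comprising:

S1: the host computer obtains and demultiplexes a media source file to obtain a video stream and an audio stream;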

S2: the host computer performs video decoding and video filtering on the video stream to obtain video playback data, and plays the video playback data;

S3: the host computer performs format conversion and compression encoding on the audio stream and then transmits it to the slave computer;

S4: the slave computer calls a decoder to decode and filter the audio stream to obtain audio playback data, and plays the audio playback data;

S5: initial synchronization is performed when playback of the video playback data and the audio playback data starts, and synchronization correction is performed during playback of the video playback data and the audio playback data.

Referring now to Fig. 2 and Fig. 3, in another embodiment of the present invention, real-time separation processing and synchronization of audio and video streams is implemented in a wireless audio-video system. Note that in this embodiment the hardware mainly comprises a host computer and a slave computer connected over a wireless Wi-Fi network. The host computer is a mobile smart terminal, which may be a mobile phone or tablet running Android, iOS or another system (the operating system is not limited). The slave computer is a professional Hi-Fi device equipped with a professional audio processor. Audio and video decoding are carried out separately on the host computer and the slave computer, and execute asynchronously. The host computer plays the video picture and sends the audio stream through a network transmission module to the slave computer for playback (a synchronization module ensures accurate audio-video synchronization). Meanwhile, the slave computer records the user's voice and sends it back to the host computer, where audio and video are finally merged.

Specifically, this embodiment comprises the following steps:

S201: the host computer obtains and demultiplexes a media source file to obtain a video stream and an audio stream.

S202: the host computer performs video decoding and video filtering on the video stream to obtain video playback data, and plays the video playback data. Specifically, the host computer calls the video decoder that matches the format of the video stream, decodes the video stream obtained by demultiplexing, and outputs and plays the video picture on the mobile terminal.

S203: the host computer performs format conversion and compression encoding on the audio stream and then transmits it to the slave computer. Specifically, the obtained audio stream is converted into a format suitable for streaming and then transmitted to the slave computer over the Wi-Fi wireless network. The transmitted data is not raw PCM audio but compression-encoded audio stream data, which reduces the bandwidth occupied on the Wi-Fi network. If only the audio data for the current playback moment were transmitted in real time, the audio data could not reach the slave computer in time when the network is congested or unstable. The present invention therefore treats audio stream transmission as a best-effort service: as much audio stream data as possible is transmitted to the slave computer in advance and saved to a file.
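To give a feel for the bandwidth saving described in S203, the following sketch compares the bitrate of raw PCM audio with that of a compressed stream. The sample rate, bit depth, channel count and the 128 kbit/s encoder bitrate are illustrative assumptions; the source does not specify any of them.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Bitrate of uncompressed PCM audio in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


# Illustrative values only (assumptions, not taken from the source).
raw_kbps = pcm_bitrate_kbps(44_100, 16, 2)  # CD-quality stereo PCM
compressed_kbps = 128.0                     # assumed target bitrate of the encoder
saving = 1.0 - compressed_kbps / raw_kbps   # fraction of bandwidth saved

print(f"raw PCM:    {raw_kbps:.1f} kbit/s")
print(f"compressed: {compressed_kbps:.1f} kbit/s")
print(f"saved:      {saving:.0%}")
```

Under these assumed figures the compressed stream needs roughly a tenth of the raw PCM bandwidth, which is the effect the embodiment relies on.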

S204: the slave computer calls a decoder to decode and filter the audio stream to obtain audio playback data. Specifically, the slave computer receives the audio stream data transmitted by the host computer and saves it to a file, while calling the decoder for the specific format to decode and filter the audio stream.

S205: voice data is collected through an audio input device; the audio mixing and adjustment module combines the audio playback data obtained after decoding and filtering with the voice data to form new audio playback data, which is played through the audio equipment.
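The source does not say how the audio mixing and adjustment module combines the two signals. As one common approach, and purely as an assumption, the sketch below adds gain-weighted 16-bit PCM samples and clips the result to the valid range; `mix_pcm16` and its gain parameters are hypothetical names.

```python
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def mix_pcm16(music: Sequence[int], voice: Sequence[int],
              music_gain: float = 1.0, voice_gain: float = 1.0) -> List[int]:
    """Mix two 16-bit PCM sample sequences by weighted addition with clipping.

    The shorter input is treated as silence once it runs out.
    """
    length = max(len(music), len(voice))
    mixed: List[int] = []
    for i in range(length):
        m = music[i] if i < len(music) else 0
        v = voice[i] if i < len(voice) else 0
        sample = int(round(m * music_gain + v * voice_gain))
        # Clip to the 16-bit range so overflow does not wrap around.
        mixed.append(max(INT16_MIN, min(INT16_MAX, sample)))
    return mixed
```

A real mixer would typically also resample and align the two inputs; those steps are omitted here.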

S206: initial synchronization is performed when playback of the video playback data and the new audio playback data starts, and synchronization correction is performed during playback of the video playback data and the new audio playback data.

S207: the slave computer performs audio encoding and format conversion on the new audio playback data and then transmits it to the host computer. Specifically, the new audio playback data is audio-encoded at the same time as it is played, converted by the format conversion module into an audio stream suitable for streaming, and transmitted to the host computer in real time over the Wi-Fi network. The transmitted data is encoded audio data rather than raw PCM data, which greatly reduces bandwidth usage.

S208: the host computer receives the new audio playback data and, after playback of the video playback data and the new audio playback data has finished, combines the two into a new media file. Specifically, the host computer receives the audio stream data containing the voice that is returned by the slave computer and saves it to a file. When media playback ends, audio and video are merged in the background, finally producing a self-made MV file.

Specifically, referring to Fig. 4, S206 comprises:

When the video starts to play, the host computer sends a packet to the slave computer at the moment the first video frame is played. The format of this packet is as follows:

Frame number m

System time corresponding to the frame, t1 (ms)
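The two-field packet above could be serialized as in the following sketch. The wire layout (a 32-bit unsigned frame number followed by a 64-bit unsigned millisecond timestamp, in network byte order) and all function names are assumptions made for illustration; the source gives only the field names.

```python
import struct
import time
from typing import Tuple

# Assumed layout: uint32 frame number, uint64 system time in ms, big-endian.
START_PACKET = struct.Struct("!IQ")


def now_ms() -> int:
    """Current monotonic clock reading in milliseconds."""
    return time.monotonic_ns() // 1_000_000


def pack_start_packet(frame_number: int, system_time_ms: int) -> bytes:
    """Serialize the frame number m and its system time t1."""
    return START_PACKET.pack(frame_number, system_time_ms)


def unpack_start_packet(payload: bytes) -> Tuple[int, int]:
    """Parse a packet back into (frame number, system time in ms)."""
    frame_number, system_time_ms = START_PACKET.unpack(payload)
    return frame_number, system_time_ms


# Usage: the host would send this when the first video frame is shown.
payload = pack_start_packet(1, now_ms())
```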

When this packet arrives at the slave computer, the slave computer immediately starts decoding and playing the audio stream and at the same time returns a packet to the host computer in the format shown above, carrying the same frame number that the host computer sent. When the host computer receives the returned packet, it obtains the system time t2 corresponding to that frame and computes the time difference Δt = t2 − t1. To estimate this difference more accurately and to exclude the influence of occasional outliers, the operation is repeated and the samples are averaged: it is carried out for the first 10 video frames, and the 10 resulting values are averaged to obtain the mean time difference Δt′. If Δt′ exceeds a preset threshold (for example 50 ms), the video frame rate is slowed: if the current video frame is p1, that frame is held for a duration of Δt′/2, which achieves initial audio-video synchronization. The frame rate is slowed on the video side, rather than adjusting the audio side, in order to give a better user experience: at the moment of initial synchronization, a slight delay of the picture has very little effect on the user, and when Δt′ is small enough it is not perceptible at all.
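The initial synchronization rule can be sketched as follows: average the measured differences Δt over the first frames, and if the mean Δt′ exceeds the threshold, hold the current video frame for Δt′/2. The function name and the way samples are collected are assumptions; the 10-frame window and the 50 ms example threshold come from the text.

```python
from statistics import mean
from typing import Sequence


def initial_hold_ms(deltas_ms: Sequence[float], threshold_ms: float = 50.0) -> float:
    """Return how long (in ms) to hold the current video frame at start-up.

    deltas_ms holds the measured differences t2 - t1, one per probed frame
    (the text uses the first 10 video frames). Returns 0.0 when the mean
    difference does not exceed the threshold, i.e. no correction is needed.
    """
    if not deltas_ms:
        raise ValueError("at least one time-difference sample is required")
    avg = mean(deltas_ms)  # the mean value written as delta-t-prime in the text
    if avg > threshold_ms:
        return avg / 2.0   # hold the frame for half of the mean difference
    return 0.0
```

For example, ten samples of 60 ms give a hold of 30 ms, while ten samples of 40 ms stay under the threshold and give no hold.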

Once audio and video are guaranteed to start in sync, synchronization correction is applied to them during decoding and playback.

Referring to Fig. 5, during audio/video decoding and playback, a synchronization correction is performed at each time-interval node. To obtain a good user experience, the present invention sets this interval to 10 s (that is, a synchronization correction is performed every 10 s during playback).
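One simple way to trigger the periodic correction is to count frames, as in the sketch below. It assumes a constant frame rate and zero-based frame indices, neither of which is stated in the source; `is_sync_point` is a hypothetical helper.

```python
def is_sync_point(frame_index: int, fps: float, interval_s: float = 10.0) -> bool:
    """Return True when a synchronization correction should start at this frame.

    Assumes a constant frame rate, so one interval spans fps * interval_s frames.
    Frame 0 is excluded because initial synchronization handles the start.
    """
    frames_per_interval = max(1, round(fps * interval_s))
    return frame_index > 0 and frame_index % frames_per_interval == 0
```

At 25 frames per second this fires at frames 250, 500, and so on. A timer based on the system clock would serve equally well.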

The detailed process of synchronization correction is as follows:

While the current video frame is being played, the host computer sends a synchronization correction packet to the slave computer over the Wi-Fi network. The packet contains the current frame number and the current playback time t_video (in ms) calculated from the current frame and the frame rate. The packet format is as follows:

Video frame number n

Current video frame time t_video (ms)
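The playback time carried in this packet follows directly from the frame number and the frame rate. The sketch below assumes a constant frame rate and frame numbers that start at 0; both are assumptions, as the source does not state its numbering convention.

```python
def video_time_ms(frame_number: int, fps: float) -> float:
    """Playback time of a video frame in ms, assuming constant fps and
    frame numbering that starts at 0."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frame_number * 1000.0 / fps
```

For instance, frame 250 of a 25 fps video corresponds to 10 000 ms.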

After receiving the packet, the slave computer calculates the playback time t_audio corresponding to the current audio frame from parameters such as the sample rate of the decoded audio stream, assembles a new packet, and sends it to the host computer. This packet contains the video frame number, the video frame time t_video, and the audio frame time t_audio. The packet format is as follows:

Video frame number n

Video frame time t_video (ms)

Audio frame time t_audio (ms)
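On the slave side, the audio playback time can be derived from the number of samples already played and the sample rate, and then echoed back with the fields received from the host. The sketch below is an illustration under that assumption; `SyncReply`, `audio_time_ms` and `build_reply` are hypothetical names, and the source mentions the sample rate only as one of several parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncReply:
    """Reply packet fields: video frame number, t_video and t_audio (ms)."""
    frame_number: int
    t_video_ms: float
    t_audio_ms: float


def audio_time_ms(samples_played: int, sample_rate_hz: int) -> float:
    """Audio playback position in ms, from samples played per channel."""
    if sample_rate_hz <= 0:
        raise ValueError("sample rate must be positive")
    return samples_played * 1000.0 / sample_rate_hz


def build_reply(frame_number: int, t_video_ms: float,
                samples_played: int, sample_rate_hz: int) -> SyncReply:
    """Echo the host's fields and append the slave's current audio time."""
    return SyncReply(frame_number, t_video_ms,
                     audio_time_ms(samples_played, sample_rate_hz))
```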

The host computer receives this packet, parses it to obtain the relevant information, and calculates the time t_video′ corresponding to the current video frame from the video frame rate. The complete information is as follows:

1. Video frame number n

2. Video frame time t_video

3. Audio frame time t_audio

4. Current video frame time t_video′

From this information, the time Δt_v_ahead_a by which the video frame leads the audio frame can be estimated with the following formula:

\Delta t_{v\_ahead\_a} \approx t_{video} + \frac{t'_{video} - t_{video}}{2} - t_{audio}
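Read as an estimate, the formula takes half of the round trip t_video′ − t_video as the one-way delay, adds it to t_video to approximate the video position at the moment the slave measured t_audio, and subtracts t_audio. The sketch below implements that reading; the symmetric-delay assumption is implied by the division by 2, and the function name is hypothetical.

```python
def video_lead_ms(t_video: float, t_video_now: float, t_audio: float) -> float:
    """Estimate how far the video leads the audio, in ms.

    t_video      video time when the host sent the correction packet
    t_video_now  video time when the host received the reply (t_video')
    t_audio      audio time reported by the slave
    Assumes the network delay is the same in both directions.
    """
    one_way_delay = (t_video_now - t_video) / 2.0
    return t_video + one_way_delay - t_audio


# Worked example with made-up numbers: packet sent at 10 000 ms, reply seen
# at 10 040 ms, slave reported 9 900 ms of audio.
lead = video_lead_ms(10_000.0, 10_040.0, 9_900.0)  # 10000 + 20 - 9900
```

A positive result means the video is ahead of the audio; a negative result means it lags.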

Likewise, the above operation is repeated for 10 consecutive video frames and the samples are averaged to obtain the mean time Δt′_v_ahead_a by which the video frame leads the audio frame. When the absolute value of Δt′_v_ahead_a exceeds a preset threshold (for example 200 ms), audio-video synchronization correction is performed.

Specifically, when Δt′_v_ahead_a > 0, the video is ahead of the audio and the host computer slows the video playback frame rate, for example by holding the current frame for a duration of Δt_v_ahead_a and then resuming normal playback;

When Δt′_v_ahead_a < 0, the video lags behind the audio and the host computer skips frames: the number of frames k to skip is calculated from Δt′_v_ahead_a and the current frame rate, and those k frames are then skipped.
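The correction decision can be sketched as a small planning function. The 200 ms threshold is the example from the text; holding for the averaged lead and rounding k to the nearest whole frame are assumptions, since the source does not say exactly how the hold time or k is computed. `Correction` and `plan_correction` are hypothetical names.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass(frozen=True)
class Correction:
    action: str            # "none", "hold" or "skip"
    hold_ms: float = 0.0   # how long to keep showing the current frame
    skip_frames: int = 0   # number of frames k to skip


def plan_correction(leads_ms: Sequence[float], fps: float,
                    threshold_ms: float = 200.0) -> Correction:
    """Decide how the host should correct drift from sampled video leads."""
    if not leads_ms:
        raise ValueError("at least one lead sample is required")
    avg = mean(leads_ms)
    if abs(avg) <= threshold_ms:
        return Correction("none")
    if avg > 0:
        # Video is ahead: hold the current frame, then resume playback.
        return Correction("hold", hold_ms=avg)
    # Video is behind: skip k frames, k = lag (ms) * fps / 1000, rounded.
    k = round(-avg * fps / 1000.0)
    return Correction("skip", skip_frames=k)
```

With ten samples of −400 ms at 25 fps, the plan is to skip 10 frames; with ten samples of 300 ms, it is to hold the current frame for 300 ms.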

Referring now to Fig. 6, the present invention also provides a system for real-time separation processing and synchronization of audio and video streams, comprising a host computer 100 and a slave computer 200.

Specifically, the host computer 100 comprises:

A demultiplexing module 101, for obtaining and demultiplexing a media source file to obtain a video stream and an audio stream;

A video decoding and filtering module 102, for performing video decoding and video filtering on the video stream to obtain video playback data, and playing the video playback data;

A first format conversion module 103, for performing format conversion and compression encoding on the audio stream;

A first transmission module 104, for transmitting the audio stream, after format conversion and compression encoding, to the slave computer.

Specifically, the slave computer 200 comprises:

An audio decoding and filtering module 201, for calling a decoder to decode and filter the audio stream to obtain audio playback data, and playing the audio playback data.

Note that the host computer 100 and the slave computer 200 each include:

A synchronization module 30, for performing initial synchronization when playback of the video playback data and the audio playback data starts, and performing synchronization correction during playback of the video playback data and the audio playback data.

Compared with the prior art, the system of the present invention comprises a host computer 100 and a slave computer 200. The host computer 100 demultiplexes the media source file to obtain a video stream and an audio stream, plays the data corresponding to the video stream itself, and transmits the audio stream to the slave computer 200 for playback, thereby processing the audio and video streams separately. Before the audio stream is transmitted to the slave computer 200, the first format conversion module 103 performs format conversion and compression encoding on it, which greatly reduces the network bandwidth occupied during transmission. In addition, the host computer 100 and the slave computer 200 each include a synchronization module 30, which performs initial synchronization when playback of the video playback data and the audio playback data starts and performs synchronization correction during playback, so that the host computer and the slave computer remain accurately synchronized while playing the video and the audio respectively.

Specifically, the synchronization module 30 comprises:

An initial synchronization unit 301, for computing, by averaging, the mean value Δt′ of the time difference Δt = t2 − t1 from the frame number m and corresponding system time t1 of the video start frame in the video start frame packet and the frame number and corresponding system time t2 of the audio start frame in the audio start frame packet, and for slowing the frame rate of the video start frame according to the mean value Δt′; and

A synchronization correction unit 302, for estimating the time by which the current video frame leads the current audio frame from the current playback time t_video of the current video frame and the playback time t_audio of the current audio frame corresponding to the current video frame, obtaining by averaging the mean time Δt′_v_ahead_a by which the current video frame leads the current audio frame, and slowing the video playback frame rate or skipping frames according to the mean time Δt′_v_ahead_a.

Since initial synchronization and synchronization correction have been described in detail above, they are not repeated here.

Referring now to Fig. 7, in another embodiment of the system of the present invention, the system also supports functions such as recording and offers wireless karaoke; it is a wireless audio-video system.

Specifically, the system in this embodiment comprises a host computer 100′, a slave computer 200′, a sound collection device 400 and audio equipment 500. The sound collection device 400 collects voice data, and the audio equipment 500 plays the new audio playback data.

Specifically, in addition to the audio decoding and filtering module 201′ of the first embodiment, the slave computer 200′ further comprises:

An audio mixing and adjustment module 202, for combining the audio playback data with the voice data to form new audio playback data;

A second format conversion module 203, for performing audio encoding and format conversion on the new audio playback data;

A second transmission module 204, for transmitting the new audio playback data, after audio encoding and format conversion, to the host computer.

Specifically, in addition to the demultiplexing module 101′, the video decoding and filtering module 102′, the first format conversion module 103′, the first transmission module 104′ and the synchronization module 30′ of the first embodiment, the host computer 100′ further comprises:

An audio/video stream merging module 105, for receiving the new audio playback data and, after playback of the video playback data and the new audio playback data has finished, combining the two into a new media file;

A user interaction module 106, for implementing interaction between the system and the user.

Note that the first transmission module 104′ and the second transmission module 204 are both Wi-Fi modules, and the host computer 100′ is a smart mobile terminal device.

As can be seen from the above description, the method and system for real-time separation processing and synchronization of audio and video streams of the present invention process the audio and video streams separately, greatly reduce the network bandwidth occupied when transmitting data, and ensure accurate synchronization while the host computer and the slave computer play the video and the audio respectively.

The invention has been described above in conjunction with the preferred embodiments, but it is not limited to the embodiments disclosed above and is intended to cover the various modifications and equivalent combinations made in accordance with the essence of the invention.

Claims

1. A method for real-time separation processing and synchronization of audio and video streams, characterized by comprising:

2. The method of claim 1, characterized in that performing initial synchronization when playback of the video playback data and the audio playback data starts specifically comprises:

(21) when the first frame of the video playback data is played, the host computer sends a video start frame packet to the slave computer, the video start frame packet comprising the frame number m and the corresponding system time t1 of the video start frame;

(22) after receiving the video start frame packet, the slave computer returns an audio start frame packet to the host computer, the audio start frame packet comprising the frame number and the corresponding system time t2 of the audio start frame, this frame number being identical to the frame number of the video start frame;

(23) computing, by averaging, the mean value Δt′ of the time difference Δt = t2 − t1;

(24) judging whether the mean value Δt′ exceeds a preset threshold, and slowing the frame rate of the video start frame according to the result.

3. The method of claim 1 or 2, characterized in that performing synchronization correction during playback of the video playback data and the audio playback data specifically comprises:

(31) when the current video frame is played, the host computer sends a synchronization correction packet to the slave computer, the synchronization correction packet comprising the frame number of the current video frame and the current playback time t_video calculated from the current video frame and the frame rate;

(32) the slave computer receives the synchronization correction packet, calculates the playback time t_audio corresponding to the current audio frame from the sample rate of the decoded audio stream, and assembles a new packet and sends it to the host computer, the new packet comprising the video frame number of the current video frame, the current playback time t_video of the video frame, and the playback time t_audio of the current audio frame;

(33) the host computer receives the new packet and calculates the time t_video′ corresponding to the current video frame from the video frame rate;

(34) estimating the time by which the current video frame leads the current audio frame according to formula (1): Δt_v_ahead_a ≈ t_video + (t_video′ − t_video)/2 − t_audio;

(35) obtaining, by averaging, the mean time Δt′_v_ahead_a by which the current video frame leads the current audio frame;

(36) judging whether the absolute value of Δt′_v_ahead_a exceeds a preset threshold;

(37) when the absolute value of Δt′_v_ahead_a exceeds the preset threshold and Δt′_v_ahead_a > 0, the host computer slows the video playback frame rate;

(38) when the absolute value of Δt′_v_ahead_a exceeds the preset threshold and Δt′_v_ahead_a < 0, the host computer skips frames.

4. The method of claim 3, characterized in that, after S4, it further comprises:

S6: collecting voice data through an audio input device, combining the audio playback data with the voice data by means of an audio mixing and adjustment module to form new audio playback data, and playing the new audio playback data through audio equipment;

S7: the slave computer performs audio encoding and format conversion on the new audio playback data and then transmits it to the host computer;

S8: the host computer receives the new audio playback data and, after playback of the video playback data and the new audio playback data has finished, combines the two into a new media file.

5. A system for real-time separation processing and synchronization of audio and video streams, comprising a host computer and a slave computer, the host computer comprising:

The slave computer comprises:

6. The system of claim 5, characterized in that the synchronization module comprises:

An initial synchronization unit, for computing, by averaging, the mean value Δt′ of the time difference Δt = t2 − t1 from the frame number m and corresponding system time t1 of the video start frame in the video start frame packet and the frame number and corresponding system time t2 of the audio start frame in the audio start frame packet, and for slowing the frame rate of the video start frame according to the mean value Δt′; and

A synchronization correction unit, for estimating the time by which the current video frame leads the current audio frame from the current playback time t_video of the current video frame and the playback time t_audio of the current audio frame corresponding to the current video frame, obtaining by averaging the mean time Δt′_v_ahead_a by which the current video frame leads the current audio frame, and slowing the video playback frame rate or skipping frames according to the mean time Δt′_v_ahead_a.

7. The system of claim 5 or 6, characterized in that the system further comprises:

A sound collection device, for collecting voice data;

Audio equipment, for playing new audio playback data;

The slave computer further comprises:

An audio mixing and adjustment module, for combining the audio playback data with the voice data to form new audio playback data;

A second format conversion module, for performing audio encoding and format conversion on the new audio playback data;

A second transmission module, for transmitting the new audio playback data, after audio encoding and format conversion, to the host computer;

The host computer further comprises:

An audio/video stream merging module, for receiving the new audio playback data and, after playback of the video playback data and the new audio playback data has finished, combining the two into a new media file.

8. The system of claim 7, characterized in that the first transmission module and the second transmission module are Wi-Fi modules.

9. The system of claim 7, characterized in that the host computer further comprises:

A user interaction module, for implementing interaction between the system and the user.

10. The system of claim 5, characterized in that the host computer is a smart mobile terminal device.