CN109194974A

CN109194974A - Low-latency media communication method and system for internet live video broadcasting

Info

Publication number: CN109194974A
Application number: CN201811136034.8A
Authority: CN
Inventors: 杨罡
Original assignee: Beijing Beidou Fangyuan Electronic Technology Co Ltd
Current assignee: Beijing net Hi Tech Co., Ltd
Priority date: 2018-09-28
Filing date: 2018-09-28
Publication date: 2019-01-11
Anticipated expiration: 2038-09-28
Also published as: CN109194974B

Abstract

The present invention provides a low-latency media communication method and system for internet live video broadcasting. The method comprises: judging whether the degree of scene change between the acquired current video frame image and the previous video frame image exceeds a preset scene change threshold; packetizing the current video frame and the acquired corresponding audio frame separately, and adding a corresponding scene marker to each video data packet and audio data packet according to the result of that judgment; buffering the video data packets and audio data packets, adjusting their sending priority according to the scene markers, and sending them according to the sending priority; and, at the receiver, buffering the received video data packets and audio data packets according to the scene markers so as to play the audio and video data. The method can adjust the priority of audio and video transmission so that important data is sent first when network communication is delayed or unstable.

Description

Low-latency media communication method and system for internet live video broadcasting

Technical field

The present invention relates to the field of network communication technology, and in particular to a low-latency media communication method and system for internet live video broadcasting.

Background art

The global internet has expanded rapidly since the 1990s and has become an important information infrastructure driving economic development and social progress. Since the rise of the internet, many applications that use it directly or indirectly have been developed, making daily life more convenient. Video conferencing and live network broadcasting are communication modes built on the internet. They allow participants who are far apart to exchange audio and video, so that each participant can see the images shown and hear the sounds made by the others, which greatly eases the exchange of information between participants.

For example, when live video is broadcast over the internet, the broadcaster can send audio and video containing the broadcast content to viewers. Viewers need not go to a specific place, such as a performance venue or a match venue, to watch the performance or match; they can send and receive audio and video information from any location with a network connection. Before the sender (the broadcaster) sends the audio and video to the receiver (the viewer), the audio and video are first encoded into individual video frames and audio frames, which are then transmitted in sequence over the network to the receiver. The receiver decodes them to recover the audio and video and plays it.

During such a live broadcast, communication between the broadcaster and the viewer may be disturbed by slow network transmission, an unstable network or similar causes. The signal is then delayed and the picture and sound become discontinuous or out of sync, so the viewer cannot watch the live content normally while the signal is delayed. This has a considerable negative effect on the live video broadcast.

Summary of the invention

(1) Object of the invention

To overcome at least one of the above defects of the prior art, and to remedy as far as possible the situation in which a viewer cannot watch and listen to live internet video content because of poor network conditions, the present invention provides the following technical solution.

(2) Technical solution

As a first aspect, the present invention discloses a low-latency media communication method for internet live video broadcasting, comprising:

judging whether the degree of scene change between the acquired current video frame image and the previous video frame image exceeds a preset scene change threshold;

packetizing the current video frame and the acquired corresponding audio frame separately, and adding a corresponding scene marker to each video data packet and audio data packet according to the result of judging whether the degree of scene change exceeds the preset scene change threshold;

buffering the video data packets and the audio data packets, adjusting the sending priority of the video data packets and the audio data packets according to the scene markers, and sending the video data packets and the audio data packets according to the sending priority;

buffering, at the receiver, the received video data packets and audio data packets according to the scene markers, so as to play the audio and video data.

In one possible embodiment, the acquired video frames and audio frames are obtained by encoding audio and video recorded in real time.

In one possible embodiment, whether the degree of scene change exceeds the preset scene change threshold is judged by the frame difference method and/or the background subtraction method.

In one possible embodiment, the buffer size of the receiver is variable and is adjusted according to the scene markers.

In one possible embodiment, before the current video frame and the acquired corresponding audio frame are packetized separately, the method further comprises: adding a corresponding time identifier to each video frame and each audio frame; and,

after the receiver buffers the received video data packets and audio data packets according to the scene markers, the method further comprises: limiting, according to the time identifier carried by each data packet and a preset duration threshold, how far playback of the data with the higher sending priority may run ahead of playback of the data with the lower sending priority.

In one possible embodiment, the duration threshold is variable and is adjusted according to the scene markers.

As a second aspect, the present invention discloses a low-latency media communication system for internet live video broadcasting, comprising:

a scene judgment module, for judging whether the degree of scene change between the acquired current video frame image and the previous video frame image exceeds a preset scene change threshold;

a packetization module, for packetizing the current video frame and the acquired corresponding audio frame separately;

a scene marking module, for adding, after the packetization, a corresponding scene marker to each video data packet and audio data packet according to the result of judging whether the degree of scene change exceeds the preset scene change threshold;

a sending buffer module, for buffering the video data packets and the audio data packets after the corresponding scene markers have been added, adjusting the sending priority of the video data packets and the audio data packets according to the scene markers, and sending the video data packets and the audio data packets according to the sending priority;

a receiving buffer module, for buffering the received video data packets and audio data packets according to the scene markers, so as to play the audio and video data.

In one possible embodiment, the system further comprises:

a recording and encoding module, for recording audio and video in real time and encoding it to obtain the acquired video frames and the acquired audio frames.

In one possible embodiment, the scene judgment module comprises a first judging unit and/or a second judging unit. The first judging unit judges by the frame difference method whether the degree of scene change exceeds the preset scene change threshold; the second judging unit judges by the background subtraction method whether the degree of scene change exceeds the preset scene change threshold.

In one possible embodiment, the sending buffer module comprises a first adjusting unit, for adjusting the buffer size of the sending buffer module according to the scene markers.

In one possible embodiment, the system further comprises:

a time identifier module, for adding a corresponding time identifier to each video frame and each audio frame before the packetization;

a time difference adjustment module, for limiting, after the receiving buffer module has buffered the data packets and according to the time identifier carried by each data packet and a preset duration threshold, how far playback of the data with the higher sending priority may run ahead of playback of the data with the lower sending priority.

In one possible embodiment, the duration threshold is variable, and the time difference adjustment module adjusts the duration threshold according to the scene markers.

(3) Beneficial effects

The low-latency media communication method and system for internet live video broadcasting provided by the present invention judge the video content in the audio and video, containing the live content, that the broadcaster sends out, and thereby adjust the priority of video transmission and audio transmission. Important data is sent first when the live broadcast is delayed or unstable, which reduces the disturbance caused by network delay and ensures, as far as possible, that the receiver experiences no interruption and does not miss the key content of the broadcast. In addition, the time identifiers added to the audio and video data limit how long the data packets that are sent later may be delayed, which prevents low-priority data packets from falling too far behind.

Brief description of the drawings

The embodiments described below with reference to the drawings are exemplary. They are intended to explain and illustrate the present invention and are not to be construed as limiting its scope of protection.

Fig. 1 is a schematic flowchart of a first embodiment of the low-latency media communication method for internet live video broadcasting provided by the present invention.

Fig. 2 is a schematic flowchart of a second embodiment of the low-latency media communication method for internet live video broadcasting provided by the present invention.

Fig. 3 is a structural block diagram of a first embodiment of the low-latency media communication system for internet live video broadcasting provided by the present invention.

Fig. 4 is a structural block diagram of a second embodiment of the low-latency media communication system for internet live video broadcasting provided by the present invention.

Detailed description of the embodiments

To make the purpose, technical solution and advantages of the present invention clearer, the technical solution in the embodiments of the present invention is described in further detail below with reference to the drawings of the embodiments.

It should be understood that, throughout the drawings, the same or similar reference numerals denote the same or similar elements or elements with the same or similar functions. The described embodiments are some, not all, of the embodiments of the present invention, and, where no conflict arises, the embodiments of the present application and the features in them may be combined with one another. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

Herein, "first", "second" and the like serve only to distinguish items from one another and do not indicate their importance, order or the like.

The division into modules, units or components herein is only a division by logical function; other divisions are possible in an actual implementation, for example several modules and/or units may be combined or integrated into another system. Modules, units and components described as separate parts may or may not be physically separate. A component shown as a unit may or may not be a physical unit; that is, it may be located in one specific place or distributed over network units. Some or all of the units may therefore be selected according to actual needs to implement the solution of an embodiment.

The first embodiment of the low-latency media communication method for internet live video broadcasting provided by the present invention is described in detail below with reference to Fig. 1. This embodiment is mainly applied to internet live video broadcasting. It judges the video content in the audio and video, containing the live content, that the sender sends out, and thereby adjusts the priority of video transmission and audio transmission, so that important data is sent first when network communication is delayed or unstable. This reduces the disturbance caused by network delay and ensures, as far as possible, that the receiver experiences no interruption and does not miss the key content of the live broadcast.

As shown in Fig. 1, the low-latency media communication method provided by this embodiment comprises the following steps:

Step 100: judge whether the degree of scene change between the acquired current video frame image and the previous video frame image exceeds a preset scene change threshold.

A video frame is a single still image, and multiple video frames combined form a segment of video; the same applies to audio frames. While the audio and video are being encoded into video frames and audio frames, a scene judgment mechanism can be applied to the video frames. Specifically, the image of the video frame Vn that currently needs to be sent to the receiver and the image of the previous video frame Vn-1 are analysed to obtain the degree of scene change Sn between the two adjacent frames, and it is judged whether Sn exceeds the preset scene change threshold St.

If the degree of scene change Sn exceeds St, video frame Vn and video frame Vn-1 differ significantly in scene. For example, if the picture of video frame Vn shows the audience below the stage at a concert and the picture of video frame Vn-1 shows the singer singing on the stage, the degree of scene change between the two frames exceeds the scene change threshold St and is judged to be a significant change.

If the degree of scene change Sn does not exceed St, video frame Vn and video frame Vn-1 do not differ significantly in scene. For example, if the pictures of video frame Vn and video frame Vn-1 both show the singer singing on the stage, the degree of scene change between the two frames does not exceed the scene change threshold St and is judged not to be a significant change.

It should be noted that a segment of audio and video contains many video frames and audio frames. Apart from the first video frame and audio frame, every subsequent video frame must undergo the above scene change comparison with the frame before it.

It should also be noted that the current video frame image need not be compared with the immediately preceding video frame image. It may instead be compared with the N-th preceding video frame image, N > 1, for example the second or third preceding frame, which amounts to a frame-skipping comparison. The specific value of N can be set according to the actual situation, but it should not be too large; otherwise a scene that changes and then changes back within a short time may be missed.

Step 200: packetize the current video frame and the acquired corresponding audio frame separately, and add a corresponding scene marker to each video data packet and audio data packet according to the result of judging whether the degree of scene change exceeds the preset scene change threshold.

After it has been judged whether the picture of video frame Vn differs significantly from the picture of video frame Vn-1, video frame Vn and the corresponding audio frame An are packetized separately. When data is transmitted over the internet, it can be split into multiple data packets and sent in batches; the receiver reassembles the packets after receiving them to obtain data that can be played in full.

Packetizing video frame Vn and audio frame An yields multiple video data packets and multiple audio data packets, and a scene marker is added to each of them. The scene marker added depends on whether, before packetization, the degree of scene change exceeded the preset scene change threshold. If the scene of video frame Vn has changed significantly relative to video frame Vn-1, the scene marker added to the data packets of video frame Vn differs from the one added to the data packets of video frame Vn-1. Taking the example above, in which video frame Vn shows the audience below the stage at a concert and video frame Vn-1 shows the singer singing on the stage: if the scene marker added to the data packets of video frame Vn-1 and audio frame An-1 is Sm, then the scene marker added to the data packets of video frame Vn and audio frame An is Sm+1. If the picture of video frame Vn does not differ significantly from the picture of video frame Vn-1, for example both show the singer singing on the stage, then the scene marker added to the data packets of video frame Vn and audio frame An is the same as that of video frame Vn-1 and audio frame An-1, namely Sm.
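The marker assignment described above can be sketched as follows. This is only an illustrative sketch, not the patent's implementation: the `Packet` fields, the `SceneMarker` counter, the `packetize` helper and the 1200-byte chunk size are all assumptions introduced here, and the marker is modelled as an integer that increases by one at each significant scene change (Sm, Sm+1, ...).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    kind: str          # "video" or "audio"
    frame_index: int   # index n of the frame Vn / An the packet belongs to
    seq: int           # position of the packet within its frame
    scene_marker: int  # Sm, Sm+1, ... shared by all packets of one frame pair
    payload: bytes


class SceneMarker:
    """Hypothetical counter: the marker advances only on a significant change."""

    def __init__(self, start: int = 0) -> None:
        self.current = start

    def next_marker(self, scene_changed: bool) -> int:
        if scene_changed:
            self.current += 1
        return self.current


def packetize(kind: str, frame_index: int, data: bytes,
              scene_marker: int, chunk_size: int = 1200) -> List[Packet]:
    """Split one encoded frame into packets that all carry the same marker."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    return [Packet(kind, frame_index, seq, scene_marker, chunk)
            for seq, chunk in enumerate(chunks)]
```

In use, the sender would call `next_marker` once per frame pair with the result of step 100 and pass the returned value to `packetize` for both Vn and An, so that the video and audio packets of the same frame pair carry the same marker.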

It will be understood that the data packets may be UDP packets or TCP packets. UDP (User Datagram Protocol) is a connectionless protocol that provides a simple, unreliable, transaction-oriented message delivery service and supports network applications that need to transfer data between computers. UDP gives no reliability guarantee for the packets it transmits; it is suited to sending small amounts of data at a time and is relatively fast. TCP (Transmission Control Protocol) is a connection-oriented protocol. It includes a dedicated delivery guarantee mechanism and ensures the order in which data is sent and received, but it consumes system resources and, under the same conditions, is slower than UDP.

Step 300: buffer the video data packets and audio data packets, adjust their sending priority according to the scene markers, and send the video data packets and audio data packets according to the sending priority.

The sender buffers data in a sending buffer pool. After video frame Vn and audio frame An have been packetized separately, the video data packets are placed in the video sending buffer of the sending buffer pool and the audio data packets are placed in its audio sending buffer. The buffer pool holds the data so that it is sent out once a certain amount has accumulated.

After the video data packets and audio data packets have been placed in the sending buffer pool, the scene marker of the data packets of the current video frame is compared with the scene marker of the data packets of the previous video frame to determine whether the adjacent video frames belong to the same scene. Identical scene markers mean that the current and previous video frames are the same scene; different scene markers mean different scenes. A different way of assigning the sending priority is then used for the same-scene case and for the different-scene case.

It will be understood that, because the audio data packets also carry scene markers, whether the scene is the same can equally be determined by checking whether the scene marker of the data packets of the current audio frame matches that of the data packets of the previous audio frame.

When the current video frame and the previous video frame are the same scene, the live picture of the current video frame is essentially the same as that of the previous video frame; for example, both frames show the singer singing on the stage. The main content of the broadcast at that moment is the song the singer is singing (audio) rather than the singer's appearance and the stage background (video), so the audio carries more of the live content. The sending priority of the audio data packets is therefore raised above that of the video data packets. In other words, when the picture of the current frame is judged to be the same scene as the picture of the previous frame, the ratio of video data packets to audio data packets sent by the sending buffer pool is tilted towards audio data packets.

For example, at the initial moment the sending buffer pool sends m data packets per millisecond with an audio-to-video packet ratio of 1:9. After the data packets of the first video frame and the first audio frame have been sent, the data packets of the second video frame and the second audio frame enter the sending buffer pool, and it is detected that the second video frame is the same scene as the first video frame. The audio-to-video packet ratio in the sending buffer pool is then adjusted to 2:8, so that among the same number of packets received by the receiver the share of audio data packets is somewhat larger.

If the network is in good condition, the receiver still receives the video data packets and audio data packets in step even though the share of audio data packets being sent has increased. If the network suddenly deteriorates, for example it becomes slow or unstable, the higher share of audio data packets being sent means that the share of audio data packets received by the receiver (the viewer) is also higher, so the audio data finishes buffering and is played before the video data. In the receiver's live concert window the picture may stutter, but the singer's song stays as continuous as possible, which ensures as far as possible that the receiver experiences no interruption and does not miss the main content of the concert.

When the current video frame and the previous video frame are different scenes, the live picture of the current video frame differs considerably from that of the previous video frame; for example, the picture changes from the audience below the stage in the previous frame to the singer singing on the stage in the current frame. The singer may be giving a striking performance while singing, so the image information may be more important than, or as important as, the sound information. The sending priority of the video data packets is therefore set higher than or equal to that of the audio data packets. In other words, when the picture of the current frame is judged to be a different scene from the picture of the previous frame, the ratio of video data packets to audio data packets sent by the sending buffer pool is tilted towards video data packets, or the ratio is restored to the initial ratio.

For example, after the data packets of the n-th video frame and the n-th audio frame have been sent, the data packets of the (n+1)-th video frame and the (n+1)-th audio frame enter the sending buffer pool, and it is detected that the scene of the (n+1)-th video frame differs from that of the n-th video frame. The audio-to-video packet ratio in the sending buffer pool is then adjusted from the same-scene ratio of 2:8 to 0.5:9.5, so that among the same number of packets received by the receiver the share of video data packets is somewhat larger. Alternatively, the audio-to-video packet ratio is restored to the initial ratio of 1:9, so that the two are sent in balance.
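The ratio selection in the two examples above can be written down as a small sketch. The ratios 1:9, 2:8 and 0.5:9.5 are the example values from the text; the function names, the `favour_video_on_change` switch (choosing between the two alternatives the text allows for a new scene) and the rounding rule are assumptions made for illustration.

```python
from typing import Optional, Tuple

# Audio-to-video packet ratios taken from the examples in the text.
INITIAL_RATIO = (1.0, 9.0)
SAME_SCENE_RATIO = (2.0, 8.0)
NEW_SCENE_RATIO = (0.5, 9.5)


def send_ratio(current_marker: int, previous_marker: Optional[int],
               favour_video_on_change: bool = True) -> Tuple[float, float]:
    """Pick the audio:video ratio from the scene markers of adjacent frames."""
    if previous_marker is None:            # first frame: nothing to compare with
        return INITIAL_RATIO
    if current_marker == previous_marker:  # same scene: tilt towards audio
        return SAME_SCENE_RATIO
    # Different scene: tilt towards video, or fall back to the initial ratio.
    return NEW_SCENE_RATIO if favour_video_on_change else INITIAL_RATIO


def split_budget(total_packets: int, ratio: Tuple[float, float]) -> Tuple[int, int]:
    """Turn a ratio into (audio_packets, video_packets) for one sending interval."""
    audio_w, video_w = ratio
    audio = round(total_packets * audio_w / (audio_w + video_w))
    return audio, total_packets - audio
```

With a budget of 100 packets per interval, the same-scene ratio yields 20 audio and 80 video packets, and the new-scene ratio yields 5 audio and 95 video packets.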

If the network is in good condition, the receiver still receives the video data packets and audio data packets in step even though the share of video data packets being sent is higher than in the initial ratio. If the network is in poor condition, the higher share of video data packets being sent means that the share of video data packets received by the receiver (the viewer) is also higher, so the video data finishes buffering and is played before the audio data. In the receiver's live window the sound may stutter, but the singer's on-site performance stays as continuous as possible, which ensures as far as possible that the receiver experiences no interruption and does not miss the key content of the singer's live performance. When the audio-to-video packet ratio in the sending buffer pool is restored to the initial, normal ratio, the audio and video data packets received by the receiver (the viewer) are also more balanced. Both picture and sound may then stutter, but compared with the prioritised cases the sound and the picture are served more evenly.

As the above shows, the sending priority is in practice the allocation of bandwidth for data transmission.

Throughout the live broadcast, regardless of network conditions, the sending buffer pool always determines from the scene markers whether the scene is the same or different, adjusts the priority accordingly, and then sends the data packets according to that priority.
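Since the sending priority amounts to a bandwidth split, a sending buffer pool with separate audio and video buffers can be sketched as below. This is a simplified illustration under stated assumptions: the class name, the per-interval `budget` parameter and the rule that unused quota is handed to the other stream are not taken from the patent text.

```python
from collections import deque
from typing import Deque, List


class SendBufferPool:
    """Hypothetical sending pool with one audio buffer and one video buffer."""

    def __init__(self) -> None:
        self.audio: Deque[str] = deque()
        self.video: Deque[str] = deque()

    def push(self, kind: str, packet: str) -> None:
        (self.audio if kind == "audio" else self.video).append(packet)

    def drain(self, budget: int, audio_weight: float, video_weight: float) -> List[str]:
        """Send up to `budget` packets, split by the current audio:video ratio."""
        audio_quota = round(budget * audio_weight / (audio_weight + video_weight))
        video_quota = budget - audio_quota
        take_audio = min(audio_quota, len(self.audio))
        take_video = min(video_quota, len(self.video))
        # Assumption: quota left unused by one stream is offered to the other,
        # so the link is not left idle while packets are still waiting.
        spare = budget - take_audio - take_video
        extra_audio = min(spare, len(self.audio) - take_audio)
        take_audio += extra_audio
        spare -= extra_audio
        take_video += min(spare, len(self.video) - take_video)
        sent = [self.audio.popleft() for _ in range(take_audio)]
        sent += [self.video.popleft() for _ in range(take_video)]
        return sent
```

Calling `drain` once per sending interval with the ratio chosen from the scene markers reproduces the behaviour described above: under a 2:8 ratio, two of every ten packets sent are audio packets as long as both buffers hold enough data.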

Step 400: the receiver buffers the received video data packets and audio data packets according to the scene markers, so as to play the audio and video data.

The receiver buffers data in a receiving buffer pool. After the sender (the broadcaster) has sent the audio and video data packets according to the sending priority, the receiver (the viewer) receives them and places them in the receiving buffer pool: audio data packets go into the audio receiving buffer and video data packets into the video receiving buffer. Once a certain amount of audio and video data has been buffered, the data packets are decoded into video frames and audio frames according to the scene marker and other characteristics carried by each packet, and the frames are played one by one, so that the receiver can watch and listen to the concert and the internet live video broadcast is realised.

When the network is in good condition, the audio and video data packets sent by the sender are received quickly. Whatever the sending priority of the packets, the receiver can therefore finish buffering the audio and video in time and play them, so that the picture and sound played at the receiver are continuous.

When the network is in poor condition, the sender sends the audio and video data packets according to the sending priority and the receiver buffers both at the same time. In the same-scene case the sending priority of the audio data packets is higher, so the audio data packets containing the important data are sent first; the receiver finishes buffering the audio data first and plays it first, which reduces the disturbance caused by network delay. In the different-scene case the sending priority of the video data packets is higher, so the video data packets containing the important data are sent first; the receiver finishes buffering the video data first and plays it first, which likewise reduces the disturbance caused by network delay. When the network returns to good condition, the sender continues to assign sending priority according to the identified scene and to send the data packets accordingly. At the receiver, buffering speeds up because the network is good again. The audio and picture will have drifted out of sync while the network was poor, so the receiver can discard the part of the lagging stream that was not played because it had not been buffered, and resume playing both streams together from the playback point of the stream that is ahead.
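The catch-up behaviour after the network recovers can be illustrated with a short sketch. It assumes that every buffered frame carries a timestamp in milliseconds (for example the time identifier mentioned in the summary) and models a frame as a `(timestamp_ms, payload)` tuple; both function names are hypothetical.

```python
from typing import List, Tuple

Frame = Tuple[int, str]  # (timestamp in ms, payload); an assumed representation


def resync_position(audio_pos_ms: int, video_pos_ms: int) -> int:
    """Resume both streams from the playback point of the stream that is ahead."""
    return max(audio_pos_ms, video_pos_ms)


def drop_stale(frames: List[Frame], resume_at_ms: int) -> List[Frame]:
    """Discard the lagging stream's frames that lie before the resume point."""
    return [frame for frame in frames if frame[0] >= resume_at_ms]
```

For example, if audio playback has reached 5000 ms while video is still at 3200 ms, the video frames stamped before 5000 ms are dropped and both streams continue together from 5000 ms.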

In one embodiment, the video frames and audio frames acquired in step 100 are obtained by encoding audio and video recorded in real time.

The video frames and audio frames that serve as the raw data are normally obtained by encoding an audio and video file. The audio and video file is recorded on site in real time by recording equipment such as a camera. When the concert begins, the recording party starts recording the picture and sound of the concert live; the file being recorded is encoded immediately to obtain video frames and audio frames, and the subsequent scene change judgment is carried out during encoding.

When the scene change between two frames is judged in step 100, the judgment of the scene change between the current frame image and the previous frame image is therefore carried out during encoding.

In one embodiment, the encoding mode of the audio and video is determined according to the network state.

The encoding mode includes parameters such as video bit rate, video resolution, video frame rate, audio bit rate and audio sample rate.

The video bit rate is the number of data bits transmitted per unit time, in kbps (kilobits per second). The higher the bit rate, the higher the precision and the larger the data volume, and the closer the encoded file is to the original file. The audio bit rate follows the same principle as the video bit rate, applied to audio.

Resolution is a parameter that measures the amount of data in an image, usually expressed in ppi (pixels per inch). The higher the resolution, the clearer the video and the larger the data volume.

Frame rate is a measure of the number of frames displayed, in FPS (frames per second). The higher the frame rate, the smoother and more lifelike the picture; if the frame rate is too low, the picture stutters.

Sample rate is the number of signal samples taken per second. The higher the sampling frequency, the more sample data is obtained per unit time, the more accurately the signal waveform is represented, and the more faithfully the sound quality and tone of the audio are reproduced; the data volume is also larger.

It will be understood that when the network is in poor condition, the bit rate, resolution, frame rate, sample rate and so on used for encoding can be set relatively low to compress and reduce the data volume. The receiver can then still receive the live content accurately while the demand on network speed is lower, which keeps data transmission working normally. When the network is in good condition, the bit rate, resolution, frame rate, sample rate and so on used for encoding can be set to normal or higher values to improve the viewing experience.
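One way to picture this is a table of encoding profiles selected by measured throughput. The sketch below is purely illustrative: the patent text gives no concrete values, so every number, the three-tier split, the `EncodingProfile` fields and the 25% headroom factor are assumptions, and resolution is expressed here as pixel dimensions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodingProfile:
    video_kbps: int
    width: int
    height: int
    fps: int
    audio_kbps: int
    audio_sample_rate_hz: int


# Hypothetical tiers; none of these values come from the patent text.
LOW = EncodingProfile(500, 640, 360, 15, 32, 22050)
NORMAL = EncodingProfile(1500, 1280, 720, 25, 64, 44100)
HIGH = EncodingProfile(4000, 1920, 1080, 30, 128, 48000)


def choose_profile(measured_kbps: float, headroom: float = 1.25) -> EncodingProfile:
    """Pick the richest profile whose total bit rate fits with some headroom."""
    for profile in (HIGH, NORMAL, LOW):
        needed = headroom * (profile.video_kbps + profile.audio_kbps)
        if measured_kbps >= needed:
            return profile
    return LOW  # poorest network: fall back to the smallest data volume
```

The headroom factor keeps the encoder from filling the link completely, so a brief dip in throughput does not immediately stall transmission.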

In one embodiment, whether the scene change degree exceeds the preset scene change threshold is judged by the frame difference method and/or the background subtraction method.

When judging in step 100 whether there is a scene change, the two video frame images can be judged by the frame difference method or by the background subtraction method. The two methods can also be used together as a two-layer judgment, which further improves the accuracy of the result.

The frame difference method subtracts the pixel values of two images in the video stream that are adjacent or a few frames apart, obtaining the absolute value of the brightness difference between the two images. Whether the scene of the two frames has changed significantly is determined by judging whether this absolute value is greater than a preset threshold.
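A minimal sketch of the frame difference method is given below. It assumes that each frame is a list of rows of 8-bit luminance values and that the scene change degree is the mean absolute difference over all pixels; the specification does not fix either choice, and the function names are hypothetical.

```python
from typing import List

Frame = List[List[int]]  # rows of 8-bit luminance values (assumed representation)


def scene_change_degree(prev: Frame, cur: Frame) -> float:
    """Mean absolute luminance difference between two equally sized frames."""
    total = 0
    count = 0
    for row_prev, row_cur in zip(prev, cur):
        for p, c in zip(row_prev, row_cur):
            total += abs(c - p)
            count += 1
    if count == 0:
        raise ValueError("frames must not be empty")
    return total / count


def scene_changed(prev: Frame, cur: Frame, threshold: float) -> bool:
    """True if the change degree exceeds the preset scene change threshold."""
    return scene_change_degree(prev, cur) > threshold
```

Averaging over the whole frame makes the degree insensitive to small local motion, so a modest threshold flags only cuts or large changes.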

The background subtraction method detects moving objects by comparing the current frame in an image sequence with a reference background model. A difference operation between the currently acquired image frame and the background image yields a grayscale image of the moving target region; the grayscale image is thresholded and its absolute value taken, and whether the scene of the two frames has changed significantly is determined by judging whether this absolute value is greater than a preset threshold. The background image is updated according to the currently acquired image frame.
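The background subtraction method can be sketched in the same style. The running-average background update, the per-pixel threshold and the use of the fraction of foreground pixels as the change measure are assumptions chosen for illustration; the specification only states that the background image is updated from the current frame.

```python
from typing import List, Sequence


class BackgroundModel:
    """Reference background kept as a running average (assumed update rule)."""

    def __init__(self, first_frame: Sequence[Sequence[float]],
                 alpha: float = 0.05, pixel_threshold: float = 25.0):
        self.background: List[List[float]] = [[float(v) for v in row]
                                              for row in first_frame]
        self.alpha = alpha                      # weight of the new frame
        self.pixel_threshold = pixel_threshold  # per-pixel difference limit

    def foreground_ratio(self, frame: Sequence[Sequence[float]]) -> float:
        """Fraction of pixels whose |frame - background| exceeds the limit."""
        changed = 0
        total = 0
        for bg_row, row in zip(self.background, frame):
            for b, v in zip(bg_row, row):
                changed += abs(v - b) > self.pixel_threshold
                total += 1
        return changed / total

    def update(self, frame: Sequence[Sequence[float]]) -> None:
        """Blend the current frame into the background image."""
        a = self.alpha
        self.background = [[(1 - a) * b + a * v for b, v in zip(bg_row, row)]
                           for bg_row, row in zip(self.background, frame)]
```

A caller would compare `foreground_ratio` against the scene change threshold and then call `update` so that the background follows slow changes in the picture.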

In one embodiment, the buffer size of the recipient is variable and is adjusted according to the scene markers.

The recipient includes a receive buffer pool. The sizes of the video receive buffer and the audio receive buffer in the receive buffer pool can be fixed or variable. If they are variable, the basis for the change is the scene markers added to the data packets in step 200.

While the received audio and video data packets are being buffered and decoded, the scene markers they carry can be checked. If the scene markers carried by the data packets are identical, the picture scenes of the two video frames have not changed significantly. The buffer capacity of the receive buffer pool can then be reduced appropriately, which is equivalent to shortening the buffering time of the audio and video data, so that the data already received is played as soon as possible. When network conditions are poor, audio data packets occupy relatively little space, and somewhat lower video picture quality can be tolerated within the same scene; the buffer capacity of the receive buffer pool is therefore reduced appropriately so that the audio content is played as soon as possible, reducing stuttering during the live broadcast.

If the scene markers carried by the data packets are different, the picture scenes of the two video frames have changed significantly. The buffer capacity of the receive buffer pool can then be increased appropriately, which is equivalent to lengthening the buffering time of the audio and video data, so that the data already received is played only after a segment has been buffered. When network conditions are poor, video data packets occupy a relatively large amount of space, so playback should start only after a video segment of some duration has been buffered. Increasing the buffer capacity of the receive buffer pool appropriately prevents playback from starting as soon as, for example, less than one second of playable video has been buffered, and so reduces the sense of stuttering during the live broadcast.
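The buffer adjustment described in the two paragraphs above can be sketched as a single function. The step size, the lower and upper limits, the use of milliseconds as the capacity unit and the name `next_buffer_ms` are all assumptions for this example.

```python
def next_buffer_ms(current_ms: int, prev_marker: int, cur_marker: int,
                   step_ms: int = 100, min_ms: int = 200,
                   max_ms: int = 2000) -> int:
    """Return the new receive-buffer capacity, expressed as buffering time."""
    if cur_marker == prev_marker:
        # Same scene: shrink the buffer so received data plays sooner.
        target = current_ms - step_ms
    else:
        # Scene changed: grow the buffer so a longer video segment is ready.
        target = current_ms + step_ms
    # Keep the capacity within assumed sane limits.
    return max(min_ms, min(max_ms, target))
```

Clamping keeps repeated same-scene frames from shrinking the buffer to nothing and repeated scene changes from growing it without bound.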

In one embodiment, after the sender sends a data packet, it waits for the confirmation signal fed back by the recipient and deletes the corresponding data packet from its buffer after receiving the confirmation signal. If the sender does not receive the confirmation signal within a set time, it automatically resends the data packet.

After the send buffer pool of the sender sends a data packet, the recipient, on receiving it, feeds back a confirmation signal to the sender to show that the packet has been received, and the sender can then delete the packet from the send buffer pool. If the packet is lost, however, the sender determines that it has been lost when no confirmation signal arrives within the set time, and automatically resends it so that the recipient can try to receive it again.
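The confirmation and resend behaviour can be sketched as bookkeeping on the sender side. The class `SendBuffer`, its method names and the use of sequence numbers are hypothetical; the sketch only tracks which packets are unconfirmed and which are due for resending, and leaves the actual network transmission to the caller.

```python
from typing import Dict, List, Tuple


class SendBuffer:
    def __init__(self, timeout: float):
        self.timeout = timeout  # the "set time" to wait for a confirmation
        # seq -> (payload, time of the most recent transmission)
        self.pending: Dict[int, Tuple[bytes, float]] = {}

    def send(self, seq: int, payload: bytes, now: float) -> None:
        """Record a packet as sent and awaiting confirmation."""
        self.pending[seq] = (payload, now)

    def ack(self, seq: int) -> None:
        """Confirmation received: delete the packet from the buffer."""
        self.pending.pop(seq, None)

    def due_for_resend(self, now: float) -> List[int]:
        """Return packets whose confirmation timed out and restart their timers."""
        due = [seq for seq, (_, sent_at) in self.pending.items()
               if now - sent_at > self.timeout]
        for seq in due:
            payload, _ = self.pending[seq]
            self.pending[seq] = (payload, now)
        return sorted(due)
```

Passing `now` explicitly keeps the sketch deterministic; a real sender would read a monotonic clock.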

A second embodiment of the media low-latency communication method for live internet video streaming provided by the present invention is described in detail below with reference to Fig. 2. This embodiment can judge the video content in the audio and video sent by the sender, which contains the live broadcast content, so as to adjust the priority of video transmission and audio transmission. Important data is then sent first when network communication is delayed or unstable, which reduces the interference caused by network delay and ensures to the greatest extent that the recipient is not interrupted and does not miss the key content of the live broadcast. At the same time, time identifiers added to the audio and video data limit how long a deferred data packet can be postponed, preventing low-priority data packets from falling too far behind.

As shown in Fig. 2, the media low-latency communication method provided in this embodiment includes the following steps:

Step 100, judging whether the scene change degree between the acquired current video frame image and the previous video frame image exceeds the preset scene change threshold, and adding a corresponding time identifier to each video frame and audio frame.

While the audio and video are being encoded to obtain video frames and audio frames, a scene judgment mechanism can be applied to the video frames. Specifically, the images of the video frame Vn currently to be sent to the recipient and of the previous video frame Vn-1 are analyzed to obtain the scene change degree Sn of the two adjacent frames, and it is judged whether Sn exceeds the preset scene change threshold St. If Sn exceeds St, video frames Vn and Vn-1 differ significantly in scene. If Sn does not exceed St, video frames Vn and Vn-1 do not differ significantly in scene.

It should be noted that the current video frame image need not be compared only with the previous video frame image. It can also be compared with the N-th preceding video frame image, N > 1, that is, a frame-skipping comparison with the second preceding frame, the third preceding frame, and so on. The specific value of N can be set according to the actual situation, but it should not be too large; otherwise a case in which the scene changes and then changes back within a short period may be missed.

Since each video frame is acquired at a different time, a time identifier is added to each video frame according to the acquisition time, a clock signal, or the like, and the moment contained in the time identifier differs from one video frame to the next. Each audio frame has the same time identifier as its synchronized video frame, so the moments contained in the time identifiers of the audio frames also differ from one another. Specifically, the time identifier can be a timestamp. A timestamp is complete, verifiable data showing that a piece of data existed before a particular time; it is usually a character string that uniquely identifies a moment in time.

It is understood that the time identifiers can also be added to the video frames and audio frames first, and whether the scene change degree of the two video frame images exceeds the scene change threshold judged afterwards; the order in which the two are executed is not mandatory.

After judging whether the picture of video frame Vn has changed significantly from that of video frame Vn-1, video frame Vn and the corresponding audio frame An are each split into data packets.

Splitting video frame Vn and audio frame An yields multiple video data packets and multiple audio data packets, and a scene marker is added to each video data packet and each audio data packet. If the scene of video frame Vn has changed significantly relative to video frame Vn-1, the scene marker added to the data packets of Vn differs from the one added to the data packets of Vn-1. If the scene of Vn has not changed significantly relative to Vn-1, the same scene marker is added to the data packets of Vn and of Vn-1. The scene marker of an audio frame follows that of the video frame: a video frame and an audio frame of the same moment have the same scene marker.

It should be noted that after the video frames and audio frames carrying time identifiers are split into data packets, each data packet still carries the time identifier.
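Packetization with scene markers and time identifiers can be sketched as follows. The `Packet` fields, the integer scene marker that is incremented on each scene change, and the 1200-byte payload limit are assumptions for illustration; the specification only requires that markers be equal within a scene and different across a scene change.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Packet:
    kind: str          # "video" or "audio"
    timestamp: int     # time identifier inherited from the frame
    scene_marker: int  # equal for consecutive frames of the same scene
    index: int         # position of this packet within its frame
    payload: bytes


def next_scene_marker(prev_marker: int, scene_changed: bool) -> int:
    """Keep the marker within a scene; change it when the scene changes."""
    return prev_marker + 1 if scene_changed else prev_marker


def packetize(kind: str, frame: bytes, timestamp: int, scene_marker: int,
              max_payload: int = 1200) -> List[Packet]:
    """Split one encoded frame into packets that all carry its identifiers."""
    chunks = [frame[i:i + max_payload]
              for i in range(0, len(frame), max_payload)]
    return [Packet(kind, timestamp, scene_marker, i, chunk)
            for i, chunk in enumerate(chunks)]
```

The audio frame of the same moment would be packetized with the same `timestamp` and `scene_marker` as its video frame.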

After video frame Vn and audio frame An have each been split into data packets, the video data packets are placed in the video send buffer of the send buffer pool, and the audio data packets are placed in the audio send buffer of the send buffer pool.

After the video data packets and audio data packets have been placed in the send buffer pool, the scene marker of the data packets of the current video frame is compared with that of the data packets of the previous video frame. Identical scene markers indicate that the video frame to which the current video data packet belongs and the video frame to which the previous video data packet belongs are the same scene; different scene markers indicate different scenes.

When the current video frame and the previous video frame are the same scene, the send priority of the audio data packets is adjusted to be higher than that of the video data packets, so that the audio data packets occupy a suitably larger share of the bandwidth allocated for sending data. Because the share of audio data packets sent increases, the share of audio data packets received by the recipient (the viewing side) also increases, and the audio data can finish buffering and be played ahead of the video data. In a live broadcast of, for example, a concert, the singer's song can thus be kept as continuous as possible, ensuring to the greatest extent that the recipient is not interrupted and does not miss the key content of the performance.

When the current video frame and the previous video frame are different scenes, the send priority of the video data packets is adjusted to be higher than or equal to that of the audio data packets, so that the video data packets occupy a suitably larger share of the bandwidth allocated for sending data. Because the share of video data packets sent increases, the share of video data packets received by the recipient (the viewing side) also increases, and the video data can finish buffering and be played ahead of the audio data. The picture of the concert can thus be kept as continuous as possible, ensuring to the greatest extent that the recipient is not interrupted and does not miss the key content of the performance.
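One way to turn the send priority into a bandwidth share is sketched below. The per-interval packet budget, the 70% share given to the favored stream and the function name `pick_packets` are assumptions; the specification states only that the higher-priority stream occupies a suitably larger share of the bandwidth.

```python
from collections import deque
from typing import Deque, List


def pick_packets(audio_q: Deque[str], video_q: Deque[str], same_scene: bool,
                 budget: int, favored_percent: int = 70) -> List[str]:
    """Choose up to `budget` packets to send in one interval."""
    # Same scene favors audio; a scene change favors video.
    favored, other = (audio_q, video_q) if same_scene else (video_q, audio_q)
    favored_slots = budget * favored_percent // 100
    out: List[str] = []
    while favored and len(out) < favored_slots:
        out.append(favored.popleft())
    while other and len(out) < budget:
        out.append(other.popleft())
    # Hand any unused slots back to the favored stream.
    while favored and len(out) < budget:
        out.append(favored.popleft())
    return out
```

Because the lower-priority stream still receives the remaining slots, it is slowed rather than starved.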

Step 400, the recipient buffers the received video data packets and audio data packets according to the scene markers so as to play the audio and video data, and, according to the time identifier carried by each data packet and a preset duration threshold, limits how far playback of data with high send priority may run ahead of playback of data with low send priority.

The sender sends out the data that has finished buffering in the send buffer pool. The recipient receives the video data packets and audio data packets sent by the sender, and identifies and buffers them according to the scene markers they carry and data such as the packet headers. After receiving the data packets, the receive buffer pool also compares the scene marker of the data packets of the current video frame with that of the data packets of the previous video frame to judge whether adjacent video frames belong to the same scene: identical scene markers indicate the same scene, and different scene markers indicate different scenes. After data buffering is complete, the time difference between audio and video playback must be checked to prevent either the audio or the video from running too far ahead of the other. After buffering, the audio and video content is decoded and played, achieving live internet video streaming.

When the sender preferentially sends high-priority data packets, for example when it keeps giving priority to audio data packets, the audio played by the recipient will run ahead of the video if network conditions are poor. If network conditions remain poor for a long time, then to keep the played audio from running too far ahead of the video, the time identifier of the audio data packet must also be checked before the audio is played. If the time identifier of the audio data packet about to be played in the receive buffer pool is Tn and the time identifier of the video data packet is also Tn, the audio and video data packets are played simultaneously and there is no time difference between audio and video.

If the time identifier of the audio data packet about to be played in the receive buffer pool is Tn+10 and the time identifier of the video data packet is Tn, the audio is ahead of the video by a time difference of 10 units, equivalent to a lead of 10 frames. The number of frames of lead must then be compared with the preset duration threshold. If the duration threshold is 20, the lead is still below the threshold, so the audio is allowed to continue playing ahead, while the video, whose data packets are not being transmitted smoothly over the network, can only play with stuttering. If the duration threshold is 10 frames, the lead has reached the limit set by the threshold. The audio can no longer keep playing ahead, and the time difference between the unsynchronized audio and video cannot be widened further. Instead, the part of the video that lags behind the audio is discarded without being played, and once the video data has been buffered up to Tn+10, audio and video are both played directly from the position whose time identifier is Tn+10.
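The comparison in this worked example can be sketched as a small decision function. The name `video_skip_target` and the convention of returning `None` when playback may simply continue are assumptions; the sketch covers only the case described above, in which the audio is the stream playing ahead.

```python
from typing import Optional


def video_skip_target(audio_ts: int, video_ts: int,
                      max_lead: int) -> Optional[int]:
    """Return the time identifier the lagging video should skip to.

    Returns None while the audio lead is below the duration threshold,
    meaning the audio may keep playing ahead.
    """
    lead = audio_ts - video_ts
    if lead < max_lead:
        return None
    # Lead has reached the threshold: discard the lagging video and
    # resume both streams from the audio position.
    return audio_ts
```

With Tn = 100, a lead of 10 against a threshold of 20 returns `None`, and the same lead against a threshold of 10 returns 110, matching the two cases in the text.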

If, while the audio is playing ahead as described above, the recipient detects that the scene marker of the received audio and video data packets has changed, the picture scene of that frame differs from the previous frame, and video playback may then catch up on part of the audio's lead. If the scene does not change again afterwards, the audio may continue to extend its lead over the video.

If the network state improves from poor to good while the audio's lead over the video is still below the duration threshold, so that both audio and video data packets can be received normally and buffered smoothly, the part of the video that lags behind the audio can be discarded and the audio and video data with the same time identifier played simultaneously; the time difference between audio and video is then zero.

In one embodiment, the duration threshold is variable and is adjusted according to the scene markers.

The duration threshold used to judge whether the lead of the data packets played ahead exceeds the set value is adjustable; specifically, it is adjusted according to the scene markers of the received data packets.

If the scene marker of the current video frame data packets received by the receive buffer pool is identical to that of the previous video frame data packets, that is, no scene change has occurred, the sender will send more audio data. The duration threshold can then be increased appropriately, so that when the receive buffer pool has not received a complete video frame for a long time, the audio is allowed to play further ahead.

If the scene marker of the current video frame data packets received by the receive buffer pool differs from that of the previous video frame data packets, that is, a scene change has occurred, the sender will send more video data. The duration threshold can then be decreased appropriately, so that the audio does not run too far ahead of the video and audio and video are played as synchronously as possible.
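The threshold adjustment can be sketched in a few lines. The step of 2 frames and the limits of 5 and 30 frames are assumed values, and `adjust_max_lead` is a hypothetical name.

```python
def adjust_max_lead(max_lead: int, same_scene: bool, step: int = 2,
                    lower: int = 5, upper: int = 30) -> int:
    """Raise the duration threshold within a scene, lower it on a scene change."""
    target = max_lead + step if same_scene else max_lead - step
    # Clamp so the allowed lead stays within assumed bounds.
    return max(lower, min(upper, target))
```

The returned value would be used as the duration threshold the next time the audio lead is checked.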

The video frames and audio frames that serve as the original data are usually obtained by encoding an audio-video file. The audio-video file is recorded on site in real time by recording equipment such as a camera.

The encoding mode includes parameters such as video bit rate, video resolution, video frame rate, audio bit rate, and audio sample rate. It is understood that when the network state is poor, the bit rate, resolution, frame rate, sample rate and so on used for encoding can be set relatively low, so as to compress and reduce the data volume. This lowers the network speed required for the recipient to receive the live broadcast content accurately, and so guarantees normal data transmission. When the network state is good, these parameters can be set to normal or higher values to improve the viewing experience on the viewing side.

While the received audio and video data packets are being buffered and decoded, the scene markers they carry can be checked. If the scene markers carried by the data packets are identical, the picture scenes of the two video frames have not changed significantly, and the buffer capacity of the receive buffer pool can be reduced appropriately. When network conditions are poor, reducing the buffer capacity appropriately allows the audio content to be played as soon as possible, reducing stuttering during the live broadcast.

If the scene markers carried by the data packets are different, the picture scenes of the two video frames have changed significantly, and the buffer capacity of the receive buffer pool can be increased appropriately. When network conditions are poor, increasing the buffer capacity appropriately prevents playback from starting as soon as, for example, less than one second of playable video has been buffered, and so reduces the sense of stuttering during the live broadcast.

A first embodiment of the media low-latency communication system for live internet video streaming provided by the present invention is described in detail below with reference to Fig. 3. This embodiment is a system implementing the first embodiment of the foregoing method and is mainly applied to live internet video streaming. It can judge the video content in the audio and video sent by the sender, which contains the live broadcast content, so as to adjust the priority of video transmission and audio transmission. Important data is then sent first when network communication is delayed or unstable, which reduces the interference caused by network delay and ensures to the greatest extent that the recipient is not interrupted and does not miss the key content of the live broadcast.

As shown in Fig. 3, the media low-latency communication system provided in this embodiment includes:

A scene judgment module, for judging whether the scene change degree between the acquired current video frame image and the previous video frame image exceeds the preset scene change threshold. If the scene change threshold is exceeded, the scenes of the two frames are judged to be different; if not, they are judged to be the same.

A data packetization module, for splitting the current video frame judged by the scene judgment module, and the corresponding acquired audio frame, into data packets.

A scene marking module, connected to the scene judgment module, for adding, after data packetization, a corresponding scene marker to each video data packet and audio data packet according to the judgment of whether the scene change degree exceeds the preset scene change threshold. If the scene judgment module judges the scenes of the two frames to be the same, the scene markers added to the data packets are the same; if it judges them to be different, the scene markers added are different.

A send buffer module, for buffering the video data packets and audio data packets after the corresponding scene markers have been added, adjusting the send priority of the video data packets and audio data packets according to the scene markers, and sending the video data packets and audio data packets according to the send priority. The send buffer module includes a send buffer pool for buffering the video data packets and audio data packets. If the scene markers of the two frames are the same, the audio data is judged to have high priority; the share of audio data in the transmission bandwidth of the send buffer module then increases, and the volume of audio data packets sent increases correspondingly. If the scene markers of the two frames are different, the video data is judged to have high priority; the share of video data in the transmission bandwidth of the send buffer module then increases, and the volume of video data packets sent increases correspondingly.

A receive buffer module, connected to the playback end of the recipient, for receiving the video data packets and audio data packets sent by the send buffer module and buffering them according to the scene markers so as to play the audio and video data. The receive buffer module includes a receive buffer pool for buffering the video data packets and audio data packets. If the audio data packets have high priority, the receive buffer module can finish buffering the audio early and play it ahead, so that audio is played preferentially when network conditions are poor and the scene is unchanged. If the video data packets have high priority, the receive buffer module can finish buffering the video early and play it ahead, so that video is played preferentially when network conditions are poor and the scene has changed.

In Fig. 3, the sending side is above the dotted line and the receiving side is below it.

In one embodiment, the system further includes a recording and encoding module, connected to the video recording equipment, for recording audio and video in real time and encoding it to obtain the acquired video frames and the acquired audio frames.

In one embodiment, the scene judgment module includes a first judging unit and/or a second judging unit. The first judging unit judges by the frame difference method whether the scene change degree exceeds the preset scene change threshold, and the second judging unit judges this by the background subtraction method.

In one embodiment, the send buffer module includes a first adjusting unit for adjusting the buffer size of the send buffer module according to the scene markers. The first adjusting unit is connected to the send buffer pool. When the two video frames are the same scene, the first adjusting unit can reduce the buffer size of the send buffer module appropriately; when the two video frames are different scenes, it can increase the buffer size appropriately.

A second embodiment of the media low-latency communication system for live internet video streaming provided by the present invention is described in detail below with reference to Fig. 4. This embodiment is a system implementing the second embodiment of the foregoing method. It can judge the video content in the audio and video sent by the sender, which contains the live broadcast content, so as to adjust the priority of video transmission and audio transmission. Important data is then sent first when network communication is delayed or unstable, which reduces the interference caused by network delay and ensures to the greatest extent that the recipient is not interrupted and does not miss the key content of the live broadcast. At the same time, time identifiers added to the audio and video data limit how long a deferred data packet can be postponed, preventing low-priority data packets from falling too far behind.

As shown in Fig. 4, the media low-latency communication system provided in this embodiment includes:

A time identifier module, for adding a corresponding time identifier to each video frame and audio frame before data packetization. The time identifier can be a timestamp.

A receive buffer module, connected to the playback end of the recipient, for receiving the video data packets and audio data packets sent by the send buffer module and buffering them according to the scene markers so as to play the audio and video data. The receive buffer module includes a receive buffer pool for buffering the video data packets and audio data packets.

A time difference adjustment module, connected to the receive buffer module, for limiting, after the receive buffer module has buffered the data packets, how far playback of data with high send priority may run ahead of playback of data with low send priority, according to the time identifier carried by each data packet and the preset duration threshold, so as to prevent low-priority data packets from falling too far behind.

In Fig. 4, the sending side is above the dotted line and the receiving side is below it.

In one embodiment, the duration threshold is variable, and the time difference adjustment module adjusts it according to the scene markers. When the two video frames in the receive buffer module are the same scene, the time difference adjustment module can increase the duration threshold appropriately; when they are different scenes, it can decrease the duration threshold appropriately.

For the specific configuration of the scene judgment module, data packetization module, scene marking module, send buffer module, receive buffer module, recording and encoding module and other components of this embodiment, refer to the structure described in the first embodiment of the system above; the details are not repeated here.

The above is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of protection of the claims.

Claims

1. A media low-latency communication method for live internet video streaming, characterized by comprising:

judging whether the scene change degree between an acquired current video frame image and a previous video frame image exceeds a preset scene change threshold;

splitting the current video frame and the corresponding acquired audio frame into data packets, and adding a corresponding scene marker to each video data packet and audio data packet according to the judgment of whether the scene change degree exceeds the preset scene change threshold;

buffering the video data packets and the audio data packets, adjusting the send priority of the video data packets and the audio data packets according to the scene markers, and sending the video data packets and the audio data packets according to the send priority;

buffering, by a recipient, the received video data packets and audio data packets according to the scene markers, so as to play the audio and video data.

2. The low-latency communication method of claim 1, characterized in that the acquired video frame and the acquired audio frame are obtained by encoding audio and video recorded in real time.

3. The low-latency communication method of claim 1, characterized in that whether the scene change degree exceeds the preset scene change threshold is judged by the frame difference method and/or the background subtraction method.

4. The low-latency communication method of claim 1, characterized in that the buffer size of the recipient is variable and is adjusted according to the scene markers.

5. The low-latency communication method of any one of claims 1 to 4, characterized in that, before the current video frame and the corresponding acquired audio frame are split into data packets, the method further comprises: adding a corresponding time identifier to each video frame and audio frame; and

after the recipient buffers the received video data packets and audio data packets according to the scene markers, the method further comprises: limiting, according to the time identifier carried by each data packet and a preset duration threshold, how far playback of data with high send priority runs ahead of playback of data with low send priority.

6. The low-latency communication method of claim 5, characterized in that the duration threshold is variable and is adjusted according to the scene markers.

7. A media low-latency communication system for live internet video streaming, characterized by comprising:

a scene judgment module, for judging whether the scene change degree between an acquired current video frame image and a previous video frame image exceeds a preset scene change threshold;

a data packetization module, for splitting the current video frame and the corresponding acquired audio frame into data packets;

a scene marking module, for adding, after the data packetization, a corresponding scene marker to each video data packet and audio data packet according to the judgment of whether the scene change degree exceeds the preset scene change threshold;

a send buffer module, for buffering the video data packets and the audio data packets after the corresponding scene markers have been added, adjusting the send priority of the video data packets and the audio data packets according to the scene markers, and sending the video data packets and the audio data packets according to the send priority;

a receive buffer module, for buffering the received video data packets and audio data packets according to the scene markers, so as to play the audio and video data.

8. The low-latency communication system of claim 1, characterized in that the system further comprises:

a recording and encoding module, for recording audio and video in real time and encoding it to obtain the acquired video frame and the acquired audio frame.

9. The low-latency communication system of claim 1, characterized in that the scene judgment module comprises a first judging unit and/or a second judging unit, the first judging unit being used to judge by the frame difference method whether the scene change degree exceeds the preset scene change threshold, and the second judging unit being used to judge by the background subtraction method whether the scene change degree exceeds the preset scene change threshold.

10. The low-latency communication system of claim 1, characterized in that the send buffer module comprises a first adjusting unit for adjusting the buffer size of the send buffer module according to the scene markers.