CN102364952B - Method for processing audio and video synchronization in simultaneous playing of plurality of paths of audio and video - Google Patents


Info

Publication number
CN102364952B
CN102364952B (application CN201110327166A)
Authority
CN
China
Prior art keywords
audio
video
compressed packet
channel
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN 201110327166
Other languages
Chinese (zh)
Other versions
CN102364952A (en)
Inventor
胡开荆
李群巍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Wanpeng Digital Intelligence Technology Co ltd
Original Assignee
ZHEJIANG WANPENG NETWORK TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZHEJIANG WANPENG NETWORK TECHNOLOGY Co Ltd filed Critical ZHEJIANG WANPENG NETWORK TECHNOLOGY Co Ltd
Priority to CN 201110327166 priority Critical patent/CN102364952B/en
Publication of CN102364952A publication Critical patent/CN102364952A/en
Application granted granted Critical
Publication of CN102364952B publication Critical patent/CN102364952B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Landscapes

  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention relates to a method for processing audio and video synchronization when multiple channels of audio and video are played simultaneously. Conventional audio-video synchronization techniques cannot meet the requirement of multi-user communication applications to synchronize multiple channels of audio and video at the same time. In the method provided by the invention, each user captures its own audio and video data, compresses the captured data into audio and video compressed packets, marks the packets with timestamps, and sends them to a server. The server decompresses and mixes the audio compressed packets received from the users, records in each mixing result the timestamps of all audio compressed packets that participated in the mix, compresses the result into mixed compressed packets, and sends these to the clients; the video compressed packets are forwarded to the clients directly. After receiving the mixed compressed packets and the video compressed packets, each client decompresses the mixed packets, plays the decompressed audio data in order, and displays the video frames in the corresponding video compressed packets according to the principle of audio-driven video. The method preserves the synchronization relationships between all audio and video channels in full.

Description

Method for processing audio-video synchronization when multiple channels of audio and video are played simultaneously
Technical field
The invention belongs to the technical field of computer multimedia and relates to a method for processing multiple channels of audio and video after network transmission, specifically a method for handling audio-video synchronization when multiple channels of audio and video are played simultaneously.
Background technology
With the rapid development of Internet broadband technology and multimedia information technology, networked multimedia applications have become an important part of Internet use. In particular, a network teleconference involves interaction among many people, so multiple channels of audio and video must be played simultaneously. Every channel of audio and video must then be synchronized; otherwise "lip synchronization" cannot be achieved and the fluency of communication suffers. The traditional audio-video synchronization technique marks each audio and video packet with a timestamp and synchronizes playback according to that timestamp. This approach works only for one channel of audio and one channel of video; it fails with multiple audio and multiple video channels and cannot meet the requirement of multi-user communication applications such as video conferencing to synchronize multiple channels of audio and video simultaneously.
Summary of the invention
The objective of the invention is to address the deficiencies of the prior art by providing an audio-driven method for synchronizing multiple channels of video during playback.
The specific steps of the method are:
Step (1). Each user acquires its own audio and video data and compresses the audio and video separately. The captured audio data is divided into audio data units of 10 to 120 milliseconds each; each audio data unit is compressed into an audio compressed packet, and each audio compressed packet is marked with the client machine's timestamp at the moment of capture. Each frame of video data is compressed into a video compressed packet, and each video compressed packet is likewise marked with the client machine's timestamp at the moment of capture. All audio and video compressed packets are sent to the server;
Each user may acquire its audio and video data either by capturing from a device or by reading from a media file. When capturing from a device, the timestamp is the moment of capture. When reading from a media file, the media player or decompression component can set a timestamp for the data; this timestamp is relative to the start of the media file and is converted into a timestamp based on the current computer time.
Step (2). The server decompresses the audio compressed packets received from each user and mixes them, records in the mixing result the timestamps of all audio compressed packets that participated in the mix, compresses the result into a mixed compressed packet, and sends it to the clients. Video compressed packets are forwarded to the clients directly.
For N users U1 to UN, each user has one audio channel, giving N audio channels A1 to AN in total. The server must mix N+1 output channels:
Channel 0: contains all audio, denoted M0,
Channel 1: all audio except A1, denoted M1,
Channel 2: all audio except A2, denoted M2,
and so on,
Channel N: all audio except AN, denoted MN.
When generating each output channel, the timestamps of its N or N-1 source audio channels are written into that channel, so the channel carries N or N-1 timestamps, each corresponding to a source audio channel.
After these N+1 channels are generated, M0 is sent to all users who are not sending audio, M1 is sent to U1, M2 is sent to U2, and so on; the audio content sent to each user does not include that user's own audio.
Step (3). After receiving the mixed compressed packets and the video compressed packets, each client decompresses the mixed compressed packets, plays the decompressed audio data in order, and then displays the video frames in the corresponding video compressed packets according to the principle of audio-driven video.
Each client receives one channel of mixed compressed packets and the N channels of video compressed packets forwarded by the server. Playback is audio-driven: each time an audio compressed packet is played, all timestamps (U, A) contained in that packet are recorded. When playing user X's video, the client takes the timestamp (U_x, V_x) of the next video frame to be played on that channel and the timestamp (U_x, A_x) of the most recently played audio frame of the same user, and compares V_x with A_x. If V_x is greater than or equal to A_x, the video content follows the audio content and the frame can be played; if V_x is less than A_x, then by the audio-driven-video principle this video frame's playing moment has not yet arrived, so the client waits for the next playback decision to determine whether to play it.
The method uses audio timestamps as the tie that synchronizes the multiple video channels with the audio, so that every video channel achieves "lip synchronization" with the audio. When the server mixes audio, it does not mark each mixed compressed packet with a single timestamp; instead, the timestamps of all audio channels that participated in the mix are preserved together as the timestamp of the mixed compressed packet. In this way the synchronization relationships between all audio and video channels are preserved in full.
Embodiment
A method for processing audio-video synchronization when multiple channels of audio and video are played simultaneously; the specific steps are:
Step (1). Each user acquires its own audio and video data and compresses the audio and video separately. The captured audio data is divided into audio data units of 10 to 120 milliseconds each; each audio data unit is compressed into an audio compressed packet, and each audio compressed packet is marked with the client machine's timestamp at the moment of capture. Each frame of video data is compressed into a video compressed packet, and each video compressed packet is likewise marked with the client machine's timestamp at the moment of capture. All audio and video compressed packets are sent to the server.
Each user may acquire its audio and video data either by capturing from a device or by reading from a media file. When capturing from a device, the timestamp is the moment of capture. When reading from a media file, the media player or decompression component can set a timestamp for the data; this timestamp is relative to the start of the media file and is converted into a timestamp based on the current computer time.
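The conversion just described can be sketched as follows. This is a minimal illustration; the function name and the choice of millisecond units are assumptions, since the patent does not fix an implementation:

```python
def rebase_timestamp(media_relative_ms: int, playback_start_wallclock_ms: int) -> int:
    """Convert a timestamp relative to the start of a media file into a
    timestamp based on the current computer clock, as step (1) requires
    for data read from a file rather than captured from a device."""
    return playback_start_wallclock_ms + media_relative_ms

# Example: playback of the file began at wall-clock time 1_000_000 ms,
# so a frame stamped 2500 ms into the file maps to 1_002_500 ms.
print(rebase_timestamp(2500, 1_000_000))  # 1002500
```

Data captured from a device needs no conversion: its timestamp is already the wall-clock moment of capture.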
Video processing takes the input video frame by frame: after compression with a video encoder, each frame is cut, according to network conditions, into pieces of a size suitable for transmission (typically 400 to 1400 bytes) and sent to the server together with the timestamp of that video frame. To let the receiver sort packets and detect packet loss during transmission, every audio and video packet carries a sequence number. The sequence number is a 2-byte incrementing value that wraps back to 0 after reaching its maximum. To improve the user experience when bandwidth is poor, audio and video data are sent over different connections; when bandwidth is insufficient, the audio connection, which carries less data than the video connection, is more easily protected. Since audio is the main means of interaction and video is in general supplementary, this keeps the audio smoother and reduces the impact on the user.
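The fragmentation and wrapping 2-byte sequence numbering described above can be sketched like this (names are hypothetical; the patent specifies only the 400-1400 byte piece size and the 2-byte wrap-around behaviour):

```python
def fragment_frame(frame: bytes, mtu: int = 1400) -> list:
    """Cut one compressed video frame into pieces of at most `mtu` bytes
    (the patent suggests 400-1400 bytes depending on network conditions)."""
    return [frame[i:i + mtu] for i in range(0, len(frame), mtu)]

class SequenceNumber:
    """2-byte incrementing sequence number that wraps to 0 after 65535."""
    def __init__(self) -> None:
        self.value = 0

    def next(self) -> int:
        current = self.value
        self.value = (self.value + 1) % 0x10000  # wrap after the 2-byte maximum
        return current

seq = SequenceNumber()
pieces = fragment_frame(b"x" * 3000, mtu=1400)
packets = [(seq.next(), piece) for piece in pieces]
print(len(pieces))    # 3 pieces: 1400 + 1400 + 200 bytes
print(packets[0][0])  # first sequence number is 0
```

In a real sender, audio packets and video packets would each carry such a sequence number and travel over their own connection, as the paragraph above describes.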
Step (2). The server decompresses the audio compressed packets received from each user and mixes them, records in the mixing result the timestamps of all audio compressed packets that participated in the mix, compresses the result into a mixed compressed packet, and sends it to the clients. Video compressed packets are forwarded to the clients directly.
For N users U1, U2, ..., UN, each user has one audio channel, giving N audio channels A1, A2, ..., AN in total. The server must mix N+1 output channels:
Channel 0: contains all audio, denoted M0,
Channel 1: all audio except A1, denoted M1,
Channel 2: all audio except A2, denoted M2,
and so on,
Channel N: all audio except AN, denoted MN.
When generating each output channel, the timestamps of its N or N-1 source audio channels are written into that channel, so the channel carries N or N-1 timestamps, each corresponding to a source audio channel. For example, M0 contains (U1, A1), (U2, A2), ..., (UN, AN), while M1 contains (U2, A2), (U3, A3), ..., (UN, AN).
After these N+1 channels are generated, M0 is sent to all users who are not sending audio, M1 is sent to U1, M2 is sent to U2, and so on; the audio content sent to each user does not include that user's own audio, which avoids producing an echo in that user's loudspeaker.
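The mixing step with preserved per-source timestamps can be sketched as below. The additive PCM mixing and the data layout are assumptions made for illustration; the patent only requires that each mixed channel carry the (U, A) pairs of all its sources instead of a single timestamp:

```python
def mix_channels(units, exclude_user=None):
    """Mix one audio unit per user into a single output unit, excluding
    `exclude_user` (None produces channel M0, which contains everyone).
    The result carries the (user, timestamp) pairs of every source unit,
    as step (2) requires, rather than one timestamp.
    `units` maps user id -> (timestamp, list of PCM samples)."""
    sources = {u: ta for u, ta in units.items() if u != exclude_user}
    length = max(len(samples) for _, samples in sources.values())
    mixed = [0] * length
    for _, samples in sources.values():
        for i, s in enumerate(samples):
            mixed[i] += s  # simple additive mix; real mixers also scale/clip
    timestamps = [(u, ts) for u, (ts, _) in sorted(sources.items())]
    return timestamps, mixed

units = {1: (100, [1, 1]), 2: (105, [2, 2]), 3: (99, [3, 3])}
m0 = mix_channels(units)                   # M0: all users
m1 = mix_channels(units, exclude_user=1)   # M1: everyone except user 1
print(m0)  # ([(1, 100), (2, 105), (3, 99)], [6, 6])
print(m1)  # ([(2, 105), (3, 99)], [5, 5])
```

Calling this once per user (plus once with no exclusion) yields the N+1 channels M0 through MN described above.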
Step (3). After receiving the mixed compressed packets and the video compressed packets, each client decompresses the mixed compressed packets, plays the decompressed audio data in order, and then displays the video frames in the corresponding video compressed packets according to the principle of audio-driven video.
Each client receives one channel of mixed compressed packets and the N channels of video compressed packets forwarded by the server. Playback is audio-driven: each time an audio compressed packet is played, all timestamps (U, A) contained in that packet are recorded. When playing user X's video, the client takes the timestamp (U_x, V_x) of the next video frame to be played on that channel and the timestamp (U_x, A_x) of the most recently played audio frame of the same user, and compares V_x with A_x. If V_x is greater than or equal to A_x, the video content follows the audio content and the frame can be played; if V_x is less than A_x, then by the audio-driven-video principle this video frame's playing moment has not yet arrived, so the client waits for the next playback decision to determine whether to play it.
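The playback decision can be sketched as follows, implementing the comparison rule exactly as the patent states it (the frame may be displayed when V_x is greater than or equal to A_x). All names here are hypothetical:

```python
def should_display(video_ts: int, last_audio_ts: int) -> bool:
    """Audio-driven video rule as stated in the patent: the pending video
    frame (timestamp V_x) of user x is displayed once V_x >= A_x, where
    A_x is the timestamp of the most recently played audio frame of the
    same user; otherwise it waits for the next playback decision."""
    return video_ts >= last_audio_ts

last_audio = {}  # user id -> A_x, updated each time an audio packet is played

def on_audio_packet_played(timestamps):
    """Record all (U, A) pairs carried by the mixed packet just played."""
    for user, ts in timestamps:
        last_audio[user] = ts

on_audio_packet_played([(1, 100), (2, 105)])
print(should_display(video_ts=102, last_audio_ts=last_audio[1]))  # True
print(should_display(video_ts=98, last_audio_ts=last_audio[1]))   # False
```

Because the mixed packet carries one (U, A) pair per source, each user's video channel is compared against that user's own audio timestamp, which is what makes per-channel lip synchronization possible.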
Network transmission is highly nondeterministic, mainly in two respects: packet reordering and reception delay. When data is sent over TCP, data sent on different connections may be received in a different order than it was sent; when data is sent over UDP, the arrival order of individual packets is not guaranteed either. This is the out-of-order characteristic of packets. Whether TCP or UDP is used, the time a packet takes to reach the other computer is uncertain and varies with network transmission quality, typically fluctuating between 1 millisecond and 500 milliseconds, and possibly reaching several seconds when the network is poor. Because of these two characteristics, the received audio and video data must be sorted and buffered separately. Sorting is based on the sequence numbers in the packets, and the buffering time is determined by the network delay. A smaller network delay means better network conditions, so the amount of buffered audio data can be reduced for better real-time performance. A larger network delay means worse network conditions, so playback is paused until the duration of buffered audio equals the network delay; this sacrifices some real-time performance but improves playback fluency and avoids the stutter that occurs when a too-short buffer runs out of data during playback.
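A minimal jitter-buffer sketch of the sorting-plus-buffering behaviour described above (the 20 ms unit duration, class name, and delay handling are assumptions; wrap-around of sequence numbers is omitted for brevity):

```python
import heapq

class JitterBuffer:
    """Reorder incoming packets by sequence number and hold them until
    the buffered duration covers the measured network delay, as the
    embodiment describes."""
    def __init__(self, unit_ms: int = 20):
        self.heap = []           # (sequence number, payload), min-ordered
        self.unit_ms = unit_ms   # duration of one audio unit
        self.next_seq = 0        # next sequence number due for playback

    def push(self, seq: int, payload: bytes) -> None:
        heapq.heappush(self.heap, (seq, payload))

    def buffered_ms(self) -> int:
        return len(self.heap) * self.unit_ms

    def pop_ready(self, network_delay_ms: int):
        """Release in-order packets only while enough audio is buffered
        to cover the network delay; otherwise playback stays paused."""
        out = []
        while (self.heap and self.buffered_ms() >= network_delay_ms
               and self.heap[0][0] == self.next_seq):
            out.append(heapq.heappop(self.heap)[1])
            self.next_seq += 1
        return out

buf = JitterBuffer(unit_ms=20)
for seq in (2, 0, 1):  # packets arrive out of order
    buf.push(seq, f"pkt{seq}".encode())
print(buf.pop_ready(network_delay_ms=20))  # [b'pkt0', b'pkt1', b'pkt2']
```

With a larger measured delay the same buffer would keep packets queued, pausing playback until enough audio has accumulated, which is the trade of real-time performance for fluency described above.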

Claims (1)

1. A method for processing audio-video synchronization when multiple channels of audio and video are played simultaneously, characterized in that the specific steps of the method are:
Step (1). Each user acquires its own audio and video data and compresses the audio and video separately; the captured audio data is divided into audio data units of 10 to 120 milliseconds each, each audio data unit is compressed into an audio compressed packet, and each audio compressed packet is marked with the client machine's timestamp at the moment of capture; each frame of video data is compressed into a video compressed packet, and each video compressed packet is marked with the client machine's timestamp at the moment of capture; all audio and video compressed packets are sent to the server;
Each user may acquire its audio and video data either by capturing from a device or by reading from a media file; when capturing from a device, the timestamp is the moment of capture; when reading from a media file, the media player or decompression component can set a timestamp for the data, this timestamp being relative to the start of the media file and converted into a timestamp based on the current computer time;
Step (2). The server decompresses the audio compressed packets received from each user and mixes them, records in the mixing result the timestamps of all audio compressed packets that participated in the mix, compresses the result into a mixed compressed packet, and sends it to the clients; video compressed packets are forwarded to the clients directly;
For N users U1 to UN, each user has one audio channel, giving N audio channels A1 to AN in total; the server must mix N+1 output channels:
Channel 0: contains all audio, denoted M0,
Channel 1: all audio except A1, denoted M1,
Channel 2: all audio except A2, denoted M2,
and so on,
Channel N: all audio except AN, denoted MN;
When generating each output channel, the timestamps of its N or N-1 source audio channels are written into that channel, so the channel carries N or N-1 timestamps, each corresponding to a source audio channel;
After these N+1 channels are generated, M0 is sent to all users who are not sending audio, M1 is sent to U1, M2 is sent to U2, and so on; the audio content sent to each user does not include that user's own audio;
Step (3). After receiving the mixed compressed packets and the video compressed packets, each client decompresses the mixed compressed packets, plays the decompressed audio data in order, and then displays the video frames in the corresponding video compressed packets according to the principle of audio-driven video;
Each client receives one channel of mixed compressed packets and the N channels of video compressed packets forwarded by the server; playback is audio-driven: each time an audio compressed packet is played, all timestamps (U, A) contained in that packet are recorded; when playing user X's video, the client takes the timestamp (U_x, V_x) of the next video frame to be played on that channel and the timestamp (U_x, A_x) of the most recently played audio frame of the same user, and compares V_x with A_x; if V_x is greater than or equal to A_x, the video content follows the audio content and the frame can be played; if V_x is less than A_x, then by the audio-driven-video principle this video frame's playing moment has not yet arrived, and the client waits for the next playback decision to determine whether to play it.
CN 201110327166 2011-10-25 2011-10-25 Method for processing audio and video synchronization in simultaneous playing of plurality of paths of audio and video Active CN102364952B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110327166 CN102364952B (en) 2011-10-25 2011-10-25 Method for processing audio and video synchronization in simultaneous playing of plurality of paths of audio and video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201110327166 CN102364952B (en) 2011-10-25 2011-10-25 Method for processing audio and video synchronization in simultaneous playing of plurality of paths of audio and video

Publications (2)

Publication Number Publication Date
CN102364952A CN102364952A (en) 2012-02-29
CN102364952B true CN102364952B (en) 2013-12-25

Family

ID=45691502

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110327166 Active CN102364952B (en) 2011-10-25 2011-10-25 Method for processing audio and video synchronization in simultaneous playing of plurality of paths of audio and video

Country Status (1)

Country Link
CN (1) CN102364952B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103702013B (en) * 2013-11-28 2017-02-01 北京航空航天大学 Frame synchronization method for multiple channels of real-time videos
CN105187883B (en) * 2015-09-11 2018-05-29 广东威创视讯科技股份有限公司 A kind of data processing method and client device
US9979997B2 (en) 2015-10-14 2018-05-22 International Business Machines Corporation Synchronization of live audio and video data streams
CN105516090B (en) * 2015-11-27 2019-01-22 刘军 Media playing method, equipment and music lesson system
CN106658030B (en) * 2016-12-30 2019-07-30 上海寰视网络科技有限公司 A kind of playback method and equipment of the composite video comprising SCVF single channel voice frequency multi-channel video
CN107195308B (en) * 2017-04-14 2021-03-16 苏州科达科技股份有限公司 Audio mixing method, device and system of audio and video conference system
CN106941613A (en) * 2017-04-14 2017-07-11 武汉鲨鱼网络直播技术有限公司 A kind of compacting of audio frequency and video interflow and supplying system and method
CN108021675B (en) * 2017-12-07 2021-11-09 北京慧听科技有限公司 Automatic segmentation and alignment method for multi-equipment recording
CN109120974A (en) * 2018-07-25 2019-01-01 深圳市异度信息产业有限公司 A kind of method and device that audio-visual synchronization plays
CN109600649A (en) * 2018-08-01 2019-04-09 北京微播视界科技有限公司 Method and apparatus for handling data
CN109361886A (en) * 2018-10-24 2019-02-19 杭州叙简科技股份有限公司 A kind of conference video recording labeling system based on sound detection
CN111277885B (en) * 2020-03-09 2023-01-10 北京世纪好未来教育科技有限公司 Audio and video synchronization method and device, server and computer readable storage medium
CN113259762B (en) * 2021-04-07 2022-10-04 广州虎牙科技有限公司 Audio processing method and device, electronic equipment and computer readable storage medium
CN114760274B (en) * 2022-06-14 2022-09-02 北京新唐思创教育科技有限公司 Voice interaction method, device, equipment and storage medium for online classroom

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
PL183167B1 (en) * 1995-12-07 2002-05-31 Koninkl Philips Electronics Nv Method of receiving encoded non-pcm phonic bit streams and multiplechannel reproducing equipment incorporating an apparatus receiving encoded non-pcm phonic bit streams
CN100438634C (en) * 2006-07-14 2008-11-26 杭州国芯科技有限公司 Video-audio synchronization method
CN101232623A (en) * 2007-01-22 2008-07-30 李会根 System and method for transmitting stereo audio and video numerical coding based on transmission stream

Also Published As

Publication number Publication date
CN102364952A (en) 2012-02-29

Similar Documents

Publication Publication Date Title
CN102364952B (en) Method for processing audio and video synchronization in simultaneous playing of plurality of paths of audio and video
TW589892B (en) Instant video conferencing method, system and storage medium implemented in web game using A/V synchronization technology
CN102893542B (en) Method and apparatus for synchronizing data in a vehicle
CN102655584B (en) The method and system that media data sends and played in a kind of Telepresence
CN100579238C (en) Synchronous playing method for audio and video buffer
US20050062843A1 (en) Client-side audio mixing for conferencing
CN101271720A (en) Synchronization process for mobile phone stream media audio and video
CN104426832A (en) Multi-terminal multichannel independent playing method and device
EP1976290A1 (en) Apparatus, network device and method for transmitting video-audio signal
EP1675399A3 (en) Fast channel switching for digital TV
CN105491393A (en) Method for implementing multi-user live video business
JP2004525545A (en) Webcast method and system for synchronizing multiple independent media streams in time
CN109361945A (en) The meeting audiovisual system and its control method of a kind of quick transmission and synchronization
CN105992040A (en) Multichannel audio data transmitting method, audio data synchronization playing method and devices
CN101998174B (en) Quick access method, server, client and system of multicast RTP (real time protocol) session
US8385234B2 (en) Media stream setup in a group communication system
JP2011509543A5 (en)
WO2008028367A1 (en) A method for realizing multi-audio tracks for mobile mutilmedia broadcasting system
CN105791939A (en) Audio and video synchronization method and apparatus
JP2009284282A (en) Content server, information processing apparatus, network device, content distribution method, information processing method, and content distribution system
CN101516057B (en) Method for realizing streaming media through mobile terminal
CN110267064A (en) Audio broadcast state processing method, device, equipment and storage medium
CN1972407A (en) A video and audio synchronization playing method for mobile multimedia broadcasting
WO2017071670A1 (en) Audio and video synchronization method, device and system
JP2015070460A (en) System and method for video and voice distribution, and program

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C56 Change in the name or address of the patentee

Owner name: ZHEJIANG WINUPON EDUCATIONAL TECHNOLOGY CO., LTD.

Free format text: FORMER NAME: ZHEJIANG WINUPON NETWORK TECHNOLOGY CO., LTD.

CP03 Change of name, title or address

Address after: Room 1406, Hangzhou Electronic Commerce Building, No. 118 Wensan West Road, Xihu District, Hangzhou City, Zhejiang Province, 310013

Patentee after: ZHEJIANG WANPENG EDUCATION SCIENCE AND TECHNOLOGY STOCK CO.,LTD.

Address before: 15th Floor, Electronic Commerce Building, No. 118 West Road, Hangzhou City, Zhejiang Province, 310013

Patentee before: ZHEJIANG WANPENG NETWORK TECHNOLOGY Co.,Ltd.

CP02 Change in the address of a patent holder

Address after: 310051 12 / F, building 8, No. 19, Jugong Road, Xixing street, Binjiang District, Hangzhou City, Zhejiang Province

Patentee after: ZHEJIANG WANPENG EDUCATION SCIENCE AND TECHNOLOGY STOCK Co.,Ltd.

Address before: Room 1406, Hangzhou Electronic Commerce Building, No. 118 Wensan West Road, Xihu District, Hangzhou City, Zhejiang Province, 310013

Patentee before: ZHEJIANG WANPENG EDUCATION SCIENCE AND TECHNOLOGY STOCK Co.,Ltd.

CP02 Change in the address of a patent holder
CP01 Change in the name or title of a patent holder

Address after: 12 / F, building 8, No. 19, Jugong Road, Xixing street, Binjiang District, Hangzhou City, Zhejiang Province, 310051

Patentee after: Zhejiang Wanpeng Digital Intelligence Technology Co.,Ltd.

Address before: 12 / F, building 8, No. 19, Jugong Road, Xixing street, Binjiang District, Hangzhou City, Zhejiang Province, 310051

Patentee before: ZHEJIANG WANPENG EDUCATION SCIENCE AND TECHNOLOGY STOCK CO.,LTD.

CP01 Change in the name or title of a patent holder