CN105430537B

CN105430537B - Method for synthesizing multi-channel data, server, and music teaching system

Info

Publication number: CN105430537B
Application number: CN201510851568.9A
Authority: CN
Inventors: 刘军
Original assignee: Individual
Current assignee: Individual
Priority date: 2015-11-27
Filing date: 2015-11-27
Publication date: 2018-04-17
Anticipated expiration: 2035-11-27
Also published as: CN105430537A

Abstract

The invention discloses a method for synthesizing multi-channel data, a server, and a music teaching system. The method is suitable for execution in a server and comprises the following steps. Video data sent by multiple media terminals is received. Each channel of video data includes one or more video frames, and each video frame includes a timestamp corresponding to its capture time. A reference time point for aligning the video frames is selected according to the timestamps of the video frames of each channel. One channel is selected as the synthesis reference data according to the frame rate of each channel of video data. Starting from the selected reference time point, the video frames in the synthesis reference data are selected one at a time in chronological order, and for each selected frame, the frame whose timestamp is earlier than and closest to that of the selected frame is looked up in each of the remaining channels. A synthesis operation is performed on the selected video frame and the frames found, to obtain composite video frames at one or more bitrates.

Description

Method for synthesizing multi-channel data, server, and music teaching system

Technical field

The present invention relates to the field of communications, and in particular to a method for synthesizing multi-channel data, a server, and a music teaching system.

Background art

At present, in real-time communication schemes such as video conferencing or live webcasting, a terminal that captures media data can capture media data such as video frames and audio frames and transmit it to a server. After receiving the media data, the server can transmit it to a media playback end. In addition, before transmitting the media data to the media playback end, the media server can also process the media data. For example, the server can synthesize video frames from multiple capture terminals into a picture-in-picture image.

For example, the patent with application number CN200810131309.9 discloses a conference system comprising capture terminals, a server, and an image display device. A capture terminal can send captured images to the server. The server can synthesize the received image data and then transmit the composite image to the image display device.

However, existing data synthesis schemes usually take the picture of one channel as the key frame, and the temporal correlation between the multiple images in the composite picture is very low.

Summary of the invention

To this end, the present invention provides a new scheme for synthesizing multi-channel data that effectively solves at least one of the problems above.

According to one aspect of the present invention, a method for synthesizing multi-channel data is provided, the method being suitable for execution in a server. The method comprises the following steps. Video data sent by multiple media terminals is received. Each received channel of video data includes one or more video frames, and each video frame includes a timestamp corresponding to the capture time of that frame. A reference time point for aligning the received channels of video data is selected according to the timestamps of the video frames in each channel. One of the received channels is selected as the synthesis reference data according to the frame rate of each channel. Starting from the selected reference time point, the video frames in the synthesis reference data are selected one at a time in chronological order, and for each selected frame, the frame whose timestamp is earlier than and closest to that of the selected frame is looked up in each received channel other than the synthesis reference data. A synthesis operation is performed on the selected video frame and the frames found, to obtain composite video frames at one or more bitrates.

According to another aspect of the present invention, a server for synthesizing multi-channel data is provided, comprising a receiver, a reference selector, a frame rate selector, and a compositing engine. The receiver is suitable for receiving video data sent by multiple media terminals. Each received channel of video data includes one or more video frames, and each video frame includes a timestamp corresponding to the capture time of that frame. The reference selector is suitable for selecting, according to the timestamps of the video frames in each channel, a reference time point for aligning the received channels of video data. The frame rate selector is suitable for selecting one of the received channels as the synthesis reference data according to the frame rate of each channel. The compositing engine is suitable for selecting, starting from the selected reference time point, the video frames in the synthesis reference data one at a time in chronological order and, for each selected frame, looking up in each received channel other than the synthesis reference data the frame whose timestamp is earlier than and closest to that of the selected frame. The compositing engine then performs a synthesis operation on the selected video frame and the frames found, to obtain composite video frames at one or more bitrates.

According to yet another aspect of the present invention, a music teaching system is provided, comprising media terminals, a server according to the present invention, and a media playback device. Each media terminal is suitable for capturing video data and audio data. The server is suitable for synthesizing the media data from the multiple media terminals. The media playback device is suitable for obtaining composite video frames and/or composite audio frames from the server.

With the scheme for synthesizing multi-channel data according to the present invention, when video frames are received from multiple media terminals, alignment operations can be performed on the video frames one after another according to their capture times, and the aligned video frames are then synthesized into a single channel of video frames. In particular, the synthesis scheme according to the present invention gives the individual sub-pictures within each composite video frame high temporal synchronization. In this way, by receiving a single channel of composite video frames according to the present invention, a media playback device can play live the images captured by multiple media terminals. It should be noted that for live streaming systems such as online music teaching, the synthesis scheme according to the present invention can greatly improve system performance. In addition, the synthesis scheme according to the present invention can generate composite video frames at multiple bitrates according to the desired bitrate. In this way, the media server according to the present invention can transmit to a media playback device a video bitrate that matches the current network speed, which keeps data transmission highly real-time and further improves the performance of the live streaming system.

Brief description of the drawings

To achieve the above and related purposes, certain illustrative aspects are described herein in connection with the following description and the accompanying drawings. These aspects indicate various ways in which the principles disclosed herein can be practiced, and all aspects and their equivalents are intended to fall within the scope of the claimed subject matter. The above and other objects, features, and advantages of the present disclosure will become more apparent from the following detailed description read in conjunction with the accompanying drawings. Throughout the disclosure, the same reference numerals generally refer to the same components or elements.

Fig. 1 shows a block diagram of an exemplary music teaching system 100 according to the present invention;

Fig. 2 shows a block diagram of a server 200 for synthesizing multi-channel data according to some embodiments of the present invention;

Fig. 3 shows a flow chart of a method 300 for synthesizing multi-channel data according to some embodiments of the present invention; and

Fig. 4 shows a flow chart of a method 400 for synthesizing multi-channel data according to some embodiments of the present invention.

Detailed description

Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure can be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.

Fig. 1 shows a block diagram of an exemplary music teaching system 100 according to the present invention. As shown in Fig. 1, the music teaching system 100 can include multiple student clients 110, a server 120, and a teacher client 130. In the music teaching system 100, the student clients 110 and the teacher client 130 communicate in real time through the server 120 to carry out online music teaching. For example, while a student is performing, the student client 110 can be implemented as a media terminal that captures media data such as video and audio of the student's performance and transmits it to the teacher client 130 through the server 120. The teacher client 130 can be implemented as a media playback device that receives and plays the media data, so that the teacher can follow the student's performance in real time. Meanwhile, the teacher client 130 can also be implemented as a media terminal that captures media data such as the teacher's feedback and guidance on the student's performance or a teaching demonstration, and transmits it to the student clients through the server 120. The student client 110 can be implemented as a media playback device that receives and plays the media data from the teacher client 130, so that the teacher can give real-time feedback on the student's performance or demonstrate to the student in real time. In short, both the student client 110 and the teacher client 130 can be implemented as a media terminal or a media playback device. To simplify the description, the specific types of media terminal and media playback device are not distinguished further below. Here, the media data includes, for example, teaching content such as instrument fingering, breath control, instrument sound, and guidance text, but is not limited thereto.

In general, a music teaching system places high demands on real-time performance, synchronization, and the like. For the server stage of the music teaching system, the present invention proposes a new data synthesis scheme. The server in the music teaching system is further illustrated below with reference to Fig. 2. It should be noted that the server according to the present invention can be used in a music teaching system, but is not limited thereto. For example, it can also be applied in real-time streaming media schemes such as video conferencing or live broadcasts of competitions.

Fig. 2 shows a block diagram of a server 200 for synthesizing multi-channel data according to some embodiments of the present invention. The server 200 can process media data from one or more media terminals and transmit the processed data to one or more media playback devices. Although the server 200 is depicted as a single entity, its functions can be distributed across multiple computing devices, computing clusters, or data centers, and its components can reside in multiple geographic locations.

The server 200 includes a receiver 210, a reference selector 220, a frame rate selector 230, a compositing engine 240, and a transmitter 250.

The receiver 210 is suitable for receiving audio data and video data from multiple media terminals. Each media terminal usually transmits its audio data and video data to the media playback device in the form of network packets. Here, video data refers to the multiple video data packets that the receiver 210 receives in succession. Each video data packet is a network packet into which a media terminal has encapsulated one video frame for network transmission. In an embodiment according to the present invention, an example of the message format of a video data packet is:

TCP_info+AV_Info+VideoData

Here, TCP_info is the TCP transport protocol header.

AV_Info contains the video frame control parameters:

DWORD c_type;    // control type

__int64 stamp;   // timestamp

DWORD c_value;   // control value

VideoData is the compressed video data corresponding to one video frame; the compression format is, for example, H.264, but is not limited thereto. The timestamp contained in AV_Info is the capture time of the video frame. In other words, it is the time at which the media terminal captured the original image.
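The packet layout above can be illustrated with a minimal C++17 sketch. It assumes that the transport header has already been stripped, that the three AV_Info fields are packed without padding, and that they arrive in host byte order; none of this is stated in the source. The names AVInfo, VideoFrame, and parseFrame are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical fixed-width rendering of the AV_Info fields named in the text.
// Packing and byte order are assumptions, not taken from the source.
#pragma pack(push, 1)
struct AVInfo {
    std::uint32_t c_type;   // control type (DWORD in the original)
    std::int64_t  stamp;    // capture timestamp (__int64 in the original)
    std::uint32_t c_value;  // control value (DWORD in the original)
};
#pragma pack(pop)

struct VideoFrame {
    AVInfo info;
    std::vector<std::uint8_t> data;  // compressed payload, e.g. H.264
};

// Splits a buffer laid out as "AV_Info + VideoData" into a frame.
// Returns an empty optional if the buffer is too short to hold AV_Info.
inline std::optional<VideoFrame> parseFrame(const std::uint8_t* buf, std::size_t len) {
    if (buf == nullptr || len < sizeof(AVInfo)) return std::nullopt;
    VideoFrame frame;
    std::memcpy(&frame.info, buf, sizeof(AVInfo));
    frame.data.assign(buf + sizeof(AVInfo), buf + len);
    return frame;
}
```

A real receiver would also need to validate c_type and handle partial reads from the socket, which this sketch leaves out.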

When it receives a video data packet, the receiver 210 can extract the video frame (AV_Info+VideoData) from it. In addition, the receiver 210 can be configured to include a network buffer 211 and can store the extracted video frames in the network buffer 211. In an embodiment according to the present invention, example code with which the receiver 210 stores a video frame in the network buffer 211 is as follows:

m_VideoRecRing->Set((char*)pBuf, nLen, (char*)&vsinfo, sizeof(AV_Info));

Here, pBuf is the temporary buffer holding the video frame, nLen is the length of the video frame, and vsinfo holds the video frame control parameters.

In addition, in an embodiment according to the present invention, the network buffer 211 is configured to include multiple partitions, so that each media terminal corresponds to one partition. The receiver 210 can store video frames from the same media terminal in the same partition.
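As a rough illustration of one partition per media terminal, the following C++17 sketch keys a queue of frames by a terminal identifier. The source does not describe how the buffer is implemented (its own example uses a ring buffer, m_VideoRecRing), so PartitionedBuffer, BufferedFrame, and the integer terminal id are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <map>
#include <utility>
#include <vector>

struct BufferedFrame {
    std::int64_t stamp;              // capture timestamp
    std::vector<std::uint8_t> data;  // compressed payload
};

// Hypothetical network buffer with one partition (queue) per media terminal.
class PartitionedBuffer {
public:
    // Appends a frame to the partition of the given terminal,
    // creating the partition on first use.
    void push(int terminalId, BufferedFrame frame) {
        partitions_[terminalId].push_back(std::move(frame));
    }

    // Returns the frames of one terminal in arrival order
    // (an empty queue if the terminal is unknown).
    const std::deque<BufferedFrame>& partition(int terminalId) const {
        static const std::deque<BufferedFrame> empty;
        auto it = partitions_.find(terminalId);
        return it == partitions_.end() ? empty : it->second;
    }

    std::size_t partitionCount() const { return partitions_.size(); }

private:
    std::map<int, std::deque<BufferedFrame>> partitions_;
};
```

Keeping each terminal's frames in their own queue preserves arrival order per channel, which the later alignment steps rely on.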

The reference selector 220 is suitable for selecting, according to the timestamps of the video frames of each media terminal, a reference time point for aligning the multiple channels of video frames. In general, each media terminal transmits its video frames in order of capture time. For each channel, the reference selector 220 can look up the timestamp of the video frame of that channel that was received first. It can then compare the timestamps found and take the latest one as the reference time point. The reference selector 220 can also delete, from each channel of video data, the video frames whose timestamps are earlier than the reference time point.
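The selection rule just described (take the latest of the first-received timestamps, then discard earlier frames) can be sketched in C++17 as follows. Each channel is reduced to a list of timestamps in arrival order; the names Stream, selectReferenceTime, and dropFramesBefore are hypothetical, and returning no result while any channel is still empty is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// One channel, reduced to its frame timestamps in arrival order.
using Stream = std::vector<std::int64_t>;

// Returns the latest of the first-received timestamps across all channels.
// Assumption: no reference point exists until every channel has a frame.
inline std::optional<std::int64_t> selectReferenceTime(const std::vector<Stream>& streams) {
    std::optional<std::int64_t> ref;
    for (const Stream& s : streams) {
        if (s.empty()) return std::nullopt;
        if (!ref || s.front() > *ref) ref = s.front();
    }
    return ref;
}

// Deletes, from every channel, the frames captured before the reference point.
inline void dropFramesBefore(std::vector<Stream>& streams, std::int64_t ref) {
    for (Stream& s : streams) {
        s.erase(std::remove_if(s.begin(), s.end(),
                               [ref](std::int64_t t) { return t < ref; }),
                s.end());
    }
}
```

For three channels whose first frames carry timestamps 1000, 1030, and 990, the reference time point under this rule is 1030, and the frames at 1000 and 990 are dropped.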

The frame rate selector 230 is suitable for selecting one channel from the video frames of the multiple media terminals as the synthesis reference data. In one embodiment, the frame rate selector 230 selects the channel with the highest frame rate as the synthesis reference data, but is not limited thereto.
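For the embodiment that picks the highest frame rate, a minimal C++17 sketch might look like this. It assumes the frame rate of each channel is already known (the source does not say how it is measured) and breaks ties in favour of the lower channel index; selectSynthesisReference is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Returns the index of the channel with the highest frame rate,
// or an empty optional if there are no channels.
// Ties go to the lower index (an assumption of this sketch).
inline std::optional<std::size_t> selectSynthesisReference(const std::vector<double>& frameRates) {
    if (frameRates.empty()) return std::nullopt;
    std::size_t best = 0;
    for (std::size_t i = 1; i < frameRates.size(); ++i) {
        if (frameRates[i] > frameRates[best]) best = i;
    }
    return best;
}
```

Using the fastest channel as the reference means every composite frame contains a fresh picture from that channel, while slower channels repeat their most recent frame.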

Starting from the reference time point, the compositing engine 240 can select the video frames in the synthesis reference data one at a time in chronological order and, in each channel of video data other than the synthesis reference data, look up the video frame whose timestamp is earlier than and closest to the timestamp of the selected frame. In other words, the compositing engine 240 can perform multiple alignment operations. In each alignment operation, it treats the selected frame and the frames found as one group of timestamp-matched video frames, although there may be small time differences between the timestamps within the group. Each alignment operation is illustrated below according to one embodiment of the present invention. For example, two media terminals, A and B, are connected to the server 200.

The video frames from media terminal A have a capture frame rate of 25 frames/second; starting from the reference time point, their timestamps are, in order: 40 ms, 80 ms, 120 ms, 160 ms, ...

The video frames from media terminal B have a capture frame rate of 10 frames/second; starting from the reference time point, their timestamps are, in order: 10 ms, 110 ms, 210 ms, 310 ms, ...

To simplify the description, the timestamps shown here all fall within a 1-second span, so their hour, minute, and second components are omitted and only the millisecond values are shown. The video frames of channel A serve as the synthesis reference data. In one alignment operation, the compositing engine 240 selects the video frame with timestamp 40 ms. It then looks up, in channel B (which is not the synthesis reference data), the video frame whose timestamp is earlier than and closest to 40 ms. The frame found is the one at 10 ms. Thus, in this alignment operation, the compositing engine 240 treats the frames at 40 ms and 10 ms as one group of timestamp-matched video frames. In the next alignment operation, the compositing engine 240 looks up, in channel B, the video frame whose timestamp is earlier than and closest to 80 ms. The frame found is again the one at 10 ms, so the frames at 80 ms and 10 ms form one group of timestamp-matched video frames. Likewise, 120 ms and 110 ms form a group of timestamp-matched video frames, and so on; the details are not repeated here.
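The alignment operation in the A/B example can be sketched in C++17 with a binary search over the sorted timestamps of the non-reference channel. The sketch reads "earlier than" as strictly earlier, which is an interpretation of the text, and returns no match when no earlier frame exists; latestBefore, AlignedPair, and alignToReference are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Returns the latest timestamp in `sorted` that is strictly earlier than `t`,
// or an empty optional if every timestamp is at or after `t`.
inline std::optional<std::int64_t> latestBefore(const std::vector<std::int64_t>& sorted,
                                                std::int64_t t) {
    auto it = std::lower_bound(sorted.begin(), sorted.end(), t);  // first element >= t
    if (it == sorted.begin()) return std::nullopt;
    return *(it - 1);
}

struct AlignedPair {
    std::int64_t refStamp;                    // frame chosen from the reference channel
    std::optional<std::int64_t> otherStamp;   // matched frame from the other channel
};

// Performs one alignment operation per reference frame, in chronological order.
inline std::vector<AlignedPair> alignToReference(const std::vector<std::int64_t>& reference,
                                                 const std::vector<std::int64_t>& other) {
    std::vector<AlignedPair> groups;
    for (std::int64_t t : reference) {
        groups.push_back({t, latestBefore(other, t)});
    }
    return groups;
}
```

With reference timestamps 40, 80, 120, 160 and other-channel timestamps 10, 110, 210, 310, this yields the groups (40, 10), (80, 10), (120, 110), and (160, 110), matching the walk-through above.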

It should be noted that time synchronization is established among the media terminals that communicate with the server 200. In other words, the timestamps of the video frames of every channel share the same time base. In each group of timestamp-matched video frames that the compositing engine 240 selects by performing the alignment operation according to timestamps, the capture times of the frames differ only slightly, so the group has high temporal synchronization. On this basis, when the compositing engine 240 goes on to synthesize each group of video frames, the individual sub-pictures in the resulting composite video frame have high temporal synchronization. In summary, the server 200 according to the present invention can synthesize video frames from multiple media terminals into a single channel of video frames, and the sub-pictures in each composite video frame are highly synchronized.

In addition, while synthesizing each group of timestamp-matched video frames into one video frame, the compositing engine 240 can also adjust the bitrate of that video frame. Specifically, the compositing engine 240 first performs a decoding operation on each group of video frames. For example, if a group contains 4 video frames (i.e., there are 4 media terminals), the compositing engine 240 obtains 4 images of 640*480 by decoding. Depending on the desired bitrate of the composite, the compositing engine 240 can choose to resize the 4 images. For example, the compositing engine 240 adjusts each image to 320*240 through a cropping operation, but is not limited thereto. In one embodiment, example code with which the compositing engine 240 adjusts the image size is:

void CDsCaptureDemoDlg::YUVToYUV(BYTE* pDesStr, int DesWidth, int DesHeight, BYTE* pSourceStr, int SourceWidth, int SourceHeight)

Before the adjustment, SourceWidth=640 and SourceHeight=480, i.e., a 640*480 YUV image.

After the conversion, DesWidth=320 and DesHeight=240, i.e., a 320*240 YUV image.

The compositing engine 240 then combines the cropped images into one picture and encodes it as a composite video frame. In this way, by adjusting the picture size, the compositing engine 240 according to the present invention can generate composite video frames at multiple bitrates.
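The size adjustment and picture composition can be illustrated on a single 8-bit plane (for example the Y plane of a YUV image). The source does not show the body of YUVToYUV or the layout of the composite, and it describes the adjustment as cropping; the C++17 sketch below instead uses nearest-neighbour scaling and a 2x2 tile layout, both of which are assumptions. Plane, resizeNearest, and tile2x2 are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A single 8-bit image plane stored row by row.
struct Plane {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;
};

// Nearest-neighbour resize of one plane to dstW x dstH.
inline Plane resizeNearest(const Plane& src, int dstW, int dstH) {
    Plane dst{dstW, dstH,
              std::vector<std::uint8_t>(static_cast<std::size_t>(dstW) * dstH)};
    for (int y = 0; y < dstH; ++y) {
        const int sy = y * src.height / dstH;  // source row for this output row
        for (int x = 0; x < dstW; ++x) {
            const int sx = x * src.width / dstW;  // source column for this output column
            dst.pixels[static_cast<std::size_t>(y) * dstW + x] =
                src.pixels[static_cast<std::size_t>(sy) * src.width + sx];
        }
    }
    return dst;
}

// Tiles exactly four equally sized planes into a 2x2 mosaic:
// tile 0 top-left, 1 top-right, 2 bottom-left, 3 bottom-right.
inline Plane tile2x2(const std::vector<Plane>& tiles) {
    const int w = tiles[0].width;
    const int h = tiles[0].height;
    Plane out{2 * w, 2 * h,
              std::vector<std::uint8_t>(static_cast<std::size_t>(4) * w * h)};
    for (int i = 0; i < 4; ++i) {
        const int ox = (i % 2) * w;  // horizontal offset of this tile
        const int oy = (i / 2) * h;  // vertical offset of this tile
        for (int y = 0; y < h; ++y) {
            for (int x = 0; x < w; ++x) {
                out.pixels[static_cast<std::size_t>(oy + y) * out.width + ox + x] =
                    tiles[i].pixels[static_cast<std::size_t>(y) * w + x];
            }
        }
    }
    return out;
}
```

Resizing four 640*480 planes to 320*240 and tiling them gives a 640*480 composite; a full implementation would apply the same steps to the chroma planes and then pass the result to the encoder.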

The transmitter 250 can transmit the generated composite video frames to a media playback device. Specifically, the transmitter 250 of the present invention can transmit to the media playback device the composite video bitrate that matches the current network speed. In this way, the server 200 achieves high real-time performance when transmitting video data from multiple media terminals to the media playback device.
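One way to match a bitrate to the current network speed is to keep several encoded variants and pick the highest one that fits the measured bandwidth. The source does not say how the network speed is measured or how the match is made, so the C++17 sketch below is only an assumed policy; StreamVariant, pickVariant, and the fallback to the lowest variant are hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// One encoded version of the composite stream.
struct StreamVariant {
    int width;
    int height;
    int bitrateKbps;
};

// Picks the highest-bitrate variant that fits the available bandwidth.
// If none fits, falls back to the lowest-bitrate variant (an assumption).
inline std::optional<StreamVariant> pickVariant(const std::vector<StreamVariant>& variants,
                                                int availableKbps) {
    if (variants.empty()) return std::nullopt;
    const StreamVariant* best = nullptr;
    const StreamVariant* lowest = &variants.front();
    for (const StreamVariant& v : variants) {
        if (v.bitrateKbps < lowest->bitrateKbps) lowest = &v;
        if (v.bitrateKbps <= availableKbps &&
            (best == nullptr || v.bitrateKbps > best->bitrateKbps)) {
            best = &v;
        }
    }
    return best != nullptr ? *best : *lowest;
}
```

The bitrate figures used with such a function would come from the encoder settings of each variant; none are given in the source.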

In addition, the server 200 according to the present invention can also synthesize the received channels of audio data into a single channel of audio data. Specifically, each channel of audio data received by the receiver 210 includes one or more audio frames. Each audio frame includes multiple audio samples and a timestamp for the frame. The timestamp is, for example, the capture time of the first of the audio samples. The compositing engine 240 can perform a time alignment operation on the audio frames of the multiple channels according to these timestamps, and then synthesize the aligned audio frames into one channel of composite audio frames.
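The source does not specify how aligned audio frames are combined. A common approach, shown here purely as an assumption, is to add the samples of the aligned frames and clip the sum to the 16-bit range. The C++17 sketch treats each aligned frame as a vector of signed 16-bit PCM samples; mixFrames is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mixes already time-aligned audio frames by summing samples position by
// position and clipping to the int16 range. Shorter frames are treated as
// silent past their end.
inline std::vector<std::int16_t> mixFrames(const std::vector<std::vector<std::int16_t>>& frames) {
    std::size_t length = 0;
    for (const auto& f : frames) length = std::max(length, f.size());

    std::vector<std::int16_t> mixed(length, 0);
    for (std::size_t i = 0; i < length; ++i) {
        std::int32_t sum = 0;  // wide accumulator avoids overflow before clipping
        for (const auto& f : frames) {
            if (i < f.size()) sum += f[i];
        }
        mixed[i] = static_cast<std::int16_t>(
            std::clamp<std::int32_t>(sum, INT16_MIN, INT16_MAX));
    }
    return mixed;
}
```

Hard clipping is the simplest choice; a production mixer might scale or apply a limiter instead to reduce distortion when many channels are loud at once.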

Fig. 3 shows a flow chart of a method 300 for synthesizing multi-channel data according to some embodiments of the present invention. The method 300 is suitable for execution in a media server according to the present invention.

As shown in Fig. 3, the method 300 starts at step S310. In step S310, video data sent by multiple media terminals is received. Each received channel of video data includes one or more video frames, and each video frame includes a timestamp corresponding to the capture time of that frame. The method then proceeds to step S320, in which a reference time point for aligning the received channels of video data is selected according to the timestamps of the video frames in each channel. According to one embodiment of the present invention, in step S320 the timestamps of the first-received video frame of each received channel are compared, and the latest timestamp is selected as the reference time point. In step S320, the video frames in each received channel whose timestamps are earlier than the reference time point can also be deleted.

In addition, the method 300 includes step S330. In step S330, one of the received channels of video data is selected as the synthesis reference data according to the frame rate of each channel. According to one embodiment of the present invention, in step S330 the channel with the highest frame rate among the received channels is selected as the synthesis reference data. The selection is not limited to this, however: depending on the desired frame rate of the composite video frames, a channel with a different frame rate can also be selected as the synthesis reference data in step S330.

The method 300 then performs step S340. In step S340, starting from the selected reference time point, the video frames in the synthesis reference data are selected one at a time in chronological order, and for each selected frame, the video frame whose timestamp is earlier than and closest to that of the selected frame is looked up in each received channel other than the synthesis reference data. In this way, step S340 selects one video frame from each channel based on the timestamps and treats the selected frames as one group of timestamp-matched video frames. The method 300 can then perform step S350. In step S350, a synthesis operation is performed on the selected video frame and the frames found, to obtain composite video frames at one or more bitrates. According to one embodiment of the present invention, step S350 first performs a decoding operation on the selected video frame and the frames found, and then performs the synthesis operation on the decoded frames to obtain composite video frames at one or more bitrates. To synthesize multiple bitrates, each decoded video frame (i.e., one image) can be cropped before synthesis to adjust the picture size. More detailed implementations of the method 300 have already been disclosed in the description of Fig. 2 and are not repeated here.

Fig. 4 shows a flow chart of a method 400 for synthesizing multi-channel data according to some embodiments of the present invention. The method 400 is suitable for execution in a media server according to the present invention.

As shown in Fig. 4, the method 400 starts at step S410. In step S410, video data sent by multiple media terminals is received. Each received channel of video data includes one or more video frames, and each video frame includes a timestamp corresponding to the capture time of that frame. The method then proceeds to step S420, in which a reference time point for aligning the received channels of video data is selected according to the timestamps of the video frames in each channel. According to one embodiment of the present invention, in step S420 the timestamps of the first-received video frame of each received channel are compared, and the latest timestamp is selected as the reference time point. In step S420, the video frames in each received channel whose timestamps are earlier than the reference time point can also be deleted.

In addition, the method 400 includes step S430. In step S430, one of the received channels of video data is selected as the synthesis reference data according to the frame rate of each channel. According to one embodiment of the present invention, in step S430 the channel with the highest frame rate among the received channels is selected as the synthesis reference data. The selection is not limited to this, however: depending on the desired frame rate of the composite video frames, a channel with a different frame rate can also be selected as the synthesis reference data in step S430.

The method 400 then performs step S440. In step S440, starting from the selected reference time point, the video frames in the synthesis reference data are selected one at a time in chronological order, and for each selected frame, the video frame whose timestamp is earlier than and closest to that of the selected frame is looked up in each received channel other than the synthesis reference data. In this way, step S440 selects one video frame from each channel based on the timestamps and treats the selected frames as one group of timestamp-matched video frames. The method 400 can then perform step S450. In step S450, a synthesis operation is performed on the selected video frame and the frames found, to obtain composite video frames at one or more bitrates. According to one embodiment of the present invention, step S450 first performs a decoding operation on the selected video frame and the frames found, and then performs the synthesis operation on the decoded frames to obtain composite video frames at one or more bitrates. To synthesize multiple bitrates, each decoded video frame (i.e., one image) can be cropped before synthesis to adjust the picture size.

In addition, according to one embodiment of the present invention, the method 400 further includes step S460. In step S460, audio data sent by the multiple capture terminals is received. Each channel of audio data includes one or more audio frames, and each audio frame includes a timestamp corresponding to its capture time. The method 400 then performs step S470, in which a time alignment operation is performed on the audio frames of the received channels of audio data according to the timestamps of the audio frames in each channel. The method 400 then proceeds to step S480, in which the aligned audio frames are synthesized into one channel of composite audio frames. In addition, the method 400 includes step S490, in which the composite video frames and/or composite audio frames are transmitted to a media playback device. More detailed implementations of the method 400 have already been disclosed in the description of Fig. 2 and are not repeated here.

In the description provided herein, numerous specific details are set forth. It is to be understood, however, that embodiments of the invention can be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.

Similarly, it should be appreciated that, in the above description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single embodiment disclosed above. Therefore, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.

Those skilled in the art should understand that the modules, units, or components of the devices in the examples disclosed herein can be arranged in a device as described in the embodiment, or alternatively can be located in one or more devices different from the device in the example. The modules in the foregoing examples can be combined into one module or further divided into multiple sub-modules.

Those skilled in the art will appreciate that the modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of the embodiments can be combined into one module, unit, or component, and can furthermore be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings), and all processes or units of any method or device so disclosed, can be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) can be replaced by an alternative feature serving the same, equivalent, or similar purpose.

Furthermore, those skilled in the art will understand that although some embodiments described herein include certain features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.

Furthermore, some of the embodiments are described herein as a method, or a combination of method elements, that can be implemented by a processor of a computer system or by other means of carrying out the function. Thus, a processor with the necessary instructions for carrying out such a method or method element forms a means for carrying out the method or method element. Furthermore, an element of an apparatus embodiment described herein is an example of a means for carrying out the function performed by that element for the purpose of carrying out the invention.

As used herein, unless otherwise specified, the use of the ordinals "first", "second", "third", and so on to describe a common object merely indicates that different instances of like objects are being referred to, and is not intended to imply that the objects so described must be in a given sequence, whether temporally, spatially, in ranking, or in any other manner.

While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having the benefit of the above description, will appreciate that other embodiments can be devised within the scope of the invention as described herein. It should also be noted that the language used in this specification has been selected principally for readability and instructional purposes, and not to delineate or circumscribe the subject matter of the invention. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The disclosure of the invention is intended to be illustrative, not limiting, of the scope of the invention, which is defined by the appended claims.

Claims

1. A method for synthesizing multi-channel data, the method being suitable for execution in a server and comprising:

receiving video data transmitted by multiple media terminals, wherein each channel of received video data comprises one or more video frames, and each video frame comprises a timestamp corresponding to the acquisition time of that video frame;

selecting, according to the timestamps of the video frames in each channel of video data, a reference time point for aligning the received multi-channel video data;

selecting, according to the frame rate of each channel of video data, one channel of the received multi-channel video data as synthesis reference data;

starting from the selected reference time point, selecting video frames in the synthesis reference data one at a time in chronological order and, for each selected video frame, querying each received channel of video data other than the synthesis reference data for the one frame whose timestamp is earlier than and closest to that of the selected video frame; and

performing a synthesis operation on the selected video frame and the queried video frames to obtain synthesized video frames of one or more bitstreams;

wherein the step of selecting a reference time point for aligning the received multi-channel video data comprises:

comparing the timestamps of the first-received video frames of the respective channels of received video data, and selecting the timestamp with the latest time value as the reference time point.
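
As a non-limiting illustration, the reference time point selection and the frame query recited in claim 1 might be sketched as follows. The `Frame` type and the function names `reference_time_point`, `closest_earlier`, and `align` are hypothetical and do not appear in the original disclosure. The sketch assumes that timestamps are integers in milliseconds, that each channel's frames are already sorted by timestamp, and that "earlier than" means strictly earlier.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """Hypothetical video frame: capture timestamp (ms) plus an opaque payload."""
    ts: int
    data: str


def reference_time_point(streams):
    """Return the latest timestamp among the first-received frames of all channels."""
    return max(frames[0].ts for frames in streams.values())


def closest_earlier(frames, ts):
    """Return the frame with the largest timestamp strictly earlier than ts, or None.

    Assumes frames are sorted by timestamp; strictness is an assumption of this sketch.
    """
    idx = bisect_left([f.ts for f in frames], ts)
    return frames[idx - 1] if idx > 0 else None


def align(streams, base_id):
    """Yield (base frame, {channel id: matched frame}) starting at the reference point."""
    ref = reference_time_point(streams)
    for base in streams[base_id]:
        if base.ts < ref:
            continue  # frames before the reference time point are not synthesized
        matches = {
            sid: closest_earlier(frames, base.ts)
            for sid, frames in streams.items()
            if sid != base_id
        }
        yield base, matches
```

Each yielded pair holds the frames that would be passed together to the synthesis operation. A channel that has no frame earlier than the selected frame is mapped to `None`, and how such a gap is handled is left open here.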

2. The method of claim 1, further comprising:

deleting, from each channel of received video data, the video frames whose timestamps are earlier than the reference time point.
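
By way of example only, the delete operation of claim 2 might look like the following sketch. The function name `drop_stale_frames` and the representation of a frame as a `(timestamp, payload)` tuple are assumptions made for illustration.

```python
def drop_stale_frames(streams, ref):
    """Return a copy of each channel without frames captured before ref.

    streams maps a channel id to a list of (timestamp, payload) tuples;
    this layout is hypothetical and chosen only to keep the sketch short.
    """
    return {
        sid: [(ts, payload) for ts, payload in frames if ts >= ref]
        for sid, frames in streams.items()
    }
```

Frames whose timestamp equals the reference time point are kept, since the claim deletes only frames that are earlier than it.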

3. The method of claim 1, wherein the step of selecting one channel of the received multi-channel video data as synthesis reference data comprises:

selecting the channel with the highest frame rate among the received multi-channel video data as the synthesis reference data.
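
The selection in claim 3 could be sketched as below. The claim does not state how the frame rate of a channel is obtained, so estimating it from capture timestamps is an assumption of this sketch, as are the names `estimate_fps` and `pick_base_channel`.

```python
def estimate_fps(timestamps_ms):
    """Estimate frames per second from sorted capture timestamps in milliseconds.

    Assumption of this sketch: the rate is derived from timestamps rather than
    from a value declared by the terminal.
    """
    if len(timestamps_ms) < 2:
        return 0.0
    span = timestamps_ms[-1] - timestamps_ms[0]
    if span <= 0:
        return 0.0
    return 1000 * (len(timestamps_ms) - 1) / span


def pick_base_channel(streams):
    """Return the id of the channel with the highest estimated frame rate."""
    return max(streams, key=lambda sid: estimate_fps(streams[sid]))
```

When two channels have the same estimated rate, `max` returns the one that appears first in the mapping. The claim does not specify a tie-breaking rule.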

4. The method of any one of claims 1-3, wherein the step of performing a synthesis operation on the selected video frame and the queried video frames to obtain synthesized video frames of one or more bitstreams comprises:

performing a decoding operation on the selected video frame and the queried video frames; and

performing a synthesis operation on the decoded video frames to obtain the synthesized video frames of one or more bitstreams.

5. The method of claim 4, wherein the step of performing a synthesis operation on the decoded video frames to obtain the synthesized video frames comprises:

performing a cropping operation on one or more of the decoded video frames to adjust the picture size.
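
One possible form of the cropping operation in claim 5 is a center crop, sketched below. The claim does not say which region of the picture is kept, so cropping around the center is an assumption, and the function name `center_crop` and the row-list representation of a decoded frame are hypothetical.

```python
def center_crop(pixels, out_w, out_h):
    """Crop a decoded frame, given as a list of pixel rows, around its center.

    Center cropping is an assumption of this sketch; the claim only requires
    that cropping adjusts the picture size.
    """
    in_h = len(pixels)
    in_w = len(pixels[0]) if pixels else 0
    if out_w > in_w or out_h > in_h:
        raise ValueError("crop size exceeds frame size")
    top = (in_h - out_h) // 2
    left = (in_w - out_w) // 2
    return [row[left:left + out_w] for row in pixels[top:top + out_h]]
```

A frame cropped this way can then be placed into its region of the synthesized picture without scaling.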

6. The method of claim 1, further comprising:

receiving audio data transmitted by the multiple media terminals, wherein each channel of audio data comprises one or more audio frames, and each audio frame comprises a timestamp corresponding to its acquisition time;

performing a time alignment operation on the audio frames in the received multi-channel audio data according to the timestamps of the audio frames in each channel of audio data; and

synthesizing the aligned audio frames into one channel of synthesized audio frames.
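
A simplified sketch of the audio alignment and synthesis in claim 6 is given below. It assumes that audio frames carry 16-bit PCM samples, that frames from different channels are treated as aligned only when their timestamps are equal, and that mixing is done by summing and clamping. None of these details is stated in the claim, and the name `mix_aligned` is hypothetical.

```python
from collections import defaultdict

INT16_MIN, INT16_MAX = -32768, 32767


def mix_aligned(channels):
    """Mix audio frames that share a timestamp into one output channel.

    channels maps a channel id to a list of (timestamp, samples) tuples.
    A channel with no frame at a given timestamp contributes silence.
    """
    buckets = defaultdict(list)
    for frames in channels.values():
        for ts, samples in frames:
            buckets[ts].append(samples)

    mixed = []
    for ts in sorted(buckets):
        group = buckets[ts]
        length = max(len(s) for s in group)
        out = []
        for i in range(length):
            total = sum(s[i] for s in group if i < len(s))
            # Clamp to the 16-bit range to avoid wrap-around distortion.
            out.append(max(INT16_MIN, min(INT16_MAX, total)))
        mixed.append((ts, out))
    return mixed
```

A practical system would likely align frames within a tolerance window rather than on exact timestamp equality; exact matching is used here only to keep the example short.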

7. The method of claim 6, wherein

the synthesized video frames and/or the synthesized audio frames are transmitted to a media playback device.

8. A server for synthesizing multi-channel data, comprising:

a receiver, adapted to receive video data transmitted by multiple media terminals, wherein each channel of received video data comprises one or more video frames, and each video frame comprises a timestamp corresponding to the acquisition time of that video frame;

a reference selector, adapted to select, according to the timestamps of the video frames in each channel of video data, a reference time point for aligning the received multi-channel video data;

a frame rate selector, adapted to select, according to the frame rate of each channel of video data, one channel of the received multi-channel video data as synthesis reference data; and

a compositing engine, adapted to, starting from the selected reference time point, select video frames in the synthesis reference data one at a time in chronological order and, for each selected video frame, query each received channel of video data other than the synthesis reference data for the one frame whose timestamp is earlier than and closest to that of the selected video frame,

wherein the reference selector is adapted to select the reference time point for aligning the received multi-channel video data in the following manner:

9. The server of claim 8, wherein the reference selector is further adapted to:

10. The server of claim 8, wherein the frame rate selector is adapted to select one channel of the received multi-channel video data as synthesis reference data in the following manner:

11. The server of any one of claims 8-10, wherein the compositing engine is adapted to perform a synthesis operation on the selected video frame and the queried video frames in the following manner, to obtain synthesized video frames of one or more bitstreams:

12. The server of claim 11, wherein the compositing engine, before performing the synthesis operation on the decoded video frames, is further adapted to:

13. The server of claim 8, wherein

the receiver is further adapted to receive audio data transmitted by the multiple media terminals, wherein each channel of audio data comprises one or more audio frames, and each audio frame comprises a timestamp corresponding to its acquisition time; and

the compositing engine is further adapted to perform a time alignment operation on the audio frames in the received multi-channel audio data according to the timestamps of the audio frames in each channel of audio data, and

14. The server of claim 13, further comprising a transmitter adapted to transmit the synthesized video frames and/or the synthesized audio frames to a media playback device.

15. A music teaching system, comprising:

a media terminal, adapted to capture video data and audio data;

the server for synthesizing multi-channel data according to any one of claims 8-14; and

a media playback device, adapted to obtain synthesized video frames and/or synthesized audio frames from the server.