CN103024339B - Method and apparatus for audio mixing based on video source - Google Patents

Method and apparatus for audio mixing based on video source

Info

Publication number
CN103024339B
Authority
CN
China
Prior art keywords
video source
meeting
place
source information
maximum
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210384236.0A
Other languages
Chinese (zh)
Other versions
CN103024339A (en
Inventor
王东琦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201210384236.0A priority Critical patent/CN103024339B/en
Publication of CN103024339A publication Critical patent/CN103024339A/en
Application granted granted Critical
Publication of CN103024339B publication Critical patent/CN103024339B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

A method for audio mixing based on video source information. The method comprises: receiving video source information of the other conference sites, that is, all sites in a video conference except the N loudest sites (the N sites whose audio signals have the largest envelope or energy); generating a video source information identifier from the video source information; and, according to the identifier, placing sites that have the same video source information identifier in the same group. A mixer allocation device and an MCU device are also provided. By classifying the other sites according to their video source information identifiers, embodiments of the invention mix the audio signals of the N loudest sites for all other sites of the same class in a single shared mixer, which reduces the computing resources used and the internal system bandwidth occupied.

Description

Method and apparatus for audio mixing based on video source
Technical field
The present invention relates to the field of audio signal processing, and in particular to a method and apparatus for audio mixing based on video source.
Background technology
To reduce costs and improve communication efficiency, more and more enterprises choose telepresence systems to build their video conferencing systems. A telepresence system typically features life-size images, eye-to-eye contact, image stitching, and sound-image coordination. By providing life-size, high-definition video, spatial audio, and a specially designed environment, it achieves the effect of face-to-face communication. Telepresence not only enables remote communication but also reproduces reality, providing over the network an immersive, face-to-face meeting experience that makes participants feel as if they were in the same room. It comprises not only network equipment and terminal equipment but even tables and chairs, so as to build a telepresence conference room as a whole.
Fig. 1 (in which sites 10b and 10c have the same structure as 10a) is a schematic diagram of a typical video conference with participants at multiple sites. The telepresence system 100 shown in Fig. 1 has three sites 10 at different locations. Each site 10 comprises display devices 102 for showing images of the remote sites, camera devices 105 for capturing the image of the local site, microphone devices 104 for capturing the sound of the local site, and loudspeakers 103 for presenting the sound of the remote sites. A processing device 106 processes the video signal captured by the camera devices 105 and the audio signal captured by the microphone devices 104 at the local site 10 and, after processing such as speech enhancement, image enhancement, and audio/video encoding, sends the signals over the network to the MCU (Multipoint Control Unit) 11. The MCU also receives the audio, video, and data signals sent by the other sites 10b and 10c.
According to user control, the MCU mixes or switches the audio, video, and data signals and then sends the processed data to the processing device 106 at each site 10. After receiving the audio, video, and data signals from the MCU 11, the processing device 106 processes them, for example by decoding the audio and video; the audio signal is then presented to the participants at the site through the loudspeakers 103 and the video signal through the display devices 102. In the diagram, each display device 102 corresponds to only one loudspeaker 103; for example, display device 102a corresponds to loudspeaker 103a. An actual installation, however, may use multiple loudspeakers to reproduce the sound of a remote site.
In a telepresence system, to achieve a life-size effect and good communication, the display screens are usually large, for example 72-inch displays. Ideally, the sound that local participants perceive from a remote participant matches the position at which that participant's image is presented at the local site. For example, when participant 101b at a remote site speaks, the sound should come from the right-hand side of display device 102a. If the position at which a remote participant's sound is presented does not match the position at which that participant's image is presented, for example if the sound of participant 101b is presented at the position of 101e at the local site, communication is hindered. The MCU device 11 must therefore perform sound-image coordination (matching the sound position to the image position) when mixing audio.
The prior art handles the above problem as follows:
In a video conferencing system, each terminal device sends the audio signal captured at its site over the network to the MCU device to which it is connected. The MCU device then sends the terminal device the audio signals received from the other sites. In practice, however, the MCU cannot send a terminal the audio signals of all other sites. To keep equipment costs down, the computing capability of terminal devices and of the MCU device is limited, so a terminal cannot process all the audio signals from the MCU simultaneously; and to keep operating costs down, there is not enough bandwidth to send all other sites' audio signals to the terminal. For both reasons, the MCU does not send site A the audio signals of all sites other than A. Instead, it selects a limited number of sites from among them according to some strategy, mixes their audio signals, and sends the mix to site A.
The mixing strategy of the MCU in the prior art is described below with reference to Fig. 2. The audio bitstreams exchanged over the network between the MCU and the sites connected to it are defined as follows:
one symbol denotes the bitstream that the i-th site Ti connected to the MCU sends to the MCU, for example the bitstream that site T1 sends to the MCU in Fig. 2;
a second symbol denotes the bitstream that the MCU sends to the i-th site connected to it, for example the bitstream that the MCU sends to site T1 in Fig. 2.
A common way to implement mixing is as follows:
Step 1: from the bitstreams of the sites input to the mixer, find the N signals (4 in the figure) with the largest envelope (or energy); in the figure these are, in descending order, the streams of T1, T2, T3, and T4.
Step 2: according to the mixing strategy, choose different sites to mix for different destination sites.
Commonly, if a site Ti is one of the N loudest sites, the other (N-1) loudest sites are mixed and the mix is sent to Ti. For example, T1 in the figure is one of the four loudest sites, so the MCU sends T1 a stream that combines the streams of T2, T3, and T4.
For a site that is not among the four loudest, such as site T5, the streams of all four loudest sites (T1, T2, T3, and T4) are mixed.
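The two steps above can be sketched as follows. This is only an illustration under stated assumptions: the function names `select_loudest` and `mix_sources` and the energy values are hypothetical, and the sketch returns the list of sites whose audio would be mixed rather than performing any signal processing.

```python
def select_loudest(energies, n):
    """Step 1: return the ids of the n sites with the largest energy, loudest first.

    `energies` maps a site id to a measured envelope/energy value (hypothetical data).
    """
    return sorted(energies, key=energies.get, reverse=True)[:n]


def mix_sources(site, loudest):
    """Step 2: return the sites whose audio is mixed into the stream sent to `site`.

    A site among the N loudest hears the other N-1 loudest sites;
    any other site hears all N loudest sites.
    """
    return [s for s in loudest if s != site]


# Hypothetical energy values for five sites, matching the T1..T5 example.
energies = {"T1": 9.0, "T2": 7.5, "T3": 6.0, "T4": 4.2, "T5": 1.1}
loudest = select_loudest(energies, 4)
print(mix_sources("T1", loudest))  # T1 is among the loudest: it hears T2, T3, T4
print(mix_sources("T5", loudest))  # T5 is not: it hears T1, T2, T3, T4
```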
Because the number of display screens at the local site is limited, a remote site's video may not be shown locally. Even so, as long as the remote site's sound is among the N loudest, its audio signal must still take part in mixing. This requires sound-image coordination to keep the picture and the sound image at the site consistent.
When a remote site is a single-screen site, it sends only one video signal to the local site; when the local site is a three-screen site, that single video signal can be presented on only one of the local display devices. A common mixing strategy in this case is as follows: if the remote site's video signal is shown on screen i of the local three-screen site, the remote site's sound is presented by the loudspeaker corresponding to screen i, as shown in the left part of Fig. 3. If the single-screen site's video signal is not shown on any display device of the local three-screen site, its sound is presented by the center loudspeaker corresponding to the middle screen, as shown in the right part of Fig. 3.
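The single-screen rule just described amounts to a small lookup. The sketch below assumes the three local screens are labeled "L", "C", and "R"; the function name `speaker_for_single_screen_remote` and the labels are illustrative, not taken from the patent.

```python
LOCAL_SCREENS = ("L", "C", "R")  # assumed labels for a three-screen local site


def speaker_for_single_screen_remote(shown_on):
    """Return the local loudspeaker that presents a single-screen remote site's sound.

    `shown_on` is the local screen showing the remote video ("L", "C", or "R"),
    or None when the remote video is not shown on any local screen.
    """
    if shown_on is None:
        return "C"  # not shown: use the center loudspeaker of the middle screen
    if shown_on not in LOCAL_SCREENS:
        raise ValueError(f"unknown screen: {shown_on!r}")
    return shown_on  # shown on screen i: use the loudspeaker of screen i


print(speaker_for_single_screen_remote("R"))
print(speaker_for_single_screen_remote(None))
```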
In a multi-party conference with more than two parties, the limited number of display screens means that the local site can show only one, two, or three video signals of a remote site. When the number of a remote site's video signals shown at the local site is less than the number that the remote site actually sends, sound-image coordination is required.
One processing method is as follows. When only one of a remote site's three video signals is presented, on display device i of the local site, all of that remote site's sound is presented by the loudspeaker corresponding to display device i. As shown in Fig. 4(3), only the R video signal of the remote site's three video signals is shown, on the C screen of the local site, so all of the remote site's sound is presented by the loudspeaker corresponding to the local C screen;
When two of the remote site's video signals are presented at the local site, sound is presented according to a nearest-neighbor principle. In Fig. 4(1), the video signal of the remote site's C screen is not presented on any local display device, so the audio signal corresponding to the remote C screen is presented by the loudspeaker corresponding to the local L screen; in Fig. 4(2), the video signal of the remote site's L screen is not presented on any local display device, so the audio signal corresponding to the remote L screen is presented by the loudspeaker corresponding to the local L screen;
When none of the remote site's three video signals is shown at the local site, the local site presents the remote site's sound at its original orientation. As shown in Fig. 4(4), the audio signals corresponding to the remote site's L, C, and R video streams are presented by the loudspeakers corresponding to the local L, C, and R display devices, respectively.
During a video conference, users are often allowed to choose freely which remote site's video they watch. Because each site may select different remote sites, the MCU's mixing strategy also differs from site to site. The common approach is to perform sound-image-coordinated mixing separately for each site, as shown in Fig. 5.
Summary of the invention
In the course of making the invention, the inventor found the following problems in the prior art:
A separate mixer must be allocated to process the signal of each site, and each mixer occupies computing resources of its own; after mixing, the mixed data must be sent to the encoder corresponding to each site, which occupies internal system bandwidth.
In view of this, it is necessary to provide a method and apparatus for audio mixing based on video source to solve the above problems.
In one aspect, an embodiment of the invention provides a method for audio mixing based on video source information, the method comprising:
receiving video source information of the other conference sites, that is, all sites in the video conference except the N loudest sites;
generating a video source information identifier from the video source information;
according to the video source information identifier, placing sites that have the same identifier in the same group;
receiving the audio signals of the N loudest sites;
sending the video source information of the sites in the same group, together with the audio signals of the N loudest sites, to the same mixer for sound-image-coordinated mixing.
In another aspect, an embodiment of the invention provides a mixer allocation device, comprising:
a receiver module, which receives the video source information of the other conference sites, that is, all sites in the video conference except the N loudest sites, and the audio signals of the N loudest sites;
a video source information identifier generation module, connected to the receiver module, which generates a video source information identifier from the video source information;
a grouping module, connected to the video source information identifier generation module, which places sites that have the same video source information identifier in the same group;
a sending module, which sends the video source information of the sites in the same group, together with the audio signals of the N loudest sites, to the same mixer for sound-image-coordinated mixing.
In yet another aspect, an embodiment of the invention provides a multipoint control unit, comprising:
a decoder, for decoding the data sent by the sites;
a mixer allocation device, comprising: a receiver module, which receives the video source information of the other conference sites, that is, all sites in the video conference except the N loudest sites, and the audio signals of the N loudest sites; a video source information identifier generation module, connected to the receiver module, which generates a video source information identifier from the video source information; a grouping module, connected to the video source information identifier generation module, which places sites that have the same video source information identifier in the same group; and a sending module, which sends the video source information of the sites in the same group, together with the audio signals of the N loudest sites, to the same mixer for sound-image-coordinated mixing;
a mixer, connected to the mixer allocation device, where the same mixer is applied to the sites in the same group to perform sound-image-coordinated mixing of the video source information and the audio signals of the N loudest sites;
an encoder, connected to the mixer, for encoding the video source information and the audio signals of the N loudest sites after mixing.
Embodiments of the invention classify the sites other than the N loudest sites according to their video source information identifiers. For all other sites of the same class, that is, sites with the same video source information identifier, the video source information and the audio signals of the N loudest sites are processed by one shared mixer for sound-image-coordinated mixing, which saves computing resources and reduces the internal system bandwidth occupied.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed for describing them are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a network diagram for the background art;
Fig. 2 is a mixing strategy diagram for the background art;
Fig. 3 is a sound-image coordination strategy diagram for the background art;
Fig. 4 is a sound-image coordination strategy diagram for the background art;
Fig. 5 is a sound-image coordination processing diagram for the background art;
Fig. 6 is a flowchart of the method of Embodiment 1;
Fig. 7 is a structure diagram of Embodiment 2;
Fig. 8 is a composition diagram of the video source information identifier of Embodiment 2;
Fig. 9 is a structure diagram of the mixer allocation device of Embodiment 3;
Fig. 10 is a structure diagram of the video source information identifier generation module of Embodiment 3;
Fig. 11 is a structure diagram of the MCU device of Embodiment 4.
Detailed description
To make the objects, technical solutions, and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the invention. All other embodiments that a person of ordinary skill in the art obtains from these embodiments without creative effort fall within the scope of protection of the invention.
An embodiment of the invention provides a method for audio mixing based on video source; referring to the figure, the method comprises:
S101: receive the video source information of the other conference sites, that is, all sites in the video conference except the N loudest sites;
Here, the N loudest sites are the N sites, among all sites in the video conference, whose audio signals have the largest envelope or energy.
S103: generate a video source information identifier from the video source information;
Analyze the categories of information contained in the video source information; represent each information category by the number and position of bits in a data unit; represent the specific information within each category by the values of those bits; generate the single-screen video source information identifier represented by the data units; and generate the video source information identifier from the single-screen identifiers.
A single-screen video source information identifier consists of N data units. Each data unit corresponds to the video source information of one of the N loudest sites. When a video source of one of the N loudest sites has not been selected, the value of its data unit is set to zero.
S105: according to the video source information identifier, place sites that have the same identifier in the same group.
Embodiments of the invention classify the sites other than the N loudest sites according to their video source information identifiers. For all other sites of the same class, that is, sites with the same video source information identifier, the video source information and the audio signals of the N loudest sites are processed by one shared mixer for sound-image-coordinated mixing, which saves computing resources and reduces the internal system bandwidth occupied.
In another aspect, an embodiment of the invention provides a method for audio mixing based on video source; its implementation is described using a video conference with eight sites as an example.
Assume that, of the eight sites, sites one to four have the largest audio signal envelope (or energy) and are thus the four loudest sites. Sites five to eight are then the other sites.
During the conference, users at the other sites are free to choose which remote site's video to watch. In practice, however, different sites often watch the same picture on the screen at the same position, in which case their sound-image coordination is identical. Such sites can therefore use the same sound-image coordination strategy and share one mixer. The specific steps are as follows:
S201: receive the video source information of the other sites, that is, all sites in the video conference except the four loudest sites; see Fig. 6.
Of the eight sites, each of the four loudest sites still uses its own mixer. The other sites can be grouped, and sites in the same group use the same mixer.
S203: generate a video source information identifier from the video source information;
First, the video source selected for viewing on each screen is converted into a single-screen video source information identifier, which indicates the video source information of the four loudest sites presented on that screen.
(1) If the local site selects a single video source for viewing, so that a single screen shows a single picture, the video information selected for one screen of a site can be represented by the following data unit:
As shown in Fig. 7, the two bits at positions 0 to 1 represent the type of the video source on this screen. A value of 0 means that the site to which the video source belongs (the selected site) is a single-video-source site; a value of 1 means that it is a three-video-source site; a value of 2 means a multi-picture combination.
The two bits at positions 2 to 3 represent the rank of the video source's site among the current four loudest sites: 0 for the loudest site, 1 for the second loudest, 2 for the third loudest, and 3 for the fourth loudest.
The bits at positions 4 to 7 represent the position, within its own site, of the video source selected for viewing: left, middle, or right. For a left video source, bit 4 is 1; for a middle video source, bit 5 is 1; for a right video source, bit 6 is 1. The last bit (bit 7) is reserved and unused.
For example, the 8-bit integer value of this data unit may be
01 10 001 0
This indicates that the third loudest of the four loudest sites is a three-screen site (binary 01, decimal 1) and that the right video source (001, decimal 1) of the third loudest site (binary 10, decimal 2) has been selected for viewing.
00 00 010 0
This indicates that the loudest of the four loudest sites is a single-screen site (binary 00, decimal 0) and that the only main video source (010, decimal 2) of the loudest site (binary 00, decimal 0) has been selected for viewing.
If the site selected for viewing is not among the four loudest sites, all bits of the data unit are set to 0.
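The bit layout of the data unit can be checked with a short sketch. It assumes, as the two printed examples suggest, that bit 0 is the leftmost (most significant) bit of the 8-bit value; the function name `encode_unit` and the constant names are illustrative only.

```python
# Source types (bits 0-1), as described in the text.
SINGLE, THREE_SCREEN, MULTI_PICTURE = 0, 1, 2
# Position flags (bits 4-6); several can be combined with "|".
LEFT, MIDDLE, RIGHT = 0b100, 0b010, 0b001


def encode_unit(source_type, rank, positions):
    """Pack one 8-bit data unit: type (2 bits), rank (2 bits), positions (3 bits), reserved (1 bit).

    Assumes bit 0 is the most significant bit, which matches the printed examples.
    """
    if not (0 <= source_type <= 2 and 0 <= rank <= 3 and 0 <= positions <= 0b111):
        raise ValueError("field out of range")
    return (source_type << 6) | (rank << 4) | (positions << 1)


# Three-screen site, third loudest (rank 2), right video source.
print(format(encode_unit(THREE_SCREEN, 2, RIGHT), "08b"))  # 01100010
# Single-screen site, loudest (rank 0), its only (middle) video source.
print(format(encode_unit(SINGLE, 0, MIDDLE), "08b"))       # 00000100
```

A data unit of 0 then corresponds to the case in which the selected site is not among the four loudest.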
(2) When a screen presents a multi-picture view, several of the above data units are needed to identify the video source information. In this example the four loudest signals are used, so a 32-bit integer (8*4 = 32) is needed. The first group of 8 bits represents the video source information of the loudest site in the multi-picture view, the second group that of the second loudest site, the third group that of the third loudest site, and the fourth group that of the fourth loudest site; see again Fig. 7.
F=10000100 10011100 00000000 10110010
This indicates that the multi-picture view contains the middle video source of the loudest site, the left and middle video sources of the second loudest site, and the right video source of the fourth loudest site, but no video source of the third loudest site.
As the multi-picture representation shows, this example mixes the four loudest sites and needs a 32-bit integer, whereas a screen that selects only one video source needs just an 8-bit integer to represent it. To make comparison easy, the 8-bit integer is therefore extended to 32 bits. Specifically, if the selected source belongs to the loudest site, the first group of 8 bits carries the selected video source information and the other three groups, 24 bits in total, are set to 0. For example, 01 10 001 0, whose source belongs to the third loudest site and therefore occupies the third group, becomes the following when converted to 32 bits:
F=00000000 00000000 01100010 00000000
In summary, in this example a single-screen video source information identifier is formed from four 8-bit data units.
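As a hedged illustration of how four 8-bit data units combine into one 32-bit single-screen identifier, the sketch below places the unit for rank r in group r, with the first group as the most significant byte. The helper names `single_screen_identifier` and `as_groups` are hypothetical; the unit values are the ones printed in the examples above.

```python
def single_screen_identifier(units_by_rank):
    """Build a 32-bit identifier from 8-bit data units keyed by rank (0 = loudest).

    Ranks that are absent leave their 8-bit group at zero.
    """
    ident = 0
    for rank, unit in units_by_rank.items():
        if not 0 <= rank <= 3:
            raise ValueError("rank must be 0..3")
        ident |= (unit & 0xFF) << (8 * (3 - rank))
    return ident


def as_groups(ident):
    """Format a 32-bit identifier as four space-separated 8-bit groups."""
    bits = format(ident, "032b")
    return " ".join(bits[i:i + 8] for i in range(0, 32, 8))


# Single picture: right source of the third loudest site (unit 01100010).
print(as_groups(single_screen_identifier({2: 0b01100010})))
# Multi-picture example from the text: loudest, second loudest, and fourth loudest sites.
print(as_groups(single_screen_identifier({0: 0b10000100, 1: 0b10011100, 3: 0b10110010})))
```

Because both the single-picture and the multi-picture case end up as one 32-bit integer, two screens can be compared with a plain integer equality test.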
(3) When a local site is a three-screen site, the video source information it has selected is represented by three of the above 32-bit identifiers, for example $F_L$, $F_C$, and $F_R$, which are the single-screen video source information identifiers of the left, middle, and right screens respectively. The video source information identifier of site 1 then consists of three single-screen identifiers:
$F_1 = \{F_{1L}, F_{1C}, F_{1R}\}$
The mixer allocation device then compares the video source information identifiers of the sites. If the three single-screen identifiers of two sites' video source information identifiers are pairwise equal, the two sites' identifiers are the same, and the sites can be placed in the same group and share one mixer.
For example, the video source information identifier of site 2 in the same conference is:
$F_2 = \{F_{2L}, F_{2C}, F_{2R}\}$
If the following condition is satisfied:
$(F_{1L} = F_{2L}) \land (F_{1C} = F_{2C}) \land (F_{1R} = F_{2R})$
then the video source information identifiers of the four loudest sites as selected by site 1 and site 2 are the same, and the two sites can share one mixer for mixing.
In summary, when a site uses only a single screen to watch the video of the selected site, the single-screen video source information identifier is itself the video source information identifier; when a site uses two screens, the video source information identifier consists of two single-screen identifiers; and when a site uses three screens, it consists of three single-screen identifiers.
Besides the identifier generation method above, to save computing resources further, when several of the other sites all use a single screen at the same position to receive video source information, their video source information identifiers are treated as identical. For example, consider a first, a second, and a third local site among the other sites: the middle screen of the first local site has selected the left video source of a first remote site, the middle screen of the second local site has selected the middle video source of a second remote site, and the middle screen of the third local site has selected the right video source of a third remote site. Although the selected video sources differ, the sound finally output by the loudspeaker of the middle screen is the same, so the sound-image coordination of the middle screen in the three cases can be regarded as equivalent. This arrangement further reduces the number of mixers.
S205: according to the video source information identifier, place sites that have the same identifier in the same group.
In this example, suppose that among the other sites, site 5 has the same video source information identifier as site 7, and site 6 has the same identifier as site 8. Sites 5 and 7 can then be placed in group 1, and sites 6 and 8 in group 2.
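The grouping step can be sketched as a dictionary keyed by the identifier. This is a minimal illustration, assuming each site's video source information identifier is a tuple of 32-bit single-screen identifiers (left, middle, right); the function name `group_sites` and the identifier values are hypothetical.

```python
from collections import defaultdict


def group_sites(identifiers):
    """Group site ids whose video source information identifiers are equal.

    `identifiers` maps a site id to a tuple of single-screen identifiers.
    Each returned group would share one mixer.
    """
    groups = defaultdict(list)
    for site, ident in identifiers.items():
        groups[tuple(ident)].append(site)
    return list(groups.values())


# Hypothetical identifiers for the four "other" sites of the eight-site example.
identifiers = {
    5: (0x00006200, 0x04000000, 0x00000000),
    6: (0x849C00B2, 0x00000000, 0x00000000),
    7: (0x00006200, 0x04000000, 0x00000000),
    8: (0x849C00B2, 0x00000000, 0x00000000),
}
print(group_sites(identifiers))  # [[5, 7], [6, 8]]
```

With this grouping, the four other sites need two mixers instead of four.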
S207: According to the mixing strategy, the same mixer is applied to the other sites belonging to the same group, and sound-image coordinated mixing is performed on the audio signals of the top-N parties.
Continuing the example, sites 5 and 7 in group 1 can share one mixer, which performs sound-image coordinated mixing according to the existing sound-image coordination strategy; likewise, sites 6 and 8 in group 2 can share one mixer.
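The saving in step S207 comes from running the mixer once per group rather than once per site. The sketch below illustrates only that sharing; `placeholder_mix` is a stand-in (a plain sample-wise average) and is not the sound-image coordination strategy referred to in the text, and all names are assumptions.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Samples = List[float]


def placeholder_mix(identifier: Hashable, top_n_audio: Sequence[Samples]) -> Samples:
    """Stand-in mixer: averages the top-N signals sample by sample.

    A real mixer would use the identifier to position each sound source.
    """
    return [sum(frame) / len(top_n_audio) for frame in zip(*top_n_audio)]


def mix_per_group(
    groups: Dict[Hashable, List[int]],
    top_n_audio: Sequence[Samples],
    mix: Callable[[Hashable, Sequence[Samples]], Samples] = placeholder_mix,
) -> Tuple[Dict[int, Samples], int]:
    """Run the mixer once per group and share the result among its sites."""
    outputs: Dict[int, Samples] = {}
    mixer_runs = 0
    for identifier, members in groups.items():
        mixed = mix(identifier, top_n_audio)  # one mixer run for the whole group
        mixer_runs += 1
        for site in members:
            outputs[site] = mixed
    return outputs, mixer_runs
```

For two groups of two sites each, the mixer runs twice instead of four times.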
The existing sound-image coordination strategy is described in the background section and is not repeated here.
In this embodiment of the present invention, the other sites (those outside the top-N sites) are classified according to their video source information identifiers, and the video source information of the sites in one class, that is, those having the same identifier, is processed together with the top-N audio signals by a single mixer for sound-image coordinated mixing. This saves system computing resources and reduces the internal system bandwidth occupied.
An embodiment of the present invention further provides a mixer allocation apparatus 200. Referring to Fig. 8, it specifically comprises:
A receiving module 201, which receives the video source information of the other sites, that is, all sites of the video conference except the top-N sites;
A video source information identifier generation module 203, connected to the receiving module 201, which generates video source information identifiers according to the video source information;
A dividing module 205, connected to the video source information identifier generation module 203, which places sites having the same video source information identifier in the same group according to the identifiers.
Preferably, the receiving module 201 also receives the top-N audio signals.
Preferably, the mixer allocation apparatus further comprises:
A sending module 207, which sends the video source information of the sites belonging to the same group, together with the top-N audio signals, to the same mixer for sound-image coordinated mixing.
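The top-N audio signals taken in by the receiving module 201 are characterized in the claims as the N signals with the largest envelope or energy. A minimal sketch of such a selection, assuming mean-square energy over one non-empty frame per site as the measure (the measure, the frame layout, and the function names are assumptions for illustration):

```python
import heapq
from typing import Dict, List, Tuple


def frame_energy(samples: List[float]) -> float:
    """Mean-square energy of one audio frame (assumed non-empty)."""
    return sum(s * s for s in samples) / len(samples)


def select_top_n(frames: Dict[int, List[float]], n: int) -> Tuple[List[int], List[int]]:
    """Split sites into the N loudest (in descending order) and the remaining sites."""
    top = heapq.nlargest(n, frames, key=lambda site: frame_energy(frames[site]))
    others = [site for site in frames if site not in top]
    return top, others
```

The second list returned corresponds to the "other sites" whose video source information is then grouped.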
Referring to Fig. 9, the video source information identifier generation module 203 specifically comprises:
An information category analysis module 2031, which analyzes the information categories contained in the video source information;
A video source information category representation module 2032, which represents each information category by the number and position of bits in a data unit;
A specific information representation module 2033, connected to the video source information category representation module 2032, which represents the specific information within each information category by different values of those bits;
A single-screen video source information identifier generation module 2034, connected to the specific information representation module 2033, which generates the single-screen video source information identifier represented by the data units;
A video source information identifier generation module 2035, connected to the single-screen video source information identifier generation module 2034, which generates the video source information identifier according to the single-screen video source information identifiers.
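The bit-level encoding performed by modules 2032 to 2034 can be illustrated with a small sketch. The three information categories follow those listed in the claims (type of video source, rank of the source site among the top-N parties, screen display position within the source site). The bit widths, the field order, and the function names are assumptions chosen only for this example.

```python
from typing import Dict, Tuple

# Assumed layout of one data unit: [type | rank | position].
TYPE_BITS, RANK_BITS, POS_BITS = 2, 3, 3


def pack_data_unit(source_type: int, rank: int, position: int) -> int:
    """Encode one video source as a data unit; each category occupies fixed bits."""
    for value, bits in ((source_type, TYPE_BITS), (rank, RANK_BITS), (position, POS_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError("field value does not fit its bit width")
    return (source_type << (RANK_BITS + POS_BITS)) | (rank << POS_BITS) | position


def single_screen_identifier(selected: Dict[int, Tuple[int, int]], n: int) -> Tuple[int, ...]:
    """Build N data units, one per top-N party; unselected parties get zero.

    `selected` maps a rank in 1..n to (source_type, position).
    """
    units = []
    for rank in range(1, n + 1):
        if rank in selected:
            source_type, position = selected[rank]
            units.append(pack_data_unit(source_type, rank, position))
        else:
            units.append(0)
    return tuple(units)
```

For instance, with N = 3 and only the second-ranked party selected (type 1, position 3), the identifier consists of a zero unit, the unit `0b01010011`, and another zero unit.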
In this embodiment of the present invention, the other sites (those outside the top-N sites) are classified according to their video source information identifiers, and the video source information of the sites in one class, that is, those having the same identifier, is processed together with the top-N audio signals by a single mixer for sound-image coordinated mixing. This saves system computing resources and reduces the internal system bandwidth occupied.
An embodiment of the present invention also provides an MCU (multipoint control unit) device 10. Referring to Fig. 10, it specifically comprises:
A decoder 100, which decodes the audio and video sent by the site terminals;
A mixer allocation apparatus 200, connected to the decoder 100, as described in the embodiment above and not repeated here;
A mixer 300, connected to the mixer allocation apparatus 200, wherein the same mixer is applied to the sites belonging to the same group to perform sound-image coordinated mixing on the video source information and the top-N audio signals;
An encoder 400, connected to the mixer 300, which encodes the video source information and the top-N audio signals after mixing.
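The chain of the four components can be sketched as a simple pipeline. This is a structural illustration only: the callables `decode`, `allocate`, `mix`, and `encode` are hypothetical stand-ins for the decoder 100, mixer allocation apparatus 200, mixer 300, and encoder 400, and their signatures are assumptions.

```python
from typing import Any, Callable, Dict, Hashable, List, Tuple


def run_mcu(
    encoded_inputs: Dict[int, Any],
    decode: Callable[[Any], Any],
    allocate: Callable[[Dict[int, Any]], Tuple[Dict[Hashable, List[int]], Any]],
    mix: Callable[[Hashable, Any], Any],
    encode: Callable[[Any], Any],
) -> Dict[int, Any]:
    """Decode all inputs, group the other sites, mix once per group, then encode."""
    decoded = {site: decode(payload) for site, payload in encoded_inputs.items()}
    # The allocator returns the groups of other sites and the top-N audio signals.
    groups, top_n_audio = allocate(decoded)
    outputs: Dict[int, Any] = {}
    for identifier, members in groups.items():
        mixed = mix(identifier, top_n_audio)  # shared by every site in the group
        for site in members:
            outputs[site] = encode(mixed)
    return outputs
```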
In this embodiment of the present invention, the other sites (those outside the top-N sites) are classified according to their video source information identifiers, and the video source information of the sites in one class, that is, those having the same identifier, is processed together with the top-N audio signals by a single mixer for sound-image coordinated mixing. This saves system computing resources and reduces the internal system bandwidth occupied.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some of their technical features replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (13)

1. A method for implementing audio mixing based on video source information, characterized in that the method comprises:
Receiving video source information of the other sites, that is, all sites of a video conference except the top-N sites;
Generating video source information identifiers according to the video source information;
Placing sites having the same video source information identifier in the same group according to the video source information identifiers;
Receiving the top-N audio signals;
Sending the video source information of the sites belonging to the same group, together with the top-N audio signals, to the same mixer for sound-image coordinated mixing.
2. The method according to claim 1, characterized in that generating video source information identifiers according to the video source information specifically comprises:
Analyzing the information categories contained in the video source information;
Representing each information category by the number and position of bits in a data unit;
Representing the specific information within each information category by different values of those bits;
Generating the single-screen video source information identifier represented by the data units;
Generating the video source information identifier according to the single-screen video source information identifiers.
3. The method according to claim 2, characterized in that the single-screen video source information identifier is composed of N of the data units; each data unit corresponds to the video source information of one of the top-N parties; and when the video source of one of the top-N parties is not selected, the value of its data unit is set to zero.
4. The method according to claim 1, characterized in that when other sites use only a single screen at the same position to receive video source information from the top-N parties, those other sites are regarded as having the same video source information identifier.
5. The method according to claim 2 or 3, characterized in that the information categories comprise: the type of the video source, the rank among the top-N parties of the site where the video source is located, and the screen display position within the site where the video source is located.
6. The method according to claim 1, characterized in that the top-N sites are the N sites, among all sites of the video conference, whose audio signals have the largest envelope or energy, taken in descending order.
7. A mixer allocation apparatus, characterized in that it specifically comprises:
A receiving module, which receives the top-N audio signals and the video source information of the other sites, that is, all sites of a video conference except the top-N sites;
A video source information identifier generation module, connected to the receiving module, which generates video source information identifiers according to the video source information;
A dividing module, connected to the video source information identifier generation module, which places sites having the same video source information identifier in the same group according to the video source information identifiers;
A sending module, which sends the video source information of the sites belonging to the same group, together with the top-N audio signals, to the same mixer for sound-image coordinated mixing.
8. The mixer allocation apparatus according to claim 7, characterized in that the video source information identifier generation module specifically comprises:
An information category analysis module, which analyzes the information categories contained in the video source information;
A video source information category representation module, which represents each information category by the number and position of bits in a data unit;
A specific information representation module, which represents the specific information within each information category by different values of those bits;
A single-screen video source information identifier generation module, which generates the single-screen video source information identifier represented by the data units;
A video source information identifier generation module, which generates the video source information identifier according to the single-screen video source information identifiers.
9. The mixer allocation apparatus according to claim 8, characterized in that the single-screen video source information identifier is composed of N of the data units; each data unit corresponds to the video source information of one of the top-N parties; and when the video source of one of the top-N parties is not selected, the value of its data unit is set to zero.
10. The mixer allocation apparatus according to claim 7, characterized in that when other sites use only a single screen at the same position to receive video source information from the top-N parties, those other sites are regarded as having the same video source information identifier.
11. The mixer allocation apparatus according to claim 8, characterized in that the information categories comprise: the type of the video source, the rank among the top-N parties of the site where the video source is located, and the screen display position within the site where the video source is located.
12. The mixer allocation apparatus according to claim 7, characterized in that the top-N audio signals are the N audio signals with the largest envelope or energy among all sites of the video conference.
13. A multipoint control unit, characterized in that it comprises:
A decoder, for decoding the data sent by the site terminals;
A mixer allocation apparatus, being the mixer allocation apparatus according to any one of claims 7 to 9;
A mixer, connected to the mixer allocation apparatus, wherein the same mixer is applied to the sites belonging to the same group to perform sound-image coordinated mixing on the video source information and the top-N audio signals;
An encoder, connected to the mixer, for encoding the video source information and the top-N audio signals after mixing.
CN201210384236.0A 2012-10-11 2012-10-11 A kind of method and apparatus realizing audio mixing based on video source Active CN103024339B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210384236.0A CN103024339B (en) 2012-10-11 2012-10-11 A kind of method and apparatus realizing audio mixing based on video source

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210384236.0A CN103024339B (en) 2012-10-11 2012-10-11 A kind of method and apparatus realizing audio mixing based on video source

Publications (2)

Publication Number Publication Date
CN103024339A CN103024339A (en) 2013-04-03
CN103024339B true CN103024339B (en) 2015-09-30

Family

ID=47972419

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210384236.0A Active CN103024339B (en) 2012-10-11 2012-10-11 A kind of method and apparatus realizing audio mixing based on video source

Country Status (1)

Country Link
CN (1) CN103024339B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1655609A (en) * 2004-02-13 2005-08-17 精工爱普生株式会社 Method and system for recording videoconference data
CN102065265A (en) * 2009-11-13 2011-05-18 华为终端有限公司 Method, device and system for realizing sound mixing
CN102222503A (en) * 2010-04-14 2011-10-19 华为终端有限公司 Mixed sound processing method, device and system of audio signal
CN102270456A (en) * 2010-06-07 2011-12-07 华为终端有限公司 Method and device for audio signal mixing processing

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102655584B (en) * 2011-03-04 2017-11-24 中兴通讯股份有限公司 The method and system that media data sends and played in a kind of Telepresence

Also Published As

Publication number Publication date
CN103024339A (en) 2013-04-03

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant