Detailed Description of the Embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention, without creative effort, fall within the protection scope of the present invention.
Method embodiments
Method embodiment 1
Method embodiment 1 of the present invention is described with reference to Figure 3:
301. For a terminal, obtain the audio streams relative to that terminal.
In implementing 301, obtaining the audio streams relative to the terminal specifically comprises: obtaining the energy of multiple audio streams relative to the terminal; selecting, according to the obtained energy of the multiple audio streams, at least one audio stream with the highest energy; and assigning an identifier to each selected audio stream.
It can be understood that obtaining the several audio streams with the highest energy for a terminal is only one implementation. All audio streams may also be obtained, in which case no energy calculation is needed and the relevant audio streams are obtained directly.
302. Assign an identifier to the obtained audio stream relative to the terminal.
In implementing 302, the identifier assigned to the audio stream relative to the terminal may specifically be the site number of the conference site, used as the identifier of each of the multiple audio streams. The identifier may also be assigned manually by the conference administrator, or assigned in real time by the conference management system.
It can be understood that the identifier of an audio stream in the embodiments of the present invention is merely a code name assigned to the audio stream in order to distinguish it from other audio streams. Other ways of obtaining identifiers can therefore be derived from the embodiments of the present invention, and the embodiments of the present invention impose no limitation in this respect.
303. Combine the obtained audio stream relative to the terminal with the identifier corresponding to the audio stream, and send the combination to the terminal.
In implementing 303, the obtained audio stream and its corresponding identifier may be combined in the following ways:
Loose coupling: the obtained audio bitstream is not modified in any way; when each frame of audio data is encapsulated for the protocol, the source identifier of the audio stream is added to the protocol packet header;
and/or
Tight coupling: the obtained mono audio bitstreams are transcoded, the transcoded mono audio bitstreams are merged into one multi-channel bitstream, and the audio stream source identifiers corresponding to the channels are added to the frame header of the multi-channel bitstream.
It should be noted that the audio streams relative to the terminal may be combined with their identifiers using loose coupling only, tight coupling only, or a mixture of the two.
The identifier of an audio stream may be carried in the protocol header of the IP packet or in the frame header of the audio frame.
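The loose coupling described above can be sketched as follows. This is only an illustration: the header layout (a 16-bit source identifier followed by a 16-bit payload length) and the function names `encapsulate` and `extract` are assumptions made for the example and are not specified by the embodiment.

```python
import struct

# Hypothetical header layout (an assumption, not from the embodiment):
# 16-bit source identifier, then 16-bit payload length, both big-endian.
HEADER = struct.Struct("!HH")


def encapsulate(source_id: int, frame: bytes) -> bytes:
    """Prefix one encoded audio frame with its source identifier.

    The frame bytes themselves are left unchanged (loose coupling).
    """
    return HEADER.pack(source_id, len(frame)) + frame


def extract(packet: bytes):
    """Recover the source identifier and the original frame from a packet."""
    source_id, length = HEADER.unpack_from(packet)
    frame = packet[HEADER.size:HEADER.size + length]
    return source_id, frame
```

Because the payload is copied verbatim, the receiving side can read the identifier without touching the codec data, which is the property that distinguishes loose coupling from tight coupling.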
With the technical solution of this embodiment of the present invention, a terminal can freely position the sound images of the other terminals according to the received audio streams of the other terminals and the identifiers assigned to those audio streams.
Method embodiment 2
Method embodiment 2 of the present invention mainly describes an example with a single server. The processing flow is shown in the flowchart of Figure 4:
401. The server obtains the audio stream corresponding to each terminal.
In implementing 401, each terminal generally corresponds to one conference site. Each terminal captures the audio stream of its own site, and the server serving the terminals obtains the audio stream corresponding to each terminal.
402. Calculate the energy of each obtained audio stream, and select at least one audio stream with the highest energy.
In implementing 402, the server calculates the energy of each audio stream obtained in 401 and, according to the results, selects at least one audio stream with the highest energy as the finally selected audio stream.
The energy of an audio stream may be calculated using any of the following methods:
(1) calculate, in the time domain, the audio energy of the decoded audio stream within one frame, and average the result over multiple frames of the audio signal; or
(2) calculate, in the frequency domain, the audio energy of the decoded audio stream within the relevant frequency range, and average the result over multiple frames of the audio signal; or
(3) decode the quantization factors of the audio stream, and estimate the energy of the audio stream from them.
These methods fall into two classes. Methods (1) and (2) are based on decoding; method (3) is an estimate that does not require full decoding. Both classes are used because audio protocols differ: for some protocols (for example, G.723.1 and G.729), the energy of the audio stream can be calculated only by fully decoding the stream, whereas for others (for example, G.722.1 and AAC-LD), decoding certain parameters of the audio stream is enough to estimate its energy.
After the energy of the audio streams has been estimated, at least one audio stream with the highest energy may be selected as the selected audio stream, according to the policy of the audio conference.
It can be understood that calculating the energy of each audio stream and selecting at least one audio stream with the highest energy is only one way of selecting audio streams. Alternatively, the energy of each audio stream need not be calculated, and the audio streams of all participating sites may be taken as the selected audio streams.
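Method (1) and the subsequent selection can be sketched as follows. This is a minimal illustration under stated assumptions: frame energy is taken here as the mean squared sample amplitude, frames are plain lists of decoded samples, and the names `frame_energy`, `stream_energy`, and `select_loudest` are hypothetical.

```python
def frame_energy(samples):
    """Time-domain energy of one decoded frame (assumed: mean squared amplitude)."""
    return sum(s * s for s in samples) / len(samples)


def stream_energy(frames):
    """Average the per-frame energy over several frames, as in method (1)."""
    return sum(frame_energy(f) for f in frames) / len(frames)


def select_loudest(streams, n=1):
    """Return the identifiers of the n streams with the highest average energy.

    `streams` maps a stream identifier to a list of decoded frames.
    """
    ranked = sorted(streams, key=lambda sid: stream_energy(streams[sid]),
                    reverse=True)
    return ranked[:n]
```

For example, with four streams whose average energies are 0.01, 0.25, 0.81, and 0, `select_loudest(streams, 2)` returns the identifiers of the third and second streams, in that order.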
403. Obtain the identification information corresponding to the selected at least one audio stream.
In implementing 403, the corresponding identification information is obtained for each of the selected audio streams.
The identification information of a selected audio stream may specifically be the site number of the conference site, used as the identifier of each of the multiple audio streams. The identifier may also be assigned manually by the conference administrator, or assigned in real time by the conference management system.
It can be understood that the identifier of an audio stream in the embodiments of the present invention is merely a code name assigned to the audio stream in order to distinguish it from other audio streams. Other ways of obtaining identifiers can therefore be derived from the embodiments of the present invention, and the embodiments of the present invention impose no limitation in this respect.
404. Combine the selected audio streams with the obtained identification information.
In implementing 404, the selected at least one audio stream is combined with the obtained identification information of the selected audio stream.
The ways of combining include:
Loose coupling: the obtained audio bitstream is not modified in any way; when each frame of audio data is encapsulated for the protocol, the source identifier of the audio stream is added to the protocol packet header;
and/or
Tight coupling: the obtained mono audio bitstreams are transcoded, the transcoded mono audio bitstreams are merged into one multi-channel bitstream, and the audio stream source identifiers corresponding to the channels are added to the frame header of the multi-channel bitstream.
It should be noted that the audio streams relative to the terminal may be combined with their identifiers using loose coupling only, tight coupling only, or a mixture of the two.
405. Send the audio streams combined with the identification information to each corresponding terminal according to the corresponding sending policy.
In implementing 405, the audio streams combined with the identification information may be sent to each corresponding terminal using the following policy:
If the selected audio streams include the audio stream captured by a given terminal, the audio streams sent to that terminal are the selected audio streams other than that terminal's own audio stream; if the selected audio streams do not include the audio stream captured by a given terminal, all selected audio streams are sent to that terminal.
To explain this sending policy more clearly, refer to Figure 5, which contains four terminals and one server. A dashed line from a terminal to the server means that the terminal uploads the audio stream it has captured to the server; a solid line from the server to a terminal means that the server delivers the selected audio streams to that terminal. Suppose the server's calculation shows that terminals 2 and 3 are the terminals whose audio streams have the highest energy. The server then delivers audio streams 2 and 3 to terminal 1 and to terminal 4, delivers audio stream 3 to terminal 2, and delivers audio stream 2 to terminal 3.
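The sending policy can be sketched in a few lines. The sketch assumes, for illustration only, that each audio stream's identifier equals the number of the terminal that captured it; the function names are hypothetical.

```python
def streams_for_terminal(terminal_id, selected_ids):
    """Selected streams to deliver to one terminal, excluding its own stream."""
    return [sid for sid in selected_ids if sid != terminal_id]


def forwarding_plan(terminal_ids, selected_ids):
    """Map every terminal to the list of selected streams it should receive."""
    return {t: streams_for_terminal(t, selected_ids) for t in terminal_ids}
```

Applied to the Figure 5 scenario, `forwarding_plan([1, 2, 3, 4], [2, 3])` gives streams 2 and 3 to terminals 1 and 4, stream 3 to terminal 2, and stream 2 to terminal 3, so no terminal hears its own site played back.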
With the technical solution of this embodiment of the present invention, a terminal can freely position the sound images of the other terminals according to the received audio streams of the other terminals and the identifiers assigned to those audio streams.
Method embodiment 3
Method embodiment 3 of the present invention mainly describes an example in which multiple servers are cascaded. The structure is shown in Figure 6:
Figure 6 shows three servers and four terminals. Terminals 1 and 2 belong to server 2, and terminals 3 and 4 belong to server 3. Servers 2 and 3 are cascaded through server 1, so server 1 can be regarded as the master server, and servers 2 and 3 as slave servers of server 1.
For cascaded servers, the processing flow is as follows, with reference to the flowchart of Figure 7:
701. The master server obtains the audio streams uploaded by the slave servers.
702. The master server decomposes the audio stream obtained from each slave server into multiple audio streams; the number of decomposed audio streams equals the number of terminals of that slave server.
In implementing 702, because the audio stream obtained from a slave server was uploaded by the individual terminals of that slave server, it can be decomposed into separate audio streams according to the specific terminals.
703. The master server calculates the energy of the decomposed audio streams and selects at least one audio stream with the highest energy.
In implementing 703, calculating the energy of the decomposed audio streams and selecting at least one audio stream with the highest energy is similar to 402 in method embodiment 2 of the present invention, and is not repeated here.
704. Obtain the identification information corresponding to the selected at least one audio stream.
In implementing 704, the master server obtains, via the slave servers, the identification information corresponding to the selected at least one audio stream. The way it is obtained is similar to 403 in method embodiment 2 of the present invention, and is not repeated here.
705. Combine the selected audio streams with the obtained identification information.
In implementing 705, the process is similar to 404 in method embodiment 2 of the present invention, and is not repeated here.
706. Send the audio streams combined with the identification information to each corresponding terminal according to the corresponding sending policy.
The implementation of this step is similar to 405 in method embodiment 2 of the present invention, and is not repeated here.
It can be understood that method embodiment 3 of the present invention only presents a cascade formed by three servers; a cascade of more servers can be implemented following the same process as in this embodiment.
With the technical solution of this embodiment of the present invention, a terminal can freely position the sound images of the other terminals according to the received audio streams of the other terminals and the identifiers assigned to those audio streams.
Method embodiment 4
Method embodiment 4 of the present invention mainly describes an example in which at least one terminal is combined with a cascade of multiple servers. The structure is shown in Figure 8:
As Figure 8 shows, there are three servers: server 1 is the master server, and servers 2 and 3 are slave servers, the three forming a server cascade. Figure 8 also contains six terminals in total: terminals 1, 2, 3, and 4 are managed by servers 2 and 3, and terminals 5 and 6 are connected directly to master server 1.
The implementation process is as follows, with reference to Figure 9:
901. The master server obtains the audio streams uploaded by the slave servers and the audio streams of the terminals that the master server manages directly.
902. The master server decomposes the audio stream obtained from each slave server into multiple audio streams; the number of decomposed audio streams is not greater than the number of terminals of that slave server.
In implementing 902, because the audio stream obtained from a slave server was uploaded by the individual terminals of that slave server, it can be decomposed into separate audio streams according to the specific terminals. The number of decomposed audio streams may be smaller than the number of terminals of the slave server: it is determined by whether each terminal has sound. When some terminals have no sound at their site, the number of decomposed audio streams is smaller than the number of terminals of the slave server.
903. The master server calculates the energy of each audio stream decomposed from the audio streams obtained from the slave servers and of each audio stream obtained from the directly managed terminals, and selects at least one audio stream with the highest energy.
In implementing 903, calculating the energy of these audio streams and selecting at least one audio stream with the highest energy is similar to 402 in method embodiment 2 of the present invention, and is not repeated here.
904. Obtain the identification information corresponding to the selected at least one audio stream.
In implementing 904, the process is similar to 403 in method embodiment 2 of the present invention, and is not repeated here.
905. Combine the selected audio streams with the obtained identification information.
In implementing 905, the process is similar to 404 in method embodiment 2 of the present invention, and is not repeated here.
906. Send the audio streams combined with the identification information to each corresponding terminal or slave server according to the corresponding sending policy.
The implementation of this step is similar to 405 in method embodiment 2 of the present invention, and is not repeated here.
It can be understood that method embodiment 4 of the present invention only presents a cascade formed by three servers with two terminals managed by the master server; a cascade of more servers, or a master server managing more terminals, can be implemented following the same process as in this embodiment.
With the technical solution of this embodiment of the present invention, a terminal can freely position the sound images of the other terminals according to the received audio streams of the other terminals and the identifiers assigned to those audio streams.
Method embodiment 5
This method embodiment concerns the processing that a terminal performs on the audio streams it receives. With reference to Figure 10, the implementation process is as follows:
1001. Extract the identification information from the obtained multiple audio streams to which identifiers have been assigned.
In implementing 1001, the identification information may be obtained from the protocol header of the IP packet of the audio stream, or from the frame header of the audio frame.
1002. According to the extracted identification information, demultiplex the audio streams so that streams with the same identifier are grouped together.
In implementing 1002, because different audio streams carry different identification information, the audio streams are demultiplexed by identifier, and audio streams with the same identifier are dispatched to the same decoding module.
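Steps 1001 and 1002 amount to grouping received frames by their identifier. A minimal sketch follows, assuming each received item has already been parsed into an `(identifier, frame)` pair; the function name `demultiplex` and the use of one list per identifier as a stand-in for a decoding module's input queue are assumptions for illustration.

```python
from collections import defaultdict


def demultiplex(packets):
    """Group frames by identifier, preserving arrival order within each group.

    `packets` is an iterable of (identifier, frame) pairs. Each resulting
    list stands in for the input queue of one decoding module.
    """
    queues = defaultdict(list)
    for source_id, frame in packets:
        queues[source_id].append(frame)
    return dict(queues)
```

For instance, frames arriving as identifier 2, identifier 3, identifier 2 end up as two queues, one holding both frames of stream 2 and one holding the single frame of stream 3.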
1003. Assign a sound image position to each demultiplexed audio stream according to the extracted identification information.
In implementing 1003, the identification information of the audio streams extracted in step 1001 can be used to assign the sound image positions.
The sound image positions may be specified in advance by the user, with a given sound image position permanently assigned to a given terminal, or they may be assigned automatically. Automatic assignment may follow these principles:
(1) If the identifier of an audio stream matches the terminal being watched, assign the middle sound image position; in Figure 9, this is the virtual sound image position in front of the television. The benefit of assigning positions in this way is that the sound image position matches the image being watched.
(2) If the audio signal energy of a terminal is relatively high, assign a sound image position toward the front, which ensures that the far-end speaker's voice comes from the front.
(3) If the audio signal energy of a terminal is relatively low, assign a sound image position at the sides. Such a terminal may be producing only noise; placing it at the sides separates the noise from the far-end speaker's voice and so preserves the intelligibility of the speaker's voice.
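The three automatic assignment principles can be sketched as a simple rule chain. Everything concrete here is an assumption made for illustration: the position labels, the energy threshold of 0.1, the alternation between left and right for low-energy streams, and the name `assign_positions`.

```python
def assign_positions(energies, watched_id=None, threshold=0.1):
    """Assign a coarse sound image position to each stream identifier.

    `energies` maps identifier -> measured audio energy.
    Rule (1): the stream of the terminal being watched goes to the center.
    Rule (2): streams at or above `threshold` go to the front.
    Rule (3): quieter streams alternate between the left and right sides.
    """
    positions = {}
    side_count = 0
    for sid in sorted(energies):
        if sid == watched_id:
            positions[sid] = "center"
        elif energies[sid] >= threshold:
            positions[sid] = "front"
        else:
            positions[sid] = ("left", "right")[side_count % 2]
            side_count += 1
    return positions
```

A real terminal would map these labels to concrete virtual source coordinates; the labels are kept symbolic here because the embodiment does not fix a coordinate scheme.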
1004. Decode the demultiplexed audio streams, and perform 3D sound processing on the decoded audio streams according to their sound image position information.
In implementing 1004, the audio streams that were grouped together in step 1002 according to identical identification information are decoded, and 3D sound processing is performed on the decoded audio streams using the sound image position information assigned in 1003.
All method embodiments of the present invention use 3D sound processing, which is not described again elsewhere. The purpose of 3D sound processing is to build a stereo sound field using two loudspeakers, left and right. The process can be illustrated by the following example, with reference to Figure 11:
In Figure 11, the distance between loudspeakers p1 and p2 is d, and the distance from virtual sound image v1 to loudspeaker p1 is w. Suppose the sound image position assigned to an audio stream s1 is v1. Then s1 is multiplied by gain g1 and fed to p1, and multiplied by gain g2 and fed to p2, where g1 and g2 are calculated as follows:
w/d=(g1-g2)/(g1+g2) (1)
c=g1×g1+g2×g2 (2)
In formulas (1) and (2), g1 is the left-channel amplitude gain, g2 is the right-channel amplitude gain, and c is a fixed value that may, for example, equal 1.
Once the gains of the left and right channels have been calculated, a three-dimensional sound field can be simulated.
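Formulas (1) and (2) can be solved in closed form. Writing r = w/d, formula (1) gives g2 = g1 × (1 - r)/(1 + r), and substituting into formula (2) gives g1 = (1 + r) × sqrt(c / (2 × (1 + r × r))) and g2 = (1 - r) × sqrt(c / (2 × (1 + r × r))). The sketch below evaluates this solution, taking the formulas exactly as written and choosing the positive root; the function names `pan_gains` and `render` are hypothetical.

```python
import math


def pan_gains(w, d, c=1.0):
    """Solve formulas (1) and (2) for the gains g1 (left) and g2 (right).

    w: distance from the virtual sound image to loudspeaker p1
    d: distance between loudspeakers p1 and p2
    c: the fixed value in formula (2), for example 1
    """
    r = w / d
    norm = math.sqrt(c / (2.0 * (1.0 + r * r)))
    g1 = (1.0 + r) * norm
    g2 = (1.0 - r) * norm
    return g1, g2


def render(samples, g1, g2):
    """Scale a mono signal into (left, right) sample pairs for p1 and p2."""
    return [(g1 * s, g2 * s) for s in samples]
```

With c = 1 and w = d, the solution is g1 = 1 and g2 = 0; with w = 0 the two gains are equal. Both results satisfy formulas (1) and (2) as stated.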
With the technical solution of this embodiment of the present invention, a terminal can freely position the sound images of the other terminals according to the received audio streams of the other terminals and the identifiers assigned to those audio streams.
System embodiments
System embodiment 1
The system embodiment of the present invention can be described with reference to the accompanying drawing, and comprises:
Server 1200, configured to: obtain, for a terminal, the audio streams relative to that terminal; assign identifiers to the obtained audio streams relative to the terminal; and combine the obtained audio streams relative to the terminal with their corresponding identifiers and send them to the terminal;
At least one terminal 1300, configured to: obtain the audio streams carrying identifiers; extract the identifiers of the audio streams; demultiplex the audio streams according to the identifiers so that streams with the same identifier are grouped together; assign a sound image position to each demultiplexed audio stream according to the extracted identification information; decode the demultiplexed audio streams; and perform 3D sound processing on the demultiplexed audio streams according to their sound image position information.
With the technical solution of this embodiment of the present invention, a terminal can freely position the sound images of the other terminals according to the received audio streams of the other terminals and the identifiers assigned to those audio streams.
System embodiment 2
With reference to the structure diagram of Figure 6, and building on system embodiment 1, this system embodiment comprises: a master server, namely server 1 in Figure 6, configured to obtain, for a terminal, the audio streams relative to that terminal, to assign identifiers to the obtained audio streams relative to the terminal, and to combine the obtained audio streams with their corresponding identifiers and send them to the terminal, and further configured to decompose the identifier-tagged combined audio stream from the at least one slave server into multiple audio streams; and at least one slave server, namely servers 2 and 3 in Figure 6, configured to obtain the audio streams of the terminals it manages or of other servers, and to combine the obtained audio streams with the identifiers of those audio streams.
With the technical solution of this embodiment of the present invention, a terminal can freely position the sound images of the other terminals according to the received audio streams of the other terminals and the identifiers assigned to those audio streams.
Device embodiments
Server embodiment
This embodiment provides a server for signal processing in a 3D audio conference. With reference to Figure 13, the server comprises:
Audio stream obtaining unit 1210, configured to obtain, for a terminal, the audio streams relative to that terminal; identifier assignment unit 1220, configured to assign identifiers to the obtained audio streams relative to the terminal; and combining and sending unit 1230, configured to combine the obtained audio streams relative to the terminal with their corresponding identifiers and send them to the terminal.
With reference to Figure 14, the identifier assignment unit 1220 specifically comprises: audio stream energy obtaining module 1221, configured to obtain the energy of multiple audio streams relative to the terminal; audio stream selection module 1222, configured to select, according to the obtained energy of the multiple audio streams, at least one audio stream with the highest energy; and identifier assignment module 1223, configured to assign identifiers to the selected at least one audio stream.
With reference to Figure 15, the identifier assignment module 1223 specifically comprises: site number obtaining submodule 12231, configured to obtain the site number of the conference site of each of the at least one audio stream with the highest energy; and site number assignment submodule 12232, configured to assign to each of those audio streams the site number of its conference site as obtained by the site number obtaining submodule.
With reference to Figure 16, the combining and sending unit 1230 specifically comprises the following modules: first combining module 1231, configured to leave the selected audio bitstream unmodified and, when each frame of audio data is encapsulated for the protocol, add the source identifier of the audio stream to the protocol packet header; and/or second combining module 1232, configured to transcode the selected mono audio bitstreams, merge the transcoded mono audio bitstreams into one multi-channel bitstream, and add the audio stream source identifiers corresponding to the channels to the frame header of the multi-channel bitstream.
With the technical solution of this embodiment of the present invention, a terminal can freely position the sound images of the other terminals according to the received audio streams of the other terminals and the identifiers assigned to those audio streams.
Apparatus embodiments
Apparatus embodiment 1
An embodiment of the present invention further provides a terminal for signal processing in a 3D audio conference. With reference to Figure 17, the terminal comprises:
Audio processing unit 1310, configured to extract the identification information from the obtained multiple audio streams to which identifiers have been assigned, demultiplex the audio streams according to the identification information, and decode the multiple audio streams separately;
Sound image position allocation unit 1320, configured to assign sound image positions to the decoded multiple audio streams according to the identification information extracted by the audio processing unit;
3D sound processing unit 1330, configured to perform 3D sound processing on the decoded multiple audio streams according to the assigned sound image positions.
In implementing this embodiment of the present invention, with reference to Figure 18, the audio processing unit 1310 specifically comprises: identifier extraction module 1311, configured to extract the identification information from the obtained multiple audio streams to which identifiers have been assigned; distribution module 1312, configured to distribute the audio streams according to the extracted identification information; and decoding module 1313, configured to decode the multiple audio streams separately.
With the technical solution of this embodiment of the present invention, a terminal can freely position the sound images of the other terminals according to the received audio streams of the other terminals and the identifiers assigned to those audio streams.
Apparatus embodiment 2
Building on apparatus embodiment 1 above, and with reference to Figure 19, the terminal may further comprise: audio signal obtaining unit 1340, configured to obtain the audio signal of the conference site; and audio encoding unit 1350, configured to encode the obtained audio signal.
With the technical solution of this embodiment of the present invention, a terminal can freely position the sound images of the other terminals according to the received audio streams of the other terminals and the identifiers assigned to those audio streams.
From the description of the above embodiments, those skilled in the art can clearly understand that the present invention may be implemented by software plus the necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation. Based on this understanding, the essence of the technical solution of the present invention, or the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention.
The foregoing are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.