
CN101547265B - Method, equipment and system for processing signal of 3D audio conference

Info

Publication number: CN101547265B
Application number: CN200810217091.9A
Authority: CN
Inventors: 詹五洲; 王东琦
Original assignee: Huawei Device Co Ltd
Current assignee: Huawei Device Co Ltd; Huawei Device Shenzhen Co Ltd
Priority date: 2008-10-20
Filing date: 2008-10-20
Publication date: 2014-07-30
Anticipated expiration: 2028-10-20
Also published as: CN101547265A

Abstract

Embodiments of the invention provide a signal processing method, system and device for a 3D audio conference. The method comprises: for a given terminal, obtaining the audio streams relevant to that terminal; assigning an identifier to each obtained audio stream; and combining each obtained audio stream with its corresponding identifier and sending the result to the terminal. The invention addresses two problems of the prior art: the excessive demand for transmission channels, and the inability of a terminal to freely position the sound images of other terminals.

Description

Signal processing method, device and system for 3D audio conferencing

Technical field

The present invention relates to the field of audio processing, and in particular to a signal processing method, device and system for 3D audio conferencing.

Background technology

Audio conference systems are used ever more widely in meetings. Current audio conference systems are usually mono or two-channel and lack a sense of spatial presence; in a multipoint conference the individual audio streams are mixed together, which reduces the intelligibility of the sound.

The prior art applies 3D sound processing to the audio streams in an audio conference: a sound image position is assigned to each audio stream, and the gains of that stream in the left and right channels are adjusted according to the positional relationship among the sound image positions, thereby creating a stereo sound effect.

As to how a 3D audio conference is networked, one prior-art solution uses a distributed networking structure: each terminal receives the conference data of the other terminals and then performs 3D positioning processing on that audio data, so that the user perceives the different audio streams as coming from different directions. Referring to Fig. 1, terminal 2 receives the conference data of terminal 1 and terminal 3 and performs 3D positioning processing on that audio data to determine the directions of terminal 1 and terminal 3. Another prior-art solution uses a centralized networking structure. Referring to Fig. 2, the conference system contains one server and multiple terminals. Every terminal sends its own audio data to the server; the server performs 3D positioning processing on the audio streams destined for each participating terminal according to that terminal's situation, and sends the processed audio streams to the corresponding participating terminal.

In the course of making the present invention, the inventors found that the prior art has at least the following problems. In the prior-art distributed 3D audio conference, the processing is distributed across the terminals, so many transmission channels are required; the approach is therefore suitable only for small conferences with a few conference sites. In the prior-art centralized 3D audio conference, all processing is performed on the server, so the configuration of each terminal's playback equipment must be known in advance, and a terminal cannot freely position the sound images of other terminals.

Summary of the invention

The invention provides a signal processing method, server, terminal and system for 3D audio conferencing, to solve the prior-art problems that too many transmission channels are required and that a terminal cannot freely position the sound images of other terminals.

An embodiment of the present invention provides a signal processing method for 3D audio conferencing, applied to an audio conference system that comprises at least two terminals and one server. The method comprises:

The server obtains the audio stream corresponding to each of the at least two terminals, where the audio stream corresponding to each terminal is the audio stream of the conference site where that terminal is located, as captured by that terminal;

Calculating the energy of the obtained audio streams, and selecting the at least one audio stream with the greatest energy;

Obtaining identification information corresponding to the selected at least one audio stream, the identification information being used by the receiving terminal to assign a sound image position to the audio stream;

Combining each selected audio stream with its obtained identification information;

Sending the at least one audio stream combined with identification information to the at least two terminals according to a corresponding sending policy;

Wherein the sending policy is:

If the selected audio streams include the audio stream corresponding to a given terminal, the audio streams sent to that terminal are the selected audio streams with that terminal's own stream removed; if the selected audio streams do not include the audio stream obtained by a given terminal, all selected audio streams are sent to that terminal.

An embodiment of the present invention also provides a server for 3D audio conference signal processing, applied to an audio conference system, comprising:

An audio stream acquiring unit, configured to obtain the audio stream corresponding to each of at least two terminals, where the audio stream corresponding to each terminal is the audio stream of the conference site where that terminal is located, as captured by that terminal;

An identifier allocation unit, configured to assign identifiers to the obtained multiple audio streams;

A combination sending unit, configured to combine the obtained multiple audio streams with their corresponding identifiers and send them to the terminals;

Wherein the identifier allocation unit comprises: an audio stream energy acquisition module, configured to obtain the energy of the multiple audio streams; an audio stream selection module, configured to select, according to the obtained energies, the at least one audio stream with the greatest energy; and an identifier allocation module, configured to assign an identifier to the selected at least one audio stream;

Wherein the sending policy is:

If the selected audio streams include the audio stream corresponding to a given terminal, the audio streams sent to that terminal are the selected audio streams with that terminal's own stream removed; if the selected audio streams do not include the audio stream obtained by a given terminal, all selected audio streams are sent to that terminal.

An embodiment of the present invention also provides a terminal for 3D audio conference signal processing, comprising:

An audio processing unit, configured to extract identification information from the obtained multiple audio streams to which identifiers have been assigned, split the audio streams according to the identification information, and decode the multiple audio streams separately;

A sound image position allocation unit, configured to assign sound image positions to the decoded multiple audio streams according to the identification information extracted by the audio processing unit;

A 3D sound processing unit, configured to perform 3D sound processing on the decoded multiple audio streams according to the assigned sound image positions.

An embodiment of the present invention also provides a signal processing method for 3D audio conferencing, comprising:

Extracting identification information from the obtained multiple audio streams to which identifiers have been assigned;

Splitting the audio streams according to the extracted identification information, so that audio streams with the same identifier are grouped together;

Assigning a sound image position to each split audio stream according to the extracted identification information;

Decoding the split audio streams, and performing 3D sound processing on the decoded audio streams according to their sound image position information.

With the technical solution of the embodiments of the present invention, a terminal can freely position the sound images of other terminals according to the received audio streams of those terminals and the identifiers assigned to the audio streams.

Brief description of the drawings

To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a network diagram of a distributed 3D audio conference system used in the prior art;

Fig. 2 is a network diagram of a centralized 3D audio conference system used in the prior art;

Fig. 3 is a flowchart of method embodiment 1 of the present invention;

Fig. 4 is a flowchart of method embodiment 2 of the present invention;

Fig. 5 is a diagram of the system networking structure corresponding to method embodiment 2 of the present invention;

Fig. 6 is a diagram of the system networking structure corresponding to method embodiment 3 of the present invention;

Fig. 7 is a flowchart of method embodiment 3 of the present invention;

Fig. 8 is a diagram of the system networking structure corresponding to method embodiment 4 of the present invention;

Fig. 9 is a flowchart of method embodiment 4 of the present invention;

Figure 10 is a flowchart of method embodiment 5 of the present invention;

Figure 11 is a structural diagram of the 3D sound processing in the method embodiments of the present invention;

Figure 12 is a structural diagram of system embodiment 1 of the present invention;

Figure 13 is a structural diagram of server embodiment 1 of the present invention;

Figure 14 is a structural diagram of the identifier allocation unit in server embodiment 1 shown in Figure 13;

Figure 15 is a structural diagram of the identifier allocation module in the identifier allocation unit shown in Figure 14;

Figure 16 is a structural diagram of the combination sending unit in server embodiment 1 shown in Figure 13;

Figure 17 is a structural diagram of terminal embodiment 1 of the present invention;

Figure 18 is a structural diagram of the audio processing unit in terminal embodiment 1 shown in Figure 17;

Figure 19 is a structural diagram of terminal embodiment 2 of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

Method embodiments

Method embodiment 1

Method embodiment 1 of the present invention is described with reference to Fig. 3:

301, for a given terminal, obtain the audio streams relevant to that terminal;

In step 301, obtaining the audio streams relevant to the terminal specifically comprises: obtaining the energy of the multiple audio streams relevant to the terminal; selecting, according to the obtained energies, the at least one audio stream with the greatest energy; and assigning an identifier to the selected audio stream.

It will be appreciated that obtaining the several audio streams with the greatest energy for a terminal is only one possible implementation. All audio streams may be obtained instead, in which case no energy calculation is needed and the relevant audio streams are obtained directly.

302, assign identifiers to the obtained audio streams relevant to the terminal;

In step 302, the conference site number may be used as the identifier of each audio stream; alternatively, identifiers may be assigned manually by the conference administrator or in real time by the conference management system.

It will be appreciated that the identifier of an audio stream in the embodiments of the present invention is simply a code name assigned to the stream in order to distinguish it from other streams. Other ways of obtaining identifiers can therefore be derived from the embodiments, and the embodiments place no limitation on this.

303, combine the obtained audio streams relevant to the terminal with their corresponding identifiers and send them to the terminal.

In step 303, the obtained audio streams and their corresponding identifiers may be combined in the following ways:

Loose combination: the obtained audio bitstream is left unchanged, and when each frame of audio data is encapsulated in a protocol, the source identifier of the audio stream is added to the protocol header;

And/or

Tight combination: the obtained mono audio bitstreams undergo encoding and decoding, the resulting mono bitstreams are integrated into one multichannel bitstream, and the source identifiers of the audio streams corresponding to the individual channels are added to the frame header of the multichannel bitstream.

It should be noted that the audio streams and their identifiers may be combined using loose combination throughout, tight combination throughout, or a mixture of the two.

The identifier of an audio stream may be carried in the protocol header of the IP packet or in the frame header of the audio frame.
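The loose combination mode can be illustrated with a minimal Python sketch. The header layout used here (a 2-byte source identifier followed by a 2-byte payload length) and the function names `pack_frame` and `unpack_frame` are assumptions made for illustration only; the text does not specify a wire format.

```python
import struct
from typing import Tuple

# Hypothetical header layout (not specified in the text):
# 2-byte source identifier, 2-byte payload length, network byte order.
HEADER = struct.Struct("!HH")


def pack_frame(source_id: int, payload: bytes) -> bytes:
    """Prepend the source identifier to an encoded audio frame.

    The payload itself is left unchanged, as in the loose combination mode.
    """
    return HEADER.pack(source_id, len(payload)) + payload


def unpack_frame(packet: bytes) -> Tuple[int, bytes]:
    """Recover the source identifier and the untouched payload."""
    source_id, length = HEADER.unpack_from(packet)
    payload = packet[HEADER.size:HEADER.size + length]
    return source_id, payload
```

Because the encoded frame is carried as opaque bytes, a server using this kind of scheme would not need to decode the audio in order to tag it.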

Method embodiment 2

Method embodiment 2 of the present invention mainly describes the case of a single server; the processing is described with reference to the flowchart in Fig. 4:

401, the server obtains the audio stream corresponding to each terminal;

In step 401, each terminal generally corresponds to one conference site and captures the audio stream of that site; the server serving the terminals obtains the audio stream corresponding to each terminal.

402, the energy of each obtained audio stream is calculated, and the at least one audio stream with the greatest energy is selected;

In step 402, the server calculates the energy of each audio stream obtained in step 401 and, based on the results, selects the at least one audio stream with the greatest energy as the finally selected audio streams;

Wherein the energy of an audio stream may be calculated by any of the following methods:

(1) decoding the audio stream and calculating, in the time domain, its audio energy within one frame, then averaging over multiple frames of the audio signal; or

(2) decoding the audio stream and calculating, in the frequency domain, its audio energy within the corresponding frequency range, then averaging over multiple frames of the audio signal; or

(3) decoding only the quantization factors of the audio stream and estimating the energy of the audio stream from them.

These methods fall into two classes. The first class, methods (1) and (2), is based on full decoding. The second class, method (3), estimates the energy without full decoding. Both classes are used because protocols differ: for some audio protocols (for example G.723.1 and G.729), the energy of the audio stream can only be calculated by fully decoding it; for other audio protocols (for example G.722.1 and AAC-LD), decoding only certain parameters of the audio stream is enough to estimate its energy.

Once the energy of the audio streams has been estimated, the at least one audio stream with the greatest energy can be selected according to the policy of the audio conference.

It will be appreciated that calculating the energy of each audio stream and selecting the at least one stream with the greatest energy is only one way of selecting audio streams. Alternatively, the energy need not be calculated at all, and the audio streams of all participating conference sites can be treated as selected audio streams.
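Method (1) and the subsequent selection can be sketched in Python as follows. This is a minimal illustration under stated assumptions: frames are already decoded into lists of float samples, "energy" is taken to be the mean squared amplitude of a frame, and the function names and the parameter `n` (how many streams to keep) are hypothetical.

```python
from typing import Dict, List, Sequence


def frame_energy(samples: Sequence[float]) -> float:
    """Mean squared amplitude of one decoded frame (time domain)."""
    return sum(s * s for s in samples) / len(samples) if samples else 0.0


def stream_energy(frames: Sequence[Sequence[float]]) -> float:
    """Average the per-frame energy over several frames, as in method (1)."""
    return sum(frame_energy(f) for f in frames) / len(frames) if frames else 0.0


def select_loudest(streams: Dict[int, Sequence[Sequence[float]]], n: int) -> List[int]:
    """Return the identifiers of the n streams with the greatest energy."""
    energies = {sid: stream_energy(frames) for sid, frames in streams.items()}
    return sorted(energies, key=energies.get, reverse=True)[:n]
```

For codecs where full decoding is expensive, `stream_energy` could in principle be replaced by an estimate derived from partially decoded parameters, in the spirit of method (3); that variant is not shown here.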

403, obtain the identification information corresponding to the selected at least one audio stream;

In step 403, the corresponding identification information is obtained for each selected audio stream.

The conference site number may be used as the identifier of each selected audio stream; alternatively, identifiers may be assigned manually by the conference administrator or in real time by the conference management system.

404, combine the selected audio streams with the obtained identification information;

In step 404, each selected audio stream is combined with its obtained identification information.

Wherein the modes of combination include loose combination and/or tight combination, as described for step 303 of method embodiment 1.

405, send the audio streams combined with identification information to each corresponding terminal according to a corresponding sending policy.

In step 405, the following policy may be used to send the combined audio streams to each corresponding terminal:

If the selected audio streams include the audio stream obtained by a given terminal, the audio streams sent to that terminal are the other selected audio streams, with that terminal's own stream removed; if the selected audio streams do not include the audio stream obtained by a given terminal, all selected audio streams are sent to that terminal.

To illustrate this sending policy more clearly, refer to Fig. 5, which contains four terminals and one server. A dashed line from a terminal to the server means that the terminal uploads the audio stream it has captured to the server; a solid line from the server to a terminal means that the server delivers the selected audio streams to that terminal. Suppose the server's calculation shows that terminals 2 and 3 are the terminals whose audio streams have the greatest energy. The server then delivers audio streams 2 and 3 to terminal 1 and to terminal 4, delivers audio stream 3 to terminal 2, and delivers audio stream 2 to terminal 3.
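The sending policy amounts to removing each terminal's own stream from the selected set. A short Python sketch, with the function name `streams_for_terminals` chosen here for illustration and with each stream assumed to carry the same number as the terminal that captured it:

```python
from typing import Dict, Iterable, List


def streams_for_terminals(terminals: Iterable[int],
                          selected: List[int]) -> Dict[int, List[int]]:
    """Map each terminal to the selected streams it should receive.

    A terminal never receives its own stream; a terminal whose stream was
    not selected receives every selected stream.
    """
    return {t: [s for s in selected if s != t] for t in terminals}
```

Applied to the Fig. 5 scenario, `streams_for_terminals([1, 2, 3, 4], [2, 3])` gives streams 2 and 3 to terminals 1 and 4, stream 3 to terminal 2, and stream 2 to terminal 3.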

Method embodiment 3

Method embodiment 3 of the present invention mainly describes the case of multiple cascaded servers; its structure is illustrated in Fig. 6:

Fig. 6 shows three servers and four terminals. Terminal 1 and terminal 2 belong to server 2, terminal 3 and terminal 4 belong to server 3, and server 2 and server 3 are cascaded through server 1. Server 1 can be regarded as the master server, and server 2 and server 3 as slave servers of server 1.

For the case of multiple cascaded servers, the processing, with reference to the flowchart in Fig. 7, is:

701, the master server obtains the audio streams uploaded by the slave servers;

702, the master server decomposes the audio stream obtained from each slave server into multiple audio streams, the number of decomposed audio streams being the number of terminals under that slave server;

In step 702, because the audio stream obtained from a slave server was uploaded by the individual terminals under that slave server, it can be decomposed into different audio streams according to the individual terminals.

703, the master server calculates the energy of the decomposed audio streams and selects the at least one audio stream with the greatest energy;

In step 703, calculating the energy of the decomposed audio streams and selecting the at least one audio stream with the greatest energy is similar to step 402 of method embodiment 2 of the present invention and is not repeated here.

704, obtain the identification information corresponding to the selected at least one audio stream;

In step 704, the master server obtains from the slave servers the identification information corresponding to the selected at least one audio stream. The manner of obtaining it is similar to step 403 of method embodiment 2 of the present invention and is not repeated here.

705, combine the selected audio streams with the obtained identification information;

The implementation of step 705 is similar to step 404 of method embodiment 2 of the present invention and is not repeated here.

706, send the audio streams combined with identification information to each corresponding terminal according to a corresponding sending policy.

The implementation of this step is similar to step 405 of method embodiment 2 of the present invention and is not repeated here.

It will be appreciated that method embodiment 3 of the present invention only presents a cascade formed by three servers; a cascade of more servers can be implemented following the same process.

Method embodiment 4

Method embodiment 4 of the present invention mainly describes the case in which at least one terminal is combined with a cascade of multiple servers; its structure is illustrated in Fig. 8:

As Fig. 8 shows, there are three servers: server 1 is the master server, and server 2 and server 3 are slave servers; the three servers form a cascade. Fig. 8 also contains six terminals in total: terminals 1, 2, 3 and 4 are administered by servers 2 and 3, and terminals 5 and 6 are connected directly to master server 1.

The implementation process, with reference to Fig. 9, is:

901, the master server obtains the audio streams uploaded by the slave servers and the audio streams of the terminals it administers directly;

902, the master server decomposes the audio stream obtained from each slave server into multiple audio streams, the number of decomposed audio streams being no greater than the number of terminals under that slave server;

In step 902, because the audio stream obtained from a slave server was uploaded by the individual terminals under that slave server, it can be decomposed into different audio streams according to the individual terminals. The number of decomposed audio streams may be smaller than the number of terminals under the slave server: it depends on whether each terminal is producing sound. When some terminals have no conference site sound, the number of decomposed audio streams is smaller than the number of terminals under the slave server.

903, the master server calculates the energy of the audio streams decomposed from the slave servers' streams and of the audio streams obtained from the directly administered terminals, and selects the at least one audio stream with the greatest energy;

In step 903, calculating these energies and selecting the at least one audio stream with the greatest energy is similar to step 402 of method embodiment 2 of the present invention and is not repeated here.

904, obtain the identification information corresponding to the selected at least one audio stream;

The implementation of step 904 is similar to step 403 of method embodiment 2 of the present invention and is not repeated here.

905, combine the selected audio streams with the obtained identification information;

The implementation of step 905 is similar to step 404 of method embodiment 2 of the present invention and is not repeated here.

906, send the audio streams combined with identification information to each corresponding terminal or slave server according to a corresponding sending policy.

It will be appreciated that method embodiment 4 of the present invention only presents a cascade formed by three servers with two terminals administered by the master server; a cascade of more servers, or a master server administering more terminals, can be implemented following the same process.

Method embodiment 5

This method embodiment concerns the processing that a terminal performs on the audio streams it receives. Referring to Figure 10, the process is:

1001, extract identification information from the obtained multiple audio streams to which identifiers have been assigned;

In step 1001, the identification information can be obtained from the protocol header of the IP packets carrying the audio stream, or from the frame header of the audio frames.

1002, split the audio streams according to the extracted identification information, so that audio streams with the same identifier are grouped together;

In step 1002, because different audio streams carry different identification information, the streams are split by identifier, and audio data with the same identifier is routed to the same decoding module.
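Steps 1001 and 1002 can be sketched as a simple demultiplexer in Python. The sketch assumes that the identifier has already been extracted from each packet or frame header, so that the input is a sequence of `(source_id, payload)` pairs; the function name `split_by_identifier` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def split_by_identifier(
    packets: Iterable[Tuple[int, bytes]]
) -> Dict[int, List[bytes]]:
    """Group encoded frames by source identifier, preserving arrival order.

    Each resulting list would be fed to one decoder instance, so frames
    with the same identifier always reach the same decoding module.
    """
    routed: Dict[int, List[bytes]] = defaultdict(list)
    for source_id, payload in packets:
        routed[source_id].append(payload)
    return dict(routed)
```

Keeping one decoder per identifier matters for stateful codecs, where interleaving frames from different sources in a single decoder would corrupt the decoded output.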

1003, assign a sound image position to each split audio stream according to the extracted identification information;

In step 1003, the identification information extracted in step 1001 is used to assign the sound image positions.

Sound image positions may be specified in advance by the user, with a given sound image position permanently assigned to a given terminal, or they may be assigned automatically. Automatic assignment can follow these principles:

(1) If the identifier of an audio stream matches the terminal being viewed, the middle sound image position is assigned; in Fig. 9, this is the virtual sound image position in front of the television set. The benefit of this assignment is that the sound image position matches the image being viewed.

(2) If the audio signal energy of a terminal is relatively large, a front sound image position is assigned, which ensures that the far-end speaker's voice comes from the front.

(3) If the audio signal energy of a terminal is relatively small, a sound image position at the sides is assigned. Such a terminal is probably producing noise, and placing it at the sides separates the noise from the far-end speaker's voice, preserving the intelligibility of the speaker's voice.
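The three automatic allocation principles can be sketched in Python as follows. The position labels (`"middle"`, `"front"`, `"side"`), the energy threshold `loud_threshold`, and the function name are all assumptions introduced for illustration; the text does not say how "larger" and "smaller" energy are distinguished. The label `"front"` stands for the position that rule (2) assigns to loud terminals.

```python
from typing import Dict, Optional


def assign_positions(energies: Dict[int, float],
                     watched_id: Optional[int] = None,
                     loud_threshold: float = 0.1) -> Dict[int, str]:
    """Assign a coarse sound image position to each stream identifier.

    Rule (1): the stream of the terminal being watched goes to the middle.
    Rule (2): streams at or above the (hypothetical) threshold go to the front.
    Rule (3): quieter streams, likely noise, go to the sides.
    """
    positions: Dict[int, str] = {}
    for source_id, energy in energies.items():
        if source_id == watched_id:
            positions[source_id] = "middle"
        elif energy >= loud_threshold:
            positions[source_id] = "front"
        else:
            positions[source_id] = "side"
    return positions
```

A real terminal would map these labels to concrete virtual positions between its loudspeakers; the labels here only capture the ordering of the rules.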

1004, decode the split audio streams, and perform 3D sound processing on the decoded audio streams according to their sound image position information.

In step 1004, the audio data routed to the same decoding module by identical identification information in step 1002 is decoded, and 3D sound processing is performed on the decoded audio streams using the sound image position information assigned in step 1003.

All method embodiments of the present invention use 3D sound processing, which is described here once and not repeated elsewhere. The purpose of 3D sound processing is to create a stereo sound field using two loudspeakers, left and right. The process can be illustrated by the following example, with reference to Figure 11:

In Figure 11, the distance between loudspeakers p1 and p2 is d, and the distance from virtual sound image v1 to loudspeaker p1 is w. Suppose the sound image position assigned to an audio stream s1 is v1. Then s1 is multiplied by gain g1 and fed to p1, and s1 is multiplied by gain g2 and fed to p2, where g1 and g2 are calculated as follows:

w/d = (g1 - g2)/(g1 + g2)    (1)

c = g1 × g1 + g2 × g2    (2)

In formulas (1) and (2), g1 is the left-channel amplitude gain, g2 is the right-channel amplitude gain, and c is a fixed value, which may for example equal 1.

Once the gains of the left and right channels have been calculated, a three-dimensional sound field can be simulated.
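Formulas (1) and (2) can be solved in closed form. Writing r = w/d, formula (1) gives g2 = g1 × (1 - r)/(1 + r); substituting into formula (2) yields g1 = k × (1 + r) and g2 = k × (1 - r), with k = sqrt(c / (2 × (1 + r × r))). The Python sketch below applies this solution, taking the formulas exactly as written; the function name `pan_gains` is hypothetical, and non-negative gains are assumed.

```python
import math
from typing import Tuple


def pan_gains(w: float, d: float, c: float = 1.0) -> Tuple[float, float]:
    """Solve formulas (1) and (2) for the channel gains (g1, g2).

    w: distance term of the virtual sound image, as used in formula (1)
    d: distance between the two loudspeakers (must be non-zero)
    c: fixed value of formula (2), for example 1
    """
    r = w / d
    k = math.sqrt(c / (2.0 * (1.0 + r * r)))
    g1 = k * (1.0 + r)
    g2 = k * (1.0 - r)
    return g1, g2
```

Under the formulas as written, w = 0 gives equal gains in both channels, and w = d gives g2 = 0 so that the stream is fed only to p1.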

System embodiments

System embodiment 1

System embodiment 1 of the present invention, described with reference to the accompanying drawings, comprises:

A server 1200, configured to obtain, for a given terminal, the audio streams relevant to that terminal; assign identifiers to the obtained audio streams; and combine the obtained audio streams with their corresponding identifiers and send them to the terminal;

At least one terminal 1300, configured to obtain the audio streams carrying identifiers; extract the identifiers of the audio streams; split the audio streams according to the identifiers so that streams with the same identifier are grouped together; assign a sound image position to each split audio stream according to the extracted identification information; decode the split audio streams; and perform 3D sound processing on the decoded audio streams according to their sound image position information.

System embodiment 2

With reference to the structure diagram in Figure 6, and on the basis of system embodiment 1, this system embodiment comprises: a master server, namely server 1 in Figure 6, configured to obtain, for a terminal, the audio streams relative to that terminal, allocate identifiers to the obtained audio streams relative to the terminal, and combine the obtained audio streams with their corresponding identifiers and send them to the terminal, and further configured to decompose the combined, identifier-carrying audio streams from the at least one slave server into multiple audio streams; and at least one slave server, namely server 2 and server 3 in Figure 6, configured to obtain the audio streams of the terminals it manages or of other servers, and to combine the obtained audio streams with the identifiers of those audio streams.

Device embodiment

Server embodiment

This embodiment provides a server for implementing signal processing of a 3D audio conference. With reference to Figure 13, the server comprises:

Audio stream obtaining unit 1210, configured to obtain, for a terminal, the audio streams relative to that terminal; identifier allocation unit 1220, configured to allocate identifiers to the obtained audio streams relative to the terminal; and combination sending unit 1230, configured to combine the obtained audio streams relative to the terminal with their corresponding identifiers and send them to the terminal.

With reference to Figure 14, the identifier allocation unit 1220 specifically comprises: audio stream energy obtaining module 1221, configured to obtain the energy of the multiple audio streams relative to the terminal; audio stream selection module 1222, configured to select, according to the obtained energy of the multiple audio streams, at least one audio stream with the highest energy; and identifier allocation module 1223, configured to allocate identifiers to the selected at least one audio stream.
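The energy-based selection can be sketched as follows. This is a minimal illustration that assumes decoded streams are available as lists of sample frames and uses the time-domain option listed in claim 4 (per-frame energy averaged over several frames); the function names and the dictionary layout are hypothetical.

```python
def mean_frame_energy(frames):
    """Average per-frame energy (mean of squared samples) over several frames."""
    energies = [sum(x * x for x in frame) / len(frame) for frame in frames if frame]
    return sum(energies) / len(energies) if energies else 0.0

def select_loudest(streams, n=1):
    """Return the identifiers of the n streams with the highest mean energy.

    `streams` maps a stream identifier to a list of decoded frames; this
    layout is an assumption made for illustration.
    """
    ranked = sorted(streams, key=lambda sid: mean_frame_energy(streams[sid]), reverse=True)
    return ranked[:n]

streams = {
    "A": [[0.1, 0.1]],
    "B": [[1.0, -1.0], [0.5, 0.5]],
    "C": [[0.6, 0.6]],
}
selected = select_loudest(streams, n=2)
```

Only the selected streams would then be passed on for identifier allocation.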

With reference to Figure 15, the identifier allocation module 1223 specifically comprises: site number obtaining submodule 12231, configured to obtain, for each of the at least one audio stream with the highest energy, the site number of the site where that audio stream originates; and site allocation submodule 12232, configured to allocate to each such audio stream, as its identifier, the site number of its site as obtained by the site number obtaining submodule.

With reference to Figure 16, the combination sending unit 1230 specifically comprises the following modules: first combination module 1231, configured to leave the selected audio bitstream unchanged and, when each frame of audio data is encapsulated in a protocol, add the source identifier of the audio stream to the protocol packet header; and/or second combination module 1232, configured to perform encoding and decoding on the selected mono audio bitstreams, integrate the resulting mono audio bitstreams into one multichannel bitstream, and add, in the frame header of the multichannel bitstream, the audio stream source identifiers corresponding to the plurality of channels.
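The first (loose) combination can be sketched as follows. The embodiment does not specify the encapsulation protocol, so the 4-byte header used here (a 16-bit site number followed by a 16-bit payload length, in network byte order) is purely an assumption for illustration, as are the function names.

```python
import struct

# Hypothetical header layout: site number (uint16) + payload length (uint16).
HEADER = struct.Struct("!HH")

def pack_frame(site_id, payload):
    """Prepend the source identifier to an unmodified encoded audio frame."""
    return HEADER.pack(site_id, len(payload)) + payload

def unpack_frame(packet):
    """Recover the source identifier and the untouched audio frame."""
    site_id, length = HEADER.unpack_from(packet)
    payload = packet[HEADER.size:HEADER.size + length]
    return site_id, payload

packet = pack_frame(7, b"\x01\x02\x03")
site_id, frame = unpack_frame(packet)
```

Because the encoded frame is carried unchanged, the server does not need to decode or re-encode the audio in this mode.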

Apparatus embodiments

Apparatus embodiment 1

An embodiment of the present invention also provides a terminal for implementing signal processing of a 3D audio conference. With reference to Figure 17, the terminal comprises:

Audio processing unit 1310, configured to extract identification information from the obtained multiple audio streams to which identifiers have been allocated, demultiplex the audio streams according to the identification information, and decode the multiple audio streams separately;

Sound image position allocation unit 1320, configured to allocate sound image positions to the decoded multiple audio streams according to the identification information extracted by the audio processing unit;

3D audio processing unit 1330, configured to perform 3D audio processing on the decoded multiple audio streams according to the allocated sound image positions.

In implementing this embodiment of the present invention, and with reference to Figure 18, the audio processing unit 1310 specifically comprises: identifier extraction module 1311, configured to extract identification information from the obtained multiple audio streams to which identifiers have been allocated; demultiplexing module 1312, configured to demultiplex the audio streams according to the extracted identification information; and decoding module 1313, configured to decode the multiple audio streams separately.
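The final 3D audio processing step at the terminal can be sketched as a weighted stereo mix. This is a minimal illustration under assumptions: decoded streams are lists of samples keyed by identifier, and each identifier already has a (g1, g2) gain pair derived from its allocated sound image position, for example via formulas (1) and (2). The function name mix_stereo and the data layout are hypothetical.

```python
def mix_stereo(decoded, gains):
    """Mix decoded mono streams into left and right channels.

    `decoded` maps identifier -> list of samples; `gains` maps identifier ->
    (g1, g2). Shorter streams are treated as silent once they run out.
    """
    n = max((len(s) for s in decoded.values()), default=0)
    left = [0.0] * n
    right = [0.0] * n
    for identifier, samples in decoded.items():
        g1, g2 = gains[identifier]
        for i, x in enumerate(samples):
            left[i] += g1 * x   # contribution sent to loudspeaker p1
            right[i] += g2 * x  # contribution sent to loudspeaker p2
    return left, right

decoded = {1: [1.0, 2.0], 2: [4.0]}
gains = {1: (1.0, 0.0), 2: (0.5, 0.5)}
left, right = mix_stereo(decoded, gains)
```

Keeping the gain table separate from the mixer reflects that the terminal, not the server, decides where each site is placed.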

Apparatus embodiment 2

On the basis of apparatus embodiment 1 above, and with reference to Figure 19, the terminal may further comprise: audio signal obtaining unit 1340, configured to obtain the audio signal of the site; and audio encoding unit 1350, configured to encode the obtained audio signal.

Through the above description of the embodiments, those skilled in the art can clearly understand that the present invention may be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation. Based on this understanding, the essence of the technical solution of the present invention, or the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.

The foregoing are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A signal processing method for a 3D audio conference, applied to an audio conference system, characterized in that the audio conference system comprises a plurality of terminals and one server, and the method comprises:

The server obtaining the audio stream corresponding to each terminal of the plurality of terminals; wherein the audio stream corresponding to each terminal is the audio stream of the site where that terminal is located, as obtained by that terminal;

Sending the at least one audio stream, after combination with the identification information, to the plurality of terminals according to a corresponding sending policy;

Wherein the sending policy is:

2. The signal processing method according to claim 1, characterized in that the identification information is specifically: a site number used as the identifier allocated to the multiple audio streams.

3. The signal processing method according to claim 1, characterized in that combining the selected at least one audio stream with the obtained identification information specifically comprises the following modes:

A loose combination mode, in which the selected audio bitstream is left unchanged and, when each frame of audio data is encapsulated in a protocol, the source identifier of the audio stream is added to the protocol packet header;

And/or

A tight combination mode, in which encoding and decoding are performed on the selected audio bitstreams, the resulting audio bitstreams are integrated into one multichannel bitstream, and the audio stream source identifiers corresponding to the plurality of channels are added in the frame header of the multichannel bitstream.

4. The signal processing method according to claim 1, characterized in that calculating the energy of the obtained audio streams may use one of the following methods:

Calculating, in the time domain, the audio energy of the decoded audio stream within one frame period, and averaging the result over multiple frames of the audio signal; or

Calculating, in the frequency domain, the audio energy of the decoded audio stream within the corresponding frequency range, and averaging the result over multiple frames of the audio signal; or

Decoding the quantization factors of the audio stream, and thereby estimating the energy of the audio stream.

5. A server for implementing signal processing of a 3D audio conference, applied to an audio conference system, characterized in that the server comprises:

Audio stream obtaining unit, configured to obtain the audio stream corresponding to each terminal of a plurality of terminals; wherein the audio stream corresponding to each terminal is the audio stream of the site where that terminal is located, as obtained by that terminal;

Wherein the sending policy is:

6. The server according to claim 5, characterized in that the combination sending unit specifically comprises the following modules:

A first combination module, configured to leave the selected audio bitstream unchanged and, when each frame of audio data is encapsulated in a protocol, add the source identifier of the audio stream to the protocol packet header;

And/or

A second combination module, configured to perform encoding and decoding on the selected audio bitstreams, integrate the resulting audio bitstreams into one multichannel bitstream, and add, in the frame header of the multichannel bitstream, the audio stream source identifiers corresponding to the plurality of channels.