CN1859511A

CN1859511A - Telephone conference voice mixing method

Info

Publication number: CN1859511A
Application number: CN 200510034524
Authority: CN
Inventors: 朱祥文; 吴宗武
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2005-04-30
Filing date: 2005-04-30
Publication date: 2006-11-08

Abstract

A teleconference voice mixing method comprising: A. setting a fixed time period, calculating the time-domain energy of each participant within the fixed time period, and determining the loudest party and the second-loudest party; B. in the current time period, mixing the loudest and second-loudest parties of the previous time period to obtain first mixing data, and mixing the loudest and second-loudest parties of the current time period to obtain second mixing data; C. dividing the fixed time period into a mixing zone and a smoothing zone; in the mixing zone of the current time period, the mixing result is the first mixing data or the second mixing data; in the smoothing zone of the current time period, the first mixing data decreases over time, the second mixing data increases over time, and the mixing result is the superposition of the first and second mixing data. The method allows the differing mixing data of adjacent time periods to transition smoothly, improving the mixing effect.

Description

A telephone conference voice mixing method

Technical field

The present invention relates to the field of communication technology, and in particular to a mixing method that improves voice quality in teleconferences.

Background technology

Conference telephony often involves multi-party calls, the most common case being two-party mixing. For example, when A, B, C and D take part in the same conference call and A and B speak at the same time, C and D hear A and B simultaneously; the network layout is shown in Fig. 1. When handling a conference call, the equipment first determines whose voice is loudest and whose is second loudest in the conference, then superimposes the loudest and second-loudest voices and sends the result to the other participants, who are silent or speaking quietly. This essentially implements the conference call function. The operation of superimposing voices is called mixing, and the mixing method directly determines the mixing quality at the conference site.

Most existing methods calculate the time-domain energy of each participant within a fixed-length time period (for example 20 ms), determine the loudest channel and the second-loudest channel, and in the next time period superimpose the voices of those two channels in proportion and send the result to the participants. For example, as shown in Fig. 2, suppose three parties A, B and C take part in the same conference. The time-domain energy of each voice channel in the current time period is calculated first, giving the loudest and second-loudest parties for that period, say A and B; mixing in the next time period then uses channels A and B, and subsequent processing continues in the same way.

As shown in Fig. 3, with this method the loudest and second-loudest channel numbers keep changing, so the mixing result of one window and that of the next window can differ considerably at the junction. Because the mixing data is switched directly, the sound changes abruptly, audible "noise" can be clearly perceived, and the mixing quality is poor.
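The existing scheme described above can be sketched as follows. This is a minimal illustration, not the equipment's actual code: the function names are hypothetical, time-domain energy is assumed to be the sum of squared samples, and the two selected channels are simply added (the text mentions mixing "in proportion" without giving the weights).

```python
def frame_energy(samples):
    """Time-domain energy of one frame (assumed here: sum of squared samples)."""
    return sum(s * s for s in samples)


def pick_top_two(frames):
    """Return the names of the loudest and second-loudest channels in a frame."""
    ranked = sorted(frames, key=lambda ch: frame_energy(frames[ch]), reverse=True)
    return ranked[0], ranked[1]


def mix_hard_switch(frames, channels):
    """Add the two selected channels sample by sample, with no smoothing."""
    first, second = channels
    return [a + b for a, b in zip(frames[first], frames[second])]


# Frame k is used only to decide who is loudest (tiny 3-sample frames for brevity).
frame_k = {"A": [900, -800, 700], "B": [500, -400, 300], "C": [10, -10, 5]}
selected = pick_top_two(frame_k)

# Frame k+1 is mixed with the channels selected in frame k.
frame_k1 = {"A": [100, 100, 100], "B": [200, 200, 200], "C": [300, 300, 300]}
mixed = mix_hard_switch(frame_k1, selected)
```

Whenever `selected` changes from one frame to the next, the output jumps from one pair of channels to another at the frame boundary, which is the discontinuity the invention addresses.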

Summary of the invention

The technical problem to be solved by the present invention is: when the loudest and second-loudest channel numbers change between adjacent time periods, directly switching the mixing data causes obvious noise and degrades voice quality.

The technical solution adopted by the present invention to solve the above technical problem is:

A telephone conference voice mixing method, comprising the following steps:

A. Setting a fixed time period, calculating the time-domain energy of each participant within the fixed time period, and determining the loudest channel and the second-loudest channel in each time period;

B. In the current time period, mixing according to the loudest and second-loudest channels of the previous time period to obtain first mixing data, and mixing according to the loudest and second-loudest channels of the current time period to obtain second mixing data;

C. Dividing the fixed time period into two intervals, a mixing zone and a smoothing zone; in the mixing zone of the current time period, the mixing result is the first mixing data or the second mixing data; in the smoothing zone of the current time period, the first mixing data decreases monotonically over time, the second mixing data increases monotonically over time, and the mixing result is the superposition of the first mixing data and the second mixing data.

The method, wherein: in the smoothing zone, the first mixing data decreases linearly over time and the second mixing data increases linearly over time.

The method, wherein: the mixing data in the mixing zone satisfies the following formula:

MixOut(n) = \frac{(M - ramp) \cdot X_{1}(n) + ramp \cdot X_{2}(n)}{M}

Where: X₁(n) is the first mixing data;

X₂(n) is the second mixing data;

M is a digital quantity representing the time length of the smoothing zone, and is a positive integer greater than 0;

ramp is the transition-time variable, which changes linearly over time within the range 0 to M.
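The formula is a linear crossfade between the two mixes, which the detailed description applies across the smoothing zone. A minimal sketch, assuming floating-point samples and a hypothetical helper name `crossfade_sample`; the short zone length used here is illustrative only:

```python
def crossfade_sample(x1, x2, ramp, m):
    """MixOut = ((M - ramp) * X1 + ramp * X2) / M for a single sample."""
    if m <= 0 or not 0 <= ramp <= m:
        raise ValueError("need M > 0 and 0 <= ramp <= M")
    return ((m - ramp) * x1 + ramp * x2) / m


M = 4  # illustrative smoothing-zone length, not the value from the text
x1 = [100.0] * (M + 1)  # first mixing data (previous selection), constant for clarity
x2 = [20.0] * (M + 1)   # second mixing data (current selection)

# Step ramp through its full range 0..M to see the weights shift from X1 to X2.
out = [crossfade_sample(x1[r], x2[r], r, M) for r in range(M + 1)]
```

With constant inputs, `out` moves in equal steps from the X₁ value at `ramp = 0` to the X₂ value at `ramp = M`.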

The method, wherein: the time length of the smoothing zone is set to be less than or equal to 1/2 of the length of the fixed time period; when the fixed time period is 20 ms, the time length of the smoothing zone is less than or equal to 10 ms, and the corresponding M is less than or equal to 80.

The method, wherein: for a fixed time period of 20 ms, the optimal value of M is 80.
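The text does not say how 10 ms maps to M = 80. One reading that fits the numbers is that M counts samples at the 8 kHz sampling rate common in narrowband telephony. That rate is an assumption here, not something the document states. A small sketch of the arithmetic under that assumption, with a hypothetical function name:

```python
def max_smoothing_samples(frame_ms, sample_rate_hz):
    """Upper bound on M: half the frame length, expressed in samples."""
    frame_samples = frame_ms * sample_rate_hz // 1000
    return frame_samples // 2


# Assumed 8 kHz sampling: a 20 ms frame holds 160 samples, so M is at most 80.
m_max = max_smoothing_samples(20, 8000)
```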

The method, wherein: when the smoothing zone is placed at the rear of the fixed time period, the mixing result in the mixing zone is the first mixing data.

The method, wherein: when the smoothing zone is placed at the front of the fixed time period, the mixing result in the mixing zone is the second mixing data.
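The two placements can be compared with a small frame-level sketch. The function name `mix_frame`, the `smooth_at` parameter and the choice to let `ramp` run from 1 to M inside the smoothing zone (so the zone ends on pure second mixing data) are illustrative assumptions; the text only states that ramp varies linearly within 0 to M.

```python
def mix_frame(x1, x2, m, smooth_at="rear"):
    """Mix one frame from first (x1) and second (x2) mixing data.

    The frame is split into a mixing zone and an m-sample smoothing zone.
    Rear placement: mixing zone outputs x1, then crossfades to x2.
    Front placement: crossfades from x1 to x2, then mixing zone outputs x2.
    """
    n = len(x1)
    if len(x2) != n or not 0 < m <= n // 2:
        raise ValueError("frames must match and 0 < m <= half the frame length")
    if smooth_at not in ("rear", "front"):
        raise ValueError("smooth_at must be 'rear' or 'front'")

    start = n - m if smooth_at == "rear" else 0   # first index of the smoothing zone
    steady = x1 if smooth_at == "rear" else x2    # output used in the mixing zone

    out = []
    for i in range(n):
        if start <= i < start + m:
            ramp = i - start + 1                  # assumed indexing: 1..m
            out.append(((m - ramp) * x1[i] + ramp * x2[i]) / m)
        else:
            out.append(steady[i])
    return out


x1 = [8.0] * 8   # mix of the previous period's loudest pair
x2 = [0.0] * 8   # mix of the current period's loudest pair
rear = mix_frame(x1, x2, m=4, smooth_at="rear")
front = mix_frame(x1, x2, m=4, smooth_at="front")
```

In both cases the frame ends on the second mixing data, so the next frame, whose first mixing data uses the same channel pair, can continue without a jump.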

The beneficial effect of the present invention is: with the method of the invention, when the loudest and second-loudest channel numbers change between adjacent time periods, the differing mixing data of the adjacent periods join smoothly without jumps, which greatly improves the mixing effect, keeps the sound clean and free of artifacts, and improves voice quality.

Description of drawings

Fig. 1 is a schematic diagram of conference telephone mixing networking;

Fig. 2 is a schematic diagram of the existing conference telephone mixing algorithm;

Fig. 3 is a schematic diagram of the noise caused by mixing switches in the existing mixing algorithm;

Fig. 4a and Fig. 4b are schematic diagrams of the mixing algorithm of the present invention with the smoothing zone placed at the rear and at the front, respectively.

Embodiment

The present invention is described in further detail below with reference to the accompanying drawings and embodiments:

Referring to Fig. 3, with the original mixing method the mixing data obtained in the first time period is MixOut(n) = A(n) + B(n), and the mixing data obtained in the second time period is MixOut(n) = B(n) + C(n). Because the mixing data is switched directly at the transition from time period 1 to time period 2, noise is easily produced. The present invention seeks to improve the mixing effect by smoothing the data of the two adjacent time periods when the mixed channels change.

Referring to Fig. 4a and Fig. 4b, the mixing method of the present invention likewise sets a fixed time period, first calculates the time-domain energy of each participant within the fixed time period, and determines the loudest channel and the second-loudest channel in each time period. Then, in the current time period, the voices of the loudest and second-loudest channels determined in the previous time period are superimposed in proportion, that is, mixed, to obtain the first mixing data, and the loudest and second-loudest channels of the current time period are mixed to obtain the second mixing data.

Unlike the prior art, the mixing method of the present invention divides each fixed time period, according to a predetermined length, into two intervals, a mixing zone and a smoothing zone. The smoothing zone may be placed at the front, as shown in Fig. 4b, or at the rear, as shown in Fig. 4a. In the mixing zone of the current time period, the mixing result is either the first mixing data or the second mixing data, depending on where the smoothing zone is placed: when the smoothing zone is at the rear of the fixed time period, as in Fig. 4a, the mixing result in the mixing zone is the first mixing data; when the smoothing zone is at the front, as in Fig. 4b, it is the second mixing data.

In the smoothing zone of the current time period, the mixing result is the superposition of the first and second mixing data. The first mixing data decreases monotonically over time, which is equivalent to multiplying it by a monotonically decreasing coefficient so that it fades out of the smoothing zone; the second mixing data increases monotonically over time, which is equivalent to multiplying it by a monotonically increasing coefficient so that it fades into the smoothing zone. Monotonic variation here means that the quantity only rises or only falls over time, with no fluctuation along the way; that is, the slope of the function over time is always positive or always negative. The simplest case is a function that varies linearly with time. Thanks to the smoothing zone, the differing mixing data of adjacent time periods join smoothly without jumps, which greatly improves the mixing effect. The method is analysed in detail below, taking the case where the smoothing zone is placed at the rear of the fixed time period as an example:

Referring to Fig. 4a, in time period 0 the loudest and second-loudest parties of that period are determined to be channels A and B; channels A and B are then mixed in time period 1 to obtain the first mixing data X₁(n) = A(n) + B(n). Likewise, the loudest and second-loudest parties of time period 1 are determined to be channels B and C, and channels B and C are mixed in time period 1 to obtain the second mixing data X₂(n) = B(n) + C(n). In the mixing zone of time period 1 the mixing output is X₁(n). In the smoothing zone, at the rear of time period 1 where it adjoins time period 2, the contribution of the first mixing data X₁(n) falls linearly over time and reaches 0 at the end of the smoothing zone, that is, it fades out, forming a descending triangle in the smoothing zone. The contribution of the second mixing data X₂(n) rises linearly over time within the smoothing zone and equals X₂(n) at the end of the zone, forming an ascending triangle. The mixing output MixOut(n) of the smoothing zone is the superposition of these two triangles. Because there is a gradual transition from the A, B mix to the B, C mix, the transition becomes natural and smooth, and a better mixing effect is obtained.

By the same reasoning, when the loudest and second-loudest parties of the next time period (time period 2) differ from those of time period 1, for example when they are D/E, the mixing result in the mixing zone of time period 2 is the mixing data X₂(n) obtained from the loudest and second-loudest parties of time period 1, and the smoothing zone is still at the rear of the fixed time period. In that smoothing zone the mixing data X₂(n) falls linearly over time while the mixing data obtained from the loudest and second-loudest parties of time period 2 rises linearly over time, giving a smooth transition. The mixing result MixOut(n) of the smoothing zone can be expressed by the following formula:

MixOut(n) = \frac{(M - ramp) \cdot X_{1}(n) + ramp \cdot X_{2}(n)}{M}

In the formula, M is a digital quantity representing the time length of the smoothing zone and is a positive integer greater than 0; ramp is the transition-time variable, which changes linearly over time within the range 0 to M. The formula shows that when ramp = 0 the mixing result of the smoothing zone is MixOut(n) = X₁(n), and when ramp = M, MixOut(n) = X₂(n). Fig. 4 and the formula show that the smoothness of the switch between the mixing data of adjacent time periods is determined jointly by ramp and M. The time length of the smoothing zone is generally set to be less than or equal to 1/2 of the length of the fixed time period, and the fixed time period is generally set to 20 ms, so the time length of the smoothing zone is less than or equal to 10 ms, for which the corresponding digital quantity M is 80; in practical applications M is simply taken as 80. The formula also shows that when the loudest and second-loudest parties of adjacent time periods are the same, that is, when X₁(n) equals X₂(n), the mixing result in the smoothing zone equals the mixing result in the mixing zone of that time period, as shown for time period 2 in Fig. 4.
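The three special cases mentioned above follow directly from substituting into the formula. Written out as a short check, where X(n) is only a shorthand introduced here for the common value when the two mixes coincide:

```latex
\begin{aligned}
ramp = 0:\quad & MixOut(n) = \frac{(M - 0) \cdot X_{1}(n) + 0 \cdot X_{2}(n)}{M} = X_{1}(n) \\
ramp = M:\quad & MixOut(n) = \frac{(M - M) \cdot X_{1}(n) + M \cdot X_{2}(n)}{M} = X_{2}(n) \\
X_{1}(n) = X_{2}(n) = X(n):\quad & MixOut(n) = \frac{(M - ramp) \cdot X(n) + ramp \cdot X(n)}{M} = \frac{M \cdot X(n)}{M} = X(n)
\end{aligned}
```

The two weights (M - ramp)/M and ramp/M always sum to 1, so when the selected channels do not change the smoothing zone leaves the signal untouched regardless of the value of ramp.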

It will be understood that those of ordinary skill in the art may make equivalent substitutions or changes according to the technical solution of the present invention and its inventive concept, and all such changes or substitutions shall fall within the scope of protection of the appended claims of the present invention.

Claims

1. A telephone conference voice mixing method, comprising the following steps:

2. The method according to claim 1, characterized in that: in the smoothing zone, the first mixing data decreases linearly over time and the second mixing data increases linearly over time.

3. The method according to claim 2, characterized in that: the mixing data in the mixing zone satisfies the following formula:

MixOut(n) = \frac{(M - ramp) \cdot X_{1}(n) + ramp \cdot X_{2}(n)}{M}

Where: X₁(n) is the first mixing data;

X₂(n) is the second mixing data;

4. The method according to claim 3, characterized in that: the time length of the smoothing zone is set to be less than or equal to 1/2 of the length of the fixed time period; when the fixed time period is 20 ms, the time length of the smoothing zone is less than or equal to 10 ms, and the corresponding M is less than or equal to 80.

5. The method according to claim 4, characterized in that: for a fixed time period of 20 ms, the optimal value of M is 80.

6. The method according to claim 5, characterized in that: when the smoothing zone is placed at the rear of the fixed time period, the mixing result in the mixing zone is the first mixing data.

7. The method according to claim 5, characterized in that: when the smoothing zone is placed at the front of the fixed time period, the mixing result in the mixing zone is the second mixing data.