Summary of the invention
The object of the present invention is to provide an audio processing method for enhancing three-dimensional (3D) audio that can eliminate interaural crosstalk, so that the enhanced sound field experienced by the listener through headphones or two loudspeakers has greater immersion and spaciousness and produces a lifelike, on-the-spot sensation.
The above object is achieved by the following technical measures: an audio processing method for enhancing 3D audio in which the two-channel input first undergoes principal-component-based enhancement in the time-frequency domain, and then undergoes binaural-synthesis-based enhancement in the spatial domain.
The principal-component-based enhancement in the time-frequency domain is divided into two stages. The detailed process of the first stage is: the left- and right-channel signals are averaged, and the averaged signal S is passed through filter F1 to obtain a once-filtered signal; this filtered signal is then processed by an absolute-value operation and passed through another filter F2 to obtain a twice-filtered signal. The once-filtered signal is multiplied by a gain coefficient and summed with the original left-channel input signal; the twice-filtered signal is multiplied by a gain coefficient and summed with the result of the above left-channel weighted sum to obtain the left-channel output signal L1 of this stage. Likewise, the once-filtered signal is multiplied by the gain coefficient and summed with the original right-channel input signal; the twice-filtered signal is multiplied by the gain coefficient and summed with the result of the above right-channel weighted sum to obtain the right-channel output signal R1 of this stage.
The detailed process of the second stage is: the average of the original left- and right-channel inputs is passed through filter F3; the filtered signal is multiplied by a gain coefficient and summed with the left-channel output signal L1 of the first stage to obtain the left-channel output signal L2 of the second stage; the same weighted signal is summed with the right-channel output signal R1 of the first stage to obtain the right-channel output signal R2 of the second stage.
The binaural-synthesis-based enhancement in the spatial domain requires at least one stage. The number of stages is determined by the number of sound images to be placed at different azimuths; each additional sound image at a different azimuth requires one more stage of binaural-synthesis enhancement, and the processing in each stage is identical. The detailed process of the first binaural-synthesis enhancement stage is:
The above left-channel output signal L2 is passed through filter F4 to obtain a filtered signal, and the above right-channel output signal R2 is first delayed and then passed through filter F5 to obtain another filtered signal; the two filtered signals are weighted and summed to obtain signal T4. Signal T4 is multiplied by a gain coefficient and summed with the left-channel output signal L2 to obtain the left-channel output signal L3 of this binaural-synthesis enhancement stage.
At the same time, the above right-channel output signal R2 is passed through filter F4 to obtain a filtered signal, and the above left-channel output signal L2 is first delayed and then passed through filter F5 to obtain another filtered signal; the two filtered signals are weighted and summed to obtain signal T5. Signal T5 is multiplied by the gain coefficient and summed with the right-channel output signal R2 to obtain the right-channel output signal R3 of this binaural-synthesis enhancement stage.
The successive binaural-synthesis enhancement stages may be arranged in a serial processing mode, a parallel processing mode, or a mixed serial-parallel processing mode.
In one arrangement, the binaural-synthesis enhancement consists of two stages connected in series; in this serial processing mode, the input to the binaural filtering and synthesis of the later stage is the output of the preceding stage.
In another arrangement, the binaural-synthesis enhancement consists of two stages arranged in parallel; in this parallel processing mode, the inputs to the binaural filtering and synthesis of both stages are taken from the output of the second-stage time-frequency-domain enhancement.
In the described time-frequency-domain enhancement, the frequency responses of the three filters are determined from the spectral characteristics of the averaged signal and from the third-order components produced by the nonlinear operation.
In the described binaural-synthesis enhancement, the frequency responses of the two filters are determined from the interaural level difference (ILD) and the interaural time difference (ITD) of the predetermined azimuth at which the sound image is to be placed.
In the described binaural-synthesis enhancement, the delay value of the delay unit is determined by the time difference with which sound arrives at one ear relative to the other for the predetermined azimuth at which the sound image is to be placed.
After the described binaural-synthesis enhancement, a volume adjustment and control stage may be added, so that the output audio signal is controlled smoothly and audible artifacts produced by the enhancement are reduced to a minimum.
After the principal-component-analysis-based enhancement in the time-frequency domain in the first and second stages of the present invention and the binaural-synthesis-based enhancement in the spatial domain in the third and fourth stages, a plurality of virtual sound sources are positioned at different locations in three-dimensional space, so that the two-channel or monaural output produces the expected audio effect and the enhanced sound field has greater immersion and spaciousness.
Embodiment
As shown in Figure 1, the present embodiment divides the enhancement processing into four stages: the first and second stages are the principal-component-analysis-based enhancement in the time-frequency domain, and the third and fourth stages are the binaural-synthesis-based enhancement in the spatial domain. The stages of enhancement processing are discussed in detail below. For simplicity, the input signal is assumed to be a left/right-channel stereo input and the playback system is assumed to be headphones.
As shown in Figure 2, the detailed process of the first-stage time-frequency-domain enhancement is: the left- and right-channel signals L0 and R0 are averaged, and the averaged signal S is passed through filter F1 to obtain a once-filtered signal; this filtered signal is then processed by an absolute-value operation and passed through another filter F2 to obtain a twice-filtered signal. The once-filtered signal is multiplied by gain coefficient G1 and summed with the original left-channel input signal L0; the twice-filtered signal is multiplied by gain coefficient G2 and summed with the above left-channel weighted-sum signal T1 to obtain the left-channel output signal L1 of this stage. The once-filtered signal is multiplied by gain coefficient G1 and summed with the original right-channel input signal R0; the twice-filtered signal is multiplied by gain coefficient G2 and summed with the above right-channel weighted-sum signal T2 to obtain the right-channel output signal R1 of this stage.
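As a minimal illustrative sketch only (not the definitive implementation), the first stage can be written in Python with NumPy/SciPy as follows; the coefficient arrays b1, a1, b2, a2 for filters F1 and F2 and the gain values G1 and G2 are placeholder assumptions, since the invention does not fix concrete values.

import numpy as np
from scipy.signal import lfilter

def stage1(L0, R0, b1, a1, b2, a2, G1=0.5, G2=0.3):
    # First-stage time-frequency-domain enhancement (illustrative sketch).
    S = 0.5 * (L0 + R0)                      # average of the two channel signals
    once = lfilter(b1, a1, S)                # once-filtered signal (filter F1)
    twice = lfilter(b2, a2, np.abs(once))    # absolute-value operation, then filter F2
    T1 = L0 + G1 * once                      # left-channel weighted sum
    T2 = R0 + G1 * once                      # right-channel weighted sum
    L1 = T1 + G2 * twice                     # left-channel output of this stage
    R1 = T2 + G2 * twice                     # right-channel output of this stage
    return L1, R1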
As shown in Figure 3, the detailed process of the second-stage time-frequency-domain enhancement is: the averaged signal S of the original left- and right-channel inputs L0 and R0 is passed through filter F3; the filtered signal is multiplied by gain coefficient G3 and summed with the left-channel output signal L1 of the first stage to obtain the left-channel output signal L2 of the second stage; the same weighted signal is summed with the right-channel output signal R1 of the first stage to obtain the right-channel output signal R2 of the second stage.
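Continuing the same sketch under the same assumptions, the second stage adds the F3-filtered average of the original inputs to both first-stage outputs; b3, a3 and G3 are again placeholders.

def stage2(L0, R0, L1, R1, b3, a3, G3=0.3):
    # Second-stage time-frequency-domain enhancement (illustrative sketch).
    S = 0.5 * (L0 + R0)                  # average of the original left/right inputs
    weighted = G3 * lfilter(b3, a3, S)   # filter F3, then scale by G3
    L2 = L1 + weighted                   # second-stage left-channel output
    R2 = R1 + weighted                   # second-stage right-channel output
    return L2, R2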
The values of the above gain coefficients G1, G2 and G3 should all be positive and less than 1.
The purpose of the above three filters F1, F2 and F3 is to strengthen the principal components in the input audio. The frequency responses of these three filters are determined from the spectral characteristics of the averaged signal S and from the third-order components produced by the nonlinear operation (such as the absolute-value operation in the first stage), so that components of different frequencies are processed by different filter responses. The enhancement in these two stages takes place mainly in the time-frequency domain.
As shown in Figure 4, the detailed process of the third-stage binaural-synthesis-based enhancement in the spatial domain is: the above left-channel output signal L2 is passed through filter F4 to obtain a filtered signal, and the above right-channel output signal R2 is first delayed by D1 and then passed through filter F5 to obtain another filtered signal; the two filtered signals are weighted and summed to obtain signal T4. Signal T4 is multiplied by gain coefficient G4 and summed with the left-channel output signal L2 to obtain the left-channel output signal L3 of this binaural-synthesis enhancement stage. At the same time, the above right-channel output signal R2 is passed through filter F4 to obtain a filtered signal, and the above left-channel output signal L2 is first delayed by D1 and then passed through filter F5 to obtain another filtered signal; the two filtered signals are weighted and summed to obtain signal T5. Signal T5 is multiplied by gain coefficient G4 and summed with the right-channel output signal R2 to obtain the right-channel output signal R3 of this binaural-synthesis enhancement stage.
In this stage, filters F4 and F5 perform the binaural synthesis. The frequency responses of these two filters are determined from the interaural level difference (ILD) and the interaural time difference (ITD) of the predetermined azimuth S1 at which the sound image is to be placed. Assume that 0 degrees corresponds to the direction straight ahead of the listener; azimuths on the listener's left are negative, ranging from -180° to 0°, and azimuths on the right are positive, ranging from 0° to 180°. Meanwhile, the delay value D1 is determined by the time difference with which sound arrives at one ear relative to the other for the predetermined azimuth S1. In this stage, after the input left- and right-channel signals L2 and R2 are processed, the left and right ears receive the corresponding binaural outputs L3 and R3 respectively. In other words, after this processing the listener perceives sound coming from S1, -S1, or from locations outside the head, and this can be controlled by adjusting the frequency responses of filters F4 and F5. Only after the binaural composite signals T4 and T5 are scaled by gain coefficient G4 (which is positive and less than 1) and added back to the respective input signals L2 and R2 to obtain the corresponding binaural outputs L3 and R3 is the 3D enhancement effect fully realized. This combination lets the listener experience an enhanced 3D sound field with, to a certain extent, greater immersion and spaciousness.
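The third stage can be sketched as below, continuing the same assumptions; b4, a4 and b5, a5 stand for the responses of F4 and F5 derived from the ILD/ITD of azimuth S1, d is the delay D1 in samples, and the two filtered signals are combined with equal weights here purely for simplicity.

import numpy as np
from scipy.signal import lfilter

def delay(x, n):
    # Delay a signal by n samples, padding with zeros (illustrative helper).
    return np.concatenate((np.zeros(n), x[:len(x) - n])) if n > 0 else x

def binaural_stage(Lin, Rin, b_ipsi, a_ipsi, b_contra, a_contra, d, g):
    # One binaural-synthesis enhancement stage (illustrative sketch).
    # The ipsilateral filter plays the role of F4, the contralateral filter
    # that of F5; d is the delay D1 in samples and g is the gain G4.
    T4 = lfilter(b_ipsi, a_ipsi, Lin) + lfilter(b_contra, a_contra, delay(Rin, d))
    T5 = lfilter(b_ipsi, a_ipsi, Rin) + lfilter(b_contra, a_contra, delay(Lin, d))
    Lout = Lin + g * T4    # L3 = L2 + G4 * T4
    Rout = Rin + g * T5    # R3 = R2 + G4 * T5
    return Lout, Rout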
The binaural-synthesis-based enhancement in the spatial domain described above may have multiple stages. The number of stages is determined by the number of sound images to be placed at different azimuths; each additional sound image at a different azimuth requires one more stage of binaural-synthesis enhancement, and the processing in each stage is identical.
In the present embodiment the binaural-synthesis enhancement has two stages connected in series, with the input to the binaural filtering and synthesis of the later stage taken from the output of the preceding stage. As shown in Figure 5, the detailed process of the fourth-stage binaural-synthesis-based enhancement in the spatial domain is the same as that of the third stage shown in Figure 4; this stage further strengthens the 3D audio effect. As in the third stage shown in Figure 4, this stage likewise uses two different filters, F6 and F7, to generate the new binaural composite signals T6 and T7. The frequency responses of F6 and F7 are determined in the same way as those of filters F4 and F5. The only difference between the two stages is that the sound images of the later stage are placed at the different azimuths S2 and -S2. Likewise, the delay value D2 is determined by the time difference with which sound arrives at one ear relative to the other for the predetermined azimuth S2. In this stage the input signals of the two binaural processing paths are L3 and R3, which differs from the first binaural stage shown in Figure 4, whose inputs are L2 and R2. In other words, through the newly synthesized binaural composite signals T6 and T7, the listener not only perceives sound located at S2 and -S2, but the sound field is also expanded to the positions S1 and -S1. After the binaural composite signals T6 and T7 are scaled by gain coefficient G5 (which is positive and less than 1) and added back to the respective input signals L3 and R3 to obtain the corresponding binaural outputs L4 and R4, six sound sources in total are positioned at different locations in three-dimensional space, forming the output of the final complete system. Four of these six sources are virtual, located at S1, -S1, S2 and -S2; the other two are the time-frequency-domain-enhanced stereo signals.
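Under the same assumptions, the serial mode of this embodiment simply chains two such stages; b4, a4 through b7, a7, d1, d2, G4 and G5 are placeholder names for the coefficients of F4-F7, the delays D1 and D2, and the gains.

# Serial mode: the fourth stage is fed by the outputs of the third stage.
L3, R3 = binaural_stage(L2, R2, b4, a4, b5, a5, d1, G4)   # virtual images at S1 and -S1
L4, R4 = binaural_stage(L3, R3, b6, a6, b7, a7, d2, G5)   # virtual images at S2 and -S2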
As shown in Figure 6, the third and fourth stages may instead be arranged as two parallel processing paths. The detailed processing is the same as in Figure 5, except that the inputs to the binaural filtering and synthesis of both stages are taken from the output of the second-stage time-frequency-domain enhancement. In the parallel processing mode, the input signals of the two binaural processing paths in this stage are the same as the two inputs of the third stage shown in Figure 4, namely the outputs L2 and R2 of the second-stage time-frequency-domain enhancement. After the binaural composite signals T6 and T7 are scaled by gain coefficient G5 (which is positive and less than 1) and added to the respective signals L3 and R3, a system output with the same kind of effect is obtained.
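In the parallel mode the sketch changes only in where the fourth-stage composite signals are computed from: both binaural stages are driven by L2 and R2, while T6 and T7 are still added onto L3 and R3, as stated above (same placeholder names as before).

# Parallel mode: both binaural stages take their inputs from the second-stage outputs.
L3, R3 = binaural_stage(L2, R2, b4, a4, b5, a5, d1, G4)      # images at S1 and -S1
T6 = lfilter(b6, a6, L2) + lfilter(b7, a7, delay(R2, d2))    # fourth-stage composite, left ear
T7 = lfilter(b6, a6, R2) + lfilter(b7, a7, delay(L2, d2))    # fourth-stage composite, right ear
L4 = L3 + G5 * T6                                            # added to L3, as described above
R4 = R3 + G5 * T7                                            # added to R3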
After the above two-stage binaural-synthesis enhancement, further enhancement stages may be added. The added stages follow the same processing and output principles as the stages above, and the added stages may be connected in a serial processing mode, a parallel processing mode, or a mixed serial-parallel processing mode. With additional stages, further locations such as S3 and S4 are added to the final system output and the sound field is extended further; the cost of further expanding the sound field is that the method becomes more complex.
A volume adjustment and control stage may also be added after the last enhancement stage, so that the output audio signal is controlled smoothly and audible artifacts produced by the enhancement are reduced to a minimum, especially when the method of the invention is used in the digital domain.
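The invention does not specify the control law of this volume stage; one plausible form, shown only as an assumed sketch, is a slowly varying gain that keeps the output below a target peak so that level changes introduced by the enhancement remain smooth.

import numpy as np

def smooth_volume(x, target_peak=0.9, smoothing=0.99):
    # Assumed sketch of a smooth volume control: a one-pole smoothed gain
    # pulls the output toward a target peak level without abrupt jumps.
    y = np.empty_like(x)
    gain = 1.0
    for i, v in enumerate(x):
        desired = min(1.0, target_peak / abs(v)) if v != 0 else 1.0
        gain = smoothing * gain + (1.0 - smoothing) * desired
        y[i] = gain * v
    return y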
When a monaural signal is input, the averaged signal S in the first and second stages of the above embodiment becomes the monaural input itself, i.e. averaged signal S = left-channel input L0 = right-channel input R0. In this case the following equalities hold: first-stage left-channel output L1 = first-stage right-channel output R1, and second-stage left-channel output L2 = second-stage right-channel output R2. The third stage then produces a stereo output by the following operations:
L3=L2+G4*F4(L2)
R3=R2+G4*F5(D1(R2))
Here the filters F4 and F5 and the delay D1 are determined according to the specific location S1. The fourth stage likewise produces a stereo output:
L4=L3+G5*F6(L3)
R4=R3+G5*F7(D2(R3))
Here the filters F6 and F7 and the delay D2 are determined according to the specific location S2, which further expands the sound field to the S2 position. In the monaural case, because L2 = R2, no processing is performed for the locations -S1 and -S2. Another way to handle a monaural input is first to use the I-Q quadrature method to generate new left- and right-channel stereo signals, and then to process them in the same way as the stereo-input case of the above embodiment (as shown in Figures 1-5).
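The monaural equations above map directly onto the earlier sketch (same placeholder filter coefficients, delays and gains); a minimal continuation:

# Monaural case: after the first two stages L2 == R2, call it mono.
mono = L2                                              # equals R2 for a monaural input
L3 = mono + G4 * lfilter(b4, a4, mono)                 # L3 = L2 + G4*F4(L2)
R3 = mono + G4 * lfilter(b5, a5, delay(mono, d1))      # R3 = R2 + G4*F5(D1(R2))
L4 = L3 + G5 * lfilter(b6, a6, L3)                     # L4 = L3 + G5*F6(L3)
R4 = R3 + G5 * lfilter(b7, a7, delay(R3, d2))          # R4 = R3 + G5*F7(D2(R3))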
When the playback system consists of two loudspeakers, crosstalk cancellation in the third and fourth stages becomes essential. The frequency responses of the crosstalk-cancellation filters in these two stages can be determined in advance from filters F4, F5, F6 and F7 and delays D1 and D2. Because crosstalk destroys the binaural cues, and crosstalk cancellation cannot avoid every loss of binaural listening, the 3D enhancement effect with loudspeakers is not as pronounced as with headphones. In addition, the listener should keep a distance of about 1 meter from the two loudspeakers.
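The invention states only that the crosstalk-cancellation filter responses can be derived in advance from F4, F5, F6, F7, D1 and D2. As a rough, assumed illustration of the idea only, a first-order canceller subtracts from each loudspeaker signal an attenuated, delayed copy of the other channel; the gain g and delay d below are placeholders, not values given by the invention.

def cancel_crosstalk(L, R, g=0.7, d=5):
    # Assumed first-order crosstalk cancellation for two-loudspeaker playback:
    # g approximates the head-shadow attenuation, d the interaural delay in samples.
    Ls = L - g * delay(R, d)    # remove the estimated leakage of R into the left ear
    Rs = R - g * delay(L, d)    # remove the estimated leakage of L into the right ear
    return Ls, Rs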
The method of the invention supports any sample rate, including 96 kHz, 48 kHz, 44.1 kHz, 32 kHz, 16 kHz and 8 kHz. For different sample rates the frequency responses of all the filters involved differ accordingly. However, at relatively low sample rates the reduced spatial and frequency resolution weakens the 3D enhancement effect.
The filters involved in the present invention may be either IIR or FIR filters. The orders (numbers of taps) of the IIR and FIR filters are chosen to balance performance, speed and complexity. To simplify implementation of the method, several second-order IIR sections can be cascaded to replace a long-tap FIR filter or a high-order IIR filter.
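A cascade of second-order IIR sections of the kind suggested here can be expressed with SciPy's second-order-sections form; the filter order, cutoff and input below are arbitrary placeholders used only to show the structure.

import numpy as np
from scipy.signal import butter, sosfilt

x = np.random.randn(48000)                        # placeholder input signal
sos = butter(8, 0.2, btype='low', output='sos')   # 8th-order filter as four cascaded biquads
y = sosfilt(sos, x)                               # apply the cascade instead of a long FIR filter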
The method of the invention supports any existing monaural or stereo audio signal, such as MP3, WMA, MIDI, digital TV, digital radio and network audio. The method can be implemented in any software or hardware and can also be built into a related audio player.