CN102572676A - Real-time rendering method for virtual auditory environment - Google Patents

Real-time rendering method for virtual auditory environment

Info

Publication number
CN102572676A
CN102572676A, CN201210014504XA, CN201210014504A
Authority
CN
China
Prior art keywords
sound
sound source
virtual
time
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201210014504XA
Other languages
Chinese (zh)
Other versions
CN102572676B (en)
Inventor
张承云 (Zhang Chengyun)
谢菠荪 (Xie Bosun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201210014504.XA priority Critical patent/CN102572676B/en
Publication of CN102572676A publication Critical patent/CN102572676A/en
Application granted granted Critical
Publication of CN102572676B publication Critical patent/CN102572676B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Landscapes

  • Stereophonic System (AREA)

Abstract

The invention discloses a real-time rendering method for a virtual auditory environment. The method sets the initial information of the virtual auditory environment and uses a head tracker to detect, in real time, the dynamic spatial position of the listener's head over its six degrees of freedom of motion; from these data it dynamically simulates, in real time, the sound sources, the sound transmission, the environmental reflections, and the receiver's scattering and binaural sound signal conversion. In simulating the binaural conversion, multiple virtual sound sources at different directions and distances are processed jointly through a set of shared filters, which improves signal-processing efficiency. The binaural signals are equalized for the headphone-to-ear-canal transmission characteristic and then fed to headphones for reproduction, producing realistic spatial auditory events and perceptions.

Description

Real-time rendering method for a virtual auditory environment
Technical field
The present invention relates to the field of electroacoustic technology, and specifically to a real-time rendering method for a virtual auditory environment.
Background art
A virtual auditory environment is an acoustic environment produced or controlled artificially, such that the listener experiences auditory perceptions as if placed within a natural acoustic environment. It has important applications in scientific research, virtual reality, acoustics-aided design, communications, and related areas.
In a natural acoustic environment, the sound waves radiated by a source travel along the direct path and along various environment-reflection and scattering paths, forming a spatial sound field that contains the temporal and spatial acoustic information of the source and the environment. When a listener enters the sound field, the two ears receive sound pressures that also include the reflection and scattering by the listener's own anatomy. These anatomical reflections and scattering convert the acoustic information of the source and the environment into the binaural sound (pressure) signals at the eardrums. After processing by the auditory system (including the higher nervous system), the binaural signals give rise to the corresponding spatial auditory events, such as sound localization and the perception of the acoustic environment.
When the listener's head is fixed, the transmission of sound from a source to the two ears can be regarded as a linear time-invariant (LTI) process, and the frequency-domain acoustic transfer function from the source to the ears under free-field conditions is described by a pair of head-related transfer functions (HRTFs). Let the spatial position of a source S relative to the center of the listener's head be given in spherical coordinates (r, θ, φ), where r is the source distance; −90° ≤ φ ≤ 90° and 0° ≤ θ < 360° are the elevation and azimuth, respectively; φ = 0° and φ = +90° denote the horizontal plane and directly overhead; and in the horizontal plane θ = 0° and θ = 90° denote straight ahead and directly to the right. The pair of HRTFs is then:
$$H_L(r,\theta,\phi,f)=\frac{P_L(r,\theta,\phi,f)}{P_0(f)},\qquad H_R(r,\theta,\phi,f)=\frac{P_R(r,\theta,\phi,f)}{P_0(f)}\tag{1}$$
where $P_L$ and $P_R$ are the frequency-domain sound pressures produced at the two ears by a source at position (r, θ, φ), and $P_0$ is the sound pressure at the position of the head center with the head absent. In general, the HRTF depends on the source position and the frequency f, and it also varies across individuals. For far-field sources with r ≥ 1.0 m, the HRTF is approximately independent of distance; for near-field sources with r < 1.0 m, however, the HRTF depends on distance and therefore carries distance-localization cues. The time-domain counterpart of the HRTF is the head-related impulse response (HRIR); the two are related by the Fourier transform.
In traditional signal processing, to synthesize a virtual free-field point source at spatial position (r, θ, φ), the single-channel time-domain signal $e_0(t)$ is delayed and amplitude-scaled appropriately and then convolved with the corresponding pair of HRIRs (or, equivalently, filtered by the HRTFs), yielding the binaural signals:
$$e_L(t)=\frac{1}{r}\,h_L(r,\theta,\phi,t)*e_0(t-T),\qquad e_R(t)=\frac{1}{r}\,h_R(r,\theta,\phi,t)*e_0(t-T)\tag{2}$$
where $h_L$ and $h_R$ are the HRIRs from the source to the left and right ears, t is time, T = r/c is the propagation delay from the source to the listener, c is the speed of sound, and the amplitude scale 1/r models the distance attenuation of a spherical wave in the free field.
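By way of illustration only (this sketch is not part of the patented method; the identifiers and the rounding of T to an integer number of samples are assumptions of this example), Eq. (2) for one ear may be realized as:

```cpp
// Render one free-field point source for one ear per Eq. (2):
// delay by T = r/c, scale by 1/r, convolve with the HRIR of that ear.
#include <cstddef>
#include <vector>

std::vector<float> renderOneEar(const std::vector<float>& e0,   // mono input e_0(t)
                                const std::vector<float>& hrir, // HRIR of this ear
                                float r,                        // distance (m)
                                float fs = 44100.0f,            // sample rate (Hz)
                                float c  = 343.0f)              // speed of sound (m/s)
{
    const std::size_t delay =
        static_cast<std::size_t>(r / c * fs + 0.5f);  // T = r/c, in samples
    const float gain = 1.0f / r;                      // 1/r spherical attenuation
    std::vector<float> out(e0.size() + delay + hrir.size() - 1, 0.0f);
    // Direct-form convolution of the delayed, scaled input with the HRIR.
    for (std::size_t n = 0; n < e0.size(); ++n)
        for (std::size_t k = 0; k < hrir.size(); ++k)
            out[n + delay + k] += gain * e0[n] * hrir[k];
    return out;
}
```

Calling this once per ear with $h_L$ and $h_R$ yields the binaural pair; the multi-source case of Eq. (3) below is then the sample-wise sum of such outputs over the M sources.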
For the synthesis of multiple virtual sources, suppose there are M virtual sources with input signals $e_{0,i}(t)$ located at spatial positions $(r_i, \theta_i, \phi_i)$, i = 1, 2, ..., M. By the linear superposition of sound waves, the binaural signals of Eq. (2) extend to the superposed contributions of the M virtual sources:
$$e_L(t)=\sum_{i=1}^{M}\frac{1}{r_i}\,h_L(r_i,\theta_i,\phi_i,t)*e_{0,i}(t-T_i),\qquad e_R(t)=\sum_{i=1}^{M}\frac{1}{r_i}\,h_R(r_i,\theta_i,\phi_i,t)*e_{0,i}(t-T_i)\tag{3}$$
where $T_i = r_i/c$ is the propagation delay from the i-th virtual source to the listener.
When the mixed binaural signals of Eq. (3) are reproduced over a pair of headphones, the sound pressures at the listener's ears are proportional to the sum of the binaural pressures that point sources at the positions $(r_i, \theta_i, \phi_i)$ would produce, so spatial sources are perceived at the corresponding positions.
Because the binaural signals contain the principal information of the sound, a virtual auditory environment can be realized by simulating the binaural signals and reproducing them over headphones (or loudspeakers). Given the physical and geometric conditions, the physical process of sound transmission from source to ears is simulated in three parts: the physical characteristics of the source, the transmission characteristics (including the direct sound and the environmental reflections), and the receiver's reflection and scattering of the sound waves. This yields both the temporal (frequency) and the spatial information of the sound. The spatial information comprises sound-localization cues and environmental-reflection cues (a reflection can be replaced by an equivalent image source). Motion of either the source or the listener changes this spatial information, and such dynamic information is vital to the subjective authenticity of a virtual auditory environment; listener motion also provides important localization cues, in particular for resolving front-back confusion of source directions. The signal processing must therefore synthesize this information together with its dynamic changes, so the realization of a virtual auditory environment is a real-time, interactive, dynamic process, called virtual auditory environment real-time rendering.
Several research groups worldwide have implemented virtual auditory environment real-time rendering systems on digital signal processor (DSP) chips or computer platforms, including the first-generation SCATIS and second-generation IKA-SIM systems of Ruhr-Universität Bochum in Germany, the SLAB system of NASA in the United States, the DIVA system of Helsinki University of Technology in Finland, and the loudspeaker-based virtual auditory environment system of RWTH Aachen University in Germany. At present, such systems have to trade performance against limited software and hardware resources. Obtaining a truly convincing virtual auditory environment requires dynamic, real-time simulation of the sources, the sound transmission, the receiver's scattering, and the binaural signal conversion, so the computational load of the corresponding algorithms is very large, especially when many sources (the direct sources plus the image sources of the reflections) must be simulated simultaneously. Moreover, both the direct sources and the reflection image sources are simulated by head-related transfer function (HRTF) filtering, and for convenience of processing the HRTFs are first obtained by measurement; to satisfy the sampling theorem in both the frequency domain and the spatial domain simultaneously, the measured HRTF data set is very large, particularly for near-field HRTFs. Under fixed software and hardware resources, the precision of the system therefore usually has to be reduced to make the signal processing feasible. Existing real-time rendering methods typically adopt the following measures: limiting the number of simultaneously simulated sources; using only a few image sources (first-order, at most second-order) for the environmental reflections and omitting the higher-order reflections; using very low-order filters to model the receiver's scattering of sound waves; considering only far-field sources and ignoring the distance cues carried by near-field HRTFs; considering only the horizontal-plane motion of the listener's head and omitting the other degrees of freedom; and reducing the dynamic performance of the system. All of these inevitably degrade the performance of the virtual auditory environment. Advances in software and hardware can only ease this conflict to a certain extent; they cannot resolve it at the root.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art by providing a real-time rendering method for a virtual auditory environment. The method greatly reduces the computational load and the required storage, so that for given software and hardware resources the performance of the system is improved significantly.
The technical scheme of the real-time rendering method of the present invention comprises the following steps:
Step 1: a head tracker and a signal-processing platform detect, in real time, the dynamic spatial position of the listener's head over its six degrees of freedom of motion (translation plus rotation), including the position of the head center and the orientation of the head.
Step 2: the initial information of the virtual auditory environment is input, comprising the source signals, the source characteristics, the environment characteristics, and the listener characteristics. The source characteristics include the spatial position and directivity of each source; the environment characteristics include the shape and size of the room, the absorption coefficients of its boundaries, and the air absorption coefficient; the listener characteristics include the listener's spatial position and the head-related impulse responses (HRIRs) used in the signal processing.
Step 3: from the head-position data provided by the head tracker, the dynamic direction and distance of the direct sources and of the environmental-reflection image sources relative to the listener are calculated.
Step 4: from the distance of each source to the listener, its delay and attenuation coefficient are calculated.
Step 5: using the information obtained in Steps 1 to 4, the input source signals undergo joint shared-filter processing for multiple virtual sources at different directions and distances. Principally, by the method of principal component analysis, the near-field and far-field HRIRs for different source distances and directions are decomposed into a weighted sum of a limited number of common time basis functions plus an average time function. The shared filters are a group of finite impulse response (FIR) filters designed from the common time basis functions and the average time function. The weights are obtained from the principal component decomposition of the HRIRs; the weight gain of each input signal into each filter is adjusted dynamically, and the delay and attenuation of each input signal are adjusted dynamically according to the source's distance from the listener. In this way the direct sources and the reflection image sources are virtualized, yielding the synthesized dynamic binaural signals.
Step 6: the binaural signals synthesized in Step 5 are equalized for the headphone-to-ear-canal transmission characteristic and then fed to headphones for reproduction.
The principle of the invention is as follows. The near-field HRTF or HRIR data at different spatial positions (directions and distances) are correlated. After these correlations are removed by principal component analysis (PCA), the HRIRs at different spatial positions can be expressed as a weighted sum of a limited number Q of common time basis functions plus an average time function, where the weight coefficients depend only on the source position and not on time. A group of (Q + 1) shared FIR filters can therefore be designed from the common time basis functions and the average time function to virtualize multiple sources, and the spatial position (distance and direction) of each virtual source is controlled by adjusting the weight gains and the delay of its input signal into each filter. This simplifies the signal processing and markedly reduces the computational load of virtualizing multiple sources simultaneously. In addition, a head tracker detects the six-degree-of-freedom motion (translation and rotation) of the listener's head, and the gains and delays of each input signal into each filter are adjusted dynamically according to the head position, realizing dynamic signal processing.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. The invention can virtualize sound sources at different directions and distances.
2. The invention realizes dynamic processing of the six-degree-of-freedom motion (translation and rotation) of the listener's head.
3. Compared with the traditional multi-source processing method, in which each source is convolved with its own pair of HRIRs, the shared-filter joint processing of far-field and near-field virtual sources reduces the computational load and the data volume and improves signal-processing efficiency; for a given signal-processing capability, more sources (or higher-order reflection image sources) can be processed simultaneously.
4. For far-field and near-field sources at arbitrary positions in the virtual space, only the gain coefficients of the Q filters need spatial interpolation, which is much simpler than the direct HRIR interpolation (HRTF filter-coefficient interpolation) of traditional methods.
5. A moving source is produced simply by continuously varying the Q weight gain coefficients and the delay; the signal processing is comparatively simple, and the audible artifacts caused by the repeated HRIR switching of conventional methods are avoided.
6. Dynamic information is processed simply by continuously varying the weight gain coefficients, the scale factor, and the delay; the signal processing remains simple and audible switching artifacts are again avoided.
7. The invention can be implemented in a programming language (such as Visual C++) on a multimedia computer, or on general-purpose DSP hardware.
8. The invention can be used for sound reproduction in research on human binaural hearing, and in virtual training equipment, virtual reality products, acoustics-aided design, multimedia, and communication equipment.
Description of drawings
Fig. 1 is the system block diagram of the present invention.
Fig. 2 is a schematic diagram of the system hardware when a personal computer performs the signal processing.
Fig. 3 is a schematic diagram of the system software modules.
Fig. 4 is the block diagram of the joint processing of multiple virtual sources in the present invention (left-ear part).
Figs. 5 to 7 show the results of the psychoacoustic experiments.
Embodiment
The present invention is described in further detail below with reference to the accompanying drawings and an embodiment, but the scope of protection claimed is not limited to the scope represented by the embodiment.
The system block diagram of the invention is shown in Fig. 1. The system can define or import the source signals, the source characteristics (spatial position and directivity), the environment characteristics (such as the shape and size of the room, the boundary absorption coefficients, and the air absorption coefficient), and the listener characteristics (such as the HRTF data set). The head tracker detects the spatial position and orientation of the listener's head in real time and passes these dynamic data to the signal processing. According to the defined or imported information and the position and orientation of the listener's head, the signal processing dynamically simulates the three parts of source, sound transmission, and listener, completing the real-time rendering of the virtual auditory environment and producing the binaural signals. After headphone-to-external-ear equalization, the binaural signals are reproduced over headphones.
Fig. 2 is a schematic diagram of the system hardware when a personal computer performs the signal processing; it comprises the personal computer, a sound card supporting ASIO, headphones, and a head tracker. The computer is the information-processing center of the whole system. Before the system starts working, the computer initializes the head tracker and the sound card by passing parameters over the USB interface, and the source, environment, and listener information is set through the software interface. Once the system is running, the head tracker passes the changing position and orientation of the listener's head to the computer in real time over USB; the signal-processing software computes the binaural signals from this information and sends them to the sound card for headphone reproduction.
Fig. 3 is a schematic diagram of the system software modules. The software implements human-machine interaction, head-tracking data reception, audio data input and output, signal-processing parameter calculation, and the signal processing itself. The whole system comprises three threads implementing five functional modules.
(1) Human-machine interface module:
Implements the setting and display of the relevant information, such as setting the source signal, the positions of the listener and the sources, the geometry of the environment, the absorption characteristics of the boundary materials, and the HRTFs, and dynamically displaying the virtual-source direction and distance.
(2) Head-tracker interface module:
Connects the computer to the head tracker, sets the tracker parameters correctly, and receives in real time the listener head-position information (six degrees of freedom of translation and rotation) sent by the tracker.
(3) Sound-card interface module:
Connects the computer to the sound card, sets the sound-card parameters, and sends the processed audio data to the sound card. The key point of this module is the use of the ASIO driver, for which Steinberg provides a free API. To reduce the latency of the system, the length N of the data buffer should be as short as possible, but its value is limited by the sound-card performance; for stability, the ASIO buffer size must stay above the lower limit at which the sound card can operate normally.
(4) Signal-processing parameter calculation module:
Suppose a static source is at spherical coordinates (r, θ, φ) relative to the listener in the initial state; after the listener's head moves, its new position relative to the head is expressed in spherical coordinates (r′, θ′, φ′). In three-dimensional space the head has six degrees of freedom of motion, three translational and three rotational, so six coordinate parameters are needed. Suppose that initially the head center is at the coordinate origin, with the x, y, and z axes pointing to the right, to the front, and directly upward, respectively. After a spatial translation of the head center, the head-center position is expressed by the components Δx, Δy, and Δz of the translation vector along the three axes. The three rotational parameters (α, β, γ) correspond to rotation about the z axis, about the x axis, and about the y axis, respectively. From the head-motion data, Eq. (4) gives the direction and distance of a (real or image) source relative to the listener:
$$\begin{bmatrix} r'\cos\phi'\sin\theta' \\ r'\cos\phi'\cos\theta' \\ r'\sin\phi' \end{bmatrix} = \begin{bmatrix} \cos\alpha\cos\gamma+\sin\alpha\sin\beta\sin\gamma & -\sin\alpha\cos\gamma+\cos\alpha\sin\beta\sin\gamma & \cos\beta\sin\gamma \\ \sin\alpha\cos\beta & \cos\alpha\cos\beta & -\sin\beta \\ -\cos\alpha\sin\gamma+\sin\alpha\sin\beta\cos\gamma & \sin\alpha\sin\gamma+\cos\alpha\sin\beta\cos\gamma & \cos\beta\cos\gamma \end{bmatrix} \begin{bmatrix} r\cos\phi\sin\theta-\Delta x \\ r\cos\phi\cos\theta-\Delta y \\ r\sin\phi-\Delta z \end{bmatrix} \tag{4}$$
The delay T = r′/c and the gain coefficient 1/r′ are then calculated from the source distance.
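A hedged illustration of how Eq. (4) can be applied at each tracker update (the identifiers, the use of radians, and the atan2/asin conventions are assumptions of this example; the rotation-matrix entries are copied from Eq. (4)):

```cpp
// Given the tracked head translation (dx, dy, dz) and rotations alpha (about z),
// beta (about x), gamma (about y), recompute a real or image source's position
// relative to the head, then its delay T = r'/c and gain 1/r'.
#include <cmath>

struct HeadPose     { float dx, dy, dz, alpha, beta, gamma; };  // angles in radians
struct SourceParams { float r, theta, phi, delaySec, gain; };

SourceParams updateSource(float r, float theta, float phi,     // initial position
                          const HeadPose& p, float c = 343.0f)
{
    // Initial Cartesian coordinates (x: right, y: front, z: up), head-translated.
    const float x = r * std::cos(phi) * std::sin(theta) - p.dx;
    const float y = r * std::cos(phi) * std::cos(theta) - p.dy;
    const float z = r * std::sin(phi)                   - p.dz;
    const float ca = std::cos(p.alpha), sa = std::sin(p.alpha);
    const float cb = std::cos(p.beta),  sb = std::sin(p.beta);
    const float cg = std::cos(p.gamma), sg = std::sin(p.gamma);
    // Rotation matrix of Eq. (4), applied row by row.
    const float xh = ( ca*cg + sa*sb*sg) * x + (-sa*cg + ca*sb*sg) * y + (cb*sg) * z;
    const float yh = ( sa*cb)            * x + ( ca*cb)            * y + (-sb)   * z;
    const float zh = (-ca*sg + sa*sb*cg) * x + ( sa*sg + ca*sb*cg) * y + (cb*cg) * z;
    SourceParams s;
    s.r        = std::sqrt(xh*xh + yh*yh + zh*zh);
    s.theta    = std::atan2(xh, yh);        // azimuth: 0 = front, +pi/2 = right
    s.phi      = std::asin(zh / s.r);       // elevation
    s.delaySec = s.r / c;                   // T = r'/c
    s.gain     = 1.0f / s.r;                // 1/r' spherical attenuation
    return s;
}
```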
(5) Signal-processing module:
Applies source directivity, attenuation, and delay to the input signals; simulates the reflections with the image-source method, with low-order filters modeling the frequency-dependent absorption of the boundaries; synthesizes the corresponding sources (direct and reflected) by HRTF filtering; synthesizes the late diffuse reflections with a reverberation algorithm; and finally mixes the binaural signals of all sources and applies the headphone-to-ear-canal transmission equalization.
Because the computational load of the signal-processing module is dominated by the HRTF filtering, besides realizing the HRTF filtering by traditional direct HRIR convolution, the present invention reduces the load by using HRIR basis-function decomposition and shared filters to synthesize multiple virtual sources at different directions and distances simultaneously. This is explained in detail below.
First, the left- and right-ear HRIRs for the various source positions are approximated as minimum-phase functions $h_{\min}$ plus pure delays:
$$h_L(r,\theta,\phi,t)=h_{\min,L}(r,\theta,\phi,t-\tau_L),\qquad h_R(r,\theta,\phi,t)=h_{\min,R}(r,\theta,\phi,t-\tau_R)\tag{5}$$
where $\tau_L = \tau_L(r,\theta,\phi)$ and $\tau_R = \tau_R(r,\theta,\phi)$ are a pair of pure delays that depend on the source position but not on frequency.
In the conventional processing of Eq. (3), each virtual-source input signal $e_{0,i}(t)$ is scaled by $1/r_i$, delayed by $T_i$ (the propagation delay of the i-th virtual source to the head center), convolved with the corresponding $h_L(r_i,\theta_i,\phi_i)$ and $h_R(r_i,\theta_i,\phi_i)$ (or filtered in the frequency domain), and the signals of the M directions are then superposed to obtain the binaural signals. After the minimum-phase approximation of Eq. (5), the conventional processing becomes: delay each input signal $e_{0,i}(t)$ by $T_{L,i} = T_i + \tau_L(r_i,\theta_i,\phi_i)$ and $T_{R,i} = T_i + \tau_R(r_i,\theta_i,\phi_i)$, convolve with the minimum-phase HRIRs of the corresponding directions, and then superpose the signals of the M directions:
$$e_L(t)=\sum_{i=1}^{M}\frac{1}{r_i}\,h_{\min,L}(r_i,\theta_i,\phi_i,t)*e_{0,i}(t-T_{L,i}),\qquad e_R(t)=\sum_{i=1}^{M}\frac{1}{r_i}\,h_{\min,R}(r_i,\theta_i,\phi_i,t)*e_{0,i}(t-T_{R,i})\tag{6}$$
In the traditional processing, every synthesized virtual source is convolved with its own pair of HRIRs, so the computational load is very large when many sources are synthesized. Moreover, dynamic synthesis of the binaural signals for moving virtual sources requires the HRIRs themselves to be refreshed and interpolated, which easily introduces audible artifacts.
With HRIR basis-function decomposition and shared filters synthesizing multiple virtual sources at different directions and distances simultaneously, the minimum-phase HRIRs for the various source directions and distances are first decomposed into weighted combinations of Q time basis functions:
$$h_{\min,L}(r,\theta,\phi,t)=\sum_{q=1}^{Q}w_{q,L}(r,\theta,\phi)\,g_q(t)+h_{\min,av}(t),\qquad h_{\min,R}(r,\theta,\phi,t)=\sum_{q=1}^{Q}w_{q,R}(r,\theta,\phi)\,g_q(t)+h_{\min,av}(t)\tag{7}$$
where $h_{\min,av}(t)$ is the average time function, a function of time only, independent of the source position and of the ear; the $g_q(t)$ are a set of time basis functions, likewise functions of time only and independent of source position and ear; and $w_{q,L}(r,\theta,\phi)$ and $w_{q,R}(r,\theta,\phi)$ are the corresponding weight coefficients, which depend only on the source position and the ear and not on time. The HRIR of any source position is obtained simply by changing the weights.
The average time function, time basis functions, and weight coefficients above are obtained by applying principal component analysis (PCA) to known HRIR data. Existing PCA methods, however, have mainly been applied to far-field HRTFs or HRIRs with source distances greater than 1.0 m, which contain no information on source-distance variation. To simplify the virtualization of sources at different distances, the present invention extends the PCA method to near-field HRIRs with source distances below 1.0 m (which carry distance-localization information). The near-field HRIR data measured on a KEMAR artificial head are used. The binaural pressure measurement points of these data are at the ends of the ear-canal simulators; the source-to-head-center distance runs from 0.2 m to 1.0 m in steps of 0.1 m, giving 9 distances, with HRIRs for 493 source directions at each distance. After the measured HRIRs are given the minimum-phase approximation of Eq. (5), each minimum-phase HRIR is 128 points long (44.1 kHz sampling frequency). PCA decomposition of these minimum-phase HRIRs shows that the minimum-phase HRIRs of the various source distances and directions can be approximated by Q = 15 time basis functions plus the average time function, which represents more than 97.4% of the energy variation of the HRIRs and is thus sufficiently accurate. The corresponding weight coefficients are obtained at the same time.
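One possible offline realization of this PCA decomposition, sketched here with the Eigen library purely as an assumption (the patent does not prescribe any particular implementation, and all names are invented for this example):

```cpp
// Decompose a set of minimum-phase HRIRs into Q shared time basis functions
// g_q(t), an average response h_min,av(t), and per-position weights w_q,
// as in Eq. (7).
#include <Eigen/Dense>

struct PcaBasis {
    Eigen::VectorXd hAv;     // average time function h_min,av(t), N points
    Eigen::MatrixXd basis;   // N x Q matrix; column q holds g_q(t)
    Eigen::MatrixXd weights; // M x Q matrix of position-dependent weights
};

// hrirs: M x N matrix, one minimum-phase HRIR (N time samples) per position.
PcaBasis decomposeHrirs(const Eigen::MatrixXd& hrirs, int Q)
{
    PcaBasis out;
    out.hAv = hrirs.colwise().mean().transpose();          // mean over positions
    const Eigen::MatrixXd centered = hrirs.rowwise() - out.hAv.transpose();
    // N x N covariance of the centered HRIRs across all measured positions.
    const Eigen::MatrixXd cov =
        centered.transpose() * centered / double(hrirs.rows());
    Eigen::SelfAdjointEigenSolver<Eigen::MatrixXd> es(cov); // ascending order
    out.basis   = es.eigenvectors().rightCols(Q);           // Q largest modes
    out.weights = centered * out.basis;                     // w = (h - h_av) . g
    return out;
}
// Reconstruction for measured position i, per Eq. (7):
//   hrirs.row(i) ~ out.hAv.transpose() + out.weights.row(i) * out.basis.transpose()
```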
Substituting the PCA decomposition of Eq. (7) for the minimum-phase near-field HRIRs into Eq. (6) yields the simplified joint processing of multiple virtual sources. Taking the left-ear processing as an example,
$$\begin{aligned} e_L(t) &= \sum_{i=1}^{M}\frac{1}{r_i}\Bigl[\sum_{q=1}^{Q}w_{q,L}(r_i,\theta_i,\phi_i)\,g_q(t)+h_{\min,av}(t)\Bigr]*e_{0,i}(t-T_{L,i}) \\ &= \sum_{q=1}^{Q}g_q(t)*\Bigl[\sum_{i=1}^{M}w_{q,L}(r_i,\theta_i,\phi_i)\,\frac{1}{r_i}\,e_{0,i}(t-T_{L,i})\Bigr]+h_{\min,av}(t)*\Bigl[\sum_{i=1}^{M}\frac{1}{r_i}\,e_{0,i}(t-T_{L,i})\Bigr] \end{aligned} \tag{8}$$
The right-hand side of the second equality comprises two terms. In the first, each input signal $e_{0,i}$ is delayed by $T_{L,i}$, scaled by $1/r_i$, and weighted by the gain $w_{q,L}(r_i,\theta_i,\phi_i)$; the signals of the M sources are then summed, convolved with $g_q(t)$, and the convolution outputs of the Q channels are mixed. In the second term, each input signal $e_{0,i}$ is delayed by $T_{L,i}$ and scaled by $1/r_i$, the signals are summed, and the sum is convolved with $h_{\min,av}(t)$. Fig. 4 is the block diagram of the signal processing drawn according to Eq. (8). Per the preceding analysis, each $g_q(t)$ and $h_{\min,av}(t)$ has an impulse-response length of 128 points, so each convolution can be realized by a 128-tap FIR filter; this is fully equivalent in signal-processing terms.
As Fig. 4 shows, all M virtual sources share the (Q + 1) convolutions or filters $g_q(t)$ (q = 1, 2, ..., Q) and $h_{\min,av}(t)$; the number of convolutions or filters is fixed [(Q + 1) = 16 in this embodiment] and does not grow with the number of virtual sources M. This is the key to the efficient joint processing of multiple near-field virtual sources in the present invention. For a source at an arbitrary position in the virtual space, only the Q weight coefficients need spatial interpolation, which is much simpler than the direct HRIR interpolation of conventional methods. To produce a moving source, it suffices to vary the weight gains $w_{q,L}(r_i,\theta_i,\phi_i)$ of the input signal and the delay $T_{L,i}$ continuously; this not only simplifies the signal processing but also avoids the audible artifacts caused by repeatedly switching HRIRs. Likewise, dynamic information is handled by continuously varying the weight gains $w_{q,L}(r_i,\theta_i,\phi_i)$, the scale factors $1/r_i$, and the delays $T_{L,i}$, keeping the signal processing simple and free of audible switching artifacts.
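A minimal per-sample sketch of the left-ear shared-filter structure of Eq. (8) and Fig. 4, assuming the per-source delays $T_{L,i}$ have already been applied by delay lines (the class layout and names are inventions of this example, not the patent's own code):

```cpp
// All M sources share the (Q+1) FIR filters g_q(t) and h_min,av(t); per-source
// work is only delay, 1/r_i scaling, and weight gains.
#include <cstddef>
#include <utility>
#include <vector>

class SharedFilterRenderer {
public:
    // basis holds Q+1 impulse responses; the last entry is h_min,av(t).
    explicit SharedFilterRenderer(std::vector<std::vector<float>> basis)
        : fir_(std::move(basis)), state_(fir_.size()) {
        for (std::size_t q = 0; q < fir_.size(); ++q)
            state_[q].assign(fir_[q].size(), 0.0f);
    }

    // One left-ear output sample. delayedSrc[i] = e_{0,i}(t - T_{L,i}),
    // invR[i] = 1/r_i, w[i][q] = w_{q,L}(r_i, theta_i, phi_i) for q < Q.
    float process(const std::vector<float>& delayedSrc,
                  const std::vector<float>& invR,
                  const std::vector<std::vector<float>>& w) {
        const std::size_t Q = fir_.size() - 1;
        float out = 0.0f;
        for (std::size_t q = 0; q <= Q; ++q) {
            float mix = 0.0f;  // weighted sum of all sources into channel q
            for (std::size_t i = 0; i < delayedSrc.size(); ++i)
                mix += (q < Q ? w[i][q] : 1.0f) * invR[i] * delayedSrc[i];
            out += firStep(q, mix);  // shared convolution with g_q or h_min,av
        }
        return out;
    }

private:
    float firStep(std::size_t q, float x) {  // one step of a direct-form FIR
        std::vector<float>& s = state_[q];
        s.insert(s.begin(), x);  // newest sample first (O(N); fine for a sketch)
        s.pop_back();
        float y = 0.0f;
        for (std::size_t k = 0; k < fir_[q].size(); ++k)
            y += fir_[q][k] * s[k];
        return y;
    }
    std::vector<std::vector<float>> fir_;    // the Q+1 shared 128-tap filters
    std::vector<std::vector<float>> state_;  // per-filter delay lines
};
```

Note that increasing M only enlarges the inner weighted sum, not the number of convolutions, which is exactly the property exploited by Eq. (8); the right ear would use a second instance fed with $w_{q,R}$ and $T_{R,i}$.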
The HRIR data volume required by the signal processing is analyzed as follows. Suppose the original HRIR amplitudes comprise M source positions with N discrete time points each, for M × N real values in total. With the method proposed here, after PCA decomposition into Q basis functions, there are Q × N + Q × M + N real values. The total data volume is compressed as long as Q satisfies the condition of Eq. (9):
$$Q<\frac{(M-1)N}{M+N}\tag{9}$$
For example, in this embodiment M = 9 (distances) × 493 (directions), N = 128, and Q = 15; the decomposed data volume is then 12.1% of the original, so the data are effectively compressed.
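These storage figures can be checked directly (a standalone demo, not part of the patent text):

```cpp
// Verifies the 12.1% compression ratio quoted above from Eq. (9)'s quantities.
#include <cstdio>

int main() {
    const double M = 9.0 * 493.0;                 // positions: 9 distances x 493 directions
    const double N = 128.0, Q = 15.0;             // HRIR length, number of basis functions
    const double original   = M * N;              // raw HRIR amplitudes
    const double decomposed = Q * N + Q * M + N;  // bases + weights + average function
    std::printf("%.1f%%\n", 100.0 * decomposed / original);  // prints 12.1%
    return 0;
}
```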
The computational load of the signal processing is analyzed as follows. Let the HRIR length be N samples and let M sources be virtualized simultaneously. With traditional signal processing, Eq. (3) shows that the M input signals must be delayed, scaled, convolved with M pairs of HRIRs, and summed, so (ignoring the scaling) computing one sample of the binaural signals requires 2MN multiplications and 2(MN − 1) additions. With the proposed method, after PCA decomposition into Q basis functions, Eq. (8) shows that computing the two ear signals involves (Q + 1) time-domain convolutions per ear plus the gain weighting for all directions, so (again ignoring the scaling) one sample of the binaural signals requires 2MQ + 2N(Q + 1) multiplications and (2N + 2M − 4)(Q + 1) additions. Since the load is dominated by the multiplications, the total load is reduced as long as the number of sources M satisfies the condition of Eq. (10):
$$M>\frac{N(Q+1)}{N-Q}\tag{10}$$
In fact, because the PCA decomposition lets the M virtual sources share one group of parallel convolutions (filters), increasing the number of virtual sources adds no convolution processing, only additional scaling of the input signals. The number of virtual sources can therefore grow with only a slight increase in computation, breaking through the limit that computation places on the number of sources in traditional processing. For example, with M = 100, N = 128, and Q = 15, the computational load is 28.3% of that of the conventional method, a marked reduction.
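The multiplication counts can be checked the same way (a standalone demo; only multiplications are counted, since the text identifies them as the dominant cost):

```cpp
// Compares multiplications per binaural output sample: Eq. (3) versus Eq. (8).
#include <cstdio>

int main() {
    const double M = 100.0, N = 128.0, Q = 15.0;
    const double traditional = 2.0 * M * N;                   // 2MN
    const double shared = 2.0 * M * Q + 2.0 * N * (Q + 1.0);  // 2MQ + 2N(Q+1)
    // Prints ~27.7%, consistent with the roughly 28% figure quoted above
    // (which also accounts for the additions).
    std::printf("%.1f%%\n", 100.0 * shared / traditional);
    return 0;
}
```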
After the virtual auditory environment real-time rendering system is built according to the method above, its performance parameters can be measured, including the system latency, the system refresh rate, and the number of (real and image) sources the system can process simultaneously; the actual perceptual performance is then verified by psychoacoustic experiments.
Example implementation and verification.
The selected system hardware configuration is: a personal computer (Intel Q9550 CPU, 2.83 GHz, 4 GB RAM), a sound card (ESI UGM96), headphones (Sennheiser HD250), and a head tracker (Polhemus FASTRAK). The software was implemented in C++ on the Microsoft Visual Studio .NET 2003 platform. The ASIO data buffer of the sound card was set to 128 points.
Table 1 lists the measured performance parameters of the example system. The system latency is the time difference between the listener moving to a position and the system outputting the corresponding response signal; the system refresh rate is the number of signal-processing (scene) refreshes per unit time.
Table 1. System performance parameters
[Table 1 appears as an image in the original document.]
The purpose of the psychoacoustic experiments was, on the one hand, to verify the synthesis of near-field virtual sources and, at the same time, to verify the actual effectiveness of the shared-filter joint processing of multiple virtual sources.
A full-audio-band white noise served as the single-channel original signal; it was generated by software in the computer (sampling rate 44.1 kHz, 16-bit quantization). Using the near-field HRTF data measured on the KEMAR artificial head, dynamic free-field virtual sources were synthesized under the traditional processing mode and under the shared-filter processing mode. Five virtual-source distances were used, r = 0.2, 0.4, 0.6, 0.8, and 1.0 m, with 18 directions at each distance distributed over the four elevations φ = −30°, 0°, 30°, and 60°: at φ = 60°, three azimuths θ = 0°, 90°, 180° were taken, and at each of the other three elevations five azimuths θ = 0°, 45°, 90°, 135°, 180° were taken. This gave 90 virtual-source positions, hence 720 judgments per subject over the two processing modes.
The experiments were carried out in a listening room with a reverberation time of 0.15 s and a background noise level not exceeding 30 dBA, with the subject seated on a chair at the center of the room. Ten subjects (five male, five female) took part; they were teachers and postgraduates in acoustics and undergraduates majoring in physics. For each processing mode and each virtual-source position, every subject made 4 judgments, so each condition received 40 judgments.
Table 2 gives the average front-back and up-down confusion rates of the perceived directions at the different virtual-source distances. Data with mirror-image direction confusion were spatially inverted before the statistics were computed. The results show that, apart from the greater scatter at r = 0.2 m, the results for the remaining distances are similar, and the results under the traditional and shared-filter modes are essentially identical. As an example, Fig. 5 gives the statistics of the perceived virtual-source directions at r = 0.8 m. To make the distribution of the virtual sources over all directions easy to observe, the experimental results are shown from three viewpoints: directly ahead, directly to the right, and directly behind. The "+" marks the intended direction, the small dot at the center of each ellipse is the average perceived direction, and the ellipse is the confidence region at significance level α (α = 0.05 here); an ellipse is dashed when the experimental data are well symmetric and solid otherwise. Fig. 5(a) shows the result of the traditional processing mode and Fig. 5(b) that of the shared-filter processing mode. Analysis of variance further proves that, at significance level α = 0.05, there is no significant difference between the traditional and shared-filter modes in the average front-back and up-down confusion rates or in the average perceived direction error (defined as the absolute angular difference between the intended and the perceived virtual-source direction) at any virtual-source position.
Table 2. Statistics of confusion rates with near-field HRTFs
[Table 2 appears as an image in the original document.]
Regarding virtual-source distance, the experimental results show that for directions away from the median plane (especially lateral directions in the horizontal plane) and distances r ≤ 0.6 m, processing with near-field HRTFs of different distances produces different perceived virtual-source distances; near the median plane, however, the effect is poor and the virtual-source distance is essentially indistinguishable. This is because the head-shadow effect makes lateral HRTFs vary significantly with source distance, providing a distance-localization cue, whereas HRTFs near the median plane change little with source distance, so the cue there is weak. In reality, distance localization results from the combined action of several factors, not only the distance variation of the HRTF; reflected environmental sound, for example, is also a distance cue, but the situation becomes more complicated once environmental reflections are introduced.
Fig. 6 is an example of the distance-localization results, giving the average perceived distance and standard deviation (10 subjects, 4 judgments each) at the horizontal-plane position φ = 0°, θ = 90° under the two processing modes. The trend with distance for r ≤ 0.6 m is visible in the figure, but the scatter of the perceived distances is large and the means do not coincide with the intended (target) distances. This is the usual outcome of sound-source distance localization (for both virtual and real sources) and is consistent with existing research. Analysis of variance also proves that, at significance level α = 0.05, there is no significant difference in the distance-localization results between the two processing modes at any virtual-source position.
The results above show that the traditional and shared-filter processing modes yield equivalent virtual-source localization, and that under certain conditions (lateral directions) near-field HRTFs can partly produce different perceived distances. To further check whether the shared-filter mode affects any other perceptual attribute (including non-spatial attributes such as timbre), a subjective comparison-and-selection experiment was carried out.
The experiment used the three-interval, two-alternative forced-choice (3I-2AFC) paradigm for subjective comparison and selection. The HRIR selection, the dynamic signal processing, and the set of virtual-source directions and distances were the same as described above. The signal obtained with the traditional processing mode served as the reference signal A, and the signal obtained with the shared-filter processing served as the target signal B. Each playback item comprised three segments: the first was the reference signal, and the second and third were the reference and the comparison signal in random order (i.e., AAB or ABA). Each segment lasted 10 s and was played in a loop. The subject judged which of the second and third segments differed from the first in auditory perception (including direction, distance, and timbre), and was forced to choose at random if unable to decide. The 18 directions and 5 distance sets gave 90 signal groups in all; each group was played 6 times, 3 times in each order, and all signals were played in random order. Ten subjects participated (6 male, 4 female): nine postgraduates in acoustics and one undergraduate in information engineering.
For each virtual-source distance and direction, the 10 subjects made 60 judgments in total. The correctness of each judgment is a (0-1)-distributed random variable. With a sufficiently large sample, if the subjects cannot discriminate and answer at random, the proportion correct approaches p₀ = 0.5; for 3I-2AFC experiments, the target is generally considered distinguishable from the reference when p₀ ≥ 0.75. The experimental data were subjected to two-sided and one-sided hypothesis tests against p₀ = 0.5 and p₀ ≥ 0.75, respectively, giving the corresponding statistics: p = 0.373 is the critical value below which the test fails, p = 0.627 is the upper limit for "indistinguishable", and p = 0.658 is the lower limit for "distinguishable". The statistics show that, except for three conditions where no conclusion can be drawn, namely (θ = 135°, φ = −30°, r = 0.2 m), (θ = 135°, φ = 30°, r = 0.4 m), and (θ = 0°, φ = 60°, r = 0.8 m), all conditions are indistinguishable. This result further shows that the proposed shared-filter method indeed does not impair the virtual auditory effect. As an example, Fig. 7 gives the experimental statistics for the various distances in the horizontal plane.
The psychoacoustic results above show that the shared-filter joint processing of multiple sources achieves subjective perceptual effects equivalent to those of the traditional processing.
This research was supported by the National Natural Science Foundation of China (grant No. 10774049) and by an independent research project of the State Key Laboratory of Subtropical Building Science.

Claims (6)

1. A real-time rendering method for a virtual auditory environment, characterized by comprising the following steps and processing conditions:
1) detecting the dynamic spatial position of the listener's head in real time;
2) inputting the initial information of the virtual auditory environment, comprising source signals, source characteristics, environment characteristics, and listener characteristics;
3) calculating, from the dynamic spatial position of the listener's head, the dynamic direction and distance of the direct sound sources and of the environmental-reflection image sources relative to the listener;
4) calculating, from the distance of each source to the listener, its delay and attenuation coefficient;
5) subjecting the input source signals to joint shared-filter processing of multiple virtual sources at different directions and distances, dynamically adjusting the weight gain of each input signal into each filter, and dynamically adjusting the delay and attenuation of each input signal according to the distance of the source relative to the listener, thereby virtualizing the direct sources and the reflection image sources and obtaining synthesized dynamic binaural signals;
6) equalizing the binaural signals synthesized in step 5) for the headphone-to-ear-canal transmission characteristic and feeding them to headphones for reproduction.
2. The real-time rendering method for a virtual auditory environment according to claim 1, characterized in that said dynamic spatial position comprises the six degrees of freedom of motion of translation and rotation.
3. The real-time rendering method for a virtual auditory environment according to claim 2, characterized in that said spatial position further comprises the position of the head center and the spatial orientation of the head.
4. The real-time rendering method for a virtual auditory environment according to claim 1, characterized in that in said step 2) the source characteristics comprise the spatial position and the directivity of each source; the environment characteristics comprise the shape and size of the room, the sound absorption coefficients of its boundaries, and the air absorption coefficient; and the listener characteristics comprise the spatial position of the listener and the head-related impulse responses (HRIRs) used in the signal processing.
5. The real-time rendering method for a virtual auditory environment according to claim 1, characterized in that the shared-filter joint processing of said step 5) decomposes, by the method of principal component analysis, the near-field and far-field head-related impulse responses of different source distances and directions into a weighted sum of a limited number of common time basis functions plus an average time function.
6. The real-time rendering method for a virtual auditory environment according to claim 5, characterized in that said shared filters are a group of finite impulse response (FIR) filters designed from the common time basis functions and the average time function.
CN201210014504.XA 2012-01-16 2012-01-16 Real-time rendering method for virtual auditory environment Active CN102572676B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210014504.XA CN102572676B (en) 2012-01-16 2012-01-16 Real-time rendering method for virtual auditory environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210014504.XA CN102572676B (en) 2012-01-16 2012-01-16 Real-time rendering method for virtual auditory environment

Publications (2)

Publication Number Publication Date
CN102572676A 2012-07-11
CN102572676B CN102572676B (en) 2016-04-13

Family

ID=46416914

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210014504.XA Active CN102572676B (en) Real-time rendering method for virtual auditory environment

Country Status (1)

Country Link
CN (1) CN102572676B (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1816224A (en) * 2005-02-04 2006-08-09 Lg电子株式会社 Apparatus for implementing 3-dimensional virtual sound and method thereof

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
谢菠荪: "《头相关传输函数与虚拟听觉》", 31 January 2008, article "头相关传输函数与虚拟听觉" *
谢菠荪: "头相关传输函数与虚拟听觉重放", 《中国科学(G辑:物理学 力学 天文学)》, vol. 39, no. 09, 15 September 2009 (2009-09-15) *
谢菠荪: "虚拟听觉在虚拟现实、通信及信息系统的应用", 《电声技术》, vol. 32, no. 01, 17 January 2008 (2008-01-17) *
谢菠荪: "虚拟听觉环境的原理、进展和问题", 《电声技术》, vol. 32, no. 11, 17 November 2008 (2008-11-17) *

Cited By (51)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104075746B (en) * 2013-03-29 2016-09-07 上海航空电器有限公司 There is the verification method of the virtual sound source locating verification device of azimuth information
CN104075746A (en) * 2013-03-29 2014-10-01 上海航空电器有限公司 Virtual sound localization verifying device with azimuth information
US9961474B2 (en) 2014-08-13 2018-05-01 Huawei Technologies Co., Ltd. Audio signal processing apparatus
CN106664499B (en) * 2014-08-13 2019-04-23 华为技术有限公司 Audio signal processor
CN106664499A (en) * 2014-08-13 2017-05-10 华为技术有限公司 Audio signal processing apparatus
CN105786764A (en) * 2014-12-19 2016-07-20 天津安腾冷拔钢管有限公司 Calculation method and device for obtaining personalized head-related transfer function (HRTF)
CN105183421A (en) * 2015-08-11 2015-12-23 中山大学 Method and system for realizing virtual reality three-dimensional sound effect
CN105263075B (en) * 2015-10-12 2018-12-25 深圳东方酷音信息技术有限公司 A kind of band aspect sensor earphone and its 3D sound field restoring method
CN105263075A (en) * 2015-10-12 2016-01-20 深圳东方酷音信息技术有限公司 Earphone equipped with directional sensor and 3D sound field restoration method thereof
CN105263097A (en) * 2015-10-29 2016-01-20 广州番禺巨大汽车音响设备有限公司 Method and system for realizing surround sound based on sound equipment system
CN106994090A (en) * 2016-01-25 2017-08-01 宇龙计算机通信科技(深圳)有限公司 A kind of blind-guiding method and device, guide earphone
CN105657609A (en) * 2016-01-26 2016-06-08 宇龙计算机通信科技(深圳)有限公司 Method and device for playing control of bone conduction headsets and bone conduction headset equipment
WO2017128481A1 (en) * 2016-01-26 2017-08-03 宇龙计算机通信科技(深圳)有限公司 Method of controlling bone conduction headphone, device and bone conduction headphone apparatus
CN105792070A (en) * 2016-02-26 2016-07-20 钰太芯微电子科技(上海)有限公司 Audio driving system and method
CN105792090B (en) * 2016-04-27 2018-06-26 华为技术有限公司 A kind of method and apparatus for increasing reverberation
CN105792090A (en) * 2016-04-27 2016-07-20 华为技术有限公司 Method and device of increasing reverberation
CN109155896A (en) * 2016-05-24 2019-01-04 S·M·F·史密斯 System and method for improving audio virtualization
CN109891502B (en) * 2016-06-17 2023-07-25 Dts公司 Near-field binaural rendering method, system and readable storage medium
TWI744341B (en) * 2016-06-17 2021-11-01 美商Dts股份有限公司 Distance panning using near / far-field rendering
EP3472832A4 (en) * 2016-06-17 2020-03-11 DTS, Inc. Distance panning using near / far-field rendering
WO2017218973A1 (en) 2016-06-17 2017-12-21 Edward Stein Distance panning using near / far-field rendering
CN109891502A (en) * 2016-06-17 2019-06-14 Dts公司 It is moved using the distance that near/far field renders
CN106454686A (en) * 2016-08-18 2017-02-22 华南理工大学 Multi-channel surround sound dynamic binaural replaying method based on body-sensing camera
CN106686520A (en) * 2017-01-03 2017-05-17 南京地平线机器人技术有限公司 Multi-channel audio system capable of tracking user and equipment with multi-channel audio system
CN106686520B (en) * 2017-01-03 2019-04-02 南京地平线机器人技术有限公司 The multi-channel audio system of user and the equipment including it can be tracked
US10492019B2 (en) 2017-02-27 2019-11-26 International Business Machines Corporation Binaural audio calibration
US10785590B2 (en) 2017-02-27 2020-09-22 International Business Machines Corporation Binaural audio calibration
KR20190125371A (en) * 2017-03-27 2019-11-06 가우디오랩 주식회사 Audio signal processing method and apparatus
KR102502383B1 (en) * 2017-03-27 2023-02-23 가우디오랩 주식회사 Audio signal processing method and apparatus
WO2018182274A1 (en) * 2017-03-27 2018-10-04 가우디오디오랩 주식회사 Audio signal processing method and device
US11184727B2 (en) 2017-03-27 2021-11-23 Gaudio Lab, Inc. Audio signal processing method and device
CN106960672A (en) * 2017-03-30 2017-07-18 国家计算机网络与信息安全管理中心 The bandwidth expanding method and device of a kind of stereo audio
CN107205207A (en) * 2017-05-17 2017-09-26 华南理工大学 A kind of approximate acquisition methods of virtual sound image based on middle vertical plane characteristic
CN107105384A (en) * 2017-05-17 2017-08-29 华南理工大学 The synthetic method of near field virtual sound image on a kind of middle vertical plane
CN108230431A (en) * 2018-01-24 2018-06-29 深圳市云之梦科技有限公司 A kind of the human action animation producing method and system of two-dimensional virtual image
CN108230431B (en) * 2018-01-24 2022-07-12 深圳市云之梦科技有限公司 Human body action animation generation method and system of two-dimensional virtual image
US11516615B2 (en) 2018-03-02 2022-11-29 Nokia Technologies Oy Audio processing
CN112055974A (en) * 2018-03-02 2020-12-08 诺基亚技术有限公司 Audio processing
CN108900962B (en) * 2018-09-16 2020-11-20 苏州创力波科技有限公司 Three-model 3D sound effect generation method and acquisition method thereof
CN108900962A (en) * 2018-09-16 2018-11-27 王小玲 Three model 3D audio generation methods of one kind and its acquisition methods
CN109618274A (en) * 2018-11-23 2019-04-12 华南理工大学 A kind of Virtual Sound playback method, electronic equipment and medium based on angle map table
CN110312198B (en) * 2019-07-08 2021-04-20 雷欧尼斯(北京)信息技术有限公司 Virtual sound source repositioning method and device for digital cinema
CN110312198A (en) * 2019-07-08 2019-10-08 雷欧尼斯(北京)信息技术有限公司 Virtual source of sound method for relocating and device for digital camera
CN110544532A (en) * 2019-07-27 2019-12-06 华南理工大学 sound source space positioning ability detecting system based on APP
CN110611863B (en) * 2019-09-12 2020-11-06 苏州大学 360-degree sound source real-time playback system
CN110611863A (en) * 2019-09-12 2019-12-24 苏州大学 360-degree sound source real-time playback system
WO2021063458A1 (en) * 2019-10-05 2021-04-08 Idun Audio Aps A method and system for real-time implementation of time-varying head-related transfer functions
CN110933589B (en) * 2019-11-28 2021-07-16 广州市迪士普音响科技有限公司 Earphone signal feeding method for conference
CN110933589A (en) * 2019-11-28 2020-03-27 广州市迪士普音响科技有限公司 Earphone signal feeding method for conference
CN113645531A (en) * 2021-08-05 2021-11-12 高敬源 Earphone virtual space sound playback method and device, storage medium and earphone
CN113645531B (en) * 2021-08-05 2024-04-16 高敬源 Earphone virtual space sound playback method and device, storage medium and earphone

Also Published As

Publication number Publication date
CN102572676B (en) 2016-04-13

Similar Documents

Publication Publication Date Title
CN102572676B (en) A kind of real-time rendering method for virtual auditory environment
CN104284291B (en) The earphone dynamic virtual playback method of 5.1 path surround sounds and realize device
Wenzel Localization in virtual acoustic displays
Richard et al. Neural synthesis of binaural speech from mono audio
CN116156411A (en) Spatial audio for interactive audio environments
KR20190125371A (en) Audio signal processing method and apparatus
CN109076305A (en) The rendering of augmented reality earphone environment
Zhong et al. Head-related transfer functions and virtual auditory display
CN105376690A (en) Method and device of generating virtual surround sound
CN106454686A (en) Multi-channel surround sound dynamic binaural replaying method based on body-sensing camera
Kan et al. A psychophysical evaluation of near-field head-related transfer functions synthesized using a distance variation function
Zhang et al. Platform for dynamic virtual auditory environment real-time rendering system
CN108419174B (en) Method and system for realizing audibility of virtual auditory environment based on loudspeaker array
CN101184349A (en) Three-dimensional ring sound effect technique aimed at dual-track earphone equipment
Lindau et al. Minimum BRIR grid resolution for dynamic binaural synthesis
Vennerød Binaural reproduction of higher order ambisonics-a real-time implementation and perceptual improvements
Pelegrin Garcia et al. Interactive auralization of self-generated oral sounds in virtual acoustic environments for research in human echolocation
Oldfield The analysis and improvement of focused source reproduction with wave field synthesis
Yuan et al. Sound image externalization for headphone based real-time 3D audio
Takane et al. Elementary real-time implementation of a virtual acoustic display based on ADVISE
Tonnesen et al. 3D sound synthesis
Zhang et al. Parameterization of the binaural room transfer function using modal decomposition
Geronazzo et al. Use of personalized binaural audio and interactive distance cues in an auditory goal-reaching task
Filipanits Design and implementation of an auralization system with a spectrum-based temporal processing optimization
Lai et al. An Immersive Sound Effects System Based on Dynamic Spatial Perception and Architectural Response

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant