CN117202001A - Sound image virtual externalization method based on bone conduction equipment - Google Patents

Sound image virtual externalization method based on bone conduction equipment

Info

Publication number
CN117202001A
CN117202001A (application CN202311329162.5A)
Authority
CN
China
Prior art keywords
sound image
brir
externalization
signal
bone conduction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311329162.5A
Other languages
Chinese (zh)
Inventor
王杰
郑焕勇
桑晋秋
郑成诗
李晓东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou University
Original Assignee
Guangzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou University filed Critical Guangzhou University
Priority to CN202311329162.5A priority Critical patent/CN117202001A/en
Publication of CN117202001A publication Critical patent/CN117202001A/en
Pending legal-status Critical Current


Abstract

The invention provides a sound image virtual externalization method based on bone conduction equipment, which comprises the following steps: constructing virtual spatial audio with a BRIR database; adjusting the bone conduction device and collecting the sound image externalization scores of the subjects; performing subjective listening experiments on sound image externalization with the bone conduction device, using BRIR signals of different lengths and a gammatone filter bank that smooths the frequency spectrum of the direct part of the BRIR signal; obtaining, from the results of the externalization experiments, the minimum BRIR signal length at which the sound image of the subject cannot be externalized and the azimuth angles at which it cannot be externalized; if the minimum non-externalizable BRIR length is the full length, or the sound image at certain azimuth angles cannot be externalized, performing data prolongation on the early reflections of the BRIR signal, so that the data of the reverberant part of the BRIR signal is prolonged and the sound image externalization effect of the BRIR signal is changed; otherwise, changing the IC value of the BRIR signal whose sound image cannot be externalized, so that the sound image of the subject is externalized. The invention can use BRIR signals with a smaller data volume in combination with changes of the IC value, so that the computational cost of the convolution becomes smaller.

Description

Sound image virtual externalization method based on bone conduction equipment
Technical Field
The invention belongs to the technical field of bone conduction earphones, and particularly relates to a sound image virtual externalization method based on bone conduction equipment.
Background
In general, a person perceives sound through two paths, air conduction and bone conduction. In air conduction, sound waves produced by vibration of the air reach the auditory nervous system through the outer ear, the middle ear and the inner ear; in bone conduction, sound waves bypass the outer ear and the middle ear through vibration of the skull and reach the auditory nervous system through the inner ear. Because the ear canal remains open when a bone conduction device is worn, bone conduction technology is increasingly used in hearing protection. However, since the transcranial attenuation of bone conduction devices is much smaller than that of air conduction earphones, the spatial sound reproduction effect of bone conduction devices is not comparable to that of air conduction devices.
As the markets of modern technologies such as virtual reality and augmented reality continue to expand, 3D audio reproduction through headphones has become indispensable. However, a sound image reproduced by headphones is often perceived inside the head, i.e. localized in the head, whereas a sound source in the real world is usually perceived as externalized, i.e. outside the head. Externalization plays an important role in spatial perception and in the plausibility of acoustic scenes, and is a key feature for establishing an immersive acoustic environment. Many studies over the years have aimed at finding the basic cues of perceived sound image externalization and at improving methods that simulate sound image externalization in virtual auditory environments through headphone virtual sound reproduction.
One way to simulate virtual 3D audio presented through headphones is to convolve the input audio signal with a head-related transfer function (HRTF), which is the transfer function from the sound source to the listener's eardrum and contains no room acoustic information. In this way, the simulated virtual sound source is perceived at the azimuth associated with the HRTF, but the convolved audio signal is typically not externalized. In order for the convolved audio signal to contain the acoustic information of the room and thereby be externalized, a binaural room impulse response (BRIR) can be used. At present, experiments on sound image externalization almost all use air conduction devices, and little is known about sound image externalization with bone conduction devices.
At present, most sound image externalization experiments, especially those measuring externalization with air conduction devices, are performed in anechoic rooms using personalized or non-personalized BRIR databases. However, sound image externalization is closely related to the acoustic information of the room, and different subjects perceive externalization differently, so listeners with poor externalization ability need a room containing richer acoustic information to obtain the sensation of an externalized sound image. A listener fitted with a uniform, one-size-fits-all hearing aid may therefore find that it does not match his or her own externalization ability, so that the improvement in sound image externalization after wearing the hearing aid is not significant.
Disclosure of Invention
The aim of the invention is to propose a sound image virtual externalization method based on bone conduction equipment, which uses bone conduction virtual sound reproduction to verify under which modification conditions a BRIR signal cannot be externalized; in such cases the sound image of the subject is externalized by changing the binaural coherence (IC) value of the BRIR signal whose sound image cannot be externalized.
In order to achieve the above object, the present invention provides a method for virtual externalization of sound image based on bone conduction device, the method comprising:
s1, constructing virtual space audio by using a BRIR database;
s2, adjusting the stimulus presented by the bone conduction device to a constant loudness level over a frequency range;
s3, collecting the sound image externalization score of the subject;
s4, performing subjective listening experiments on sound image externalization of the subject, using the bone conduction device in combination with BRIR signals of different lengths and with a gammatone filter bank that smooths the frequency spectrum of the direct part of the BRIR signal;
s5, obtaining, from the results of the sound image externalization experiments, the minimum BRIR signal length at which the sound image of the subject cannot be externalized and the azimuth angles of the gammatone-smoothed BRIR signals at which it cannot be externalized; if the minimum non-externalizable BRIR signal length is the full length, or the sound image at certain azimuth angles cannot be externalized, continuing with step S6, otherwise executing step S7;
s6: performing data prolongation on the early reflections of the BRIR signal, so that the data of the reverberant part of the BRIR signal is prolonged and the sound image externalization effect of the BRIR signal is changed;
s7: changing the IC value of the BRIR signal whose sound image cannot be externalized, so that the sound image of the subject is externalized.
Further, the BRIR database is the BRIR database of the University of Surrey, which contains BRIRs of four rooms.
Further, step S2 adopts an equal-loudness matching method, so that the stimuli presented by the air conduction earphone and the bone conduction device have the same loudness level over the measurement frequency range; the specific steps are as follows:
s2.1, playing the stimulus through an air conduction earphone and controlling the stimulus sound pressure level at 65 dB SPL;
s2.2, alternately playing the noise stimulus through the bone conduction device and the air conduction earphone;
s2.3, the listener adjusts the amplification factor of the bone conduction device until the perceived loudness matches the loudness presented by the air conduction earphone.
Further, the sound image externalization experiment specifically includes: analyzing the difference in sound image externalization performance between the air conduction earphone and the bone conduction device by multi-factor analysis of variance, and using the analysis results to predict the performance of the bone conduction device in terms of sound image externalization.
Further, the sound image externalization score is evaluated according to a linear scale of perceived sound image externalization, with grades of level 0, level 1, level 2 and level 3; level 1 indicates that the sound image is externalized and located at the sound source; level 2 indicates that the sound image is externalized but not as far away as the sound source; level 3 indicates that the sound image is externalized but very close to the bone conduction device; level 0 indicates that the sound image is internalized.
Further, the BRIR signal lengths comprise 2.5, 5, 10, 20, 40, 80, 120 and 200 milliseconds;
the direct part of the BRIR signal is the first 2.5 milliseconds;
the early reflections of the BRIR signal occupy the interval from 2.5 milliseconds to 80 milliseconds, followed by the late reverberation after 80 milliseconds.
Further, the IC value is characterized by the interaural cross-correlation, which represents the correlation between the left-ear and right-ear signals and is calculated by a normalized cross-correlation function:
ρ(τ) = ∫ x_l(t)·x_r(t+τ)dt / [∫ x_l²(t)dt · ∫ x_r²(t)dt]^(1/2)   (1)
where all integrals are taken from t_1 to t_2, ρ(τ) is the normalized cross-correlation function, t_1 and t_2 are the time limits of the BRIR signal, x_l(t) and x_r(t) are the left-ear and right-ear signals of duration t_2 − t_1 (l denotes the left ear and r the right ear), and τ is the time delay between the left-ear and right-ear signals; the maximum peak of ρ(τ) is found together with its delay τ, and to ensure that the interaural time difference stays within a reasonable range, the delay is limited to between −1 ms and 1 ms;
the IC is calculated as the maximum value of ρ(τ):
IC = max{ρ(τ)}   (2)
a binaural signal with low correlation (a low IC value) corresponds to a high externalization score, i.e. a larger perceived distance.
Further, the gammatone filter bank smooths the frequency spectrum of the direct part of the BRIR signal; different degrees of smoothing are achieved by using gammatone filters with different bandwidth coefficients B, ranging from 0.316 to 63.1. The smoothed spectral amplitude of the direct part is then calculated at each center frequency f_c as
|D_s(f_c)| = [∫ |H(f, f_c)|²·|D(f)|² df / ∫ |H(f, f_c)|² df]^(1/2)   (3)
where |D(f)| denotes the spectral magnitude of the original direct part of the BRIR and |H(f, f_c)| denotes the spectral magnitude of a fourth-order gammatone filter with center frequency f_c and bandwidth b(f_c), expressed as
H(f, f_c) = [1 + j·(f − f_c)/b(f_c)]^(−4)   (4)
b(f_c) = B·ERB(f_c)   (5)
where j is the imaginary unit, f is the frequency of the signal, f_c is the center frequency of the gammatone filter bank and ERB(f_c) is the equivalent rectangular bandwidth at f_c.
Further, the data prolongation is specifically:
unknown data are predicted from known, limited data, so as to extend the data;
the prediction modes include forward prediction and backward prediction;
forward prediction predicts the data at the current time from the historical data before the current time;
backward prediction predicts the data at the current time from the future data after the current time.
Further, the forward prediction adopts an autoregressive (AR) model, specifically:
let s(n) be the signal sequence; the model is expressed as
s(n) = Σ_{k=1}^{p} a_k·s(n−k) + G·u(n)   (6)
H(z) = G/A(z) = G / (1 − Σ_{k=1}^{p} a_k·z^(−k))   (7)
i.e. the current output of the model is the weighted sum of the current input and the past p outputs of the model, where G is the gain, u(n) is the noise, n is the index of the signal sequence, H(z) is the system function of the autoregressive model, A(z) is the system function of the linear prediction error filter, p is the order of the model, k is the index of the model order, and z is the complex variable of the z-transform. The models given by equations (6) and (7) are called autoregressive models and are all-pole models, and the coefficients a_k are called prediction coefficients. The predicted s(n) is expressed as
ŝ(n) = Σ_{i=1}^{p} a_i·s(n−i)   (8)
where ŝ(n) is the estimate of s(n), a_i are the forward linear predictor coefficients and i is the index of the forward linear predictor coefficients; predicting or estimating the current value from the past values of s(n) is called forward linear prediction.
The backward prediction predicts the earlier value s(n−p) from the p future values s(n−p+1), …, s(n):
ŝ(n−p) = Σ_{k=1}^{p} c_k·s(n−p+k)   (9)
where ŝ(n−p) is the estimate of s(n−p), c_k are the backward linear predictor coefficients and k is the index of the backward linear predictor coefficients; predicting or estimating the current value from the future values of s(n) is called backward linear prediction.
The beneficial technical effects of the invention are at least as follows:
1) The invention uses bone conduction virtual sound reproduction to verify under which modification conditions a BRIR signal cannot be externalized; in such a case the sound image of the subject is externalized by changing the binaural coherence (IC) value of the BRIR signal whose sound image cannot be externalized. If the sound image of the subject can be externalized with some modification of the BRIR signal, then no change of the IC value of the BRIR signal is required. In this way, a BRIR signal with a smaller data volume can be used, in combination with a change of the IC value where needed, so that the computational cost of the convolution becomes smaller.
2) The invention measures, through the sound image externalization experiments, the minimum BRIR length and the azimuth angles at which the sound image of the subject can be externalized, and then adjusts the IC value so that the sound image of the subject meets the requirement of externalization, which provides a reference for the testing and fitting of clinical hearing aids.
3) The invention tests sound image externalization for speech signals under different modification conditions while a bone conduction device is used; this corresponds to the complex speech scenes that frequently occur in real environments, so the method has reference value for sound image externalization in real situations.
4) The invention adds a method for adjusting sound image externalization via the IC value of the BRIR signal, while a listener with good sound image externalization does not need such an adjustment. The method can therefore minimize the complexity of the algorithm for different sound image externalization scenarios, increases the practicality of hearing aids, and has reference value for the research and development of hearing aids.
5) The invention can determine the minimum BRIR length at which a listener achieves sound image externalization and convolve the sound signal with this shorter BRIR, which not only preserves the listener's perception of sound image externalization but also reduces the computational cost and algorithmic complexity of the convolution, lowering the floating-point operations and latency of subsequent algorithms deployed on a hearing aid.
6) The invention adds data prolongation of the early reflections of the BRIR signal, so the method is suitable for room scenes with a small RT60; the data-prolonged BRIR signal aims to increase the reverberation time of the room, so that the listener obtains a better perception of sound image externalization.
Drawings
The invention will be further described with reference to the accompanying drawings. The embodiments do not constitute any limitation of the invention, and a person of ordinary skill in the art can obtain other drawings from the following drawings without inventive effort.
Fig. 1 is a flow chart of a method for virtual externalization of sound images based on bone conduction equipment.
Fig. 2 shows the amplitude spectra of the direct parts of the left-ear and right-ear BRIR signals at 60° azimuth after the direct part of the BRIR signal has been changed.
Fig. 3 shows the amplitude spectra of the left-ear and right-ear BRIR signals at 60° azimuth after the direct part of the BRIR signal has been changed.
FIG. 4 is a GUI interface used in the experiments of the present invention.
Fig. 5 shows the results of the bone conduction sound image externalization experiment of Experiment One at an azimuth angle of 45°.
Fig. 6 shows the results of the bone conduction sound image externalization experiment of Experiment One at an azimuth angle of 90°.
Fig. 7 shows the results of the bone conduction sound image externalization experiment of Experiment Two at different azimuth angles.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
As shown in Fig. 1, the invention provides a sound image virtual externalization method based on bone conduction equipment. Subjective listening material with spatial information (front half of the horizontal plane) is generated through a virtual auditory environment system platform, the bone conduction device is combined with the factors that cause sound image externalization, and sound image externalization experiments with virtual sound sources over the bone conduction device are measured and analyzed. The method comprises:
s1, constructing virtual space audio by using a BRIR database.
Specifically, a BRIR database is used to construct the virtual spatial audio. The database used is the BRIR database of the University of Surrey, which contains four different rooms; the method uses Room D of this database. Room D is a typical medium-to-large seminar and presentation space with a reverberation time (RT60) of 0.89 s. The head and torso simulator (HATS) was placed 1.5 m from the loudspeakers, which were positioned at azimuth angles of ±90° with a step size of 5°.
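A minimal sketch of step S1 as the convolution of a mono source with a left/right BRIR pair; the function names, the use of scipy and the peak normalization are illustrative assumptions, not part of the patent.

```python
import numpy as np
from scipy.signal import fftconvolve

def render_virtual_source(mono, brir_left, brir_right):
    """Convolve a mono signal with a left/right BRIR pair to obtain a
    two-channel binaural signal that carries the room's spatial cues."""
    left = fftconvolve(mono, brir_left)
    right = fftconvolve(mono, brir_right)
    out = np.stack([left, right], axis=1)
    # Peak-normalize to avoid clipping on playback (illustrative choice).
    out /= np.max(np.abs(out)) + 1e-12
    return out

# Usage with hypothetical arrays:
# binaural = render_virtual_source(speech, brir_left_45deg, brir_right_45deg)
```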
S2, adjusting the stimulus presented by the bone conduction device to a constant loudness level over the frequency range.
Specifically, the stimulus presented by the bone conduction device is adjusted to a constant loudness level over the frequency range by an equal-loudness matching method, as follows:
The air conduction stimulus is presented through Sennheiser IE800 earphones at a sound pressure level of 65 dB SPL, calibrated with an artificial head (KU100, Neumann, Berlin, Germany). The bone conduction stimulus is presented by the bone conduction device at the mastoid. The stimuli are presented alternately as the air-conducted sound of the Sennheiser IE800 earphone and the bone-conducted sound of the RadioEar B81 on the same side. The subject adjusts the signal applied to the bone conduction device until its perceived loudness matches that of the 65 dB SPL air conduction stimulus.
S3, collecting the acoustic image externalization score of the subject.
Specifically, the subjects score sound image externalization on a GUI interface written in MATLAB; the layout of the GUI interface is shown in fig. 2. After hearing the stimulus signal, the subject clicks on the gray bar of the GUI interface with the computer mouse according to his or her perception, which yields the sound image externalization score at that moment.
The sound image externalization experiment analyzes the difference in sound image externalization performance between the air conduction earphone and the bone conduction device by multi-factor analysis of variance, and uses the analysis results to predict the performance of the bone conduction device in terms of sound image externalization, so as to provide a basis and direction for subsequently improving the externalization performance;
the sound image externalization score is mainly based on the linear scale of perceived sound image externalization, and is specifically shown in table 1:
table 1, linear scale for evaluation of Acoustic image externalization experiments
The BRIR signals of different lengths mainly comprise 2.5, 5, 10, 20, 40, 80, 120 and 200 milliseconds. The duration of the untreated BRIR is 1000 milliseconds (BRIR full length).
S4, performing subjective listening experiments on sound image externalization of the subjects, using the bone conduction device in combination with BRIR signals of different lengths and with a gammatone filter bank that smooths the frequency spectrum of the direct part of the BRIR signal.
Specifically, the subjective listening experiments on sound image externalization comprise Experiment One and Experiment Two.
In Experiment One, the BRIR of each ear is truncated to the following durations: 2.5, 5, 10, 20, 40, 80, 120 and 200 milliseconds. The duration of the unprocessed BRIR signal is 1000 milliseconds (full BRIR length). The duration of the falling slope of each truncation window is 0.5 milliseconds (about 24 samples at 48 kHz). There are two azimuth angles, 45° and 90°. With this processing, three conditions are obtained: (1) the right-ear BRIR is truncated with different window durations while the left-ear BRIR is not truncated (denoted "right ear truncated"); (2) the left-ear BRIR is truncated with different window durations while the right-ear BRIR is not truncated (denoted "left ear truncated"); (3) the BRIRs of both ears are truncated (denoted "binaural truncation"). All truncated BRIRs are zero-padded to a length of 1000 milliseconds.
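A sketch of the truncation used in Experiment One, assuming a 48 kHz sampling rate (consistent with the "about 24 samples at 0.5 ms" figure above); the raised-cosine shape of the falling slope is an assumption, since the text only specifies its 0.5 ms duration.

```python
import numpy as np

FS = 48000  # sampling rate assumed from the 0.5 ms ~ 24-sample figure in the text

def truncate_brir(brir, keep_ms, fade_ms=0.5, total_ms=1000.0):
    """Keep the first keep_ms of a BRIR, apply a short falling slope,
    and zero-pad back to total_ms (the 1000 ms full length in the text)."""
    keep = int(round(keep_ms * 1e-3 * FS))
    fade = min(int(round(fade_ms * 1e-3 * FS)), keep)   # ~24 samples at 48 kHz
    total = int(round(total_ms * 1e-3 * FS))
    seg = brir[:keep].astype(float)
    # Raised-cosine falling slope over the last `fade` samples (assumed shape).
    seg[-fade:] *= 0.5 * (1 + np.cos(np.linspace(0, np.pi, fade)))
    out = np.zeros(total)
    out[:keep] = seg
    return out

# Example: the "right ear truncated" condition at 20 ms (hypothetical array brir_right).
# brir_right_20 = truncate_brir(brir_right, keep_ms=20)
```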
Experiment Two studies the importance of the spectral details of the direct parts of the BRIR signals at the contralateral and ipsilateral ears for perceived externalization. Here the amplitude spectrum of the direct part is smoothed by a gammatone filter bank, while the reverberant part of the BRIR signal remains unchanged. The experiment was performed at seven azimuth angles: −90°, −60°, −30°, 0°, 30°, 60° and 90°. With this processing, four conditions are obtained: (1) the spectrum of the direct part of the left-ear BRIR is smoothed while the right-ear BRIR is unchanged (denoted "left ear smoothing"); (2) the spectrum of the direct part of the right-ear BRIR is smoothed while the left-ear BRIR is unchanged (denoted "right ear smoothing"); (3) the spectra of the direct parts of the BRIRs of both ears are smoothed (denoted "binaural smoothing"); (4) the BRIRs of both ears are not changed (denoted "binaural non-smoothing").
As shown in fig. 3, the amplitude spectra of a pair of BRIR direct parts at 60° azimuth after processing are given for the conditions "binaural non-smoothing" (upper-left subplot), "left ear smoothing" (upper-right subplot), "right ear smoothing" (lower-left subplot) and "binaural smoothing" (lower-right subplot). After smoothing, the spectral magnitude of the direct part is almost constant over frequency, and the characteristic notches and peaks disappear. The reverberant part of the BRIR remains unchanged and is added back to the processed direct part to produce a modified BRIR. Fig. 4 shows the amplitude spectra of the modified BRIRs at 60° azimuth under the different conditions.
S5, obtaining, from the results of the sound image externalization experiments, the minimum BRIR signal length at which the sound image of the subject cannot be externalized and the azimuth angles of the gammatone-smoothed BRIR signals at which it cannot be externalized; if the minimum non-externalizable BRIR signal length is the full length, or the sound image at certain azimuth angles cannot be externalized, continue with step S6, otherwise execute step S7.
Specifically, the length of the shortest BRIR signal at which the sound image of the subject cannot be externalized and the azimuth angles of the gammatone-smoothed BRIR signals at which it cannot be externalized are obtained from the results of the subjects' sound image externalization experiments. Fig. 5 shows the average results of the bone conduction sound image externalization experiment for 8 subjects under all conditions of Experiment One at 45° azimuth. Fig. 6 shows the average results for 8 subjects under all conditions of Experiment One at 90° azimuth. As can be seen from both figures, for BRIR signal lengths between 2.5 milliseconds and 20 milliseconds the sound image cannot be externalized, because its externalization score is less than 1. Fig. 7 shows the average results of the bone conduction sound image externalization experiment for 8 subjects under all conditions of Experiment Two at the different azimuth angles. Under the "binaural smoothing" condition, the sound image is not externalized at any azimuth angle (externalization score less than 1). Under the "binaural non-smoothing" condition, the sound image is externalized at all azimuth angles. Under the "left ear smoothing" condition, the sound image is not externalized (externalization score less than 1) at azimuth angles of 0° and −30°. Under the "right ear smoothing" condition, the sound image is not externalized (externalization score less than 1) at azimuth angles of 0° and 30°.
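One literal reading of the decision in step S5, expressed as a small helper; the dictionary inputs, the function name and the score threshold of 1 (taken from the "externalization score less than 1" criterion above) are illustrative assumptions.

```python
def choose_processing(mean_score_by_length_ms, mean_score_by_azimuth_deg,
                      full_length_ms=1000.0, threshold=1.0):
    """Step S5: decide between S6 (prolong early reflections) and S7 (change IC).
    A condition counts as non-externalizable when its mean score is below 1."""
    non_ext_lengths = [L for L, s in mean_score_by_length_ms.items() if s < threshold]
    non_ext_azimuths = [a for a, s in mean_score_by_azimuth_deg.items() if s < threshold]
    # "Minimum non-externalizable length equals the full length" is read here as:
    # only the full-length BRIR fails; together with any failing azimuth this leads to S6.
    min_is_full = bool(non_ext_lengths) and min(non_ext_lengths) == full_length_ms
    return "S6" if (min_is_full or non_ext_azimuths) else "S7"
```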
S6: by carrying out data prolongation on early reflection of the BRIR signal, the data of the reverberation part of the BRIR signal is prolonged, and the sound image externalization effect of the BRIR signal is changed.
Specifically, if in Experiment One the sound image cannot be externalized even at the maximum BRIR signal length (the full BRIR length), or if in Experiment Two the sound image at certain azimuth angles cannot be externalized, the data of the reverberant part of the BRIR signal can be prolonged by performing data prolongation on the early reflections of the BRIR signal. The principle of the prolongation is that, in the extended waveform, the boundary point of the original signal and the points on either side of it have the same slope, i.e. the three points lie on the same straight line, so that the extended signal keeps the trend of the original signal and the waveform is smoother. Extending the data with an AR model fully satisfies this requirement: because the same set of AR coefficients is used for the extension, the spectral characteristics of the original signal are preserved and no new frequency components are added.
The data prolongation of the early reflections of the BRIR signal proceeds as follows: the first 80 milliseconds of the target BRIR signal are intercepted, forward prediction is performed using the intercepted data, the predicted data are spliced onto the late reverberation, and the data-prolonged BRIR signal is finally obtained.
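A sketch of this prolongation step, assuming a 48 kHz sampling rate; the AR order, the Yule-Walker (autocorrelation) fit and the exact splicing point are assumptions, since the text only states that the first 80 ms are used for forward prediction and the result is spliced onto the late reverberation.

```python
import numpy as np

def fit_ar_coefficients(x, order):
    """Estimate forward linear prediction (AR) coefficients a_k via the
    autocorrelation (Yule-Walker) method."""
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)]) / len(x)
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])   # prediction: x_hat[n] = sum_k a[k]*x[n-1-k]

def extend_brir(brir, fs=48000, early_ms=80.0, extend_ms=200.0, order=64):
    """Forward-predict beyond the first 80 ms of a BRIR to prolong its
    reverberant part, then splice the prediction onto the original tail."""
    n_early = int(early_ms * 1e-3 * fs)
    early = brir[:n_early].astype(float)
    a = fit_ar_coefficients(early, order)
    buf = list(early[-order:])
    pred = []
    for _ in range(int(extend_ms * 1e-3 * fs)):
        nxt = np.dot(a, buf[::-1])          # weighted sum of the past `order` samples
        pred.append(nxt)
        buf = buf[1:] + [nxt]
    # Assumed splice: insert the predicted segment between the intercepted
    # 80 ms and the original late reverberation.
    return np.concatenate([early, np.array(pred), brir[n_early:]])
```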
S7: the IC value of the BRIR signal that renders the sound image unvariable is changed to externalize the sound image of the subject.
Specifically, the IC value of the BRIR signal whose sound image cannot be externalized is changed, so that the sound image of the subject is externalized; one illustrative way of doing this is sketched below. Theoretically, the smaller the IC value, the more easily the sound image is externalized, i.e. the farther the perceived distance of the sound source, and vice versa. As can be seen from equations (1) and (2) below, the IC value lies in the range of 0 to 1, so a sound image that cannot be externalized can be externalized by changing the IC value.
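The patent does not prescribe a particular mechanism for lowering the IC value; the following is only one possible illustration, not the patent's method: a mid/side rebalance of the reverberant tails, which reduces the correlation between the two ears, under the assumptions of equal-length BRIRs, a 48 kHz sampling rate and a 2.5 ms direct part.

```python
import numpy as np

def lower_tail_coherence(brir_l, brir_r, fs=48000, side_gain=2.0, direct_ms=2.5):
    """Reduce the IC of a BRIR pair by boosting the 'side' (difference)
    component of the reverberant tails relative to the 'mid' (sum) component;
    the direct parts are left untouched (illustrative technique)."""
    n_direct = int(direct_ms * 1e-3 * fs)
    out_l, out_r = brir_l.astype(float).copy(), brir_r.astype(float).copy()
    mid = 0.5 * (out_l[n_direct:] + out_r[n_direct:])
    side = 0.5 * (out_l[n_direct:] - out_r[n_direct:])
    out_l[n_direct:] = mid + side_gain * side   # larger side_gain -> lower correlation
    out_r[n_direct:] = mid - side_gain * side
    return out_l, out_r
```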
The IC value, also referred to as the interaural cross-correlation (IACC), is defined as the correlation between the left-ear and right-ear signals and is an important binaural cue for the perception of room acoustics. The IACC can be calculated by a normalized cross-correlation function as follows:
ρ(τ) = ∫ x_l(t)·x_r(t+τ)dt / [∫ x_l²(t)dt · ∫ x_r²(t)dt]^(1/2)   (1)
where all integrals are taken from t_1 to t_2, ρ(τ) is the normalized cross-correlation function, t_1 and t_2 are the time limits of the BRIR signal, x_l(t) and x_r(t) are the left-ear and right-ear signals of duration t_2 − t_1 (l denotes the left ear, r the right ear), and τ is the time delay between the left-ear and right-ear signals. The maximum peak of ρ(τ) is found together with its delay τ; to ensure that the interaural time difference stays within a reasonable range, the delay is typically limited to between −1 ms and 1 ms. The IC can be calculated as the maximum value of ρ(τ):
IC = max{ρ(τ)}   (2)
According to the above assumption, a binaural signal with low correlation (a low IC value) corresponds to a high externalization score (a large perceived distance). In the invention, the IC value is calculated according to equations (1) and (2);
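A direct numerical counterpart of equations (1) and (2): the normalized cross-correlation is evaluated over lags limited to ±1 ms and its maximum is taken as the IC value; the function name and the fixed 48 kHz rate are assumptions.

```python
import numpy as np

def interaural_coherence(x_left, x_right, fs=48000, max_lag_ms=1.0):
    """IC = max over |tau| <= 1 ms of the normalized cross-correlation
    between the left-ear and right-ear signals (equations (1) and (2))."""
    max_lag = int(max_lag_ms * 1e-3 * fs)
    denom = np.sqrt(np.sum(x_left**2) * np.sum(x_right**2)) + 1e-12
    best = -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            num = np.sum(x_left[:len(x_left) - lag] * x_right[lag:])
        else:
            num = np.sum(x_left[-lag:] * x_right[:len(x_right) + lag])
        best = max(best, num / denom)
    return best
```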
specifically, the direct portion of the BRIR signal refers to the first 2.5 milliseconds of the BRIR signal being considered the direct portion, the remainder being the reverberant portion;
wherein the early reflection of the BRIR signal is early in the interval of 2.5 milliseconds to 80 milliseconds of the BRIR signal, followed by late reverberation 80 milliseconds. Early reflections consist of discrete echoes whose time and amplitude distribution depends largely on the shape of the room and the location of the sound source and receiver. These echoes play a key role in subjective spatial impression. In contrast, late reverberation itself is more suitable for statistical description and can be regarded as a feature of the space itself, irrespective of the positions of the sound source and the receiver.
The gammatone filter bank smooths the spectrum of the direct part of the BRIR signal. Different degrees of smoothing are achieved by using gammatone filters with different bandwidth coefficients B (the bandwidth coefficient is a value relative to an equivalent rectangular bandwidth, ERB), ranging from 0.316 to 63.1. The smoothed spectral amplitude of the direct part is then calculated at each center frequency f_c as
|D_s(f_c)| = [∫ |H(f, f_c)|²·|D(f)|² df / ∫ |H(f, f_c)|² df]^(1/2)   (3)
where |D(f)| denotes the spectral magnitude of the original direct part of the BRIR and |H(f, f_c)| denotes the spectral magnitude of a fourth-order gammatone filter with center frequency f_c and bandwidth b(f_c), which can be expressed as
H(f, f_c) = [1 + j·(f − f_c)/b(f_c)]^(−4)   (4)
b(f_c) = B·ERB(f_c)   (5)
where j is the imaginary unit, f is the frequency of the signal, f_c is the center frequency of the gammatone filter bank and ERB(f_c) is the equivalent rectangular bandwidth at f_c.
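A sketch of the gammatone smoothing of equations (3)–(5) applied to the direct-part magnitude spectrum; the Glasberg-Moore ERB approximation, the FFT size and the function names are assumptions.

```python
import numpy as np

def erb(fc):
    """Equivalent rectangular bandwidth, Glasberg & Moore approximation (assumed)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def smooth_direct_spectrum(direct, fs=48000, B=1.0, n_fft=2048):
    """Smooth the magnitude spectrum of the BRIR direct part (first 2.5 ms)
    with fourth-order gammatone weights of bandwidth B * ERB(fc)."""
    spec = np.abs(np.fft.rfft(direct, n_fft))
    freqs = np.fft.rfftfreq(n_fft, 1.0 / fs)
    smoothed = np.zeros_like(spec)
    for i, fc in enumerate(freqs):
        if fc == 0:
            smoothed[i] = spec[i]           # leave DC untouched
            continue
        b = B * erb(fc)
        # |H(f, fc)|^2 of a fourth-order gammatone filter centred at fc (eq. (4)).
        w = np.abs(1.0 / (1.0 + 1j * (freqs - fc) / b) ** 4) ** 2
        # Energy-weighted average of the direct-part magnitude spectrum (eq. (3)).
        smoothed[i] = np.sqrt(np.sum(w * spec**2) / np.sum(w))
    return freqs, smoothed
```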
Specifically, the data prolongation predicts unknown data from known, limited data so as to extend the data. The prediction modes include forward prediction and backward prediction. Forward prediction means predicting the data at the current time from the historical data before the current time, and backward prediction means predicting the data at the current time from the future data after the current time; the theory involved is that of the autoregressive (AR) model.
Let s(n) be the signal sequence; the model can be expressed as
s(n) = Σ_{k=1}^{p} a_k·s(n−k) + G·u(n)   (6)
H(z) = G/A(z) = G / (1 − Σ_{k=1}^{p} a_k·z^(−k))   (7)
i.e. the current output of the model is the weighted sum of the current input and the past p outputs of the model, where G is the gain, u(n) is the noise, n is the index of the signal sequence, H(z) is the system function of the AR model, A(z) is the system function of the linear prediction error filter, p is the order of the model, k is the index of the model order, and z is the complex variable of the z-transform. The models given by equations (6) and (7) are called autoregressive models; the AR model is an all-pole model, and the coefficients a_k are called prediction coefficients. The predicted s(n) is expressed as
ŝ(n) = Σ_{i=1}^{p} a_i·s(n−i)   (8)
where ŝ(n) is the estimate of s(n), a_i are the forward linear predictor coefficients and i is the index of the forward linear predictor coefficients; predicting or estimating the current value from the past values of s(n) is called forward linear prediction.
The backward prediction predicts the earlier value s(n−p) from the p future values s(n−p+1), …, s(n):
ŝ(n−p) = Σ_{k=1}^{p} c_k·s(n−p+k)   (9)
where ŝ(n−p) is the estimate of s(n−p), c_k are the backward linear predictor coefficients and k is the index of the backward linear predictor coefficients; predicting or estimating the current value from the future values of s(n) is called backward linear prediction.
In summary, the invention mainly uses a virtual auditory environment system platform to generate subjective listening material with spatial information (front half of the horizontal plane), and combines the bone conduction device with the factors that cause sound image externalization to measure and analyze sound image externalization experiments with virtual sound sources over the bone conduction device. Instead of smoothing the amplitude spectrum of the BRIR signal with the gammatone filter bank, other spectral smoothing methods may also be used, such as Savitzky-Golay smoothing or five-point cubic smoothing. Instead of adjusting the IC value of the BRIR signal to change sound image externalization, the direct-to-reverberant energy ratio (DRR) of the BRIR signal may also be adjusted (a sketch of such a DRR adjustment is given below). In the invention, the IC value of the BRIR signal is adjusted to change sound image externalization: if the sound image of a listener cannot be externalized in certain acoustic scenes, externalization can be achieved by adjusting the IC value of the BRIR signal; in addition, the data of the reverberant part of the BRIR signal is prolonged by performing data prolongation on the early reflections of the BRIR signal, thereby indirectly increasing the acoustic information of the room.
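Since the summary above mentions adjusting the direct-to-reverberant energy ratio (DRR) as an alternative to changing the IC value, here is a minimal sketch of one way to set a target DRR by rescaling the reverberant part; the 48 kHz rate, the 2.5 ms direct part and the function name are assumptions.

```python
import numpy as np

def set_drr(brir, target_drr_db, fs=48000, direct_ms=2.5):
    """Scale the reverberant part of a BRIR so that the direct-to-reverberant
    energy ratio (DRR) reaches target_drr_db; the direct part is unchanged."""
    n_direct = int(direct_ms * 1e-3 * fs)
    direct, reverb = brir[:n_direct], brir[n_direct:]
    e_direct = np.sum(direct**2)
    e_reverb = np.sum(reverb**2) + 1e-12
    current_drr_db = 10 * np.log10(e_direct / e_reverb)
    gain = 10 ** ((current_drr_db - target_drr_db) / 20.0)  # amplitude gain for the tail
    return np.concatenate([direct, gain * reverb])
```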
While embodiments of the invention have been shown and described, it will be understood by those skilled in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.

Claims (10)

1. A method for virtual externalization of sound images based on bone conduction equipment, the method comprising:
s1, constructing virtual space audio by using a BRIR database;
s2, adjusting the stimulus presented by the bone conduction device to a constant loudness level over a frequency range;
s3, collecting the sound image externalization score of the subject;
s4, performing subjective listening experiments on sound image externalization of the subject, using the bone conduction device in combination with BRIR signals of different lengths and with a gammatone filter bank that smooths the frequency spectrum of the direct part of the BRIR signal;
s5, obtaining, from the results of the sound image externalization experiments, the minimum BRIR signal length at which the sound image of the subject cannot be externalized and the azimuth angles of the gammatone-smoothed BRIR signals at which it cannot be externalized; if the minimum non-externalizable BRIR signal length is the full length, or the sound image at certain azimuth angles cannot be externalized, continuing with step S6, otherwise executing step S7;
s6: performing data prolongation on the early reflections of the BRIR signal, so that the data of the reverberant part of the BRIR signal is prolonged and the sound image externalization effect of the BRIR signal is changed;
s7: changing the IC value of the BRIR signal whose sound image cannot be externalized, so that the sound image of the subject is externalized.
2. The method for virtual externalization of sound images based on bone conduction equipment according to claim 1, wherein the BRIR database is the BRIR database of the University of Surrey, which contains BRIRs of four rooms.
3. The method for virtual externalization of sound images based on bone conduction equipment according to claim 1, wherein step S2 adopts an equal-loudness matching method, so that the stimuli presented by the air conduction earphone and the bone conduction device have the same loudness level over the measurement frequency range, with the following specific steps:
s2.1, playing the stimulus through an air conduction earphone and controlling the stimulus sound pressure level at 65 dB SPL;
s2.2, alternately playing the noise stimulus through the bone conduction device and the air conduction earphone;
s2.3, the listener adjusts the amplification factor of the bone conduction device until the perceived loudness matches the loudness presented by the air conduction earphone.
4. The method for virtual externalization of sound images based on bone conduction equipment according to claim 1, wherein the sound image externalization experiment specifically includes: analyzing the difference in sound image externalization performance between the air conduction earphone and the bone conduction device by multi-factor analysis of variance, and using the analysis results to predict the performance of the bone conduction device in terms of sound image externalization.
5. The method for virtual externalization of sound images based on bone conduction equipment according to claim 1, wherein the sound image externalization score is evaluated according to a linear scale of perceived sound image externalization, with grades of level 0, level 1, level 2 and level 3; level 1 indicates that the sound image is externalized and located at the sound source; level 2 indicates that the sound image is externalized but not as far away as the sound source; level 3 indicates that the sound image is externalized but very close to the bone conduction device; level 0 indicates that the sound image is internalized.
6. The method for virtual externalization of sound images based on bone conduction equipment according to claim 1, wherein the BRIR signal lengths comprise 2.5, 5, 10, 20, 40, 80, 120 and 200 milliseconds;
the direct part of the BRIR signal is the first 2.5 milliseconds;
the early reflections of the BRIR signal occupy the interval from 2.5 milliseconds to 80 milliseconds, followed by the late reverberation after 80 milliseconds.
7. The method for virtual externalization of sound images based on bone conduction equipment according to claim 1, wherein the IC value is characterized by the interaural cross-correlation, which represents the correlation between the left-ear and right-ear signals and is calculated by a normalized cross-correlation function:
ρ(τ) = ∫ x_l(t)·x_r(t+τ)dt / [∫ x_l²(t)dt · ∫ x_r²(t)dt]^(1/2)   (1)
where all integrals are taken from t_1 to t_2, ρ(τ) is the normalized cross-correlation function, t_1 and t_2 are the time limits of the BRIR signal, x_l(t) and x_r(t) are the left-ear and right-ear signals of duration t_2 − t_1 (l denotes the left ear, r the right ear), and τ is the time delay between the left-ear and right-ear signals; the maximum peak of ρ(τ) is found together with its delay τ, and to ensure that the interaural time difference stays within a reasonable range, the delay is limited to between −1 ms and 1 ms;
the IC is calculated as the maximum value of ρ(τ):
IC = max{ρ(τ)}   (2)
a binaural signal with low correlation (a low IC value) corresponds to a high externalization score, i.e. a larger perceived distance.
8. The method for virtual externalization of sound images based on bone conduction equipment according to claim 1, wherein the gammatone filter bank smooths the spectrum of the direct part of the BRIR signal; different degrees of smoothing are achieved by using gammatone filters with different bandwidth coefficients B, ranging from 0.316 to 63.1, and the smoothed spectral amplitude of the direct part is then calculated at each center frequency f_c as
|D_s(f_c)| = [∫ |H(f, f_c)|²·|D(f)|² df / ∫ |H(f, f_c)|² df]^(1/2)   (3)
where |D(f)| denotes the spectral magnitude of the original direct part of the BRIR and |H(f, f_c)| denotes the spectral magnitude of a fourth-order gammatone filter with center frequency f_c and bandwidth b(f_c), expressed as
H(f, f_c) = [1 + j·(f − f_c)/b(f_c)]^(−4)   (4)
b(f_c) = B·ERB(f_c)   (5)
where j is the imaginary unit, f is the frequency of the signal, f_c is the center frequency of the gammatone filter bank and ERB(f_c) is the equivalent rectangular bandwidth at f_c.
9. The method for virtual externalization of sound images based on bone conduction equipment according to claim 1, wherein the data prolongation is specifically:
unknown data are predicted from known, limited data, so as to extend the data;
the prediction modes include forward prediction and backward prediction;
forward prediction predicts the data at the current time from the historical data before the current time;
backward prediction predicts the data at the current time from the future data after the current time.
10. The method for virtual externalization of sound images based on bone conduction equipment according to claim 9, wherein the forward prediction adopts an autoregressive model, specifically:
let s(n) be the signal sequence, expressed as
s(n) = Σ_{k=1}^{p} a_k·s(n−k) + G·u(n)   (6)
H(z) = G/A(z) = G / (1 − Σ_{k=1}^{p} a_k·z^(−k))   (7)
i.e. the current output of the model is the weighted sum of the current input and the past p outputs of the model, where G is the gain, u(n) is the noise, n is the index of the signal sequence, H(z) is the system function of the autoregressive model, A(z) is the system function of the linear prediction error filter, p is the order of the model, k is the index of the model order, and z is the complex variable of the z-transform; the models given by equations (6) and (7) are called autoregressive models and are all-pole models, and the coefficients a_k are called prediction coefficients; the predicted s(n) is expressed as
ŝ(n) = Σ_{i=1}^{p} a_i·s(n−i)   (8)
where ŝ(n) is the estimate of s(n), a_i are the forward linear predictor coefficients and i is the index of the forward linear predictor coefficients; predicting or estimating the current value from the past values of s(n) is called forward linear prediction;
the backward prediction predicts the earlier value s(n−p) from the p future values s(n−p+1), …, s(n):
ŝ(n−p) = Σ_{k=1}^{p} c_k·s(n−p+k)   (9)
where ŝ(n−p) is the estimate of s(n−p), c_k are the backward linear predictor coefficients and k is the index of the backward linear predictor coefficients; predicting or estimating the current value from the future values of s(n) is called backward linear prediction.
CN202311329162.5A 2023-10-13 2023-10-13 Sound image virtual externalization method based on bone conduction equipment Pending CN117202001A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202311329162.5A | 2023-10-13 | 2023-10-13 | Sound image virtual externalization method based on bone conduction equipment (CN117202001A)

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202311329162.5A | 2023-10-13 | 2023-10-13 | Sound image virtual externalization method based on bone conduction equipment (CN117202001A)

Publications (1)

Publication Number | Publication Date
CN117202001A | 2023-12-08

Family

ID=88990668

Family Applications (1)

Application Number | Title | Priority Date | Filing Date | Status
CN202311329162.5A | Sound image virtual externalization method based on bone conduction equipment (CN117202001A) | 2023-10-13 | 2023-10-13 | Pending

Country Status (1)

Country Link
CN (1) CN117202001A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination