CN107820158A - Three-dimensional audio generation device based on head-related impulse response

Info

Publication number
CN107820158A
Authority
CN
China
Prior art keywords
hrir
audio
parameters
correlation
parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710551437.8A
Other languages
Chinese (zh)
Other versions
CN107820158B (en
Inventor
陈喆
殷福亮
张古强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology filed Critical Dalian University of Technology
Priority to CN201710551437.8A priority Critical patent/CN107820158B/en
Publication of CN107820158A publication Critical patent/CN107820158A/en
Application granted granted Critical
Publication of CN107820158B publication Critical patent/CN107820158B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2430/00 Signal processing covered by H04R, not provided for in its groups

Abstract

The invention discloses a three-dimensional audio generation method based on the head-related impulse response (HRIR), comprising the following steps: obtaining the physiological characteristic parameters of the measured person and performing correlation analysis between them and the HRIR; retaining the parameters whose correlation exceeds a threshold coefficient; obtaining the personalized HRIR of the measured person by finding the individual in a database with the minimum sum of deviations, and then interpolating the HRIR using the Laplacian eigenmaps (LEM) dimensionality-reduction algorithm; using an early reflection reverberator based on the image model to generate the early reflection paths of the received input audio, including primary and secondary reflections in a room, and convolving them with the HRIR to recover the azimuth information of the audio; obtaining reverberant audio using a late reverberator based on a feedback delay network (FDN); and adding the audio carrying azimuth information to the reverberant audio to obtain three-dimensional audio with a reverberation effect.

Description

Three-dimensional audio generation device based on head-related impulse response
Technical Field
The invention relates to a three-dimensional audio generation device based on head-related impulse response, and falls under patent classification H04 (electric communication technique), H04R (loudspeakers, microphones, gramophone pick-ups or like acoustic electromechanical transducers; deaf-aid sets; public address systems), H04R5/00 (stereophonic arrangements).
Background
A three-dimensional audio system can reconstruct a three-dimensional sound field, recover sound source azimuth information and generate sound with a sense of direction. Such systems are widely used in human-computer interaction, mobile terminals, video conferencing, digital entertainment, virtual reality and similar applications. Three-dimensional audio technology mainly comprises wave field synthesis, Ambisonics, amplitude panning and head-related impulse response techniques. The head-related impulse response (HRIR) technique convolves a mono sound source with the head-related impulse responses at the listener's left and right ears and reproduces the result through headphones, generating a virtual sound source in a specific direction and thereby reconstructing the three-dimensional sound field. The Fourier transform of the HRIR is the head-related transfer function (HRTF). When a generic head-related impulse response is used to reconstruct three-dimensional audio, spatial localization errors, front-back confusion and "intracranial positioning" (the sound being perceived inside the head) typically occur. Personalized head-related impulse responses give good results, but measuring them is a complex process and the data volume and computation are excessive, which limits the application of the HRIR technique. The invention therefore provides an HRIR personalization method to reduce spatial localization errors and front-back confusion, addresses intracranial positioning with an artificial reverberation technique, and provides an HRIR interpolation algorithm that improves the spatial resolution of HRIR data, so that the three-dimensional audio system sounds better.
In the prior art, there is a head-related transfer function personalization method based on parameter matching. The method selects 35 individuals with measured parameters from a database, taking 5 of them as measurement objects and the other 30 as reference objects. For each measurement object, the reference object whose HRIR is closest is found in the reference set, the deviation of each parameter between the measurement object and that reference object is calculated, and the 4 parameters with the smallest deviation are taken as the selected parameters. Then these 4 parameters are measured for the individual under test, and the 4 parameters of the individual to be measured are searched from the database
However, this method uses only pinna parameters. Although the elevation angle can be estimated effectively, horizontal-angle localization accuracy is poor because parameters of the head and shoulders are missing. In addition, the parameter selection process does not consider how interactions between parameters affect the HRIR, and the two angle parameters used by the method are difficult to measure in practice.
The prior art also includes a spatial auditory reconstruction method based on Local Linear Embedding (LLE), which compresses and interpolates an HRTF database. The method follows the manifold idea: the local linear embedding algorithm first reduces the dimension of the HRTF data, features of the HRTF data are then extracted in the low-dimensional space, and cluster analysis selects a set of characteristic HRTFs. Non-characteristic HRTFs are obtained by weighted interpolation of adjacent characteristic HRTFs. Only the characteristic HRTFs need to be kept in the database, so the HRTF data are compressed effectively. Experimental results show that the HRTF reconstruction of this method is superior to that of principal component analysis.
Although the method considers correlation among HRTFs, it does not make full use of the characteristics of HRTF data: it considers only the correlation among HRTFs of the same individual and ignores both the correlation between the HRTFs of the two ears of the same individual and the correlation among HRTFs of different individuals, so its interpolation performance is limited. In addition, the method cannot use HRTFs of existing azimuths to obtain the HRTF of an unknown azimuth, so the spatial resolution of the HRTF data cannot exceed that of the existing database.
The prior art also includes an approach based on a feedback delay network. This scheme simulates multiple reflections of sound waves with feedback delays, and a satisfactory reverberation effect can be obtained with reasonable delay parameters, attenuation parameters and number of feedback paths. Experimental results show that a good reverberation effect can be achieved with 12 delay channels and a suitable feedback matrix.
Such a reverberator generates good late reverberation but simulates the early reflections of sound waves poorly. In addition, because the technique relies on the feedback delay network alone, a good reverberation effect requires more delay channels, so the computation is heavy and the structure is complex. The method also does not consider the directional characteristics of the early primary and secondary reflections, so the three-dimensional audio effect is poor.
Disclosure of Invention
In order to solve the above problems, the invention provides a three-dimensional audio generation method based on head-related impulse response, which mainly adopts the following technical scheme. First, because measuring a listener's HRIR is time-consuming and tedious, practical systems use a generic HRIR in place of the listener's measured HRIR, but a non-personalized HRIR causes problems such as spatial localization errors and front-back confusion. The invention therefore studies HRIR personalization and provides an HRIR personalization algorithm based on matching of anthropometric parameters.
Second, it is impractical to measure HRIRs at all azimuths, so the spatial resolution of a conventional HRIR database is limited. The invention therefore provides an HRIR interpolation algorithm based on Laplacian eigenmaps and an Inter Subject Graph (ISG), which uses low-resolution HRIR data to obtain HRIRs with high spatial resolution, thereby improving the spatial resolution of the HRIR.
Finally, the present invention proposes an artificial reverberation system that solves the "intracranial localization" problem in HRIR three-dimensional audio.
Drawings
In order to more clearly illustrate the embodiments or prior art solutions of the present invention, the drawings used in the embodiments or prior art descriptions will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a block diagram of a system of the present invention
FIG. 2 shows the head and torso measurement parameters (a) and the auricle measurement parameters (b) of the present invention
FIG. 3 is a schematic diagram of the artificial reverberation algorithm of the present invention
FIG. 4 is a schematic diagram of the sound source and listener positions in a rectangular room according to the present invention
FIG. 5 is a block diagram of the early stage reflection reverberator of the present invention
FIG. 6 is a schematic diagram of a feedback delay reverberator according to the present invention
FIG. 7 is a front (left) and side (right) view of the position measured in an embodiment of the present invention
FIG. 8 is a schematic diagram of the subjective positioning experiment of altitude angle of the present invention
FIG. 9 is a schematic diagram of the horizontal angle subjective positioning experiment of the present invention
FIG. 10 is a schematic diagram of a front and back confusion experiment of the present invention
FIG. 11 is a schematic diagram of the present invention for comparing measured HRTF and reconstructed HRTF
FIG. 12 shows the early reflection impulse response (a) and the late reverberation impulse response (b) of the present invention
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention:
As shown in fig. 1-10:
The HRIR personalization method of the invention first selects suitable anthropometric parameters, then measures those parameters on the individual under test, and uses the measurements to search the HRIR database for the set of HRIRs whose parameters are closest to those of the individual under test. That set of HRIRs is the personalized HRIR of the individual under test.
For HRIR personalization methods based on anthropometric parameters, using all parameters for matching has the following problems: (1) too many parameters significantly increase the amount of calculation, and some parameters are complex to measure; (2) interactions between the parameters may degrade the performance of the model. It is therefore necessary to select the parameters carefully.
Anthropometric parameter selection
The number of commonly used anthropometric parameters is 27, 17 of which are parameters related to the head and torso, and the other 10 of which are parameters related to the pinna, and the measured parameters are shown in fig. 2. Table 1 gives the details of these measured parameters as well as the mean and standard deviation of the individual parameters, where distance is in cm and angle is in degrees.
TABLE 1 measurement parameter related data
Parameter selection process
The parameter selection method of the invention is divided into two steps. In step 1, correlation analysis is first carried out between all 27 parameters (table 1) and the HRIR, and the correlation coefficient is calculated as follows:
$|r_{xy}|$ represents the degree of correlation between the variables: $r_{xy} > 0$ indicates a positive correlation and $r_{xy} < 0$ indicates a negative correlation. Usually, if $|r_{xy}|$ is greater than 0.8, the two variables are considered to have a strong linear correlation.
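The correlation formula itself did not survive in this text. As a minimal sketch, assuming the coefficient $r_{xy}$ is the ordinary Pearson correlation coefficient (consistent with the sign and 0.8 threshold interpretation above), the screening could look like the following. The sample values and the idea of summarizing the HRIR by a single scalar feature per subject are hypothetical illustrations, not taken from the patent.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: one anthropometric parameter (cm) and one scalar
# HRIR-derived feature per subject.
head_width = [14.1, 14.9, 15.3, 13.8, 16.0]
hrir_feature = [0.61, 0.66, 0.69, 0.60, 0.73]
r = pearson_r(head_width, hrir_feature)
strongly_correlated = abs(r) > 0.8  # threshold quoted in the text
```

A parameter would be kept for the next stage when `strongly_correlated` is true for it.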
The measurement parameters that correlate strongly with the HRIR are retained. After this analysis, 18 parameters remain, namely $\chi_1$, $\chi_3$, $\chi_6$, $\chi_8$ to $\chi_{10}$, $\chi_{12}$, $\chi_{14}$ to $\chi_{17}$, $d_1$, and $d_3$ to $d_8$. The correlation among these 18 parameters is then analyzed. The analysis shows that some parameters are strongly correlated with each other; for example, the correlation coefficient between $\chi_3$ and $\chi_6$ is 0.79, which means that one of the two can be removed. In this way, 8 parameters are finally retained: $\chi_1$, $\chi_3$, $\chi_{12}$, $d_1$, $d_3$, $d_4$, $d_5$, $d_6$.
In step 2, the parameters are selected further. Five individuals with measured parameters are selected from the database and labeled $a_1$ to $a_5$; the database is then searched for the individuals whose HRIRs are closest to those of these 5 individuals, labeled $b_1$ to $b_5$. The deviations of the 8 parameters between $a_i$ and $b_i$ are calculated and sorted in ascending order, and the parameters with small deviation are retained. After step 2, 7 parameters are finally selected, namely $\chi_1$, $\chi_3$, $\chi_{12}$, $d_1$, $d_3$, $d_5$, $d_6$.
Personalization process
The 7 parameters $\chi_1$, $\chi_3$, $\chi_{12}$, $d_1$, $d_3$, $d_5$, $d_6$ of the individual under test are measured. The sum of all parameter deviations, $Er_{total}$, is then calculated against each individual in the database, and the individual with the smallest sum of deviations is found.
Here N = 7 is the number of parameters, $w_i$ is the weight of the ith parameter, $Er_i$ is the deviation of the ith parameter obtained by equation (4), $\sigma^2$ is the variance of the measured parameter, and $p_i$ and $p_j$ are the measured parameters of different individuals.
Since the pinna has the greatest effect on HRIR, the weight of the pinna-related parameter should be greater than the weight of the other parameters. In the invention, the weight value of the auricle parameter is 0.175, and the weight value of the skull parameter is 0.1.
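The matching step can be sketched as follows. The deviation formulas are not legible in this text, so the sketch assumes that each deviation is the squared parameter difference normalized by that parameter's variance, and that $Er_{total}$ is the weighted sum of the deviations; both are assumptions. The parameter names follow the selected set, the weights are the 0.1 and 0.175 values given above, and the toy database is hypothetical.

```python
def total_deviation(subject, candidate, variances, weights):
    """Weighted sum of per-parameter deviations between two individuals.

    Assumption: each deviation is the squared difference normalized by the
    variance of that parameter across the database.
    """
    total = 0.0
    for name, w in weights.items():
        diff = subject[name] - candidate[name]
        total += w * diff * diff / variances[name]
    return total

def best_match(subject, database, variances, weights):
    """Return the id of the database individual with the smallest total deviation."""
    return min(
        database,
        key=lambda sid: total_deviation(subject, database[sid], variances, weights),
    )

# Weights from the text: 0.1 for the chi (head/torso) parameters,
# 0.175 for the d (pinna) parameters.
WEIGHTS = {"x1": 0.1, "x3": 0.1, "x12": 0.1,
           "d1": 0.175, "d3": 0.175, "d5": 0.175, "d6": 0.175}

# Hypothetical toy data, only to show the call pattern.
variances = {name: 1.0 for name in WEIGHTS}
database = {
    "subject_a": {name: 1.0 for name in WEIGHTS},
    "subject_b": {name: 2.0 for name in WEIGHTS},
}
measured = {name: 1.8 for name in WEIGHTS}
closest = best_match(measured, database, variances, WEIGHTS)
```

With these weights the three head/torso weights and four pinna weights sum to 1, so $Er_{total}$ behaves like a weighted average of the normalized deviations.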
HRIR interpolation algorithm
The measurement of HRIR is time-consuming and labor-consuming, and only data of limited spatial positions can be measured, so that the spatial resolution of the HRIR obtained by the measurement is not high.
In practical application, an interpolation algorithm can be used to improve the HRIR spatial resolution. The invention provides an HRIR interpolation method based on Laplacian Eigenmaps (LEM).
Laplacian eigenmaps [12] is a common nonlinear dimensionality-reduction method. For a high-dimensional matrix X of size A × N, Laplacian eigenmaps yields a low-dimensional matrix of size a × N, with a < A. The algorithm keeps points that are correlated in the original matrix as close together as possible in the reduced space, so that a better interpolation effect can be obtained.
The HRIR interpolation algorithm comprises the following specific steps:
building a data relationship graph
Firstly, a relational graph G (V, E) is constructed for the HRIR data matrix X, wherein V represents a set of all nodes, each node represents HRIR data, E represents a set formed by all edges, the weight of each edge represents the correlation among the data, and the larger the correlation is, the larger the weight of each edge is. In view of the prior knowledge of HRIR, the present invention constructs a relationship graph with an Inter Subject Graph (ISG) algorithm based on the following criteria:
Correlation between HRIRs at the same orientation for different listeners (criterion 1): if $x_i$ and $x_j$ are HRIRs of the same ear (e.g., both right-ear HRIRs) and the same orientation but of different listeners, the two points are connected.
Correlation between HRIRs of the left and right ears due to ear symmetry (criterion 2): let $x_i$ and $x_j$ be HRIRs of different ears of the same listener; if $x_i$ is one of the P nearest neighbors of $x_j$, they are connected.
Similarity between HRIRs with similar spatial orientation (criterion 3): let $x_i$ and $x_j$ be HRIRs of different orientations of the same ear of the same listener; if $x_j$ is one of the 8 HRIRs surrounding $x_i$, they are connected.
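The connection rules can be illustrated with a small sketch that builds only the edges of criteria 1 and 3 on a regular grid of direction indices. Criterion 2 is left out because it needs distances between actual HRIR vectors. The node layout `(listener, ear, azimuth index, elevation index)` and the function name are assumptions made for illustration, and wrap-around of the elevation grid is ignored.

```python
from itertools import combinations

def isg_edges(listeners, ears, n_az, n_el):
    """Edges from criteria 1 and 3 on a regular azimuth/elevation index grid."""
    nodes = [(s, e, a, h) for s in listeners for e in ears
             for a in range(n_az) for h in range(n_el)]
    edges = set()
    for u, v in combinations(nodes, 2):
        (s1, e1, a1, h1), (s2, e2, a2, h2) = u, v
        same_ear = e1 == e2
        same_direction = (a1, h1) == (a2, h2)
        # Criterion 1: same ear, same direction, different listeners.
        if same_ear and same_direction and s1 != s2:
            edges.add((u, v))
        # Criterion 3: same listener and ear, one of the 8 surrounding grid points.
        elif s1 == s2 and same_ear and max(abs(a1 - a2), abs(h1 - h2)) == 1:
            edges.add((u, v))
    return nodes, edges

# Toy grid: 2 listeners, 2 ears, 3 x 3 directions.
nodes, edges = isg_edges(["a", "b"], ["L", "R"], 3, 3)
```

The pairwise loop is quadratic in the number of nodes, which is fine for a toy grid; a database-sized graph would need a direct neighbor lookup instead.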
Determining weights
Constructing the weight matrix W from the relationship graph, the magnitude of the weights between the connected HRIR data points can be determined by:
feature mapping
After the weight matrix W is obtained, a diagonal matrix D is defined whose diagonal element (i, i) is equal to the sum of all the elements in the ith column of the weight matrix W (for a symmetric W this equals the ith row sum), i.e.
$D_{ii} = \sum_j W_{ji}$ (6)
Defining a matrix L:
$L = D - W$ (7)
The eigenvalues and eigenvectors of the matrix L are calculated from the generalized eigenvalue problem:
$L y = \lambda D y$ (8)
The eigenvectors corresponding to the a smallest non-zero eigenvalues are the result after dimension reduction, where a is the dimension after reduction.
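A minimal sketch of equations (6) and (7) on a hypothetical 3-node weight matrix is given below. It also shows why only non-zero eigenvalues are used: every row of L sums to zero, so the constant vector always solves equation (8) with eigenvalue 0 and carries no information. Solving the generalized eigenproblem itself needs a numerical library and is not shown.

```python
def degree_and_laplacian(W):
    """Return D (diagonal degree matrix) and L = D - W for a symmetric weight matrix W."""
    n = len(W)
    # D_ii is the sum of column i of W, as in equation (6);
    # W is symmetric, so the row sums are the same.
    degrees = [sum(W[j][i] for j in range(n)) for i in range(n)]
    D = [[degrees[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    L = [[D[i][j] - W[i][j] for j in range(n)] for i in range(n)]
    return D, L

# Hypothetical symmetric weights between three HRIR nodes.
W = [[0.0, 0.8, 0.2],
     [0.8, 0.0, 0.5],
     [0.2, 0.5, 0.0]]
D, L = degree_and_laplacian(W)
# Every row of L sums to zero, so the constant vector satisfies L y = 0 * D y.
row_sums = [sum(row) for row in L]
```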
Interpolation
After the data in the low-dimensional space are obtained, the HRIR is interpolated by formula (9) to obtain the HRIR of a new direction:
where $x_i$ denotes the HRIRs of the adjacent orientations, the left-hand side denotes the interpolated HRIR, and the weights $w_i$ satisfy the following relation:
Following [13], the weight coefficients $w_i$ are obtained by solving equation (11):
In the formula, $y_i$ is the value in the low-dimensional space of the HRIR at the azimuth to be interpolated, and $y_j$ denotes the points adjacent to $y_i$. For an HRIR at an arbitrary unknown orientation, its low-dimensional form $y_i$ can be obtained by interpolating the low-dimensional forms of the HRIRs of the same individual at other orientations.
The weight coefficients are obtained from equation (11) and then substituted into equation (9), which gives the interpolated high-dimensional HRIR.
Artificial reverberation
The functional block diagram of the artificial reverberator of the present invention is shown in FIG. 3. The algorithm contains two independent paths, namely an early reflection path and a late reverberation path.
The early reflections produced by the early reflection path are convolved, together with the input audio, with the personalized HRIR to recover the azimuth information of the audio. The reverberant audio produced by the late reverberation path is then added to this directional audio with early reflections to obtain three-dimensional audio with a reverberation effect.
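The two-path signal flow can be sketched for one ear as follows. This is only an illustration of the order of operations described above: the helper names are hypothetical, a single HRIR per ear is assumed for both the direct sound and the early reflections, and the early and late reverberator outputs are taken as given inputs.

```python
def convolve(signal, kernel):
    """Direct-form linear convolution of two lists of floats."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

def mix(a, b):
    """Sample-wise sum of two signals, zero-padding the shorter one."""
    n = max(len(a), len(b))
    a = a + [0.0] * (n - len(a))
    b = b + [0.0] * (n - len(b))
    return [x + y for x, y in zip(a, b)]

def render_ear(dry, early_reflections, late_reverb, hrir):
    """One ear: (dry + early reflections) convolved with the HRIR, plus late reverberation."""
    directional = convolve(mix(dry, early_reflections), hrir)
    return mix(directional, late_reverb)

# Tiny hypothetical signals, just to show the call pattern.
left = render_ear([1.0, 0.0], [0.0, 0.5], [0.0, 0.0, 0.1], [1.0, 0.5])
```

The same call with the right-ear HRIR gives the second headphone channel.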
Early stage reflection reverberator
The early reflection reverberator models the primary and secondary reflections of sound in a room using the image model [14].
In order to simplify the model and reduce the calculation amount, only the reflection of the surrounding four walls is considered, and the reflection of the floor and the ceiling is ignored.
FIG. 4 shows the positions of the sound source, the listener and the reflection points in the room of the early reflection model. Point S represents the sound source, point L represents the listener, and points 1, 2, 3 and 4 on the four walls are the primary reflection points of the sound on the walls, obtained from the image model. The solid line indicates the propagation path of the sound wave between reflection points, the single-dot chain line indicates the path from the sound source to a reflection point, and the double-dot chain line indicates the propagation path from a reflection point to the listener. These 4 reflection points correspond to the 4 channels of the early reflection reverberator.
FIG. 5 shows a model of the early reflection reverberator that implements the reflection paths depicted in FIG. 4. In FIG. 5 (a), the primary reflection delay of the ith wall is $p_i + n_i$ and its amplitude attenuation factor is $f_i o_i$; for example, fig. 5 (b) depicts a primary reflection through the wall surface numbered 2. These variables depend on the propagation time t of the sound wave in the air, which in turn depends on the propagation distance (i.e. the distance between the image sound source and the listener). Let the propagation distance between the image sound source of the ith wall and the listener be $\delta_i$; the corresponding reflection delay and amplitude attenuation then satisfy:
$p_i + n_i = \delta_i / c$ (12)
$f_i o_i = 1 / \delta_i$ (13)
where c represents the propagation velocity of sound in air (340 m/s).
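Equations (12) and (13) can be evaluated directly once the image sources are known. The sketch below mirrors the source across the four walls of a rectangular room in two dimensions (floor and ceiling ignored, as in the text) and returns the total delay and total gain for each primary reflection. The coordinate convention, wall ordering and function names are assumptions for illustration; splitting each total into the individual parameters $p_i$, $n_i$, $f_i$ and $o_i$ is not shown.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s, value given in the text

def first_order_images(source, room):
    """Mirror the source across the walls x=0, x=width, y=0, y=depth."""
    sx, sy = source
    width, depth = room
    return [(-sx, sy), (2 * width - sx, sy), (sx, -sy), (sx, 2 * depth - sy)]

def primary_reflections(source, listener, room):
    """Return (delay in seconds, amplitude gain) per wall, per equations (12) and (13)."""
    result = []
    for image in first_order_images(source, room):
        distance = math.dist(image, listener)  # delta_i
        result.append((distance / SPEED_OF_SOUND, 1.0 / distance))
    return result

# Hypothetical 5 m x 4 m room, source at (1, 2), listener at (3, 2).
reflections = primary_reflections((1.0, 2.0), (3.0, 2.0), (5.0, 4.0))
```

Multiplying each delay by the sampling rate gives the delay in samples for the corresponding reverberator channel.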
Similarly, for a sound wave that undergoes secondary reflection between wall surface i and wall surface j, the corresponding delay is $p_i + k_{ji} + n_j$ and the amplitude attenuation is $f_i e_{ji} o_j$. An example of secondary reflection through wall 3 and wall 4 is given in fig. 5 (b). The secondary reflection delay and amplitude attenuation satisfy:
$p_i + k_{ji} + n_j = \delta_{ij} / c$ (14)
$f_i e_{ji} o_j = 1 / \delta_{ij}$ (15)
where $\delta_{ij}$ denotes the propagation distance between the image source of the secondary reflection and the listener. The filters $g_i(z)$ (i = 1, 2, 3, 4) in the figure are low-pass filters that represent the absorption of sound waves by the wall surfaces.
According to equations (12) and (14), the delay parameters $n_i$, $p_j$ and $k_{ij}$ ($i, j \in \{1, 2, 3, 4\}$) can be obtained by solving the following equation:
where the matrix $M_1$ has size 16 × 20 and describes the relationship between the delays and the paths in the reverberator, the vector $B_1$ corresponds to the distances between the different image sources and the listener, and the vector K contains the delay parameters. K is obtained by solving equation (17):
In this formula, the pseudo-inverse of the matrix $M_1$ is used. $M_1$ is a constant matrix independent of the room size, so its pseudo-inverse is also constant and can be computed in advance to save computation.
By a similar method, the attenuation parameters $o_i$, $f_j$ and $e_{ij}$ ($i, j \in \{1, 2, 3, 4\}$) can be determined. Taking the logarithm of equations (13) and (15) converts the multiplications into additions:
$\log(f_i o_i) = \log(1/\delta_i)$ (18)
$\log(f_i) + \log(o_i) = -\log(\delta_i)$ (19)
$\log(f_i e_{ji} o_j) = \log(1/\delta_{ij})$ (20)
$\log(f_i) + \log(e_{ji}) + \log(o_j) = -\log(\delta_{ij})$ (21)
Then, in the same way as for the delay parameters, the attenuation parameters are obtained by solving the following equation (22):
where the variable v is introduced to make the equation solvable and to increase the stability of the system; in the invention, v is set to 1/29.
Late reverberation
A Feedback Delay Network (FDN) is used to implement the late reverberator, whose structure is shown in fig. 6. The more feedback channels the FDN has, the better the reverberator performs, but the more computation it requires. To balance performance and computation, 8 feedback channels are selected. The feedback matrix A affects the behavior of the feedback network; the feedback matrix A selected by the invention is a Hadamard matrix of order 8. The Hadamard matrix of order 4 is as follows:
The Hadamard matrix A of order 8 can be obtained by:
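The matrices themselves are not legible in this text. As a sketch under the assumption that the standard Sylvester construction is intended (each step doubles the order by tiling the previous matrix as [[H, H], [H, -H]]), the order-4 and order-8 matrices can be generated as below. The 1/sqrt(order) scaling, which makes the feedback matrix orthogonal, is likewise an assumption and not a value taken from the patent.

```python
def sylvester_hadamard(order):
    """Unnormalized Hadamard matrix of size order x order (order must be a power of two)."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    H = [[1]]
    while len(H) < order:
        # Doubling step: [[H, H], [H, -H]]
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def feedback_matrix(order):
    """Scale by 1/sqrt(order) so that the feedback matrix is orthogonal."""
    scale = order ** -0.5
    return [[v * scale for v in row] for row in sylvester_hadamard(order)]

H4 = sylvester_hadamard(4)
A = feedback_matrix(8)
```

An orthogonal feedback matrix preserves signal energy in the loop, so the decay is then set by the gains and absorption filters rather than by the matrix.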
By selecting suitable gain parameters $b_n$ and $c_n$ (n = 1, 2, …, 8) and reasonable delay parameters $m_n$ (n = 1, 2, …, 8), a satisfactory reverberation effect can be produced. The delay parameters $m_n$ must be mutually prime. According to the work of Schroeder et al. [15], the sum M of the delay parameters must satisfy:
$M \ge 0.15\, t_{60} f_s$ (25)
where $t_{60}$ is the reverberation time and $f_s$ is the sampling rate. For example, when the sampling rate is 50 kHz and the reverberation time $t_{60}$ is 1 s, the delay sum M must be at least 7500. Choosing reasonable delay times produces a better reverberation effect.
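The two selection rules for the delays (mutually prime lengths and the lower bound of equation (25)) are easy to check programmatically. In the sketch below the delays are expressed in samples, and the example delay set is a hypothetical choice of eight distinct primes, not the values used in the invention.

```python
from itertools import combinations
from math import gcd

def min_total_delay(t60_seconds, sample_rate_hz):
    """Lower bound on the sum of delay lengths (in samples) from equation (25)."""
    return 0.15 * t60_seconds * sample_rate_hz

def delays_are_valid(delays, t60_seconds, sample_rate_hz):
    """Check pairwise coprimality and the total-delay bound."""
    coprime = all(gcd(a, b) == 1 for a, b in combinations(delays, 2))
    return coprime and sum(delays) >= min_total_delay(t60_seconds, sample_rate_hz)

# Hypothetical delay lengths in samples: eight distinct primes, hence pairwise coprime.
example_delays = [601, 691, 773, 839, 919, 997, 1061, 1129]
ok_at_44k1 = delays_are_valid(example_delays, 1.0, 44100)
ok_at_50k = delays_are_valid(example_delays, 1.0, 50000)
```

The example set sums to 7010 samples, so it meets the bound for a 1 s reverberation time at 44.1 kHz but falls short of the 7500 samples required at 50 kHz; longer delays would be needed there.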
The reverberation time $t_{60}$ can be obtained from the Sabine formula:
$t_{60} = 0.161 V / A = 0.161 V / (aS)$ (26)
where A is the sound absorption, a is the sound absorption coefficient, S is the sound-absorbing area, and V is the volume of the reverberant room.
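As a worked example of equation (26), the sketch below computes the total absorption as the sum of area times absorption coefficient over the room surfaces, which reduces to A = aS when a single coefficient is used. The room dimensions and the coefficient 0.2 are hypothetical.

```python
def sabine_t60(volume_m3, surfaces):
    """Reverberation time from equation (26).

    surfaces is a list of (area_m2, absorption_coefficient) pairs.
    """
    absorption = sum(area * coeff for area, coeff in surfaces)  # A = sum of a * S
    if absorption <= 0:
        raise ValueError("total absorption must be positive")
    return 0.161 * volume_m3 / absorption

# Hypothetical 5 m x 4 m x 3 m room with one absorption coefficient for all surfaces.
width, depth, height = 5.0, 4.0, 3.0
total_area = 2 * (width * depth + width * height + depth * height)
t60 = sabine_t60(width * depth * height, [(total_area, 0.2)])
```

For this room V = 60 m³ and S = 94 m², giving a reverberation time of roughly half a second, which can then be fed into the bound of equation (25).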
The absorption filter $h_n(z)$ is a low-pass filter used to simulate the absorption of sound waves by the wall surfaces.
Database selection
The database used in the experiments is the CIPIC database, a head-related impulse response database measured by the CIPIC Interface Laboratory of the University of California, Davis [16]. The invention uses the HRIR data and anthropometric data provided by this database for its experiments. The CIPIC database contains the head-related impulse responses of 45 listeners at 25 horizontal angles and 50 elevation angles (1250 spatial orientations in total), and 27 anthropometric parameters for 37 listeners.
Fig. 7 depicts the placement of the loudspeakers and the position of the listener when the CIPIC database measures HRIRs. The measured horizontal angles θ are -80°, -65°, -55°, then -45° to 45° in increments of 5°, then 55°, 65° and 80°, for a total of 25 horizontal angles. The measured elevation angles range from -45° to +230.625° in increments of 5.625°, for a total of 50 elevation angles.
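The measurement grid described above can be enumerated in a few lines, which also confirms the counts of 25 horizontal angles, 50 elevation angles and 1250 directions. The function names are hypothetical.

```python
def cipic_azimuths():
    """25 horizontal angles in degrees, as listed in the text."""
    return [-80, -65, -55] + list(range(-45, 50, 5)) + [55, 65, 80]

def cipic_elevations():
    """50 elevation angles in degrees, from -45 in steps of 5.625."""
    return [-45 + 5.625 * k for k in range(50)]

# Every (horizontal angle, elevation angle) pair measured per listener and ear.
directions = [(az, el) for az in cipic_azimuths() for el in cipic_elevations()]
```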
HRTF personalized positioning experiment result
In order to verify the effectiveness of the personalization method of the present invention, a series of subjective positioning experiments were performed. 6 individuals with normal hearing were enrolled in the localization experiment (3 men and 3 women), all aged between 20-30 years and all had relevant experience.
Three types of HRIR are used in the experiments: the personalized HRIR obtained by the method of the invention, the HRIR obtained by the method proposed by Liu et al., and the generic KEMAR HRIR. The three are compared to assess the effectiveness of the method of the invention.
To simplify the experiment, 18 azimuths were tested: 6 horizontal angles (θ = -45, -22.5, 0, 22.5, 45, 67.5), 6 elevation angles (θ =0,), and another 6 elevation angles used to test the front-back confusion rate (theta =0,). The input sound source is noise 10 s long; it is convolved with each of the three HRIRs to obtain synthesized sounds for all directions, which are played to the tester through AKG-k374 headphones. The tester judges the direction of each sound and records it. The sound for each direction is played 6 times, and the playing order of the different directions is random.
After the experiment was completed, the subjective test data were analyzed; the results are shown in fig. 8, 9 and 10. Fig. 8 shows the results of the listener localization experiment for 6 azimuths in the median plane: the generic KEMAR HRIR has the worst localization, while the personalized HRIR of the invention has the smallest localization error. Fig. 9 shows the results of the horizontal-angle localization experiment: the personalized HRIR of the invention has better horizontal localization accuracy than the other two HRIRs. Fig. 10 shows the front-back confusion results: the front-back confusion rate of three-dimensional audio generated with the HRIR of the invention is lower than that of the other two methods.
The subjective localization experiments show that the HRIR of the personalization method of the invention localizes better than that of the personalization method of Liu et al., and that front-back confusion is further reduced. In addition, unlike the method of Liu et al., the parameters selected by the method of the invention are all distance parameters and include no angle parameters, which are difficult to obtain.
HRIR interpolation algorithm performance analysis
In the experiment, the HRIR data in the CIPIC database are divided into two sets, a training set and a test set. The spatial resolution of the training set is lower than that of the original data set: the horizontal-angle interval is 20° (i.e. 9 horizontal angles), the elevation-angle interval is 22.5° (i.e. 14 elevation angles), and the number of individuals is 45, so the size of the training set is 2 × 9 × 14 × 45 = 11340. This spatial resolution would include HRIRs at horizontal angles of 60° and -60°; since the CIPIC database does not measure these two angles, they are replaced with the HRIRs at 65° and -65°. The HRIRs of the remaining, unselected azimuths in the database form the test set. To show the interpolation effect more clearly, the HRIRs are Fourier transformed into HRTFs, and the HRTFs are compared.
Fig. 11 gives a comparison of the interpolated reconstructed HRTF with the measured HRTF. As can be seen from FIG. 11, the difference between the HRTF reconstructed by the method of the present invention and the measured HRTF is very small, which shows that the interpolation effect is good.
Analysis of artificial reverberation performance
In order to reduce the "intracranial localization" (in-head localization) phenomenon present in HRIR-based three-dimensional audio systems, the present invention implements an artificial reverberation system.
The early-reflection and late-reverberation impulse response waveforms produced by the artificial reverberator are shown in Fig. 12. To evaluate the performance of the artificial reverberator, 6 listeners with normal hearing (3 men and 3 women) took part in a subjective evaluation with a maximum score of 5 points; the scores are given in Table 2. In the subjective test, all participants judged that after artificial reverberation processing the three-dimensional audio sounded more natural and comfortable and had a clear sense of direction.
Table 2. Subjective test scoring results
With artificial reverberation, listeners perceive the three-dimensional audio as coming from outside the head, indicating that the artificial reverberation system effectively addresses the "intracranial localization" phenomenon.
The above describes only preferred embodiments of the present invention, but the scope of protection of the present invention is not limited thereto; any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the present invention and in accordance with its technical solutions and inventive concept, shall fall within the scope of protection of the present invention.

Claims (5)

1. A three-dimensional audio generation method based on head-related impulse response, characterized by comprising the following steps:
obtaining measured human physiological characteristic parameters and carrying out correlation analysis between these parameters and the head-related impulse response (HRIR); retaining the human physiological characteristic parameters whose correlation coefficient exceeds a threshold; obtaining the personalized HRIR of the person currently being measured by searching a database for the individual with the minimum sum of deviations, and then interpolating the HRIR using the Laplacian eigenmaps (LEM) dimension-reduction algorithm;
using an early-reflection reverberator based on the image-source model (Image Model) to convolve the received input audio, including the early reflection paths of the first-order and second-order reflections in the room, with said HRIR so as to recover the direction information of the audio;
obtaining reverberant audio using a late reverberator based on a feedback delay network (FDN);
and adding the audio carrying the direction information to the reverberant audio to obtain three-dimensional audio with a reverberation effect, thereby completing the three-dimensional audio generation method.
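The convolution and summation steps of claim 1 can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the names `convolve` and `render_ear`, the toy two-tap HRIR, and the toy reverberation tail are hypothetical, and a plain direct-form convolution stands in for whatever filtering structure the actual device uses.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += s * h
    return out


def render_ear(audio, hrir, reverb):
    """Convolve audio with one ear's HRIR, then add the reverberant signal.

    `reverb` stands for the late-reverberator output (a hypothetical input
    here); the shorter sequence is zero-padded before the sum.
    """
    direct = convolve(audio, hrir)
    length = max(len(direct), len(reverb))
    direct = direct + [0.0] * (length - len(direct))
    padded = list(reverb) + [0.0] * (length - len(reverb))
    return [d + r for d, r in zip(direct, padded)]


# Toy data (assumed): a 3-sample input, a 2-tap "HRIR", a short reverb tail.
audio = [1.0, 2.0, 3.0]
hrir_left = [1.0, 0.5]
reverb_tail = [0.0, 0.0, 0.0, 0.0, 0.1]
left = render_ear(audio, hrir_left, reverb_tail)
```

Running the same function with the right-ear HRIR would give the second channel of the binaural output.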
2. The method of claim 1, wherein the correlation coefficient is obtained by the following steps:
the correlation between each human physiological characteristic parameter and the HRIR is computed separately, using the formula
$$r_{xy} = \frac{\sum_{i=1}^{M} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{M} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{M} (Y_i - \bar{Y})^2}}$$
where $X_i$ is the measured value of the human physiological characteristic parameter of the i-th individual, $i = 1, 2, \ldots, M$, $M$ is the number of people measured, and $\bar{X}$ is the theoretical average value of that parameter; $Y_i$ is the measured value of the i-th HRIR, and $\bar{Y}$ is the theoretical average value of the HRIR; $|r_{xy}|$ represents the degree of correlation between the variables: $r_{xy} > 0$ indicates positive correlation and $r_{xy} < 0$ indicates negative correlation; when $|r_{xy}| > 0.8$, the two variables are considered to have strong linear correlation and the parameter is retained as a correlation parameter.
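The screening step of claim 2 can be sketched in Python as follows. This is a sketch under assumptions: it uses the standard Pearson correlation coefficient, which matches the symbols defined in the claim; the parameter names, the scalar HRIR feature, and all numbers are hypothetical and are not data from the patent.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def select_parameters(parameters, hrir_feature, threshold=0.8):
    """Keep the names of parameters whose |r| with the feature exceeds the threshold."""
    return [name for name, values in parameters.items()
            if abs(pearson_r(values, hrir_feature)) > threshold]


# Hypothetical measurements for M = 5 individuals.
parameters = {
    "head_width": [14.0, 15.0, 16.0, 17.0, 18.0],
    "pinna_height": [6.0, 6.4, 5.9, 6.1, 6.3],
}
hrir_feature = [0.50, 0.55, 0.61, 0.64, 0.71]
selected = select_parameters(parameters, hrir_feature)
```

With these made-up numbers only `head_width` passes the 0.8 threshold, so it alone would be carried forward as a correlation parameter.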
3. The method for generating three-dimensional audio based on head-related impulse response according to claim 1, further characterized by a parameter personalization step after the correlation parameters are obtained:
calculating the sum of all parameter deviations based on the obtained correlation parameters and finding the individual with the smallest sum of deviations in the database, the sum of parameter deviations being calculated as follows
wherein
wherein $N = 7$ denotes the number of parameters, $w_i$ denotes the weight of the i-th parameter, $er_i$ denotes the deviation of the i-th parameter obtained by equation (4), $\sigma^2$ denotes the variance of the measured parameter, and $p_i$ and $p_j$ denote the measured parameters of different individuals;
and then interpolating the HRIR using the Laplacian eigenmaps (LEM) dimension-reduction algorithm, with the following specific steps:
constructing a relational graph $G(V, E)$ for the HRIR data matrix $X$, where $V$ denotes the set of all nodes, each node representing one HRIR, and $E$ denotes the set of all edges; the weight of each edge represents the correlation between the data it connects, and the larger the correlation, the larger the weight;
constructing the weight matrix $W$ from the relational graph, the weight between connected HRIR data points being determined by:
after the weight matrix $W$ is obtained, defining a diagonal matrix $D$ whose diagonal element $(i, i)$ equals the sum of all elements in the i-th row of the weight matrix $W$, i.e.
$D_{ii} = \sum_j W_{ji}$
defining a matrix $L$:
$L = D - W$
calculating the eigenvalues $\lambda$ and eigenvectors $v$ of the matrix $L$:
$Lv = \lambda Dv$
the eigenvectors corresponding to the $a$ smallest nonzero eigenvalues are the result of the dimension reduction, where $a$ is the reduced dimension, giving the low-dimensional data;
interpolating the HRIR by the following formula to obtain the HRIR of the new direction,
wherein $x_i$ denotes the HRIRs of the adjacent directions from which the interpolated HRIR is formed, and the weights $w_i$ satisfy the following relation:
obtaining the weight coefficients $w_i$ by solving the following formula
where $y_i$ denotes the value in the low-dimensional space of the HRIR at the direction to be interpolated, and $y_j$ denotes the points adjacent to $y_i$; for an HRIR at any unknown direction, its low-dimensional form $y_i$ is obtained by low-dimensional interpolation of the HRIRs at other directions of the same individual.
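The matrix definitions in claim 3 ($D_{ii} = \sum_j W_{ji}$, $L = D - W$, $Lv = \lambda Dv$) can be checked on a toy graph with a small Python sketch. The sketch makes several assumptions: the weight matrix is supplied directly rather than computed from HRIR data, the three-node chain graph is hypothetical, and the eigenpairs are verified by substitution instead of being computed by a numerical eigen-solver.

```python
def graph_laplacian(W):
    """Return (D, L) for a symmetric weight matrix W given as a list of rows.

    D is diagonal with D[i][i] = sum_j W[j][i], and L = D - W.
    """
    n = len(W)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        D[i][i] = float(sum(W[j][i] for j in range(n)))
    L = [[D[i][j] - W[i][j] for j in range(n)] for i in range(n)]
    return D, L


def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def eigen_residual(L, D, v, lam):
    """Largest absolute entry of L v - lam * D v (zero for an exact eigenpair)."""
    Lv, Dv = matvec(L, v), matvec(D, v)
    return max(abs(a - lam * b) for a, b in zip(Lv, Dv))


# Toy graph (assumed): three HRIR nodes in a chain with unit edge weights.
W = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
D, L = graph_laplacian(W)

# Generalized eigenpairs of L v = lambda D v for this chain graph.
pairs = [(0.0, [1, 1, 1]), (1.0, [1, 0, -1]), (2.0, [1, -1, 1])]
residuals = [eigen_residual(L, D, v, lam) for lam, v in pairs]
```

For this chain the smallest nonzero eigenvalue is 1, so with $a = 1$ the embedding is the vector (1, 0, -1), which places the three nodes on a line in their original order.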
4. The method of claim 1, further characterized in that the early-reflection reverberator model is specifically as follows:
let the time delay of the first-order reflection off the i-th wall be $p_i + n_i$ and its amplitude attenuation factor be $f_i o_i$; for example, Fig. 5(b) depicts a first-order reflection off the wall numbered 2. These variables depend on the propagation time $t$ of the sound wave in air, which in turn depends on the propagation distance (i.e. the distance between the image source and the listener). Let the propagation distance between the image source corresponding to the i-th wall and the listener be $\delta_i$; then the corresponding reflection delay and amplitude attenuation satisfy:
$p_i + n_i = \delta_i / c$
$f_i o_i = 1 / \delta_i$
where $c$ denotes the propagation speed of sound in air (340 m/s);
similarly, for a second-order reflection of the sound wave between wall i and wall j, the corresponding time delay is $p_i + k_{ji} + n_j$ and the amplitude attenuation is $f_i e_{ji} o_j$, and the second-order reflection delay and amplitude attenuation satisfy:
$p_i + k_{ji} + n_j = \delta_{ij} / c$
$f_i e_{ji} o_j = 1 / \delta_{ij}$
where $\delta_{ij}$ denotes the propagation distance of the sound wave between the second-order image source and the listener. The filters $g_i(z)$ ($i = 1, 2, 3, 4$) in the figure are low-pass filters representing the absorption of sound waves by the walls.
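The delay and attenuation relations of claim 4 can be illustrated with a short Python sketch. It assumes a two-dimensional rectangular room with four walls (matching the four filters $g_i(z)$); the room size, the source and listener positions, and the function names are hypothetical, and wall absorption is ignored.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s, the value of c given in the claim


def mirror_images(source, room):
    """Mirror a 2-D point across the walls x=0, x=width, y=0 and y=depth."""
    sx, sy = source
    width, depth = room
    return [(-sx, sy), (2 * width - sx, sy), (sx, -sy), (sx, 2 * depth - sy)]


def reflection_parameters(image, listener, c=SPEED_OF_SOUND):
    """Return (delay in seconds, amplitude factor) = (delta / c, 1 / delta)."""
    delta = math.hypot(image[0] - listener[0], image[1] - listener[1])
    return delta / c, 1.0 / delta


# Assumed geometry: a 4 m x 3 m room, source at (1, 1), listener at (3, 1).
room = (4.0, 3.0)
source = (1.0, 1.0)
listener = (3.0, 1.0)

first_order = mirror_images(source, room)
delay_1, gain_1 = reflection_parameters(first_order[0], listener)

# A second-order image: the x=0 image mirrored again across the wall y=0.
second_order = mirror_images(first_order[0], room)[2]
delay_2, gain_2 = reflection_parameters(second_order, listener)
```

In a full reverberator each delay would be converted to samples at the sampling rate and each path filtered by the wall low-pass filter before the HRIR convolution.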
5. The method of claim 1, further characterized in that the late reverberation process is as follows:
the feedback matrix $A$ selected by the invention is an 8th-order Hadamard matrix; the 4th-order Hadamard matrix is as follows:
the 8th-order Hadamard matrix $A$ can be obtained by:
selecting gain parameters $b_n$ ($n = 1, 2, \ldots, 8$) and $c_n$ ($n = 1, 2, \ldots, 8$) and delay parameters $m_n$ ($n = 1, 2, \ldots, 8$), where the delay parameters $m_n$ are chosen to be mutually coprime. According to the work of Schroeder et al. [15], the sum $M$ of the delay parameters should satisfy:
$M \ge 0.15\, t_{60} f_s$
where $t_{60}$ denotes the reverberation time and $f_s$ denotes the sampling rate. For example, when the sampling rate is 50 kHz and the reverberation time $t_{60}$ is 1 s, the delay sum must satisfy $M \ge 7500$. The reverberation time $t_{60}$ can be obtained from the Sabine formula:
$t_{60} = 0.161 V / A = 0.161 V / (aS)$
where $A$ is the sound absorption, $a$ is the sound absorption coefficient, $S$ is the sound-absorbing area, and $V$ is the volume of the reverberation chamber; the absorption filters $h_n(z)$ are low-pass filters used to simulate the absorption of sound waves by the walls.
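The parameter choices in claim 5 can be sketched in Python as follows. Several points are assumptions: the Hadamard matrix is built with the Sylvester doubling construction, which may differ in row order or scaling from the matrices shown in the patent; the eight delay lengths are hypothetical distinct primes; and the room values passed to the Sabine formula in the note below are made up.

```python
import math
from itertools import combinations


def hadamard(order):
    """Sylvester construction of a Hadamard matrix; order must be a power of two."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    H = [[1]]
    while len(H) < order:
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H


def min_delay_sum(t60, fs):
    """Lower bound on the total delay length: M >= 0.15 * t60 * fs."""
    return 0.15 * t60 * fs


def sabine_t60(volume, absorption_coefficient, area):
    """Sabine reverberation time t60 = 0.161 V / (a S)."""
    return 0.161 * volume / (absorption_coefficient * area)


def pairwise_coprime(values):
    """True if every pair of values has greatest common divisor 1."""
    return all(math.gcd(a, b) == 1 for a, b in combinations(values, 2))


A = hadamard(8)
delays = [743, 809, 877, 947, 1019, 1087, 1153, 1229]  # hypothetical m_n
bound = min_delay_sum(t60=1.0, fs=50_000)
delays_ok = pairwise_coprime(delays) and sum(delays) >= bound
```

The rows of a Hadamard matrix are mutually orthogonal, so dividing `A` by the square root of 8 gives an orthogonal matrix, a common choice for an FDN feedback matrix. For a room with assumed values V = 100 m³, a = 0.2 and S = 130 m², `sabine_t60(100, 0.2, 130)` gives roughly 0.62 s.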
CN201710551437.8A 2017-07-07 2017-07-07 Three-dimensional audio generation device based on head-related impulse response Active CN107820158B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710551437.8A CN107820158B (en) 2017-07-07 2017-07-07 Three-dimensional audio generation device based on head-related impulse response

Publications (2)

Publication Number Publication Date
CN107820158A true CN107820158A (en) 2018-03-20
CN107820158B CN107820158B (en) 2020-09-29

Family

ID=61601515

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710551437.8A Active CN107820158B (en) 2017-07-07 2017-07-07 Three-dimensional audio generation device based on head-related impulse response

Country Status (1)

Country Link
CN (1) CN107820158B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09261792A (en) * 1996-03-19 1997-10-03 Nippon Telegr & Teleph Corp <Ntt> Sound receiving method and its device
US20030007648A1 (en) * 2001-04-27 2003-01-09 Christopher Currell Virtual audio system and techniques
CN102790931A (en) * 2011-05-20 2012-11-21 中国科学院声学研究所 Distance sense synthetic method in three-dimensional sound field synthesis
CN104408040A (en) * 2014-09-26 2015-03-11 大连理工大学 Head related function three-dimensional data compression method and system
CN105792090A (en) * 2016-04-27 2016-07-20 华为技术有限公司 Method and device of increasing reverberation
CN106162499A (en) * 2016-07-04 2016-11-23 大连理工大学 The personalized method of a kind of related transfer function and system
CN106231528A (en) * 2016-08-04 2016-12-14 武汉大学 Personalized head related transfer function based on stagewise multiple linear regression generates system and method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BELKIN M, NIYOGI P: "Laplacian eigenmaps and spectral techniques for embedding and clustering", 《INTERNATIONAL CONFERENCE ON NEURAL INFORMATION PROCESSING SYSTEMS.MIT PRESS》 *
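汪林,殷福亮,陈喆 (Wang Lin, Yin Fuliang, Chen Zhe): "3D声场合成中近似个性头相关传递函数的主观选择方法" [Subjective selection method for approximately individualized head-related transfer functions in 3D sound field synthesis], 《信号处理》 [Journal of Signal Processing] *
黄青华,李琳,赖士村 (Huang Qinghua, Li Lin, Lai Shicun): "基于RBF神经网络的头相关传输函数的个性化建模方法" [Individualized modeling method for head-related transfer functions based on RBF neural networks], 《上海大学学报(自然科学版)》 [Journal of Shanghai University (Natural Science Edition)] *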

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108805104A (en) * 2018-06-29 2018-11-13 中国航空无线电电子研究所 Personalized HRTF obtains system
CN108805104B (en) * 2018-06-29 2022-03-08 中国航空无线电电子研究所 Personalized HRTF acquisition system
CN111107482A (en) * 2018-10-25 2020-05-05 创新科技有限公司 System and method for modifying room characteristics for spatial audio rendering through headphones
CN111107482B (en) * 2018-10-25 2023-08-29 创新科技有限公司 System and method for modifying room characteristics for spatial audio presentation via headphones
CN110751281A (en) * 2019-10-18 2020-02-04 武汉大学 Head-related transfer function modeling method based on convolution self-encoder
CN110751281B (en) * 2019-10-18 2022-04-15 武汉大学 Head-related transfer function modeling method based on convolution self-encoder
CN111031467A (en) * 2019-12-27 2020-04-17 中航华东光电(上海)有限公司 Method for enhancing front and back directions of hrir
CN111949846A (en) * 2020-08-13 2020-11-17 中航华东光电(上海)有限公司 HRTF personalization method based on principal component analysis and sparse representation
CN112188382A (en) * 2020-09-10 2021-01-05 江汉大学 Sound signal processing method, device, equipment and storage medium
CN112188382B (en) * 2020-09-10 2021-11-09 江汉大学 Sound signal processing method, device, equipment and storage medium
CN117268796A (en) * 2023-11-16 2023-12-22 天津大学 Vehicle fault acoustic event detection method
CN117268796B (en) * 2023-11-16 2024-01-26 天津大学 Vehicle fault acoustic event detection method

Also Published As

Publication number Publication date
CN107820158B (en) 2020-09-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant