CN115412808A - Method and system for improving virtual auditory reproduction based on personalized head-related transfer function - Google Patents
- Publication number
- CN115412808A (application CN202211077500.6A)
- Authority
- CN
- China
- Prior art keywords
- hrtf
- head
- personalized
- virtual
- parameters
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R5/00—Stereophonic arrangements
- H04R5/033—Headphones for stereophonic communication
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/30—Image reproducers
- H04N13/332—Displays for viewing with the aid of special glasses or head-mounted displays [HMD]
- H04N13/344—Displays for viewing with the aid of special glasses or head-mounted displays [HMD] with head-mounted left-right displays
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R29/00—Monitoring arrangements; Testing arrangements
Abstract
The invention discloses a method and a system for improving virtual auditory reproduction based on a personalized head-related transfer function. The method comprises the following steps: establishing a high-precision Head-Related Transfer Function (HRTF) database and extracting high-dimensional HRTF features; collecting and screening human characteristic parameters; and customizing the personalized HRTF with a generalized regression neural network. The system consists of a 3D auditory display module, an audio synthesis module and a test module. The 3D auditory display module visualizes and stores the personalized interaural time difference, interaural intensity difference and HRTF; the audio synthesis module selects audio of different types, frequencies and durations from an audio library, convolves it with the personalized HRTF, and synthesizes sound-source signals in different spatial directions; the test module embeds computer simulation and 3D rendering technology, so that an operator can evaluate the virtual auditory playback effect of the personalized HRTF in a diversified manner through human-computer interaction.
Description
Technical Field
The invention relates to the technical field of virtual auditory display, and in particular to a method and a system that establish a personalized Head-Related Transfer Function (HRTF) model suitable for binaural headphone devices and generate a model matched to the user to improve virtual auditory playback.
Background
Human hearing includes, in addition to the perception of the timbre, loudness, pitch and duration of sound, the perception of its spatial attributes. The auditory system's perception of spatial properties mainly depends on information such as the interaural time difference (ITD), the interaural intensity difference (IID) and spectral factors, all of which can be uniformly expressed by HRTFs.
Humans can construct virtual auditory spaces with sound reproduction techniques, i.e., techniques that let listeners form specific spatial perceptions. With the development of artificial-intelligence audio-visual technology, the construction of immersive virtual auditory space is becoming a key component of realistic virtual reality. Sound reproduction technology can be divided into two categories according to the reproduction device. The first is multi-channel loudspeaker playback, involving acoustic holographic playback, spherical harmonic decomposition, wave field synthesis and so on. In practice, although this approach can accurately construct the sound field in a certain region, it requires a large number of loudspeakers arranged in arrays and places strict demands on their configuration, which limits its development. The second category is two-channel headphone playback, mainly involving binaural pickup and virtual auditory playback. The key to binaural pickup is capturing the binaural signals with microphones, and the technical core of virtual auditory reproduction is the construction of the HRTF. Both aim to reconstruct the binaural sound pressure accurately rather than a physical sound field in a spatial region. Since only a pair of two-channel headphones is needed to generate three-dimensional sound, this convenient and practical approach has quickly become the mainstream way of constructing virtual auditory space.
HRTFs depend heavily on human characteristic parameters related to sound reflection, diffraction and scattering, which are unique to each person. If a common HRTF is applied to all individuals, sound-source perception errors easily arise, for example front-back confusion, up-down confusion, angular deviation and in-head localization. Constructing a personalized HRTF is therefore the key to constructing an immersive virtual auditory space.
At present, the main methods for obtaining personalized HRTFs are acoustic measurement, numerical calculation and customization from human body features, but each has limitations: acoustic measurement suffers from long acquisition times, expensive equipment and demanding acquisition environments; numerical calculation requires specific devices (MRI, CT, etc.) to acquire the head model; and most feature-based customization methods rely on public databases and yield personalized HRTFs of low precision, so none of them transfers well to practice.
Disclosure of Invention
The invention provides a method and a system for improving virtual auditory replay based on a personalized head-related transfer function. The invention quickly customizes a full-space personalized HRTF from human body characteristic parameters at low cost, in five main steps: establishing a high-precision HRTF database, extracting high-dimensional HRTF features, acquiring human body characteristic parameters, screening those parameters, and establishing the personalized HRTF. The system consists of a 3D auditory display module, an audio synthesis module and a test module, and can evaluate the virtual auditory playback effect of the personalized HRTF in a diversified and accurate manner. The invention effectively improves the effect of the virtual auditory reproduction system and has good application prospects in virtual reality and similar applications, as described in detail below:
a method for enhancing virtual auditory reproduction based on a personalized head-related transfer function, the method comprising:
establishing a high-precision HRTF database, and extracting high-dimensional characteristics of the HRTF;
collecting human body characteristic parameters and screening the preferred parameters;
and customizing the personalized HRTF by using the generalized regression neural network to obtain the optimal auditory reproduction result.
Wherein, the step of establishing the high-precision HRTF database comprises the following steps:
(1) Acquiring the head-neck three-dimensional geometric data of 48 subjects with a 3D laser scanner and performing three-dimensional reconstruction;
(2) Importing the head three-dimensional geometric model into Magics software for repair, including hole repair, hole detection and roughening treatment;
(3) The spatial position of the head model is uniformly calibrated by setting a reference coordinate system: the midpoint of the line connecting the two ears is the coordinate origin, the direction of the nose tip is the positive X axis, the direction of the right ear is the positive Y axis, and the top of the head is the positive Z axis;
(4) A boundary element simulation calculation model is established with the multiphysics simulation software COMSOL; during mesh division, the pinna, which has the largest influence on the HRTF, is divided into an independent solving region, and the remaining part forms another solving region.
Further, the human body characteristic parameters are collected and screened as follows:
the electronic measuring tools of Magics software measure 10 head parameters and 20 ear parameters for each subject, and 10 human body characteristic parameters are selected by combining correlation analysis with recursive feature elimination.
Wherein the method comprises the following steps: the training of the generalized regression neural network is specifically:
the input X is a 12-dimensional vector consisting of 10 measured human body parameters (x_1, x_2, x_3, x_4, x_5, d_1, d_4, d_6, d_8, d_12), the azimuth θ and the elevation φ; the output Y is a 10-dimensional vector consisting of the 10 HRTF high-dimensional features (a_1, a_2, ..., a_10).
Further, when the grid is divided:
the pinna follows the strictest standard of 6 mesh elements per wavelength, i.e. the pinna model is uniformly meshed with a 1/6-wavelength element size; the remaining part follows a standard of 4 mesh elements per wavelength, i.e. it is uniformly meshed with a 1/4-wavelength element size.
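These meshing rules can be sketched numerically: the maximum element size follows from the wavelength at the highest simulated frequency. The speed of sound and upper frequency below are assumed values for illustration, not taken from the patent.

```python
# Sketch: maximum boundary-element edge length from the "6 elements per
# wavelength" (pinna) and "4 elements per wavelength" (rest) rules.
# c = 343 m/s and f_max = 20 kHz are assumptions for this example.

def max_element_size(speed_of_sound_m_s, f_max_hz, elements_per_wavelength):
    """Largest allowed mesh edge length in metres for the given rule."""
    wavelength = speed_of_sound_m_s / f_max_hz
    return wavelength / elements_per_wavelength

c = 343.0          # speed of sound in air, m/s (assumed)
f_max = 20000.0    # upper simulation frequency, Hz (assumed)

pinna_size = max_element_size(c, f_max, 6)   # 1/6-wavelength rule
rest_size = max_element_size(c, f_max, 4)    # 1/4-wavelength rule

print(f"pinna mesh <= {pinna_size * 1000:.2f} mm")
print(f"other mesh <= {rest_size * 1000:.2f} mm")
```

The finer pinna rule directly reflects that the pinna geometry shapes the high-frequency HRTF features most strongly.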
Wherein, the 10 human body characteristic parameters are as follows:
head width, head height, head depth, offset below the pinna, offset behind the pinna, pinna height, fossa triangularis height, concha cavity height, pinna width, concha cavity width.
A system for enhancing virtual auditory reproduction based on a personalized head-related transfer function, composed of a 3D auditory display module, an audio synthesis module and a test module, wherein:
the 3D auditory display module visualizes the personalized ITD, ILD and HRTF parameters and stores all subjects' related data;
the audio synthesis module selects audio signals of different types, frequencies and durations from an audio library and convolves them with the personalized HRTF to generate audio signals of different spatial orientations;
the testing module reconstructs a three-dimensional space in the computer system through synchronous mapping, computer simulation and 3D image rendering; the display interface presents a sphere space, and the virtual sound-source positions are mapped to detection points on the sphere in the virtual simulation surface;
the subject selects the perceived virtual spatial position through the test module, and the collected detection data are processed in real time by statistical algorithms into objective detection results, including global accuracy, front-back confusion rate, up-down confusion rate and click position.
Wherein the different audio types are pure-tone excitation, noise excitation, speech excitation and music excitation; the frequencies are low, middle and high; the duration is 1–5 s in 1 s steps.
Wherein the system tests three planes: the transverse, sagittal and coronal planes.
Further, the sphere space is divided into eight regions; the relative coordinates of sound sources at actual spatial positions are all mapped into the virtual sphere space, and all mapped sound-source point positions are displayed simultaneously.
The technical scheme provided by the invention has the beneficial effects that:
1. the personalized HRTF customization method established by the invention achieves low-cost, rapid customization of the full-space personalized HRTF from human body characteristic parameters; compared with a rapid HRTF measurement system, it does not depend on a specific measurement environment and has low equipment cost;
2. compared with directly computing the personalized HRTF numerically, the method has a small computational load; compared with other modeling methods, it can customize the full-space personalized HRTF with small spectral distortion, and when applied to the CIPIC database the spectral distortion remains small, showing good generality;
3. the invention can visualize parameters such as the personalized ITD, ILD and HRTF, synthesize sound sources at any position in the full space, and evaluate the performance of the personalized HRTF comprehensively and in a diversified manner.
Drawings
FIG. 1 is an overall architecture diagram for enhancing virtual auditory reproduction based on a personalized head-related transfer function;
FIG. 2 is a flow chart of an HRTF database creation for enhancing virtual auditory reproduction based on personalized head-related transfer functions;
FIG. 3 is a diagram of 30 parameters of human body characteristics;
FIG. 4 is a schematic diagram comparing the spectral distortion of the proposed method with that of a radial basis function neural network and a standard artificial head at different azimuth angles;
FIG. 5 is a schematic diagram comparing the spectral distortion of the proposed method with that of a radial basis function neural network and a standard artificial head at different frequencies;
FIG. 6 is a block diagram of a 3D auditory display system;
FIG. 7 is a schematic diagram of an audio synthesis and test module.
Detailed Description
To make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below.
To address the problems in the background art and the shortcomings of personalized HRTF design in the field of virtual hearing, the embodiment of the invention provides a more practical personalized HRTF acquisition method built on existing methods, improving the virtual auditory playback effect, while the system can evaluate the virtual auditory playback performance of the personalized HRTF in a diversified manner.
The embodiment of the invention provides a method for improving virtual auditory reproduction based on a personalized HRTF that can acquire HRTF data at arbitrary spatial positions for different subjects, meeting the requirements of high spatial resolution and personalization. The technical scheme comprises the following steps:
step 1: establishing a high-precision HRTF database;
wherein, this step includes:
(1) The head-neck three-dimensional geometric data of 48 subjects are acquired with a 3D laser scanner. No special posture is required during scanning; the subject only needs to adopt a comfortable posture, which reduces noise caused by head movement;
(2) The three-dimensional geometric model is imported into Magics software for model repair, including hole repair, hole detection, roughening treatment and the like;
(3) The spatial position of the head model is uniformly calibrated by setting a reference coordinate system, namely, the middle point of a connecting line of two ears of the head of a person is used as a coordinate origin, the positive direction of the nose tip is used as the positive direction of an X axis, the direction of the right ear is used as the positive direction of a Y axis, and the upper part of the head is used as the positive direction of a Z axis;
(4) The pressure acoustics boundary element module of the multiphysics simulation software COMSOL is used for solving: a boundary element simulation calculation model is established by configuring the sound-field environment, setting parameters and configuring the solver. A regional meshing method is applied during mesh division: the head model is divided into two parts, where the pinna, which has the largest influence on the HRTF, forms an independent solving region meshed to the strictest standard of 6 elements per wavelength (i.e. a uniform 1/6-wavelength element size), while the remaining part forms another solving region meshed to a standard of 4 elements per wavelength (i.e. a uniform 1/4-wavelength element size), so that the HRTF data can be obtained quickly.
The parameters of the high-precision HRTF database constructed in the embodiment of the invention are detailed in Table 1:
Table 1: High-precision HRTF database parameters
Step 2: extracting HRTF high-dimensional features;
The high-dimensional features of the HRTF are then extracted with a singular value decomposition method. First, the collected HRTF data are converted to logarithmic form, as shown in formula (1):
HRTF_log(s, m, f) = 20 · log10( |HRTF(s, m, f)| )    (1)
where s = 1, 2, ..., S (S is the number of subjects), m = 1, 2, ..., M (M is the number of directions), and f = 1, 2, ..., N (N is the number of frequency sampling points). HRTF_log(s, m, f) is then decomposed into a direction-dependent directional transfer function D_log(s, m, f) and a direction-independent average spectral function HRTF_mean(f), as shown in formulas (2) and (3):
D_log(s, m, f) = HRTF_log(s, m, f) − HRTF_mean(f)    (2)

HRTF_mean(f) = (1 / (S · M)) · Σ_s Σ_m HRTF_log(s, m, f)    (3)
in this case, only D is needed log (s, M, f) reducing dimensions, taking data of a subject as an example, each HRTF in M directions has N discrete frequency sampling points, and a matrix shown in formula (4) is constructed:
wherein D is log (m M ,f N ) The HRTF value of the nth discrete frequency sampling point in the mth direction of the subject.
For any real matrix, its singular value decomposition can be expressed as shown in equation (5):
[D_log(s, m, f)] = U Σ V^T    (5)
where U = (u_1, u_2, ..., u_M) ∈ R^(M×M) and V = (v_1, v_2, ..., v_N) ∈ R^(N×N) are the left and right singular matrices, both orthogonal, with u_p ∈ R^(M×1) and v_q ∈ R^(N×1) the left and right singular vectors. Σ = diag(σ_1, σ_2, ..., σ_r) is a diagonal matrix whose elements, the singular values, are arranged in descending order, i.e. σ_1 ≥ σ_2 ≥ ... ≥ σ_r ≥ 0.
The weight coefficients and principal components of principal component analysis are obtained indirectly from the singular value decomposition, and the singular matrices themselves are not suitable for visualization; the important features extracted after dimension reduction are therefore expressed as principal components W_i and weight coefficients a_i. The HRTF log-magnitude spectrum can thus be decomposed as shown in formula (6):

D_log(m, f) ≈ Σ_{i=1..k} a_i(m) · W_i(f)    (6)

Finally, the cumulative variance percentage Var expresses the HRTF reconstruction quality, as shown in formula (7):

Var = ( Σ_{j=1..k} σ_j² / Σ_{j=1..r} σ_j² ) × 100%    (7)

where σ_j is the j-th singular value of the matrix [D_log(s, m, f)] and r is its rank.
Verification shows that with the first 10 principal components the cumulative variance contribution reaches 90%, so the first 10 high-dimensional features {a_1, a_2, ..., a_10} recover most of the HRTF spectral features.
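The extraction pipeline of formulas (1)–(7) can be sketched with numpy's SVD. The HRTF magnitudes below are random stand-ins for measured data, and the matrix sizes are illustrative:

```python
import numpy as np

# Sketch of the feature extraction of formulas (1)-(7) for one subject.
# M directions, N frequency sampling points; random stand-in magnitudes.
rng = np.random.default_rng(0)
M, N = 72, 128
hrtf = np.abs(rng.standard_normal((M, N))) + 1e-3   # stand-in |HRTF(m, f)|

hrtf_log = 20.0 * np.log10(hrtf)       # formula (1): log-magnitude
hrtf_mean = hrtf_log.mean(axis=0)      # formula (3): mean over directions
d_log = hrtf_log - hrtf_mean           # formula (2): directional transfer fn

# formula (5): SVD of the M x N matrix of formula (4)
U, s, Vt = np.linalg.svd(d_log, full_matrices=False)

# formula (7): cumulative variance of the first k singular values
var = np.cumsum(s ** 2) / np.sum(s ** 2)
k = int(np.searchsorted(var, 0.90)) + 1   # components for >= 90 % variance

# formula (6): weight coefficients a_i (per direction) and principal
# components W_i (spectral shapes) of the retained features
weights = U[:, :k] * s[:k]
components = Vt[:k]
print("retained components:", k)
```

With real HRTF data the patent reports that k = 10 already reaches 90% cumulative variance; with random stand-ins k is naturally larger.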
And step 3: collecting human body characteristic parameters;
according to the < < GB/T22187-2009 > measurement standard of human characteristic parameters, 30 human characteristic parameters are measured by an electronic measuring tool SolidWorks software based on a three-dimensional head-neck geometric model, wherein the human characteristic parameters comprise 10 head parameters and 20 ear parameters, and the details are shown in Table 2.
Table 2: 30 human body characteristic parameters
And 4, step 4: human body characteristic parameters are optimized;
the human body characteristic parameters are further optimized by utilizing correlation analysis and recursive characteristic elimination. Firstly, a Spearman correlation coefficient is adopted to screen characteristics with high correlation, as shown in formula (8):
wherein x is i And y i Are two different anthropometric parameters of the same subject,andis the average of all two parameters tested, i =1,2 … …, and M is the number tested. Human body characteristic parameters with the correlation coefficient larger than 0.8 are removed through the step (one of the two parameters is selected).
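A numpy-only sketch of this screening step, with a hand-rolled Spearman coefficient matching formula (8) applied to rank-transformed values; the data are random stand-ins (rows are subjects, columns are parameters), and one near-duplicate column is planted to show the elimination:

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation coefficient (ties assumed absent)."""
    rx = np.argsort(np.argsort(x)).astype(float)   # ranks of x
    ry = np.argsort(np.argsort(y)).astype(float)   # ranks of y
    rx -= rx.mean()
    ry -= ry.mean()
    return float(rx @ ry / np.sqrt((rx @ rx) * (ry @ ry)))

rng = np.random.default_rng(1)
n_subjects, n_params = 48, 6
X = rng.standard_normal((n_subjects, n_params))
X[:, 1] = X[:, 0] + 0.05 * rng.standard_normal(n_subjects)  # near-duplicate

kept = []
for j in range(n_params):
    # keep parameter j only if |rho| <= 0.8 against every kept parameter
    if all(abs(spearman(X[:, j], X[:, k])) <= 0.8 for k in kept):
        kept.append(j)

print("kept parameter indices:", kept)
```

The planted near-duplicate of column 0 is dropped while the independent columns survive, mirroring the "keep one of each correlated pair" rule.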
Secondly, recursive feature elimination is used to retain the human body feature parameters with the largest influence on the HRTF. The specific steps are:
(1) Setting an initial weight coefficient for the rest human body characteristic parameters;
(2) Constructing a Logistic Regression equation to train the characteristic parameters;
(3) Extracting the weight values of the characteristic parameters and removing the parameter whose weight has the smallest absolute value;
(4) Iterating the previous steps until the number of remaining characteristic parameters reaches the set quantity.
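Steps (1)–(4) can be sketched as follows. A least-squares linear model stands in for the patent's logistic regression so the example stays numpy-only, and the data are synthetic stand-ins with 10 genuinely informative features:

```python
import numpy as np

# Recursive feature elimination sketch: fit, drop the feature with the
# smallest |weight|, repeat until n_keep features remain.
rng = np.random.default_rng(2)
n_subjects, n_features, n_keep = 48, 20, 10
X = rng.standard_normal((n_subjects, n_features))
true_w = np.zeros(n_features)
true_w[:n_keep] = rng.uniform(1.0, 3.0, n_keep)   # informative features
y = X @ true_w + 0.1 * rng.standard_normal(n_subjects)

remaining = list(range(n_features))
while len(remaining) > n_keep:                    # step (4): iterate
    # step (2): fit a linear model on the remaining features
    w, *_ = np.linalg.lstsq(X[:, remaining], y, rcond=None)
    worst = int(np.argmin(np.abs(w)))             # step (3): min |weight|
    remaining.pop(worst)                          # eliminate that feature

print("selected feature indices:", sorted(remaining))
```

Because the first 10 synthetic features carry large true weights, the elimination loop discards the uninformative ones round by round.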
Finally, 10 human body characteristic parameters are selected, as detailed in Table 3:
table 3 preferred 10 parameters of human body characteristics
And 5: customizing an individualized HRTF model;
the customized HRTF is customized by utilizing a Generalized Regression Neural Network (GRNN). GRNN is composed of input layer, mode layer, summation layer and output layer, and the input vector of network is X = [ X = [ ] 1 ,x 2 ,...,x n ] T The output vector is Y = [ Y = 1 ,y 2 ,...,y k ] T 。
The number of input-layer neurons equals the input dimension m of the training samples, and the input layer passes the input vector to the pattern layer. The number of pattern-layer neurons equals the number n of training samples; the activation function is a radial basis Gaussian, as shown in formula (9):

p_a = exp( −(X − X_a)^T (X − X_a) / (2σ²) )    (9)

where X is the input vector, X_a is the center of the a-th pattern-layer neuron (the input of the a-th training sample), (X − X_a)^T (X − X_a) is the squared Euclidean distance between X and X_a, and σ is the width of the radial basis function, determining its shape in the a-th pattern-layer neuron.
The summation-layer neurons are of two types. One neuron arithmetically sums the outputs of all pattern-layer neurons, with connection weight 1 between every pattern-layer neuron and this summation neuron; its transfer function is shown in formula (10):

S_D = Σ_{a=1..n} p_a    (10)
the other summation layer neurons carry out weighted summation on the outputs of all the mode layer neurons, and the connection weight value between the a mode layer neuron and the b summation layer neuron is the output Y of the a training sample a The b-th element y in (1) ab The transfer function of the summing neuron b is shown in equation (11):
the output layer neuron number is the dimension of the output vector, the output layer neuron divides the outputs of the two types of neurons of the summation layer, the output of the b-th output layer neuron corresponds to the b-th element of the output vector, as shown in equation (12):
in constructing the personalized HRTF model, the input vector X is a 12-dimensional vector, as shown in equation (13), which is measured by 10 human measurement parameters (X) 1 ,X 2 ,X 3 ,X 4 ,X 5 ,d 1 ,d 4 ,d 6 ,d 8 ,d 12 ) And 2 direction parameters (azimuth theta and elevation angle)) And (4) forming.
By setting the addition of the orientation parameters, the model can be trained and personalized HRTFs in different directions can be customized at the same time, and the practicability of the customized model by the method is enhanced. The extracted 10 HRTF high-dimensional features are used as model output, and the expression (14) is shown as follows:
Y={a 1 ,a 2 ,...,a 10 } (14)
The GRNN customization model is trained as follows:
(1) Normalize the independent and dependent variables to zero mean and unit variance, and shuffle the samples;
(2) Randomly divide the HRTF data sets of the 48 subjects into 40 training subjects and 8 validation subjects;
(3) Given the size of the training set, train, tune and verify the proposed GRNN model with 5-fold cross-validation;
(4) Search for the optimal smoothing parameter σ with a grid search, varying σ over a set range with a step size of 0.01;
(5) After the above steps, the smoothing factor σ = 0.60 gives the best performance, i.e. the minimum mean square error.
Finally, a GRNN model with 12-dimensional input and 10-dimensional output was designed with smoothing factor σ = 0.60 and trained on the complete training data set, completing the construction of the personalized HRTF model. The personalized ITD and ILD models of a subject can be reconstructed in the same way, which is not repeated here.
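A minimal numpy sketch of the GRNN of formulas (9)–(12) together with the σ grid search of the training procedure. The training and validation data are random stand-ins, and the σ search range is an assumption; only the step size of 0.01 comes from the text:

```python
import numpy as np

# GRNN following formulas (9)-(12): pattern layer of radial basis
# Gaussians, two kinds of summation neurons, dividing output layer.

def grnn_predict(X_train, Y_train, x, sigma):
    """Predict one output vector for input x (formulas 9-12)."""
    d2 = np.sum((X_train - x) ** 2, axis=1)   # (X - X_a)^T (X - X_a)
    p = np.exp(-d2 / (2.0 * sigma ** 2))      # formula (9): pattern layer
    s_d = p.sum() + 1e-12                     # formula (10), eps for safety
    s_b = Y_train.T @ p                       # formula (11): weighted sums
    return s_b / s_d                          # formula (12): output layer

rng = np.random.default_rng(3)
X_train = rng.standard_normal((40, 12))       # 40 training subjects, 12-D in
Y_train = rng.standard_normal((40, 10))       # 10 HRTF features each
X_val = rng.standard_normal((8, 12))          # 8 validation subjects
Y_val = rng.standard_normal((8, 10))

# Grid search for sigma with step 0.01; the 0.01-2.0 range is assumed.
best = min(
    (float(np.mean([(grnn_predict(X_train, Y_train, x, s) - y) ** 2
                    for x, y in zip(X_val, Y_val)])), s)
    for s in np.arange(0.01, 2.0, 0.01)
)
print("best sigma:", round(best[1], 2), "validation MSE:", round(best[0], 3))
```

A GRNN has no iteratively trained weights: the training samples themselves are the pattern-layer centers, so only σ needs tuning, which is why a simple grid search suffices.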
The system of the embodiment of the invention consists of three modules: (1) the device comprises a 3D auditory display module, (2) an audio synthesis module, and (3) a test module.
(1) 3D auditory display module
Following the interface prompts, the 10 key anthropometric parameters of the subject and the corresponding direction parameters are entered; the personalized ITD, ILD, HRTF and other parameters can then be visualized, and the results of all subjects can be saved.
(2) Audio synthesis module
Controlled by a software program, the system can select audio of different types (pure-tone, noise, speech and music excitation), different frequencies (low, intermediate and high) and different durations (1-5 s, in 1 s steps) from an audio library and convolve it with the personalized HRTF to generate audio signals with different spatial orientations.
(3) Test module
Using computer simulation and 3D image rendering technology, a three-dimensional space is reconstructed in the computer system with a synchronous mapping algorithm; the display interface presents a sphere, and the position of each virtual sound-source point is mapped to a detection point on the sphere in the virtual simulation interface. The subject interacts through a handheld Bluetooth controller, i.e. selects the perceived virtual spatial position, and the detection data collected by the system are processed by a statistical algorithm to give the corresponding objective detection results in real time, including accuracy, front-back confusion rate, up-down confusion rate, click position and so on. The specific operation is described in detail in the following examples.
Example 1
The system comprises a 3D auditory display module, an audio synthesis module and a test module; see Figs. 6 and 7.
In the 3D auditory display module, following the prompts of the auditory display interface of Fig. 6, basic information such as the subject's name, gender, age and contact information is entered in turn; then the 10 key anthropometric parameters (head width, head length, head depth, pinna-down offset, pinna-back offset, concha cavity width, concha cavity height, fossa triangularis height, total pinna length and total pinna width) are input, and the corresponding azimuth (0-360°, 1° precision) and elevation (0-360°, 1° precision) are selected; the subject's personalized ITD, ILD, HRTF and other auditory parameters can then be visualized.
The audio synthesis module is controlled by a software program. The system can select original audio signals of different types (pure-tone, noise, speech and music excitation), different frequencies (low, intermediate and high) and different durations (1-5 s) from the audio library to generate sound-source signals in different directions (azimuth and elevation); after all parameters are set, "synthesize audio" is clicked. The specific steps are as follows:
(1) Fourier-transform the original time-domain sound signal f(t) of the selected type to obtain the frequency-domain sound signal F(w), as shown in equation (15):

F(w) = ∫ f(t) e^(-jwt) dt (15)
where t is the time of the original signal and w is the frequency of the original signal.
(2) Extract the personalized HRTF of the specified direction and use it to filter the frequency-domain sound signal F(w), generating a new frequency-domain sound signal F(Y), as shown in equation (16):

F(Y) = F(w) · H(w) (16)

where H(w) is the personalized HRTF of the specified direction.
where w is the frequency of the original signal and Y is the frequency of the new signal.
(3) Perform the inverse Fourier transform on the newly generated sound signal F(Y) to obtain the desired time-domain three-dimensional virtual sound-source signal f(y), as shown in equation (17):

f(y) = (1/2π) ∫ F(Y) e^(jYy) dY (17)
where Y is the frequency of the new signal and y is the time of the new signal.
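Steps (1)-(3) above amount to standard frequency-domain filtering: transform the source signal, multiply by the HRTF spectrum of the chosen direction, and transform back. A minimal numpy sketch under that reading (the "HRTF" spectrum here is a made-up placeholder, not a measured one):

```python
import numpy as np

fs = 44100                                # sample rate in Hz
t = np.arange(fs) / fs                    # 1 s time axis
f_t = np.sin(2 * np.pi * 440 * t)         # original signal: a 440 Hz pure tone

F_w = np.fft.rfft(f_t)                    # step (1): time -> frequency domain
H_w = np.ones_like(F_w)                   # placeholder "HRTF" spectrum for one
H_w[len(H_w) // 2:] *= 0.5                # direction (real use: measured HRTF)
F_Y = F_w * H_w                           # step (2): filtering = multiplication
f_y = np.fft.irfft(F_Y, n=len(f_t))       # step (3): back to the time domain

print(f_y.shape)  # (44100,)
```

A binaural rendering would apply the left-ear and right-ear HRTFs separately to produce the two earphone channels.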
The test module reconstructs a three-dimensional space in the computer system with a synchronous mapping algorithm, using computer simulation and 3D image rendering technology; the display interface presents a sphere, the position of each virtual sound-source point is mapped to a detection point on the sphere in the virtual space, and the system can test the transverse, sagittal and coronal planes separately. The subject interacts through the test module (a handheld Bluetooth controller), i.e. selects the perceived virtual spatial position, and the detection data collected by the system are processed by a statistical algorithm to give the corresponding objective detection results in real time, including accuracy, front-back confusion rate, up-down confusion rate, click position and so on.
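The mapping from a sound-source direction to a detection point on the display sphere can be sketched as follows. The head-centred axes follow the calibration described earlier (nose tip = +X, right ear = +Y, above the head = +Z); the convention that azimuth is measured from +X toward +Y is an assumption for illustration:

```python
import math

def sphere_point(azimuth_deg, elevation_deg, radius=1.0):
    """Map an (azimuth, elevation) direction to a point on the display
    sphere. Axes: +X toward the nose tip, +Y toward the right ear,
    +Z above the head; azimuth measured from +X toward +Y (assumed)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = radius * math.cos(el) * math.cos(az)
    y = radius * math.cos(el) * math.sin(az)
    z = radius * math.sin(el)
    return (x, y, z)

print(sphere_point(0, 0))  # straight ahead: (1.0, 0.0, 0.0)
```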
Example 2
The test module of Example 1 is further described below with a specific operating procedure. The subject sits upright in front of the screen, wearing in-ear earphones and holding the Bluetooth controller. The operator initializes the 3D auditory display and audio synthesis system by entering the subject's basic information, such as the anthropometric parameters, and the information relevant to the experiment, including the audio type, audio frequency, audio duration, azimuth and elevation. After initialization, the audio system plays the sound-source signal; once the subject hears it, he or she uses the handheld Bluetooth controller to select the corresponding detection point on the sphere in the virtual simulation interface. The whole virtual sphere is divided into eight parts; the relative coordinates of the sound sources at the actual spatial positions are all mapped into the virtual sphere, and all mapped sound-source points are displayed simultaneously. During detection, the detection indices are given in real time by statistical processing, including correct/no-signal indicator lamps, localization accuracy, front-back confusion rate, up-down confusion rate and a map of the points actually clicked by the subject, which serve as diversified evaluation indices of the personalized HRTF.
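The objective indices listed above can be computed from the clicked and true directions. The patent does not give explicit formulas, so the tolerance and mirroring conventions in this sketch are illustrative assumptions:

```python
def localization_stats(trials, tol_deg=15.0):
    """Accuracy and front-back / up-down confusion rates for one session.

    trials: list of (true_az, true_el, clicked_az, clicked_el) in degrees.
    Assumed conventions (illustrative, not from the patent): a click within
    tol_deg of the target in both angles is correct; a front-back confusion
    lands near the target mirrored about the interaural axis (az -> 180 - az);
    an up-down confusion lands near the elevation-mirrored target (el -> -el).
    """
    def ang_diff(a, b):
        # smallest absolute difference between two angles, in degrees
        return abs(((a - b + 180.0) % 360.0) - 180.0)

    n = len(trials)
    correct = fb = ud = 0
    for ta, te, ca, ce in trials:
        if ang_diff(ca, ta) <= tol_deg and abs(ce - te) <= tol_deg:
            correct += 1
        mirrored_az = (180.0 - ta) % 360.0
        # skip lateral targets that mirror onto themselves
        if ang_diff(ta, mirrored_az) > tol_deg and ang_diff(ca, mirrored_az) <= tol_deg:
            fb += 1
        if abs(ce + te) <= tol_deg and abs(te) > tol_deg:
            ud += 1
    return {"accuracy": correct / n,
            "front_back_rate": fb / n,
            "up_down_rate": ud / n}

# Three hypothetical trials: one correct response, one front-back
# confusion, one up-down confusion.
print(localization_stats([(0, 0, 0, 0), (0, 0, 175, 0), (0, 30, 2, -28)]))
```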
Example 3
The feasibility of the above scheme is verified below in conjunction with Table 4:
Compared with other modeling methods, the present method can customize the full-space personalized HRTF with small spectral distortion; when applied to the CIPIC database, the spectral distortion value remains small, as shown in Table 4, demonstrating good generality. The embodiment of the invention can visualize parameters such as the personalized ITD, ILD and HRTF, synthesize sound sources at any position in the whole space, and comprehensively evaluate the performance of the personalized HRTF with diversified indices. The embodiment can effectively improve the effect of a virtual auditory reproduction system and has good application prospects in virtual reality and other applications.
Table 4. Spectral distortion of personalized HRTFs customized by different methods
In the embodiments of the present invention, except where specifically stated, the models of the devices are not limited, as long as the devices can perform the functions described above.
Those skilled in the art will appreciate that the drawings are only schematic illustrations of preferred embodiments, and the above-described embodiments of the present invention are merely provided for description and do not represent the merits of the embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (10)
1. A method for enhancing virtual auditory reproduction based on a personalized head-related transfer function, the method comprising:
establishing a high-precision head HRTF database, and extracting high-dimensional features of the HRTF;
collecting human body characteristic parameters and screening the optimal human body characteristic parameters;
and customizing the personalized HRTF by using a generalized regression neural network to obtain the optimal auditory reproduction result.
2. The method of claim 1, wherein the step of building a high-precision HRTF database comprises:
(1) Acquiring head-and-neck three-dimensional geometric data of 48 subjects with a 3D laser scanner and performing three-dimensional reconstruction;
(2) Importing the three-dimensional head geometric model into Magics software for repair, the repair comprising: repairing holes, detecting holes and roughening;
(3) Uniformly calibrating the spatial position of the head model by setting a reference coordinate system, with the midpoint of the line connecting the two ears as the coordinate origin, the direction of the nose tip as the positive X axis, the direction of the right ear as the positive Y axis, and the direction above the head as the positive Z axis;
(4) Establishing a boundary element simulation calculation model with the multiphysics simulation software COMSOL; during meshing, the pinna, which has the largest influence on the HRTF, is divided into an independent solving region, and the remainder is divided into another solving region.
3. The method for enhancing virtual auditory reproduction based on the personalized head-related transfer function according to claim 1, wherein the collecting of human body characteristic parameters and the screening of the optimal human body characteristic parameters are:
measuring 10 head parameters and 20 ear parameters of each subject with the electronic measuring tools of the Magics software; and screening out 10 optimal human body characteristic parameters by combining correlation analysis with recursive feature elimination.
4. The method of claim 1 for enhancing virtual auditory reproduction based on a personalized head-related transfer function, wherein the training of the generalized regression neural network specifically comprises:
the input X is a 12-dimensional vector composed of 10 human body measurement parameters (X1, X2, X3, X4, X5, d1, d4, d6, d8, d12), the azimuth θ and the elevation φ; the output Y is a 10-dimensional vector composed of 10 HRTF high-dimensional features (a1, a2, ..., a10).
5. The method of claim 2 for enhancing virtual auditory reproduction based on an individualized head-related transfer function, wherein, in the grid partitioning:
the pinna region follows the strictest criterion of 6 elements per wavelength, i.e. the pinna model is uniformly meshed at 1/6 of the wavelength; the remaining region follows the criterion of 4 elements per wavelength, i.e. it is uniformly meshed at 1/4 of the wavelength.
6. The method for enhancing virtual auditory reproduction according to claim 3, wherein the 10 human body characteristic parameters are:
head width, head height, head depth, pinna-down offset, pinna-back offset, pinna height, fossa triangularis height, concha cavity height, pinna width, concha cavity width.
7. A system for improving virtual auditory reproduction based on personalized head-related transfer function is characterized by comprising a 3D auditory display module, an audio synthesis module and a test module,
the 3D auditory display module visualizes the personalized interaural time difference, interaural level difference and HRTF parameters, and stores all the relevant data of the subjects;
the audio synthesis module selects audio signals of different types, frequencies and durations from an audio library and convolves them with the personalized HRTF to generate audio signals with different spatial orientations;
the test module reconstructs a three-dimensional space in the computer system by synchronous mapping, using computer simulation and 3D image rendering technology; the display interface presents a sphere, and the position of each virtual sound-source point is mapped to a detection point on the sphere in the virtual simulation interface;
the subject selects the perceived virtual spatial position through the test module, and the collected detection data are processed by a statistical algorithm in real time to give the corresponding objective detection results, including overall accuracy, front-back confusion rate, up-down confusion rate and click position.
8. The system of claim 7, wherein the different audio types are: pure-tone excitation, noise excitation, speech excitation and music excitation; the frequencies are low, intermediate and high; the duration is 1-5 s, in 1 s steps.
9. The system for enhancing virtual auditory reproduction according to claim 7, wherein the system tests the transverse, sagittal and coronal planes separately.
10. The system of claim 7, wherein the sphere space is divided into eight parts; the relative coordinates of the sound sources at the actual spatial positions are all mapped into the virtual sphere space, and all mapped sound-source points are displayed simultaneously.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211077500.6A CN115412808B (en) | 2022-09-05 | 2022-09-05 | Virtual hearing replay method and system based on personalized head related transfer function |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115412808A true CN115412808A (en) | 2022-11-29 |
CN115412808B CN115412808B (en) | 2024-04-02 |
Family
ID=84163847
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211077500.6A Active CN115412808B (en) | 2022-09-05 | 2022-09-05 | Virtual hearing replay method and system based on personalized head related transfer function |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115412808B (en) |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8428269B1 (en) * | 2009-05-20 | 2013-04-23 | The United States Of America As Represented By The Secretary Of The Air Force | Head related transfer function (HRTF) enhancement for improved vertical-polar localization in spatial audio systems |
CN106535043A (en) * | 2016-11-18 | 2017-03-22 | 华南理工大学 | Full-frequency 3D virtual sound customization method and device based on physiological characteristics |
US20170094440A1 (en) * | 2014-03-06 | 2017-03-30 | Dolby Laboratories Licensing Corporation | Structural Modeling of the Head Related Impulse Response |
CN107182003A (en) * | 2017-06-01 | 2017-09-19 | 西南电子技术研究所(中国电子科技集团公司第十研究所) | Airborne three-dimensional call virtual auditory processing method |
CN108476358A (en) * | 2015-12-31 | 2018-08-31 | 创新科技有限公司 | A method of for generating customized/personalized head related transfer function |
CN108540925A (en) * | 2018-04-11 | 2018-09-14 | 北京理工大学 | A kind of fast matching method of personalization head related transfer function |
CN108596016A (en) * | 2018-03-06 | 2018-09-28 | 北京大学 | A kind of personalized head-position difficult labor modeling method based on deep neural network |
CN108616789A (en) * | 2018-04-11 | 2018-10-02 | 北京理工大学 | The individualized virtual voice reproducing method measured in real time based on ears |
CN109998553A (en) * | 2019-04-29 | 2019-07-12 | 天津大学 | The method of the parametrization detection system and minimum audible angle of spatial localization of sound ability |
CN111246363A (en) * | 2020-01-08 | 2020-06-05 | 华南理工大学 | Auditory matching-based virtual sound customization method and device |
JP2020170938A (en) * | 2019-04-03 | 2020-10-15 | アルパイン株式会社 | Head transfer function learning device and head transfer function inference device |
CN113038356A (en) * | 2019-12-09 | 2021-06-25 | 上海航空电器有限公司 | Personalized HRTF rapid modeling acquisition method |
CN113316077A (en) * | 2021-06-27 | 2021-08-27 | 高小翎 | Three-dimensional vivid generation system for voice sound source space sound effect |
Non-Patent Citations (3)
Title |
---|
BHARITKAR, S, MAUER, T, WELLS, T,BERFANGER, D: "Stacked Autoencoder Based HRTF Synthesis from Sparse Data", 2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 15 November 2018 (2018-11-15), pages 356 - 361, XP033525853, DOI: 10.23919/APSIPA.2018.8659495 * |
刘宝禄, 刘庆峰, 郭小朝 et al.: "Research progress on personalized head-related transfer functions", Journal of Electronic Measurement and Instrumentation, vol. 34, no. 11, 15 November 2021 (2021-11-15), pages 155 - 165 *
杨立东, 焦慧媛: "Research on key technologies of head-related transfer function acquisition", Software Guide, vol. 18, no. 1, 31 January 2019 (2019-01-31), pages 34 - 39 *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117177165A (en) * | 2023-11-02 | 2023-12-05 | 歌尔股份有限公司 | Method, device, equipment and medium for testing spatial audio function of audio equipment |
CN117177165B (en) * | 2023-11-02 | 2024-03-12 | 歌尔股份有限公司 | Method, device, equipment and medium for testing spatial audio function of audio equipment |
CN117437367A (en) * | 2023-12-22 | 2024-01-23 | 天津大学 | Early warning earphone sliding and dynamic correction method based on auricle correlation function |
CN117437367B (en) * | 2023-12-22 | 2024-02-23 | 天津大学 | Early warning earphone sliding and dynamic correction method based on auricle correlation function |
Also Published As
Publication number | Publication date |
---|---|
CN115412808B (en) | 2024-04-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP4718559B2 (en) | Method and apparatus for individualizing HRTFs by modeling | |
CN115412808B (en) | Virtual hearing replay method and system based on personalized head related transfer function | |
CN108596016B (en) | Personalized head-related transfer function modeling method based on deep neural network | |
Francl et al. | Deep neural network models of sound localization reveal how perception is adapted to real-world environments | |
CN108476369A (en) | Method and system for developing the head related transfer function for being suitable for individual | |
Leng et al. | Binauralgrad: A two-stage conditional diffusion probabilistic model for binaural audio synthesis | |
US20040091119A1 (en) | Method for measurement of head related transfer functions | |
Pörschmann et al. | Directional equalization of sparse head-related transfer function sets for spatial upsampling | |
Geronazzo et al. | Do we need individual head-related transfer functions for vertical localization? The case study of a spectral notch distance metric | |
CN106165444B (en) | Sound field reproduction apparatus, methods and procedures | |
Schönstein et al. | HRTF selection for binaural synthesis from a database using morphological parameters | |
JP2009512364A (en) | Virtual audio simulation | |
Tenenbaum et al. | Auralization generated by modeling HRIRs with artificial neural networks and its validation using articulation tests | |
CN113849767B (en) | Personalized HRTF (head related transfer function) generation method and system based on physiological parameters and artificial head data | |
Zagala et al. | Comparison of direct and indirect perceptual head-related transfer function selection methods | |
Barumerli et al. | Round Robin Comparison of Inter-Laboratory HRTF Measurements–Assessment with an auditory model for elevation | |
O’Connor et al. | An evaluation of 3D printing for the manufacture of a binaural recording device | |
Zhang et al. | HRTF field: Unifying measured HRTF magnitude representation with neural fields | |
Zhu et al. | HRTF personalization based on weighted sparse representation of anthropometric features | |
CN113038356A (en) | Personalized HRTF rapid modeling acquisition method | |
Xi et al. | Magnitude modelling of individualized HRTFs using DNN based spherical harmonic analysis | |
Spagnol et al. | Estimation of spectral notches from pinna meshes: Insights from a simple computational model | |
Lokki et al. | Auditorium acoustics assessment with sensory evaluation methods | |
Barumerli et al. | Localization in elevation with non-individual head-related transfer functions: comparing predictions of two auditory models | |
Wang et al. | Prediction of head-related transfer function based on tensor completion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||