CN115412808A - Method and system for improving virtual auditory reproduction based on personalized head-related transfer function


Info

Publication number
CN115412808A
Authority
CN
China
Prior art keywords
hrtf
head
personalized
virtual
parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211077500.6A
Other languages
Chinese (zh)
Other versions
CN115412808B (en)
Inventor
倪广健
刘洪兴
白艳茹
Current Assignee
Tianjin University
Original Assignee
Tianjin University
Priority date
Filing date
Publication date
Application filed by Tianjin University
Priority to CN202211077500.6A
Publication of CN115412808A
Application granted
Publication of CN115412808B
Active legal-status Current
Anticipated expiration legal-status Current

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R5/00: Stereophonic arrangements
    • H04R5/033: Headphones for stereophonic communication
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00: Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/30: Image reproducers
    • H04N13/332: Displays for viewing with the aid of special glasses or head-mounted displays [HMD]
    • H04N13/344: Displays for viewing with the aid of special glasses or head-mounted displays [HMD] with head-mounted left-right displays
    • H04R29/00: Monitoring arrangements; Testing arrangements


Abstract

The invention discloses a method and a system for improving virtual auditory reproduction based on a personalized head-related transfer function (HRTF). The method comprises the following steps: establishing a high-precision HRTF database and extracting high-dimensional HRTF features; collecting anthropometric parameters and screening the optimal subset; and customizing the personalized HRTF with a generalized regression neural network. The system consists of a 3D auditory display module, an audio synthesis module and a test module. The 3D auditory display module visualizes and stores the personalized interaural time difference, interaural intensity difference and HRTF; the audio synthesis module convolves audio of different types, frequencies and durations from an audio library with the personalized HRTF to synthesize sound-source signals in different spatial directions; the test module embeds computer simulation and 3D rendering technology, so that an operator can evaluate the personalized HRTF virtual auditory playback effect in a diversified manner through human-machine interaction.

Description

Method and system for improving virtual auditory reproduction based on personalized head-related transfer function
Technical Field
The invention relates to the technical field of virtual auditory display, and in particular to a method and a system for establishing a personalized Head-Related Transfer Function (HRTF) model suitable for binaural headphone devices and generating a model matched to the user to improve virtual auditory playback.
Background
Human hearing includes perception of spatial attributes of sound in addition to perception of timbre, loudness, pitch, and duration of sound. The perception of spatial properties by the auditory system mainly depends on information such as binaural time difference (ITD), binaural intensity difference (IID), and spectral factors, which can be uniformly expressed by HRTFs.
Humans can construct virtual auditory spaces using sound reproduction techniques, i.e., techniques that induce specific spatial perception in listeners. With the development of artificial-intelligence audio-visual technology, the construction of immersive virtual auditory space has gradually become a key component of realistic virtual reality. Sound reproduction technology can be divided into two categories according to the playback device. The first is multi-channel loudspeaker playback, involving acoustic holography, spherical harmonic decomposition, wave field synthesis, etc. In practice, although this approach can accurately construct a sound field within a certain region, it requires a large number of loudspeakers in linear arrays with strictly controlled configuration, which limits its development. The second category is two-channel headphone playback, mainly involving binaural pickup and virtual auditory reproduction. The key to binaural pickup is capturing the binaural signals with microphones, while the core of virtual auditory reproduction is the construction of the HRTF. Both aim to accurately reconstruct the binaural sound pressure, rather than a physical sound field over a spatial region. Because a single pair of two-channel headphones suffices to generate a three-dimensional sound effect, this convenient and practical approach has quickly become the mainstream way of constructing virtual auditory space.
HRTFs depend heavily on anthropometric parameters related to sound reflection, diffraction and scattering, which are unique to each person. Applying a generic HRTF to all individuals easily produces sound-source perception errors such as front-back confusion, up-down confusion, angular deviation and in-head localization, so constructing a personalized HRTF is the key to building an immersive virtual auditory space.
At present, methods for obtaining personalized HRTFs mainly include acoustic measurement, numerical calculation and anthropometry-based customization, but each has limitations: the acoustic measurement method requires long acquisition time, expensive equipment and a demanding acquisition environment; the numerical calculation method requires specific devices (MRI, CT, etc.) to acquire the head model; and most anthropometry-based customization methods rely on public databases and yield personalized HRTFs of low accuracy. None of these methods is therefore well suited to practical application.
Disclosure of Invention
The invention provides a method and a system for improving virtual auditory replay based on a personalized head-related transfer function. The invention quickly customizes a full-space personalized HRTF from anthropometric parameters at low cost. The method comprises five steps: establishing a high-precision HRTF database, extracting high-dimensional HRTF features, acquiring anthropometric parameters, screening the anthropometric parameters, and establishing the personalized HRTF. The system consists of a 3D auditory display module, an audio synthesis module and a test module, and can evaluate the virtual auditory playback effect of the personalized HRTF in a diversified and accurate manner. The invention effectively improves the performance of virtual auditory reproduction systems and has good application prospects in virtual reality and similar applications, as described in detail below:
a method for enhancing virtual auditory reproduction based on a personalized head-related transfer function, the method comprising:
establishing a high-precision HRTF database, and extracting high-dimensional characteristics of the HRTF;
collecting anthropometric parameters and screening the optimal subset;
and customizing the personalized HRTF by using the generalized regression neural network to obtain the optimal auditory reproduction result.
Wherein, the step of establishing the high-precision HRTF database comprises the following steps:
(1) Acquiring the head-neck three-dimensional geometric data of 48 subjects with a 3D laser scanner and performing three-dimensional reconstruction;
(2) Importing the three-dimensional head geometry into Magics software for repair, including: hole repair, hole detection and roughening;
(3) Uniformly calibrating the spatial position of the head model with a reference coordinate system: the midpoint of the line connecting the two ears is the coordinate origin, the direction of the nose tip is the positive X axis, the direction of the right ear is the positive Y axis, and the top of the head is the positive Z axis;
(4) Establishing a boundary-element simulation model with the multiphysics software COMSOL; during meshing, the pinna, which has the largest influence on the HRTF, is assigned an independent solution region and the remainder of the head another.
Further, the anthropometric parameters are collected and screened as follows:
using the electronic measurement tools of Magics software, 10 head parameters and 20 ear parameters are measured for each subject; 10 anthropometric parameters are then selected by combining correlation analysis with recursive feature elimination.
Wherein the training of the generalized regression neural network is specified as follows:
the input X is a 12-dimensional vector composed of 10 anthropometric parameters (x1, x2, x3, x4, x5, d1, d4, d6, d8, d12), the azimuth θ and the elevation φ;
the output Y is a 10-dimensional vector composed of 10 high-dimensional HRTF features (a1, a2, ..., a10).
Further, when meshing:
the pinna follows the finest criterion of six elements per wavelength, i.e., the pinna model is meshed uniformly at 1/6 wavelength; the remainder follows four elements per wavelength, i.e., it is meshed uniformly at 1/4 wavelength.
Wherein the 10 anthropometric parameters are:
head width, head height, head depth, pinna downward offset, pinna backward offset, pinna height, triangular fossa height, concha cavity height, pinna width and concha cavity width.
A system for enhancing virtual auditory reproduction based on a personalized head-related transfer function, composed of a 3D auditory display module, an audio synthesis module and a test module, wherein:
the 3D auditory display module visualizes the personalized ITD (interaural time difference), ILD (interaural level difference) and HRTF parameters and stores all subjects' related data;
the audio synthesis module selects audio signals of different types, frequencies and durations from an audio library and convolves them with the personalized HRTF to generate audio signals with different spatial orientations;
the test module embeds computer simulation and 3D image rendering technology, reconstructing a three-dimensional space in the computer by synchronous mapping; the display interface presents a spherical space, and the virtual sound-source position is mapped to a detection-point position on the sphere in the virtual simulation interface;
the subject selects the heard virtual spatial position through the test module, and a statistical algorithm processes the collected detection data in real time to give objective results, including global accuracy, front-back confusion rate, up-down confusion rate and click position.
Wherein the audio types are pure-tone, noise, speech and music excitation; the frequencies are low, middle and high; and the durations are 1-5 s in steps of 1 s.
Wherein the system tests three planes, a transverse plane, a sagittal plane and a coronal plane, respectively.
Further, the spherical space is divided into eight octants; the relative coordinates of sound sources at actual spatial positions are all mapped into the virtual spherical space, and all mapped sound-source points are displayed simultaneously.
The technical scheme provided by the invention has the beneficial effects that:
1. The personalized HRTF customization method established by the invention realizes low-cost, rapid customization of the full-space personalized HRTF from anthropometric parameters; compared with a rapid HRTF measurement system, it depends on no specific measurement environment and its equipment cost is low;
2. Compared with directly computing the personalized HRTF numerically, the method requires little computation; compared with other modeling methods, it can customize the full-space personalized HRTF with small spectral distortion, and when applied to the CIPIC database the spectral distortion remains small, showing good generality;
3. The invention can visualize parameters such as the personalized ITD, ILD and HRTF, synthesize sound sources at any position in the whole space, and comprehensively evaluate the performance of the personalized HRTF in a diversified manner.
Drawings
FIG. 1 is an overall architecture diagram for enhancing virtual auditory reproduction based on a personalized head-related transfer function;
FIG. 2 is a flow chart of an HRTF database creation for enhancing virtual auditory reproduction based on personalized head-related transfer functions;
FIG. 3 is a diagram of 30 parameters of human body characteristics;
FIG. 4 is a schematic diagram comparing spectral distortion with a radial basis function neural network and a standard artificial head at different azimuth angles;
FIG. 5 is a schematic diagram comparing spectral distortion with a radial basis function neural network and a standard artificial head at different frequencies;
FIG. 6 is a block diagram of a 3D auditory display system;
FIG. 7 is a schematic diagram of an audio synthesis and test module.
Detailed Description
To make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below.
In order to solve the problems in the background art and aim at the defect of personalized HRTF design in the field of virtual hearing, the embodiment of the invention provides a more practical personalized HRTF obtaining method on the basis of the existing method, so that the virtual hearing playback effect is improved, and meanwhile, the system can perform diversified evaluation on the virtual hearing playback performance of the personalized HRTF.
The embodiment of the invention aims to provide a method for improving virtual auditory reproduction based on a personalized HRTF, which can acquire HRTF data at arbitrary spatial positions for different subjects and thereby meet the requirements of high spatial resolution and personalization. The adopted technical scheme comprises the following steps:
step 1: establishing a high-precision HRTF database;
wherein, this step includes:
(1) The head-neck three-dimensional geometric data of 48 subjects are acquired with a 3D laser scanner; during scanning no special posture is required, and the subject only needs to adopt a comfortable position, which reduces noise caused by head movement;
(2) Importing the three-dimensional geometric model into Magics software for model repair, wherein the method comprises the following steps: repairing holes, detecting holes, roughening treatment and the like;
(3) The spatial position of the head model is uniformly calibrated by setting a reference coordinate system, namely, the middle point of a connecting line of two ears of the head of a person is used as a coordinate origin, the positive direction of the nose tip is used as the positive direction of an X axis, the direction of the right ear is used as the positive direction of a Y axis, and the upper part of the head is used as the positive direction of a Z axis;
(4) The pressure acoustics boundary element module of the multiphysics software COMSOL is used for the solution: a boundary-element simulation model is built by configuring the sound-field environment, setting parameters and configuring the solver. A regional meshing method is applied, dividing the head model into two parts: the pinna, which has the largest influence on the HRTF, forms an independent solution region meshed at the finest criterion of six elements per wavelength (1/6-wavelength meshing), while the remainder forms another solution region meshed at four elements per wavelength (1/4-wavelength meshing). HRTF data can thus be obtained quickly.
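As a quick numeric check on the meshing rule above, the maximum boundary-element edge length implied by an N-elements-per-wavelength criterion can be computed directly; the 20 kHz upper frequency and the speed of sound c = 343 m/s are illustrative assumptions, since the database parameters of Table 1 are not reproduced here.

```python
def max_element_size(f_max_hz: float, elements_per_wavelength: int, c: float = 343.0) -> float:
    """Largest allowed mesh edge in metres: the shortest wavelength divided by
    the number of elements required to resolve it."""
    shortest_wavelength = c / f_max_hz
    return shortest_wavelength / elements_per_wavelength

# Pinna region: 6 elements per wavelength; rest of the head: 4 per wavelength.
pinna_h = max_element_size(20000.0, 6)   # about 2.9 mm at 20 kHz
rest_h = max_element_size(20000.0, 4)    # about 4.3 mm at 20 kHz
```

The finer pinna mesh roughly halves the element size relative to the rest of the head, which is why isolating the pinna in its own region keeps the total element count manageable.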
The parameters of the high-precision HRTF database constructed by the embodiment of the invention are detailed in a table 1:
TABLE 1 high-accuracy HRTF database parameters
Step 2: extracting HRTF high-dimensional features;
High-dimensional HRTF features are extracted with the singular value decomposition method. First, the collected HRTFs are converted to logarithmic form, Eq. (1):

HRTF_log(s, m, f) = 20 log10 |HRTF(s, m, f)|      (1)

where s = 1, 2, ..., S indexes the subjects, m = 1, 2, ..., M the directions and f = 1, 2, ..., N the frequency sampling points. HRTF_log(s, m, f) is then decomposed into a direction-dependent transfer function D_log(s, m, f) and a direction-independent mean spectral function HRTF_mean(f), Eqs. (2) and (3):

D_log(s, m, f) = HRTF_log(s, m, f) - HRTF_mean(f)      (2)

HRTF_mean(f) = (1/M) Σ_{m=1}^{M} HRTF_log(s, m, f)      (3)
Only D_log(s, m, f) needs dimensionality reduction. Taking one subject as an example, each of the M directions has N discrete frequency sampling points, giving the matrix of Eq. (4):

[D_log] = [ D_log(m_1, f_1) ... D_log(m_1, f_N); ... ; D_log(m_M, f_1) ... D_log(m_M, f_N) ]      (4)

where D_log(m_M, f_N) is the HRTF value of the subject at the N-th discrete frequency sampling point in the M-th direction.
For any real matrix, the singular value decomposition can be expressed as Eq. (5):

[D_log(s, m, f)] = U Σ V^T      (5)

where U = (u_1, u_2, ..., u_M) ∈ R^{M×M} and V = (v_1, v_2, ..., v_N) ∈ R^{N×N} are the left and right singular matrices, both orthogonal, with u_p ∈ R^{M×1} and v_q ∈ R^{N×1} the left and right singular vectors. Σ = diag(σ_1, σ_2, ..., σ_r) is a diagonal matrix whose singular values are arranged in descending order, σ_1 ≥ σ_2 ≥ ... ≥ σ_r ≥ 0.
The singular value decomposition indirectly yields the weight coefficients and principal components of principal component analysis, and the raw singular matrices are not suitable for visualization; therefore, after the decomposition, the important features extracted by dimensionality reduction are expressed as principal components W_i with weight coefficients a_i. The HRTF log-magnitude spectrum can thus be decomposed as in Eq. (6):

D_log(m, f) = Σ_{i=1}^{r} a_i(m) W_i(f)      (6)
Finally, the cumulative variance percentage Var represents the HRTF reconstruction effect, Eq. (7):

Var = ( Σ_{j=1}^{k} σ_j^2 / Σ_{j=1}^{r} σ_j^2 ) × 100%      (7)

where σ_j denotes the singular values of the matrix [D_log(s, m, f)] and r its rank.
When the first 10 principal components are selected, the cumulative variance contribution reaches 90%; verification shows that the first 10 high-dimensional features {a_1, a_2, ..., a_10} recover most of the HRTF spectral features.
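The extraction pipeline of Eqs. (1)-(7) can be sketched with NumPy. The matrix sizes and the random stand-in magnitudes below are illustrative assumptions, not data from the patent's database.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 440, 128                                      # directions x frequency bins (illustrative)
hrtf = np.abs(rng.standard_normal((M, N))) + 1e-3    # stand-in |HRTF(m, f)| for one subject

hrtf_log = 20 * np.log10(hrtf)           # Eq. (1): log-magnitude spectrum
hrtf_mean = hrtf_log.mean(axis=0)        # Eq. (3): direction-independent mean spectrum
d_log = hrtf_log - hrtf_mean             # Eq. (2): directional transfer function

u, s, vt = np.linalg.svd(d_log, full_matrices=False)   # Eq. (5): D_log = U S V^T
k = 10
weights = u[:, :k] * s[:k]               # Eq. (6): weight coefficients a_i per direction
components = vt[:k]                      # principal components W_i over frequency
var = (s[:k] ** 2).sum() / (s ** 2).sum()   # Eq. (7): cumulative variance fraction
recon = weights @ components             # rank-k reconstruction of d_log
```

On real HRTF data the first 10 components capture about 90% of the variance per Eq. (7); on random data, as here, `var` is much lower, which is exactly what the cumulative-variance criterion is designed to expose.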
Step 3: collecting anthropometric parameters;
according to the < < GB/T22187-2009 > measurement standard of human characteristic parameters, 30 human characteristic parameters are measured by an electronic measuring tool SolidWorks software based on a three-dimensional head-neck geometric model, wherein the human characteristic parameters comprise 10 head parameters and 20 ear parameters, and the details are shown in Table 2.
Table 2: 30 human body characteristic parameters
Step 4: screening the anthropometric parameters;
the human body characteristic parameters are further optimized by utilizing correlation analysis and recursive characteristic elimination. Firstly, a Spearman correlation coefficient is adopted to screen characteristics with high correlation, as shown in formula (8):
Figure BDA0003832216780000072
wherein x is i And y i Are two different anthropometric parameters of the same subject,
Figure BDA0003832216780000073
and
Figure BDA0003832216780000074
is the average of all two parameters tested, i =1,2 … …, and M is the number tested. Human body characteristic parameters with the correlation coefficient larger than 0.8 are removed through the step (one of the two parameters is selected).
Second, recursive feature elimination further selects the anthropometric parameters with the largest influence on the HRTF, as follows:
(1) Set an initial weight coefficient for each remaining anthropometric parameter;
(2) Construct a logistic regression equation and train on the feature parameters;
(3) Extract the feature weights and remove the one with the smallest absolute value;
(4) Iterate the above steps until the number of remaining feature parameters reaches the set quantity.
Finally, 10 anthropometric parameters are selected, as detailed in Table 3:
Table 3: the 10 selected human body characteristic parameters
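The two-stage screening of Step 4 can be sketched as follows. The synthetic measurements, the stand-in regression target, and the use of plain least squares in place of the patent's logistic regression step are illustrative assumptions.

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation (Eq. 8 applied to ranks); ties are not
    averaged, which is adequate for continuous measurements."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx * ry).sum() / np.sqrt((rx ** 2).sum() * (ry ** 2).sum()))

rng = np.random.default_rng(1)
n_subjects, n_params = 48, 30
params = rng.standard_normal((n_subjects, n_params))                   # stand-in measurements
params[:, 1] = params[:, 0] + 0.01 * rng.standard_normal(n_subjects)   # near-duplicate pair

# Stage 1: drop one parameter of any pair with |rho| > 0.8.
keep = []
for j in range(n_params):
    if all(abs(spearman(params[:, j], params[:, k])) <= 0.8 for k in keep):
        keep.append(j)

# Stage 2: recursive feature elimination down to 10 parameters, repeatedly
# refitting a linear model and dropping the smallest-|weight| feature.
target = params[:, 0] + 0.1 * rng.standard_normal(n_subjects)  # stand-in HRTF feature
selected = list(keep)
while len(selected) > 10:
    w, *_ = np.linalg.lstsq(params[:, selected], target, rcond=None)
    selected.pop(int(np.argmin(np.abs(w))))
```

The injected near-duplicate column is eliminated in stage 1, and stage 2 then whittles the survivors down to exactly 10, mirroring the parameter counts in Tables 2 and 3.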
Step 5: customizing the personalized HRTF model;
the customized HRTF is customized by utilizing a Generalized Regression Neural Network (GRNN). GRNN is composed of input layer, mode layer, summation layer and output layer, and the input vector of network is X = [ X = [ ] 1 ,x 2 ,...,x n ] T The output vector is Y = [ Y = 1 ,y 2 ,...,y k ] T
The number of input-layer neurons equals the input dimension m of the training samples, and the input layer passes the input vector to the pattern layer. The number of pattern-layer neurons equals the number of training samples n; the activation function is the radial basis Gaussian function of Eq. (9):

P_a = exp( -(X - X_a)^T (X - X_a) / (2σ^2) ),  a = 1, 2, ..., n      (9)

where X is the input vector, X_a is the center of the a-th pattern-layer neuron (the input of the a-th training sample), (X - X_a)^T (X - X_a) is the squared Euclidean distance between X and X_a, and σ is the width of the radial basis function, determining its shape in the a-th pattern-layer neuron.
The summation-layer neurons are of two types. One arithmetically sums the outputs of all pattern-layer neurons, with connection weight 1 between each pattern-layer neuron and this summation neuron; its transfer function is Eq. (10):

S_D = Σ_{a=1}^{n} P_a      (10)
the other summation layer neurons carry out weighted summation on the outputs of all the mode layer neurons, and the connection weight value between the a mode layer neuron and the b summation layer neuron is the output Y of the a training sample a The b-th element y in (1) ab The transfer function of the summing neuron b is shown in equation (11):
Figure BDA0003832216780000084
The number of output-layer neurons equals the dimension of the output vector. Each output neuron divides the two summation-layer outputs; the output of the b-th output-layer neuron corresponds to the b-th element of the output vector, Eq. (12):

y_b = S_Nb / S_D      (12)
in constructing the personalized HRTF model, the input vector X is a 12-dimensional vector, as shown in equation (13), which is measured by 10 human measurement parameters (X) 1 ,X 2 ,X 3 ,X 4 ,X 5 ,d 1 ,d 4 ,d 6 ,d 8 ,d 12 ) And 2 direction parameters (azimuth theta and elevation angle)
Figure BDA0003832216780000087
) And (4) forming.
Figure BDA0003832216780000086
Adding the direction parameters lets a single trained model customize personalized HRTFs in different directions simultaneously, enhancing the practicality of the customized model. The 10 extracted high-dimensional HRTF features serve as the model output, Eq. (14):

Y = {a_1, a_2, ..., a_10}      (14)
training a customized model of GRNN by:
(1) Normalizing the mean value and the variance of the independent variable and the dependent variable, and shuffling the mean value and the variance;
(2) HRTF data sets of 48 subjects were randomly divided into 40 training sets and 8 validation sets;
(3) In consideration of the size of a training set, training, adjusting and verifying the proposed GRNN model by adopting 5 times of cross validation;
(4) Searching an optimal smoothing parameter sigma by adopting a grid search method, and enabling the sigma to change within a certain range by taking the step length as 0.01;
(5) After the above steps, the smoothing factor σ =0.60 provides the best performance, i.e. the minimum mean square error.
Finally, a GRNN model with 12-dimensional input and 10-dimensional output was designed with smoothing factor σ = 0.60 and trained on the complete training set, completing the construction of the HRTF model. The personalized ITD and ILD models for a subject can be reconstructed in the same way, which is not repeated here.
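A minimal GRNN of the form in Eqs. (9)-(12) can be written directly. The random training pairs and the very small σ used in the memorization check are illustrative assumptions (the patent's fitted value is σ = 0.60 on normalized data).

```python
import numpy as np

def grnn_predict(x_train, y_train, x_query, sigma):
    """GRNN forward pass: pattern layer (Eq. 9), the two summation-neuron
    types (Eqs. 10-11), and the output-layer ratio (Eq. 12)."""
    d2 = ((x_query[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=-1)  # squared distances
    p = np.exp(-d2 / (2.0 * sigma ** 2))   # Eq. (9): Gaussian activation per training sample
    s_d = p.sum(axis=1, keepdims=True)     # Eq. (10): plain sum of pattern outputs
    s_n = p @ y_train                      # Eq. (11): weighted sum per output element
    return s_n / s_d                       # Eq. (12): ratio gives the prediction

rng = np.random.default_rng(2)
x_train = rng.standard_normal((40, 12))    # 40 training subjects x 12-dim input, cf. Eq. (13)
y_train = rng.standard_normal((40, 10))    # 10 HRTF weight coefficients, cf. Eq. (14)

# With a tiny sigma the network memorizes: querying a training point returns its target.
pred = grnn_predict(x_train, y_train, x_train[:1], sigma=0.05)
```

Because the GRNN has no iterative weight training, "training" reduces to storing the samples and choosing σ, which is why the patent's grid search over σ with a 0.01 step is the entire model-selection procedure.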
The system of the embodiment of the invention consists of three modules: (1) the device comprises a 3D auditory display module, (2) an audio synthesis module, and (3) a test module.
(1) 3D auditory display module
Following the interface prompts, the subject's 10 key anthropometric parameters and the corresponding orientation parameters are input; the personalized ITD, ILD, HRTF and other parameters can then be visualized, and all subjects' results can be stored.
(2) Audio synthesis module
Controlled by a software program, the system selects audio of different types (pure-tone, noise, speech and music excitation), frequencies (low, middle and high) and durations (1-5 s, in 1 s steps) from an audio library and convolves it with the personalized HRTF to generate audio signals with different spatial orientations.
(3) Test module
A three-dimensional space is reconstructed in the computer by a synchronous mapping algorithm together with computer simulation and 3D image-rendering technology; the display interface presents a spherical space, and the virtual sound-source position is mapped to a detection-point position on the sphere in the virtual simulation interface. The subject interacts via a handheld Bluetooth controller, selecting the virtual spatial position that was heard; the collected detection data are processed in real time by a statistical algorithm to give objective results including accuracy, front-back confusion rate, up-down confusion rate and click position. The specific operation is detailed in the following examples.
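The objective indices named above reduce to elementary bookkeeping over (target, response) pairs. The trial tuples, the horizontal-plane-only setup and the 15° correctness threshold below are illustrative assumptions; the patent does not fix a threshold here.

```python
import math

def localization_stats(trials, tolerance_deg=15.0):
    """trials: (true_azimuth_deg, clicked_azimuth_deg) pairs on the horizontal
    plane, with 0 deg straight ahead. A front-back confusion is a click in the
    hemifield mirrored about the interaural axis from the true source."""
    correct = confused = 0
    for true_az, click_az in trials:
        in_front_true = math.cos(math.radians(true_az)) > 0
        in_front_click = math.cos(math.radians(click_az)) > 0
        if in_front_true != in_front_click:
            confused += 1
        error = abs((true_az - click_az + 180.0) % 360.0 - 180.0)  # wrapped angular error
        if error <= tolerance_deg:
            correct += 1
    n = len(trials)
    return correct / n, confused / n

# Four hypothetical trials with one front-back reversal (30 -> 150 deg).
accuracy, fb_rate = localization_stats([(0, 5), (30, 150), (180, 175), (300, 290)])
```

The wrap-around in the error term matters: a click at 355° for a source at 5° is a 10° error, not 350°, so raw subtraction would misclassify near-median-plane trials.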
Example 1
The system comprises: a 3D auditory display module, an audio synthesis module, a testing module, see fig. 6 and 7.
Following the prompts of the auditory display interface in FIG. 6, the 3D auditory display module first takes the subject's basic information (name, gender, age, contact details), then the 10 key anthropometric parameters (head width, head length, head depth, pinna downward offset, pinna backward offset, concha cavity width, concha cavity height, triangular fossa height, total pinna length and total pinna width), and finally the chosen azimuth (0-360°, 1° precision) and elevation (0-360°, 1° precision); the subject's personalized ITD, ILD, HRTF and other auditory parameters can then be visualized.
The audio synthesis module is controlled by a software program. The system selects original audio signals of different types (pure-tone, noise, speech and music excitation), frequencies (low, middle and high) and durations (1-5 s) from the audio library and generates sound-source signals for different directions (azimuth and elevation); after all parameters are set, 'synthesize audio' is clicked. The specific steps are as follows:
(1) Fourier transforming the different types of original sound signals F (t) in the time domain, resulting in a sound signal F (w) in the frequency domain, as shown in equation (15):
Figure BDA0003832216780000101
where t is the time of the original signal and w is the frequency of the original signal.
(2) Extract the HRTF H(w) for the specified orientation (azimuth θ, elevation φ), and use H(w) to filter the frequency-domain sound signal F(w), generating a new frequency-domain sound signal F(Y), as shown in equation (16):

F(Y) = H(w) · F(w)    (16)

where w is the frequency of the original signal and Y is the frequency of the new signal.
(3) Perform an inverse Fourier transform on the newly generated sound signal F(Y) to obtain the required time-domain three-dimensional virtual sound-source signal f(y), as shown in equation (17):

f(y) = (1/2π) ∫ F(Y) e^(jYy) dY    (17)

where Y is the frequency of the new signal and y is the time variable of the new signal.
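Steps (1)-(3) above can be sketched in a few lines of NumPy. Here the direction-specific HRTF is supplied as a left/right pair of head-related impulse responses whose FFTs give H(w); the function and variable names are illustrative, not from the patent:

```python
import numpy as np

def synthesize_virtual_source(x, hrir_left, hrir_right):
    """Steps (1)-(3): transform the mono excitation to the frequency domain,
    filter it with the left/right HRTF of the chosen (azimuth, elevation),
    and inverse-transform back, yielding a (2, N) binaural signal."""
    x = np.asarray(x, dtype=float)
    n = len(x) + len(hrir_left) - 1        # linear-convolution length
    X = np.fft.rfft(x, n)                  # step (1): F(w)
    # step (2): H(w) * F(w) for each ear; the FFT of the head-related
    # impulse response is the HRTF for that direction
    YL = X * np.fft.rfft(hrir_left, n)
    YR = X * np.fft.rfft(hrir_right, n)
    # step (3): inverse FFT recovers the time-domain virtual source signal
    return np.vstack([np.fft.irfft(YL, n), np.fft.irfft(YR, n)])
```

Zero-padding both signals to the linear-convolution length before multiplying in the frequency domain makes the product equivalent to time-domain convolution of the excitation with the HRIR, which is the filtering the module performs.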
The testing module reconstructs a three-dimensional space in the computer system using a synchronous mapping algorithm together with computer simulation and 3D image rendering techniques; the display interface presents a spherical space, the virtual sound-source positions are mapped to detection points on the sphere in the virtual space, and the system can test the transverse, sagittal, and coronal planes separately. The subject interacts with the system through the testing module (a handheld Bluetooth controller), selecting the virtual spatial position at which the sound was heard; the collected detection data are processed by a statistical algorithm to report the corresponding objective detection parameters in real time, including accuracy, front-back confusion rate, up-down confusion rate, and click position.
Example 2
The testing module of Example 1 is further described below in conjunction with the specific operating procedure, as detailed below: the subject sits upright in front of the screen, wearing in-ear earphones and holding the Bluetooth controller. The operator initializes the 3D auditory display and audio synthesis system by entering the subject's basic information, including the human-body characteristic parameters, and the information for the detection experiment: audio type, audio frequency, audio duration, azimuth, elevation, etc. After initialization, the audio system plays the sound-source signal; once the subject hears it, the subject uses the handheld Bluetooth controller to select the corresponding detection point on the spatial sphere in the system's virtual simulation interface. The whole virtual sphere space is divided into eight parts; the relative coordinates of the sound sources at their actual spatial positions are all mapped into the virtual sphere space, and all mapped sound-source points are displayed simultaneously. During detection, statistical processing reports the detection indices in real time, including correct/no-signal indicator lamps, localization accuracy, front-back confusion rate, up-down confusion rate, and a map of the points actually clicked by the subject; these serve as diversified evaluation indices for the personalized HRTF.
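The patent names the detection indices but does not print their formulas; a sketch under common-usage definitions (the angular tolerance and the hemisphere tests below are assumptions, not the patent's statistical algorithm) might look like:

```python
import math

def localization_metrics(trials, tol_deg=10.0):
    """Compute the objective indices named in the patent from a list of
    (target_az, target_el, resp_az, resp_el) tuples in degrees, with
    azimuth 0 = front, 90 = right, and elevation 0 = horizontal plane."""
    correct = fb = ud = 0
    for taz, tel, raz, rel in trials:
        # a response is "correct" if both angles fall within the tolerance
        az_err = abs((raz - taz + 180.0) % 360.0 - 180.0)
        if az_err <= tol_deg and abs(rel - tel) <= tol_deg:
            correct += 1
        # front-back confusion: response mirrored across the coronal plane
        if math.cos(math.radians(taz)) * math.cos(math.radians(raz)) < 0:
            fb += 1
        # up-down confusion: response on the wrong side of the horizontal plane
        if tel * rel < 0:
            ud += 1
    n = len(trials)
    return {"accuracy": correct / n,
            "front_back_confusion": fb / n,
            "up_down_confusion": ud / n}
```

With this convention a response at 180-degree azimuth to a frontal target counts as a front-back confusion even when its elevation is right, which is why the patent reports the confusion rates separately from overall accuracy.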
Example 3
The feasibility of the above scheme is verified below with reference to Table 4, as detailed below:
Compared with other modeling methods, the present method can customize the personalized HRTF over the full space with small spectral distortion; when applied to the CIPIC database the spectral distortion value remains small, as shown in Table 4, demonstrating good generality. The embodiment of the invention can visualize parameters such as the personalized ITD, ILD, and HRTF, synthesize sound sources at any position in the whole space, and comprehensively evaluate the performance of the personalized HRTF with diversified indices. The embodiment can effectively improve the effect of a virtual auditory reproduction system and has good application prospects in virtual reality and other applications.
TABLE 4 Spectral distortion of personalized HRTFs customized by different methods
(Table 4 appears as an image in the original publication and is not reproduced here.)
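Table 4's evaluation index, spectral distortion, is not given as a formula in the text; the standard RMS log-spectral definition, presumably what is meant, can be computed as follows (the function name and the eps guard are illustrative):

```python
import numpy as np

def spectral_distortion_db(h_measured, h_modeled, eps: float = 1e-12) -> float:
    """Spectral distortion (dB) between a measured and a customized HRTF
    magnitude spectrum: the RMS, over frequency bins, of the log-magnitude
    ratio 20*log10(|H_measured| / |H_modeled|)."""
    h_measured = np.abs(np.asarray(h_measured, dtype=complex))
    h_modeled = np.abs(np.asarray(h_modeled, dtype=complex))
    ratio_db = 20.0 * np.log10((h_measured + eps) / (h_modeled + eps))
    return float(np.sqrt(np.mean(ratio_db ** 2)))
```

Identical spectra give 0 dB, and a uniform factor-of-two magnitude error gives about 6 dB; smaller values in Table 4 therefore indicate a closer match to the measured HRTF.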
In the embodiments of the present invention, unless a device model is specifically described, the models of the devices are not limited, provided that the devices can perform the functions described above.
Those skilled in the art will appreciate that the drawings are only schematic illustrations of preferred embodiments, and that the above embodiments are described for illustration only and do not represent their relative merits.
The above description covers only preferred embodiments of the present invention and is not intended to limit the invention; any modifications, equivalent substitutions, improvements, and the like made within the spirit and principles of the present invention shall be included in its scope of protection.

Claims (10)

1. A method for enhancing virtual auditory reproduction based on a personalized head-related transfer function, the method comprising:
establishing a high-precision head HRTF database, and extracting high-dimensional features of the HRTF;
collecting human body characteristic parameters and screening the preferred parameters; and
customizing the personalized HRTF by using a generalized regression neural network to obtain the optimal auditory reproduction result.
2. The method of claim 1, wherein the step of building a high-precision HRTF database comprises:
(1) acquiring the head-and-neck three-dimensional geometric data of 48 subjects with a 3D laser scanner and performing three-dimensional reconstruction;
(2) importing each head three-dimensional geometric model into Magics software for repair, including hole repair, hole detection, and surface polishing;
(3) uniformly calibrating the spatial position of the head model by setting a reference coordinate system, taking the midpoint of the line connecting the two ears as the coordinate origin, the direction of the nose tip as the positive X axis, the direction of the right ear as the positive Y axis, and the direction above the head as the positive Z axis;
(4) establishing a boundary element simulation calculation model with the multiphysics simulation software COMSOL; during mesh division, the pinna part, which has the largest influence on the HRTF, is divided into an independent solution region, and the remaining part into another solution region.
3. The method for enhancing virtual auditory reproduction based on the personalized head-related transfer function according to claim 1, wherein collecting the human body characteristic parameters and screening the preferred parameters comprises:
measuring with the electronic measuring tool of Magics software to obtain 10 head parameters and 20 ear parameters for each subject; and screening out 10 human body characteristic parameters by combining correlation analysis with recursive feature elimination.
4. The method of claim 1 for enhancing virtual auditory reproduction based on an individualized head-related transfer function, wherein training the generalized regression neural network specifically comprises:
the input X is a 12-dimensional vector composed of the 10 measured human body characteristic parameters (X1, X2, X3, X4, X5, d1, d4, d6, d8, d12), the azimuth angle θ, and the elevation angle φ;
the output Y is a 10-dimensional vector composed of 10 HRTF high-dimensional features (a1, a2, ..., a10).
5. The method of claim 2 for enhancing virtual auditory reproduction based on an individualized head-related transfer function, wherein, in the grid partitioning:
the pinna part follows the highest standard of six grid elements per wavelength, i.e., the pinna model is uniformly meshed at a 1/6-wavelength element size; the remaining part follows a standard of four grid elements per wavelength, i.e., it is uniformly meshed at a 1/4-wavelength element size.
6. The method for enhancing virtual auditory reproduction according to claim 3, wherein the 10 human body characteristic parameters are:
head width, head height, head depth, amount of displacement under the pinna, amount of displacement behind the pinna, pinna height, deltoid height, concha cavity height, pinna width, concha cavity width.
7. A system for enhancing virtual auditory reproduction based on a personalized head-related transfer function, characterized by comprising a 3D auditory display module, an audio synthesis module, and a testing module, wherein:
the 3D auditory display module visualizes the personalized interaural time difference, interaural level difference, and HRTF parameters, and stores all of the subject's related data;
the audio synthesis module selects audio signals of different types, different frequencies, and different durations from an audio library and convolves them with the personalized HRTF to generate audio signals of different spatial orientations;
the testing module reconstructs a three-dimensional space in the computer system by synchronous mapping together with computer simulation and 3D image rendering techniques; the display interface presents a spherical space, and each virtual sound-source position is mapped to a detection point on the sphere in the virtual simulation interface;
the subject selects the heard virtual spatial position through the testing module, and the collected detection data are processed by a statistical algorithm to give the corresponding objective detection parameter results in real time, including overall accuracy, front-back confusion rate, up-down confusion rate, and click position.
8. The system of claim 7, wherein the different types of audio are pure-tone, noise, speech, and music excitation; the frequencies are low, mid, and high; and the duration is 1-5 s in steps of 1 s.
9. The system for enhancing virtual auditory reproduction according to claim 7, wherein the system separately tests three planes: the transverse, sagittal, and coronal planes.
10. The system of claim 7, wherein the sphere space is divided into eight parts; the relative coordinates of the sound sources at actual spatial positions are all mapped into the virtual sphere space, and all mapped sound-source points are displayed simultaneously.
CN202211077500.6A 2022-09-05 2022-09-05 Virtual hearing replay method and system based on personalized head related transfer function Active CN115412808B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211077500.6A CN115412808B (en) 2022-09-05 2022-09-05 Virtual hearing replay method and system based on personalized head related transfer function

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211077500.6A CN115412808B (en) 2022-09-05 2022-09-05 Virtual hearing replay method and system based on personalized head related transfer function

Publications (2)

Publication Number Publication Date
CN115412808A true CN115412808A (en) 2022-11-29
CN115412808B CN115412808B (en) 2024-04-02

Family

ID=84163847

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211077500.6A Active CN115412808B (en) 2022-09-05 2022-09-05 Virtual hearing replay method and system based on personalized head related transfer function

Country Status (1)

Country Link
CN (1) CN115412808B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117177165A (en) * 2023-11-02 2023-12-05 歌尔股份有限公司 Method, device, equipment and medium for testing spatial audio function of audio equipment
CN117437367A (en) * 2023-12-22 2024-01-23 天津大学 Early warning earphone sliding and dynamic correction method based on auricle correlation function

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8428269B1 (en) * 2009-05-20 2013-04-23 The United States Of America As Represented By The Secretary Of The Air Force Head related transfer function (HRTF) enhancement for improved vertical-polar localization in spatial audio systems
CN106535043A (en) * 2016-11-18 2017-03-22 华南理工大学 Full-frequency 3D virtual sound customization method and device based on physiological characteristics
US20170094440A1 (en) * 2014-03-06 2017-03-30 Dolby Laboratories Licensing Corporation Structural Modeling of the Head Related Impulse Response
CN107182003A (en) * 2017-06-01 2017-09-19 西南电子技术研究所(中国电子科技集团公司第十研究所) Airborne three-dimensional call virtual auditory processing method
CN108476358A (en) * 2015-12-31 2018-08-31 创新科技有限公司 A method of for generating customized/personalized head related transfer function
CN108540925A (en) * 2018-04-11 2018-09-14 北京理工大学 A kind of fast matching method of personalization head related transfer function
CN108596016A (en) * 2018-03-06 2018-09-28 北京大学 A kind of personalized head-position difficult labor modeling method based on deep neural network
CN108616789A (en) * 2018-04-11 2018-10-02 北京理工大学 The individualized virtual voice reproducing method measured in real time based on ears
CN109998553A (en) * 2019-04-29 2019-07-12 天津大学 The method of the parametrization detection system and minimum audible angle of spatial localization of sound ability
CN111246363A (en) * 2020-01-08 2020-06-05 华南理工大学 Auditory matching-based virtual sound customization method and device
JP2020170938A (en) * 2019-04-03 2020-10-15 アルパイン株式会社 Head transfer function learning device and head transfer function inference device
CN113038356A (en) * 2019-12-09 2021-06-25 上海航空电器有限公司 Personalized HRTF rapid modeling acquisition method
CN113316077A (en) * 2021-06-27 2021-08-27 高小翎 Three-dimensional vivid generation system for voice sound source space sound effect

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BHARITKAR, S, MAUER, T, WELLS, T,BERFANGER, D: "Stacked Autoencoder Based HRTF Synthesis from Sparse Data", 2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 15 November 2018 (2018-11-15), pages 356 - 361, XP033525853, DOI: 10.23919/APSIPA.2018.8659495 *
刘宝禄, 刘庆峰, 郭小朝 et al.: "Research progress on personalized head-related transfer functions" (in Chinese), Journal of Electronic Measurement and Instrumentation, vol. 34, no. 11, 15 November 2021 (2021-11-15), pages 155 - 165 *
杨立东, 焦慧媛: "Research on key technologies for head-related transfer function acquisition" (in Chinese), Software Guide, vol. 18, no. 1, 31 January 2019 (2019-01-31), pages 34 - 39 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117177165A (en) * 2023-11-02 2023-12-05 歌尔股份有限公司 Method, device, equipment and medium for testing spatial audio function of audio equipment
CN117177165B (en) * 2023-11-02 2024-03-12 歌尔股份有限公司 Method, device, equipment and medium for testing spatial audio function of audio equipment
CN117437367A (en) * 2023-12-22 2024-01-23 天津大学 Early warning earphone sliding and dynamic correction method based on auricle correlation function
CN117437367B (en) * 2023-12-22 2024-02-23 天津大学 Early warning earphone sliding and dynamic correction method based on auricle correlation function

Also Published As

Publication number Publication date
CN115412808B (en) 2024-04-02

Similar Documents

Publication Publication Date Title
JP4718559B2 (en) Method and apparatus for individualizing HRTFs by modeling
CN115412808B (en) Virtual hearing replay method and system based on personalized head related transfer function
CN108596016B (en) Personalized head-related transfer function modeling method based on deep neural network
Francl et al. Deep neural network models of sound localization reveal how perception is adapted to real-world environments
CN108476369A (en) Method and system for developing the head related transfer function for being suitable for individual
Leng et al. Binauralgrad: A two-stage conditional diffusion probabilistic model for binaural audio synthesis
US20040091119A1 (en) Method for measurement of head related transfer functions
Pörschmann et al. Directional equalization of sparse head-related transfer function sets for spatial upsampling
Geronazzo et al. Do we need individual head-related transfer functions for vertical localization? The case study of a spectral notch distance metric
CN106165444B (en) Sound field reproduction apparatus, methods and procedures
Schönstein et al. HRTF selection for binaural synthesis from a database using morphological parameters
JP2009512364A (en) Virtual audio simulation
Tenenbaum et al. Auralization generated by modeling HRIRs with artificial neural networks and its validation using articulation tests
CN113849767B (en) Personalized HRTF (head related transfer function) generation method and system based on physiological parameters and artificial head data
Zagala et al. Comparison of direct and indirect perceptual head-related transfer function selection methods
Barumerli et al. Round Robin Comparison of Inter-Laboratory HRTF Measurements–Assessment with an auditory model for elevation
O’Connor et al. An evaluation of 3D printing for the manufacture of a binaural recording device
Zhang et al. HRTF field: Unifying measured HRTF magnitude representation with neural fields
Zhu et al. HRTF personalization based on weighted sparse representation of anthropometric features
CN113038356A (en) Personalized HRTF rapid modeling acquisition method
Xi et al. Magnitude modelling of individualized HRTFs using DNN based spherical harmonic analysis
Spagnol et al. Estimation of spectral notches from pinna meshes: Insights from a simple computational model
Lokki et al. Auditorium acoustics assessment with sensory evaluation methods
Barumerli et al. Localization in elevation with non-individual head-related transfer functions: comparing predictions of two auditory models
Wang et al. Prediction of head-related transfer function based on tensor completion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant