CN115412808A - Method and system for improving virtual auditory reproduction based on personalized head-related transfer function - Google Patents
- Publication number
- CN115412808A (application CN202211077500.6A)
- Authority
- CN
- China
- Prior art keywords
- hrtf
- head
- personalized
- virtual
- parameters
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R5/00—Stereophonic arrangements
- H04R5/033—Headphones for stereophonic communication
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/30—Image reproducers
- H04N13/332—Displays for viewing with the aid of special glasses or head-mounted displays [HMD]
- H04N13/344—Displays for viewing with the aid of special glasses or head-mounted displays [HMD] with head-mounted left-right displays
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R29/00—Monitoring arrangements; Testing arrangements
Abstract
The invention discloses a method and a system for improving virtual auditory reproduction based on a personalized head-related transfer function. The method comprises the following steps: establishing a high-precision Head-Related Transfer Function (HRTF) database and extracting high-dimensional HRTF features; collecting and screening human characteristic parameters; and customizing the personalized HRTF with a generalized regression neural network. The system consists of a 3D auditory display module, an audio synthesis module and a test module. The 3D auditory display module visualizes and stores the personalized interaural time difference, interaural intensity difference and HRTF; the audio synthesis module selects audio of different types, frequencies and durations from an audio library, convolves it with the personalized HRTF, and synthesizes sound-source signals in different spatial directions; the test module embeds computer simulation and 3D rendering technology, so that an operator can evaluate the virtual auditory playback effect of the personalized HRTF in a diversified manner through human-computer interaction.
Description
Technical Field
The invention relates to the technical field of virtual auditory display, and in particular to a method and a system that establish a personalized Head-Related Transfer Function (HRTF) model suitable for binaural headphone devices and generate a model matched to the user to improve virtual auditory playback.
Background
Human hearing includes, in addition to the perception of the timbre, loudness, pitch and duration of sound, the perception of its spatial attributes. The auditory system's perception of spatial properties mainly depends on information such as the interaural time difference (ITD), the interaural intensity difference (IID) and spectral factors, all of which can be uniformly expressed by HRTFs.
Humans can construct virtual auditory spaces with sound reproduction techniques, i.e., techniques that let listeners form specific spatial perceptions. With the development of artificial-intelligence audio-visual technology, the construction of immersive virtual auditory space is becoming a key component of realistic virtual reality. Sound reproduction technology can be divided into two categories according to the reproduction device. The first is multi-channel loudspeaker playback, involving acoustic holographic playback, spherical harmonic decomposition, wave field synthesis and so on. In practice, although this approach can accurately construct the sound field in a certain region, it requires a large number of loudspeakers arranged in arrays and places strict demands on their configuration, which limits its development. The second category is two-channel headphone playback, mainly involving binaural pickup and virtual auditory playback. The key to binaural pickup is capturing the binaural signals with microphones, and the technical core of virtual auditory reproduction is the construction of the HRTF. Both aim to reconstruct the binaural sound pressure accurately rather than a physical sound field in a spatial region. Since only a pair of two-channel headphones is needed to generate three-dimensional sound, this convenient and practical approach has quickly become the mainstream way of constructing virtual auditory space.
HRTFs depend heavily on human characteristic parameters related to sound reflection, diffraction and scattering, which are unique to each person. If a common HRTF is applied to all individuals, sound-source perception errors easily arise, for example front-back confusion, up-down confusion, angular deviation and in-head localization. Constructing a personalized HRTF is therefore the key to constructing an immersive virtual auditory space.
At present, the main methods for obtaining personalized HRTFs are acoustic measurement, numerical calculation and customization from human body features, but each has limitations: acoustic measurement suffers from long acquisition times, expensive equipment and demanding acquisition environments; numerical calculation requires specific devices (MRI, CT, etc.) to acquire the head model; and most feature-based customization methods rely on public databases and yield personalized HRTFs of low precision, so none of them transfers well to practice.
Disclosure of Invention
The invention provides a method and a system for improving virtual auditory replay based on a personalized head-related transfer function. The invention quickly customizes a full-space personalized HRTF from human body characteristic parameters at low cost, in five main steps: establishing a high-precision HRTF database, extracting high-dimensional HRTF features, acquiring human body characteristic parameters, screening those parameters, and establishing the personalized HRTF. The system consists of a 3D auditory display module, an audio synthesis module and a test module, and can evaluate the virtual auditory playback effect of the personalized HRTF in a diversified and accurate manner. The invention effectively improves the effect of the virtual auditory reproduction system and has good application prospects in virtual reality and similar applications, as described in detail below:
a method for enhancing virtual auditory reproduction based on a personalized head-related transfer function, the method comprising:
establishing a high-precision HRTF database, and extracting high-dimensional characteristics of the HRTF;
collecting human body characteristic parameters and screening the preferred parameters;
and customizing the personalized HRTF by using the generalized regression neural network to obtain the optimal auditory reproduction result.
Wherein, the step of establishing the high-precision HRTF database comprises the following steps:
(1) Acquiring the head-neck three-dimensional geometric data of 48 subjects with a 3D laser scanner and performing three-dimensional reconstruction;
(2) Importing the head three-dimensional geometric model into Magics software for repair, including hole repair, hole detection and roughening treatment;
(3) The spatial position of the head model is uniformly calibrated by setting a reference coordinate system: the midpoint of the line connecting the two ears is the coordinate origin, the direction of the nose tip is the positive X axis, the direction of the right ear is the positive Y axis, and the top of the head is the positive Z axis;
(4) A boundary element simulation calculation model is established with the multiphysics simulation software COMSOL; during mesh division, the pinna, which has the largest influence on the HRTF, is divided into an independent solving region, and the remaining part forms another solving region.
Further, the human body characteristic parameters are collected and screened as follows:
the electronic measuring tools of Magics software measure 10 head parameters and 20 ear parameters for each subject, and 10 human body characteristic parameters are selected by combining correlation analysis with recursive feature elimination.
Wherein the method comprises the following steps: the training of the generalized regression neural network is specifically:
the input X is a 12-dimensional vector consisting of 10 measured human body parameters (x_1, x_2, x_3, x_4, x_5, d_1, d_4, d_6, d_8, d_12), the azimuth θ and the elevation φ; the output Y is a 10-dimensional vector consisting of the 10 HRTF high-dimensional features (a_1, a_2, ..., a_10).
Further, when the grid is divided:
the pinna follows the strictest standard of 6 mesh elements per wavelength, i.e. the pinna model is uniformly meshed with a 1/6-wavelength element size; the remaining part follows a standard of 4 mesh elements per wavelength, i.e. it is uniformly meshed with a 1/4-wavelength element size.
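These meshing rules can be sketched numerically: the maximum element size follows from the wavelength at the highest simulated frequency. The speed of sound and upper frequency below are assumed values for illustration, not taken from the patent.

```python
# Sketch: maximum boundary-element edge length from the "6 elements per
# wavelength" (pinna) and "4 elements per wavelength" (rest) rules.
# c = 343 m/s and f_max = 20 kHz are assumptions for this example.

def max_element_size(speed_of_sound_m_s, f_max_hz, elements_per_wavelength):
    """Largest allowed mesh edge length in metres for the given rule."""
    wavelength = speed_of_sound_m_s / f_max_hz
    return wavelength / elements_per_wavelength

c = 343.0          # speed of sound in air, m/s (assumed)
f_max = 20000.0    # upper simulation frequency, Hz (assumed)

pinna_size = max_element_size(c, f_max, 6)   # 1/6-wavelength rule
rest_size = max_element_size(c, f_max, 4)    # 1/4-wavelength rule

print(f"pinna mesh <= {pinna_size * 1000:.2f} mm")
print(f"other mesh <= {rest_size * 1000:.2f} mm")
```

The finer pinna rule directly reflects that the pinna geometry shapes the high-frequency HRTF features most strongly.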
Wherein, the 10 human body characteristic parameters are as follows:
head width, head height, head depth, offset below the pinna, offset behind the pinna, pinna height, fossa triangularis height, concha cavity height, pinna width, concha cavity width.
A system for enhancing virtual auditory reproduction based on a personalized head-related transfer function, composed of a 3D auditory display module, an audio synthesis module and a test module, wherein:
the 3D auditory display module visualizes the personalized ITD, ILD and HRTF parameters and stores all subjects' related data;
the audio synthesis module selects audio signals of different types, frequencies and durations from an audio library and convolves them with the personalized HRTF to generate audio signals of different spatial orientations;
the testing module reconstructs a three-dimensional space in the computer system through synchronous mapping, computer simulation and 3D image rendering; the display interface presents a sphere space, and the virtual sound-source positions are mapped to detection points on the sphere in the virtual simulation surface;
the subject selects the perceived virtual spatial position through the test module, and the collected detection data are processed in real time by statistical algorithms into objective detection results, including global accuracy, front-back confusion rate, up-down confusion rate and click position.
Wherein the different audio types are pure-tone excitation, noise excitation, speech excitation and music excitation; the frequencies are low, middle and high; the duration is 1–5 s in 1 s steps.
Wherein the system tests three planes: the transverse, sagittal and coronal planes.
Further, the sphere space is divided into eight regions; the relative coordinates of sound sources at actual spatial positions are all mapped into the virtual sphere space, and all mapped sound-source point positions are displayed simultaneously.
The technical scheme provided by the invention has the beneficial effects that:
1. the personalized HRTF customization method established by the invention achieves low-cost, rapid customization of the full-space personalized HRTF from human body characteristic parameters; compared with a rapid HRTF measurement system, it does not depend on a specific measurement environment and has low equipment cost;
2. compared with directly computing the personalized HRTF numerically, the method has a small computational load; compared with other modeling methods, it can customize the full-space personalized HRTF with small spectral distortion, and when applied to the CIPIC database the spectral distortion remains small, showing good generality;
3. the invention can visualize parameters such as the personalized ITD, ILD and HRTF, synthesize sound sources at any position in the full space, and evaluate the performance of the personalized HRTF comprehensively and in a diversified manner.
Drawings
FIG. 1 is an overall architecture diagram for enhancing virtual auditory reproduction based on a personalized head-related transfer function;
FIG. 2 is a flow chart of an HRTF database creation for enhancing virtual auditory reproduction based on personalized head-related transfer functions;
FIG. 3 is a diagram of 30 parameters of human body characteristics;
FIG. 4 is a schematic diagram comparing the spectral distortion of the proposed method with that of a radial basis function neural network and a standard artificial head at different azimuth angles;
FIG. 5 is a schematic diagram comparing the spectral distortion of the proposed method with that of a radial basis function neural network and a standard artificial head at different frequencies;
FIG. 6 is a block diagram of a 3D auditory display system;
FIG. 7 is a schematic diagram of an audio synthesis and test module.
Detailed Description
To make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below.
To address the problems in the background art and the shortcomings of personalized HRTF design in the field of virtual hearing, the embodiment of the invention provides a more practical personalized HRTF acquisition method built on existing methods, improving the virtual auditory playback effect, while the system can evaluate the virtual auditory playback performance of the personalized HRTF in a diversified manner.
The embodiment of the invention provides a method for improving virtual auditory reproduction based on a personalized HRTF that can acquire HRTF data at arbitrary spatial positions for different subjects, meeting the requirements of high spatial resolution and personalization. The technical scheme comprises the following steps:
step 1: establishing a high-precision HRTF database;
wherein, this step includes:
(1) The head-neck three-dimensional geometric data of 48 subjects are acquired with a 3D laser scanner. No special posture is required during scanning; the subject only needs to adopt a comfortable posture, which reduces noise caused by head movement;
(2) The three-dimensional geometric model is imported into Magics software for model repair, including hole repair, hole detection, roughening treatment and the like;
(3) The spatial position of the head model is uniformly calibrated by setting a reference coordinate system, namely, the middle point of a connecting line of two ears of the head of a person is used as a coordinate origin, the positive direction of the nose tip is used as the positive direction of an X axis, the direction of the right ear is used as the positive direction of a Y axis, and the upper part of the head is used as the positive direction of a Z axis;
(4) The pressure acoustics boundary element module of the multiphysics simulation software COMSOL is used for solving: a boundary element simulation calculation model is established by configuring the sound-field environment, setting parameters and configuring the solver. A regional meshing method is applied during mesh division: the head model is divided into two parts, where the pinna, which has the largest influence on the HRTF, forms an independent solving region meshed to the strictest standard of 6 elements per wavelength (i.e. a uniform 1/6-wavelength element size), while the remaining part forms another solving region meshed to a standard of 4 elements per wavelength (i.e. a uniform 1/4-wavelength element size), so that the HRTF data can be obtained quickly.
The parameters of the high-precision HRTF database constructed in the embodiment of the invention are detailed in Table 1:
Table 1: High-precision HRTF database parameters
Step 2: extracting HRTF high-dimensional features;
The high-dimensional features of the HRTF are then extracted with a singular value decomposition method. First, the collected HRTF data are converted to logarithmic form, as shown in formula (1):
HRTF_log(s, m, f) = 20 · log10( |HRTF(s, m, f)| )    (1)
where s = 1, 2, ..., S (S is the number of subjects), m = 1, 2, ..., M (M is the number of directions), and f = 1, 2, ..., N (N is the number of frequency sampling points). HRTF_log(s, m, f) is then decomposed into a direction-dependent directional transfer function D_log(s, m, f) and a direction-independent average spectral function HRTF_mean(f), as shown in formulas (2) and (3):
D_log(s, m, f) = HRTF_log(s, m, f) − HRTF_mean(f)    (2)

HRTF_mean(f) = (1 / (S · M)) · Σ_s Σ_m HRTF_log(s, m, f)    (3)
in this case, only D is needed log (s, M, f) reducing dimensions, taking data of a subject as an example, each HRTF in M directions has N discrete frequency sampling points, and a matrix shown in formula (4) is constructed:
wherein D is log (m M ,f N ) The HRTF value of the nth discrete frequency sampling point in the mth direction of the subject.
For any real matrix, its singular value decomposition can be expressed as shown in equation (5):
[D_log(s, m, f)] = U Σ V^T    (5)
where U = (u_1, u_2, ..., u_M) ∈ R^(M×M) and V = (v_1, v_2, ..., v_N) ∈ R^(N×N) are the left and right singular matrices, both orthogonal, with u_p ∈ R^(M×1) and v_q ∈ R^(N×1) the left and right singular vectors. Σ = diag(σ_1, σ_2, ..., σ_r) is a diagonal matrix whose elements, the singular values, are arranged in descending order, i.e. σ_1 ≥ σ_2 ≥ ... ≥ σ_r ≥ 0.
The weight coefficients and principal components of principal component analysis are obtained indirectly from the singular value decomposition, and the singular matrices themselves are not suitable for visualization; the important features extracted after dimension reduction are therefore expressed as principal components W_i and weight coefficients a_i. The HRTF log-magnitude spectrum can thus be decomposed as shown in formula (6):

D_log(m, f) ≈ Σ_{i=1..k} a_i(m) · W_i(f)    (6)

Finally, the cumulative variance percentage Var expresses the HRTF reconstruction quality, as shown in formula (7):

Var = ( Σ_{j=1..k} σ_j² / Σ_{j=1..r} σ_j² ) × 100%    (7)

where σ_j is the j-th singular value of the matrix [D_log(s, m, f)] and r is its rank.
Verification shows that with the first 10 principal components the cumulative variance contribution reaches 90%, so the first 10 high-dimensional features {a_1, a_2, ..., a_10} recover most of the HRTF spectral features.
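The extraction pipeline of formulas (1)–(7) can be sketched with numpy's SVD. The HRTF magnitudes below are random stand-ins for measured data, and the matrix sizes are illustrative:

```python
import numpy as np

# Sketch of the feature extraction of formulas (1)-(7) for one subject.
# M directions, N frequency sampling points; random stand-in magnitudes.
rng = np.random.default_rng(0)
M, N = 72, 128
hrtf = np.abs(rng.standard_normal((M, N))) + 1e-3   # stand-in |HRTF(m, f)|

hrtf_log = 20.0 * np.log10(hrtf)       # formula (1): log-magnitude
hrtf_mean = hrtf_log.mean(axis=0)      # formula (3): mean over directions
d_log = hrtf_log - hrtf_mean           # formula (2): directional transfer fn

# formula (5): SVD of the M x N matrix of formula (4)
U, s, Vt = np.linalg.svd(d_log, full_matrices=False)

# formula (7): cumulative variance of the first k singular values
var = np.cumsum(s ** 2) / np.sum(s ** 2)
k = int(np.searchsorted(var, 0.90)) + 1   # components for >= 90 % variance

# formula (6): weight coefficients a_i (per direction) and principal
# components W_i (spectral shapes) of the retained features
weights = U[:, :k] * s[:k]
components = Vt[:k]
print("retained components:", k)
```

With real HRTF data the patent reports that k = 10 already reaches 90% cumulative variance; with random stand-ins k is naturally larger.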
And step 3: collecting human body characteristic parameters;
according to the < < GB/T22187-2009 > measurement standard of human characteristic parameters, 30 human characteristic parameters are measured by an electronic measuring tool SolidWorks software based on a three-dimensional head-neck geometric model, wherein the human characteristic parameters comprise 10 head parameters and 20 ear parameters, and the details are shown in Table 2.
Table 2: 30 human body characteristic parameters
And 4, step 4: human body characteristic parameters are optimized;
the human body characteristic parameters are further optimized by utilizing correlation analysis and recursive characteristic elimination. Firstly, a Spearman correlation coefficient is adopted to screen characteristics with high correlation, as shown in formula (8):
wherein x is i And y i Are two different anthropometric parameters of the same subject,andis the average of all two parameters tested, i =1,2 … …, and M is the number tested. Human body characteristic parameters with the correlation coefficient larger than 0.8 are removed through the step (one of the two parameters is selected).
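A numpy-only sketch of this screening step, with a hand-rolled Spearman coefficient matching formula (8) applied to rank-transformed values; the data are random stand-ins (rows are subjects, columns are parameters), and one near-duplicate column is planted to show the elimination:

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation coefficient (ties assumed absent)."""
    rx = np.argsort(np.argsort(x)).astype(float)   # ranks of x
    ry = np.argsort(np.argsort(y)).astype(float)   # ranks of y
    rx -= rx.mean()
    ry -= ry.mean()
    return float(rx @ ry / np.sqrt((rx @ rx) * (ry @ ry)))

rng = np.random.default_rng(1)
n_subjects, n_params = 48, 6
X = rng.standard_normal((n_subjects, n_params))
X[:, 1] = X[:, 0] + 0.05 * rng.standard_normal(n_subjects)  # near-duplicate

kept = []
for j in range(n_params):
    # keep parameter j only if |rho| <= 0.8 against every kept parameter
    if all(abs(spearman(X[:, j], X[:, k])) <= 0.8 for k in kept):
        kept.append(j)

print("kept parameter indices:", kept)
```

The planted near-duplicate of column 0 is dropped while the independent columns survive, mirroring the "keep one of each correlated pair" rule.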
Secondly, recursive feature elimination is used to retain the human body feature parameters with the largest influence on the HRTF. The specific steps are:
(1) Setting an initial weight coefficient for the rest human body characteristic parameters;
(2) Constructing a Logistic Regression equation to train the characteristic parameters;
(3) Extracting the weight values of the characteristic parameters and removing the parameter whose weight has the smallest absolute value;
(4) Iterating the previous steps until the number of remaining characteristic parameters reaches the set quantity.
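Steps (1)–(4) can be sketched as follows. A least-squares linear model stands in for the patent's logistic regression so the example stays numpy-only, and the data are synthetic stand-ins with 10 genuinely informative features:

```python
import numpy as np

# Recursive feature elimination sketch: fit, drop the feature with the
# smallest |weight|, repeat until n_keep features remain.
rng = np.random.default_rng(2)
n_subjects, n_features, n_keep = 48, 20, 10
X = rng.standard_normal((n_subjects, n_features))
true_w = np.zeros(n_features)
true_w[:n_keep] = rng.uniform(1.0, 3.0, n_keep)   # informative features
y = X @ true_w + 0.1 * rng.standard_normal(n_subjects)

remaining = list(range(n_features))
while len(remaining) > n_keep:                    # step (4): iterate
    # step (2): fit a linear model on the remaining features
    w, *_ = np.linalg.lstsq(X[:, remaining], y, rcond=None)
    worst = int(np.argmin(np.abs(w)))             # step (3): min |weight|
    remaining.pop(worst)                          # eliminate that feature

print("selected feature indices:", sorted(remaining))
```

Because the first 10 synthetic features carry large true weights, the elimination loop discards the uninformative ones round by round.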
Finally, 10 human body characteristic parameters are selected, as detailed in Table 3:
table 3 preferred 10 parameters of human body characteristics
And 5: customizing an individualized HRTF model;
the customized HRTF is customized by utilizing a Generalized Regression Neural Network (GRNN). GRNN is composed of input layer, mode layer, summation layer and output layer, and the input vector of network is X = [ X = [ ] 1 ,x 2 ,...,x n ] T The output vector is Y = [ Y = 1 ,y 2 ,...,y k ] T 。
The number of input-layer neurons equals the input dimension m of the training samples, and the input layer passes the input vector to the pattern layer. The number of pattern-layer neurons equals the number n of training samples; the activation function is a radial basis Gaussian, as shown in formula (9):

p_a = exp( −(X − X_a)^T (X − X_a) / (2σ²) )    (9)

where X is the input vector, X_a is the center of the a-th pattern-layer neuron (the input of the a-th training sample), (X − X_a)^T (X − X_a) is the squared Euclidean distance between X and X_a, and σ is the width of the radial basis function, determining its shape in the a-th pattern-layer neuron.
The summation-layer neurons are of two types. One neuron arithmetically sums the outputs of all pattern-layer neurons, with connection weight 1 between every pattern-layer neuron and this summation neuron; its transfer function is shown in formula (10):

S_D = Σ_{a=1..n} p_a    (10)
the other summation layer neurons carry out weighted summation on the outputs of all the mode layer neurons, and the connection weight value between the a mode layer neuron and the b summation layer neuron is the output Y of the a training sample a The b-th element y in (1) ab The transfer function of the summing neuron b is shown in equation (11):
the output layer neuron number is the dimension of the output vector, the output layer neuron divides the outputs of the two types of neurons of the summation layer, the output of the b-th output layer neuron corresponds to the b-th element of the output vector, as shown in equation (12):
in constructing the personalized HRTF model, the input vector X is a 12-dimensional vector, as shown in equation (13), which is measured by 10 human measurement parameters (X) 1 ,X 2 ,X 3 ,X 4 ,X 5 ,d 1 ,d 4 ,d 6 ,d 8 ,d 12 ) And 2 direction parameters (azimuth theta and elevation angle)) And (4) forming.
By setting the addition of the orientation parameters, the model can be trained and personalized HRTFs in different directions can be customized at the same time, and the practicability of the customized model by the method is enhanced. The extracted 10 HRTF high-dimensional features are used as model output, and the expression (14) is shown as follows:
Y={a 1 ,a 2 ,...,a 10 } (14)
The GRNN customization model is trained as follows:
(1) Normalize the independent and dependent variables to zero mean and unit variance, and shuffle the samples;
(2) Randomly divide the HRTF data sets of the 48 subjects into 40 training subjects and 8 validation subjects;
(3) Given the size of the training set, train, tune and verify the proposed GRNN model with 5-fold cross-validation;
(4) Search for the optimal smoothing parameter σ with a grid search, varying σ over a set range with a step size of 0.01;
(5) After the above steps, the smoothing factor σ = 0.60 gives the best performance, i.e. the minimum mean square error.
Finally, a GRNN model with 12-dimensional input and 10-dimensional output was designed with smoothing factor σ = 0.60 and trained on the complete training data set, completing the construction of the personalized HRTF model. The personalized ITD and ILD models of a subject can be reconstructed in the same way, which is not repeated here.
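A minimal numpy sketch of the GRNN of formulas (9)–(12) together with the σ grid search of the training procedure. The training and validation data are random stand-ins, and the σ search range is an assumption; only the step size of 0.01 comes from the text:

```python
import numpy as np

# GRNN following formulas (9)-(12): pattern layer of radial basis
# Gaussians, two kinds of summation neurons, dividing output layer.

def grnn_predict(X_train, Y_train, x, sigma):
    """Predict one output vector for input x (formulas 9-12)."""
    d2 = np.sum((X_train - x) ** 2, axis=1)   # (X - X_a)^T (X - X_a)
    p = np.exp(-d2 / (2.0 * sigma ** 2))      # formula (9): pattern layer
    s_d = p.sum() + 1e-12                     # formula (10), eps for safety
    s_b = Y_train.T @ p                       # formula (11): weighted sums
    return s_b / s_d                          # formula (12): output layer

rng = np.random.default_rng(3)
X_train = rng.standard_normal((40, 12))       # 40 training subjects, 12-D in
Y_train = rng.standard_normal((40, 10))       # 10 HRTF features each
X_val = rng.standard_normal((8, 12))          # 8 validation subjects
Y_val = rng.standard_normal((8, 10))

# Grid search for sigma with step 0.01; the 0.01-2.0 range is assumed.
best = min(
    (float(np.mean([(grnn_predict(X_train, Y_train, x, s) - y) ** 2
                    for x, y in zip(X_val, Y_val)])), s)
    for s in np.arange(0.01, 2.0, 0.01)
)
print("best sigma:", round(best[1], 2), "validation MSE:", round(best[0], 3))
```

A GRNN has no iteratively trained weights: the training samples themselves are the pattern-layer centers, so only σ needs tuning, which is why a simple grid search suffices.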
The system of the embodiment of the invention consists of three modules: (1) the device comprises a 3D auditory display module, (2) an audio synthesis module, and (3) a test module.
(1) 3D auditory display module
Following the interface prompts, the 10 key anthropometric parameters of the subject and the corresponding direction parameters are entered; the personalized ITD, ILD, HRTF and other parameters can then be visualized, and the results of all subjects can be saved.
(2) Audio synthesis module
Controlled by a software program, the system can select audio of different types (pure-tone, noise, speech and music excitation), different frequencies (low, intermediate and high) and different durations (1-5 s, in 1 s steps) from an audio library and convolve it with the personalized HRTF to generate audio signals with different spatial orientations.
(3) Test module
Using computer simulation and 3D image rendering technology, a three-dimensional space is reconstructed in the computer system with a synchronous mapping algorithm; the display interface presents a sphere, and the position of each virtual sound-source point is mapped to a detection point on the sphere in the virtual simulation interface. The subject interacts through a handheld Bluetooth controller, i.e. selects the perceived virtual spatial position, and the detection data collected by the system are processed by a statistical algorithm to give the corresponding objective detection results in real time, including accuracy, front-back confusion rate, up-down confusion rate, click position and so on. The specific operation is described in detail in the following examples.
Example 1
The system comprises a 3D auditory display module, an audio synthesis module and a test module; see Figs. 6 and 7.
In the 3D auditory display module, following the prompts of the auditory display interface of Fig. 6, basic information such as the subject's name, gender, age and contact information is entered in turn; then the 10 key anthropometric parameters (head width, head length, head depth, pinna-down offset, pinna-back offset, concha cavity width, concha cavity height, fossa triangularis height, total pinna length and total pinna width) are input, and the corresponding azimuth (0-360°, 1° precision) and elevation (0-360°, 1° precision) are selected; the subject's personalized ITD, ILD, HRTF and other auditory parameters can then be visualized.
The audio synthesis module is controlled by a software program. The system can select original audio signals of different types (pure-tone, noise, speech and music excitation), different frequencies (low, intermediate and high) and different durations (1-5 s) from the audio library to generate sound-source signals in different directions (azimuth and elevation); after all parameters are set, "synthesize audio" is clicked. The specific steps are as follows:
(1) Fourier-transform the original time-domain sound signal f(t) of the selected type to obtain the frequency-domain sound signal F(w), as shown in equation (15):

F(w) = ∫ f(t) e^(-jwt) dt (15)
where t is the time of the original signal and w is the frequency of the original signal.
(2) Extract the personalized HRTF of the specified direction and use it to filter the frequency-domain sound signal F(w), generating a new frequency-domain sound signal F(Y), as shown in equation (16):

F(Y) = F(w) · H(w) (16)

where H(w) is the personalized HRTF of the specified direction.
where w is the frequency of the original signal and Y is the frequency of the new signal.
(3) Perform the inverse Fourier transform on the newly generated sound signal F(Y) to obtain the desired time-domain three-dimensional virtual sound-source signal f(y), as shown in equation (17):

f(y) = (1/2π) ∫ F(Y) e^(jYy) dY (17)
where Y is the frequency of the new signal and y is the time of the new signal.
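Steps (1)-(3) above amount to standard frequency-domain filtering: transform the source signal, multiply by the HRTF spectrum of the chosen direction, and transform back. A minimal numpy sketch under that reading (the "HRTF" spectrum here is a made-up placeholder, not a measured one):

```python
import numpy as np

fs = 44100                                # sample rate in Hz
t = np.arange(fs) / fs                    # 1 s time axis
f_t = np.sin(2 * np.pi * 440 * t)         # original signal: a 440 Hz pure tone

F_w = np.fft.rfft(f_t)                    # step (1): time -> frequency domain
H_w = np.ones_like(F_w)                   # placeholder "HRTF" spectrum for one
H_w[len(H_w) // 2:] *= 0.5                # direction (real use: measured HRTF)
F_Y = F_w * H_w                           # step (2): filtering = multiplication
f_y = np.fft.irfft(F_Y, n=len(f_t))       # step (3): back to the time domain

print(f_y.shape)  # (44100,)
```

A binaural rendering would apply the left-ear and right-ear HRTFs separately to produce the two earphone channels.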
The test module reconstructs a three-dimensional space in the computer system with a synchronous mapping algorithm, using computer simulation and 3D image rendering technology; the display interface presents a sphere, the position of each virtual sound-source point is mapped to a detection point on the sphere in the virtual space, and the system can test the transverse, sagittal and coronal planes separately. The subject interacts through the test module (a handheld Bluetooth controller), i.e. selects the perceived virtual spatial position, and the detection data collected by the system are processed by a statistical algorithm to give the corresponding objective detection results in real time, including accuracy, front-back confusion rate, up-down confusion rate, click position and so on.
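The mapping from a sound-source direction to a detection point on the display sphere can be sketched as follows. The head-centred axes follow the calibration described earlier (nose tip = +X, right ear = +Y, above the head = +Z); the convention that azimuth is measured from +X toward +Y is an assumption for illustration:

```python
import math

def sphere_point(azimuth_deg, elevation_deg, radius=1.0):
    """Map an (azimuth, elevation) direction to a point on the display
    sphere. Axes: +X toward the nose tip, +Y toward the right ear,
    +Z above the head; azimuth measured from +X toward +Y (assumed)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = radius * math.cos(el) * math.cos(az)
    y = radius * math.cos(el) * math.sin(az)
    z = radius * math.sin(el)
    return (x, y, z)

print(sphere_point(0, 0))  # straight ahead: (1.0, 0.0, 0.0)
```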
Example 2
The test module of Example 1 is further described below with a specific operating procedure. The subject sits upright in front of the screen, wearing in-ear earphones and holding the Bluetooth controller. The operator initializes the 3D auditory display and audio synthesis system by entering the subject's basic information, such as the anthropometric parameters, and the information relevant to the experiment, including the audio type, audio frequency, audio duration, azimuth and elevation. After initialization, the audio system plays the sound-source signal; once the subject hears it, he or she uses the handheld Bluetooth controller to select the corresponding detection point on the sphere in the virtual simulation interface. The whole virtual sphere is divided into eight parts; the relative coordinates of the sound sources at the actual spatial positions are all mapped into the virtual sphere, and all mapped sound-source points are displayed simultaneously. During detection, the detection indices are given in real time by statistical processing, including correct/no-signal indicator lamps, localization accuracy, front-back confusion rate, up-down confusion rate and a map of the points actually clicked by the subject, which serve as diversified evaluation indices of the personalized HRTF.
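The objective indices listed above can be computed from the clicked and true directions. The patent does not give explicit formulas, so the tolerance and mirroring conventions in this sketch are illustrative assumptions:

```python
def localization_stats(trials, tol_deg=15.0):
    """Accuracy and front-back / up-down confusion rates for one session.

    trials: list of (true_az, true_el, clicked_az, clicked_el) in degrees.
    Assumed conventions (illustrative, not from the patent): a click within
    tol_deg of the target in both angles is correct; a front-back confusion
    lands near the target mirrored about the interaural axis (az -> 180 - az);
    an up-down confusion lands near the elevation-mirrored target (el -> -el).
    """
    def ang_diff(a, b):
        # smallest absolute difference between two angles, in degrees
        return abs(((a - b + 180.0) % 360.0) - 180.0)

    n = len(trials)
    correct = fb = ud = 0
    for ta, te, ca, ce in trials:
        if ang_diff(ca, ta) <= tol_deg and abs(ce - te) <= tol_deg:
            correct += 1
        mirrored_az = (180.0 - ta) % 360.0
        # skip lateral targets that mirror onto themselves
        if ang_diff(ta, mirrored_az) > tol_deg and ang_diff(ca, mirrored_az) <= tol_deg:
            fb += 1
        if abs(ce + te) <= tol_deg and abs(te) > tol_deg:
            ud += 1
    return {"accuracy": correct / n,
            "front_back_rate": fb / n,
            "up_down_rate": ud / n}

# Three hypothetical trials: one correct response, one front-back
# confusion, one up-down confusion.
print(localization_stats([(0, 0, 0, 0), (0, 0, 175, 0), (0, 30, 2, -28)]))
```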
Example 3
The feasibility of the above scheme is verified below in conjunction with Table 4:
Compared with other modeling methods, the present method can customize the full-space personalized HRTF with small spectral distortion; when applied to the CIPIC database, the spectral distortion value remains small, as shown in Table 4, demonstrating good generality. The embodiment of the invention can visualize parameters such as the personalized ITD, ILD and HRTF, synthesize sound sources at any position in the whole space, and comprehensively evaluate the performance of the personalized HRTF with diversified indices. The embodiment can effectively improve the effect of a virtual auditory reproduction system and has good application prospects in virtual reality and other applications.
Table 4. Spectral distortion of personalized HRTFs customized by different methods
In the embodiments of the present invention, except where specifically stated, the models of the devices are not limited, as long as the devices can perform the functions described above.
Those skilled in the art will appreciate that the drawings are only schematic illustrations of preferred embodiments, and the above-described embodiments of the present invention are merely provided for description and do not represent the merits of the embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (10)
1. A method for enhancing virtual auditory reproduction based on a personalized head-related transfer function, the method comprising:
establishing a high-precision head HRTF database, and extracting high-dimensional features of the HRTF;
collecting human body characteristic parameters and screening the optimal human body characteristic parameters;
and customizing the personalized HRTF by using a generalized regression neural network to obtain the optimal auditory reproduction result.
2. The method of claim 1, wherein the step of building a high-precision HRTF database comprises:
(1) Acquiring head-and-neck three-dimensional geometric data of 48 subjects with a 3D laser scanner and performing three-dimensional reconstruction;
(2) Importing the three-dimensional head geometric model into Magics software for repair, the repair comprising: repairing holes, detecting holes and roughening;
(3) Uniformly calibrating the spatial position of the head model by setting a reference coordinate system, with the midpoint of the line connecting the two ears as the coordinate origin, the direction of the nose tip as the positive X axis, the direction of the right ear as the positive Y axis, and the direction above the head as the positive Z axis;
(4) Establishing a boundary element simulation calculation model with the multiphysics simulation software COMSOL; during meshing, the pinna, which has the largest influence on the HRTF, is divided into an independent solving region, and the remainder is divided into another solving region.
3. The method for enhancing virtual auditory reproduction based on the personalized head-related transfer function according to claim 1, wherein the collecting of human body characteristic parameters and the screening of the optimal human body characteristic parameters are:
measuring 10 head parameters and 20 ear parameters of each subject with the electronic measuring tools of the Magics software; and screening out 10 optimal human body characteristic parameters by combining correlation analysis with recursive feature elimination.
4. The method of claim 1 for enhancing virtual auditory reproduction based on a personalized head-related transfer function, wherein the training of the generalized regression neural network specifically comprises:
the input X is a 12-dimensional vector composed of 10 human body measurement parameters (X1, X2, X3, X4, X5, d1, d4, d6, d8, d12), the azimuth θ and the elevation φ; the output Y is a 10-dimensional vector composed of 10 HRTF high-dimensional features (a1, a2, ..., a10).
5. The method of claim 2 for enhancing virtual auditory reproduction based on an individualized head-related transfer function, wherein, in the grid partitioning:
the pinna region follows the strictest criterion of 6 elements per wavelength, i.e. the pinna model is uniformly meshed at 1/6 of the wavelength; the remaining region follows the criterion of 4 elements per wavelength, i.e. it is uniformly meshed at 1/4 of the wavelength.
6. The method for enhancing virtual auditory reproduction according to claim 3, wherein the 10 human body characteristic parameters are:
head width, head height, head depth, pinna-down offset, pinna-back offset, pinna height, fossa triangularis height, concha cavity height, pinna width, concha cavity width.
7. A system for improving virtual auditory reproduction based on personalized head-related transfer function is characterized by comprising a 3D auditory display module, an audio synthesis module and a test module,
the 3D auditory display module visualizes the personalized interaural time difference, interaural level difference and HRTF parameters, and stores all the relevant data of the subjects;
the audio synthesis module selects audio signals of different types, frequencies and durations from an audio library and convolves them with the personalized HRTF to generate audio signals with different spatial orientations;
the test module reconstructs a three-dimensional space in the computer system by synchronous mapping, using computer simulation and 3D image rendering technology; the display interface presents a sphere, and the position of each virtual sound-source point is mapped to a detection point on the sphere in the virtual simulation interface;
the subject selects the perceived virtual spatial position through the test module, and the collected detection data are processed by a statistical algorithm in real time to give the corresponding objective detection results, including overall accuracy, front-back confusion rate, up-down confusion rate and click position.
8. The system of claim 7, wherein the different audio types are: pure-tone excitation, noise excitation, speech excitation and music excitation; the frequencies are low, intermediate and high; the duration is 1-5 s, in 1 s steps.
9. The system for enhancing virtual auditory reproduction according to claim 7, wherein the system tests the transverse, sagittal and coronal planes separately.
10. The system of claim 7, wherein the sphere space is divided into eight parts; the relative coordinates of the sound sources at the actual spatial positions are all mapped into the virtual sphere space, and all mapped sound-source points are displayed simultaneously.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211077500.6A CN115412808B (en) | 2022-09-05 | 2022-09-05 | Virtual hearing replay method and system based on personalized head related transfer function |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115412808A true CN115412808A (en) | 2022-11-29 |
CN115412808B CN115412808B (en) | 2024-04-02 |
Family
ID=84163847
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211077500.6A Active CN115412808B (en) | 2022-09-05 | 2022-09-05 | Virtual hearing replay method and system based on personalized head related transfer function |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115412808B (en) |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8428269B1 (en) * | 2009-05-20 | 2013-04-23 | The United States Of America As Represented By The Secretary Of The Air Force | Head related transfer function (HRTF) enhancement for improved vertical-polar localization in spatial audio systems |
CN106535043A (en) * | 2016-11-18 | 2017-03-22 | 华南理工大学 | Full-frequency 3D virtual sound customization method and device based on physiological characteristics |
US20170094440A1 (en) * | 2014-03-06 | 2017-03-30 | Dolby Laboratories Licensing Corporation | Structural Modeling of the Head Related Impulse Response |
CN107182003A (en) * | 2017-06-01 | 2017-09-19 | 西南电子技术研究所(中国电子科技集团公司第十研究所) | Airborne three-dimensional call virtual auditory processing method |
CN108476358A (en) * | 2015-12-31 | 2018-08-31 | 创新科技有限公司 | A method of for generating customized/personalized head related transfer function |
CN108540925A (en) * | 2018-04-11 | 2018-09-14 | 北京理工大学 | A kind of fast matching method of personalization head related transfer function |
CN108596016A (en) * | 2018-03-06 | 2018-09-28 | 北京大学 | A kind of personalized head-position difficult labor modeling method based on deep neural network |
CN108616789A (en) * | 2018-04-11 | 2018-10-02 | 北京理工大学 | The individualized virtual voice reproducing method measured in real time based on ears |
CN109998553A (en) * | 2019-04-29 | 2019-07-12 | 天津大学 | The method of the parametrization detection system and minimum audible angle of spatial localization of sound ability |
CN111246363A (en) * | 2020-01-08 | 2020-06-05 | 华南理工大学 | Auditory matching-based virtual sound customization method and device |
JP2020170938A (en) * | 2019-04-03 | 2020-10-15 | アルパイン株式会社 | Head transfer function learning device and head transfer function inference device |
CN113038356A (en) * | 2019-12-09 | 2021-06-25 | 上海航空电器有限公司 | Personalized HRTF rapid modeling acquisition method |
CN113316077A (en) * | 2021-06-27 | 2021-08-27 | 高小翎 | Three-dimensional vivid generation system for voice sound source space sound effect |
Non-Patent Citations (3)
Title |
---|
BHARITKAR, S, MAUER, T, WELLS, T,BERFANGER, D: "Stacked Autoencoder Based HRTF Synthesis from Sparse Data", 2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 15 November 2018 (2018-11-15), pages 356 - 361, XP033525853, DOI: 10.23919/APSIPA.2018.8659495 * |
刘宝禄, 刘庆峰, 郭小朝 et al.: "Research progress on personalized head-related transfer functions", Journal of Electronic Measurement and Instrumentation, vol. 34, no. 11, 15 November 2021 (2021-11-15), pages 155 - 165 *
杨立东, 焦慧媛: "Research on key technologies of head-related transfer function acquisition", Software Guide, vol. 18, no. 1, 31 January 2019 (2019-01-31), pages 34 - 39 *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117177165A (en) * | 2023-11-02 | 2023-12-05 | 歌尔股份有限公司 | Method, device, equipment and medium for testing spatial audio function of audio equipment |
CN117177165B (en) * | 2023-11-02 | 2024-03-12 | 歌尔股份有限公司 | Method, device, equipment and medium for testing spatial audio function of audio equipment |
CN117437367A (en) * | 2023-12-22 | 2024-01-23 | 天津大学 | Early warning earphone sliding and dynamic correction method based on auricle correlation function |
CN117437367B (en) * | 2023-12-22 | 2024-02-23 | 天津大学 | Early warning earphone sliding and dynamic correction method based on auricle correlation function |
Also Published As
Publication number | Publication date |
---|---|
CN115412808B (en) | 2024-04-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP4718559B2 (en) | Method and apparatus for individualizing HRTFs by modeling | |
CN115412808B (en) | Virtual hearing replay method and system based on personalized head related transfer function | |
CN108596016B (en) | Personalized head-related transfer function modeling method based on deep neural network | |
Francl et al. | Deep neural network models of sound localization reveal how perception is adapted to real-world environments | |
CN108476369A (en) | Method and system for developing the head related transfer function for being suitable for individual | |
Leng et al. | Binauralgrad: A two-stage conditional diffusion probabilistic model for binaural audio synthesis | |
US20040091119A1 (en) | Method for measurement of head related transfer functions | |
Pörschmann et al. | Directional equalization of sparse head-related transfer function sets for spatial upsampling | |
Geronazzo et al. | Do we need individual head-related transfer functions for vertical localization? The case study of a spectral notch distance metric | |
CN106165444B (en) | Sound field reproduction apparatus, methods and procedures | |
Schönstein et al. | HRTF selection for binaural synthesis from a database using morphological parameters | |
JP2009512364A (en) | Virtual audio simulation | |
Tenenbaum et al. | Auralization generated by modeling HRIRs with artificial neural networks and its validation using articulation tests | |
CN113849767B (en) | Personalized HRTF (head related transfer function) generation method and system based on physiological parameters and artificial head data | |
Zagala et al. | Comparison of direct and indirect perceptual head-related transfer function selection methods | |
Barumerli et al. | Round Robin Comparison of Inter-Laboratory HRTF Measurements–Assessment with an auditory model for elevation | |
O’Connor et al. | An evaluation of 3D printing for the manufacture of a binaural recording device | |
Zhang et al. | HRTF field: Unifying measured HRTF magnitude representation with neural fields | |
Zhu et al. | HRTF personalization based on weighted sparse representation of anthropometric features | |
CN113038356A (en) | Personalized HRTF rapid modeling acquisition method | |
Xi et al. | Magnitude modelling of individualized HRTFs using DNN based spherical harmonic analysis | |
Spagnol et al. | Estimation of spectral notches from pinna meshes: Insights from a simple computational model | |
Lokki et al. | Auditorium acoustics assessment with sensory evaluation methods | |
Barumerli et al. | Localization in elevation with non-individual head-related transfer functions: comparing predictions of two auditory models | |
Wang et al. | Prediction of head-related transfer function based on tensor completion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||