CN116962938A - Multi-point sampling sound field reconstruction method based on two-way self-attention mechanism learning - Google Patents

Publication number
CN116962938A
Authority
CN
China
Prior art keywords
spherical harmonic
sound field
dimensional matrix
spherical
matrix
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310667120.6A
Other languages
Chinese (zh)
Inventor
曲天书
吴玺宏
王奕文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Futurebrain Technology Co ltd
Original Assignee
Nanjing Futurebrain Technology Co ltd
Application filed by Nanjing Futurebrain Technology Co ltd
Priority to CN202310667120.6A
Publication of CN116962938A
Legal status: Pending

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01H MEASUREMENT OF MECHANICAL VIBRATIONS OR ULTRASONIC, SONIC OR INFRASONIC WAVES
    • G01H17/00 Measuring mechanical vibrations or ultrasonic, sonic or infrasonic waves, not provided for in the preceding groups
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2430/00 Signal processing covered by H04R, not provided for in its groups
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00 Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention discloses a multipoint-sampling sound field reconstruction method based on dual-path self-attention mechanism learning, comprising the following steps: 1) performing multipoint sampling of the target space to obtain the sound pressure at each sampling position, then decomposing the sound pressure at each sampling position into a spherical harmonic expansion to obtain the spherical harmonic coefficients of the sound field about the local coordinate center of each sampling position; 2) forming a first three-dimensional matrix from the sampling positions and the corresponding local-coordinate-center spherical harmonic coefficients, and inputting the first three-dimensional matrix into a transfer network to obtain the spherical harmonic coefficients of the global coordinate center; 3) expanding the sound pressure of the target space using the spherical harmonic coefficients of the global coordinate center, thereby reconstructing the sound field in the target space. Compared with the traditional method, the method recovers the spherical harmonic coefficients with higher precision, and expressing the sound field with these high-precision coefficients yields a more accurate sound field.

Description

Multi-point sampling sound field reconstruction method based on two-way self-attention mechanism learning
Technical Field
The invention belongs to the field of sound field analysis based on spherical harmonic function analysis, and particularly relates to a multipoint sampling transfer method based on sound field spherical harmonic coefficient learning.
Background
Spherical harmonic coefficient analysis is widely used in array signal processing. Spherical harmonics are a set of basis functions, built from complex exponentials and associated Legendre functions, for representing functions defined on the unit sphere. Sound field reconstruction is one of the important applications of spherical harmonic coefficient analysis. Its aim is to reconstruct a continuous original sound field in a spatial region from sound field acquisition in a limited region. One of the most common spherical-harmonic-based acoustic analysis methods is high-order spherical harmonic coefficient analysis. In this analysis, given a truncation order, the sound pressure at a single point can be expressed as a finite sum over the spherical harmonic basis functions up to that order. In sound field analysis, multiple spherical microphone arrays are generally required to obtain spherical harmonic coefficients of limited order, which give an accurate sound field expression over the control region and hence an accurate description of the sound field in space. Virtual reality applications additionally require translational freedom of the listener, commonly referred to as six-degrees-of-freedom (6DoF) sound field reconstruction.
One common way to achieve six-degrees-of-freedom sound field reconstruction is to use multiple spherical microphone arrays distributed in three-dimensional space. A common approach exploits the mathematical properties of spherical harmonics to transfer the spherical harmonic coefficients obtained at the sampling points to the high-order spherical harmonic coefficients at a selected point. The method relies on a transfer equation derived from the addition theorem of spherical harmonics to transfer the high-order representation of the multipoint sound field to a single point; the transfer equation links the spherical harmonic coefficients of the local coordinate centers to the spherical harmonic coefficients of the global coordinate center through frequency, azimuth and truncation order. However, the numerical calculation is relatively complex, and the matrix becomes ill-conditioned at high orders and high frequencies. In transfer-matrix-based methods, the growth of the condition number at high frequencies causes large errors in the transfer result, so in the traditional scheme the coefficients obtained by the transfer calculation are inaccurate at high frequencies and the sound field cannot be described accurately.
Disclosure of Invention
In view of the shortcomings of existing methods, the invention aims to provide a spherical harmonic transfer scheme based on neural network learning. The inputs of the scheme are the high-order spherical harmonic coefficients at different positions together with the corresponding spatial position information in the spherical coordinate system; the output is the high-order spherical harmonic coefficients of the global coordinate center, which are then used in the expansion to obtain an accurate representation of the sound field in the region. In the scheme based on the dual-path self-attention network, the results at multiple frequency points are treated jointly as input and output, which improves the performance of the model across frequency bands.
The invention provides a sound field spherical harmonic coefficient transfer method based on neural network learning. By introducing multi-band joint learning, it uses a dual-path self-attention network to constrain the transfer process in both the frequency dimension and the spatial dimension, which effectively improves the accuracy of the transfer result across frequency bands, and through multipoint sampling it fuses multiple transfer results under the constraint of the self-attention mechanism.
The technical problem to be solved by the invention is to improve the performance of multipoint spherical harmonic coefficient transfer, to express accurately the high-order spherical harmonic coefficients of the sound field in multiple frequency bands under multipoint conditions, and to improve the accuracy of the transfer result at high frequencies.
The technical scheme adopted by the invention takes a dual-path self-attention network structure as its main body and obtains the transfer result step by step by modeling the frequency dimension and the spatial dimensions of transfers in different directions, thereby realizing the high-order spherical harmonic coefficient expression.
The technical scheme of the invention is as follows:
A multipoint-sampling sound field reconstruction method based on dual-path self-attention mechanism learning comprises the following steps:
1) performing multipoint sampling of the target space to obtain the sound pressure at each sampling position; then decomposing the sound pressure at each sampling position into a spherical harmonic expansion to obtain the spherical harmonic coefficients of the sound field about the local coordinate center of each sampling position; the spherical harmonic coefficients of the local-coordinate-center sound field form a two-dimensional matrix comprising a plurality of frequencies and the spherical harmonic coefficients at each frequency;
2) forming a first three-dimensional matrix from the sampling positions and the corresponding local-coordinate-center spherical harmonic coefficients, and inputting the first three-dimensional matrix into a transfer network to obtain the spherical harmonic coefficients of the global coordinate center; the transfer network comprises a spherical harmonic basis function mapping module, a radial function mapping module, a dual-path self-attention network, a transformation average splicing module and an ascending-order module;
the radial function mapping module is used for mapping the spatial distance information associated with the local-coordinate-center spherical harmonic coefficients of each sampling position to a group of vectors that cover the different frequencies and distances and have the same dimension as the radial function, generating a radial function matrix from these vectors, and sending the radial function matrix to the dual-path self-attention network;
the spherical harmonic basis function mapping module is used for mapping the azimuth information associated with the local-coordinate-center spherical harmonic coefficients of each sampling position to a group of vectors that have the same dimension as the spherical harmonics, generating a spherical harmonic basis function matrix from these vectors, and sending the spherical harmonic basis function matrix to the dual-path self-attention network;
the dual-path self-attention network is used for performing an attention operation on the first three-dimensional matrix and the radial function matrix, realizing attention splicing over the frequency and distance dimensions and obtaining a second three-dimensional matrix; then performing an attention operation on the azimuth dimension of the second three-dimensional matrix and the spherical harmonic basis function matrix, realizing attention splicing over the azimuth dimension, obtaining a third three-dimensional matrix and sending the third three-dimensional matrix to the transformation average splicing module;
the transformation average splicing module is used for averaging the third three-dimensional matrix, that is, summing its two-dimensional matrices over the coordinate centers at different positions and taking the mean, which fuses the information from the different positions; the fourth three-dimensional matrix obtained after fusion is added to the first three-dimensional matrix and the sum is sent to the ascending-order module;
the ascending-order module is used for raising the order of the input spherical harmonic coefficient vector, mapping the low-order spherical harmonic coefficients to high-order spherical harmonic coefficients and obtaining the spherical harmonic coefficients of the global coordinate center;
3) expanding the sound pressure of the target space using the spherical harmonic coefficients of the global coordinate center, thereby reconstructing the sound field in the target space.
Further, the sound pressure at the sampling position r is $p(k, r)$, wherein the wave number $k = 2\pi f / c$, c is the sound velocity, f is the sound wave frequency, and the sampling position is represented in a spherical coordinate system as r = (ρ, θ, φ), where ρ is the radius, θ is the pitch angle, and φ is the horizontal angle.
Further, the sound pressure of the target space is expanded using the formula $p(k, r) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} B_{nm}(k)\, j_n(k\rho)\, Y_{nm}(\theta, \phi)$, wherein $j_n(x)$ is the spherical Bessel function, $Y_{nm}(\theta, \phi)$ is the spherical harmonic basis function, and $B_{nm}(k)$ is the corresponding spherical harmonic coefficient.
The invention has the following advantages:
The occurrence of singular solutions when solving for the spherical harmonic coefficients is reduced. Simulation results for different frequency ranges, signal-to-noise ratios and more complex sound source conditions show that the method recovers the spherical harmonic coefficients with higher precision than the traditional method, and that expressing the sound field with these high-precision coefficients yields a more accurate sound field. Under the same conditions, the proposed method improves the signal-to-distortion ratio by 3 dB.
Drawings
FIG. 1 is a schematic diagram of a global coordinate center and a local coordinate center range.
Fig. 2 is an overall flow chart of the proposed transfer scheme.
FIG. 3 shows the transfer results under different signal-to-noise ratios:
(a) comparison of Euclidean distance metric results under different signal-to-noise ratios,
(b) comparison of signal-to-distortion ratio results under different signal-to-noise ratios,
(c) comparison of cosine similarity results under different signal-to-noise ratios.
FIG. 4 shows the transfer results at different distances:
(a) comparison of Euclidean distance metric results at different distances,
(b) comparison of signal-to-distortion ratio results at different distances,
(c) comparison of cosine similarity results at different distances.
FIG. 5 shows the transfer results for different numbers of sampling points:
(a) comparison of Euclidean distance metric results for different numbers of sampling points,
(b) comparison of signal-to-distortion ratio results for different numbers of sampling points,
(c) comparison of cosine similarity results for different numbers of sampling points.
FIG. 6 shows the transfer results for different numbers of sound sources:
(a) comparison of Euclidean distance metric results for different numbers of sound sources,
(b) comparison of signal-to-distortion ratio results for different numbers of sound sources,
(c) comparison of cosine similarity results for different numbers of sound sources.
FIG. 7 is a simulation comparison of a binaural sound field:
(a) two-dimensional plane sound pressure result of the traditional method at 1000 Hz,
(b) two-dimensional plane sound pressure result of the proposed method at 1000 Hz,
(c) true two-dimensional plane sound pressure at 1000 Hz,
(d) two-dimensional plane sound pressure result of the traditional method at 1800 Hz,
(e) two-dimensional plane sound pressure result of the proposed method at 1800 Hz,
(f) true two-dimensional plane sound pressure at 1800 Hz.
Detailed Description
The invention will now be described in further detail with reference to the accompanying drawings, which are given by way of illustration only and are not intended to limit the scope of the invention.
1. Sound field expression and sound field transfer based on spherical harmonic decomposition
For simplicity, the spatial sound field is described for the single-frequency case; under broadband or multi-frequency conditions, the corresponding single-frequency sound fields can be obtained from the sound pressure by the Fourier transform. The invention assumes that the sound pressure in the target space is plane-wave sound pressure. In free three-dimensional space, the sound pressure solution satisfying the homogeneous wave equation can be converted into the time-independent Helmholtz equation. In the presence of a plane wave with wave vector $\mathbf{k}$ and amplitude $a(k, \theta_k, \phi_k)$, the sound pressure can be described as
$$p(k, r) = a(k, \theta_k, \phi_k)\, e^{i \mathbf{k} \cdot \mathbf{r}} \qquad (1)$$
where the wave number $k = 2\pi f / c$, f is the frequency, and c is the sound velocity, a fixed value at normal temperature. r = (ρ, θ, φ) is the position in spherical coordinates, whose components are the radius, the pitch angle and the horizontal angle, respectively.
Considering the case of far-field plane waves, the sound pressure in the spherical coordinate system can be decomposed into a spherical harmonic expansion
$$p(k, r) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} B_{nm}(k)\, j_n(k\rho)\, Y_{nm}(\theta, \phi) \qquad (2)$$
where k is the wave number and (ρ, θ, φ) are the spherical coordinates, with radial distance ρ, elevation angle θ and azimuth angle φ. $p(k, r)$ is the sound pressure at r, $j_n(x)$ is the spherical Bessel function, $Y_{nm}(\theta, \phi)$ is the spherical harmonic basis function, and $B_{nm}(k)$ is the corresponding spherical harmonic coefficient; n is the order of the spherical harmonic and m is the degree within that order, an integer between -n and n. In the invention, the sampled sound pressure is taken to come from an ideal open-sphere array, so spatial aliasing, truncation error and similar problems do not arise when transferring the sound pressure and the sound field spherical harmonic coefficients. Consider a translation r″ ≡ (ρ″, θ″, φ″) from the global origin to a local origin, as shown in FIG. 1, where r and r′ denote positions relative to the global origin and to the new local coordinate center, respectively. According to the addition theorem of spherical harmonics, the translation from one spherical Bessel expansion to another is described in numerous works and is given by formula (3), whose coefficients involve the product of two Wigner 3-j symbols.
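As a numerical illustration of the expansion in formula (2), the following sketch checks the truncated series against a unit plane wave. It assumes the convention $e^{i\mathbf{k}\cdot\mathbf{r}}$ for the plane wave, for which $B_{nm} = 4\pi i^n a\, Y_{nm}^{*}(\theta_k, \phi_k)$; summing over m with the addition theorem then leaves a Legendre series in the angle γ between the wave direction and the position. The function names, the speed of sound of 343 m/s and the example values are illustrative assumptions, not part of the patent.

```python
import numpy as np

def spherical_jn_upto(n_max, x):
    """Spherical Bessel functions j_0..j_n_max at a scalar x > 0.

    Upward recurrence; adequate for this small demo, less robust than a
    library routine when n_max is much larger than x.
    """
    j = np.zeros(n_max + 1)
    j[0] = np.sin(x) / x
    if n_max >= 1:
        j[1] = np.sin(x) / x**2 - np.cos(x) / x
    for n in range(1, n_max):
        j[n + 1] = (2 * n + 1) / x * j[n] - j[n - 1]
    return j

def legendre_upto(n_max, t):
    """Legendre polynomials P_0..P_n_max at a scalar t (Bonnet recurrence)."""
    p = np.zeros(n_max + 1)
    p[0] = 1.0
    if n_max >= 1:
        p[1] = t
    for n in range(1, n_max):
        p[n + 1] = ((2 * n + 1) * t * p[n] - n * p[n - 1]) / (n + 1)
    return p

def plane_wave_truncated(k, rho, cos_gamma, n_max, amplitude=1.0):
    """Formula (2) truncated at order n_max, with the sum over m carried out:
    p = a * sum_n (2n+1) i^n j_n(k rho) P_n(cos gamma)."""
    n = np.arange(n_max + 1)
    jn = spherical_jn_upto(n_max, k * rho)
    pn = legendre_upto(n_max, cos_gamma)
    return amplitude * np.sum((2 * n + 1) * (1j ** n) * jn * pn)

# Example: 1000 Hz, 0.2 m from the expansion center (assumed values).
k = 2 * np.pi * 1000.0 / 343.0
exact = np.exp(1j * k * 0.2 * 0.3)
err_low = abs(plane_wave_truncated(k, 0.2, 0.3, 4) - exact)
err_high = abs(plane_wave_truncated(k, 0.2, 0.3, 15) - exact)
print(err_low, err_high)
```

The low-order truncation leaves a visible error at this frequency and radius, while the higher-order one is close to the exact plane wave, which is the reason the scheme targets high-order coefficients at the global center.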
For each translation position r″, the sound pressure in the surrounding spatial range is represented by a set of spherical harmonic coefficients about the local coordinate center r″. Suppose the spherical harmonic coefficients of the Q local coordinate systems are truncated to order N″, i.e. expressed by $(N''+1)^2 \times Q$ spherical harmonic coefficients, and the coefficients of the global origin are truncated to order N, i.e. the global coordinate center has $(N+1)^2$ spherical harmonic coefficients. According to the addition theorem, the relationship between the two sets of coefficients can be established by a transfer matrix of spherical harmonic coefficients:
$$\mathbf{b}''^{T} = T_{trans}\, \mathbf{b}^{T} \qquad (4)$$
where b″ is the one-dimensional vector obtained by stacking the coefficients from all local coordinate centers, with dimension $1 \times Q(N''+1)^2$, and b is the vector of global-center coefficients, with dimension $1 \times (N+1)^2$. The transfer matrix $T_{trans}$ has shape $Q(N''+1)^2 \times (N+1)^2$. Each row of the matrix is expanded according to the orders of the global origin system, and its elements correspond to the different indices n″, n, m in formula (3).
The traditional method establishes the relationship between the spherical harmonic coefficients of different coordinate centers through the transfer matrix. In theory, the coefficients of the global origin can be obtained by matrix inversion. In practice, however, the zeros of the spherical Bessel function make the condition number of the matrix large, which seriously affects the numerical result. Ridge regression alleviates the matrix singularity to some extent, but in practical applications different distance conditions introduce errors into coefficients of different orders. The invention provides a spherical harmonic coefficient transfer network based on a dual-path self-attention mechanism to solve these problems.
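The effect of a large condition number on the inversion of formula (4), and the partial relief given by ridge regression, can be illustrated with a small NumPy sketch. The matrix here is a synthetic, real-valued stand-in with a prescribed condition number, not the actual transfer matrix (which would require the Wigner 3-j based coefficients of formula (3)); the shapes follow Q = 4 local centers of order 4 and a global order of 8, and the noise level and regularization weight `lam` are arbitrary assumptions.

```python
import numpy as np

def synthetic_ill_conditioned_matrix(rows, cols, cond, rng):
    """Random matrix with singular values log-spaced from 1 down to 1/cond.
    Stand-in for T_trans; it only mimics the conditioning, not the physics."""
    u, _ = np.linalg.qr(rng.standard_normal((rows, cols)))  # orthonormal columns
    v, _ = np.linalg.qr(rng.standard_normal((cols, cols)))
    s = np.logspace(0, -np.log10(cond), cols)
    return (u * s) @ v.T

def least_squares_solve(t, b_local):
    """Plain least-squares inversion of b_local = t @ b_global."""
    return np.linalg.lstsq(t, b_local, rcond=None)[0]

def ridge_solve(t, b_local, lam):
    """Ridge (Tikhonov) solution: (t^T t + lam I)^-1 t^T b_local."""
    a = t.T @ t + lam * np.eye(t.shape[1])
    return np.linalg.solve(a, t.T @ b_local)

rng = np.random.default_rng(0)
rows, cols = 4 * (4 + 1) ** 2, (8 + 1) ** 2          # 100 x 81
t = synthetic_ill_conditioned_matrix(rows, cols, 1e8, rng)
b_global = rng.standard_normal(cols)
b_local = t @ b_global + 1e-3 * rng.standard_normal(rows)  # noisy observation

err_ls = np.linalg.norm(least_squares_solve(t, b_local) - b_global)
err_ridge = np.linalg.norm(ridge_solve(t, b_local, 1e-4) - b_global)
print(np.linalg.cond(t), err_ls, err_ridge)
```

In this toy setting the plain inverse amplifies the noise along the smallest singular directions, while the ridge solution stays bounded at the cost of a bias in those directions; this bias is one way to read the remark that ridge regression still leaves order-dependent errors.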
2. Spherical harmonic coefficient transfer scheme based on a dual-path self-attention network
For brevity, the transfer network based on the dual-path self-attention mechanism is named TT-Net in the invention. The network architecture is described in detail below, taking the sample at one location as an example. The structure of one TT-Net layer is shown in FIG. 2.
The input and output of the network are spherical harmonic coefficients of different orders. The input feature is a $K \times (N+1)^2$ matrix of spherical harmonic coefficients, where K is the total number of frequency bins and N is the spherical harmonic order. In the transfer of spherical harmonics, both the radial spherical Bessel function and the angle-dependent spherical harmonics contribute. Because distance and angle are decoupled, a radial function mapping module and a spherical harmonic basis function mapping module are constructed to replace $j_n(k\rho)$ and $Y_{nm}(\theta, \phi)$. $Y_{nm}(\theta, \phi)$ depends only on the truncation order and the direction, and is independent of frequency. The spherical harmonic basis function module is therefore constrained in the same way: its input is the angle information, and its output is a reshaped vector with the dimension of the order-N spherical harmonics. A similar structural constraint is applied to the radial function mapping module, so that its output dimension matches that of the spherical Bessel function, i.e. $K \times (N+1)$. The output of the dual-path self-attention module keeps the same shape as the input spherical harmonic coefficients and is fed to the transformation average splicing module and the fully connected layer. The transformation average splicing module in FIG. 2 fuses the spherical harmonic coefficients from different positions, while the fully connected layer raises the order of the coefficients, i.e. it is the ascending-order module in FIG. 2. A residual connection is added between the transformation average splicing module and the ascending-order module to aid training. Through the order-raising operation, the output order of each layer is larger than its input order.
The last layer of the network uses neither the residual connection nor the ascending-order module; the output of its averaging operation is taken as the final output of the whole model.
The sound field reconstruction method comprises the following specific processes:
Step 1: multipoint sampling is performed at randomly distributed positions in the target space. The sound pressure p(k, r) at each position r is sampled under ideal open-sphere conditions, and the ideal sound pressure relation of formula (2) is used to obtain the spherical harmonic coefficients of the sound field about the local coordinate center of each position. This is the conversion from the sound pressure at a spherical microphone array to spherical harmonic coefficients: the sound pressure p(k, r) at each sampling position r is decomposed into a spherical harmonic expansion, which gives the spherical harmonic coefficients of the local-coordinate-center sound field at that position. These coefficients form a two-dimensional matrix, indexed by frequency and by spherical harmonic coefficient within each frequency.
Step 2: the transfer from the spherical harmonic coefficients of the local coordinate centers to those of the global coordinate center is carried out by the process shown in FIG. 2, which comprises a spherical harmonic basis function mapping module, a radial function mapping module, a dual-path self-attention network, a transformation average splicing module and an ascending-order module. Specifically, the input is a three-dimensional matrix composed of the two-dimensional matrices of the local coordinate centers at the different positions; each two-dimensional matrix is formed by stacking, over frequency, the one-dimensional vectors of spherical harmonic coefficients of that local coordinate center. The input also includes the relative distance and relative azimuth of each local coordinate center. For each distance, the radial function mapping module maps the spatial distance information to a group of vectors that cover the different frequencies and distances and have the same dimension as the radial function. The module uses a fully connected structure, its role is to learn a replacement for the physical radial function, and its output is the radial function matrix.
For each set of azimuth information, the spherical harmonic basis function mapping module outputs a group of vectors with the same dimension as the spherical harmonic basis functions. Its output plays a role similar to that of the spherical harmonics and is independent of frequency. The module uses a fully connected structure, its role is to learn a replacement for the spherical harmonic basis functions, and its output is the spherical harmonic basis function matrix. The dual-path self-attention network takes the three-dimensional matrix, the radial function matrix and the spherical harmonic basis function matrix as inputs. It first performs an attention operation on the three-dimensional matrix and the radial function matrix, which realizes attention splicing over the frequency and distance dimensions; the result has the same dimensions as the original three-dimensional matrix. It then performs an attention operation on the azimuth dimension of the three-dimensional matrix and the spherical harmonic basis function matrix, which realizes attention splicing over the azimuth dimension; the result again has the same dimensions as the three-dimensional matrix.
The transformation average splicing module averages the three-dimensional matrix: it sums the two-dimensional matrices belonging to the coordinate centers at different positions and takes the mean, which fuses the information from the different coordinate centers. The fused result is added to the original three-dimensional matrix to form a residual connection, which helps the network converge. The resulting output is then raised in order along the spherical harmonic coefficient dimension: the ascending-order module maps the low-order spherical harmonic coefficients to high-order spherical harmonic coefficients. The ascending-order module uses a fully connected structure whose input and output dimensions are the coefficient dimension of the current order and that of the target order, respectively, so that the spherical harmonic coefficients of the global coordinate center at the target order are learned.
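The data flow of one layer described in Step 2 can be sketched in NumPy as follows. The text fixes the shapes and the order of operations (attention over frequency and distance, attention over azimuth, averaging over positions, residual addition, fully connected order raising) but not how the radial and basis matrices enter the attention. This sketch therefore makes several assumptions: real-valued features, single-head scaled dot-product attention, the radial matrix expanded from N+1 to (N+1)² columns by repeating the order-n column 2n+1 times, and elementwise modulation of queries and keys. The mapping modules are replaced by random arrays and the weights are untrained; all names are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stabilization
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(query, key, value):
    """Scaled dot-product attention; tokens lie on the second-to-last axis."""
    scores = query @ np.swapaxes(key, -1, -2) / np.sqrt(query.shape[-1])
    return softmax(scores, axis=-1) @ value

def expand_orders(radial, order):
    """(..., N+1) -> (..., (N+1)^2): repeat the order-n column 2n+1 times."""
    idx = np.repeat(np.arange(order + 1), 2 * np.arange(order + 1) + 1)
    return radial[..., idx]

def tt_layer(x, radial, sh_basis, w_up, order, last_layer=False):
    """One layer. x: (Q, K, C), radial: (Q, K, N+1), sh_basis: (Q, C),
    w_up: (C, C_next), with C = (N+1)^2 and C_next the target-order size."""
    # Path 1: attention across frequency bins, modulated by the radial matrix.
    q1 = x * expand_orders(radial, order)
    x2 = attention(q1, q1, x)                              # (Q, K, C)
    # Path 2: attention across coefficients, modulated by the basis matrix.
    q2 = np.swapaxes(x2 * sh_basis[:, None, :], -1, -2)    # (Q, C, K)
    x3 = attention(q2, q2, np.swapaxes(x2, -1, -2))        # (Q, C, K)
    x3 = np.swapaxes(x3, -1, -2)                           # (Q, K, C)
    # Transformation average splicing: mean over the Q positions.
    fused = x3.mean(axis=0, keepdims=True)                 # (1, K, C)
    if last_layer:
        return fused[0]                                    # final output (K, C)
    # Residual connection, then fully connected order raising.
    return (x + fused) @ w_up                              # (Q, K, C_next)

rng = np.random.default_rng(0)
Q, K, N = 4, 30, 4
C, C_next = (N + 1) ** 2, (N + 2) ** 2
x = rng.standard_normal((Q, K, C))
out = tt_layer(x, rng.standard_normal((Q, K, N + 1)),
               rng.standard_normal((Q, C)),
               0.1 * rng.standard_normal((C, C_next)), N)
print(out.shape)
```

Both attention paths preserve the (Q, K, C) shape, as the description requires, and only the fully connected step changes the coefficient dimension.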
Step 3: the output of the dual-path spherical harmonic coefficient attention transfer network is the set of global-coordinate-center spherical harmonic coefficients of the target order, with output dimension $(N+1)^2$. Using this output in the sound field expression of formula (2) gives an accurate sound field within the region.
3. Experimental setup and evaluation index
In the transfer process, the position, the frequency, the number of sampling points and similar factors all influence the result, so different cases are covered in the simulated data. For convenience in later practical applications, fourth-order spherical harmonic coefficients are used as input, i.e. N = 4. The frequency ranges from 100 Hz to 3000 Hz in steps of 100 Hz, i.e. K = 30. The coefficients of the global coordinate center are of order N″ = 8. The distance between each sampling point and the global origin is sampled randomly between 0.2 m and 2.0 m. In the experiment, plane waves from 1 to 4 random directions are generated as signal sources, and the amplitude of each signal is chosen randomly between 0.1 and 1.0. Since low-frequency system noise makes the spherical harmonic coefficients harder to solve, noise at different signal-to-noise ratios (SNR) is added to improve the noise robustness of the model; the SNR varies from 10 to 30 dB. The number of spatial sampling points is set between 4 and 10; the minimum is 4 sampling points so that the transfer matrix has full rank.
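The counts quoted in the setup can be reproduced with a few lines of Python. The sketch below builds the frequency grid, converts it to wave numbers, and computes the smallest number of sampling points for which the stacked local coefficients are at least as numerous as the global ones. This dimension count is only a necessary condition for full column rank, and the speed of sound of 343 m/s is an assumed value; the function names are illustrative.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def frequency_grid(f_min=100.0, f_max=3000.0, step=100.0):
    """Frequency bins from f_min to f_max inclusive."""
    count = int(round((f_max - f_min) / step)) + 1
    return [f_min + i * step for i in range(count)]

def wavenumber(f, c=SPEED_OF_SOUND):
    """k = 2*pi*f / c."""
    return 2.0 * math.pi * f / c

def min_sampling_points(local_order, global_order):
    """Smallest Q with Q*(local_order+1)^2 >= (global_order+1)^2."""
    rows_per_point = (local_order + 1) ** 2
    cols = (global_order + 1) ** 2
    return math.ceil(cols / rows_per_point)

freqs = frequency_grid()
print(len(freqs))                      # number of frequency bins K
print(wavenumber(freqs[-1]))           # wave number at 3000 Hz
print(min_sampling_points(4, 8))       # minimum number of sampling points
```

With fourth-order inputs (25 coefficients per point) and an eighth-order global center (81 coefficients), three points give only 75 equations, so four is the smallest admissible count, matching the minimum used in the experiments.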
The network is trained with the mean square error loss. The output order of each dual-path self-attention module is one greater than its input order. A multi-head attention mechanism is introduced, with the number of attention heads equal to the current order plus one. For stable learning, the gradient is clipped to [-1.0, 1.0]. All models are trained in a distributed fashion on two TITAN RTX GPUs with a batch size of 32. The Adam optimizer is used, with the learning rate initialized to 3e-4 and halved during training. For training stability, the data are normalized so that every batch contains the same number of sampling points. Training proceeds sequentially over data with 10 down to 4 spatial sampling points.
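Two of the training ingredients named above, the mean square error loss and gradient clipping to [-1.0, 1.0], can be written compactly in NumPy. This sketch reads the clipping as elementwise value clipping, which is an assumption (norm clipping would also be consistent with the wording); the function names are illustrative and no optimizer is implemented.

```python
import numpy as np

def mse_loss(pred, target):
    """Mean square error; abs() keeps it valid for complex coefficients."""
    diff = np.asarray(pred) - np.asarray(target)
    return float(np.mean(np.abs(diff) ** 2))

def clip_gradient(grad, limit=1.0):
    """Elementwise clipping of a gradient array to [-limit, limit]."""
    return np.clip(grad, -limit, limit)

grad = np.array([-3.0, 0.5, 2.0])
print(clip_gradient(grad))
print(mse_loss([1.0, 2.0], [1.0, 4.0]))
```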
The evaluation is based on the similarity of both the spherical harmonic coefficients and the sound field. The Euclidean distance metric (EDM) and the cosine similarity (COSS) are used to measure the difference between the recovered coefficients and the ideal coefficients: EDM measures the distance in Euclidean space, and COSS measures structural similarity. Here $N$ is the total number of test samples, $y$ is the actual spherical harmonic coefficient at the global coordinate center, and $\hat{y}$ is the corresponding estimated spherical harmonic coefficient.
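The exact expressions for EDM and COSS are not reproduced in this text. The following is a minimal sketch that assumes the common definitions: the mean Euclidean distance, and the mean cosine similarity with complex coefficients treated as stacked real and imaginary parts. The function names `edm` and `coss` are illustrative.

```python
import math


def edm(y_true, y_est):
    """Mean Euclidean distance over N samples of coefficient vectors."""
    dists = [
        math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(y, y_hat)))
        for y, y_hat in zip(y_true, y_est)
    ]
    return sum(dists) / len(dists)


def coss(y_true, y_est):
    """Mean cosine similarity over N samples.

    Complex vectors are compared via the real part of the Hermitian inner
    product, which equals the cosine of the stacked real/imaginary vectors.
    """
    sims = []
    for y, y_hat in zip(y_true, y_est):
        dot = sum((a * b.conjugate()).real for a, b in zip(y, y_hat))
        norm_y = math.sqrt(sum(abs(a) ** 2 for a in y))
        norm_y_hat = math.sqrt(sum(abs(b) ** 2 for b in y_hat))
        sims.append(dot / (norm_y * norm_y_hat))
    return sum(sims) / len(sims)


ideal = [[1 + 0j, 0 + 1j], [2 + 0j, 0 + 0j]]
print(edm(ideal, ideal))   # 0.0
```

Under these assumptions a perfect estimate gives an EDM of 0 and a COSS of 1, while an estimate with the opposite sign gives a COSS of -1.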
In addition, the signal-to-distortion ratio (SDR) is used to evaluate the sound field result, where the sound field is obtained by expanding the spherical harmonic coefficients. The SDR is calculated from $u$ and $\hat{u}$, which denote the sound pressure of the sound field obtained by expanding the ideal spherical harmonic coefficients and the estimated spherical harmonic coefficients, respectively. The SDR is computed within a radius of 1 m, at a resolution of 0.02 m.
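The SDR expression itself is not reproduced in this text. As a hedged sketch, the widely used definition, 10 log10 of signal energy over error energy, can be evaluated over sampled field points as follows. The helper `disc_grid` assumes the evaluation region is a horizontal disc, which the text does not state; both function names are illustrative.

```python
import math


def sdr_db(u, u_hat):
    """SDR in dB, assuming 10*log10(sum|u|^2 / sum|u - u_hat|^2)."""
    signal = sum(abs(a) ** 2 for a in u)
    distortion = sum(abs(a - b) ** 2 for a, b in zip(u, u_hat))
    return 10.0 * math.log10(signal / distortion)


def disc_grid(radius=1.0, step=0.02):
    """Grid points (x, y) inside a disc; the disc shape is an assumption."""
    n = int(round(radius / step))
    return [
        (i * step, j * step)
        for i in range(-n, n + 1)
        for j in range(-n, n + 1)
        if i * i + j * j <= n * n
    ]


points = disc_grid()
print(len(points) > 0, round(sdr_db([1.0, 1.0], [0.9, 0.9]), 6))  # True 20.0
```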
4. Experimental results and analysis
Table 1. Comparison of ablation experiment statistics for 8 spatial sampling points
The results of the ablation study are shown in Table 1. The test data use eight spatial sampling points randomly selected in three-dimensional space at a distance of 1 m. The directions of the individual sound sources are randomly generated in three-dimensional space. The test set contains 600 samples, and the reported results are averaged over all frequencies. The method that solves for the inverse of the transfer matrix is referred to as the least squares method (LSM). One-layer and two-layer architectures are included for comparison, to verify that raising the order gradually with dual-path modules gives the best results. Accordingly, the order-raising part maps directly from order 4 to order 8, or from order 4 to order 6 and then to order 8. The number in parentheses is the number of layers of the architecture; the number 4 indicates that the order is raised one step at a time from 4 to 8. "Descending" indicates that the network is trained with the number of spatial sampling points going from 10 to 4, while "ascending" is the reverse. The L1 loss is also included for comparison. The results show that training from more to fewer sampling points helps training, and that performance drops when fewer dual-path self-attention modules are used. Although the L1 loss performs better in some regression tasks, that is not the case here. The final averaged results show that the proposed method is effective for both the spherical harmonic coefficients and the reconstructed sound field.
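The three order-raising variants compared in the ablation (one, two, and four layers) can be written as a small helper. This is only an illustrative sketch; the name `order_path` and the assumption of equal steps per layer are not from the original.

```python
def order_path(n_layers, input_order=4, target_order=8):
    """Orders visited when raising from input_order to target_order.

    Assumes every layer raises the order by the same amount.
    """
    span = target_order - input_order
    if n_layers < 1 or span % n_layers != 0:
        raise ValueError("number of layers must evenly divide the order span")
    step = span // n_layers
    return list(range(input_order, target_order + 1, step))


print(order_path(1))  # [4, 8]
print(order_path(2))  # [4, 6, 8]
print(order_path(4))  # [4, 5, 6, 7, 8]
```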
The results under different SNR conditions are shown in Fig. 3. From left to right are the SDR, EDM, and COSS results; the horizontal axis is frequency. The best-performing configuration, TT-Net (4), is used and is abbreviated as TT-Net. The least squares method (LSM) is compared with this best configuration of the invention. The SDR results show that the performance of LSM decreases as the SNR decreases, whereas the proposed method remains consistent in SDR. The EDM and COSS results for the coefficients are consistent with the discussion above. Noise introduces outliers into the numerical solution, which severely degrades the results; in contrast, the method of the invention is robust to noise.
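For reference, adding noise at a prescribed SNR amounts to scaling the noise so that the signal-to-noise power ratio equals 10^(SNR/10). The sketch below assumes complex Gaussian noise, which the text does not specify; `add_noise` is a hypothetical helper name.

```python
import math
import random


def add_noise(signal, snr_db, rng):
    """Return signal plus noise scaled to the requested SNR in dB.

    Complex Gaussian noise is an assumption made for this sketch.
    """
    noise = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in signal]
    p_signal = sum(abs(s) ** 2 for s in signal) / len(signal)
    p_noise = sum(abs(n) ** 2 for n in noise) / len(noise)
    # Choose the scale so that p_signal / (scale^2 * p_noise) = 10^(snr_db/10).
    scale = math.sqrt(p_signal / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + scale * n for s, n in zip(signal, noise)]


clean = [complex(math.cos(0.3 * i), math.sin(0.3 * i)) for i in range(64)]
noisy = add_noise(clean, 10.0, random.Random(0))
```

Because the noise is rescaled after it is drawn, the realized SNR of each sample matches the requested value up to floating-point error.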
The results for spatial sampling points at different distances are shown in Fig. 4. Each test sample uses eight sampling points randomly distributed on a sphere at the given distance from the origin. The number of test samples per distance is 100. COSS shows that the conventional method works well at low frequencies. As the frequency increases, the growth of kr degrades the result, which is related to the properties of the spherical Bessel function. The network method performs stably at distances below 1.00 m. At longer distances, however, performance degrades, and singular results appear in the frequency bands below 1 kHz and above 2 kHz. The SDR results show that the reconstructed sound field nevertheless remains unchanged, which indicates that the singular values lie in the higher-order coefficients and have little influence on the sound field near the origin.
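One property of the spherical Bessel function that may be relevant here is that it has more zeros within the tested band as kr grows. The sketch below uses only the closed forms of j_0 and j_1 and lists the frequencies at which j_0(kr) vanishes. The speed of sound of 343 m/s and all function names are assumptions for illustration, not values from the original.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def wavenumber(freq_hz, c=SPEED_OF_SOUND):
    """Wave number k = 2*pi*f / c."""
    return 2.0 * math.pi * freq_hz / c


def j0(x):
    """Spherical Bessel function of order 0: sin(x) / x."""
    return math.sin(x) / x if x != 0.0 else 1.0


def j1(x):
    """Spherical Bessel function of order 1: sin(x)/x^2 - cos(x)/x."""
    return math.sin(x) / x ** 2 - math.cos(x) / x if x != 0.0 else 0.0


def j0_zero_frequencies(r_m, f_max_hz, c=SPEED_OF_SOUND):
    """Frequencies up to f_max_hz where j0(k*r) = 0, i.e. k*r = m*pi."""
    zeros, m = [], 1
    while m * c / (2.0 * r_m) <= f_max_hz:
        zeros.append(m * c / (2.0 * r_m))
        m += 1
    return zeros


print(len(j0_zero_frequencies(0.2, 3000.0)))  # 3
print(len(j0_zero_frequencies(1.0, 3000.0)))  # 17
```

At a distance of 0.2 m only three zeros of j_0 fall below 3000 Hz, while at 1 m there are seventeen, so a radial term that must be inverted is near zero far more often at larger distances and higher frequencies.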
For the experiments with different numbers of spatial sampling points, 4 to 13 sampling points are used, with 100 test samples per condition. Four cases, with 4, 7, 10, and 13 spatial sampling points, are visualized. For LSM, using more spatial sampling points reduces the singularity of the transfer matrix, which is reflected in all three metrics, as shown in Fig. 5. The network shows a consistent trend as the number increases. Under the test conditions, numbers of sampling points outside the training range achieve better results. The results show that the stability of the proposed method increases with the number of sampling points, which confirms the role of the transformation average splicing (TAC) module.
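The ability to handle a varying number of sampling points follows from averaging over the position dimension. The following is a heavily simplified sketch of that fusion step with a residual connection. It omits all learned transforms, the name `tac_fuse` is hypothetical, and broadcasting the mean back to every position is an assumption.

```python
def tac_fuse(first, third):
    """Average `third` over positions and add the mean to `first`.

    Both inputs are nested lists shaped [positions][frequencies][coefficients].
    Learned transforms are omitted; only the averaging and residual remain.
    """
    n_pos = len(third)
    n_freq = len(third[0])
    n_coef = len(third[0][0])
    mean = [
        [sum(third[p][k][c] for p in range(n_pos)) / n_pos for c in range(n_coef)]
        for k in range(n_freq)
    ]
    return [
        [[first[p][k][c] + mean[k][c] for c in range(n_coef)] for k in range(n_freq)]
        for p in range(n_pos)
    ]


first = [[[0.0, 0.0]], [[1.0, 1.0]]]   # 2 positions, 1 frequency, 2 coefficients
third = [[[1.0, 2.0]], [[3.0, 4.0]]]
print(tac_fuse(first, third))  # [[[2.0, 3.0]], [[3.0, 4.0]]]
```

Since the mean does not depend on how many positions are present or on their order, the same function applies unchanged to 4, 10, or 13 sampling points.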
The results under multi-source conditions are shown in Fig. 6. For LSM, sound sources in different directions affect only the spherical harmonic coefficients and not the solution of the transfer matrix, so the conventional method behaves consistently; that is, the number of sound sources has little influence. The network is tested with 2 to 6 sound sources. The results show that the spherical harmonic coefficients for different numbers of sound sources tend to be consistent, and the performance of the method of the invention does not decrease as the number of sound sources increases. Test sets with more than four sound sources give EDM and COSS results consistent with the others. It should be noted that the results of the method of the invention differ across frequencies, which will be analyzed further in subsequent work.
Fig. 7 visualizes example sound pressure fields at 1000 Hz and 1800 Hz. The figure shows the sound pressure distribution on a 2 m × 2 m horizontal plane. The two sound sources in this example are located at 0° and 225°. For each frequency, the sound pressure expanded from the coefficients of the proposed TT-Net, from LSM, and from the ideal spherical harmonic coefficients is shown from left to right. The results show that the network performs stably and better under different frequency conditions.
5. Summary
The invention provides a multipoint-sampling sound field transfer method based on a dual-path self-attention network. The method is suited to transferring spherical harmonic coefficients. The invention uses a neural network to optimize spherical harmonic analysis and reduces the occurrence of singular solutions when solving for spherical harmonic coefficients. Simulation results across different frequency ranges, signal-to-noise ratios, and more complex sound source conditions show that the method recovers spherical harmonic coefficients more accurately than the conventional method. Under the same conditions, the proposed method improves the SDR by 3 dB.
Although specific embodiments of the invention have been disclosed for illustrative purposes, those skilled in the art will appreciate that various alternatives, variations, and modifications are possible without departing from the spirit and scope of the invention and the appended claims. Therefore, the invention is not intended to be limited to the particular embodiment disclosed as the best mode contemplated for carrying it out; its scope is defined by the appended claims.

Claims (3)

1. A multipoint sampling sound field reconstruction method based on dual-path self-attention mechanism learning, comprising the following steps:
1) performing multipoint sampling on the target space to obtain the sound pressure at each sampling position; then decomposing the sound pressure at each sampling position into a spherical harmonic expansion to obtain the spherical harmonic coefficients of the local coordinate center sound field at each sampling position; the spherical harmonic coefficients of the local coordinate center sound field form a two-dimensional matrix comprising a plurality of frequencies and the spherical harmonic coefficients in each frequency dimension;
2) forming a first three-dimensional matrix from the sampling positions and the spherical harmonic coefficients of the corresponding local coordinate center sound fields, and inputting the first three-dimensional matrix into a transfer network to obtain the spherical harmonic coefficients of the global coordinate center; the transfer network comprises a spherical harmonic basis function mapping module, a radial function mapping module, a dual-path self-attention network, a transformation average splicing module, and an order-raising module;
the radial function mapping module is used for mapping the spatial distance information in the spherical harmonic coefficients of the local coordinate center sound field corresponding to each sampling position, generating a radial function matrix from the resulting group of vectors, which have the same dimension as the radial function and cover different frequencies and distances, and sending the radial function matrix to the dual-path self-attention network;
the spherical harmonic basis function mapping module is used for mapping the azimuth information in the spherical harmonic coefficients of the local coordinate center sound field corresponding to each sampling position, generating a spherical harmonic basis function matrix from the resulting group of vectors, which have the same dimension as the spherical harmonics, and sending the spherical harmonic basis function matrix to the dual-path self-attention network;
the dual-path self-attention network is used for performing an attention operation on the first three-dimensional matrix and the radial function matrix, realizing attention splicing over the frequency and distance dimensions and obtaining a second three-dimensional matrix; and then performing an attention operation on the azimuth dimension of the second three-dimensional matrix and the spherical harmonic basis function matrix, realizing attention splicing over the azimuth dimension, obtaining a third three-dimensional matrix, and sending the third three-dimensional matrix to the transformation average splicing module;
the transformation average splicing module is used for averaging the third three-dimensional matrix, namely summing the two-dimensional matrices for the coordinate centers at different positions and taking the mean, thereby fusing the information from different positions, and for adding the fourth three-dimensional matrix obtained after fusion to the first three-dimensional matrix and sending the result to the order-raising module;
the order-raising module is used for raising the order of the input spherical harmonic coefficient vector, mapping the low-order spherical harmonic coefficients to high-order spherical harmonic coefficients and obtaining the spherical harmonic coefficients of the global coordinate center;
3) decomposing the sound pressure of the target space by using the spherical harmonic coefficients of the global coordinate center, and reconstructing the sound field in the target space.
2. The method according to claim 1, characterized in that the sound pressure $p(k, r)$ at position $r$ is sampled; wherein the wave number $k = 2\pi f / c$, $c$ is the speed of sound, and $f$ is the sound wave frequency; the sampling position is expressed in a spherical coordinate system as $r = (\rho, \theta, \phi)$, where $\rho$ is the radius, $\theta$ is the pitch angle, and $\phi$ is the horizontal angle.
3. The method of claim 2, wherein the formula $p(k, r) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} B_{nm}(k)\, j_n(k\rho)\, Y_{nm}(\theta, \phi)$ is used to decompose the sound pressure of the target space; wherein $j_n(x)$ is the spherical Bessel function, $Y_{nm}(\theta, \phi)$ is the spherical harmonic basis function, and $B_{nm}(k)$ is the corresponding spherical harmonic coefficient.
CN202310667120.6A 2023-06-07 2023-06-07 Multi-point sampling sound field reconstruction method based on two-way self-attention mechanism learning Pending CN116962938A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310667120.6A CN116962938A (en) 2023-06-07 2023-06-07 Multi-point sampling sound field reconstruction method based on two-way self-attention mechanism learning

Publications (1)

Publication Number Publication Date
CN116962938A true CN116962938A (en) 2023-10-27

Family

ID=88455478

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310667120.6A Pending CN116962938A (en) 2023-06-07 2023-06-07 Multi-point sampling sound field reconstruction method based on two-way self-attention mechanism learning

Country Status (1)

Country Link
CN (1) CN116962938A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117253472A (en) * 2023-11-16 2023-12-19 上海交通大学宁波人工智能研究院 Multi-region sound field reconstruction control method based on generation type deep neural network
CN117253472B (en) * 2023-11-16 2024-01-26 上海交通大学宁波人工智能研究院 Multi-region sound field reconstruction control method based on generation type deep neural network

Similar Documents

Publication Publication Date Title
CN109375171B (en) Sound source positioning method based on orthogonal matching pursuit algorithm
CN116962938A (en) Multi-point sampling sound field reconstruction method based on two-way self-attention mechanism learning
CN107544051A (en) Wave arrival direction estimating method of the nested array based on K R subspaces
JP7327840B2 (en) A Direction-of-Arrival Estimation Method for 3D Disjoint Cubic Arrays Based on Cross-Correlation Tensors
CN114527427B (en) Low-frequency wave beam forming sound source positioning method based on spherical microphone array
CN113109759B (en) Underwater sound array signal direction-of-arrival estimation method based on wavelet transform and convolution neural network
CN109557526B (en) Vector hydrophone sparse array arrangement method based on compressed sensing theory
CN111812581B (en) Spherical array sound source direction-of-arrival estimation method based on atomic norms
CN109870669A (en) How soon a kind of two dimension claps mesh free compression Wave beam forming identification of sound source method
CN112444773A (en) Compressed sensing two-dimensional DOA estimation method based on spatial domain fusion
CN112285647A (en) Signal orientation high-resolution estimation method based on sparse representation and reconstruction
CN108614235B (en) Single-snapshot direction finding method for information interaction of multiple pigeon groups
Huang et al. Off-grid DOA estimation in real spherical harmonics domain using sparse Bayesian inference
CN111263291B (en) Sound field reconstruction method based on high-order microphone array
CN113189538A (en) Ternary array based on co-prime sparse arrangement and spatial spectrum estimation method thereof
CN110133578B (en) Seabed reflection sound ray incident angle estimation method based on semi-cylindrical volume array
CN111896929A (en) DOD/DOA estimation algorithm of non-uniform MIMO radar
Chu et al. Two-dimensional total variation norm constrained deconvolution beamforming algorithm for acoustic source identification
Sundström et al. Optimal Transport Based Impulse Response Interpolation in the Presence of Calibration Errors
CN115267673B (en) Sparse sound source imaging method and system considering reconstruction grid offset
CN114252148B (en) Sound field reconstruction method based on prolate ellipsoid wave superposition
CN111157951B (en) Three-dimensional sound source positioning method based on differential microphone array
Chen et al. Sound Field Estimation around a Rigid Sphere with Physics-informed Neural Network
Liu et al. Efficient localization of low-frequency sound source with non-synchronous measurement at coprime positions by alternating direction method of multipliers
Wang et al. TT-Net: Dual-path transformer based sound field translation in the spherical harmonic domain

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination