CN111537955A - Multi-sound-source positioning method and device based on spherical microphone array - Google Patents
- Publication number
- CN111537955A (application CN202010255782.9A)
- Authority
- CN
- China
- Prior art keywords
- sparse
- iteration
- sound sources
- voice signal
- preset
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S5/00—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
- G01S5/18—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
- G01S5/20—Position of source determined by a plurality of spaced direction-finders
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
The invention discloses a multi-sound-source positioning method and device based on a spherical microphone array, where the multiple sound sources comprise D sound sources. The method comprises the following steps: S1, acquiring a voice signal transmitted by the spherical microphone array; S2, performing framing, windowing and short-time Fourier transform processing on the voice signal to obtain a time-frequency domain voice signal; S3, performing spherical Fourier transform processing on the time-frequency domain voice signal to obtain a spherical harmonic domain voice signal; S4, constructing a sparse dictionary for the spherical harmonic domain voice signal according to the weights of the maximum directional beamformer and a preset first grid precision; and S5, calculating first positions of the D sound sources in the sparse dictionary by using a sparse Bayesian learning method and an expectation maximization method. Using the weights of the maximum directional beamformer as a new sparse dictionary improves the resolution of sparse positioning in the spherical harmonic domain, so the method suits situations with many, closely spaced sound sources and positions them more accurately.
Description
Technical Field
The invention relates to the technical field of voice signal processing, in particular to a multi-sound-source positioning method and device based on a spherical microphone array.
Background
At present, multi-sound-source sparse positioning with a spherical microphone array mainly obtains sound source position information in the spherical harmonic domain by using compressed sensing theory. The most common spherical harmonic domain sparse positioning methods use the weights of a delay-and-sum beamformer as the sparse dictionary; when the sound sources are numerous and closely spaced, the estimated positions are inaccurate and the resolution is poor.
Disclosure of Invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art described above. To this end, a first objective of the present invention is to provide a multi-sound-source positioning method based on a spherical microphone array, in which the weights of the maximum directional beamformer are used as a new sparse dictionary to improve the resolution of sparse positioning in the spherical harmonic domain, making the method suitable for situations with many, closely spaced sound sources.
The second purpose of the invention is to provide a multi-sound-source positioning device based on a spherical microphone array.
In order to achieve the above object, an embodiment of a first aspect of the present invention provides a method for positioning multiple sound sources based on a spherical microphone array, where the multiple sound sources are D sound sources, and the method includes:
S1, acquiring a voice signal transmitted by the spherical microphone array;
S2, performing framing, windowing and short-time Fourier transform processing on the voice signal to obtain a time-frequency domain voice signal;
S3, performing spherical Fourier transform processing on the time-frequency domain voice signal to obtain a spherical harmonic domain voice signal;
S4, constructing a sparse dictionary for the spherical harmonic domain voice signal according to the weights of the maximum directional beamformer and the preset first grid precision;
and S5, calculating the first positions of the D sound sources in the sparse dictionary by using a sparse Bayesian learning method and an expectation maximization method.
According to the multi-sound-source positioning method based on the spherical microphone array provided by the embodiment of the first aspect of the invention, the three-dimensional spatial characteristics of the spherical microphone array are used to sample the voice signal over all azimuth and pitch angles, so that the voice signals emitted from the D sound source positions are acquired. Framing, windowing and short-time Fourier transform processing are performed on the voice signal to obtain a time-frequency domain voice signal, which improves the processing efficiency of the voice signal. Spherical Fourier transform processing then converts the signal into the spherical harmonic domain. A sparse dictionary is constructed from the weights of the maximum directional beamformer according to the first grid precision, which is (10°, 5°), where 10° is the azimuth interval angle and 5° is the pitch interval angle. In the sparse dictionary, the first positions of the D sound sources are calculated by using a sparse Bayesian learning method and an expectation maximization method. The sparse dictionary designed from the weights of the maximum directional beamformer has high resolution, yields accurate sound source position information, and can be widely applied to positioning multiple sound sources.
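The framing, windowing and short-time Fourier transform of step S2 can be illustrated with a minimal NumPy sketch. The frame length, hop size, Hann window and the function name `stft_frames` are assumptions chosen for illustration; the text does not specify them.

```python
import numpy as np

def stft_frames(x, frame_len=512, hop=256):
    """Frame, window and Fourier-transform a multichannel signal.

    x: array of shape (n_mics, n_samples).
    Returns an array of shape (n_mics, n_frames, frame_len // 2 + 1).
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    n_frames = 1 + (x.shape[1] - frame_len) // hop
    window = np.hanning(frame_len)  # Hann window (assumed choice)
    frames = np.stack(
        [x[:, i * hop:i * hop + frame_len] for i in range(n_frames)], axis=1
    )
    # Window each frame, then take the one-sided FFT along the sample axis.
    return np.fft.rfft(frames * window, axis=-1)

# Usage: 4 hypothetical microphone channels carrying the same test tone.
n = np.arange(4096)
tone = np.cos(2 * np.pi * 32 * n / 512)  # lands exactly on FFT bin 32
spec = stft_frames(np.tile(tone, (4, 1)))
print(spec.shape)
```

Each time-frequency bin of `spec` would then be passed, across microphones, to the spherical Fourier transform of step S3.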
According to some embodiments of the present invention, obtaining the first positions of the D sound sources in the sparse dictionary by applying a sparse Bayesian learning method and an expectation maximization method includes:
S51, setting a first parameter value of sparse Bayesian learning and a first preset iteration number N, and calculating a first mean value and a first covariance of the sparse matrix for the sparse dictionary according to the first parameter value of the sparse Bayesian learning;
S52, re-estimating the first parameter value of the sparse Bayesian learning from the first mean value and the first covariance by using an expectation maximization method to obtain a second parameter value; calculating a second mean value and a second covariance of the sparse matrix for the sparse dictionary according to the second parameter value of the sparse Bayesian learning, thereby completing one iteration; and, when the current iteration number is determined to be equal to the first preset iteration number N, stopping the iteration, calculating the Nth mean value, and taking the D highest peaks of the energy spectrum of the Nth mean value as the first positions of the D sound sources.
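The alternation in S51 and S52 between computing the posterior mean and covariance and re-estimating the parameters can be sketched as follows. This is a generic sparse Bayesian learning loop with EM updates of per-atom prior variances, not necessarily the exact update used in the invention. The fixed noise variance, the random stand-in dictionary, and all names (`sbl_em`, `gamma`, `noise_var`) are assumptions, and the D largest energies stand in for peak picking.

```python
import numpy as np

def sbl_em(A, Y, noise_var=1e-2, n_iter=200):
    """Sparse Bayesian learning with EM updates of the prior variances.

    A: (M, G) dictionary, Y: (M, T) observations (T snapshots).
    Returns the posterior mean (G, T) and the prior variances gamma (G,).
    """
    M, G = A.shape
    gamma = np.ones(G)  # initial ("first") parameter value
    for _ in range(n_iter):
        # Posterior mean and covariance diagonal for the current gamma.
        Sigma_y = noise_var * np.eye(M) + (A * gamma) @ A.conj().T
        B = np.linalg.solve(Sigma_y, A)                # Sigma_y^{-1} A
        mu = gamma[:, None] * (B.conj().T @ Y)         # Gamma A^H Sigma_y^{-1} Y
        quad = np.real(np.sum(A.conj() * B, axis=0))   # a_g^H Sigma_y^{-1} a_g
        sigma_diag = gamma - gamma**2 * quad
        # EM update: expected power of each coefficient, floored for stability.
        gamma = np.maximum(np.mean(np.abs(mu) ** 2, axis=1) + sigma_diag, 1e-12)
    return mu, gamma

# Usage with a random stand-in dictionary and D = 2 active atoms.
rng = np.random.default_rng(0)
M, G, T = 16, 40, 20
A = rng.standard_normal((M, G)) + 1j * rng.standard_normal((M, G))
A /= np.linalg.norm(A, axis=0)
X = np.zeros((G, T), dtype=complex)
X[[5, 20]] = rng.standard_normal((2, T)) + 1j * rng.standard_normal((2, T))
noise = 0.01 * (rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T)))
mu, gamma = sbl_em(A, A @ X + noise)
energy = np.mean(np.abs(mu) ** 2, axis=1)   # energy spectrum of the final mean
print(sorted(np.argsort(energy)[-2:].tolist()))
```

In the method described here, the columns of `A` would come from the beamformer-weight dictionary evaluated on the direction grid, so each index of `energy` corresponds to one (azimuth, pitch) pair.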
According to some embodiments of the invention, after obtaining the first positions of the D sound sources, further comprising:
S6, setting a second preset iteration number M;
S7, performing grid refinement in a preset area around the first positions of the D sound sources according to a first preset rule;
S8, after the sparse dictionary is reconstructed according to the grid precision calculated by the first preset rule, obtaining second positions of the D sound sources by applying the sparse Bayesian learning method and the expectation maximization method, thereby completing one iteration of grid refinement; and, when the number of grid refinement iterations is determined to be equal to the second preset iteration number M, stopping the iteration, calculating the final mean value, and taking the D highest peaks of the energy spectrum of that mean value as the Mth positions of the D sound sources.
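One way to picture the grid refinement of S7 is a small local grid laid around each current estimate at a finer step. The sketch below is a hypothetical rule, not the first preset rule of the invention. The half-width of the local window, the pitch range of 0° to 180°, and the name `refine_grid` are assumptions.

```python
def refine_grid(centers, half_width, step):
    """Build a local grid around each estimated direction.

    centers:    list of (azimuth, pitch) estimates in degrees.
    half_width: (azimuth, pitch) half-width of the local window in degrees.
    step:       (azimuth, pitch) interval angles of the refined grid in degrees.
    """
    points = set()
    n_az = int(half_width[0] // step[0])
    n_pitch = int(half_width[1] // step[1])
    for az0, pitch0 in centers:
        for i in range(-n_az, n_az + 1):
            for j in range(-n_pitch, n_pitch + 1):
                az = (az0 + i * step[0]) % 360.0                        # wrap azimuth
                pitch = min(180.0, max(0.0, pitch0 + j * step[1]))      # clamp pitch
                points.add((az, pitch))
    return sorted(points)

# Usage: refine around one estimate from a (10, 5) grid using a (5, 2) grid.
local = refine_grid([(40.0, 90.0)], half_width=(10.0, 5.0), step=(5.0, 2.0))
print(len(local))  # 5 azimuths x 5 pitch angles = 25
```

The dictionary would then be rebuilt on `local` only, which keeps the number of atoms small even as the interval angles shrink.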
According to some embodiments of the invention, the first preset rule comprises calculating the grid precision by the following formula:
[Formula not reproduced in this text.]
where Θ^(j) denotes the grid set Θ in the j-th iteration; the two interval symbols of the formula denote the interval angles of the pitch angle and the azimuth angle, respectively, in the j-th iteration; θ denotes the pitch angle; φ denotes the azimuth angle; Φ_θ denotes the pitch angle in the j-th iteration; and Φ_φ denotes the azimuth angle in the j-th iteration.
According to some embodiments of the invention, the weights of the maximum directional beamformer are:
[Formula not reproduced in this text.]
where b_n(kr) denotes the mode strength, Y_n^m(θ, φ) denotes the spherical harmonics, n denotes the order, m denotes the degree, r denotes the radius of the spherical microphone array, θ denotes the pitch angle, and φ denotes the azimuth angle.
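The weight formula itself is not reproduced in this text. As a point of reference only, the spherical-array literature commonly writes the weights of this beamformer (often called the maximum-directivity beamformer) as the spherical harmonic of the look direction divided by the mode strength, times a normalizing constant. The sketch below assumes that form. The helper names, the 4π/(N+1)² scaling, and the placeholder values of `b` are all assumptions and may differ from the expression used in the invention.

```python
import cmath
import math

def assoc_legendre(n, m, x):
    """Associated Legendre function P_n^m(x) for 0 <= m <= n (Condon-Shortley phase)."""
    pmm = 1.0
    s = math.sqrt(max(0.0, 1.0 - x * x))
    for k in range(1, m + 1):
        pmm *= -(2 * k - 1) * s
    if n == m:
        return pmm
    pm1 = x * (2 * m + 1) * pmm
    for l in range(m + 2, n + 1):
        pmm, pm1 = pm1, ((2 * l - 1) * x * pm1 - (l + m - 1) * pmm) / (l - m)
    return pm1

def sph_harm(n, m, theta, phi):
    """Spherical harmonic Y_n^m; theta is the pitch (polar) angle, phi the azimuth, in radians."""
    am = abs(m)
    norm = math.sqrt((2 * n + 1) / (4 * math.pi)
                     * math.factorial(n - am) / math.factorial(n + am))
    y = norm * assoc_legendre(n, am, math.cos(theta)) * cmath.exp(1j * am * phi)
    return y if m >= 0 else (-1) ** am * y.conjugate()

def max_directional_weights(order, theta, phi, b):
    """Assumed form: d * Y_n^m(theta, phi) / b[n], with d = 4*pi / (order + 1)**2."""
    d = 4 * math.pi / (order + 1) ** 2
    return [d * sph_harm(n, m, theta, phi) / b[n]
            for n in range(order + 1) for m in range(-n, n + 1)]

# Usage: order-2 weights for a look direction of pitch 60 deg, azimuth 40 deg.
order, theta, phi = 2, math.radians(60), math.radians(40)
b = [1.0, 0.5j, -0.25]  # placeholder mode strengths, not physical values
w = max_directional_weights(order, theta, phi, b)
# Plane wave from the look direction in the spherical harmonic domain: b_n * conj(Y_n^m).
p = [b[n] * sph_harm(n, m, theta, phi).conjugate()
     for n in range(order + 1) for m in range(-n, n + 1)]
response = sum(wk * pk for wk, pk in zip(w, p))
print(len(w), abs(response))
```

Under this assumed form, a plane wave arriving from the look direction gives a unit response regardless of the mode strengths. Stacking one such weight vector per grid direction would give the columns of the sparse dictionary.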
According to some embodiments of the invention, the first grid precision is (10 °, 5 °), wherein 10 ° is an azimuth interval angle and 5 ° is a pitch interval angle.
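To make the first grid precision concrete, the sketch below enumerates the candidate directions for a (10°, 5°) grid. It assumes the azimuth spans 0° up to but not including 360° and the pitch angle spans 0° to 180° inclusive; the text does not state these ranges, and `build_grid` is a hypothetical helper.

```python
def build_grid(az_step_deg=10.0, pitch_step_deg=5.0):
    """Return (azimuth, pitch) pairs in degrees covering the whole sphere."""
    n_az = int(round(360.0 / az_step_deg))            # azimuth in [0, 360)
    n_pitch = int(round(180.0 / pitch_step_deg)) + 1  # pitch in [0, 180], both ends included
    return [(i * az_step_deg, j * pitch_step_deg)
            for j in range(n_pitch) for i in range(n_az)]

grid = build_grid()
print(len(grid))  # 36 azimuths x 37 pitch angles = 1332 candidate directions
```

Under these assumptions the initial sparse dictionary has 1332 columns, one per candidate direction.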
To achieve the above object, an embodiment of the second aspect of the present invention provides a multi-sound-source positioning device based on a spherical microphone array, where the multiple sound sources comprise D sound sources, and the device includes:
the acquisition module is used for acquiring the voice signal transmitted by the spherical microphone array;
the first signal processing module is used for performing frame windowing and short-time Fourier transform processing on the voice signal to obtain a time-frequency domain voice signal;
the second signal processing module is used for carrying out spherical Fourier transform processing on the time-frequency domain voice signal to obtain a spherical harmonic domain voice signal;
the sparse dictionary building module is used for building a sparse dictionary for the spherical harmonic domain voice signals according to the weight of the maximum directional beam former and the preset first grid precision;
and the first calculation module is used for calculating and obtaining first positions of the D sound sources in the sparse dictionary by applying a sparse Bayesian learning method and an expectation maximization method.
According to the multi-sound-source positioning device based on the spherical microphone array provided by the embodiment of the second aspect of the invention, the three-dimensional spatial characteristics of the spherical microphone array are used to sample the voice signal over all azimuth and pitch angles, so that the voice signals emitted from the D sound source positions are acquired. Framing, windowing and short-time Fourier transform processing are performed on the voice signal to obtain a time-frequency domain voice signal, which improves the processing efficiency of the voice signal. Spherical Fourier transform processing then converts the signal into the spherical harmonic domain. A sparse dictionary is constructed from the weights of the maximum directional beamformer according to the first grid precision, which is (10°, 5°), where 10° is the azimuth interval angle and 5° is the pitch interval angle. In the sparse dictionary, the first positions of the D sound sources are calculated by using a sparse Bayesian learning method and an expectation maximization method. The sparse dictionary designed from the weights of the maximum directional beamformer has high resolution, yields accurate sound source position information, and can be widely applied to positioning multiple sound sources.
According to some embodiments of the invention, the multi-sound-source positioning device based on the spherical microphone array further comprises:
the first storage module is used for storing a first parameter value of sparse Bayesian learning and a first preset iteration number N;
the first calculation module is used for calculating a first mean value and a first covariance of the sparse matrix for the sparse dictionary according to the first parameter value of the sparse Bayesian learning; re-estimating the first parameter value of the sparse Bayesian learning from the first mean value and the first covariance by using an expectation maximization method to obtain a second parameter value; calculating a second mean value and a second covariance of the sparse matrix for the sparse dictionary according to the second parameter value of the sparse Bayesian learning, thereby completing one iteration; and, when the current iteration number is determined to be equal to the first preset iteration number N, stopping the iteration, calculating the Nth mean value, and taking the D highest peaks of the energy spectrum of the Nth mean value as the first positions of the D sound sources.
According to some embodiments of the invention, the multi-sound-source positioning device based on the spherical microphone array further comprises:
the second storage module is used for storing a second preset iteration number M;
the second calculation module is used for carrying out grid refinement in a preset area of the first positions of the D sound sources through a first preset rule; after a sparse dictionary is reconstructed according to the grid precision calculated by the first preset rule, second positions of the D sound sources are obtained by applying a sparse Bayesian learning method and an expectation maximization method, and one iteration of grid refinement is completed; and when the iteration number of the current grid refinement is determined to be equal to a second preset iteration number M, stopping iteration, calculating to obtain a final mean value, and taking the first D highest peaks of the energy spectrum of the mean value as the Mth positions of the D sound sources.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a flow chart of a method for multi-source localization based on a spherical microphone array according to one embodiment of the present invention;
FIG. 2 is a flow diagram of a method for multiple sound source localization from a sparse dictionary according to one embodiment of the present invention;
FIG. 3 is a flow chart of a method for multi-source localization based on a spherical microphone array according to yet another embodiment of the present invention;
FIG. 4 is a block diagram of a multi-sound-source positioning device based on a spherical microphone array according to a first embodiment of the present invention;
FIG. 5 is a block diagram of a multi-sound-source positioning device based on a spherical microphone array according to a second embodiment of the present invention;
FIG. 6 is a block diagram of a multi-sound-source positioning device based on a spherical microphone array according to a third embodiment of the present invention.
Reference numerals:
multi-sound-source positioning device 100 based on a spherical microphone array; acquisition module 1; first signal processing module 2; second signal processing module 3; sparse dictionary construction module 4; first calculation module 5; first storage module 6; second storage module 7; second calculation module 8.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
A multi-sound-source positioning method and device based on a spherical microphone array according to embodiments of the present invention are described below with reference to FIG. 1 to FIG. 6.
FIG. 1 is a flow chart of a method for multi-source localization based on a spherical microphone array according to one embodiment of the present invention; as shown in fig. 1, an embodiment of the first aspect of the present invention provides a method for positioning multiple sound sources based on a spherical microphone array, where the multiple sound sources are D sound sources, and the method includes:
S1, acquiring a voice signal transmitted by the spherical microphone array;
S2, performing framing, windowing and short-time Fourier transform processing on the voice signal to obtain a time-frequency domain voice signal;
S3, performing spherical Fourier transform processing on the time-frequency domain voice signal to obtain a spherical harmonic domain voice signal;
S4, constructing a sparse dictionary for the spherical harmonic domain voice signal according to the weights of the maximum directional beamformer and the preset first grid precision;
and S5, calculating the first positions of the D sound sources in the sparse dictionary by using a sparse Bayesian learning method and an expectation maximization method.
According to the multi-sound-source positioning method based on the spherical microphone array provided by this embodiment of the invention, the three-dimensional spatial characteristics of the spherical microphone array are used to sample the voice signal over all azimuth and pitch angles, so that the voice signals emitted from the D sound source positions are acquired. Framing, windowing and short-time Fourier transform processing are performed on the voice signal to obtain a time-frequency domain voice signal, which improves the processing efficiency of the voice signal. Spherical Fourier transform processing then converts the signal into the spherical harmonic domain. A sparse dictionary is constructed from the weights of the maximum directional beamformer according to the first grid precision, which is (10°, 5°), where 10° is the azimuth interval angle and 5° is the pitch interval angle. In the sparse dictionary, the first positions of the D sound sources are calculated by using a sparse Bayesian learning method and an expectation maximization method. The sparse dictionary designed from the weights of the maximum directional beamformer has high resolution, yields accurate sound source position information, and can be widely applied to positioning multiple sound sources.
FIG. 2 is a flow chart of a method for positioning multiple sound sources from a sparse dictionary according to one embodiment of the present invention. As shown in FIG. 2, according to some embodiments of the present invention, obtaining the first positions of the D sound sources in the sparse dictionary by using a sparse Bayesian learning method and an expectation maximization method includes:
S51, setting a first parameter value of sparse Bayesian learning and a first preset iteration number N, and calculating a first mean value and a first covariance of the sparse matrix for the sparse dictionary according to the first parameter value of the sparse Bayesian learning;
S52, re-estimating the first parameter value of the sparse Bayesian learning from the first mean value and the first covariance by using an expectation maximization method to obtain a second parameter value; calculating a second mean value and a second covariance of the sparse matrix for the sparse dictionary according to the second parameter value of the sparse Bayesian learning, thereby completing one iteration; and, when the current iteration number is determined to be equal to the first preset iteration number N, stopping the iteration, calculating the Nth mean value, and taking the D highest peaks of the energy spectrum of the Nth mean value as the first positions of the D sound sources.
The working principle and beneficial effects of this technical scheme are as follows. A first parameter value of the sparse Bayesian learning, that is, an initial parameter value, is set. A first mean value and a first covariance of the sparse matrix for the sparse dictionary are calculated from this first parameter value, and the first parameter value is then re-estimated from the first mean value and the first covariance to obtain a second parameter value. A second mean value and a second covariance of the sparse matrix are calculated from the second parameter value, which completes one iteration. The iteration stops when the convergence condition is met, that is, when the current iteration number equals the first preset iteration number N (N may be 1000, for example). The Nth mean value is then calculated, and the D highest peaks of its energy spectrum are taken as the first positions of the D sound sources, which improves the positioning accuracy of the sound source positions.
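Taking the D highest peaks of an energy spectrum laid out on the direction grid can be sketched as below. The 8-neighbour definition of a peak, the wrap-around in azimuth, and the name `top_peaks` are assumptions, since the text does not say how peaks are detected.

```python
def top_peaks(energy, d):
    """Return up to d (pitch_index, azimuth_index) pairs of the highest local maxima.

    energy[j][i] is the energy at pitch index j and azimuth index i.
    A cell counts as a peak if it is strictly larger than all of its neighbours;
    the azimuth axis wraps around, the pitch axis does not.
    """
    n_pitch, n_az = len(energy), len(energy[0])
    peaks = []
    for j in range(n_pitch):
        for i in range(n_az):
            neighbours = [energy[jj][(i + di) % n_az]
                          for jj in (j - 1, j, j + 1) if 0 <= jj < n_pitch
                          for di in (-1, 0, 1) if (jj, di) != (j, 0)]
            if all(energy[j][i] > v for v in neighbours):
                peaks.append((energy[j][i], j, i))
    peaks.sort(reverse=True)
    return [(j, i) for _, j, i in peaks[:d]]

# Usage: a 5 x 8 toy spectrum with two sources and their side values.
energy = [[0.0] * 8 for _ in range(5)]
energy[2][0], energy[2][7] = 3.0, 1.0   # main peak and its wrapped neighbour
energy[1][4], energy[1][5] = 2.0, 1.5   # second peak and its neighbour
print(top_peaks(energy, 2))  # [(2, 0), (1, 4)]
```

Requiring local maxima rather than simply the D largest cells avoids reporting two adjacent grid points of the same source as two sources.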
FIG. 3 is a flow chart of a method for multi-source localization based on a spherical microphone array according to yet another embodiment of the present invention; as shown in fig. 3, after obtaining the first positions of the D sound sources, the method further includes:
S6, setting a second preset iteration number M;
S7, performing grid refinement in a preset area around the first positions of the D sound sources according to a first preset rule;
S8, after the sparse dictionary is reconstructed according to the grid precision calculated by the first preset rule, obtaining second positions of the D sound sources by applying the sparse Bayesian learning method and the expectation maximization method, thereby completing one iteration of grid refinement; and, when the number of grid refinement iterations is determined to be equal to the second preset iteration number M, stopping the iteration, calculating the final mean value, and taking the D highest peaks of the energy spectrum of that mean value as the Mth positions of the D sound sources.
The working principle and beneficial effects of this technical scheme are as follows. Once the first positions of the D sound sources have been obtained, grid refinement is performed in a preset area around the first positions, that is, the area near each sound source position, in order to obtain more accurate position information. A second preset iteration number M is set so that the sound source positions reach a preset precision. Grid refinement is performed in the preset area around the first positions of the D sound sources according to the first preset rule, and the sparse dictionary is reconstructed according to the grid precision calculated by that rule, which is finer than the previous one. For example, the first grid precision is (10°, 5°), where 10° is the azimuth interval angle and 5° is the pitch interval angle, and the grid precision calculated by the first preset rule may be (5°, 2°). The sparse dictionary is reconstructed from this grid precision and the weights of the maximum directional beamformer, and the sound source positions are recalculated with the method described above to obtain the second positions of the D sound sources, which completes one iteration of grid refinement. Grid refinement then continues in the area near the second positions: the grid precision is recalculated by the first preset rule, for example to (4°, 1°), the sparse dictionary is reconstructed again from the recalculated grid precision and the weights of the maximum directional beamformer, and the sound source positions are recalculated to obtain the third positions of the D sound sources. The iteration stops when the number of grid refinement iterations equals the second preset iteration number M. The final mean value is then calculated, and the D highest peaks of its energy spectrum are taken as the Mth positions of the D sound sources, giving sound source positions of the preset precision. In this example the second preset iteration number is 3, which yields position information of the preset precision while keeping the amount and complexity of computation below a preset threshold.
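A short count shows why refining only near the estimated positions keeps the computation low. The figures below assume a pitch range of 0° to 180° inclusive for the full-sphere grid and a hypothetical local window of ±5° in azimuth and ±2° in pitch around each source; neither is stated in the text.

```python
def full_grid_size(az_step, pitch_step):
    """Number of directions on a full-sphere grid (pitch 0..180 inclusive, assumed)."""
    return round(360 / az_step) * (round(180 / pitch_step) + 1)

def local_patch_size(half_width, step):
    """Number of directions in one local window of the given half-width (assumed shape)."""
    n_az = int(half_width[0] // step[0])
    n_pitch = int(half_width[1] // step[1])
    return (2 * n_az + 1) * (2 * n_pitch + 1)

# Dictionary sizes if each precision from the example were applied to the whole sphere.
sizes = {step: full_grid_size(*step) for step in [(10, 5), (5, 2), (4, 1)]}
print(sizes)  # {(10, 5): 1332, (5, 2): 6552, (4, 1): 16290}

# Dictionary size for D = 3 sources with local windows at the finest precision.
d_sources = 3
local_total = d_sources * local_patch_size((5, 2), (4, 1))
print(local_total)  # 3 * 15 = 45
```

Since each sparse Bayesian learning iteration scales with the number of dictionary columns, restricting the finest grid to small windows is what makes three refinement passes affordable.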
According to some embodiments of the invention, the first preset rule comprises calculating the grid precision by the following formula:
[Formula not reproduced in this text.]
where Θ^(j) denotes the grid set Θ in the j-th iteration; the two interval symbols of the formula denote the interval angles of the pitch angle and the azimuth angle, respectively, in the j-th iteration; θ denotes the pitch angle; φ denotes the azimuth angle; Φ_θ denotes the pitch angle in the j-th iteration; and Φ_φ denotes the azimuth angle in the j-th iteration.
According to some embodiments of the invention, the weights of the maximum directional beamformer are:
[Formula not reproduced in this text.]
where b_n(kr) denotes the mode strength, Y_n^m(θ, φ) denotes the spherical harmonics, n denotes the order, m denotes the degree, r denotes the radius of the spherical microphone array, θ denotes the pitch angle, and φ denotes the azimuth angle. The sparse dictionary designed from the weights of the maximum directional beamformer has high resolution, yields accurate sound source position information, and can be widely applied to positioning multiple sound sources.
According to some embodiments of the invention, the first grid precision is (10°, 5°), where 10° is the azimuth interval angle and 5° is the pitch interval angle. With this grid precision, the calculated first positions of the D sound sources meet the positioning accuracy requirement while the amount of computation is reduced.
FIG. 4 is a block diagram of a multi-sound-source positioning device 100 based on a spherical microphone array according to a first embodiment of the present invention. As shown in FIG. 4, an embodiment of the second aspect of the present invention provides a multi-sound-source positioning device 100 based on a spherical microphone array, where the multiple sound sources comprise D sound sources, and the device includes:
the acquisition module 1 is used for acquiring the voice signal transmitted by the spherical microphone array;
the first signal processing module 2 is configured to perform framing, windowing and short-time Fourier transform processing on the voice signal to obtain a time-frequency domain voice signal;
the second signal processing module 3 is configured to perform spherical Fourier transform processing on the time-frequency domain voice signal to obtain a spherical harmonic domain voice signal;
the sparse dictionary construction module 4 is used for constructing a sparse dictionary for the spherical harmonic domain voice signal according to the weight of the maximum directional beam former and the preset first grid precision;
and the first calculation module 5 is used for calculating and obtaining first positions of the D sound sources in the sparse dictionary by applying a sparse Bayesian learning method and an expectation maximization method.
The multi-sound-source positioning device 100 based on the spherical microphone array according to the embodiment of the second aspect of the present invention uses the three-dimensional spatial characteristics of the spherical microphone array to sample the voice signal over all azimuth and pitch angles, so that the voice signals emitted from the D sound source positions are acquired. Framing, windowing and short-time Fourier transform processing are performed on the voice signal to obtain a time-frequency domain voice signal, which improves the processing efficiency of the voice signal. Spherical Fourier transform processing then converts the signal into the spherical harmonic domain. A sparse dictionary is constructed from the weights of the maximum directional beamformer according to the first grid precision, which is (10°, 5°), where 10° is the azimuth interval angle and 5° is the pitch interval angle. In the sparse dictionary, the first positions of the D sound sources are calculated by using a sparse Bayesian learning method and an expectation maximization method. The sparse dictionary designed from the weights of the maximum directional beamformer has high resolution, yields accurate sound source position information, and can be widely applied to positioning multiple sound sources.
FIG. 5 is a block diagram of a multi-sound-source positioning device 100 based on a spherical microphone array according to a second embodiment of the present invention. As shown in FIG. 5, the device further includes:
the first storage module 6 is used for storing a first parameter value of sparse Bayesian learning and a first preset iteration number N;
the first calculation module 5 is configured to calculate a first mean value and a first covariance of the sparse matrix for the sparse dictionary according to the first parameter value of the sparse Bayesian learning; re-estimate the first parameter value of the sparse Bayesian learning from the first mean value and the first covariance by using an expectation maximization method to obtain a second parameter value; calculate a second mean value and a second covariance of the sparse matrix for the sparse dictionary according to the second parameter value of the sparse Bayesian learning, thereby completing one iteration; and, when the current iteration number is determined to be equal to the first preset iteration number N, stop the iteration, calculate the Nth mean value, and take the D highest peaks of the energy spectrum of the Nth mean value as the first positions of the D sound sources.
FIG. 6 is a block diagram of a multi-sound-source positioning device 100 based on a spherical microphone array according to a third embodiment of the present invention. As shown in FIG. 6, the device further includes:
a second storage module 7, configured to store a second preset iteration number M;
the second calculation module 8 is used for performing mesh refinement in a preset area of the first positions of the D sound sources through a first preset rule; after a sparse dictionary is reconstructed according to the grid precision calculated by the first preset rule, second positions of the D sound sources are obtained by applying a sparse Bayesian learning method and an expectation maximization method, and one iteration of grid refinement is completed; and when the iteration number of the current grid refinement is determined to be equal to a second preset iteration number M, stopping iteration, calculating to obtain a final mean value, and taking the first D highest peaks of the energy spectrum of the mean value as the Mth positions of the D sound sources.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (9)
1. A multi-sound-source positioning method based on a spherical microphone array, wherein the multi-sound-source comprises D sound sources, the method comprising the following steps:
S1, acquiring a voice signal transmitted by the spherical microphone array;
S2, performing framing, windowing and short-time Fourier transform processing on the voice signal to obtain a time-frequency domain voice signal;
S3, performing spherical Fourier transform processing on the time-frequency domain voice signal to obtain a spherical harmonic domain voice signal;
S4, constructing a sparse dictionary for the spherical harmonic domain voice signal according to the weights of the maximum directional beamformer and a preset first grid precision;
S5, calculating the first positions of the D sound sources in the sparse dictionary by using a sparse Bayesian learning method and an expectation maximization method.
2. The method of claim 1, wherein calculating the first positions of the D sound sources in the sparse dictionary by using a sparse Bayesian learning method and an expectation maximization method comprises:
S51, setting a first parameter value of sparse Bayesian learning and a first preset iteration number N, and calculating a first mean value and a first covariance of a sparse matrix for the sparse dictionary according to the first parameter value of the sparse Bayesian learning;
S52, re-estimating the first parameter value of the sparse Bayesian learning from the first mean value and the first covariance by using an expectation maximization method to obtain a second parameter value; calculating a second mean value and a second covariance of the sparse matrix for the sparse dictionary according to the second parameter value of the sparse Bayesian learning, thereby completing one iteration; and when the current iteration count equals the first preset iteration number N, stopping iteration, calculating the Nth mean value, and taking the D highest peaks of the energy spectrum of the Nth mean value as the first positions of the D sound sources.
3. The method of claim 1, wherein after obtaining the first positions of the D sound sources, the method further comprises:
S6, setting a second preset iteration number M;
S7, performing grid refinement in a preset area around the first positions of the D sound sources according to a first preset rule;
S8, after reconstructing the sparse dictionary according to the grid precision calculated by the first preset rule, obtaining second positions of the D sound sources by applying the sparse Bayesian learning method and the expectation maximization method, thereby completing one iteration of grid refinement; and when the number of grid refinement iterations equals the second preset iteration number M, stopping iteration, calculating a final mean value, and taking the D highest peaks of the energy spectrum of that mean value as the Mth positions of the D sound sources.
4. The multi-sound-source positioning method based on a spherical microphone array of claim 3, wherein the first preset rule comprises calculating a grid precision by the following formula:
wherein Θ^(j) represents the grid set Θ in the jth iteration; [symbol missing] and [symbol missing] respectively represent the interval angles of the pitch angle and the azimuth angle in the jth iteration; θ represents a pitch angle; φ represents an azimuth angle; Φ_θ represents the pitch angle in the jth iteration; and Φ_φ represents the azimuth angle in the jth iteration.
5. The method of claim 1, wherein the weights of the maximum directional beamformer are:
6. The multi-sound-source positioning method based on a spherical microphone array of claim 5, wherein the first grid precision is (10°, 5°), where 10° is the azimuth interval angle and 5° is the pitch interval angle.
7. A multi-sound-source positioning device based on a spherical microphone array, wherein the multi-sound-source comprises D sound sources, and the device comprises:
the acquisition module is used for acquiring the voice signal transmitted by the spherical microphone array;
the first signal processing module is used for performing frame windowing and short-time Fourier transform processing on the voice signal to obtain a time-frequency domain voice signal;
the second signal processing module is used for carrying out spherical Fourier transform processing on the time-frequency domain voice signal to obtain a spherical harmonic domain voice signal;
the sparse dictionary building module is used for building a sparse dictionary for the spherical harmonic domain voice signals according to the weight of the maximum directional beam former and the preset first grid precision;
and the first calculation module is used for calculating and obtaining first positions of the D sound sources in the sparse dictionary by applying a sparse Bayesian learning method and an expectation maximization method.
8. The multi-sound-source positioning device based on a spherical microphone array of claim 7, further comprising:
the first storage module is used for storing a first parameter value of sparse Bayesian learning and a first preset iteration number N;
the first calculation module is used for calculating a first mean value and a first covariance of a sparse matrix for the sparse dictionary according to the first parameter value of sparse Bayesian learning; re-estimating the first parameter value of the sparse Bayesian learning from the first mean value and the first covariance by using an expectation maximization method to obtain a second parameter value; calculating a second mean value and a second covariance of the sparse matrix for the sparse dictionary according to the second parameter value of the sparse Bayesian learning, thereby completing one iteration; and when the current iteration count equals the first preset iteration number N, stopping iteration, calculating the Nth mean value, and taking the D highest peaks of the energy spectrum of the Nth mean value as the first positions of the D sound sources.
9. The multi-sound-source positioning device based on a spherical microphone array of claim 7, further comprising:
the second storage module is used for storing a second preset iteration number M;
the second calculation module is used for carrying out grid refinement in a preset area of the first positions of the D sound sources through a first preset rule; after a sparse dictionary is reconstructed according to the grid precision calculated by the first preset rule, second positions of the D sound sources are obtained by applying a sparse Bayesian learning method and an expectation maximization method, and one iteration of grid refinement is completed; and when the iteration number of the current grid refinement is determined to be equal to a second preset iteration number M, stopping iteration, calculating to obtain a final mean value, and taking the first D highest peaks of the energy spectrum of the mean value as the Mth positions of the D sound sources.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010255782.9A CN111537955A (en) | 2020-04-02 | 2020-04-02 | Multi-sound-source positioning method and device based on spherical microphone array |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111537955A (en) | 2020-08-14 |
Family
ID=71952217
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010255782.9A Pending CN111537955A (en) | 2020-04-02 | 2020-04-02 | Multi-sound-source positioning method and device based on spherical microphone array |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111537955A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114527427A (en) * | 2022-01-27 | 2022-05-24 | 华南理工大学 | Low-frequency beam forming sound source positioning method based on spherical microphone array |
CN116338574A (en) * | 2023-04-10 | 2023-06-27 | 哈尔滨工程大学 | Sparse Bayesian learning underwater sound source positioning method based on matched beam |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103592628A (en) * | 2013-11-12 | 2014-02-19 | 上海大学 | Multi-sound-source positioning method based on formation of real value weight beam in spherical harmonic domain |
CN110718230A (en) * | 2019-08-29 | 2020-01-21 | 云知声智能科技股份有限公司 | Method and system for eliminating reverberation |
Non-Patent Citations (4)
Title |
---|
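Song Tao et al.: "Multi-sound-source localization method for spherical arrays based on spherical harmonic recursion relations" (基于球谐递归关系的球阵列多声源定位方法), 《新型工业化》 *
Zhang Yaoyun et al.: "Application of geographic information technology in the coordination analysis of highway operating speed" (地理信息技术在公路运行速度协调性分析中的应用), 《北方交通》 *
Dai Wei: "Research on indoor multi-sound-source localization based on sparse Bayesian learning in the spherical harmonic domain" (基于球谐域稀疏贝叶斯学习的室内多声源定位方法研究), 《信息科技辑》 *
Zhu Honghui: "Research on knowledge-driven human-like intelligent control systems" (《知识驱动型拟人智能控制系统研究》), 31 March 2012 *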
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114527427A (en) * | 2022-01-27 | 2022-05-24 | 华南理工大学 | Low-frequency beam forming sound source positioning method based on spherical microphone array |
CN114527427B (en) * | 2022-01-27 | 2024-03-29 | 华南理工大学 | Low-frequency wave beam forming sound source positioning method based on spherical microphone array |
CN116338574A (en) * | 2023-04-10 | 2023-06-27 | 哈尔滨工程大学 | Sparse Bayesian learning underwater sound source positioning method based on matched beam |
CN116338574B (en) * | 2023-04-10 | 2023-09-19 | 哈尔滨工程大学 | Sparse Bayesian learning underwater sound source positioning method based on matched beam |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200814 |