CN111537955A - Multi-sound-source positioning method and device based on spherical microphone array - Google Patents

Publication number
CN111537955A
Authority
CN
China
Prior art keywords
sparse
iteration
sound sources
voice signal
preset
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010255782.9A
Other languages
Chinese (zh)
Inventor
戴玮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Unisound Intelligent Technology Co Ltd
Xiamen Yunzhixin Intelligent Technology Co Ltd
Original Assignee
Unisound Intelligent Technology Co Ltd
Xiamen Yunzhixin Intelligent Technology Co Ltd
Application filed by Unisound Intelligent Technology Co Ltd and Xiamen Yunzhixin Intelligent Technology Co Ltd
Priority to CN202010255782.9A
Publication of CN111537955A

Classifications

    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01S: RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00: Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18: Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G01S5/20: Position of source determined by a plurality of spaced direction-finders

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention discloses a multi-sound-source positioning method and device based on a spherical microphone array, where the multiple sound sources comprise D sound sources. The method comprises the following steps: S1, acquiring a voice signal transmitted by the spherical microphone array; S2, performing framing, windowing, and short-time Fourier transform processing on the voice signal to obtain a time-frequency domain voice signal; S3, performing spherical Fourier transform processing on the time-frequency domain voice signal to obtain a spherical harmonic domain voice signal; S4, constructing a sparse dictionary for the spherical harmonic domain voice signal according to the weights of the maximum directional beamformer and a preset first grid precision; and S5, calculating first positions of the D sound sources in the sparse dictionary using a sparse Bayesian learning method and an expectation maximization method. Using the weights of the maximum directional beamformer as a new sparse dictionary improves the resolution of sparse positioning in the spherical harmonic domain, so the method suits scenarios with many, closely spaced sound sources and positions them more accurately.

Description

Multi-sound-source positioning method and device based on spherical microphone array
Technical Field
The invention relates to the technical field of voice signal processing, in particular to a multi-sound-source positioning method and device based on a spherical microphone array.
Background
At present, multi-sound-source sparse positioning with a spherical microphone array mainly uses compressed sensing theory to obtain sound source position information in the spherical harmonic domain. The sparse dictionary of the most common spherical harmonic domain sparse positioning method consists of the weights of a delay-and-sum beamformer; when there are many sound sources and they are closely spaced, this dictionary gives inaccurate sound source positions and poor resolution.
Disclosure of Invention
The present invention aims to solve, at least to some extent, one of the technical problems in the art described above. Accordingly, a first objective of the present invention is to provide a multi-sound-source positioning method based on a spherical microphone array that uses the weights of the maximum directional beamformer as a new sparse dictionary to improve the resolution of sparse positioning in the spherical harmonic domain, making the method suitable for positioning many, closely spaced sound sources.
The second purpose of the invention is to provide a multi-sound-source positioning device based on a spherical microphone array.
In order to achieve the above object, an embodiment of a first aspect of the present invention provides a method for positioning multiple sound sources based on a spherical microphone array, where the multiple sound sources are D sound sources, and the method includes:
S1, acquiring a voice signal transmitted by the spherical microphone array;
S2, performing framing, windowing, and short-time Fourier transform processing on the voice signal to obtain a time-frequency domain voice signal;
S3, performing spherical Fourier transform processing on the time-frequency domain voice signal to obtain a spherical harmonic domain voice signal;
S4, constructing a sparse dictionary for the spherical harmonic domain voice signal according to the weights of the maximum directional beamformer and a preset first grid precision;
S5, calculating first positions of the D sound sources in the sparse dictionary using a sparse Bayesian learning method and an expectation maximization method.
The multi-sound-source positioning method based on a spherical microphone array according to the embodiment of the first aspect of the invention exploits the three-dimensional geometry of the spherical microphone array to sample the voice signal over all azimuth and pitch angles, thereby capturing the voice signals emitted from the D sound source positions. Framing, windowing, and short-time Fourier transform processing convert the voice signal into a time-frequency domain voice signal, which improves processing efficiency; spherical Fourier transform processing then converts that signal into the spherical harmonic domain. A sparse dictionary is constructed from the weights of the maximum directional beamformer at the first grid precision, which is (10°, 5°), where 10° is the azimuth interval and 5° is the pitch interval. Within this sparse dictionary, the first positions of the D sound sources are calculated using a sparse Bayesian learning method and an expectation maximization method. A sparse dictionary built from the weights of the maximum directional beamformer offers high resolution, yields accurate sound source position information, and is broadly applicable to multi-sound-source positioning.
According to some embodiments of the present invention, calculating the first positions of the D sound sources in the sparse dictionary using a sparse Bayesian learning method and an expectation maximization method includes:
S51, setting a first parameter value of the sparse Bayesian learning and a first preset iteration number N, and calculating a first mean and a first covariance of the sparse matrix for the sparse dictionary according to the first parameter value;
S52, estimating the first parameter value of the sparse Bayesian learning from the first mean and the first covariance using the expectation maximization method to obtain a second parameter value; calculating a second mean and a second covariance of the sparse matrix for the sparse dictionary according to the second parameter value, which completes one iteration; and, when the current iteration count equals the first preset iteration number N, stopping the iteration, calculating the Nth mean, and taking the D highest peaks of the energy spectrum of the Nth mean as the first positions of the D sound sources.
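The final selection step, taking the D highest peaks of the energy spectrum of the mean, can be sketched as follows. This is only an illustration under stated assumptions: the mean is assumed to be laid out on a pitch-by-azimuth grid, a peak is assumed to be a strict local maximum among its eight neighbours, and the function name `top_peaks` and the toy values are hypothetical rather than taken from the patent.

```python
def top_peaks(power, num_sources):
    """Return (row, col) indices of the strongest local maxima of a 2-D map.

    Rows index pitch angles and columns index azimuth angles; the azimuth
    axis wraps around, the pitch axis does not (assumed layout).
    """
    rows, cols = len(power), len(power[0])
    peaks = []
    for i in range(rows):
        for j in range(cols):
            neighbours = [
                power[i + di][(j + dj) % cols]
                for di in (-1, 0, 1)
                for dj in (-1, 0, 1)
                if (di or dj) and 0 <= i + di < rows
            ]
            # Keep only strict local maxima so flat background is ignored.
            if all(power[i][j] > v for v in neighbours):
                peaks.append((power[i][j], i, j))
    peaks.sort(reverse=True)
    return [(i, j) for _, i, j in peaks[:num_sources]]


# Toy posterior mean on a 4 x 6 grid with two active directions.
mean = [[0j] * 6 for _ in range(4)]
mean[1][0] = 3 + 4j   # strong source
mean[2][3] = 1 - 2j   # weaker source
mean[1][1] = 0.5      # leakage next to the strong source, not a peak
energy = [[abs(v) ** 2 for v in row] for row in mean]
positions = top_peaks(energy, num_sources=2)
```

Requiring local maxima rather than simply the D largest entries keeps leakage around one strong source from being counted as a second source.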
According to some embodiments of the invention, after the first positions of the D sound sources are obtained, the method further comprises:
S6, setting a second preset iteration number M;
S7, performing grid refinement in a preset area around the first positions of the D sound sources according to a first preset rule;
S8, reconstructing the sparse dictionary according to the grid precision calculated by the first preset rule, and then obtaining second positions of the D sound sources using the sparse Bayesian learning method and the expectation maximization method, which completes one iteration of grid refinement; and, when the number of grid refinement iterations equals the second preset iteration number M, stopping the iteration, calculating the final mean, and taking the D highest peaks of the energy spectrum of that mean as the Mth positions of the D sound sources.
According to some embodiments of the invention, the first preset rule comprises calculating the grid precision by the following formula:
Figure BDA0002437256510000031
where $\Theta^{(j)}$ denotes the grid set $\Theta$ in the j-th iteration;
Figure BDA0002437256510000032
and
Figure BDA0002437256510000033
denote the interval angles of the pitch angle and the azimuth angle, respectively, in the j-th iteration; $\theta$ denotes the pitch angle; $\phi$ denotes the azimuth angle; $\Phi_\theta$ denotes the pitch angle in the j-th iteration; and $\Phi_\phi$ denotes the azimuth angle in the j-th iteration.
According to some embodiments of the invention, the weights of the maximum directional beamformer are:
Figure BDA0002437256510000034
where $b_n(kr)$ denotes the mode strength,
Figure BDA0002437256510000035
denotes the spherical harmonics, n denotes the order, m denotes the degree, r denotes the radius of the spherical microphone array, $\theta$ denotes the pitch angle, and $\phi$ denotes the azimuth angle.
According to some embodiments of the invention, the first grid precision is (10 °, 5 °), wherein 10 ° is an azimuth interval angle and 5 ° is a pitch interval angle.
In order to achieve the above object, an embodiment of the second aspect of the present invention provides a multi-sound-source positioning device based on a spherical microphone array, where the multiple sound sources comprise D sound sources, and the device includes:
the acquisition module is used for acquiring the voice signal transmitted by the spherical microphone array;
the first signal processing module is used for performing frame windowing and short-time Fourier transform processing on the voice signal to obtain a time-frequency domain voice signal;
the second signal processing module is used for carrying out spherical Fourier transform processing on the time-frequency domain voice signal to obtain a spherical harmonic domain voice signal;
the sparse dictionary building module is used for building a sparse dictionary for the spherical harmonic domain voice signals according to the weight of the maximum directional beam former and the preset first grid precision;
and the first calculation module is used for calculating and obtaining first positions of the D sound sources in the sparse dictionary by applying a sparse Bayesian learning method and an expectation maximization method.
The multi-sound-source positioning device based on a spherical microphone array according to the embodiment of the second aspect of the invention exploits the three-dimensional geometry of the spherical microphone array to sample the voice signal over all azimuth and pitch angles, thereby capturing the voice signals emitted from the D sound source positions. Framing, windowing, and short-time Fourier transform processing convert the voice signal into a time-frequency domain voice signal, which improves processing efficiency; spherical Fourier transform processing then converts that signal into the spherical harmonic domain. A sparse dictionary is constructed from the weights of the maximum directional beamformer at the first grid precision, which is (10°, 5°), where 10° is the azimuth interval and 5° is the pitch interval. Within this sparse dictionary, the first positions of the D sound sources are calculated using a sparse Bayesian learning method and an expectation maximization method. A sparse dictionary built from the weights of the maximum directional beamformer offers high resolution, yields accurate sound source position information, and is broadly applicable to multi-sound-source positioning.
According to some embodiments of the invention, the multi-sound-source positioning device based on a spherical microphone array further comprises:
the first storage module is used for storing a first parameter value of the sparse Bayesian learning and a first preset iteration number N;
the first calculation module is used for calculating a first mean and a first covariance of the sparse matrix for the sparse dictionary according to the first parameter value of the sparse Bayesian learning; estimating the first parameter value from the first mean and the first covariance using the expectation maximization method to obtain a second parameter value; calculating a second mean and a second covariance of the sparse matrix for the sparse dictionary according to the second parameter value, which completes one iteration; and, when the current iteration count equals the first preset iteration number N, stopping the iteration, calculating the Nth mean, and taking the D highest peaks of the energy spectrum of the Nth mean as the first positions of the D sound sources.
According to some embodiments of the invention, the multi-sound-source positioning device based on a spherical microphone array further comprises:
the second storage module is used for storing a second preset iteration number M;
the second calculation module is used for performing grid refinement in a preset area around the first positions of the D sound sources according to a first preset rule; reconstructing the sparse dictionary according to the grid precision calculated by the first preset rule, and then obtaining second positions of the D sound sources using the sparse Bayesian learning method and the expectation maximization method, which completes one iteration of grid refinement; and, when the number of grid refinement iterations equals the second preset iteration number M, stopping the iteration, calculating the final mean, and taking the D highest peaks of the energy spectrum of that mean as the Mth positions of the D sound sources.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a flow chart of a method for multi-source localization based on a spherical microphone array according to one embodiment of the present invention;
FIG. 2 is a flow diagram of a method for multiple sound source localization from a sparse dictionary according to one embodiment of the present invention;
FIG. 3 is a flow chart of a method for multi-source localization based on a spherical microphone array according to yet another embodiment of the present invention;
FIG. 4 is a block diagram of a multi-sound-source positioning device based on a spherical microphone array according to a first embodiment of the present invention;
FIG. 5 is a block diagram of a multi-sound-source positioning device based on a spherical microphone array according to a second embodiment of the present invention;
FIG. 6 is a block diagram of a multi-sound-source positioning device based on a spherical microphone array according to a third embodiment of the present invention.
Reference numerals:
100: multi-sound-source positioning device based on a spherical microphone array; 1: acquisition module; 2: first signal processing module; 3: second signal processing module; 4: sparse dictionary construction module; 5: first calculation module; 6: first storage module; 7: second storage module; 8: second calculation module.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
A multi-sound-source positioning method and device based on a spherical microphone array according to embodiments of the present invention are described below with reference to FIGS. 1 to 6.
FIG. 1 is a flow chart of a method for multi-source localization based on a spherical microphone array according to one embodiment of the present invention; as shown in fig. 1, an embodiment of the first aspect of the present invention provides a method for positioning multiple sound sources based on a spherical microphone array, where the multiple sound sources are D sound sources, and the method includes:
S1, acquiring a voice signal transmitted by the spherical microphone array;
S2, performing framing, windowing, and short-time Fourier transform processing on the voice signal to obtain a time-frequency domain voice signal;
S3, performing spherical Fourier transform processing on the time-frequency domain voice signal to obtain a spherical harmonic domain voice signal;
S4, constructing a sparse dictionary for the spherical harmonic domain voice signal according to the weights of the maximum directional beamformer and a preset first grid precision;
S5, calculating first positions of the D sound sources in the sparse dictionary using a sparse Bayesian learning method and an expectation maximization method.
The multi-sound-source positioning method based on a spherical microphone array according to the embodiment of the first aspect of the invention exploits the three-dimensional geometry of the spherical microphone array to sample the voice signal over all azimuth and pitch angles, thereby capturing the voice signals emitted from the D sound source positions. Framing, windowing, and short-time Fourier transform processing convert the voice signal into a time-frequency domain voice signal, which improves processing efficiency; spherical Fourier transform processing then converts that signal into the spherical harmonic domain. A sparse dictionary is constructed from the weights of the maximum directional beamformer at the first grid precision, which is (10°, 5°), where 10° is the azimuth interval and 5° is the pitch interval. Within this sparse dictionary, the first positions of the D sound sources are calculated using a sparse Bayesian learning method and an expectation maximization method. A sparse dictionary built from the weights of the maximum directional beamformer offers high resolution, yields accurate sound source position information, and is broadly applicable to multi-sound-source positioning.
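Step S2 (framing, windowing, and short-time Fourier transform) can be sketched for a single microphone channel as below. The frame length, hop size, Hann window, and the function name `stft` are illustrative assumptions; the patent does not specify them. A naive DFT is used only to keep the sketch dependency-free.

```python
import cmath
import math


def stft(signal, frame_len=8, hop=4):
    """Frame the signal, apply a Hann window, and take a DFT of every frame."""
    # Periodic Hann window, a common (assumed) choice for STFT analysis.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # Naive DFT for clarity; keep only the non-negative frequency bins.
        spectrum = [
            sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(spectrum)
    return frames


# One channel carrying a tone that falls exactly on DFT bin 2.
tone = [math.cos(2 * math.pi * 2 * n / 8) for n in range(32)]
spec = stft(tone)
peak_bins = [max(range(len(f)), key=lambda k: abs(f[k])) for f in spec]
```

In the method, the same transform would be applied to every microphone channel, and each time-frequency bin would then be passed on to the spherical Fourier transform of step S3.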
FIG. 2 is a flow diagram of a method for multiple sound source localization from a sparse dictionary according to one embodiment of the present invention; as shown in FIG. 2, according to some embodiments of the present invention, calculating the first positions of the D sound sources in the sparse dictionary using a sparse Bayesian learning method and an expectation maximization method includes:
S51, setting a first parameter value of the sparse Bayesian learning and a first preset iteration number N, and calculating a first mean and a first covariance of the sparse matrix for the sparse dictionary according to the first parameter value;
S52, estimating the first parameter value of the sparse Bayesian learning from the first mean and the first covariance using the expectation maximization method to obtain a second parameter value; calculating a second mean and a second covariance of the sparse matrix for the sparse dictionary according to the second parameter value, which completes one iteration; and, when the current iteration count equals the first preset iteration number N, stopping the iteration, calculating the Nth mean, and taking the D highest peaks of the energy spectrum of the Nth mean as the first positions of the D sound sources.
The working principle and beneficial effects of this technical solution are as follows. A first parameter value of the sparse Bayesian learning, that is, an initial parameter value, is set; a first mean and a first covariance of the sparse matrix are calculated for the sparse dictionary from this value; and the expectation maximization method then uses the first mean and the first covariance to re-estimate the parameter, giving a second parameter value. A second mean and a second covariance of the sparse matrix are calculated from the second parameter value, which completes one iteration. Iteration stops when the convergence condition is met, that is, when the current iteration count equals the first preset iteration number N (N may be 1000). The Nth mean is then calculated, and the D highest peaks of its energy spectrum are taken as the first positions of the D sound sources, which improves the accuracy of sound source positioning.
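The text names the quantities in each iteration (parameter value, mean, covariance) but does not reproduce the update equations. For orientation, the standard single-snapshot sparse Bayesian learning updates with expectation maximization are written out below. The notation is an assumption and not taken from the patent: $y$ is the spherical harmonic domain observation at one time-frequency bin, $A$ is the sparse dictionary with $G$ grid directions, $\gamma_i$ are the per-direction prior variances (playing the role of the "parameter value"), and the noise variance $\sigma^2$ is treated as known.

```latex
% Assumed model (notation not from the source):
%   y = A x + n,  x_i ~ CN(0, \gamma_i),  n ~ CN(0, \sigma^2 I),
%   \Gamma^{(t)} = diag(\gamma_1^{(t)}, ..., \gamma_G^{(t)})
\begin{align}
\Sigma^{(t)} &= \left( \sigma^{-2} A^{H} A + \big(\Gamma^{(t)}\big)^{-1} \right)^{-1}, \\
\mu^{(t)} &= \sigma^{-2}\, \Sigma^{(t)} A^{H} y, \\
\gamma_i^{(t+1)} &= \big|\mu_i^{(t)}\big|^{2} + \Sigma_{ii}^{(t)}, \qquad i = 1, \dots, G .
\end{align}
```

Under this reading, the first two lines correspond to calculating the mean and covariance from the current parameter value, the third line is the expectation maximization re-estimate that yields the next parameter value, and after N iterations the energy spectrum would be $|\mu_i^{(N)}|^2$ over the grid.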
FIG. 3 is a flow chart of a method for multi-source localization based on a spherical microphone array according to yet another embodiment of the present invention; as shown in FIG. 3, after the first positions of the D sound sources are obtained, the method further includes:
S6, setting a second preset iteration number M;
S7, performing grid refinement in a preset area around the first positions of the D sound sources according to a first preset rule;
S8, reconstructing the sparse dictionary according to the grid precision calculated by the first preset rule, and then obtaining second positions of the D sound sources using the sparse Bayesian learning method and the expectation maximization method, which completes one iteration of grid refinement; and, when the number of grid refinement iterations equals the second preset iteration number M, stopping the iteration, calculating the final mean, and taking the D highest peaks of the energy spectrum of that mean as the Mth positions of the D sound sources.
The working principle and beneficial effects of this technical solution are as follows. After the first positions of the D sound sources are obtained, grid refinement is performed in a preset area around each first position, that is, in the neighborhood of the estimated sound source position, to obtain more accurate position information. A second preset iteration number M is set so that the sound source positions reach a preset precision. Grid refinement is performed in the preset area of the first positions of the D sound sources through the first preset rule, and the sparse dictionary is reconstructed from the grid precision calculated by that rule, which is finer than the previous one. Illustratively, the first grid precision is (10°, 5°), where 10° is the azimuth interval and 5° is the pitch interval, and the grid precision calculated by the first preset rule may be (5°, 2°). The sparse dictionary is reconstructed from this grid precision and the weights of the maximum directional beamformer, and the sound source positions are recalculated with the method described above to obtain the second positions of the D sound sources, which completes one iteration of grid refinement. Grid refinement then continues in the area near the second positions: the grid precision is recalculated by the first preset rule, for example to (4°, 1°), the sparse dictionary is reconstructed again from it and the weights of the maximum directional beamformer, and the sound source positions are recalculated to obtain the third positions of the D sound sources. Iteration stops when the number of grid refinement iterations equals the second preset iteration number M; the final mean is then calculated, and the D highest peaks of its energy spectrum are taken as the Mth positions of the D sound sources, giving sound source positions of the preset precision. In this embodiment, the second preset iteration number M is 3, which yields position information of the preset precision while keeping the computational load and complexity below a preset threshold.
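The refinement loop described above can be sketched as follows. The patent's formula for the refined grid is an image that is not reproduced in this text, so the sketch does not implement it; it only builds a local candidate grid around one estimate using the illustrative precisions quoted in the paragraph. The function name `local_grid`, the size of the search area, and the example estimate are all hypothetical.

```python
def local_grid(center, step, half_width):
    """Candidate (azimuth, pitch) directions in degrees around one estimate.

    center: (azimuth, pitch) estimate; step: (azimuth step, pitch step);
    half_width: (azimuth, pitch) half-extent of the search area.
    """
    az0, el0 = center
    d_az, d_el = step
    n_az = int(half_width[0] // d_az)
    n_el = int(half_width[1] // d_el)
    points = []
    for i in range(-n_az, n_az + 1):
        for k in range(-n_el, n_el + 1):
            pitch = el0 + k * d_el
            if 0 <= pitch <= 180:            # pitch does not wrap around
                points.append(((az0 + i * d_az) % 360, pitch))
    return points


# Illustrative precisions from the description: (10, 5) -> (5, 2) -> (4, 1).
schedule = [(5, 2), (4, 1)]
estimate = (40, 90)        # hypothetical first position of one source
area = (10, 5)             # search area: one coarse cell on each side (assumed)
for step in schedule:
    grid = local_grid(estimate, step, area)
    # A full implementation would rebuild the dictionary on `grid`, rerun the
    # SBL/EM estimation, and move `estimate` to the new peak; the estimate is
    # left unchanged here to keep the sketch short.
    area = step
```

Restricting each refined dictionary to the neighbourhood of the current estimates keeps its size small, which is consistent with the stated aim of bounding the computational load.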
According to some embodiments of the invention, the first preset rule comprises calculating the grid precision by the following formula:
Figure BDA0002437256510000091
where $\Theta^{(j)}$ denotes the grid set $\Theta$ in the j-th iteration;
Figure BDA0002437256510000092
and
Figure BDA0002437256510000093
denote the interval angles of the pitch angle and the azimuth angle, respectively, in the j-th iteration; $\theta$ denotes the pitch angle; $\phi$ denotes the azimuth angle; $\Phi_\theta$ denotes the pitch angle in the j-th iteration; and $\Phi_\phi$ denotes the azimuth angle in the j-th iteration.
According to some embodiments of the invention, the weights of the maximum directional beamformer are:
Figure BDA0002437256510000094
where $b_n(kr)$ denotes the mode strength,
Figure BDA0002437256510000095
denotes the spherical harmonics, n denotes the order, m denotes the degree, r denotes the radius of the spherical microphone array, $\theta$ denotes the pitch angle, and $\phi$ denotes the azimuth angle. A sparse dictionary built from the weights of the maximum directional beamformer offers high resolution, yields accurate sound source position information, and is broadly applicable to multi-sound-source positioning.
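The resolution claim can be illustrated with the textbook beam pattern of an order-N maximum-directivity beamformer in the spherical harmonic domain. This is a minimal sketch under an assumption: the pattern used here, the sum over n of (2n+1) P_n(cos Θ) divided by (N+1)², comes from the general spherical array literature and is not taken from the patent's formula image, which is not reproduced in this text. The function names are hypothetical.

```python
import math


def legendre(n, x):
    """Legendre polynomial P_n(x) via the three-term recurrence."""
    p_prev, p = 1.0, x
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p


def max_directivity_pattern(angle_deg, order):
    """Response of an order-N maximum-directivity beam at `angle_deg` off-axis."""
    x = math.cos(math.radians(angle_deg))
    total = sum((2 * n + 1) * legendre(n, x) for n in range(order + 1))
    return total / (order + 1) ** 2


def first_null(order, step=0.1):
    """Smallest off-axis angle (degrees) at which the pattern stops being positive."""
    angle = step
    while max_directivity_pattern(angle, order) > 0:
        angle += step
    return angle


# Main-lobe half-width (to the first null) for two array orders.
nulls = {order: first_null(order) for order in (2, 4)}
```

The first null moves closer to the look direction as the order grows, which is the sense in which a dictionary built from such weights can separate closely spaced sources better than a wider beam.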
According to some embodiments of the invention, the first grid precision is (10°, 5°), where 10° is the azimuth interval and 5° is the pitch interval. At this precision, the calculated first positions of the D sound sources meet the positioning accuracy requirement while the amount of computation is reduced.
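To make the computational trade-off concrete, the number of dictionary columns implied by a given grid precision can be counted as below. The sketch assumes azimuth covers 0° to 350° and pitch covers 0° to 180° with both poles included; the patent does not state the grid limits, and `build_grid` is a hypothetical helper.

```python
def build_grid(az_step, pitch_step):
    """All (azimuth, pitch) grid directions in degrees for the given steps."""
    azimuths = range(0, 360, az_step)          # 360 coincides with 0, so it is excluded
    pitches = range(0, 180 + 1, pitch_step)    # both poles included (assumption)
    return [(az, el) for el in pitches for az in azimuths]


coarse = build_grid(10, 5)   # first grid precision (10 degrees, 5 degrees)
fine = build_grid(5, 2)      # a uniformly finer grid, for comparison
print(len(coarse), len(fine))  # prints "1332 6552"
```

Under these assumptions the (10°, 5°) grid has 1332 candidate directions, while a global (5°, 2°) grid would have 6552, which is why refining only near the estimated positions is attractive.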
FIG. 4 is a block diagram of a multi-sound-source positioning device 100 based on a spherical microphone array according to a first embodiment of the present invention; as shown in FIG. 4, an embodiment of the second aspect of the present invention provides a multi-sound-source positioning device 100 based on a spherical microphone array, where the multiple sound sources comprise D sound sources, and the device includes:
the acquisition module 1 is used for acquiring the voice signal transmitted by the spherical microphone array;
the first signal processing module 2 is configured to perform framing, windowing, and short-time Fourier transform processing on the voice signal to obtain a time-frequency domain voice signal;
the second signal processing module 3 is configured to perform spherical Fourier transform processing on the time-frequency domain voice signal to obtain a spherical harmonic domain voice signal;
the sparse dictionary construction module 4 is used for constructing a sparse dictionary for the spherical harmonic domain voice signal according to the weights of the maximum directional beamformer and the preset first grid precision;
and the first calculation module 5 is used for calculating first positions of the D sound sources in the sparse dictionary using a sparse Bayesian learning method and an expectation maximization method.
The multi-sound-source positioning device 100 based on a spherical microphone array according to the embodiment of the second aspect of the present invention exploits the three-dimensional geometry of the spherical microphone array to sample the voice signal over all azimuth and pitch angles, thereby capturing the voice signals emitted from the D sound source positions. Framing, windowing, and short-time Fourier transform processing convert the voice signal into a time-frequency domain voice signal, which improves processing efficiency; spherical Fourier transform processing then converts that signal into the spherical harmonic domain. A sparse dictionary is constructed from the weights of the maximum directional beamformer at the first grid precision, which is (10°, 5°), where 10° is the azimuth interval and 5° is the pitch interval. Within this sparse dictionary, the first positions of the D sound sources are calculated using a sparse Bayesian learning method and an expectation maximization method. A sparse dictionary built from the weights of the maximum directional beamformer offers high resolution, yields accurate sound source position information, and is broadly applicable to multi-sound-source positioning.
FIG. 5 is a block diagram of a multi-source ball microphone array based positioning apparatus 100 according to a second embodiment of the present invention; as shown in fig. 5, the method further includes:
the first storage module 6 is used for storing a first parameter value of sparse Bayesian learning and a first preset iteration number N;
the first calculation module 5 is configured to calculate a first mean value and a first covariance of the sparse matrix for the sparse dictionary according to the first parameter value of sparse Bayesian learning; re-estimate the first parameter value of sparse Bayesian learning from the first mean value and the first covariance by using the expectation maximization method to obtain a second parameter value; calculate a second mean value and a second covariance of the sparse matrix for the sparse dictionary according to the second parameter value of sparse Bayesian learning, which completes one iteration; and, when the current iteration count is determined to equal the first preset iteration number N, stop iterating, calculate the Nth mean value, and take the D highest peaks of the energy spectrum of the Nth mean value as the first positions of the D sound sources.
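The iteration described above can be sketched with the textbook sparse Bayesian learning model y = A x + noise, where each coefficient x_i has a zero-mean Gaussian prior with variance gamma_i. This is a minimal sketch under stated assumptions, not the patent's exact procedure: the random dictionary `A`, the fixed `noise_var`, the single-snapshot toy signal, and the names `posterior` and `sbl_em` are all hypothetical. The prior variances `gamma` play the role of the "parameter value", and the posterior mean and variance play the role of the "mean value" and "covariance".

```python
import numpy as np


def posterior(A, y, gamma, noise_var):
    """Posterior mean and per-atom variance of x for y = A x + noise."""
    # Sigma_y = noise_var * I + A diag(gamma) A^H
    Sigma_y = noise_var * np.eye(A.shape[0]) + (A * gamma) @ A.conj().T
    B = np.linalg.solve(Sigma_y, A)              # Sigma_y^{-1} A
    mu = gamma * (B.conj().T @ y)                # diag(gamma) A^H Sigma_y^{-1} y
    # diagonal of diag(gamma) - diag(gamma) A^H Sigma_y^{-1} A diag(gamma)
    var = gamma - gamma**2 * np.real(np.sum(A.conj() * B, axis=0))
    return mu, var


def sbl_em(A, y, n_iter, noise_var=1e-2):
    """Run a fixed number of EM iterations and return the final mean."""
    gamma = np.ones(A.shape[1])                  # initial parameter value
    mu, var = posterior(A, y, gamma, noise_var)  # first mean and covariance
    for _ in range(n_iter):
        # EM update of the prior variances, floored to stay positive
        gamma = np.maximum(np.abs(mu) ** 2 + var, 1e-12)
        mu, var = posterior(A, y, gamma, noise_var)
    return mu


# Toy problem (hypothetical data): 20 measurements, 30 atoms, D = 2 sources.
rng = np.random.default_rng(0)
M, K, D = 20, 30, 2
A = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
A /= np.linalg.norm(A, axis=0)                   # unit-norm dictionary columns
x_true = np.zeros(K, dtype=complex)
x_true[[5, 17]] = [1.0, 0.9j]
noise = 0.01 * (rng.standard_normal(M) + 1j * rng.standard_normal(M))
y = A @ x_true + noise

mu = sbl_em(A, y, n_iter=200)
energy = np.abs(mu) ** 2                         # energy spectrum of the mean
top = sorted(np.argsort(energy)[-D:].tolist())   # indices of the D largest entries
print(top)
```

In this one-dimensional toy, "the D highest peaks" reduces to the D largest entries of the energy spectrum; on a two-dimensional (azimuth, pitch) grid, a local-maximum search would be needed instead.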
FIG. 6 is a block diagram of the multi-sound-source positioning device 100 based on a spherical microphone array according to a third embodiment of the present invention. As shown in FIG. 6, the device further includes:
a second storage module 7, configured to store a second preset iteration number M;
the second calculation module 8 is configured to perform grid refinement within a preset area around the first positions of the D sound sources according to a first preset rule; reconstruct the sparse dictionary according to the grid precision calculated by the first preset rule and then obtain second positions of the D sound sources by applying the sparse Bayesian learning method and the expectation maximization method, which completes one iteration of grid refinement; and, when the current number of grid refinement iterations is determined to equal the second preset iteration number M, stop iterating, calculate the final mean value, and take the D highest peaks of the energy spectrum of that mean value as the Mth positions of the D sound sources.
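The grid refinement step can be illustrated as follows. The actual first preset rule is given by a formula that is not reproduced in this text, so this sketch substitutes a simple assumed rule: halve both interval angles and rebuild a local grid spanning one coarse step on each side of an estimated position. The function name `refine_grid` and the parameters `shrink` and `half_width_steps` are hypothetical.

```python
def refine_grid(center, spacing, shrink=0.5, half_width_steps=1):
    """Build a finer local grid around one estimated direction.

    center  : (azimuth, pitch) estimate in degrees
    spacing : (azimuth step, pitch step) of the current grid in degrees
    shrink  : assumed factor applied to both steps at each refinement
    """
    az0, p0 = center
    new_spacing = (spacing[0] * shrink, spacing[1] * shrink)
    # Number of fine steps needed to span half_width_steps coarse steps.
    n = int(round(half_width_steps / shrink))
    points = set()
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            az = (az0 + i * new_spacing[0]) % 360.0            # wrap azimuth
            p = min(max(p0 + j * new_spacing[1], 0.0), 180.0)  # clamp pitch
            points.add((az, p))
    return sorted(points), new_spacing


# One refinement around a hypothetical first position at (40, 60) degrees.
points, spacing = refine_grid(center=(40.0, 60.0), spacing=(10.0, 5.0))
print(len(points), spacing)
```

In a full loop, the dictionary would be rebuilt on `points`, the sparse Bayesian learning step rerun to obtain updated positions, and the process repeated until the preset number of refinement iterations M is reached.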
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (9)

1. A multi-sound-source positioning method based on a spherical microphone array, characterized in that the multiple sound sources are D sound sources, the method comprising the following steps:
S1, acquiring a voice signal transmitted by the spherical microphone array;
S2, performing framing, windowing, and short-time Fourier transform processing on the voice signal to obtain a time-frequency domain voice signal;
S3, performing spherical Fourier transform processing on the time-frequency domain voice signal to obtain a spherical harmonic domain voice signal;
S4, constructing a sparse dictionary for the spherical harmonic domain voice signal according to the weight of the maximum directional beam former and the preset first grid precision;
and S5, calculating the first positions of the D sound sources in the sparse dictionary by using a sparse Bayesian learning method and an expectation maximization method.
2. The multi-sound-source positioning method based on a spherical microphone array of claim 1, wherein calculating the first positions of the D sound sources in the sparse dictionary by using a sparse Bayesian learning method and an expectation maximization method comprises:
S51, setting a first parameter value of sparse Bayesian learning and a first preset iteration number N, and calculating a first mean value and a first covariance of the sparse matrix for the sparse dictionary according to the first parameter value of sparse Bayesian learning;
S52, re-estimating the first parameter value of sparse Bayesian learning from the first mean value and the first covariance by using the expectation maximization method to obtain a second parameter value; calculating a second mean value and a second covariance of the sparse matrix for the sparse dictionary according to the second parameter value of sparse Bayesian learning, which completes one iteration; and, when the current iteration count is determined to equal the first preset iteration number N, stopping iteration, calculating the Nth mean value, and taking the D highest peaks of the energy spectrum of the Nth mean value as the first positions of the D sound sources.
3. The method of claim 1, wherein after obtaining the first positions of the D sound sources, the method further comprises:
S6, setting a second preset iteration number M;
S7, performing grid refinement within a preset area around the first positions of the D sound sources according to a first preset rule;
S8, reconstructing the sparse dictionary according to the grid precision calculated by the first preset rule and then obtaining second positions of the D sound sources by applying the sparse Bayesian learning method and the expectation maximization method, which completes one iteration of grid refinement; and, when the current number of grid refinement iterations is determined to equal the second preset iteration number M, stopping iteration, calculating the final mean value, and taking the D highest peaks of the energy spectrum of that mean value as the Mth positions of the D sound sources.
4. The multi-sound-source positioning method based on a spherical microphone array of claim 3, wherein the first preset rule comprises calculating the grid precision by the following formula:
Figure FDA0002437256500000021
wherein Θ^(j) represents the grid set Θ in the jth iteration;
Figure FDA0002437256500000022
and
Figure FDA0002437256500000023
respectively represent the interval angles of the pitch angle and the azimuth angle in the jth iteration; θ represents the pitch angle; φ represents the azimuth angle; Φ_θ represents the pitch angle in the jth iteration; and Φ_φ represents the azimuth angle in the jth iteration.
5. The method of claim 1, wherein the weights of the maximum directional beamformer are:
Figure FDA0002437256500000024
wherein b_n(kr) represents the mode strength,
Figure FDA0002437256500000025
represents the spherical harmonic function, n represents the order, m represents the degree, r represents the radius of the spherical microphone array, θ represents the pitch angle, and φ represents the azimuth angle.
6. The multi-sound-source positioning method based on a spherical microphone array of claim 5, wherein the first grid precision is (10°, 5°), wherein 10° is the azimuth interval angle and 5° is the pitch interval angle.
7. A multi-sound-source positioning device based on a spherical microphone array, wherein the multiple sound sources are D sound sources, and the device comprises:
the acquisition module is used for acquiring the voice signal transmitted by the spherical microphone array;
the first signal processing module is used for performing framing, windowing, and short-time Fourier transform processing on the voice signal to obtain a time-frequency domain voice signal;
the second signal processing module is used for carrying out spherical Fourier transform processing on the time-frequency domain voice signal to obtain a spherical harmonic domain voice signal;
the sparse dictionary building module is used for building a sparse dictionary for the spherical harmonic domain voice signals according to the weight of the maximum directional beam former and the preset first grid precision;
and the first calculation module is used for calculating and obtaining first positions of the D sound sources in the sparse dictionary by applying a sparse Bayesian learning method and an expectation maximization method.
8. The multi-sound-source positioning device based on a spherical microphone array of claim 7, further comprising:
the first storage module is used for storing a first parameter value of sparse Bayesian learning and a first preset iteration number N;
the first calculation module is used for calculating a first mean value and a first covariance of the sparse matrix for the sparse dictionary according to the first parameter value of sparse Bayesian learning; re-estimating the first parameter value of sparse Bayesian learning from the first mean value and the first covariance by using the expectation maximization method to obtain a second parameter value; calculating a second mean value and a second covariance of the sparse matrix for the sparse dictionary according to the second parameter value of sparse Bayesian learning, which completes one iteration; and, when the current iteration count is determined to equal the first preset iteration number N, stopping iteration, calculating the Nth mean value, and taking the D highest peaks of the energy spectrum of the Nth mean value as the first positions of the D sound sources.
9. The multi-sound-source positioning device based on a spherical microphone array of claim 7, further comprising:
the second storage module is used for storing a second preset iteration number M;
the second calculation module is used for performing grid refinement within a preset area around the first positions of the D sound sources according to a first preset rule; reconstructing the sparse dictionary according to the grid precision calculated by the first preset rule and then obtaining second positions of the D sound sources by applying the sparse Bayesian learning method and the expectation maximization method, which completes one iteration of grid refinement; and, when the current number of grid refinement iterations is determined to equal the second preset iteration number M, stopping iteration, calculating the final mean value, and taking the D highest peaks of the energy spectrum of that mean value as the Mth positions of the D sound sources.
CN202010255782.9A 2020-04-02 2020-04-02 Multi-sound-source positioning method and device based on spherical microphone array Pending CN111537955A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010255782.9A CN111537955A (en) 2020-04-02 2020-04-02 Multi-sound-source positioning method and device based on spherical microphone array

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010255782.9A CN111537955A (en) 2020-04-02 2020-04-02 Multi-sound-source positioning method and device based on spherical microphone array

Publications (1)

Publication Number Publication Date
CN111537955A true CN111537955A (en) 2020-08-14

Family

ID=71952217

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010255782.9A Pending CN111537955A (en) 2020-04-02 2020-04-02 Multi-sound-source positioning method and device based on spherical microphone array

Country Status (1)

Country Link
CN (1) CN111537955A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103592628A (en) * 2013-11-12 2014-02-19 上海大学 Multi-sound-source positioning method based on formation of real value weight beam in spherical harmonic domain
CN110718230A (en) * 2019-08-29 2020-01-21 云知声智能科技股份有限公司 Method and system for eliminating reverberation

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
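宋涛等 (Song Tao et al.): "基于球谐递归关系的球阵列多声源定位方法" [Multi-sound-source localization method for spherical arrays based on spherical harmonic recurrence relations], 《新型工业化》 *
张耀允等 (Zhang Yaoyun et al.): "地理信息技术在公路运行速度协调性分析中的应用" [Application of geographic information technology in the consistency analysis of highway operating speed], 《北方交通》 *
戴玮 (Dai Wei): "基于球谐域稀疏贝叶斯学习的室内多声源定位方法研究" [Research on indoor multi-sound-source localization based on sparse Bayesian learning in the spherical harmonic domain], 《信息科技辑》 *
朱宏辉 (Zhu Honghui): "《知识驱动型拟人智能控制系统研究》" [Research on knowledge-driven human-simulated intelligent control systems], 31 March 2012 *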

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114527427A (en) * 2022-01-27 2022-05-24 华南理工大学 Low-frequency beam forming sound source positioning method based on spherical microphone array
CN114527427B (en) * 2022-01-27 2024-03-29 华南理工大学 Low-frequency wave beam forming sound source positioning method based on spherical microphone array
CN116338574A (en) * 2023-04-10 2023-06-27 哈尔滨工程大学 Sparse Bayesian learning underwater sound source positioning method based on matched beam
CN116338574B (en) * 2023-04-10 2023-09-19 哈尔滨工程大学 Sparse Bayesian learning underwater sound source positioning method based on matched beam

Similar Documents

Publication Publication Date Title
CN106872944B (en) Sound source positioning method and device based on microphone array
AU2015292238B2 (en) Planar sensor array
CN103583054B (en) For producing the apparatus and method of audio output signal
Yook et al. Fast sound source localization using two-level search space clustering
CN111489753B (en) Anti-noise sound source positioning method and device and computer equipment
CN110197112B (en) Beam domain Root-MUSIC method based on covariance correction
CN111123192A (en) Two-dimensional DOA positioning method based on circular array and virtual extension
CN109932689A (en) A kind of General Cell optimization method suitable for certain position scene
CN111537955A (en) Multi-sound-source positioning method and device based on spherical microphone array
CN104502904B (en) Torpedo homing beam sharpening method
CN111624553A (en) Sound source positioning method and system, electronic equipment and storage medium
EP3695403B1 (en) Joint wideband source localization and acquisition based on a grid-shift approach
Diaz-Guerra et al. Direction of arrival estimation of sound sources using icosahedral CNNs
CN108614235B (en) Single-snapshot direction finding method for information interaction of multiple pigeon groups
CN113314138B (en) Sound source monitoring and separating method and device based on microphone array and storage medium
Chen et al. Multiple sound source localization, separation, and reconstruction by microphone array: A dnn-based approach
CN113593596B (en) Robust self-adaptive beam forming directional pickup method based on subarray division
CN109254265A (en) A kind of whistle vehicle positioning method based on microphone array
CN110568406B (en) Positioning method based on acoustic energy under condition of unknown energy attenuation factor
Svaizer et al. Environment aware estimation of the orientation of acoustic sources using a line array
CN111830465A (en) Two-dimensional Newton orthogonal matching tracking compressed beam forming method
Lobato et al. Deconvolution with neural grid compression: A method to accurately and quickly process beamforming results
CN110824484B (en) Array element position estimation method based on constant modulus algorithm
Tsuchiya et al. Two-dimensional finite-difference time-domain simulation of moving sound source and receiver with directivity
CN112083423A (en) Multi-base sound source high-precision positioning method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200814
