CN113109764A - Sound source positioning method and system

Sound source positioning method and system


Publication number
CN113109764A
Authority
CN
China
Prior art keywords: sound source; space; microphone; positioning; discrete points
Legal status: Granted
Application number
CN202110405303.1A
Other languages
Chinese (zh)
Other versions
CN113109764B (en)
Inventor
蔡希昌
刘都鑫
于东池
白文乐
董哲
Current Assignee
North China University of Technology
Original Assignee
North China University of Technology
Application filed by North China University of Technology filed Critical North China University of Technology
Priority to CN202110405303.1A priority Critical patent/CN113109764B/en
Publication of CN113109764A publication Critical patent/CN113109764A/en
Application granted granted Critical
Publication of CN113109764B publication Critical patent/CN113109764B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G01S5/22Position of source determined by co-ordinating a plurality of position lines defined by path-difference measurements

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)

Abstract

The invention relates to a sound source positioning method and system, wherein the method comprises the following steps: acquiring the input sound signal of each microphone of a microphone array; calculating the time difference of arrival of the sound source signal at each microphone from the input sound signals; calculating a coarse positioning position of the sound source in a coarse positioning space from the time differences; and calculating a fine positioning position of the sound source in a fine positioning space from the input sound signals and the coarse positioning position. The coarse positioning space and the fine positioning space are both spherical spaces centered on the microphone array and composed of a plurality of discrete points; the discrete points are the vertices of the spherical space, and the number of discrete points in the coarse positioning space is smaller than that in the fine positioning space. According to the method, the region where the sound source is located is first searched rapidly in the coarse positioning space, and the precise position of the sound source is then obtained by searching the fine positioning space, so the sound source positioning accuracy is extremely high.

Description

Sound source positioning method and system
Technical Field
The invention relates to the technical field of sound source positioning, in particular to a sound source positioning method and system.
Background
With the rapid development of science and technology, target positioning faces more application scenarios and technical requirements. In recent years, sound source localization has therefore become an indispensable technology in many applications. For example, acquiring sound source information through sound source positioning lets a microphone in a video conference automatically track the speaker, collect cleaner speech, and realize speech enhancement; it can further improve the sound localization and voice interaction capabilities of intelligent robots; it can be used in monitoring systems as an effective supplement to traditional video surveillance, playing to its particular strengths when the video monitoring effect is poor and thereby strengthening supervision; and it can be widely applied in audio and signal processing fields such as smart homes, home security alarms, and danger detection in the home environment. However, current sound source positioning accuracy is often unsatisfactory.
Disclosure of Invention
The invention aims to provide a sound source positioning method and a sound source positioning system so as to improve the sound source positioning accuracy.
In order to achieve the purpose, the invention provides the following scheme:
a sound source localization method, the sound source localization method comprising:
acquiring input sound signals of all microphones of a microphone array;
calculating a time difference of arrival of a sound source signal at each of the microphones based on an input sound signal of the microphone;
calculating a rough positioning position of a sound source in a rough positioning space according to the time difference of the sound source signal reaching the microphone;
calculating a fine positioning position of the sound source in a fine positioning space according to the input sound signals of the microphones and the coarse positioning position; the coarse positioning space and the fine positioning space are both spherical spaces centered on the microphone array and composed of a plurality of discrete points, the discrete points are the vertices of the spherical space, and the number of discrete points in the coarse positioning space is smaller than that in the fine positioning space.
Optionally, the microphone array is a ring-shaped microphone array composed of four cardioid directional microphones.
Optionally, the calculating of the time difference includes:
representing the input sound signals of every two microphones as a cross-correlation function;
representing the cross-correlation function as a generalized cross-correlation function using a generalized cross-correlation method;
correcting a weighting function in the generalized cross-correlation function by using a SCOT and PHAT combined weighting method to obtain a corrected generalized cross-correlation function;
and calculating the time difference according to the modified generalized cross-correlation function.
Optionally, the calculating of the coarse positioning position includes:
correcting the sound velocity according to the environmental parameters of the spherical space to obtain the corrected sound velocity;
and calculating the coarse positioning position according to the corrected sound velocity and the time difference.
Optionally, the calculating the coarse positioning position according to the corrected sound velocity and the time difference specifically includes:
calculating the initial position of the sound source according to the corrected sound velocity and the time difference;
and obtaining the discrete point closest to the initial position of the sound source in the coarse positioning space to obtain the coarse positioning position.
Optionally, the calculating of the fine positioning position includes:
acquiring a plurality of discrete points in the fine positioning space whose distance to the coarse positioning position is within a preset range;
calculating, from the input sound signals of the microphones, the controllable beam response of each such discrete point;
and taking the discrete point corresponding to the maximum controllable beam response value as the fine positioning position.
Optionally, the controllable beam response calculation formula is:
P(θ) = Σ_{m=1}^{M} Σ_{n=1}^{M} ∫ ψ_{m,n}(ω) X_m(ω) X_n*(ω) e^{jωτ_{m,n}(θ)} dω
where P(θ) is the steerable beam response, M is the number of microphones, m and n index the microphones, and ψ_{m,n}(ω) is the weighting function in the steerable beam response,
ψ_{m,n}(ω) = 1 / √(Φ_{x_m x_m}(ω) Φ_{x_n x_n}(ω))
where Φ_{x_m x_n}(ω) is the cross-power spectrum of the input sound signals of the m-th and n-th microphones, Φ_{x_m x_m}(ω) is the self-power spectrum of the input sound signal of the m-th microphone, Φ_{x_n x_n}(ω) is the self-power spectrum of the input sound signal of the n-th microphone, X_m(ω) is the Fourier transform of the input sound signal of the m-th microphone, X_n*(ω) is the conjugate of the Fourier transform of the input sound signal of the n-th microphone, j is the imaginary unit, ω is the angular frequency, τ_{m,n}(θ) is the difference in arrival time of the sound source signal at the m-th and n-th microphones when the horizontal azimuth angle or vertical pitch angle is θ, and τ_{m,n}(θ) is calculated with the corrected sound velocity.
Optionally, the coarse positioning space and the fine positioning space are both formed by a triangular network, and the discrete points are the vertices of the triangular network; the number of discrete points in the coarse positioning space is:
k1 = 10 × 4^u + 2
and the number of discrete points in the fine positioning space is:
k2 = 10 × 4^v + 2
where k1 is the number of discrete points in the coarse positioning space, k2 is the number of discrete points in the fine positioning space, u and v are both natural numbers, and u < v.
A sound source localization system, comprising:
the acquisition module is used for acquiring input sound signals of all microphones of the microphone array;
the first calculation module is used for calculating the time difference of sound source signals reaching the microphones according to the input sound signals of the microphones;
the second calculation module is used for calculating the rough positioning position of the sound source in the rough positioning space according to the time difference of the sound source signal reaching the microphone;
a third calculating module, configured to calculate a fine positioning position of the sound source in the fine positioning space according to the input sound signals of the microphones and the coarse positioning position; the coarse positioning space and the fine positioning space are both spherical spaces centered on the microphone array and composed of a plurality of discrete points, the discrete points are the vertices of the spherical space, and the number of discrete points in the coarse positioning space is smaller than that in the fine positioning space.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention discloses a sound source positioning method and a system, wherein the method comprises the following steps: acquiring input sound signals of all microphones of a microphone array; calculating a time difference of arrival of a sound source signal at each of the microphones based on an input sound signal of the microphone; calculating a rough positioning position of a sound source in a rough positioning space according to the time difference of the sound source signal reaching the microphone; calculating a fine localization position of a sound source in a fine localization space according to the input sound signals of the microphones and the coarse localization position; the coarse positioning space and the fine positioning space are both used by the microphone array as a sphere center, and the sphere space is composed of a plurality of discrete points, the discrete points are the top points of the sphere space, and the number of the discrete points in the coarse positioning space is smaller than that of the discrete points in the fine positioning space. According to the method, after the area where the sound source is located is rapidly searched by using the coarse positioning space, the accurate position of the sound source is obtained by searching by using the fine positioning space, and the sound source positioning precision is extremely high.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise.
FIG. 1 is a flow chart of a sound source localization method according to the present invention;
FIG. 2 is a first schematic view of a sphere provided by the present invention;
FIG. 3 is a second schematic view of a sphere provided by the present invention;
FIG. 4 is a third schematic view of a sphere provided by the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide a sound source positioning method and a sound source positioning system so as to improve the sound source positioning accuracy.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
As shown in FIG. 1, the sound source localization method includes:
step 101: an input sound signal of each microphone of a microphone array is acquired.
The microphone array is a ring-shaped microphone array formed by four cardioid directional microphones. The radius of the annular array is 4.5 cm. A cardioid directional microphone attenuates sound arriving from the rear while receiving sound signals from the front and both sides. When the input sound signal is obtained through an annular array of cardioid directional microphones, the input sound signal in the scanning direction is captured effectively while input from non-scanning directions is reduced.
Step 102: the time difference of arrival of the sound source signal at the microphones is calculated from the input sound signals of the respective microphones.
The time difference of arrival of the sound source signal at the microphones is calculated using the generalized cross-correlation (GCC) algorithm within the TDOA (time difference of arrival) framework. The calculation specifically comprises the following steps:
the input sound signals of each two microphones are represented as a cross-correlation function.
The cross-correlation function is expressed as a generalized cross-correlation function using a generalized cross-correlation method.
And correcting the weighting function in the generalized cross-correlation function by using a SCOT and PHAT combined weighting method to obtain a corrected generalized cross-correlation function.
And calculating the time difference according to the modified generalized cross-correlation function.
The time difference calculation principle is as follows:
the input sound signal for each pair of microphones in the microphone array may be represented as:
x1(t) = h1(t) * s1(t) + n1(t)
x2(t) = h2(t) * s1(t − τ12) + n2(t)
where x1(t) is the input sound signal of the first microphone, x2(t) is the input sound signal of the second microphone, h1(t) is the impulse response from the sound source to the first microphone, h2(t) is the impulse response from the sound source to the second microphone, n1(t) is the noise signal of the first microphone, n2(t) is the noise signal of the second microphone, s1(t) is the sound source signal, s1(t − τ12) is the sound source signal delayed by τ12, t is time, and τ12 is the time difference between the arrival of the sound source signal at the first and second microphones.
The input sound signals x1(t) and x2(t) are expressed using a cross-correlation function:
R_{x1x2}(τ) = ∫ x1(t) x2(t − τ) dt
where R_{x1x2}(τ) is the cross-correlation function and x2(t − τ) is the input sound signal of the second microphone delayed by τ relative to that of the first microphone.
According to the Wiener-Khinchin theorem, the cross-correlation function and the cross-power spectral density form a Fourier transform pair, so the cross-correlation function can be expressed as:
R_{x1x2}(τ) = (1/2π) ∫ Φ_{x1x2}(ω) e^{jωτ} dω
where Φ_{x1x2}(ω) = X1(ω) X2*(ω) is the cross-power spectrum of x1(t) and x2(t), X1(ω) is the Fourier transform of x1(t), X2*(ω) is the conjugate of X2(ω), X2(ω) is the Fourier transform of x2(t), j is the imaginary unit, ω is the angular frequency, and τ is the delay of x2(t) relative to x1(t).
Under reverberation and noise the peak of R_{x1x2}(τ) becomes indistinct and the positioning accuracy drops. The generalized cross-correlation method addresses this by filtering the signal with a weighting function, emphasizing the spectral components of the desired signal while suppressing the noise spectrum of the input signal, thereby improving positioning accuracy. The generalized cross-correlation function is expressed as:
R^g_{x1x2}(τ) = (1/2π) ∫ ψ12(ω) Φ_{x1x2}(ω) e^{jωτ} dω
where ψ12(ω) is the modified weighting function obtained by the joint SCOT-PHAT weighting:
ψ12(ω) = 1 / √(Φ_{x1x1}(ω) Φ_{x2x2}(ω))
where Φ_{x1x1}(ω) is the self-power spectrum of x1(t) and Φ_{x2x2}(ω) is the self-power spectrum of x2(t). Correcting the weighting function in the generalized cross-correlation function with the joint SCOT-PHAT weighting gives a smaller time-difference calculation error, higher precision, and stronger noise resistance, ensuring the accuracy and real-time performance of sound source positioning.
The delay τ at which the generalized cross-correlation function R^g_{x1x2}(τ) reaches its maximum is the time difference τ12 between the arrival of the sound source signal at the first and second microphones:
τ12 = argmax_τ R^g_{x1x2}(τ)
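The time-difference estimation above can be sketched in Python. This is an illustrative sketch, not the patent's implementation: it uses a SCOT-style weighting 1/√(Φ11·Φ22); the patent's exact joint SCOT-PHAT weighting appears only as an image in the original, so that choice is an assumption.

```python
import numpy as np

def gcc_tdoa(x1, x2, fs, eps=1e-12):
    """Generalized cross-correlation delay estimate with a SCOT-style
    weighting; returns the delay of x2 relative to x1, in seconds."""
    n = len(x1) + len(x2)                # zero-padded length -> linear correlation
    X1 = np.fft.rfft(x1, n)
    X2 = np.fft.rfft(x2, n)
    cross = X2 * np.conj(X1)             # cross-power spectrum
    weight = 1.0 / (np.abs(X1) * np.abs(X2) + eps)   # 1/sqrt(Phi11 * Phi22)
    r = np.fft.irfft(weight * cross, n)  # generalized cross-correlation
    max_shift = n // 2
    r = np.concatenate((r[-max_shift:], r[:max_shift + 1]))  # center the lags
    shift = np.argmax(np.abs(r)) - max_shift                 # peak lag in samples
    return shift / fs

# Synthetic check: a noise burst delayed by 5 samples on the second channel.
fs = 16000
rng = np.random.default_rng(0)
s = rng.standard_normal(2048)
x1 = s
x2 = np.concatenate((np.zeros(5), s[:-5]))   # x2 lags x1 by 5 samples
tau12 = gcc_tdoa(x1, x2, fs)                 # about 5 / fs seconds
```

Zero-padding the FFT to the combined length keeps the correlation linear rather than circular, so the peak lag maps directly to the arrival-time difference.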
step 103: and calculating the rough positioning position of the sound source in the rough positioning space according to the time difference of the sound source signal reaching the microphone.
Firstly, space division is carried out to form a spherical space which takes the microphone array as the sphere center and consists of a plurality of discrete points; the discrete points are the vertices of the spherical space. The coarse positioning space and the fine positioning space are both formed by triangular networks, and the discrete points are the vertices of the triangular networks. The number of discrete points in the coarse positioning space is k1 = 10 × 4^u + 2, and the number of discrete points in the fine positioning space is k2 = 10 × 4^v + 2, where k1 is the number of discrete points in the coarse positioning space, k2 is the number of discrete points in the fine positioning space, u and v are both natural numbers, and u < v. When u = 0, the spherical space can be regarded as the initial space: a regular convex icosahedron composed of 12 discrete points with 20 triangular faces, as shown in FIG. 2. When u = 1 or v = 1, the spherical space is a once-subdivided space composed of 42 discrete points with 80 triangular faces, as shown in FIG. 3. When u = 4 or v = 4, the spherical space is a four-times-subdivided space composed of 2562 discrete points with 5120 triangular faces, as shown in FIG. 4.
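The point counts quoted above (12, 42, 2562) are consistent with recursive icosahedron subdivision: each pass splits every triangular face into four, and the vertex count follows from Euler's formula. A quick check:

```python
def icosphere_counts(n):
    # n recursive subdivisions of a regular icosahedron:
    # every pass splits each triangular face into 4 smaller triangles.
    faces = 20 * 4 ** n
    edges = 3 * faces // 2          # each face has 3 edges, each shared by 2 faces
    vertices = 2 + edges - faces    # Euler's formula: V - E + F = 2
    return vertices, faces

counts = [icosphere_counts(n) for n in range(5)]
# vertex counts match k = 10 * 4**n + 2 at every subdivision level
assert all(v == 10 * 4 ** n + 2 for n, (v, f) in enumerate(counts))
```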
The coarse positioning position is then calculated:
and correcting the sound velocity according to the environmental parameters of the spherical space to obtain the corrected sound velocity. The sound velocity correction formula is as follows:
c = f(T, Pw, P)   [the explicit correction formula is given as an image in the original]
where T is the temperature in the spherical space, Pw is the partial pressure of water vapour in the spherical space, Pw = Pa × RH, Pa is the saturated vapour pressure of water, RH is the relative humidity in the spherical space, and P is the atmospheric pressure in the spherical space. Correcting the sound velocity in this way optimizes the sound source positioning precision.
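The patent's correction formula is reproduced only as an image in the original, so the sketch below is an illustrative stand-in using the same inputs (T, Pw = Pa × RH, P): it combines the standard dry-air formula c = 331.45·sqrt(1 + T/273.15), the Tetens saturation-pressure fit, and a first-order humidity term. All three choices are assumptions, not the patent's formula.

```python
import math

def sound_speed(temp_c, rel_humidity=0.0, pressure_pa=101325.0):
    # Dry-air baseline: c = 331.45 * sqrt(1 + T/273.15), T in deg C
    c_dry = 331.45 * math.sqrt(1.0 + temp_c / 273.15)
    # Tetens fit for the saturated vapour pressure of water, in Pa (assumption)
    p_sat = 610.78 * 10.0 ** (7.5 * temp_c / (temp_c + 237.3))
    p_w = rel_humidity * p_sat            # partial pressure Pw = Pa * RH
    # First-order humidity term: moist air is less dense, so c rises slightly
    return c_dry * (1.0 + 0.19 * p_w / pressure_pa)

c20 = sound_speed(20.0, rel_humidity=0.6)   # slightly above the dry-air 343.4 m/s
```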
And calculating a coarse positioning position according to the corrected sound velocity and the time difference. Wherein, calculating the coarse positioning position according to the corrected sound velocity and the time difference comprises:
and calculating the initial position of the sound source according to the corrected sound velocity and the time difference. The initial position of the sound source is calculated as follows:
Figure BDA0003022037270000072
τ12for the time difference of arrival of the sound source signal at the first and second microphones, τ13For the time difference between the arrival of the sound source signal at the first microphone and the third microphone, τ14Is the time difference of arrival of the sound source signal at the first microphone and the fourth microphone, c is the corrected sound velocity, mi(i ═ 1, 2,3,4) is the distance from the sound source to the ith microphone.
Figure BDA0003022037270000073
(xi,yi,zi) Is the coordinate of the microphone in spherical space, (x)s,ys,zs) Is the calculated position, i.e. the initial position of the sound source.
And obtaining the discrete point closest to the initial position of the sound source in the rough positioning space to obtain a rough positioning position.
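Snapping the TDOA initial estimate to the nearest discrete point of the coarse space can be sketched as follows; the octahedral toy vertex set is an illustration only, not the patent's subdivided-icosahedron grid.

```python
import numpy as np

def nearest_vertex(point, vertices):
    # Snap the TDOA initial estimate to the closest discrete point
    # (vertex) of the coarse positioning space.
    d = np.linalg.norm(vertices - point, axis=1)
    return int(np.argmin(d))

# Toy coarse space: the 6 vertices of an octahedron on a radius-2 sphere.
verts = 2.0 * np.array(
    [[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0], [0, 0, 1], [0, 0, -1]],
    dtype=float,
)
idx = nearest_vertex(np.array([1.8, 0.2, -0.1]), verts)   # closest to (2, 0, 0)
```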
Step 104: and calculating the fine positioning position of the sound source in the fine positioning space according to the input sound signal of each microphone and the coarse positioning position. The calculation process comprises the following steps:
Acquiring a plurality of discrete points in the fine positioning space whose distance to the coarse positioning position is within a preset range.
Calculating, from the input sound signals of the microphones, the controllable beam response of each such discrete point.
Taking the discrete point corresponding to the maximum controllable beam response value as the fine positioning position.
The principle of the fine positioning position calculation is as follows:
and mapping the coarse positioning position (x, y, z) into a coarse positioning space, wherein the mapping method comprises the following steps:
Figure BDA0003022037270000081
wherein beta is the azimuth angle of the coarse positioning position in the horizontal direction of the sound source,
Figure BDA0003022037270000082
alpha is the pitch angle of the coarse positioning position in the vertical direction of the sound source, a is a coefficient, and a belongs to (0, 1).
And calculating the accurate position of the sound source by using the SRP-PHAT algorithm, wherein the accurate position can be represented by the pitch angle α′ and horizontal azimuth angle β′ of a certain discrete point in the fine positioning space.
The pitch angle α′ and the horizontal azimuth angle β′ are obtained by maximizing the steered response power (SRP), which can be represented by the output power of a filter-and-sum (FS) beamformer. The filter-and-sum output can be expressed as:
Y(θ, ω) = Σ_{m=1}^{M} G_m(ω) X_m(ω) e^{jωτ_m(θ)}
where M is the number of microphones, G_m(ω) is the filter applied to the m-th microphone, X_m(ω) is the Fourier transform of the m-th microphone's input sound signal, and τ_m(θ) is the difference in arrival time of the sound source signal at the m-th microphone relative to the reference microphone when the horizontal azimuth angle or vertical pitch angle is θ.
The steered response power (SRP) can be expressed as:
P(θ) = ∫ |Y(θ, ω)|² dω
Substituting the filter-and-sum output gives:
P(θ) = Σ_{m=1}^{M} Σ_{n=1}^{M} ∫ ψ_{m,n}(ω) X_m(ω) X_n*(ω) e^{jωτ_{m,n}(θ)} dω
where P(θ) is the steerable beam response, m and n index the microphones, and ψ_{m,n}(ω) is the weighting function in the steerable beam response, likewise modified with the joint SCOT-PHAT weighting:
ψ_{m,n}(ω) = 1 / √(Φ_{x_m x_m}(ω) Φ_{x_n x_n}(ω))
where Φ_{x_m x_n}(ω) is the cross-power spectrum of the input sound signals of the m-th and n-th microphones, Φ_{x_m x_m}(ω) is the self-power spectrum of the input sound signal of the m-th microphone, Φ_{x_n x_n}(ω) is the self-power spectrum of the input sound signal of the n-th microphone, X_n*(ω) is the conjugate of the Fourier transform of the input sound signal of the n-th microphone, j is the imaginary unit, ω is the angular frequency, τ_{m,n}(θ) is the difference in arrival time of the sound source signal at the m-th and n-th microphones when the horizontal azimuth angle or vertical pitch angle is θ, and τ_{m,n}(θ) is calculated with the corrected sound velocity.
The controllable beam responses of the horizontal azimuth angles of the 5 discrete points in the fine positioning space closest to the coarse positioning position are calculated; the azimuth with the maximum response is the azimuth of the sound source in the horizontal direction:
β″ = argmax_{β′} P(β′)
where P(β′) is the steerable beam response at horizontal azimuth angle β′.
The controllable beam responses of the pitch angles of these discrete points are then calculated; the pitch angle with the maximum response is the pitch angle of the sound source in the vertical direction:
α″ = argmax_{α′} P(α′)
where P(α′) is the steerable beam response at vertical pitch angle α′.
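The steered-response search above can be sketched as a scan over candidate discrete points. This is a minimal sketch with a PHAT-type pair weighting as an assumption (the patent's joint SCOT-PHAT weighting is rendered only as an image in the original); the frequency-domain delay synthesis at the end exists purely for the self-check.

```python
import numpy as np

def srp_search(signals, mics, candidates, fs, c=343.0, eps=1e-12):
    """Return the index of the candidate point with the largest steered
    response power, phase-aligning each microphone pair's weighted
    cross-power spectrum by the candidate's hypothesised delays."""
    M, n = signals.shape
    X = np.fft.rfft(signals, axis=1)
    w = 2.0 * np.pi * np.fft.rfftfreq(n, 1.0 / fs)   # angular frequencies
    power = np.zeros(len(candidates))
    for k, pt in enumerate(candidates):
        dists = np.linalg.norm(mics - pt, axis=1)    # candidate-to-mic ranges
        p = 0.0
        for m in range(M):
            for q in range(m + 1, M):
                cross = X[m] * np.conj(X[q])
                cross = cross / (np.abs(cross) + eps)  # PHAT weighting
                tau = (dists[m] - dists[q]) / c        # hypothesised pair delay
                p += np.real(np.sum(cross * np.exp(1j * w * tau)))
        power[k] = p
    return int(np.argmax(power))

# Self-check: synthesise exact fractional delays in the frequency domain
# for a 4-mic ring of radius 4.5 cm and see which candidate wins.
fs, n = 16000, 512
rng = np.random.default_rng(1)
S = np.fft.rfft(rng.standard_normal(n))
w = 2.0 * np.pi * np.fft.rfftfreq(n, 1.0 / fs)
mics = np.array([[0.045, 0, 0], [-0.045, 0, 0], [0, 0.045, 0], [0, -0.045, 0]])
src = np.array([1.0, 0.5, 0.3])
signals = np.stack([
    np.fft.irfft(S * np.exp(-1j * w * np.linalg.norm(src - mk) / 343.0), n)
    for mk in mics
])
cands = np.array([src, [-1.0, 0.5, 0.3], [0.3, -1.0, 0.5]])
best = srp_search(signals, mics, cands, fs)   # candidate 0 is the true source
```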
The fine localization position, i.e., the precise position, of the sound source is represented by the pitch angle α "and the horizontal azimuth angle β" of a certain discrete point in the fine localization space.
This embodiment also discloses a sound source positioning system, and sound source positioning system includes:
and the acquisition module is used for acquiring the input sound signals of all the microphones of the microphone array.
And the first calculating module is used for calculating the time difference of the sound source signal reaching the microphones according to the input sound signals of the microphones.
And the second calculation module is used for calculating the rough positioning position of the sound source in the rough positioning space according to the time difference of the sound source signal reaching the microphone.
And the third calculation module is used for calculating the fine positioning position of the sound source in the fine positioning space according to the input sound signals of the microphones and the coarse positioning position. The coarse positioning space and the fine positioning space are both spherical spaces centered on the microphone array and composed of a plurality of discrete points; the discrete points are the vertices of the spherical spaces, and the number of discrete points in the coarse positioning space is smaller than that in the fine positioning space.
Using directional microphones to acquire the input sound signals makes the array selective with respect to the sound source direction, reduces the space to be scanned and the number of microphone pairs required for positioning, lowers the amount of calculation, and improves the real-time performance of sound source positioning.
The invention adopts a hierarchical search positioning method, dividing the sound source positioning space into a coarse positioning space and a fine positioning space. Coarse positioning is first performed with a positioning algorithm of small computational cost and rapid positioning to obtain the approximate position of the sound source in the coarse positioning space; accurate positioning is then performed with a positioning algorithm of high accuracy but larger computational cost. The method therefore achieves extremely high sound source positioning accuracy while ensuring real-time performance.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (9)

1. A sound source localization method, characterized by comprising:
acquiring input sound signals of all microphones of a microphone array;
calculating a time difference of arrival of a sound source signal at each of the microphones based on an input sound signal of the microphone;
calculating a rough positioning position of a sound source in a rough positioning space according to the time difference of the sound source signal reaching the microphone;
calculating a fine localization position of the sound source in a fine positioning space according to the input sound signals of the microphones and the coarse localization position; the coarse positioning space and the fine positioning space are both spherical spaces centered on the microphone array and composed of a plurality of discrete points, the discrete points being vertices on the spherical space, and the number of discrete points in the coarse positioning space is smaller than the number of discrete points in the fine positioning space.
2. The sound source localization method according to claim 1, wherein the microphone array is a ring-shaped microphone array composed of four cardioid directional microphones.
3. The sound source localization method according to claim 1, wherein the calculation process of the time difference includes:
representing the input sound signals of every two microphones as a cross-correlation function;
representing the cross-correlation function as a generalized cross-correlation function using a generalized cross-correlation method;
correcting a weighting function in the generalized cross-correlation function by using a combined SCOT-PHAT weighting method to obtain a corrected generalized cross-correlation function;
and calculating the time difference according to the modified generalized cross-correlation function.
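The steps of claim 3 can be sketched as a generalized cross-correlation TDOA estimator. The patent's exact SCOT-PHAT combination rule is not spelled out in the claim, so this sketch exposes the plain PHAT and SCOT weightings separately; `gcc_tdoa` and its parameters are illustrative names.

```python
import numpy as np

def gcc_tdoa(x_m, x_n, fs, weighting="phat"):
    """Estimate the time difference of arrival between two microphone
    signals via generalized cross-correlation (a sketch; PHAT and SCOT
    shown separately, not the patent's combined weighting).

    Returns tau = t_m - t_n in seconds (negative if the signal reaches
    microphone n later than microphone m)."""
    n = len(x_m) + len(x_n)                       # zero-pad to avoid wrap-around
    Xm = np.fft.rfft(x_m, n=n)
    Xn = np.fft.rfft(x_n, n=n)
    cross = Xm * np.conj(Xn)                      # cross power spectrum
    if weighting == "phat":
        psi = 1.0 / np.maximum(np.abs(cross), 1e-12)
    elif weighting == "scot":
        psi = 1.0 / np.maximum(np.abs(Xm) * np.abs(Xn), 1e-12)
    else:
        psi = np.ones_like(cross.real)
    r = np.fft.irfft(psi * cross, n=n)            # generalized cross-correlation
    shift = int(np.argmax(np.abs(r)))
    if shift > n // 2:                            # map to negative lags
        shift -= n
    return shift / fs
```

The weighting whitens the cross spectrum so that the correlation peak stays sharp under reverberation, which is why GCC-based time differences are usable for the coarse stage.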
4. The sound source localization method according to claim 1, wherein the calculation process of the coarse localization position includes:
correcting the sound velocity according to the environmental parameters of the spherical space to obtain the corrected sound velocity;
and calculating the coarse positioning position according to the corrected sound velocity and the time difference.
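The claim does not state which environmental parameters enter the correction or the exact formula; a common first-order approximation for the temperature dependence of the speed of sound in air is sketched below (`corrected_sound_speed` is an illustrative name, not from the patent).

```python
def corrected_sound_speed(temp_c: float) -> float:
    """First-order temperature correction for the speed of sound in air.

    Uses the widely quoted linear approximation
        c ~= 331.3 + 0.606 * T   (m/s, T in degrees Celsius),
    standing in for the patent's unspecified environmental correction.
    """
    return 331.3 + 0.606 * temp_c
```

At 20 °C this gives roughly 343 m/s; a few degrees of temperature error translates directly into a range error when multiplied by the measured time differences, which is why the correction matters for the coarse position.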
5. The sound source localization method according to claim 4, wherein the calculating the coarse localization position according to the corrected sound velocity and the time difference specifically includes:
calculating the initial position of the sound source according to the corrected sound velocity and the time difference;
and obtaining the discrete point closest to the initial position of the sound source in the coarse positioning space to obtain the coarse positioning position.
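Snapping the computed initial position onto the coarse grid can be sketched as a nearest-neighbor lookup over the discrete points (names are illustrative):

```python
import numpy as np

def snap_to_grid(initial_pos, grid_points):
    """Return the discrete point of the coarse positioning space closest
    (in Euclidean distance) to the initial sound source position."""
    d = np.linalg.norm(grid_points - np.asarray(initial_pos), axis=1)
    return grid_points[np.argmin(d)]
```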
6. The sound source localization method according to claim 1, wherein the calculation process of the fine localization position includes:
acquiring the discrete points in the fine positioning space whose distance to the coarse localization position is within a preset range;
calculating the steerable beam response at each of the acquired discrete points;
and taking the discrete point corresponding to the maximum steerable beam response value as the fine localization position.
7. The sound source localization method of claim 6, wherein the steerable beam response is calculated by the formula:
p(θ) = Σ_{m=1}^{M} Σ_{n=m+1}^{M} ∫ ψ_{m,n}(w) Φ_{x_m,x_n}(w) e^{jwτ_{m,n}(θ)} dw
where p(θ) is the steerable beam response, M is the number of microphones, m and n index the m-th and the n-th microphone, and ψ_{m,n}(w) is the weighting function in the steerable beam response,
ψ_{m,n}(w) = 1/√(Φ_{x_m,x_m}(w) · Φ_{x_n,x_n}(w)),
Φ_{x_m,x_n}(w) = X_m(w) X_n*(w)
is the cross power spectrum of the input sound signals of the m-th and the n-th microphone, Φ_{x_m,x_m}(w) is the self-power spectrum of the input sound signal of the m-th microphone, Φ_{x_n,x_n}(w) is the self-power spectrum of the input sound signal of the n-th microphone, X_m(w) is the Fourier transform of the input sound signal of the m-th microphone, X_n*(w) is the complex conjugate of the Fourier transform of the input sound signal of the n-th microphone, j is the imaginary unit, w is the angular frequency, τ_{m,n}(θ) is the time difference between the arrival of the sound source signal at the m-th microphone and at the n-th microphone when the horizontal azimuth angle or the vertical pitch angle of the candidate point is θ, τ_{m,n}(θ) is calculated using the corrected sound velocity, and dw denotes integration over frequency.
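The steerable beam response of claim 7 can be sketched for a single candidate point as follows. The SCOT-style weighting 1/√(Φ_mm·Φ_nn) is an assumption (the claim defines the weighting only through the listed spectra), and `srp_score` and its parameters are illustrative names.

```python
import numpy as np

def srp_score(signals, fs, delays):
    """Steerable response power for one candidate point (sketch of claim 7).

    signals : (M, N) array, one row of samples per microphone.
    delays  : (M, M) array, delays[m, n] = tau_mn in seconds for this
              candidate point, computed elsewhere from the corrected
              sound velocity.
    """
    M, N = signals.shape
    X = np.fft.rfft(signals, axis=1)              # X_m(w) for every microphone
    w = 2 * np.pi * np.fft.rfftfreq(N, d=1.0 / fs)
    p = 0.0
    for m in range(M):
        for n in range(m + 1, M):
            cross = X[m] * np.conj(X[n])          # cross power spectrum
            psi = 1.0 / np.maximum(np.abs(X[m]) * np.abs(X[n]), 1e-12)
            # steer by e^{jw tau_mn}; coherent alignment maximizes the sum
            p += np.real(np.sum(psi * cross * np.exp(1j * w * delays[m, n])))
    return p
```

Evaluating this score at every candidate discrete point and taking the argmax yields the fine localization position; restricting the candidates to the neighborhood of the coarse position (claim 6) keeps the number of evaluations small.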
8. The sound source localization method according to claim 1, wherein the coarse positioning space and the fine positioning space are each formed by a triangular mesh, and the discrete points are the vertices of the triangular mesh; the number of discrete points in the coarse positioning space is:
k1 = 10 × 4^u + 2
and the number of discrete points in the fine positioning space is:
k2 = 10 × 4^v + 2
wherein k1 is the number of discrete points in the coarse positioning space, k2 is the number of discrete points in the fine positioning space, u and v are both natural numbers, and u < v.
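The counts 10 × 4^u + 2 match the vertex counts of an icosahedron-based triangular sphere mesh after u rounds of 4-way triangle subdivision: such a mesh has F = 20·4^u faces and E = 30·4^u edges, so Euler's formula gives V = E − F + 2 = 10·4^u + 2 (u = 0 is the bare icosahedron with 12 vertices). A one-line check, with illustrative names:

```python
def grid_vertex_count(subdivisions: int) -> int:
    """Vertices of an icosphere after `subdivisions` rounds of 4-way
    triangle subdivision: V = E - F + 2 = 30*4^u - 20*4^u + 2."""
    return 10 * 4 ** subdivisions + 2
```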
9. A sound source localization system, comprising:
the acquisition module is used for acquiring input sound signals of all microphones of the microphone array;
the first calculation module is used for calculating the time difference of sound source signals reaching the microphones according to the input sound signals of the microphones;
the second calculation module is used for calculating the rough positioning position of the sound source in the rough positioning space according to the time difference of the sound source signal reaching the microphone;
a third calculation module, configured to calculate a fine localization position of the sound source in the fine positioning space according to the input sound signals of the microphones and the coarse localization position; the coarse positioning space and the fine positioning space are both spherical spaces centered on the microphone array and composed of a plurality of discrete points, the discrete points being vertices on the spherical space, and the number of discrete points in the coarse positioning space is smaller than the number of discrete points in the fine positioning space.
CN202110405303.1A 2021-04-15 2021-04-15 Sound source positioning method and system Active CN113109764B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110405303.1A CN113109764B (en) 2021-04-15 2021-04-15 Sound source positioning method and system


Publications (2)

Publication Number Publication Date
CN113109764A true CN113109764A (en) 2021-07-13
CN113109764B CN113109764B (en) 2023-02-14

Family

ID=76717153

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110405303.1A Active CN113109764B (en) 2021-04-15 2021-04-15 Sound source positioning method and system

Country Status (1)

Country Link
CN (1) CN113109764B (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103684521A (en) * 2013-12-20 2014-03-26 中国船舶重工集团公司第七一五研究所 Fast and accurate synchronization method for spread spectrum underwater acoustic communication
CN105096956A (en) * 2015-08-05 2015-11-25 百度在线网络技术(北京)有限公司 Artificial-intelligence-based intelligent robot multi-sound-source judgment method and device
CN107102296A (en) * 2017-04-27 2017-08-29 大连理工大学 A kind of sonic location system based on distributed microphone array
CN110082724A (en) * 2019-05-31 2019-08-02 浙江大华技术股份有限公司 A kind of sound localization method, device and storage medium


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LU, YANMEI: "Research on Sound Source Localization Methods Based on Distributed Microphone Arrays", China Master's Theses Full-text Database (monthly) *
SUN, FANG et al.: "Self-Calibrating Microphone Array Sound Source Localization System Based on an Iterative Optimization Algorithm", Microelectronics & Computer *
SONG, TAO et al.: "Multi-Sound-Source Localization Method for Spherical Arrays Based on Spherical Harmonic Recursion Relations", The Journal of New Industrialization *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115166043A (en) * 2022-09-07 2022-10-11 广东工业大学 Laser ultrasonic detection system and method based on sound source positioning
CN117406174A (en) * 2023-12-15 2024-01-16 深圳市声菲特科技技术有限公司 Method, device, equipment and storage medium for accurately positioning sound source
CN117406174B (en) * 2023-12-15 2024-03-15 深圳市声菲特科技技术有限公司 Method, device, equipment and storage medium for accurately positioning sound source

Also Published As

Publication number Publication date
CN113109764B (en) 2023-02-14

Similar Documents

Publication Publication Date Title
CN110082725B (en) Microphone array-based sound source positioning time delay estimation method and sound source positioning system
CN104142492B (en) A kind of SRP PHAT multi-source space-location methods
CN106328156B (en) Audio and video information fusion microphone array voice enhancement system and method
CN106872944B (en) Sound source positioning method and device based on microphone array
CN109254266A (en) Sound localization method, device and storage medium based on microphone array
CN113109764B (en) Sound source positioning method and system
CN106501773B (en) Sounnd source direction localization method based on difference array
CN110010147A (en) A kind of method and system of Microphone Array Speech enhancing
CN110534126B (en) Sound source positioning and voice enhancement method and system based on fixed beam forming
CN111445920A (en) Multi-sound-source voice signal real-time separation method and device and sound pick-up
CN104076331A (en) Sound source positioning method for seven-element microphone array
JP2021110938A5 (en)
Gur Particle velocity gradient based acoustic mode beamforming for short linear vector sensor arrays
CN106371057B (en) Voice sound source direction-finding method and device
CN111798869B (en) Sound source positioning method based on double microphone arrays
CN111679245B (en) Sub-array coincidence split beam orientation method based on uniform circular array
CN107369460A (en) Speech sound enhancement device and method based on acoustics vector sensor space sharpening technique
CN206114888U (en) Pronunciation sound source goniometer system
CN113203987A (en) Multi-sound-source direction estimation method based on K-means clustering
CN113593596B (en) Robust self-adaptive beam forming directional pickup method based on subarray division
CN116559778B (en) Vehicle whistle positioning method and system based on deep learning
CN110007276B (en) Sound source positioning method and system
CN111273230B (en) Sound source positioning method
CN111060867A (en) Directional microphone microarray direction of arrival estimation method
CN115951305A (en) Sound source positioning method based on SRP-PHAT space spectrum and GCC

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant