CN111273231A

CN111273231A - Indoor sound source positioning method based on different microphone array topological structure analysis

Info

Publication number: CN111273231A
Application number: CN202010206270.3A
Authority: CN
Inventors: 孙昊彬; 王玫; 宋浠瑜; 罗丽燕; 周陬; 仇洪冰
Original assignee: Guilin University of Electronic Technology
Current assignee: Guilin University of Electronic Technology
Priority date: 2020-03-23
Filing date: 2020-03-23
Publication date: 2020-06-12

Abstract

The invention discloses an indoor sound source localization method based on the analysis of different microphone array topologies. The method can localize a sound source when the topology of the microphone array is changed, analyze the localization error, and compare it with that of other array types. It uses a steered response power localization algorithm with phase transform weighting (SRP-PHAT), accelerated by stochastic region contraction, which gives good localization results under strong indoor reverberation. A user can select a microphone array topology for analysis according to their own requirements. Once a suitable array has been selected, the amplitude-frequency characteristics of the received signals can be calibrated with an array calibration scheme that combines multichannel low-pass filtering and multichannel adaptive filtering, which improves localization accuracy.

Description

Indoor sound source positioning method based on different microphone array topological structure analysis

Technical Field

The invention relates to the field of indoor location services, and in particular to an indoor sound source localization method based on the analysis of different microphone array topologies.

Background

Speech is one of the most effective ways for people to communicate. In communication systems based on digital audio technology, microphones capture speech signals, which are then processed or stored and used in human-machine interaction, video conferencing, remote transmission and similar applications. Let $r$ be the distance from the sound source to the microphone array, $L$ the aperture of the array, and $\lambda$ the operating wavelength of the source. Under the near-field condition of the microphones, i.e. when

$$r<\frac{2L^{2}}{\lambda}$$

holds, the speech signal collected by the microphones can be regarded as free of noise interference, and its quality is high. In many cases, however, this condition is not met: in human-machine interaction, video conferencing and similar scenarios, the person speaking is typically in the far field of the array. In the far field, strong ambient noise, reflected sound, directional interference and the like are inevitably mixed into the microphone signals, which greatly degrades the quality of the captured speech.
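The near-field condition above can be checked numerically. The sketch below is illustrative only: the speed of sound (343 m/s), the 0.2 m aperture and the 1 kHz frequency are assumed values, and the function names are hypothetical.

```python
# Near-field boundary (Fraunhofer distance): r < 2 L^2 / lambda.
SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def near_field_boundary(aperture_m: float, frequency_hz: float,
                        c: float = SPEED_OF_SOUND) -> float:
    """Return 2 L^2 / lambda, where lambda = c / f is the wavelength."""
    wavelength = c / frequency_hz
    return 2.0 * aperture_m ** 2 / wavelength


def is_near_field(distance_m: float, aperture_m: float, frequency_hz: float,
                  c: float = SPEED_OF_SOUND) -> bool:
    """True if the source distance r satisfies r < 2 L^2 / lambda."""
    return distance_m < near_field_boundary(aperture_m, frequency_hz, c)


if __name__ == "__main__":
    # A 0.2 m aperture at 1 kHz (lambda = 0.343 m) gives a boundary of about
    # 0.233 m, so a speaker 2 m away is in the far field.
    print(f"{near_field_boundary(0.2, 1000.0):.3f} m")
    print(is_near_field(2.0, 0.2, 1000.0))
```

Because the boundary grows with frequency (shorter wavelength), the same array is "near field" for a larger region at higher speech frequencies.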

A single-channel signal cannot support accurate sound source localization and tracking, so localization and tracking algorithms generally operate on multichannel speech. Commonly used multichannel localization techniques fall into three categories:

1. Methods based on time difference of arrival (TDOA): under low signal-to-noise ratio and strong reverberation, existing time-delay estimation methods produce large errors; moreover, these methods suit only a single source and are difficult to apply to multiple sources;

2. Methods based on high-resolution spectral estimation: these can estimate direction, but their accuracy for precise localization is poor and they impose strict requirements on the received acoustic signals, so they are difficult to use in practical speech source localization systems;

3. Steerable beamforming methods based on maximum output power: these have become among the most popular source localization algorithms, being robust under strong reverberation and offering high localization accuracy. In addition, single-channel speech enhancement can hardly suppress directional interference or reduce noise, so multichannel speech enhancement and processing require far-field beamforming, and considering different microphone array topologies improves the spatial filtering performance of the array.

According to array signal processing theory, the arrangement of the array elements strongly influences the performance of an array processing system. Microphone array topologies fall into three categories: one-dimensional arrays (linear arrays, such as nested linear arrays and uniform linear arrays), two-dimensional arrays (planar arrays, such as circular and square arrays), and three-dimensional arrays (such as star and spherical arrays). When the topology changes, the array dimension, the number of elements and the element spacing all affect the localization accuracy and computational speed of the microphone array localization algorithm. In practical spatial localization, one- and two-dimensional arrays perform poorly, so studying a suitable three-dimensional array topology is of practical significance.

Current studies of indoor moving-source localization with microphone arrays all assume that the frequency responses of the signals received by the array are highly consistent. In practice, however, manufacturing tolerances of the microphones, together with compound factors such as aging and the complex indoor environment, cause the frequency response of the received signals to deviate considerably from the theoretical value. This deviation reduces localization accuracy, so calibrating the frequency response of the microphone array is important for further improving the accuracy of indoor moving-source localization.

Disclosure of Invention

The invention aims to overcome these shortcomings and provides an indoor sound source localization method based on the analysis of different microphone array topologies. The method captures speech signals with microphone arrays of different topologies, applies a localization algorithm based on phase-transform-weighted steered response power combined with stochastic region contraction optimization, and analyzes the performance of each array from the localization results. To address the insufficient localization capability caused by interference between the microphone arrays and the speakers (sound sources) in real indoor environments, it further provides an array calibration scheme combining multichannel low-pass filtering and multichannel adaptive filtering, so that the localization results are more accurate.

The technical scheme for realizing the purpose of the invention is as follows:

An indoor sound source localization method based on the analysis of different microphone array topologies comprises the following steps:

(1) An indoor microphone array sound source localization system is set up, consisting of a microphone array topology analysis module, an array adaptive filtering correction module, and a sound source localization algorithm and analysis module, connected in sequence;

(2) Speech signal extraction: arrange a suitable microphone array indoors, have the speaker speak, record the speech, and extract the audio signal of each microphone, $x_1(t), x_2(t), \ldots, x_M(t)$;

(3) Divide the sound source space into a grid and compute the power $P(X)$ at each grid point in turn; the point of maximum power is the localized source position:

$$\hat{X}=\arg\max_{X}P(X);$$

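(4) The total power $P(X)$ of any point $X$ is obtained by applying phase-transform-based generalized cross-correlation to the signals of every pair of microphones in the array and summing over all pairs:

$$P(X)=\sum_{k=1}^{M}\sum_{l=1}^{M}\int_{-\infty}^{\infty}\Psi_{kl}(\omega)\,X_k(\omega)\,X_l^{*}(\omega)\,e^{j\omega\left(\tau_k(X)-\tau_l(X)\right)}\,d\omega$$

where $k$ and $l$ denote the $k$th and $l$th microphones, $X_k(\omega)$ is the Fourier transform of $x_k(t)$, $\Psi_{kl}(\omega)$ is the phase transform weight, and $\tau_k(X)$ is the propagation time of sound from position $X$ to the $k$th microphone. In the formula, $\Psi_{kl}(\omega)$ is defined as the combined weighting function:

$$\Psi_{kl}(\omega)=\frac{1}{\left|X_k(\omega)\,X_l^{*}(\omega)\right|}$$

Taking into account the symmetry involved in computing $P(X)$ (for real signals the $(k,l)$ and $(l,k)$ terms are equal) and removing the fixed energy terms ($k=l$, which do not depend on $X$), the part of $P(X)$ that varies with $X$ is:

$$P'(X)=2\sum_{k=1}^{M-1}\sum_{l=k+1}^{M}\int_{-\infty}^{\infty}\Psi_{kl}(\omega)\,X_k(\omega)\,X_l^{*}(\omega)\,e^{j\omega\left(\tau_k(X)-\tau_l(X)\right)}\,d\omega$$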

(5) A global search is performed over the whole room, using the stochastic region contraction (SRC) algorithm to obtain the coordinate point $Y$ with the maximum energy: random points are drawn within a given initial N-dimensional rectangular region, and the region is contracted step by step until it is sufficiently small, at which point the peak is found; the localization coordinates are computed from it.

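In step (4), to simplify the computation, each pair's integral can be replaced by the PHAT-weighted generalized cross-correlation $R_{kl}$ of microphones $k$ and $l$, evaluated at their time difference of arrival $\tau_{kl}(X)=\tau_k(X)-\tau_l(X)$:

$$P'(X)=\sum_{k=1}^{M-1}\sum_{l=k+1}^{M}R_{kl}\big(\tau_{kl}(X)\big),\qquad R_{kl}(\tau)=\frac{1}{2\pi}\int_{-\infty}^{\infty}\Psi_{kl}(\omega)\,X_k(\omega)\,X_l^{*}(\omega)\,e^{j\omega\tau}\,d\omega$$

The constant factor $4\pi$ is dropped, since it does not change the location of the maximum.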
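Steps (3) and (4) with the simplified $P'(X)$ can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: free-field propagation so that $\tau_k(X)=\lVert X-m_k\rVert/c$ with microphone positions $m_k$, an assumed speed of sound $c=343$ m/s, and each TDOA rounded to the nearest sample. The function names `gcc_phat`, `srp_phat_map` and `grid_search` are hypothetical. The exhaustive grid search is shown for clarity; step (5) replaces it with SRC.

```python
import itertools

import numpy as np

C = 343.0  # assumed speed of sound in m/s


def gcc_phat(x_k, x_l, nfft):
    """PHAT-weighted generalized cross-correlation R_kl of two real signals.

    Returns a length-nfft circular sequence; the value for an integer lag
    tau (in samples, positive when x_k lags x_l) is stored at index tau % nfft.
    """
    spec_k = np.fft.rfft(x_k, n=nfft)
    spec_l = np.fft.rfft(x_l, n=nfft)
    cross = spec_k * np.conj(spec_l)
    cross /= np.maximum(np.abs(cross), 1e-12)  # PHAT weight 1 / |X_k X_l^*|
    return np.fft.irfft(cross, n=nfft)


def srp_phat_map(signals, mic_pos, grid, fs, c=C):
    """Evaluate P'(X) = sum_{k<l} R_kl(tau_k(X) - tau_l(X)) on every grid point.

    signals: (M, n) array, one row per microphone
    mic_pos: (M, 3) microphone coordinates in metres
    grid:    (G, 3) candidate source positions in metres
    """
    n_mics, n_samples = signals.shape
    nfft = 2 * n_samples  # zero-pad so the circular correlation does not wrap
    # tau_k(X) in samples, assuming straight-line propagation at speed c
    dist = np.linalg.norm(grid[:, None, :] - mic_pos[None, :, :], axis=-1)
    tau = dist / c * fs
    power = np.zeros(len(grid))
    for k, l in itertools.combinations(range(n_mics), 2):
        r_kl = gcc_phat(signals[k], signals[l], nfft)
        lag = np.rint(tau[:, k] - tau[:, l]).astype(int) % nfft
        power += r_kl[lag]
    return power


def grid_search(signals, mic_pos, bounds, step, fs):
    """Exhaustive search of step (3): the grid point maximizing P'(X)."""
    axes = [np.arange(lo, hi + step / 2, step) for lo, hi in bounds]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    power = srp_phat_map(signals, mic_pos, grid, fs)
    return grid[np.argmax(power)]
```

Computing each $R_{kl}$ once per pair and then indexing it at the TDOA of every candidate point is what makes the simplified form cheaper than evaluating the frequency-domain integral separately for each $X$.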

In step (5), the stochastic region contraction algorithm proceeds as follows:

1) Let $i$ be the iteration index, $J_i$ the number of points drawn at random in the $i$th iteration, $N_i$ the number of points retained to form the next-generation sub-search space, and $V_{i+1}$ the next-generation sub-search space. Each computation of $P'(X)$ is counted as one function evaluation (fe); $\Phi_i$ is the number of fe's performed up to the $i$th iteration, $\varepsilon$ is the stop value, $\Phi_{\max}$ is the maximum allowed number of fe's, and $B_{i+1}$ is the boundary of the new sub-search space $V_{i+1}$;

2) Initialize the iteration counter $i=0$;

3) Set the initial parameters $J_0$, $N_0$ and $\Phi_{\max}$;

4) Compute the $P'(X)$ values of all $J_0$ points drawn at random in the initial search space $V_0$;

5) Sort out the $N_i$ points with the largest $P'(X)$ values, such that $N_i\ll J_i$;

6) Based on these $N_i$ points, contract the current search space, updating the search space $V_{i+1}$ and the new region boundary $B_{i+1}$;

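7) If the new search region has contracted below the stop value $\varepsilon$ (whether or not $\Phi_i$ has reached $\Phi_{\max}$), determine the coordinates of the point, store and output the result;

8) If only $\Phi_i\ge\Phi_{\max}$ holds (the fe budget is exhausted before the region has contracted below $\varepsilon$), discard the result;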

9) Among the $N_i$ points, find the subset $H_i$ whose $P'(X)$ values are greater than the average $P'(X)$ value of the $N_i$ points;

10) As in steps 3) and 4), randomly select $J_i$ points in the current search space $V_{i+1}$ and compute their $P'(X)$ values;

11) Put the points of $H_i$ into the new point set, add the points with the largest $P'(X)$ values among the $J_i$ new points until the set again contains $N_i$ points, and store the set for use in the next iteration;

12) Let $i=i+1$, proceed to the next iteration, and return to step 5).
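The contraction loop in steps 1) to 12) can be sketched as follows. This is a hedged reading of the steps, not the patent's code: the contracted region is assumed to be the bounding box of the retained points, the stop test is assumed to be that every side of that box is shorter than $\varepsilon$, and the default parameter values are illustrative assumptions. `src_maximize` is a hypothetical name, and `f` is any vectorized objective; in the localization setting it would evaluate $P'(X)$ at each candidate point.

```python
import numpy as np


def src_maximize(f, lower, upper, j0=3000, n_keep=100, j_iter=1000,
                 eps=0.01, phi_max=40000, rng=None):
    """Stochastic region contraction sketch: maximize f over a box.

    f maps an (n, d) array of points to n objective values (e.g. P'(X)).
    Returns (best_point, converged); converged is False when the fe budget
    phi_max ran out first, in which case step 8) discards the result.
    """
    rng = np.random.default_rng() if rng is None else rng
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    # Steps 2)-4): draw J_0 random points in the initial region V_0 and evaluate.
    pts = rng.uniform(lower, upper, size=(j0, lower.size))
    vals = f(pts)
    phi = j0  # function evaluations (fe's) used so far
    while True:
        # Step 5): keep the N best points (N << J on the first pass).
        order = np.argsort(vals)[::-1][:n_keep]
        pts, vals = pts[order], vals[order]
        # Step 6): contract to the bounding box B_{i+1} of the kept points.
        lower, upper = pts.min(axis=0), pts.max(axis=0)
        # Steps 7)-8): stop on convergence, or when the budget is exhausted.
        if np.all(upper - lower < eps):
            return pts[0], True
        if phi >= phi_max:
            return pts[0], False
        # Step 9): subset H_i of kept points scoring above the mean.
        high = vals > vals.mean()
        # Step 10): draw J new points inside the contracted region V_{i+1}.
        new_pts = rng.uniform(lower, upper, size=(j_iter, lower.size))
        new_vals = f(new_pts)
        phi += j_iter
        # Step 11): H_i plus the best new points refill the set to N points.
        n_new = n_keep - int(high.sum())
        best_new = np.argsort(new_vals)[::-1][:n_new]
        pts = np.vstack([pts[high], new_pts[best_new]])
        vals = np.concatenate([vals[high], new_vals[best_new]])
        # Step 12): i <- i + 1 and loop back to step 5).


if __name__ == "__main__":
    target = np.array([2.1, 2.4, 1.3])  # synthetic peak, for illustration only

    def objective(p):
        return -np.sum((p - target) ** 2, axis=1)

    best, converged = src_maximize(objective, [0, 0, 0], [3.0, 3.0, 2.4],
                                   rng=np.random.default_rng(1))
    print(best, converged)
```

Keeping the above-mean subset $H_i$ across iterations preserves good points already found, while the fresh random draws let the region contract toward the peak without a full grid search.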

Advantages of the invention: it provides an indoor sound source localization method based on the analysis of different microphone array topologies, together with an array calibration scheme that fuses multichannel low-pass filtering and multichannel adaptive filtering. The method can localize a sound source when the array topology is changed, analyze the localization error, and compare it with that of other array types. The SRP-PHAT localization algorithm based on stochastic region contraction gives good localization results under strong indoor reverberation. Users can select a microphone array topology for analysis according to their own requirements; once a suitable array has been selected, the amplitude-frequency characteristics of the received signals can be calibrated with the combined low-pass and adaptive filtering scheme, which improves localization accuracy.

Drawings

FIG. 1 is a flow chart of microphone array indoor speaker localization according to an embodiment of the present invention;

FIG. 2 shows the localization performance of different microphone array types according to the embodiment of the present invention;

FIG. 3 shows the localization performance for different element spacings of an array according to an embodiment of the present invention;

FIG. 4 shows the localization error and computational load when the element spacing of the three-dimensional orthogonal array is 10 cm according to the embodiment of the present invention;

FIG. 5 is a schematic diagram of an SRP-PHAT localization system based on the fusion of multichannel low-pass filtering and multichannel adaptive filtering according to an embodiment of the present invention;

FIG. 6 compares the microphone frequency responses before filtering according to an embodiment of the present invention;

FIG. 7 compares the microphone frequency responses after filtering according to an embodiment of the present invention.

Detailed Description

The invention is further illustrated with reference to the following figures and examples.

Embodiment:

As shown in FIG. 1, the indoor sound source localization method based on the analysis of different microphone array topologies sets up an indoor microphone array speaker localization system consisting of three modules: a microphone array topology analysis module, an array adaptive filtering correction module, and a speaker localization algorithm module.

(1) Microphone array topology analysis module:

To investigate the influence of different array topologies on the localization result, this embodiment applies a controlled-variable approach to the microphone array: the array dimension, element spacing and number of elements are varied to form microphone arrays of different topologies. Three arrays are analyzed: a one-dimensional linear array, a two-dimensional T-shaped array and a three-dimensional orthogonal array. The error analysis in FIG. 2 shows that the three-dimensional orthogonal topology localizes better than the other two and indicates the optimal number of elements for this array. FIG. 3 analyzes the element spacing for a given array dimension and number of elements and shows the optimal spacing. FIG. 4 shows the localization accuracy and computational load curves for the array dimension, element spacing and number of elements determined by the above analysis.
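For controlled-variable experiments of this kind, the three topologies can be generated programmatically. The patent does not give exact element coordinates, so the layouts below are assumptions: the T-shaped array is a bar along $x$ with a stem along $-y$, and the three-dimensional orthogonal array places elements along $+x$, $+y$ and $+z$ from a shared origin element. The 0.10 m spacing matches the value in FIG. 4; the function names are hypothetical.

```python
import numpy as np


def linear_array(n, d):
    """1-D uniform linear array: n elements spaced d metres apart along x."""
    x = np.arange(n) * d
    return np.column_stack([x, np.zeros(n), np.zeros(n)])


def t_array(n_bar, n_stem, d):
    """2-D T-shaped array (assumed layout): n_bar elements along x centred on
    the origin, plus n_stem elements running down the -y axis from it."""
    bar_x = (np.arange(n_bar) - (n_bar - 1) / 2) * d
    bar = np.column_stack([bar_x, np.zeros(n_bar), np.zeros(n_bar)])
    stem_y = -np.arange(1, n_stem + 1) * d
    stem = np.column_stack([np.zeros(n_stem), stem_y, np.zeros(n_stem)])
    return np.vstack([bar, stem])


def orthogonal_array(n_per_axis, d):
    """3-D orthogonal array (assumed layout): one element at the origin plus
    n_per_axis elements along each of the +x, +y and +z axes."""
    steps = np.arange(1, n_per_axis + 1) * d
    arms = [np.outer(steps, axis) for axis in np.eye(3)]
    return np.vstack([np.zeros((1, 3))] + arms)


def aperture(positions):
    """Largest distance between any two elements (the array aperture)."""
    diff = positions[:, None, :] - positions[None, :, :]
    return np.linalg.norm(diff, axis=-1).max()


if __name__ == "__main__":
    d = 0.10  # element spacing in metres, as in FIG. 4
    for name, pos in [("linear", linear_array(7, d)),
                      ("T-shaped", t_array(5, 2, d)),
                      ("orthogonal", orthogonal_array(2, d))]:
        print(f"{name}: {len(pos)} elements, aperture {aperture(pos):.3f} m")
```

With the same number of elements and spacing, the layouts differ in aperture and in how many spatial dimensions their baselines span, which is the kind of difference the controlled-variable comparison isolates.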

(2) The array adaptive filtering correction module:

The array calibration scheme proposed in this embodiment (shown in FIG. 5), which combines multichannel low-pass filtering and multichannel adaptive filtering, serves as an intermediate module between the microphone array topology analysis module and the speaker localization algorithm module; it corrects the array elements of the selected array type and thereby improves localization accuracy.

(3) Speaker location algorithm module:

This module computes the steered response power of the received signals using phase transform weighting and searches a preset sound source space for the coordinate that maximizes the steered response power, which gives the position estimate of the real source. The speech signals are captured by the microphone array and then split into individual single-microphone signals. Because an exhaustive search for the power maximum is too computationally expensive, the system finds the peak with the stochastic region contraction optimization algorithm. The estimated coordinates are compared with the true coordinates, and the performance of the different microphone arrays is analyzed by comparing the errors. The specific steps are as follows:

1. Speech signal extraction: arrange a suitable microphone array indoors, have the speaker speak, record the speech, and extract the audio signal of each microphone, $x_1(t), x_2(t), \ldots, x_M(t)$.

2. The principle of the steered response power localization algorithm is to divide the sound source space into a grid and compute the power $P(X)$ at each grid point in turn; the point of maximum power is the source location:

$$\hat{X}=\arg\max_{X}P(X).$$

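3. The total power $P(X)$ of any point $X$ can be regarded as the sum, over all microphone pairs in the array, of the phase-transform-based generalized cross-correlations of their signals:

$$P(X)=\sum_{k=1}^{M}\sum_{l=1}^{M}\int_{-\infty}^{\infty}\Psi_{kl}(\omega)\,X_k(\omega)\,X_l^{*}(\omega)\,e^{j\omega\left(\tau_k(X)-\tau_l(X)\right)}\,d\omega$$

where $k$ and $l$ denote the $k$th and $l$th microphones, $X_k(\omega)$ is the Fourier transform of $x_k(t)$, $\Psi_{kl}(\omega)$ is the phase transform weight, and $\tau_k(X)$ is the propagation time of sound from position $X$ to the $k$th microphone. In the formula, $\Psi_{kl}(\omega)$ is defined as the combined weighting function:

$$\Psi_{kl}(\omega)=\frac{1}{\left|X_k(\omega)\,X_l^{*}(\omega)\right|}$$

Taking into account the symmetry involved in computing $P(X)$ (for real signals the $(k,l)$ and $(l,k)$ terms are equal) and removing the fixed energy terms ($k=l$, which do not depend on $X$), the part of $P(X)$ that varies with $X$ is:

$$P'(X)=2\sum_{k=1}^{M-1}\sum_{l=k+1}^{M}\int_{-\infty}^{\infty}\Psi_{kl}(\omega)\,X_k(\omega)\,X_l^{*}(\omega)\,e^{j\omega\left(\tau_k(X)-\tau_l(X)\right)}\,d\omega$$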

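Thus, to simplify the computation, each pair's integral can be replaced by the PHAT-weighted generalized cross-correlation $R_{kl}$ of microphones $k$ and $l$, evaluated at their time difference of arrival $\tau_{kl}(X)=\tau_k(X)-\tau_l(X)$, dropping the constant factor $4\pi$, which does not change the location of the maximum:

$$P'(X)=\sum_{k=1}^{M-1}\sum_{l=k+1}^{M}R_{kl}\big(\tau_{kl}(X)\big),\qquad R_{kl}(\tau)=\frac{1}{2\pi}\int_{-\infty}^{\infty}\Psi_{kl}(\omega)\,X_k(\omega)\,X_l^{*}(\omega)\,e^{j\omega\tau}\,d\omega$$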

4. Perform a global search over the whole room and use the stochastic region contraction (SRC) algorithm to obtain the coordinate point $Y$ with the maximum energy. The basic idea of SRC is to draw random points within a given initial N-dimensional rectangular region and to contract the region step by step until it is small enough to locate the peak, from which the localization coordinates are computed.

The stochastic region contraction algorithm proceeds as follows:

1) Let $i$ be the iteration index, $J_i$ the number of points drawn at random in the $i$th iteration, $N_i$ the number of points retained to form the next-generation sub-search space, and $V_{i+1}$ the next-generation sub-search space. Each computation of $P'(X)$ is counted as one function evaluation (fe); $\Phi_i$ is the number of fe's performed up to the $i$th iteration, $\varepsilon$ is the stop value, $\Phi_{\max}$ is the maximum allowed number of fe's, and $B_{i+1}$ is the boundary of the new sub-search space $V_{i+1}$;

2) Initialize the iteration counter $i=0$;

3) Set the initial parameters $J_0$, $N_0$ and $\Phi_{\max}$;

4) Compute the $P'(X)$ values of all $J_0$ points drawn at random in the initial search space $V_0$;

5) Sort out the $N_i$ points with the largest $P'(X)$ values, such that $N_i\ll J_i$;

6) Based on these $N_i$ points, contract the current search space, updating the search space $V_{i+1}$ and the new region boundary $B_{i+1}$;

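7) If the new search region has contracted below the stop value $\varepsilon$ (whether or not $\Phi_i$ has reached $\Phi_{\max}$), determine the coordinates of the point, store and output the result;

8) If only $\Phi_i\ge\Phi_{\max}$ holds (the fe budget is exhausted before the region has contracted below $\varepsilon$), discard the result;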

9) Among the $N_i$ points, find the subset $H_i$ whose $P'(X)$ values are greater than the average $P'(X)$ value of the $N_i$ points;

10) Similarly to steps 3) and 4), randomly select $J_i$ points in the current search space $V_{i+1}$ and compute their $P'(X)$ values;

11) Put the points of $H_i$ into the new point set, add the points with the largest $P'(X)$ values among the $J_i$ new points until the set again contains $N_i$ points, and store the set for use in the next iteration;

12) Let $i=i+1$, proceed to the next iteration, and return to step 5).

Combining FIGS. 2, 3 and 4, a microphone array suited to the user's actual scenario can be selected according to the required localization accuracy, number of microphones and array size. As shown in FIG. 5, the speaker's speech propagates through the acoustic channel formed by the indoor environment and is received by the preamplifiers of the microphone array; the received microphone signals then undergo fused filtering based on multichannel low-pass filtering and multichannel adaptive filtering. The low-pass filters remove noise outside the speaker's speech band (above the cutoff frequency), and the adaptive filters calibrate the amplitude-frequency characteristics of the received signals. The amplitude-frequency characteristics before and after calibration are shown in FIGS. 6 and 7; the calibration makes the localization more accurate.
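One possible realization of the low-pass plus adaptive calibration chain is sketched below. The patent does not name the filter designs, so these are assumptions: a Hamming-windowed sinc FIR low-pass filter and a normalized LMS (NLMS) adaptive FIR filter that learns to map each channel onto a reference microphone. The sketch also assumes a calibration recording in which the reference and the channel being calibrated receive the same excitation (for example, microphones placed side by side), so that any difference comes from the transducers rather than from propagation delay; aligning in-situ signals this way would erase the inter-microphone delays that SRP-PHAT relies on. All function names and parameter values are hypothetical.

```python
import numpy as np


def lowpass_fir(cutoff_hz, fs, n_taps=101):
    """Linear-phase low-pass FIR: Hamming-windowed sinc with unit DC gain."""
    n = np.arange(n_taps) - (n_taps - 1) / 2
    h = 2 * cutoff_hz / fs * np.sinc(2 * cutoff_hz / fs * n)
    h *= np.hamming(n_taps)
    return h / h.sum()


def multichannel_lowpass(signals, cutoff_hz, fs, n_taps=101):
    """Apply the same low-pass FIR to every channel (rows of `signals`)."""
    h = lowpass_fir(cutoff_hz, fs, n_taps)
    return np.array([np.convolve(x, h, mode="same") for x in signals])


def nlms_calibration_filter(x, ref, n_taps=32, mu=0.5, eps=1e-8):
    """NLMS: adapt FIR taps w so that the filtered channel w * x tracks ref.

    x:   calibration recording from the channel being corrected
    ref: the same excitation as recorded by the reference microphone
    """
    w = np.zeros(n_taps)
    buf = np.zeros(n_taps)  # most recent input samples, newest first
    for n in range(len(x)):
        buf = np.roll(buf, 1)
        buf[0] = x[n]
        err = ref[n] - w @ buf
        w += mu * err * buf / (eps + buf @ buf)
    return w


def apply_calibration(x, w):
    """Filter a channel with its learned calibration taps (causal FIR)."""
    return np.convolve(x, w)[: len(x)]
```

In use, each channel would first pass through `multichannel_lowpass`, `nlms_calibration_filter` would be run once per channel on the calibration recording, and the resulting taps would then be applied to later recordings with `apply_calibration`. Low-pass filtering first means the adaptive filter is fitted only to the band that the localization stage actually uses.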

Claims

1. An indoor sound source positioning method based on different microphone array topological structure analysis, characterized in that the method comprises the following steps: