
CN111157949A - Voice recognition and sound source positioning method

Info

Publication number: CN111157949A
Application number: CN201811326998.9A
Authority: CN
Inventors: 张梦巧; 王洁莹; 张喜明
Original assignee: China Changfeng Science Technology Industry Group Corp
Current assignee: China Changfeng Science Technology Industry Group Corp
Priority date: 2018-11-08
Filing date: 2018-11-08
Publication date: 2020-05-15

Abstract

The invention provides a voice recognition and sound source positioning method comprising time delay estimation and sound source positioning. First, the relative time differences with which the sound source signal reaches the microphone elements in an array are estimated by an algorithm. Second, the estimated time differences are used to calculate the differences in distance from the sound source to each array element, and the position of the sound source is determined from these and the array topology by a geometric algorithm or by search.

Description

Voice recognition and sound source positioning method

Technical Field

The invention relates to the field of computer signal processing, in particular to a voice recognition and sound source positioning method.

Background

Since the 1980s, microphone array signal processing techniques have developed rapidly and have found widespread use in radar, sonar, and communications. The idea of array signal processing was later applied to speech signal processing. Internationally, the use of microphone array systems for speech signal processing has been studied since 1970. In 1976, Gabfid applied the adaptive beamforming techniques used in radar and sonar directly to the simple sound acquisition problem. In 1985, Flanagan of AT&T Bell Laboratories in the United States used 21 microphones to form an array, and acquisition of sound source signals under electronic control was achieved for the first time. In the same year, Flanagan et al. applied a two-dimensional microphone array to sound pickup in large rooms to suppress the effects of reverberation and noise on the sound source signal. Because of the technical limitations of the time, the algorithm could not be implemented digitally and relied mainly on analog devices. In 1991, Kellermann implemented the algorithm fully digitally using digital signal processing technology, which further improved its performance, reduced hardware cost, and increased the flexibility of the system. Microphone array systems have subsequently been used in many applications, including video conferencing, speech recognition, speaker recognition, speech acquisition in automotive environments, sound pickup in reverberant environments, sound source localization, and hearing aid devices. Speech processing based on microphone arrays is currently becoming a new research hotspot, but the related application technology is not yet mature.

Disclosure of Invention

The invention aims to provide a voice recognition and sound source positioning method which is expected to be applied to the fields of voice recognition, voice acquisition in a strong noise environment, conference recording in a large place, sound detection, hearing aid devices and the like.

In order to achieve the purpose, the invention adopts the following technical scheme:

A speech recognition and sound source localization method comprising time delay estimation and sound source localization, characterized in that: first, the relative time differences with which the sound source signal arrives at the microphone elements in an array are estimated by an algorithm; second, the estimated time differences are used to calculate the differences in distance from the sound source to each array element, and the position of the sound source is determined from these and the array topology by a geometric algorithm or by search.

The time delay estimation proceeds as follows: assume that there is only a single sound source, that the microphones are arranged in a uniform linear array, and that a sound source signal s(k) to be located exists in a far-field environment. With the first microphone element selected as the reference point, the signal received by the nth array element at time k is expressed as:

$$
\begin{aligned}
y_n(k) &= \alpha_n s(k - t - \tau_{n1}) + v_n(k) \\
       &= \alpha_n s[k - t - F_n(\tau)] + v_n(k) \\
       &= x_n(k) + v_n(k), \quad n = 1, 2, \ldots, N
\end{aligned}
$$

where $\alpha_n$ ($n = 1, 2, \ldots, N$) is the attenuation of the signal during propagation, with a value in $[0, 1]$; $t$ is the propagation time of the signal from $s(k)$ to array element 1; $v_n(k)$ is the additive noise received at the nth array element; $\tau$ is the time delay difference between the signals received by microphone element 1 and microphone element 2; and the function $F_n(\tau)$ represents the signal delay between the nth and the first array elements.

The sound source localization proceeds as follows: the direction angle and the distance of the sound source are determined from the geometric relationship between the sound source and the array.

The present invention can be applied in practice in the following fields. In video conferencing, sound source localization can track and locate the speakers. In robotics, a robot can locate and track a sound source using a binaural time delay model and cross-correlation. In noise detection, sound source localization is an important method for evaluating engine performance and testing the stability of large machinery, which helps to control noise in engines (for example in automobiles and motorcycles) and in large instruments. In medical equipment, sound source localization can be used to analyze the site of a lesion, which greatly assists the diagnosis of disease.

Drawings

Fig. 1 is a schematic diagram of sound source localization of the present invention.

Detailed Description

The sound source localization method of the present invention is divided into two steps: time delay estimation and sound source localization. First, the relative time differences with which the sound source signal reaches the microphone elements in the array are estimated by an algorithm. Second, the estimated time differences are used to calculate the differences in distance from the sound source to each array element, and the position of the sound source is determined from these and the array topology by a geometric algorithm or by search.

1. Delay estimation

The geometry of the array is crucial to sound source localization performance. Depending on the environment in which the microphone array is located, models for time delay estimation can be divided into an ideal model and a reverberation model. A model in which the microphone elements receive only the sound signals that reach the array via the direct path is called the ideal model. A model that considers not only signals arriving via the direct path but also signals that reach the array indirectly, after the signal emitted by the sound source is reflected by walls, tables, and so on, is called the reverberation model. Because the number of reverberation paths is uncertain, algorithms based on the reverberation model are more complex than those based on the ideal model. On the other hand, a reverberation-model algorithm fits the influence of the interference with a mathematical model instead of ignoring indirect-path signals as the ideal model does, so its time delay estimation performance is relatively good. Nevertheless, to reduce algorithm complexity, the present invention mainly considers time delay estimation for the microphone array under the ideal model.

Assume that there is only a single sound source and that the microphone array is a uniform linear array. In a far-field environment there is a sound source signal s(k) to be located. If the first microphone element is selected as the reference point, the signal received by the nth array element at time k can be expressed as:

$$
\begin{aligned}
y_n(k) &= \alpha_n s(k - t - \tau_{n1}) + v_n(k) \\
       &= \alpha_n s[k - t - F_n(\tau)] + v_n(k) \\
       &= x_n(k) + v_n(k), \quad n = 1, 2, \ldots, N
\end{aligned}
$$

where $\alpha_n$ ($n = 1, 2, \ldots, N$) is the attenuation of the signal during propagation, with a value in $[0, 1]$. $t$ is the propagation time of the signal from $s(k)$ to array element 1. $v_n(k)$ is the additive noise received at the nth array element; this noise is assumed to be uncorrelated with the speech signal and with the noise at the other elements. $\tau$ is the time delay difference between the signals received by microphone element 1 and microphone element 2. The function $F_n(\tau)$ represents the signal delay between the nth and the first array elements. Since the microphone array model used here is assumed to be a uniform linear array in a far-field environment:

$$
F_1(\tau) = 0, \quad F_2(\tau) = \tau, \quad F_n(\tau) = (n - 1)\tau, \quad n = 2, \ldots, N
$$
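The far-field signal model above can be illustrated with a short simulation. The following is only a minimal sketch under stated assumptions: delays are whole numbers of samples, samples before the start of the source signal are treated as zero, the noise is Gaussian, and the names `far_field_delay` and `simulate_array` are hypothetical helpers that do not appear in the original text.

```python
import random


def far_field_delay(n, tau):
    """F_n(tau) for a uniform linear array in the far field: (n - 1) * tau."""
    return (n - 1) * tau


def simulate_array(s, num_mics, tau, t=0, alphas=None, noise_std=0.0, seed=0):
    """Return channels y[n-1][k] = alpha_n * s(k - t - F_n(tau)) + v_n(k).

    Assumptions (not from the source): tau and t are integer sample counts,
    the source signal is zero outside the given list, and v_n is Gaussian.
    """
    rng = random.Random(seed)
    if alphas is None:
        alphas = [1.0] * num_mics  # no attenuation by default
    length = len(s)
    channels = []
    for n in range(1, num_mics + 1):
        shift = t + far_field_delay(n, tau)
        channel = []
        for k in range(length):
            idx = k - shift
            x = alphas[n - 1] * s[idx] if 0 <= idx < length else 0.0
            channel.append(x + rng.gauss(0.0, noise_std))
        channels.append(channel)
    return channels


# Noise-free example: each element sees the pulse one sample later.
pulse = [1.0, 2.0, 3.0, 0.0, 0.0, 0.0]
for row in simulate_array(pulse, num_mics=3, tau=1):
    print(row)
```

With `noise_std=0.0` each row is simply a shifted copy of the source, which makes the linear growth of the delay with the element index easy to see.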

In the near field, the signal arrives at the microphone array as a spherical wave, so $F_n$ is a nonlinear function of $\tau$. In that case $F_n$ depends both on the microphone element spacing and on the position of the sound source relative to the array. For a uniform linear array the function $F_n$ is known, so the time delay estimation problem is equivalent to estimating $\tau$: the time delay estimation algorithm computes an estimate of $\tau$ from a finite number of frames of the collected multi-channel sound signals.
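The text does not name a specific estimator for $\tau$. As one possible illustration, the sketch below uses plain time-domain cross-correlation against the reference element and the far-field relation $F_n(\tau) = (n - 1)\tau$. The choice of cross-correlation, the integer-sample delays, and the function names `cross_correlation`, `estimate_delay`, and `estimate_tau` are all assumptions made for this example.

```python
import random


def cross_correlation(a, b, lag):
    """Sum of a[k] * b[k + lag] over the samples where both are defined."""
    total = 0.0
    for k in range(len(a)):
        j = k + lag
        if 0 <= j < len(b):
            total += a[k] * b[j]
    return total


def estimate_delay(ref, other, max_lag):
    """Lag in [-max_lag, max_lag] (samples) by which `other` trails `ref`."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda lag: cross_correlation(ref, other, lag))


def estimate_tau(channels, max_lag):
    """Average the per-element estimates of tau, using F_n(tau) = (n - 1) * tau."""
    ref = channels[0]
    estimates = []
    for m, channel in enumerate(channels[1:], start=1):  # m = n - 1
        estimates.append(estimate_delay(ref, channel, max_lag * m) / m)
    return sum(estimates) / len(estimates)


# Synthetic check: white-noise source, true tau = 3 samples, 4 elements.
rng = random.Random(1)
source = [rng.gauss(0.0, 1.0) for _ in range(200)]
true_tau = 3
channels = []
for m in range(4):  # element n = m + 1 is delayed by m * true_tau samples
    delayed = [0.0] * (m * true_tau) + source[:len(source) - m * true_tau]
    channels.append([x + rng.gauss(0.0, 0.1) for x in delayed])
tau_hat = estimate_tau(channels, max_lag=5)
print(tau_hat)
```

Averaging over all elements is a design choice of this sketch; it uses the known form of $F_n$ so that every element contributes to a single estimate of $\tau$.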

2. Sound source localization

After the time delay of the microphone array has been estimated, the direction angle and the distance of the sound source can be determined from the geometric relationship between the sound source and the array. The positioning accuracy is influenced by many factors, the main ones being the time delay estimation method and the positioning method. The present technique employs an improved sound source localization algorithm. The sound source is treated as a point source and is assumed to be at infinity, so the wavefront is planar and the direction of propagation is perpendicular to the wavefront. The timing of the signals received by microphones A and B is shown in FIG. 1, where $L$ is the distance between the two microphone elements, $c$ is the speed of sound in air, $\tau_{AB}$ is the difference in arrival time of the sound at the two microphones, i.e. the time delay between the array elements, and $\theta$ is the direction angle of the sound source.
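The relationship between these quantities can be sketched in code. The text does not state the formula or the reference direction for $\theta$, so this example assumes that $\theta$ is measured from the line joining the two microphones, which gives the plane-wave path difference $c\,\tau_{AB} = L\cos\theta$. The function name `direction_angle` and the default speed of sound of 343 m/s are likewise assumptions of this sketch.

```python
import math


def direction_angle(tau_ab, spacing, speed_of_sound=343.0):
    """Far-field direction angle in degrees, assuming c * tau_AB = L * cos(theta).

    tau_ab:  time delay between microphones A and B, in seconds
    spacing: distance L between the two microphones, in metres
    """
    ratio = speed_of_sound * tau_ab / spacing
    # Estimation error can push the ratio slightly outside [-1, 1]; clamp it.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.acos(ratio))


# Zero delay means the source lies broadside to the pair (under this convention).
print(direction_angle(0.0, spacing=0.1))
# A delay of half the maximum possible value L / c.
print(direction_angle(0.05 / 343.0, spacing=0.1))
```

If $\theta$ were instead measured from the normal to the array, the same path difference would read $c\,\tau_{AB} = L\sin\theta$ and `math.asin` would replace `math.acos`.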

Claims

1. A speech recognition and sound source localization method comprising time delay estimation and sound source localization, characterized in that: first, the relative time differences with which the sound source signal arrives at the microphone elements in an array are estimated by an algorithm; second, the estimated time differences are used to calculate the differences in distance from the sound source to each array element, and the position of the sound source is determined from these and the array topology by a geometric algorithm or by search.

2. The method of claim 1, wherein the time delay estimation comprises: assuming that there is only a single sound source, that the microphones are arranged in a uniform linear array, and that a sound source signal s(k) to be located exists in a far-field environment, and selecting the first microphone element as the reference point, the signal received by the nth array element at time k is expressed as:

$$
\begin{aligned}
y_n(k) &= \alpha_n s(k - t - \tau_{n1}) + v_n(k) \\
       &= \alpha_n s[k - t - F_n(\tau)] + v_n(k) \\
       &= x_n(k) + v_n(k), \quad n = 1, 2, \ldots, N
\end{aligned}
$$

where $\alpha_n$ ($n = 1, 2, \ldots, N$) is the attenuation of the signal during propagation, with a value in $[0, 1]$; $t$ is the propagation time of the signal from $s(k)$ to array element 1; $v_n(k)$ is the additive noise received at the nth array element; $\tau$ is the time delay difference between the signals received by microphone element 1 and microphone element 2; and the function $F_n(\tau)$ represents the signal delay between the nth and the first array elements.

3. The method of claim 1, wherein the sound source localization comprises: determining the direction angle and the distance of the sound source from the geometric relationship between the sound source and the array.