CN112526495B - Auricle conduction characteristic-based monaural sound source positioning method and system - Google Patents

Auricle conduction characteristic-based monaural sound source positioning method and system

Info

Publication number
CN112526495B
CN112526495B
Authority
CN
China
Prior art keywords
sound source
auricle
frequency domain
signal
received signal
Prior art date
Legal status
Active
Application number
CN202011459187.3A
Other languages
Chinese (zh)
Other versions
CN112526495A (en)
Inventor
童峰
毛连华
郭秋涵
吴燕艺
傅荣杰
Current Assignee
Xiamen Padmate Technology Co ltd
Xiamen University
Original Assignee
Xiamen Padmate Technology Co ltd
Xiamen University
Priority date
Filing date
Publication date
Application filed by Xiamen Padmate Technology Co ltd, Xiamen University
Priority to CN202011459187.3A
Publication of CN112526495A
Application granted
Publication of CN112526495B
Status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S11/00Systems for determining distance or velocity not using reflection or reradiation
    • G01S11/14Systems for determining distance or velocity not using reflection or reradiation using ultrasonic, sonic, or infrasonic waves

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention provides a monaural sound source localization method and system based on auricle conduction characteristics. Under an ideal test environment, a plurality of test sound sources are placed at different azimuth angles or pitch angles and play test sound source signals; the signal acquired by a noise microphone is recorded as a first received signal; the first received signal and the test sound source signal are converted to the frequency domain by discrete Fourier transform, and the auricle conduction characteristic response is calculated for each test sound source. During actual localization, the noise microphone receives the signal emitted by the sound source to be measured, which is recorded as a second received signal. Finally, the second received signal and the auricle conduction characteristic responses are converted to the frequency domain by discrete Fourier transform, and the sound source direction is estimated with a sparse recovery algorithm. Single-earphone sound source localization is thus achieved, avoiding the additional inter-earphone communication, control and hardware overhead incurred by conventional binaural localization methods that require microphone acquisition at both earphones.

Description

Auricle conduction characteristic-based monaural sound source positioning method and system
Technical Field
The invention relates to the technical field of sound source localization, and in particular to an auricle conduction characteristic-based monaural sound source localization method and system.
Background
For wireless earphone products, sound source localization can provide directional information, approaching-sound warning and similar functions for the earphone noise-reduction algorithm by estimating the sound source direction, which is important for improving noise-reduction performance and user experience. At present, most earphone sound source localization adopts biologically inspired localization algorithms, such as pinna spectral cues and binaural cues, namely the interaural time difference (ITD) and the interaural level difference (ILD), or directly uses the head-related impulse response.
However, the above algorithms need two microphones to realize binaural sound source localization, which means that for a wireless earphone product the signals and feature information needed for localization must be transmitted wirelessly between the two ears, incurring additional wireless communication and power consumption.
The uniquely shaped auricle is the outermost part of the auditory organ; it receives sound directionally, and the auricle cavity, including the external auditory canal, provides resonance, reverberation and buffering. Some studies have also examined the acoustic load that the outer ear places on an earphone and measured its effect on the earphone's output performance at different frequencies.
For an in-ear earphone, taking a microphone exposed at the ear-canal outlet after the earphone is worn as an example, and considering that the irregularly shaped auricle imposes different conduction characteristics, such as reflection and reverberation, on sound incident from different directions, the invention measures and stores in advance, with an artificial ear, the different characteristics of sound signals incident from different directions, constructs a measurement matrix, and converts the monaural sound source direction estimation problem into a sparse recovery problem, thereby providing a monaural sound source localization method and system based on auricle conduction characteristics. The technical scheme disclosed by the invention realizes monaural sound source direction estimation for a wireless earphone, so that the wireless communication and power consumption overhead of a wireless earphone product can be greatly reduced.
Disclosure of Invention
The invention provides a monaural sound source localization method and system based on auricle conduction characteristics, which overcome the above defects of the prior art.
In one aspect, the present invention provides a method for monaural sound source localization based on auricle conduction characteristics, the method comprising the steps of:
S1: setting a noise microphone at the outlet position of a single artificial ear canal; under an ideal test environment, respectively setting a plurality of test sound sources at different azimuth angles or pitch angles and playing test sound source signals; recording the signal acquired by the noise microphone as a first received signal; converting the first received signal and the test sound source signal into the frequency domain by discrete Fourier transform; and calculating the auricle conduction characteristic response for each test sound source based on the frequency-domain relation between the first received signal and the test sound source signal;
S2: during actual localization, receiving, with the noise microphone, the signal emitted by the sound source to be measured and recording it as a second received signal; and
S3: converting the second received signal and the auricle conduction characteristic responses into the frequency domain by discrete Fourier transform, and estimating the sound source direction with a sparse recovery algorithm based on the frequency-domain relation between the second received signal and the auricle conduction characteristic responses, to obtain the direction of the sound source to be measured.
Aiming at the problem that the conventional sound source localization algorithms adopted by wireless earphones generally need both earphones, causing extra wireless communication and power overhead between them, the method makes full use of the conduction characteristics that the unique auricle shape imposes on sound incident from different directions to realize single-ear sound source localization, thereby significantly reducing the system overhead of implementing sound source localization in a wireless earphone product.
In a specific embodiment, the noise microphone is positioned within the auricle and outside the outlet of the ear canal.
In a specific embodiment, the noise microphone is connected to the sound source position estimation module through an audio codec chip.
In a specific embodiment, the sound source position estimation module includes a microprocessor for controlling an audio codec chip and a microphone.
In a specific embodiment, the specific step of calculating the auricle conduction characteristic response for each test sound source based on the frequency-domain relation between the first received signal and the test sound source signal includes:
based on the frequency-domain relation between the first received signal and the test sound source signal
X1(k) = H(rs,k)S1(k) + W1(k)
where X1(k) is the vector formed by the M frames of the first received signal in the frequency domain, H(rs,k) is the vector of the auricle conduction characteristic frequency-domain response at the test sound source position rs corresponding to the M frames of the first received signal, W1(k) is the vector of the frequency-domain noise corresponding to the M frames of the first received signal, and S1(k) is the frequency-domain test sound source signal;
obtaining H(rs,k) from the known X1(k), W1(k) and S1(k); performing this operation for the spatial positions {r1, r2, …, rD} of all test sound sources to obtain the auricle conduction characteristic frequency-domain response matrix D(k) = {H(r1,k), H(r2,k), …, H(rD,k)} corresponding to all test sound sources, and storing it.
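As a rough illustration of this calibration step, the sketch below (Python; the function name, framing and the simple per-bin division are assumptions made for illustration, not the patent's actual implementation) estimates H(rs,k) for one test source from framed recordings of the received signal, the known test signal and a noise estimate:

```python
import numpy as np

def estimate_pinna_response(x1_frames, s1_frames, w1_frames, frame_len=512):
    """Estimate the auricle conduction frequency response H(rs, k) of one test
    source from M frames of the first received signal, the known test sound
    source signal and a noise estimate (all arrays of shape M x frame_len)."""
    X1 = np.fft.fft(x1_frames, n=frame_len, axis=1)   # received spectrum, M x L
    S1 = np.fft.fft(s1_frames, n=frame_len, axis=1)   # test-signal spectrum, M x L
    W1 = np.fft.fft(w1_frames, n=frame_len, axis=1)   # noise spectrum, M x L
    eps = 1e-12                                       # guard against division by zero
    return (X1 - W1) / (S1 + eps)                     # H(rs, k) per frame and bin
```

Repeating this for every test position r1 … rD and stacking the results column-wise yields the stored response matrix D(k).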
In a specific embodiment, estimating the sound source direction with a sparse recovery algorithm based on the frequency-domain relation between the second received signal and the auricle conduction characteristic response specifically includes:
using the auricle conduction characteristic frequency-domain response matrix D(k) as a dictionary, the frequency-domain relation between the second received signal and the auricle conduction characteristic response
X2(k) = H(rs,k)S2(k) + W2(k)
is expanded over D(k) to obtain
X2(k) = D(k)S̃2(k) + W2(k)
where X2(k) is the vector formed by the M frames of the second received signal in the frequency domain, H(rs,k) is the vector of the auricle conduction characteristic frequency-domain response at the sound source position rs corresponding to the M frames, W2(k) is the vector of the frequency-domain noise corresponding to the M frames of the second received signal, S2(k) is the frequency-domain signal of the sound source to be measured, and S̃2(k) is the frequency-domain signal of the sound source to be measured under the dictionary extension;
based on a compressed sensing algorithm, the auricle conduction characteristic frequency-domain response matrix D(k) is taken as the observation matrix, and the formula X2(k) = D(k)S̃2(k) + W2(k) is solved with a sparse recovery algorithm to obtain S̃2(k); the direction of the sound source to be measured is then obtained from the non-zero elements of S̃2(k) and their positions. Because the number of target sound sources is far smaller than the number D of spatial positions in the set, the extended frequency-domain signal vector of the sound source to be measured is sparse over the set of spatial positions; for a single sound source, the vector S̃2(k) contains only one non-zero element. Treating the redundant auricle conduction characteristic frequency-domain response matrix D(k) as the observation matrix of a compressed sensing algorithm, S̃2(k) can be reconstructed losslessly with high probability provided the observation matrix satisfies the restricted isometry property (RIP), and the sound source position then corresponds one-to-one with the non-zero position in S̃2(k). Considering the unique shape of the human auricle, the elements of D(k) corresponding to sound sources at different azimuth and pitch angles are randomly distributed, so the restricted isometry condition is satisfied and a sparse recovery algorithm can be used to obtain the sparse solution of X2(k) = D(k)S̃2(k) + W2(k).
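For reference, one standard way to write this single-bin recovery problem in compressed-sensing form (the noise bound ε is an assumed parameter, not stated in the patent) is:

```latex
\hat{\tilde{S}}_2(k) \;=\; \arg\min_{\tilde{S}} \ \|\tilde{S}\|_0
\quad \text{subject to} \quad \|X_2(k) - D(k)\,\tilde{S}\|_2 \le \varepsilon ,
```

with the estimated source direction given by the index of the non-zero entry of the solution.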
In a specific embodiment, obtaining the direction of the sound source to be measured from the non-zero elements of S̃2(k) and their positions specifically comprises:
setting a threshold η and setting the length of the discrete Fourier transform equal to the frame length L; selecting the η higher-energy frequency bins of X2(k) and superimposing the corresponding S̃2(k) to obtain S̃2; the position corresponding to the element of largest magnitude in S̃2 is taken as the spatial position of the sound source to be measured. Information from multiple frequency bins is thus combined, so the obtained sound source direction is more accurate.
According to a second aspect of the present invention, a computer-readable storage medium is presented, on which a computer program is stored, which computer program, when being executed by a computer processor, carries out the above-mentioned method.
According to a third aspect of the present invention, a monaural sound source localization system based on auricle conduction characteristics is presented, the system comprising a test sound source, a noise microphone and a sound source orientation estimation module:
the noise microphone is arranged at the outlet position of a single artificial ear canal, the test sound sources are respectively arranged at different azimuth angles or pitch angles to play test sound source signals, and the noise microphone is configured to record the acquired signal as a first received signal in the auricle conduction characteristic acquisition stage;
The sound source position estimation module is configured to convert the first received signal and the test sound source signal into a frequency domain by using discrete Fourier transform in the auricle conduction characteristic acquisition stage, and calculate auricle conduction characteristic responses for each of the test sound sources based on the frequency domain relation of the first received signal and the test sound source signal; and
In the actual localization stage, the noise microphone is configured to receive the signal emitted by the sound source to be measured and record it as a second received signal;
The sound source position estimation module is further configured to convert the second received signal and the auricle conduction characteristic response into a frequency domain by using discrete Fourier transform in an actual positioning stage, and perform sound source position estimation by using a sparse recovery algorithm based on a frequency domain relation of the second received signal and the auricle conduction characteristic response to obtain the position of the sound source to be measured.
In the method, a noise microphone is set at the outlet position of a single artificial ear canal; under an ideal test environment a plurality of test sound sources are respectively set at different azimuth angles or pitch angles and play test sound source signals; the signal acquired by the noise microphone is recorded as a first received signal; the first received signal and the test sound source signal are converted to the frequency domain by discrete Fourier transform, and the auricle conduction characteristic response is calculated for each test sound source based on their frequency-domain relation. During actual localization, the noise microphone receives the signal emitted by the sound source to be measured, which is recorded as a second received signal. Finally, the second received signal and the auricle conduction characteristic responses are converted to the frequency domain by discrete Fourier transform, and the sound source direction is estimated with a sparse recovery algorithm based on their frequency-domain relation, giving the direction of the sound source to be measured. Based on the different reflection and reverberation characteristics caused by the unique shape of the human auricle, a sparse recovery equation for the sound source spatial position is established from the auricle conduction characteristic response contained in the frequency domain of the signal acquired by the microphone, and single-earphone sound source localization is realized, thereby avoiding the additional inter-earphone communication, control and hardware overhead incurred by conventional binaural localization methods that require microphone acquisition at both earphones.
Drawings
The accompanying drawings are included to provide a further understanding of the embodiments and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments and, together with the description, serve to explain the principles of the application. Other embodiments and many of their intended advantages will be readily appreciated as they become better understood by reference to the following detailed description. Other features, objects and advantages of the present application will become more apparent from the detailed description of non-limiting embodiments made with reference to the accompanying drawings, in which:
FIG. 1 is a flow chart of a method of monaural sound source localization based on auricle conduction characteristics, according to an embodiment of the invention;
FIG. 2 is a circuit diagram of a noise microphone and its connection to a microprocessor in accordance with a specific embodiment of the present invention;
FIG. 3 is a schematic diagram of a monaural sound source localization system based on auricle conduction characteristics, according to an embodiment of the invention.
Detailed Description
The application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be noted that, for convenience of description, only the portions related to the present application are shown in the drawings.
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
Fig. 1 shows a flowchart of a monaural sound source localization method based on auricle conduction characteristics according to an embodiment of the present invention. As shown in fig. 1, the method comprises the steps of:
S101: a noise microphone is set at the outlet position of a single artificial ear canal; under an ideal test environment, a plurality of test sound sources are respectively set at different azimuth angles or pitch angles and play test sound source signals; the signal acquired by the noise microphone is recorded as a first received signal; the first received signal and the test sound source signal are converted into the frequency domain by discrete Fourier transform, and the auricle conduction characteristic response is calculated for each test sound source based on the frequency-domain relation between the first received signal and the test sound source signal.
In a specific embodiment, the specific step of calculating the auricle conduction characteristic response for each test sound source based on the frequency-domain relation between the first received signal and the test sound source signal includes:
based on the frequency-domain relation between the first received signal and the test sound source signal
X1(k) = H(rs,k)S1(k) + W1(k)
where X1(k) is the vector formed by the M frames of the first received signal in the frequency domain, H(rs,k) is the vector of the auricle conduction characteristic frequency-domain response at the test sound source position rs corresponding to the M frames of the first received signal, W1(k) is the vector of the frequency-domain noise corresponding to the M frames of the first received signal, and S1(k) is the frequency-domain test sound source signal;
obtaining H(rs,k) from the known X1(k), W1(k) and S1(k); performing this operation for the spatial positions {r1, r2, …, rD} of all test sound sources to obtain the auricle conduction characteristic frequency-domain response matrix D(k) = {H(r1,k), H(r2,k), …, H(rD,k)} corresponding to all test sound sources, and storing it.
In this embodiment, the test sound source positions cover azimuth angles from 0 to 360 degrees with 12 azimuth angles and pitch angles from 0 to 90 degrees with 4 pitch angles, forming a set of D = 4 × 12 = 48 test sound source spatial positions.
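A minimal sketch of this measurement grid and of assembling the dictionary D(k) is shown below (Python; the exact angular spacing and the helper name are assumptions, since the embodiment only states the counts and the angular ranges):

```python
import numpy as np

# Assumed uniform spacing: 12 azimuths over 0-360 deg, 4 pitch angles over 0-90 deg.
azimuths = np.arange(12) * 30.0                              # 0, 30, ..., 330 degrees
pitches = np.array([0.0, 30.0, 60.0, 90.0])                  # 4 pitch angles
grid = [(az, el) for el in pitches for az in azimuths]       # D = 48 spatial positions

def build_dictionary(responses_per_position):
    """Stack the measured responses H(r_d, k) for one frequency bin k (each an
    M-element vector, one per test position, ordered like `grid`) into the
    M x D matrix D(k) used later as the sparse-recovery dictionary."""
    return np.stack(responses_per_position, axis=1)
```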
In a specific embodiment, the frequency domain relation between the received signal and the sound source signal used in the present invention can be obtained based on the following process:
After the start point of the received signal is detected at the noise microphone by an endpoint detection method, the m-th frame signal xm(t) (m = 1, 2, …, M) of the received signal is defined as:
xm(t) = hm(rs,t) * s(t + mL - τm) + ωm(t)
where M is the number of data frames of the received signal, t is the time index, L is the frame length, hm(rs,t) is the auricle conduction characteristic response from the sound source position rs to the noise microphone, sm(t) = s(t + mL - τm) is the sound source signal corresponding to xm(t), τm is the time delay of the sound source signal corresponding to xm(t), ωm(t) is the environmental noise corresponding to xm(t), and * denotes convolution;
converting the m-th frame signal xm(t) (m = 1, 2, …, M) to the frequency domain gives the signal at the (m+k)-th frequency bin of the m-th frame:
Xm(k) = Hm(rs, m+k)S(m+k) + Wm(m+k)
where Xm(k), Hm(rs, m+k), S(m+k) and Wm(m+k) are the discrete Fourier transforms of xm(t), hm(rs,t), sm(t) and ωm(t), respectively;
Writing Xm(k), Hm(rs, m+k), S(m+k) and Wm(m+k) in vector form:
X(k) = {X1(1+k), X2(2+k), …, XM(M+k)}^T
H(rs,k) = {H1(rs, 1+k), H2(rs, 2+k), …, HM(rs, M+k)}^T
W(k) = {W1(1+k), W2(2+k), …, WM(M+k)}^T
the prototype of the solution equation used in the invention is obtained:
X(k) = H(rs,k)S(k) + W(k)
where X(k) is the vector of the received signal in the frequency domain, H(rs,k) is the vector of the auricle conduction characteristic frequency-domain response for a sound source at position rs relative to the receiving point, S(k) is the vector of the frequency-domain sound source signal, and W(k) is the vector of the corresponding frequency-domain noise.
It should be appreciated that the above derivation relates to the principle of the invention: the frequency-domain relation between the first received signal and the test sound source signal, and the frequency-domain relation between the second received signal and the auricle conduction characteristic response, are both obtained from the formula X(k) = H(rs,k)S(k) + W(k) by substituting the specific variables of the auricle conduction characteristic acquisition stage and of the actual localization stage, respectively.
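The per-frame, per-bin construction of X(k) can be sketched as follows (Python; the function name, onset handling and the modulo guard on the bin index are assumptions made for illustration):

```python
import numpy as np

def received_spectrum_vector(x, num_frames, frame_len=512, k=0):
    """Form X(k) = {X_1(1+k), X_2(2+k), ..., X_M(M+k)}^T from a received signal x
    whose start point has already been found by endpoint detection: frame m
    contributes bin (m + k) of its length-L discrete Fourier transform."""
    X = np.empty(num_frames, dtype=complex)
    for m in range(1, num_frames + 1):
        frame = x[(m - 1) * frame_len : m * frame_len]    # m-th frame x_m(t)
        spectrum = np.fft.fft(frame, n=frame_len)         # length-L DFT of the frame
        X[m - 1] = spectrum[(m + k) % frame_len]          # bin m + k of frame m
    return X
```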
S102: during actual localization, the noise microphone receives the signal emitted by the sound source to be measured, which is recorded as a second received signal.
S103: the second received signal and the auricle conduction characteristic responses are converted into the frequency domain by discrete Fourier transform, and the sound source direction is estimated with a sparse recovery algorithm based on the frequency-domain relation between the second received signal and the auricle conduction characteristic responses, giving the direction of the sound source to be measured.
In a specific embodiment, estimating the sound source direction with a sparse recovery algorithm based on the frequency-domain relation between the second received signal and the auricle conduction characteristic response specifically includes:
using the auricle conduction characteristic frequency-domain response matrix D(k) as a dictionary, the frequency-domain relation between the second received signal and the auricle conduction characteristic response
X2(k) = H(rs,k)S2(k) + W2(k)
is expanded over D(k) to obtain
X2(k) = D(k)S̃2(k) + W2(k)
where X2(k) is the vector formed by the M frames of the second received signal in the frequency domain, H(rs,k) is the vector of the auricle conduction characteristic frequency-domain response at the sound source position rs corresponding to the M frames, W2(k) is the vector of the frequency-domain noise corresponding to the M frames of the second received signal, S2(k) is the frequency-domain signal of the sound source to be measured, and S̃2(k) is the frequency-domain signal of the sound source to be measured under the dictionary extension;
based on a compressed sensing algorithm, the auricle conduction characteristic frequency-domain response matrix D(k) is taken as the observation matrix and the formula X2(k) = D(k)S̃2(k) + W2(k) is solved with a sparse recovery algorithm to obtain S̃2(k); the direction of the sound source to be measured is then obtained from the non-zero elements of S̃2(k) and their positions.
In a preferred embodiment, the orthogonal matching pursuit (OMP) algorithm, whose computational complexity is low, is used as the sparse recovery algorithm. Its main procedure in this embodiment is: compute the position of maximum correlation between the observation matrix D(k) and the compressed measurement; obtain an approximate solution of the signal by solving a least-squares problem; repeat this process while the number of iterations is smaller than the sparsity; and output the index set of maximum correlation together with the reconstructed signal.
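A minimal OMP sketch along these lines is given below (Python; a generic textbook implementation provided for illustration, not the patent's code):

```python
import numpy as np

def omp(D, x, sparsity=1):
    """Orthogonal matching pursuit for x ~= D @ s with at most `sparsity`
    non-zero entries in s. D is the M x D dictionary (observation matrix)
    for one frequency bin, x the length-M measurement vector X2(k)."""
    residual = x.copy()
    support = []
    s = np.zeros(D.shape[1], dtype=complex)
    for _ in range(sparsity):
        # column of D most correlated with the current residual
        idx = int(np.argmax(np.abs(D.conj().T @ residual)))
        support.append(idx)
        # least-squares fit of x on the selected columns
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    s[support] = coef
    return s, support
```

For a single sound source the sparsity is 1, so the returned support index points directly into the grid of D = 48 candidate positions.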
In a specific embodiment, obtaining the direction of the sound source to be measured from the non-zero elements of S̃2(k) and their positions specifically comprises:
setting a threshold η and setting the length of the discrete Fourier transform equal to the frame length L; selecting the η higher-energy frequency bins of X2(k) and superimposing the corresponding S̃2(k) to obtain S̃2; the position corresponding to the element of largest magnitude in S̃2 is taken as the spatial position of the sound source to be measured. In the present embodiment the frame length is L = 512 and the threshold is set to η = 20 (0 ≤ η ≤ 100).
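The multi-bin fusion can be sketched as follows (Python; treating η as the number of highest-energy bins to keep is our reading of the embodiment, and the function and variable names are assumptions):

```python
import numpy as np

def fuse_bins_and_locate(X2_per_bin, S2_per_bin, eta=20):
    """Keep the eta frequency bins whose measurement vectors X2(k) carry the most
    energy, superimpose the corresponding sparse solutions S~2(k), and return the
    index of the largest-magnitude element as the estimated source position."""
    energy = np.array([np.sum(np.abs(x) ** 2) for x in X2_per_bin])  # energy per bin
    keep = np.argsort(energy)[-eta:]                                 # eta strongest bins
    s_sum = np.sum([S2_per_bin[k] for k in keep], axis=0)            # superposition
    return int(np.argmax(np.abs(s_sum)))                             # index into the grid
```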
It should be noted that, because of unavoidable differences between the auricle shapes of different people and the artificial ear, there will be a certain mismatch between the auricle conduction characteristic frequency-domain response matrix D(k) acquired and stored in advance and the auricle conduction characteristic frequency-domain response of the actual wearer. Considering that in typical applications of wireless earphone products, such as environmental safety warning and algorithm parameter adjustment, the accuracy requirement on sound source localization information is not high (for example, a direction-angle resolution of 60 degrees and a pitch-angle resolution of 30 degrees in this embodiment), the reduction in localization accuracy caused by individual differences between auricles does not affect the typical use of earphone products.
Fig. 2 shows a circuit diagram of a noise microphone and its connection to a microprocessor of a specific embodiment of the invention, in which the noise microphone is connected to the sound source position estimation module via an audio codec chip 201, and the sound source position estimation module comprises a microprocessor 203 for controlling the audio codec chip 201 and the microphone 202.
Fig. 3 shows a schematic diagram of a monaural sound source localization system based on auricle conduction characteristics according to an embodiment of the invention, comprising a test sound source 301, a noise microphone 302 and a sound source orientation estimation module 303, said noise microphone 302 being arranged in a position within the auricle 304 and outside the outlet of the auditory canal.
In a specific embodiment, the noise microphone 302 is disposed at a single artificial ear canal outlet position, the test sound source 301 is disposed at different azimuth or pitch angle positions to play sound source test signals, and the noise microphone 302 is configured to record the collected signals as first received signals in the stage of acquiring auricle conduction characteristics; the sound source position estimation module 303 is configured to convert the first received signal and the test sound source 301 signal into a frequency domain by using discrete fourier transform in a stage of acquiring auricle conduction characteristics, and calculate auricle conduction characteristic responses for each of the test sound sources 301 based on a frequency domain relationship of the first received signal and the test sound source 301 signal; the noise microphone 302 is configured to receive a signal sent by a sound source to be detected in an actual positioning stage, and record the signal as a second received signal; the sound source position estimation module 303 is further configured to convert the second received signal and the auricle conduction characteristic response to a frequency domain by using discrete fourier transform in an actual positioning stage, and perform sound source position estimation by using a sparse recovery algorithm based on a frequency domain relationship between the second received signal and the auricle conduction characteristic response, so as to obtain a position of the sound source to be measured.
Through the combined action of the test sound source 301, the noise microphone 302 and the sound source direction estimation module 303, and based on the different reflection and reverberation characteristics caused by the unique shape of the human auricle, the system establishes a sparse recovery equation for the sound source spatial position using the auricle conduction characteristic response contained in the frequency domain of the signal acquired by the microphone, and realizes single-earphone sound source localization, thereby avoiding the additional inter-earphone communication, control and hardware overhead incurred by conventional binaural localization methods that require microphone acquisition at both earphones.
Embodiments of the present application also relate to a computer-readable storage medium having stored thereon a computer program which, when executed by a computer processor, implements the method described above. The computer program contains program code for performing the method shown in the flowchart. The computer-readable medium of the present application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two.
In the method, a noise microphone is set at the outlet position of a single artificial ear canal; under an ideal test environment a plurality of test sound sources are respectively set at different azimuth angles or pitch angles and play test sound source signals; the signal acquired by the noise microphone is recorded as a first received signal; the first received signal and the test sound source signal are converted to the frequency domain by discrete Fourier transform, and the auricle conduction characteristic response is calculated for each test sound source based on their frequency-domain relation. During actual localization, the noise microphone receives the signal emitted by the sound source to be measured, which is recorded as a second received signal. Finally, the second received signal and the auricle conduction characteristic responses are converted to the frequency domain by discrete Fourier transform, and the sound source direction is estimated with a sparse recovery algorithm based on their frequency-domain relation, giving the direction of the sound source to be measured. Based on the different reflection and reverberation characteristics caused by the unique shape of the human auricle, a sparse recovery equation for the sound source spatial position is established from the auricle conduction characteristic response contained in the frequency domain of the signal acquired by the microphone, and single-earphone sound source localization is realized, thereby avoiding the additional inter-earphone communication, control and hardware overhead incurred by conventional binaural localization methods that require microphone acquisition at both earphones.
The above description is only illustrative of the preferred embodiments of the present application and of the principles of the technology employed. It will be appreciated by persons skilled in the art that the scope of the application is not limited to the specific combinations of the technical features described above, but also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example technical solutions in which the above features are replaced with technical features having similar functions disclosed in (but not limited to) the present application.

Claims (6)

1. An auricle conduction characteristic-based monaural sound source positioning method, characterized by comprising the following steps:
S1: setting a noise microphone at the outlet position of a single artificial ear canal; under an ideal test environment, respectively setting a plurality of test sound sources at different azimuth angles or pitch angles and playing test sound source signals; recording the signal acquired by the noise microphone as a first received signal; converting the first received signal and the test sound source signal into the frequency domain by discrete Fourier transform; and calculating the auricle conduction characteristic response for each test sound source based on the frequency-domain relation between the first received signal and the test sound source signal;
wherein the specific step of calculating the auricle conduction characteristic response for each test sound source based on the frequency-domain relation between the first received signal and the test sound source signal includes: after the start point of the received signal is detected at the noise microphone by an endpoint detection method, defining the m-th frame signal xm(t) of the received signal as:
xm(t) = hm(rs,t) * s(t + mL - τm) + ωm(t)
where m = 1, 2, …, M, M is the number of data frames of the received signal, t is the time index, L is the frame length, hm(rs,t) is the auricle conduction characteristic response from the sound source position rs to the noise microphone, sm(t) = s(t + mL - τm) is the sound source signal corresponding to xm(t), τm is the time delay of the sound source signal corresponding to xm(t), ωm(t) is the environmental noise corresponding to xm(t), and * denotes convolution;
converting the m-th frame signal xm(t) to the frequency domain to obtain the signal at the (m+k)-th frequency bin of the m-th frame: Xm(k) = Hm(rs, m+k)S(m+k) + Wm(m+k)
where Xm(k), Hm(rs, m+k), S(m+k) and Wm(m+k) are the discrete Fourier transforms of xm(t), hm(rs,t), sm(t) and ωm(t), respectively;
writing Xm(k), Hm(rs, m+k), S(m+k) and Wm(m+k) in vector form:
X1(k) = {X1(1+k), X2(2+k), …, XM(M+k)}^T;
H(rs,k) = {H1(rs, 1+k), H2(rs, 2+k), …, HM(rs, M+k)}^T;
W1(k) = {W1(1+k), W2(2+k), …, WM(M+k)}^T;
the frequency-domain relation between the first received signal and the test sound source signal is obtained as
X1(k) = H(rs,k)S1(k) + W1(k)
where X1(k) is the vector formed by the M frames of the first received signal in the frequency domain, H(rs,k) is the vector of the auricle conduction characteristic frequency-domain response at the test sound source position rs corresponding to the M frames of the first received signal, W1(k) is the vector of the frequency-domain noise corresponding to the M frames of the first received signal, and S1(k) is the frequency-domain test sound source signal;
H(rs,k) is obtained from the known X1(k), W1(k) and S1(k); the above operation is performed for the spatial positions {r1, r2, …, rD} of all the test sound sources to obtain the auricle conduction characteristic frequency-domain response matrix D(k) = {H(r1,k), H(r2,k), …, H(rD,k)} corresponding to all the test sound sources, which is stored;
S2: during actual localization, receiving, with the noise microphone, the signal emitted by the sound source to be measured and recording it as a second received signal; and
S3: converting the second received signal and the auricle conduction characteristic responses into the frequency domain by discrete Fourier transform, and estimating the sound source direction with a sparse recovery algorithm based on the frequency-domain relation between the second received signal and the auricle conduction characteristic responses, to obtain the direction of the sound source to be measured;
wherein estimating the sound source direction with a sparse recovery algorithm based on the frequency-domain relation between the second received signal and the auricle conduction characteristic response specifically includes: using the auricle conduction characteristic frequency-domain response matrix D(k) as a dictionary, the frequency-domain relation between the second received signal and the auricle conduction characteristic response
X2(k) = H(rs,k)S2(k) + W2(k)
is expanded over D(k) to obtain
X2(k) = D(k)S̃2(k) + W2(k)
where X2(k) is the vector formed by the M frames of the second received signal in the frequency domain, H(rs,k) is the vector of the auricle conduction characteristic frequency-domain response at the sound source position rs corresponding to the M frames, W2(k) is the vector of the frequency-domain noise corresponding to the M frames of the second received signal, S2(k) is the frequency-domain signal of the sound source to be measured, and S̃2(k) is the frequency-domain signal of the sound source to be measured under the dictionary extension;
based on a compressed sensing algorithm, the auricle conduction characteristic frequency-domain response matrix D(k) is taken as the observation matrix and the formula X2(k) = D(k)S̃2(k) + W2(k) is solved with a sparse recovery algorithm to obtain S̃2(k); the direction of the sound source to be measured is obtained from the non-zero elements of S̃2(k) and their positions;
wherein obtaining the direction of the sound source to be measured from the non-zero elements of S̃2(k) and their positions specifically comprises: setting a threshold η and setting the length of the discrete Fourier transform equal to the frame length L; selecting the η higher-energy frequency bins of X2(k) and superimposing the corresponding S̃2(k) to obtain S̃2; the position corresponding to the element of largest magnitude in S̃2 is the spatial position of the sound source to be measured.
2. The method of claim 1, wherein the noise microphone is positioned within the pinna and outside of the canal outlet.
3. The method of claim 1, wherein the noise microphone is coupled to the sound source position estimation module through an audio codec chip.
4. The method of claim 1, wherein the sound source position estimation module comprises a microprocessor for controlling an audio codec chip and a microphone.
5. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a computer processor, implements the method of any of claims 1 to 4.
6. A monaural sound source positioning system based on auricle conduction characteristics, wherein the system is applied to a wireless earphone product and comprises a test sound source, a noise microphone and a sound source position estimation module, wherein:
the noise microphone is arranged at the outlet position of a single artificial ear canal, the test sound sources are respectively arranged at different azimuth angles or pitch angles to play test sound source signals, and the noise microphone is configured to record the acquired signal as a first received signal in the auricle conduction characteristic acquisition stage;
the sound source position estimation module is configured to convert the first received signal and the test sound source signal into the frequency domain by discrete Fourier transform in the auricle conduction characteristic acquisition stage, and to calculate the auricle conduction characteristic response for each test sound source based on the frequency-domain relation between the first received signal and the test sound source signal, wherein the specific step of calculating the auricle conduction characteristic response for each test sound source based on this frequency-domain relation includes: after the start point of the received signal is detected at the noise microphone by an endpoint detection method, defining the m-th frame signal xm(t) of the received signal as xm(t) = hm(rs,t) * s(t + mL - τm) + ωm(t), where m = 1, 2, …, M, M is the number of data frames of the received signal, t is the time index, L is the frame length, hm(rs,t) is the auricle conduction characteristic response from the sound source position rs to the noise microphone, sm(t) = s(t + mL - τm) is the sound source signal corresponding to xm(t), τm is the time delay of the sound source signal corresponding to xm(t), ωm(t) is the environmental noise corresponding to xm(t), and * denotes convolution; converting the m-th frame signal xm(t) to the frequency domain to obtain the signal at the (m+k)-th frequency bin of the m-th frame: Xm(k) = Hm(rs, m+k)S(m+k) + Wm(m+k), where Xm(k), Hm(rs, m+k), S(m+k) and Wm(m+k) are the discrete Fourier transforms of xm(t), hm(rs,t), sm(t) and ωm(t), respectively; writing Xm(k), Hm(rs, m+k), S(m+k) and Wm(m+k) in vector form: X1(k) = {X1(1+k), X2(2+k), …, XM(M+k)}^T; H(rs,k) = {H1(rs, 1+k), H2(rs, 2+k), …, HM(rs, M+k)}^T; W1(k) = {W1(1+k), W2(2+k), …, WM(M+k)}^T;
the frequency-domain relation between the first received signal and the test sound source signal is obtained as
X1(k)=H(rs,k)S1(k)+W1(k)
where X1(k) is the vector formed by the M frames of the first received signal in the frequency domain, H(rs,k) is the vector of the auricle conduction characteristic frequency-domain response at the test sound source position rs corresponding to the M frames of the first received signal, W1(k) is the vector of the frequency-domain noise corresponding to the M frames of the first received signal, and S1(k) is the frequency-domain test sound source signal;
H(rs,k) is obtained from the known X1(k), W1(k) and S1(k); the above operation is performed for the spatial positions {r1, r2, …, rD} of all the test sound sources to obtain the auricle conduction characteristic frequency-domain response matrix D(k) = {H(r1,k), H(r2,k), …, H(rD,k)} corresponding to all the test sound sources, which is stored; and
the noise microphone is further configured to receive, in the actual positioning stage, the signal emitted by the sound source to be measured and to record it as a second received signal;
the sound source position estimation module is further configured to convert the second received signal and the auricle conduction characteristic responses into the frequency domain by discrete Fourier transform in the actual positioning stage, and to estimate the sound source direction with a sparse recovery algorithm based on the frequency-domain relation between the second received signal and the auricle conduction characteristic response, obtaining the direction of the sound source to be measured, wherein estimating the sound source direction with a sparse recovery algorithm based on the frequency-domain relation between the second received signal and the auricle conduction characteristic response specifically includes: using the auricle conduction characteristic frequency-domain response matrix D(k) as a dictionary, the frequency-domain relation between the second received signal and the auricle conduction characteristic response
X2(k) = H(rs,k)S2(k) + W2(k)
is expanded over D(k) to obtain
X2(k) = D(k)S̃2(k) + W2(k)
where X2(k) is the vector formed by the M frames of the second received signal in the frequency domain, H(rs,k) is the vector of the auricle conduction characteristic frequency-domain response at the sound source position rs corresponding to the M frames, W2(k) is the vector of the frequency-domain noise corresponding to the M frames of the second received signal, S2(k) is the frequency-domain signal of the sound source to be measured, and S̃2(k) is the frequency-domain signal of the sound source to be measured under the dictionary extension;
based on a compressed sensing algorithm, the auricle conduction characteristic frequency-domain response matrix D(k) is taken as the observation matrix and the formula X2(k) = D(k)S̃2(k) + W2(k) is solved with a sparse recovery algorithm to obtain S̃2(k); the direction of the sound source to be measured is obtained from the non-zero elements of S̃2(k) and their positions;
wherein obtaining the direction of the sound source to be measured from the non-zero elements of S̃2(k) and their positions specifically comprises: setting a threshold η and setting the length of the discrete Fourier transform equal to the frame length L; selecting the η higher-energy frequency bins of X2(k) and superimposing the corresponding S̃2(k) to obtain S̃2; the position corresponding to the element of largest magnitude in S̃2 is the spatial position of the sound source to be measured.
CN202011459187.3A 2020-12-11 2020-12-11 Auricle conduction characteristic-based monaural sound source positioning method and system Active CN112526495B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011459187.3A CN112526495B (en) 2020-12-11 2020-12-11 Auricle conduction characteristic-based monaural sound source positioning method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011459187.3A CN112526495B (en) 2020-12-11 2020-12-11 Auricle conduction characteristic-based monaural sound source positioning method and system

Publications (2)

Publication Number Publication Date
CN112526495A (en) 2021-03-19
CN112526495B (en) 2024-07-30

Family

ID=74999125

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011459187.3A Active CN112526495B (en) 2020-12-11 2020-12-11 Auricle conduction characteristic-based monaural sound source positioning method and system

Country Status (1)

Country Link
CN (1) CN112526495B (en)

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3999689B2 (en) * 2003-03-17 2007-10-31 インターナショナル・ビジネス・マシーンズ・コーポレーション Sound source position acquisition system, sound source position acquisition method, sound reflection element for use in the sound source position acquisition system, and method of forming the sound reflection element
US7280943B2 (en) * 2004-03-24 2007-10-09 National University Of Ireland Maynooth Systems and methods for separating multiple sources using directional filtering
JP4250133B2 (en) * 2004-10-26 2009-04-08 株式会社シマダ製作所 Single ear hearing improvement device
WO2007147049A2 (en) * 2006-06-14 2007-12-21 Think-A-Move, Ltd. Ear sensor assembly for speech processing
US9432778B2 (en) * 2014-04-04 2016-08-30 Gn Resound A/S Hearing aid with improved localization of a monaural signal source
DK2928213T3 (en) * 2014-04-04 2018-08-27 Gn Hearing As A hearing aid with improved localization of monaural signal sources
CN106847301A (en) * 2017-01-03 2017-06-13 东南大学 A kind of ears speech separating method based on compressed sensing and attitude information
EP3373602A1 (en) * 2017-03-09 2018-09-12 Oticon A/s A method of localizing a sound source, a hearing device, and a hearing system
CN109581385B (en) * 2018-12-17 2020-05-19 山东大学 Target positioning device and method based on double-lug-contour bionic sonar of large-ear bats
CN110133596B (en) * 2019-05-13 2023-06-23 江苏第二师范学院(江苏省教育科学研究院) Array sound source positioning method based on frequency point signal-to-noise ratio and bias soft decision

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Microphone array sound source localization based on distributed compressed sensing; 黄惠祥, 郭秋涵, 童峰; Acta Armamentarii (兵工学报); pp. 1725-1731 *
Research on sound source localization based on a robot auditory system; 陈涛; China Doctoral Dissertations Full-text Database, Information Science and Technology; pp. 17, 55-62 *

Also Published As

Publication number Publication date
CN112526495A (en) 2021-03-19

Similar Documents

Publication Publication Date Title
US10431239B2 (en) Hearing system
EP3248393B1 (en) Hearing assistance system
CN101938686A (en) Measurement system and measurement method for head-related transfer function in common environment
EP2395909A1 (en) Distributed sensing of signals linked by sparse filtering
Sakamoto et al. Sound-space recording and binaural presentation system based on a 252-channel microphone array
JP6613078B2 (en) Signal processing apparatus and control method thereof
JP2741817B2 (en) Out-of-head stereophonic headphone listening device
Talagala et al. Binaural sound source localization using the frequency diversity of the head-related transfer function
Pollack et al. Chapter Perspective Chapter: Modern Acquisition of Personalised Head-Related Transfer Functions–An Overview
MacDonald A localization algorithm based on head-related transfer functions
CN108122559A (en) Binaural sound sources localization method based on deep learning in a kind of digital deaf-aid
CN111142066A (en) Direction-of-arrival estimation method, server, and computer-readable storage medium
US11510013B2 (en) Partial HRTF compensation or prediction for in-ear microphone arrays
EP3588979B1 (en) A method for enhancing a signal directionality in a hearing instrument
CN102736064A (en) Compression sensor-based positioning method of sound source of hearing aid
CN112526495B (en) Auricle conduction characteristic-based monaural sound source positioning method and system
CN108229030B (en) Design method of controller parameters of active noise reduction system
US11871190B2 (en) Separating space-time signals with moving and asynchronous arrays
CN112153552B (en) Self-adaptive stereo system based on audio analysis
CN109688531B (en) Method for acquiring high-sound-quality audio conversion information, electronic device and recording medium
Oreinos et al. Effect of higher-order ambisonics on evaluating beamformer benefit in realistic acoustic environments
CN115604646B (en) Panoramic deep space audio processing method
Oreinos et al. Objective analysis of higher-order Ambisonics sound-field reproduction for hearing aid applications
WO2015032009A1 (en) Small system and method for decoding audio signals into binaural audio signals
Marin-Hurtado et al. Practical MWF-based noise-reduction methods for binaural hearing aids

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant