CN112526495B - Auricle conduction characteristic-based monaural sound source positioning method and system - Google Patents
- Publication number
- CN112526495B CN112526495B CN202011459187.3A CN202011459187A CN112526495B CN 112526495 B CN112526495 B CN 112526495B CN 202011459187 A CN202011459187 A CN 202011459187A CN 112526495 B CN112526495 B CN 112526495B
- Authority
- CN
- China
- Prior art keywords
- sound source
- auricle
- frequency domain
- signal
- received signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S11/00—Systems for determining distance or velocity not using reflection or reradiation
- G01S11/14—Systems for determining distance or velocity not using reflection or reradiation using ultrasonic, sonic, or infrasonic waves
Abstract
The invention provides a monaural sound source localization method and system based on auricle conduction characteristics. In an ideal test environment, a plurality of test sound sources are placed at different azimuth or pitch angles and play test sound source signals; the signals captured by a noise microphone are recorded as first received signals, the first received signals and the test source signals are converted to the frequency domain by discrete Fourier transform, and an auricle conduction characteristic response is calculated for each test sound source. During actual localization, the noise microphone receives the signal emitted by the sound source to be located, which is recorded as a second received signal. Finally, the second received signal and the auricle conduction characteristic responses are converted to the frequency domain by discrete Fourier transform, and the sound source direction is estimated with a sparse recovery algorithm. Single-earphone sound source localization is thus achieved, avoiding the extra inter-earphone communication, control and hardware overhead that conventional binaural localization methods incur by requiring microphone capture at both ears.
Description
Technical Field
The invention relates to the technical field of sound source localization, in particular to an auricle conduction characteristic-based monaural sound source localization method and system.
Background
For wireless earphone products, sound source localization can provide directional information, approaching-sound early warning and similar inputs to the earphone's noise-reduction algorithm by estimating the sound source direction, and plays an important role in improving noise-reduction performance and user experience. At present, most earphone sound source localization adopts biologically inspired algorithms based on cues such as pinna spectral cues and binaural cues, namely the interaural time difference (ITD) and the interaural level difference (ILD), or directly uses the head-related impulse response.
However, the above algorithms require two microphones to realize binaural sound source localization, which means that for a wireless earphone product the signals and feature information needed for localization must be transmitted wirelessly between the ears, incurring additional wireless communication and power consumption.
The uniquely shaped auricle is the frontmost part of the auditory organ. It receives sound directionally, and the auricle cavity, including the external auditory canal, provides resonance, reverberation and buffering. Some studies have also characterized the acoustic load that the outer ear places on an earpiece and measured its effect on the earpiece's output performance at different frequencies.
For an in-ear earphone whose microphone sits exposed at the ear-canal outlet when worn, the irregularly shaped auricle imposes direction-dependent conduction characteristics, such as reflection and reverberation, on incident sound. The invention therefore measures and stores in advance, using an artificial ear, the characteristics of incident sound from different directions, constructs a measurement matrix, and converts the monaural sound-source direction estimation problem into a sparse recovery problem, yielding a monaural sound source localization method and system based on auricle conduction characteristics. With the disclosed scheme, a wireless earphone can estimate the sound source direction with a single ear, greatly reducing the wireless communication and power overhead of a wireless earphone product.
Disclosure of Invention
The invention provides a monaural sound source localization method and system based on auricle conduction characteristics, which overcome the above shortcomings of the prior art.
In one aspect, the present invention provides a method for monaural sound source localization based on auricle conduction characteristics, the method comprising the steps of:
S1: setting a noise microphone at the outlet of a single artificial ear canal; in an ideal test environment, placing a plurality of test sound sources at different azimuth or pitch angles and playing test sound source signals; recording the signals captured by the noise microphone as first received signals; converting the first received signals and the test sound source signals to the frequency domain by discrete Fourier transform; and calculating an auricle conduction characteristic response for each test sound source based on the frequency-domain relation between the first received signal and the test sound source signal;
S2: during actual localization, receiving the signal emitted by the sound source to be located with the noise microphone and recording it as a second received signal; and
S3: converting the second received signal and the auricle conduction characteristic responses to the frequency domain by discrete Fourier transform, and estimating the sound source direction with a sparse recovery algorithm based on the frequency-domain relation between the second received signal and the auricle conduction characteristic response, to obtain the direction of the sound source to be located.
Conventional sound source localization algorithms used in wireless earphones generally require both earphones, incurring extra wireless communication and power overhead between them. The method instead fully exploits the conduction characteristics that the unique auricle shape imposes on sound incident from different directions to achieve single-ear sound source localization, markedly reducing the system overhead of adding a localization function to a wireless earphone product.
In a specific embodiment, the noise microphone is positioned within the auricle and outside the outlet of the ear canal.
In a specific embodiment, the noise microphone is connected to the sound source position estimation module through an audio codec chip.
In a specific embodiment, the sound source position estimation module includes a microprocessor for controlling an audio codec chip and a microphone.
In a specific embodiment, calculating the auricle conduction characteristic response for each test sound source based on the frequency-domain relation between the first received signal and the test sound source signal specifically includes:
based on the frequency-domain relation between the first received signal and the test sound source signal
X_1(k) = H(r_s, k) S_1(k) + W_1(k)
where X_1(k) is the vector of the M frames of the first received signal in the frequency domain, H(r_s, k) is the vector of the auricle conduction characteristic frequency-domain response at the test source position r_s corresponding to those M frames, W_1(k) is the vector of frequency-domain noise corresponding to the M frames, and S_1(k) is the test sound source signal in the frequency domain;
solving for H(r_s, k) from the known X_1(k), W_1(k) and S_1(k), repeating this operation over the spatial positions {r_1, r_2, …, r_D} of all test sound sources to obtain the auricle conduction characteristic frequency-domain response matrix D(k) = {H(r_1, k), H(r_2, k), …, H(r_D, k)}, and storing this matrix.
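The per-position response estimation described above can be sketched as follows; `estimate_response` is a hypothetical helper (not from the patent) that recovers H(r_s, k) from the M recorded frames, assuming the test source spectrum S_1(k) is known and non-zero at the bins of interest:

```python
import numpy as np

def estimate_response(x_frames, s_spectrum, w_spectrum=None):
    """Estimate the auricle conduction frequency-domain response H(r_s, k)
    for one test position from M recorded frames (illustrative sketch).

    x_frames   : (M, L) time-domain frames of the first received signal
    s_spectrum : (L,) DFT of the known test sound source signal
    w_spectrum : (L,) DFT of the measured background noise, if available
    """
    X = np.fft.fft(x_frames, axis=1)      # per-frame DFT -> (M, L)
    if w_spectrum is not None:
        X = X - w_spectrum                # subtract the known noise spectrum
    # Divide only where the source has energy, to avoid numerical blow-up.
    H = np.zeros_like(X)
    safe = np.abs(s_spectrum) > 1e-12
    H[:, safe] = X[:, safe] / s_spectrum[safe]
    return H
```

In an ideal (noise-free) test environment the division reduces to H(r_s, k) = X_1(k) / S_1(k) per frequency bin, which is the relation solved for each test position.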
In a specific embodiment, estimating the sound source direction with a sparse recovery algorithm based on the frequency-domain relation between the second received signal and the auricle conduction characteristic response specifically includes:
using the auricle conduction characteristic frequency-domain response matrix D(k) as a dictionary, expanding the frequency-domain relation between the second received signal and the auricle conduction characteristic response
X_2(k) = H(r_s, k) S_2(k) + W_2(k)
over D(k) to obtain
X_2(k) = D(k) S̃_2(k) + W_2(k)
where X_2(k) is the vector of the M frames of the second received signal in the frequency domain, H(r_s, k) is the vector of the auricle conduction characteristic frequency-domain response at position r_s corresponding to those M frames, W_2(k) is the vector of frequency-domain noise corresponding to the M frames of the second received signal, S_2(k) is the frequency-domain signal of the sound source to be located, and S̃_2(k) is the frequency-domain signal of the sound source to be located under the dictionary expansion;
based on compressed sensing, taking the matrix D(k) as the observation matrix and solving X_2(k) = D(k) S̃_2(k) + W_2(k) with a sparse recovery algorithm to obtain S̃_2(k); the direction of the sound source to be located is then obtained from the non-zero elements of S̃_2(k) and their positions. Because the number of target sound sources is far smaller than the number D of spatial positions, the expanded frequency-domain source vector is sparse over the spatial position set; for a single sound source, S̃_2(k) contains exactly one non-zero element. Treating the redundant response matrix D(k) as the observation matrix of a compressed sensing problem, reconstruction of S̃_2(k) succeeds losslessly with high probability when D(k) satisfies the restricted isometry property (RIP), and the sound source position then corresponds one-to-one with the non-zero position in S̃_2(k). Because of the unique shape of the human auricle, the elements of D(k) corresponding to sources at different azimuth and pitch angles are effectively randomly distributed, so the restricted isometry condition is satisfied and a sparse recovery algorithm can be used to find the sparse solution.
In a specific embodiment, obtaining the direction of the sound source to be located from the non-zero elements of S̃_2(k) and their positions includes:
setting a threshold η (0 ≤ η ≤ 100) and setting the discrete Fourier transform length equal to the frame length L; selecting the ηL/100 frequency points of highest energy in the second received signal, superimposing the recovered vectors S̃_2(k) over these frequency points, and taking the position of the largest element of the superimposed result as the spatial position of the sound source signal to be located. Combining the information of several frequency points in this way makes the estimated source position more accurate.
According to a second aspect of the present invention, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the computer program implements the above method.
According to a third aspect of the present invention, a monaural sound source localization system based on auricle conduction characteristics is provided. The system comprises a test sound source, a noise microphone and a sound source direction estimation module:
the noise microphone is arranged at the outlet of a single artificial ear canal, and the test sound sources are placed at different azimuth or pitch angles to play test sound source signals; in the auricle conduction characteristic acquisition stage, the noise microphone is configured to record the captured signals as first received signals;
the sound source direction estimation module is configured, in the auricle conduction characteristic acquisition stage, to convert the first received signals and the test sound source signals to the frequency domain by discrete Fourier transform and to calculate an auricle conduction characteristic response for each test sound source based on the frequency-domain relation between the first received signal and the test sound source signal; and
in the actual localization stage, the noise microphone is configured to receive the signal emitted by the sound source to be located and record it as a second received signal;
the sound source direction estimation module is further configured, in the actual localization stage, to convert the second received signal and the auricle conduction characteristic responses to the frequency domain by discrete Fourier transform and to estimate the sound source direction with a sparse recovery algorithm based on their frequency-domain relation, obtaining the direction of the sound source to be located.
The method places a noise microphone at the outlet of a single artificial ear canal; in an ideal test environment, a plurality of test sound sources at different azimuth or pitch angles play test sound source signals, the signals captured by the noise microphone are recorded as first received signals, the first received signals and the test source signals are converted to the frequency domain by discrete Fourier transform, and an auricle conduction characteristic response is calculated for each test sound source from their frequency-domain relation. During actual localization, the noise microphone receives the signal emitted by the sound source to be located, which is recorded as a second received signal. Finally, the second received signal and the auricle conduction characteristic responses are converted to the frequency domain by discrete Fourier transform, and the sound source direction is estimated with a sparse recovery algorithm from their frequency-domain relation, yielding the direction of the sound source to be located. Exploiting the distinct reflection and reverberation characteristics produced by the unique shape of the human auricle, a sparse recovery equation for the source's spatial position is established from the auricle conduction characteristic response embedded in the microphone's received signal, achieving single-earphone sound source localization and avoiding the extra inter-earphone communication, control and hardware overhead that conventional binaural localization methods incur by requiring microphone capture at both ears.
Drawings
The accompanying drawings are included to provide a further understanding of the embodiments and are incorporated in and constitute a part of this specification. The drawings illustrate the embodiments and, together with the description, serve to explain the principles of the application. Other features, objects and advantages of the present application will become more apparent upon reading the following detailed description of non-limiting embodiments, made with reference to the accompanying drawings, in which:
FIG. 1 is a flow chart of a method of monaural sound source localization based on auricle conduction characteristics, according to an embodiment of the invention;
FIG. 2 is a circuit diagram of a noise microphone and its connection to a microprocessor in accordance with a specific embodiment of the present invention;
fig. 3 is a schematic diagram of a monaural acoustic source localization system based on auricle conduction characteristics, according to an embodiment of the invention.
Detailed Description
The application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be noted that, for convenience of description, only the portions related to the present application are shown in the drawings.
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
Fig. 1 shows a flowchart of a monaural sound source localization method based on auricle conduction characteristics according to an embodiment of the present invention. As shown in fig. 1, the method comprises the steps of:
S101: the method comprises the steps of setting a noise microphone at an outlet position of a single artificial ear canal, respectively setting a plurality of test sound sources to be positioned at different azimuth angles or pitch angles under an ideal test environment, playing test sound source signals, recording signals acquired by the noise microphone as first receiving signals, converting the first receiving signals and the test sound source signals into frequency domains by using discrete Fourier transformation, and calculating auricle conduction characteristic responses for the test sound sources based on frequency domain relations of the first receiving signals and the test sound source signals.
In a specific embodiment, calculating the auricle conduction characteristic response for each test sound source based on the frequency-domain relation between the first received signal and the test sound source signal specifically includes:
based on the frequency-domain relation between the first received signal and the test sound source signal
X_1(k) = H(r_s, k) S_1(k) + W_1(k)
where X_1(k) is the vector of the M frames of the first received signal in the frequency domain, H(r_s, k) is the vector of the auricle conduction characteristic frequency-domain response at the test source position r_s corresponding to those M frames, W_1(k) is the vector of frequency-domain noise corresponding to the M frames, and S_1(k) is the test sound source signal in the frequency domain;
solving for H(r_s, k) from the known X_1(k), W_1(k) and S_1(k), repeating this operation over the spatial positions {r_1, r_2, …, r_D} of all test sound sources to obtain the auricle conduction characteristic frequency-domain response matrix D(k) = {H(r_1, k), H(r_2, k), …, H(r_D, k)}, and storing this matrix.
In this embodiment, 12 azimuth angles covering 0 to 360 degrees and 4 pitch angles covering 0 to 90 degrees are used, forming a set of D = 4 × 12 = 48 test sound source spatial positions.
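The 48-position grid of this embodiment and the assembly of the response matrix D(k) from per-position responses can be organized as in the sketch below; the exact grid spacing and the `build_dictionary` helper are illustrative assumptions, not taken from the patent:

```python
import numpy as np

# 12 azimuths over 0-360 degrees and 4 pitches over 0-90 degrees,
# matching this embodiment's D = 4 x 12 = 48 test positions.
AZIMUTHS = [30 * i for i in range(12)]     # 0, 30, ..., 330 degrees
PITCHES = [22.5 * i for i in range(4)]     # 0, 22.5, 45, 67.5 degrees
POSITIONS = [(az, el) for el in PITCHES for az in AZIMUTHS]

def build_dictionary(responses):
    """Stack per-position responses H(r_d, k) into the matrix D(k).

    responses : dict mapping (azimuth, pitch) -> length-M complex response
                vector measured at one frequency bin k.
    Returns the (M, D) dictionary, columns ordered as POSITIONS.
    """
    return np.column_stack([responses[p] for p in POSITIONS])
```

The column index of D(k) then serves as the spatial-position index recovered by the sparse solver.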
In a specific embodiment, the frequency domain relation between the received signal and the sound source signal used in the present invention can be obtained based on the following process:
After detecting the starting point of the received signal at the noise microphone with an endpoint detection method, the m-th frame x_m(t) (m = 1, 2, …, M) of the received signal is defined as:
x_m(t) = h_m(r_s, t) * s(t + mL − τ_m) + ω_m(t)
where M is the number of data frames of the received signal, t is the time index, L is the frame length, h_m(r_s, t) is the auricle conduction characteristic response from the source position r_s to the noise microphone, s_m(t) = s(t + mL − τ_m) is the source signal corresponding to x_m(t), τ_m is the time delay of the source signal corresponding to x_m(t), ω_m(t) is the environmental noise corresponding to x_m(t), and * denotes convolution;
converting x_m(t) (m = 1, 2, …, M) to the frequency domain yields, at the (m + k)-th frequency point of the m-th frame:
X_m(k) = H_m(r_s, m + k) S(m + k) + W_m(m + k)
where X_m(k), H_m(r_s, m + k), S(m + k) and W_m(m + k) are the discrete Fourier transforms of x_m(t), h_m(r_s, t), s_m(t) and ω_m(t), respectively;
stacking X_m(k), H_m(r_s, m + k) and W_m(m + k) over the M frames as vectors:
X(k) = {X_1(1 + k), X_2(2 + k), …, X_M(M + k)}^T
H(r_s, k) = {H_1(r_s, 1 + k), H_2(r_s, 2 + k), …, H_M(r_s, M + k)}^T
W(k) = {W_1(1 + k), W_2(2 + k), …, W_M(M + k)}^T
gives the prototype of the solution equation used in the invention:
X(k) = H(r_s, k) S(k) + W(k)
where X(k) is the vector of the received signal in the frequency domain, H(r_s, k) is the vector of the auricle conduction characteristic frequency-domain response for a source at position r_s, S(k) is the vector of the source signal in the frequency domain, and W(k) is the corresponding frequency-domain noise vector.
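A minimal numeric sketch of the frame model above, with the delays τ_m and the noise omitted for brevity (helper names are illustrative, not from the patent):

```python
import numpy as np

def received_frames(s, h, M, L):
    """Split the convolution of source s with response h into M frames of
    length L, mirroring x_m(t) = h_m(r_s, t) * s_m(t) with tau_m = 0, w = 0."""
    x = np.convolve(s, h)[:M * L]
    return x.reshape(M, L)

def stacked_spectrum(frames, k):
    """Form the vector X(k) = {X_1(1+k), X_2(2+k), ..., X_M(M+k)}^T
    from the per-frame DFTs, as in the derivation above."""
    M, L = frames.shape
    F = np.fft.fft(frames, axis=1)
    # frame m (1-based) contributes its (m + k)-th frequency point
    return np.array([F[m, (m + 1 + k) % L] for m in range(M)])
```

Running the same stacking on the measured responses and noise would give the H(r_s, k) and W(k) vectors of the prototype equation.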
It should be appreciated that the derivation above concerns the principle of the invention; the frequency-domain relation between the first received signal and the test sound source signal, and that between the second received signal and the auricle conduction characteristic response, are both obtained from X(k) = H(r_s, k) S(k) + W(k) by substituting the variables specific to the auricle-characteristic acquisition stage and the actual localization stage.
S102: during actual localization, the noise microphone receives the signal emitted by the sound source to be located, which is recorded as a second received signal.
S103: the second received signal and the auricle conduction characteristic responses are converted to the frequency domain by discrete Fourier transform, and the sound source direction is estimated with a sparse recovery algorithm based on the frequency-domain relation between the second received signal and the auricle conduction characteristic response, obtaining the direction of the sound source to be located.
In a specific embodiment, estimating the sound source direction with a sparse recovery algorithm based on the frequency-domain relation between the second received signal and the auricle conduction characteristic response specifically includes:
using the auricle conduction characteristic frequency-domain response matrix D(k) as a dictionary, expanding the frequency-domain relation between the second received signal and the auricle conduction characteristic response
X_2(k) = H(r_s, k) S_2(k) + W_2(k)
over D(k) to obtain
X_2(k) = D(k) S̃_2(k) + W_2(k)
where X_2(k) is the vector of the M frames of the second received signal in the frequency domain, H(r_s, k) is the vector of the auricle conduction characteristic frequency-domain response at position r_s corresponding to those M frames, W_2(k) is the vector of frequency-domain noise corresponding to the M frames of the second received signal, S_2(k) is the frequency-domain signal of the sound source to be located, and S̃_2(k) is the frequency-domain signal of the sound source to be located under the dictionary expansion;
based on compressed sensing, taking the matrix D(k) as the observation matrix and solving X_2(k) = D(k) S̃_2(k) + W_2(k) with a sparse recovery algorithm to obtain S̃_2(k); the direction of the sound source to be located is then obtained from the non-zero elements of S̃_2(k) and their positions.
In a preferred embodiment, the orthogonal matching pursuit (OMP) algorithm, whose computational complexity is low, is used as the sparse recovery algorithm. Its main loop is: find the column of the observation matrix D(k) most correlated with the compressed measurement, obtain an approximate solution by solving a least-squares problem, repeat while the iteration count is below the sparsity, and output the index set of maximal correlation together with the reconstructed signal.
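The OMP loop just described can be sketched as a generic textbook implementation (not the patent's exact code):

```python
import numpy as np

def omp(D, x, sparsity):
    """Orthogonal matching pursuit: recover a sparse s with x ~ D @ s.

    D        : (M, N) complex observation matrix (columns are dictionary atoms)
    x        : (M,) compressed measurement vector
    sparsity : number of non-zero entries to recover
    """
    residual = x.copy()
    support = []
    s = np.zeros(D.shape[1], dtype=complex)
    for _ in range(sparsity):
        # 1) find the column most correlated with the current residual
        corr = np.abs(D.conj().T @ residual)
        corr[support] = 0                 # never pick the same column twice
        support.append(int(np.argmax(corr)))
        # 2) least-squares fit of x on the columns chosen so far
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    s[support] = coef
    return s, sorted(support)
```

For the single-source case of this method the sparsity is 1, and the single recovered support index maps directly to one of the D stored test positions.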
In a specific embodiment, obtaining the direction of the sound source to be located from the non-zero elements of S̃_2(k) and their positions includes:
setting a threshold η (0 ≤ η ≤ 100) and setting the discrete Fourier transform length equal to the frame length L; selecting the ηL/100 frequency points of highest energy in the second received signal, superimposing the recovered vectors S̃_2(k) over these frequency points, and taking the position of the largest element of the superimposed result as the spatial position of the sound source signal to be located. In the present embodiment, the frame length L = 512 and the threshold η = 20.
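The fusion step above can be sketched as follows, assuming `sparse_solutions` holds, one row per frequency bin, the sparse vector recovered for that bin (names and shapes are illustrative assumptions):

```python
import numpy as np

def localize(bin_energy, sparse_solutions, eta=20):
    """Fuse the sparse solutions of the highest-energy frequency bins.

    bin_energy       : (L,) energy of the second received signal per DFT bin
    sparse_solutions : (L, D) recovered sparse vectors, one row per bin
    eta              : percentage threshold, 0 <= eta <= 100
    Returns the index of the estimated position in the D-entry spatial grid.
    """
    L = len(bin_energy)
    n_keep = max(1, int(round(L * eta / 100)))
    top = np.argsort(bin_energy)[-n_keep:]     # the eta% highest-energy bins
    fused = np.abs(sparse_solutions[top]).sum(axis=0)
    return int(np.argmax(fused))
```

Summing the magnitudes across bins before taking the argmax is what lets information from several frequency points reinforce the true position.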
It should be noted that, because auricle shapes unavoidably differ between individuals and from the artificial ear, the pre-acquired auricle conduction characteristic frequency-domain response matrix D(k) will differ somewhat from the actual wearer's response. However, in typical wireless-earphone applications such as environmental safety warning and algorithm parameter adjustment, the required accuracy of sound source localization is modest (in this embodiment, 60-degree azimuth resolution and 30-degree pitch resolution), so the accuracy loss caused by individual auricle differences does not affect the typical use of the earphone product.
Fig. 2 shows a circuit diagram of a noise microphone and its connection to a microprocessor of a specific embodiment of the invention, in which the noise microphone is connected to the sound source position estimation module via an audio codec chip 201, and the sound source position estimation module comprises a microprocessor 203 for controlling the audio codec chip 201 and the microphone 202.
Fig. 3 shows a schematic diagram of a monaural sound source localization system based on auricle conduction characteristics according to an embodiment of the invention, comprising a test sound source 301, a noise microphone 302 and a sound source orientation estimation module 303, said noise microphone 302 being arranged in a position within the auricle 304 and outside the outlet of the auditory canal.
In a specific embodiment, the noise microphone 302 is disposed at a single artificial ear canal outlet position, the test sound source 301 is disposed at different azimuth or pitch angle positions to play sound source test signals, and the noise microphone 302 is configured to record the collected signals as first received signals in the stage of acquiring auricle conduction characteristics; the sound source position estimation module 303 is configured to convert the first received signal and the test sound source 301 signal into a frequency domain by using discrete fourier transform in a stage of acquiring auricle conduction characteristics, and calculate auricle conduction characteristic responses for each of the test sound sources 301 based on a frequency domain relationship of the first received signal and the test sound source 301 signal; the noise microphone 302 is configured to receive a signal sent by a sound source to be detected in an actual positioning stage, and record the signal as a second received signal; the sound source position estimation module 303 is further configured to convert the second received signal and the auricle conduction characteristic response to a frequency domain by using discrete fourier transform in an actual positioning stage, and perform sound source position estimation by using a sparse recovery algorithm based on a frequency domain relationship between the second received signal and the auricle conduction characteristic response, so as to obtain a position of the sound source to be measured.
Through the combined operation of the test sound source 301, the noise microphone 302 and the sound source azimuth estimation module 303, the system exploits the distinct reflection and reverberation characteristics caused by the unique auricle shape of the human ear: the frequency-domain auricle conduction characteristic responses obtained while the microphone collects the signal are used to construct a sparse recovery equation for the spatial position of the sound source, realizing single-earphone sound source localization and thereby avoiding the extra inter-earphone communication, control and hardware cost incurred by the binaural microphone acquisition required by traditional binaural sound source localization methods.
Embodiments of the present application also relate to a computer readable storage medium having a computer program stored thereon which, when executed by a computer processor, implements the method described above. The computer program contains program code for performing the method shown in the flow chart. The computer readable medium of the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two.
The method comprises the steps of setting a noise microphone at the outlet position of a single artificial ear canal; setting a plurality of test sound sources at different azimuth angles or pitch angles in an ideal test environment to play test sound source signals; recording the signal acquired by the noise microphone as the first received signal; converting the first received signal and the test sound source signal into the frequency domain by using the discrete Fourier transform; and calculating the auricle conduction characteristic response for each test sound source based on the frequency domain relation of the first received signal and the test sound source signal. During actual positioning, the noise microphone receives the signal sent by the sound source to be detected, and this signal is recorded as the second received signal. Finally, the second received signal and the auricle conduction characteristic response are converted into the frequency domain by using the discrete Fourier transform, and the sound source azimuth is estimated with a sparse recovery algorithm based on the frequency domain relation of the second received signal and the auricle conduction characteristic response, so as to obtain the position of the sound source to be detected. Based on the distinct reflection and reverberation characteristics caused by the unique auricle shape of the human ear, a sparse recovery equation for the spatial position of the sound source is established by using the frequency-domain auricle conduction characteristic responses obtained while the microphone collects the signal, and single-earphone sound source localization is realized, thereby avoiding the extra inter-earphone communication, control and hardware cost incurred by the binaural microphone acquisition required by traditional binaural sound source localization methods.
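The calibration stage summarized above — dividing the recording into frames, taking the DFT of each frame, and solving the per-bin relation X_1(k) = H(r_s, k)S_1(k) + W_1(k) for the auricle response — can be sketched in a few lines of NumPy. This is an illustrative reconstruction only, not the patented implementation: neglecting the noise term W_1(k) (reasonable in the ideal test environment the method assumes), the frame layout, and the names `pinna_response`, `x`, `s` are all assumptions made for the example.

```python
import numpy as np

def pinna_response(x, s, M, L):
    """Estimate the frequency-domain auricle conduction response H(r_s, k)
    for one test-source position from X_1(k) = H(r_s, k) S_1(k) + W_1(k),
    neglecting the noise term W_1(k) in an ideal test environment.

    x : first received signal recorded by the in-ear noise microphone
    s : known test sound source signal, time-aligned with x
    Returns an (M, L) complex array: one length-L frequency response per frame.
    """
    H = np.empty((M, L), dtype=complex)
    for m in range(M):
        X_m = np.fft.fft(x[m * L:(m + 1) * L])   # m-th frame of the received signal
        S_m = np.fft.fft(s[m * L:(m + 1) * L])   # corresponding test-source frame
        H[m] = X_m / (S_m + 1e-12)               # X = H*S (+W)  =>  H ~ X/S
    return H

# The dictionary over D test positions {r_1, ..., r_D} is then one such
# response per position, e.g.:
#   D_k = [pinna_response(x_d, s, M, L) for x_d in recordings]
```

Repeating this for every test-source position and storing the results gives the response dictionary D(k) used later as the observation matrix in the sparse recovery step.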
The above description is merely illustrative of the preferred embodiments of the present application and of the principles of the technology employed. Persons skilled in the art will appreciate that the scope of the application is not limited to the specific combinations of the technical features described above, but also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept — for example, solutions in which the above features are replaced with technical features disclosed in the present application (but not limited thereto) having similar functions.
Claims (6)
1. The auricle conduction characteristic-based monaural sound source positioning method is characterized by comprising the following steps of:
s1: setting a noise microphone at an outlet position of a single artificial ear canal, respectively setting a plurality of test sound sources at different azimuth angles or pitch angles under an ideal test environment to play test sound source signals, recording signals acquired by the noise microphone as first receiving signals, converting the first receiving signals and the test sound source signals into frequency domains by using discrete Fourier transformation, and calculating auricle conduction characteristic responses for the test sound sources based on the frequency domain relation of the first receiving signals and the test sound source signals;
Wherein the specific step of calculating the auricle conduction characteristic response for each of the test sound sources based on the frequency domain relation of the first received signal and the test sound source signal includes: after detecting the start point of the received signal with an endpoint detection method, defining the m-th frame signal x_m(t) of the received signal as:

x_m(t) = h_m(r_s, t) * s(t + mL − τ_m) + ω_m(t)

wherein m = 1, 2, …, M; M is the number of data frames of the received signal; t is the time index; L is the frame length; h_m(r_s, t) is the auricle conduction characteristic response from the sound source at position r_s to the noise microphone; s_m(t) = s(t + mL − τ_m) is the sound source signal corresponding to x_m(t); τ_m is the delay of the sound source signal corresponding to x_m(t); ω_m(t) is the environmental noise corresponding to x_m(t); and * denotes convolution;
Converting the m-th frame signal x_m(t) to the frequency domain yields the signal at the (m+k)-th frequency bin of the m-th frame:

X_m(k) = H_m(r_s, m+k) S(m+k) + W_m(m+k)

wherein X_m(k), H_m(r_s, m+k), S(m+k) and W_m(m+k) are the discrete Fourier transforms of x_m(t), h_m(r_s, t), s_m(t) and ω_m(t), respectively;
Stacking these quantities over the M frames gives the vector representations:

X_1(k) = {X_1(1+k), X_2(2+k), …, X_M(M+k)}^T;
H(r_s, k) = {H_1(r_s, 1+k), H_2(r_s, 2+k), …, H_M(r_s, M+k)}^T;
W_1(k) = {W_1(1+k), W_2(2+k), …, W_M(M+k)}^T;
Obtaining the frequency domain relationship between the first received signal and the test sound source signal

X_1(k) = H(r_s, k) S_1(k) + W_1(k)

wherein X_1(k) is the vector of the M frames of the first received signal in the frequency domain, H(r_s, k) is the vector of the auricle conduction characteristic frequency domain response at the test sound source position r_s corresponding to the M frames of the first received signal, W_1(k) is the vector of the frequency domain noise corresponding to the M frames of the first received signal, and S_1(k) is the frequency domain test sound source signal;
Obtaining H(r_s, k) from the known X_1(k), W_1(k) and S_1(k), repeating the above operation for the spatial positions {r_1, r_2, …, r_D} of all the test sound sources to obtain the auricle conduction characteristic frequency domain response matrix D(k) = {H(r_1, k), H(r_2, k), …, H(r_D, k)} corresponding to all the test sound sources, and storing it;
S2: during actual positioning, the noise microphone is utilized to receive a signal sent by a sound source to be detected, and the signal is recorded as a second receiving signal; and
S3: converting the second received signal and the auricle conduction characteristic response into a frequency domain by using discrete Fourier transform, and estimating the sound source position by using a sparse recovery algorithm based on the frequency domain relation of the second received signal and the auricle conduction characteristic response to obtain the position of the sound source to be detected;
The estimating the sound source azimuth by using a sparse recovery algorithm based on the frequency domain relation of the second received signal and the auricle conduction characteristic response specifically comprises: based on the frequency domain relation of the second received signal and the auricle conduction characteristic response

X_2(k) = H(r_s, k) S_2(k) + W_2(k)

using the auricle conduction characteristic frequency domain response matrix D(k) as a dictionary and expanding over D(k) to obtain

X_2(k) = D(k) S̃_2(k) + W_2(k)

wherein X_2(k) is the vector of the M frames of the second received signal in the frequency domain, H(r_s, k) is the vector of the auricle conduction characteristic frequency domain response at the test sound source position r_s corresponding to the M frames of the first received signal, W_2(k) is the vector of the frequency domain noise corresponding to the M frames of the second received signal, S_2(k) is the frequency domain signal of the sound source to be detected, and S̃_2(k) is the frequency domain signal of the sound source to be detected under the dictionary extension;

Based on a compressed sensing algorithm, taking the auricle conduction characteristic frequency domain response matrix D(k) as the observation matrix, solving the formula X_2(k) = D(k) S̃_2(k) + W_2(k) with a sparse recovery algorithm to obtain S̃_2(k), and obtaining, from the position of the nonzero element in S̃_2(k), the azimuth of the sound source to be detected corresponding to that position in the database;

Wherein obtaining, from the position of the nonzero element in S̃_2(k), the azimuth of the sound source to be detected comprises: setting a threshold η, the length of the discrete Fourier transform being the same as the frame length L; selecting the frequency bins of S̃_2(k) whose energy exceeds the threshold η and superimposing them; the position corresponding to the maximum element of the superimposed result is the spatial position of the sound source signal to be detected.
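For a single frequency bin and a single active source, the sparse recovery over the dictionary D(k) reduces to one matched-filter / orthogonal-matching-pursuit step: pick the dictionary column best correlated with the observation, then fit its amplitude by least squares. The sketch below illustrates this under those stated assumptions; the matrix shapes, the single-iteration shortcut, and the names `locate_source`, `X2`, `Dk` are hypothetical, not the patent's actual solver.

```python
import numpy as np

def locate_source(X2, Dk):
    """Single-bin sparse localization: X2 is the (M,) observation X_2(k);
    Dk is the (M, D) dictionary whose columns are the stored auricle
    responses H(r_d, k) for D candidate positions. With one active source,
    the sparsest solution of X_2 = D(k) S~_2 is found by one OMP step."""
    # Normalized correlation of each dictionary column with the observation.
    corr = np.abs(Dk.conj().T @ X2) / (np.linalg.norm(Dk, axis=0) + 1e-12)
    d_hat = int(np.argmax(corr))                  # index of the nonzero element
    col = Dk[:, d_hat]
    amp = (col.conj() @ X2) / (np.linalg.norm(col) ** 2)  # least-squares amplitude
    S_tilde = np.zeros(Dk.shape[1], dtype=complex)
    S_tilde[d_hat] = amp                          # sparse estimate of S~_2(k)
    return d_hat, S_tilde
```

A full multi-source or noisy-bin solver would iterate this step (classic OMP) or use a convex relaxation, but the single-step version already shows how the nonzero position in S̃_2(k) indexes the sound-source azimuth in the stored database.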
2. The method of claim 1, wherein the noise microphone is positioned within the pinna and outside of the canal outlet.
3. The method of claim 1, wherein the noise microphone is coupled to the sound source position estimation module through an audio codec chip.
4. The method of claim 1, wherein the sound source position estimation module comprises a microprocessor for controlling an audio codec chip and a microphone.
5. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a computer processor, implements the method of any of claims 1 to 4.
6. A monaural sound source localization system based on auricle conduction characteristics, characterized in that the system is applied to a wireless earphone product and comprises a test sound source, a noise microphone and a sound source position estimation module, wherein:
The noise microphone is arranged at the outlet position of a single artificial ear canal, the test sound source is respectively arranged at different azimuth angles or pitch angles to play test sound source signals, and the noise microphone is configured to record the acquired signals as first receiving signals in the stage of acquiring auricle conduction characteristics;
The sound source position estimation module is configured to convert the first received signal and the test sound source signal into the frequency domain by using the discrete Fourier transform in the stage of acquiring the auricle conduction characteristics, and to calculate the auricle conduction characteristic response for each of the test sound sources based on the frequency domain relationship of the first received signal and the test sound source signal, wherein the specific step of calculating the auricle conduction characteristic response for each of the test sound sources based on the frequency domain relationship of the first received signal and the test sound source signal includes: after detecting the start point of the received signal with an endpoint detection method, defining the m-th frame signal x_m(t) of the received signal as

x_m(t) = h_m(r_s, t) * s(t + mL − τ_m) + ω_m(t)

wherein m = 1, 2, …, M; M is the number of data frames of the received signal; t is the time index; L is the frame length; h_m(r_s, t) is the auricle conduction characteristic response from the sound source at position r_s to the noise microphone; s_m(t) = s(t + mL − τ_m) is the sound source signal corresponding to x_m(t); τ_m is the delay of the sound source signal corresponding to x_m(t); ω_m(t) is the ambient noise corresponding to x_m(t); and * denotes convolution; converting the m-th frame signal x_m(t) to the frequency domain yields the signal at the (m+k)-th frequency bin of the m-th frame:

X_m(k) = H_m(r_s, m+k) S(m+k) + W_m(m+k)

wherein X_m(k), H_m(r_s, m+k), S(m+k) and W_m(m+k) are the discrete Fourier transforms of x_m(t), h_m(r_s, t), s_m(t) and ω_m(t), respectively; the vector representations over the M frames are:

X_1(k) = {X_1(1+k), X_2(2+k), …, X_M(M+k)}^T;
H(r_s, k) = {H_1(r_s, 1+k), H_2(r_s, 2+k), …, H_M(r_s, M+k)}^T;
W_1(k) = {W_1(1+k), W_2(2+k), …, W_M(M+k)}^T;
Obtaining the frequency domain relationship between the first received signal and the test sound source signal

X_1(k) = H(r_s, k) S_1(k) + W_1(k)

wherein X_1(k) is the vector of the M frames of the first received signal in the frequency domain, H(r_s, k) is the vector of the auricle conduction characteristic frequency domain response at the test sound source position r_s corresponding to the M frames of the first received signal, W_1(k) is the vector of the frequency domain noise corresponding to the M frames of the first received signal, and S_1(k) is the frequency domain test sound source signal;

Obtaining H(r_s, k) from the known X_1(k), W_1(k) and S_1(k), repeating the above operation for the spatial positions {r_1, r_2, …, r_D} of all the test sound sources to obtain the auricle conduction characteristic frequency domain response matrix D(k) = {H(r_1, k), H(r_2, k), …, H(r_D, k)} corresponding to all the test sound sources, and storing it; and
The noise microphone is further configured, in the actual positioning stage, to receive the signal sent by the sound source to be detected and to record it as the second received signal;
The sound source azimuth estimation module is further configured to convert the second received signal and the auricle conduction characteristic response into the frequency domain by using the discrete Fourier transform in the actual positioning stage, and to perform sound source azimuth estimation with a sparse recovery algorithm based on the frequency domain relationship of the second received signal and the auricle conduction characteristic response, so as to obtain the azimuth of the sound source to be detected, wherein performing sound source azimuth estimation with a sparse recovery algorithm based on the frequency domain relationship of the second received signal and the auricle conduction characteristic response specifically comprises: based on the frequency domain relationship of the second received signal and the auricle conduction characteristic response

X_2(k) = H(r_s, k) S_2(k) + W_2(k)

using the auricle conduction characteristic frequency domain response matrix D(k) as a dictionary and expanding over D(k) to obtain

X_2(k) = D(k) S̃_2(k) + W_2(k)

wherein X_2(k) is the vector of the M frames of the second received signal in the frequency domain, H(r_s, k) is the vector of the auricle conduction characteristic frequency domain response at the test sound source position r_s corresponding to the M frames of the first received signal, W_2(k) is the vector of the frequency domain noise corresponding to the M frames of the second received signal, S_2(k) is the frequency domain signal of the sound source to be detected, and S̃_2(k) is the frequency domain signal of the sound source to be detected under the dictionary extension;

Based on a compressed sensing algorithm, taking the auricle conduction characteristic frequency domain response matrix D(k) as the observation matrix, solving the formula X_2(k) = D(k) S̃_2(k) + W_2(k) with a sparse recovery algorithm to obtain S̃_2(k), and obtaining, from the position of the nonzero element in S̃_2(k), the azimuth of the sound source to be detected corresponding to that position in the database;

Wherein obtaining, from the position of the nonzero element in S̃_2(k), the azimuth of the sound source to be detected comprises: setting a threshold η, the length of the discrete Fourier transform being the same as the frame length L; selecting the frequency bins of S̃_2(k) whose energy exceeds the threshold η and superimposing them; the position corresponding to the maximum element of the superimposed result is the spatial position of the sound source signal to be detected.
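The final peak-picking described in the claim — thresholding the per-bin energy of the sparse solutions, superimposing the surviving bins, and taking the maximum over candidate positions — can be sketched as below. The array layout and the names `pick_position`, `S_tilde_all`, `eta` are illustrative assumptions, not the claimed implementation.

```python
import numpy as np

def pick_position(S_tilde_all, eta):
    """S_tilde_all: (D, L) array holding the sparse solution S~_2(k) for
    each of the L frequency bins (DFT length equal to the frame length L).
    Keep only the bins whose energy exceeds the threshold eta, superimpose
    the magnitudes of the surviving bins, and return the index of the
    candidate position with the largest superimposed value."""
    energy = np.sum(np.abs(S_tilde_all) ** 2, axis=0)      # per-bin energy
    keep = energy > eta                                     # high-energy bins only
    S_bar = np.sum(np.abs(S_tilde_all[:, keep]), axis=1)    # superimposed magnitudes
    return int(np.argmax(S_bar))                            # spatial position index
```

If no bin exceeds the threshold this sketch returns index 0 by default; a production version would flag that case (e.g. no reliable source detected) rather than return a position.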
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011459187.3A CN112526495B (en) | 2020-12-11 | 2020-12-11 | Auricle conduction characteristic-based monaural sound source positioning method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112526495A CN112526495A (en) | 2021-03-19 |
CN112526495B true CN112526495B (en) | 2024-07-30 |
Family
ID=74999125
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011459187.3A Active CN112526495B (en) | 2020-12-11 | 2020-12-11 | Auricle conduction characteristic-based monaural sound source positioning method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112526495B (en) |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3999689B2 (en) * | 2003-03-17 | 2007-10-31 | インターナショナル・ビジネス・マシーンズ・コーポレーション | Sound source position acquisition system, sound source position acquisition method, sound reflection element for use in the sound source position acquisition system, and method of forming the sound reflection element |
US7280943B2 (en) * | 2004-03-24 | 2007-10-09 | National University Of Ireland Maynooth | Systems and methods for separating multiple sources using directional filtering |
JP4250133B2 (en) * | 2004-10-26 | 2009-04-08 | 株式会社シマダ製作所 | Single ear hearing improvement device |
WO2007147049A2 (en) * | 2006-06-14 | 2007-12-21 | Think-A-Move, Ltd. | Ear sensor assembly for speech processing |
US9432778B2 (en) * | 2014-04-04 | 2016-08-30 | Gn Resound A/S | Hearing aid with improved localization of a monaural signal source |
DK2928213T3 (en) * | 2014-04-04 | 2018-08-27 | Gn Hearing As | A hearing aid with improved localization of monaural signal sources |
CN106847301A (en) * | 2017-01-03 | 2017-06-13 | 东南大学 | A kind of ears speech separating method based on compressed sensing and attitude information |
EP3373602A1 (en) * | 2017-03-09 | 2018-09-12 | Oticon A/s | A method of localizing a sound source, a hearing device, and a hearing system |
CN109581385B (en) * | 2018-12-17 | 2020-05-19 | 山东大学 | Target positioning device and method based on double-lug-contour bionic sonar of large-ear bats |
CN110133596B (en) * | 2019-05-13 | 2023-06-23 | 江苏第二师范学院(江苏省教育科学研究院) | Array sound source positioning method based on frequency point signal-to-noise ratio and bias soft decision |
Non-Patent Citations (2)
Title |
---|
Microphone array sound source localization based on distributed compressed sensing; Huang Huixiang, Guo Qiuhan, Tong Feng; Acta Armamentarii; pp. 1725-1731 *
Research on sound source target localization based on a robot auditory system; Chen Tao; China Doctoral Dissertations Full-text Database, Information Science and Technology; pp. 17, 55-62 *
Also Published As
Publication number | Publication date |
---|---|
CN112526495A (en) | 2021-03-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10431239B2 (en) | Hearing system | |
EP3248393B1 (en) | Hearing assistance system | |
CN101938686A (en) | Measurement system and measurement method for head-related transfer function in common environment | |
EP2395909A1 (en) | Distributed sensing of signals linked by sparse filtering | |
Sakamoto et al. | Sound-space recording and binaural presentation system based on a 252-channel microphone array | |
JP6613078B2 (en) | Signal processing apparatus and control method thereof | |
JP2741817B2 (en) | Out-of-head stereophonic headphone listening device | |
Talagala et al. | Binaural sound source localization using the frequency diversity of the head-related transfer function | |
Pollack et al. | Chapter Perspective Chapter: Modern Acquisition of Personalised Head-Related Transfer Functions–An Overview | |
MacDonald | A localization algorithm based on head-related transfer functions | |
CN108122559A (en) | Binaural sound sources localization method based on deep learning in a kind of digital deaf-aid | |
CN111142066A (en) | Direction-of-arrival estimation method, server, and computer-readable storage medium | |
US11510013B2 (en) | Partial HRTF compensation or prediction for in-ear microphone arrays | |
EP3588979B1 (en) | A method for enhancing a signal directionality in a hearing instrument | |
CN102736064A (en) | Compression sensor-based positioning method of sound source of hearing aid | |
CN112526495B (en) | Auricle conduction characteristic-based monaural sound source positioning method and system | |
CN108229030B (en) | Design method of controller parameters of active noise reduction system | |
US11871190B2 (en) | Separating space-time signals with moving and asynchronous arrays | |
CN112153552B (en) | Self-adaptive stereo system based on audio analysis | |
CN109688531B (en) | Method for acquiring high-sound-quality audio conversion information, electronic device and recording medium | |
Oreinos et al. | Effect of higher-order ambisonics on evaluating beamformer benefit in realistic acoustic environments | |
CN115604646B (en) | Panoramic deep space audio processing method | |
Oreinos et al. | Objective analysis of higher-order Ambisonics sound-field reproduction for hearing aid applications | |
WO2015032009A1 (en) | Small system and method for decoding audio signals into binaural audio signals | |
Marin-Hurtado et al. | Practical MWF-based noise-reduction methods for binaural hearing aids |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||