CN110007276A - Sound source localization method and system - Google Patents


Info

Publication number
CN110007276A
Authority
CN
China
Legal status
Granted
Application number
CN201910312565.6A
Other languages
Chinese (zh)
Other versions
CN110007276B (en)
Inventor
黄丽霞
张雪英
王杰
李凤莲
陈桂军
Current Assignee
Taiyuan University of Technology
Original Assignee
Taiyuan University of Technology
Application filed by Taiyuan University of Technology
Priority to CN201910312565.6A
Publication of CN110007276A
Application granted
Publication of CN110007276B
Legal status: Active


Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G01S5/20 Position of source determined by a plurality of spaced direction-finders

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)

Abstract

The invention discloses a sound source localization method and system. The method first applies windowing and framing to the sound source speech signals acquired by a four-element microphone array. It then detects the valid frames of the signals and computes a secondary-correlation-fused generalized spectral-subtraction modified phase transform (PHAT) function for the selected valid frames. To further improve time-delay accuracy, the time delay values are calculated from the frame-averaged version of this function. Finally, the sound source direction is estimated from the geometry of the microphone array and the calculated time delays, which improves the accuracy of sound source localization.

Description

Sound source localization method and system
Technical field
The present invention relates to the field of sound source localization, and in particular to a sound source localization method and system.
Background art
Sound source localization has become a research hotspot in speech signal processing. It is widely used in video conferencing, intelligent robots, intelligent video surveillance systems and similar fields. However, the localization accuracy of traditional algorithms drops sharply in adverse environments with a low signal-to-noise ratio and a long reverberation time.
Summary of the invention
The object of the present invention is to provide a sound source localization method and system that improve the accuracy of sound source localization.
To achieve the above object, the present invention provides the following solutions:
The present invention provides a sound source localization method comprising the following steps:
Acquire four channels of sound source speech signals with a four-element microphone array. The array comprises four microphones, and each microphone acquires one channel of the sound source speech signal;
Perform synchronous framing on the four channels of sound source speech signals to obtain a frame signal set. Each frame signal in the set comprises four channel frame signals: the first, second, third and fourth channel frame signals;
Judge the validity of each frame signal in the frame signal set to obtain a valid frame signal subset;
According to the valid frame signal subset, obtain for any two channels the frame-averaged secondary-correlation-fused generalized spectral-subtraction modified phase transform (PHAT) function of the valid frame signals;
Take the time point corresponding to the maximum peak of this frame-averaged function for any two channels as the time delay value between the sound source signals of the corresponding two microphones;
Determine the direction of the sound source according to the geometry of the four-element microphone array and the time delay values between any two microphones.
Optionally, performing synchronous framing on the four channels of sound source speech signals to obtain the frame signal set specifically includes:
Using a window function, perform synchronous windowing and framing on the four channels of sound source speech signals to obtain frame signals $x_{ij}(n)$. Here n denotes the n-th sampling point, n = 1, 2, ..., N, N is the frame length, and $x_{ij}(n)$ denotes the j-th channel of the i-th frame signal, j = 1, 2, 3, 4;
Combine all frame signals into the frame signal set.
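The expression of the window function is not reproduced in this text. The sketch below therefore assumes a Hamming window. The frame length `N`, the hop size `hop` and the helper names `hamming` and `frame_channels` are illustrative and are not taken from the patent. The sketch shows what "synchronous" framing means in practice: all four channels are cut at exactly the same sample offsets, so frame i of every channel covers the same time span.

```python
import math
from typing import List

def hamming(N: int) -> List[float]:
    """Hamming window of length N (an assumed window; the patent's formula is not reproduced)."""
    if N == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def frame_channels(channels: List[List[float]], N: int, hop: int) -> List[List[List[float]]]:
    """Synchronously frame and window all channels.

    Every channel is cut at the same sample offsets, so frames[i][j] is the
    windowed j-th channel of the i-th frame (j = 0..3 for a four-element array).
    """
    length = min(len(ch) for ch in channels)  # common length of all channels
    w = hamming(N)
    frames = []
    for start in range(0, length - N + 1, hop):
        frames.append([[ch[start + n] * w[n] for n in range(N)] for ch in channels])
    return frames

# Example: four channels of 1000 samples, frame length 256, hop 128
channels = [[0.0] * 1000 for _ in range(4)]
print(len(frame_channels(channels, 256, 128)))  # 6 frames
```

Because every channel shares the same frame boundaries, the inter-channel time delays are preserved inside each frame. The later delay-estimation steps depend on this.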
Optionally, judging the validity of each frame signal in the frame signal set to obtain the valid frame signal subset specifically includes:
Calculate the short-time frame energy of the j-th channel of the i-th frame signal using the formula $E_{ij}=\sum_{n=1}^{N}x_{ij}^{2}(n)$. Here $E_{ij}$ denotes the short-time frame energy of the j-th channel of the i-th frame signal, n denotes the n-th sampling point, n = 1, 2, ..., N, and N is the frame length;
Judge whether the short-time frame energy of the j-th channel of the i-th frame signal is greater than a first preset threshold to obtain a first judgment result;
If the first judgment result indicates that the short-time frame energy is not greater than the first preset threshold, increase i by 1 and return to the step of calculating the short-time frame energy of the j-th channel of the i-th frame signal;
If the first judgment result indicates that the short-time frame energy is greater than the first preset threshold, set the i-th frame signal as the starting point and increase i by 1;
Calculate the zero-crossing rate of the j-th channel of the i-th frame signal;
Judge whether the zero-crossing rate is greater than a second preset threshold to obtain a second judgment result;
If the second judgment result indicates that the zero-crossing rate is greater than the second preset threshold, set the flag $T_{ij}$ of the j-th channel of the i-th frame signal to 1;
If the second judgment result indicates that the zero-crossing rate is not greater than the second preset threshold, set the flag $T_{ij}$ of the j-th channel of the i-th frame signal to 0;
Calculate the overall state value SS(i) of the flags of the four channels of the i-th frame signal using the formula $SS(i)=T_{i1}\,\&\&\,T_{i2}\,\&\&\,T_{i3}\,\&\&\,T_{i4}$ (logical AND). Here $T_{i1}$, $T_{i2}$, $T_{i3}$ and $T_{i4}$ denote the flags of the 1st, 2nd, 3rd and 4th channels of the i-th frame signal, respectively;
Judge whether the overall state value SS(i) equals 1 to obtain a third judgment result;
If the third judgment result indicates that SS(i) equals 1, set the i-th frame signal as a valid frame signal;
Judge whether the short-time frame energy of the j-th channel of the i-th frame signal is less than a third preset threshold to obtain a fourth judgment result;
If the fourth judgment result indicates that the short-time frame energy of the j-th channel of the i-th frame signal is less than the third preset threshold, set the i-th frame signal as the end point of the speech signal and obtain the valid frame signal subset;
If the fourth judgment result indicates that the short-time frame energy of the j-th channel of the i-th frame signal is not less than the third preset threshold, increase i by 1 and return to the step of calculating the zero-crossing rate of the j-th channel of the i-th frame signal.
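The following is a minimal Python sketch of the frame-validity flow described above. It rests on several assumptions:

- Both energy tests are applied to a single reference channel, given by the hypothetical parameter `ref`.
- The zero-crossing rate uses the common form 0.5 * sum |sgn x(n) - sgn x(n-1)| with sgn(0) = +1. The patent's own formula is not reproduced here.
- The caller supplies the thresholds `t1`, `t2` and `t3`.

The function returns every frame between the starting point and the end point whose four channel flags are all 1 (SS(i) = 1). If no frame's energy exceeds `t1`, it returns an empty list.

```python
from typing import List

def short_time_energy(frame: List[float]) -> float:
    """E_ij: sum of squared samples of one channel frame."""
    return sum(v * v for v in frame)

def zero_crossing_rate(frame: List[float]) -> float:
    """Assumed standard form: 0.5 * sum |sgn(x[n]) - sgn(x[n-1])|, with sgn(0) = +1."""
    signs = [1 if v >= 0 else -1 for v in frame]
    return 0.5 * sum(abs(signs[n] - signs[n - 1]) for n in range(1, len(signs)))

def select_valid_frames(frames: List[List[List[float]]], t1: float, t2: float,
                        t3: float, ref: int = 0) -> List[int]:
    """Return indices of valid frames; frames[i][j] holds channel j of frame i.

    `ref` is a hypothetical reference channel used for the two energy tests.
    """
    i, count = 0, len(frames)
    # Search for the starting point: the first frame whose energy exceeds t1.
    while i < count and short_time_energy(frames[i][ref]) <= t1:
        i += 1
    i += 1  # the starting frame itself is skipped, following the described flow
    valid: List[int] = []
    while i < count:
        # T_ij = 1 when the channel's zero-crossing rate exceeds t2.
        flags = [zero_crossing_rate(channel) > t2 for channel in frames[i]]
        if all(flags):  # SS(i) = T_i1 && T_i2 && T_i3 && T_i4
            valid.append(i)
        if short_time_energy(frames[i][ref]) < t3:
            break  # frame i is the end point of the speech segment
        i += 1
    return valid
```

The validity test runs before the end-point test for each frame, so the end-point frame itself can still be marked valid. This matches the order of the steps above.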
Optionally, obtaining, according to the valid frame signal subset, the frame-averaged secondary-correlation-fused generalized spectral-subtraction modified phase transform (PHAT) function of any two channels of the valid frame signals specifically includes:
According to the valid frame signal subset, calculate the secondary correlation of any two channels of each valid frame signal;
According to the valid frame signal subset, calculate the power spectrum of each channel of each valid frame signal;
According to the power spectrum of each channel, obtain the noise masking function of each channel of each valid frame signal:
Here $z_{pq}(\omega)$ denotes the noise masking function of the q-th channel of the p-th valid frame signal, and $X_{pq}(\omega)$ denotes the power spectrum of the q-th channel of the p-th valid frame signal, q = 1, 2, 3, 4. $N(\omega)$ denotes the noise power spectrum, α denotes a first coefficient, and β denotes a second coefficient;
According to the noise masking function of each channel of each valid frame signal and the secondary correlation of any two channels, obtain the secondary-correlation-fused generalized spectral-subtraction modified PHAT function of any two channels of each valid frame signal:
Here $\varphi_{ls\_p}(\omega)$ denotes the secondary-correlation-fused generalized spectral-subtraction modified PHAT function of the l-th and s-th channels of the p-th valid frame signal, with l = 1, 2, 3, 4, s = 1, 2, 3, 4 and l ≠ s. $X_{pl}(\omega)$ and $X_{ps}(\omega)$ denote the power spectra of the l-th and s-th channels of the p-th valid frame signal, respectively, and ρ denotes a third coefficient;
According to this function for any two channels of each valid frame signal, obtain the frame-averaged secondary-correlation-fused generalized spectral-subtraction modified PHAT function of any two channels:
$\bar{\varphi}_{ls}(\omega)=\frac{1}{P}\sum_{p=1}^{P}\varphi_{ls\_p}(\omega)$
Here $\bar{\varphi}_{ls}(\omega)$ denotes the frame-averaged secondary-correlation-fused generalized spectral-subtraction modified PHAT function of the l-th and s-th channels of the valid frames, and P denotes the number of valid frame signals in the valid frame signal subset.
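The patent's modified weighting cannot be reconstructed from this text. That weighting includes the noise masking function, the coefficients α, β and ρ, and the secondary-correlation term. As a stand-in, the sketch below implements the standard GCC-PHAT cross-spectrum. It averages that spectrum over the valid frames, as in the frame-averaging step above, and takes the peak of the inverse transform as the delay.

- This illustrates ordinary GCC-PHAT, not the patent's fused function.
- The function names are hypothetical.
- A small recursive radix-2 FFT is included only to keep the example dependency-free.

```python
import cmath
from typing import List, Sequence, Tuple

def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(X: Sequence[complex]) -> List[complex]:
    """Inverse FFT via the conjugation identity."""
    n = len(X)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in X])]

def averaged_gcc_phat_delay(frame_pairs: List[Tuple[Sequence[float], Sequence[float]]],
                            nfft: int, fs: float) -> float:
    """Frame-averaged GCC-PHAT delay of channel s relative to channel l, in seconds."""
    avg = [0j] * nfft
    for xl, xs in frame_pairs:
        Xl = fft(list(xl) + [0.0] * (nfft - len(xl)))  # zero-pad each frame to nfft
        Xs = fft(list(xs) + [0.0] * (nfft - len(xs)))
        for k in range(nfft):
            cross = Xl[k].conjugate() * Xs[k]
            # PHAT weighting keeps only the phase; the result is averaged over valid frames
            avg[k] += cross / (abs(cross) + 1e-12) / len(frame_pairs)
    r = [v.real for v in ifft(avg)]
    peak = max(range(nfft), key=lambda k: r[k])
    lag = peak if peak <= nfft // 2 else peak - nfft  # circular index -> signed lag
    return lag / fs
```

Choose `nfft` at least twice the frame length to avoid circular wrap-around of the correlation. A positive result means channel s receives the signal later than channel l.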
Optionally, determining the direction of the sound source according to the geometry of the four-element microphone array and the time delay values between any two microphones specifically includes:
According to the geometry of the four-element microphone array and the time delay values between any two microphones, calculate the azimuth angle θ of the sound source relative to the coordinate origin using the corresponding formula;
According to the geometry of the four-element microphone array and the time delay values between any two microphones, calculate the elevation (pitch) angle of the sound source relative to the coordinate origin using the corresponding formula;
Here c is the speed of sound and d is the distance from each microphone element to the coordinate origin. $\tau_{12}$ denotes the time delay between the sound source signals of microphones 1 and 2, $\tau_{13}$ the time delay between microphones 1 and 3, and $\tau_{14}$ the time delay between microphones 1 and 4.
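The azimuth and elevation formulas are not reproduced in this text. The sketch below recovers both angles under an assumed far-field (plane-wave) model. It uses the array geometry of Fig. 2, with microphones at m1(d, 0, 0), m2(0, d, 0), m3(-d, 0, 0) and m4(0, -d, 0). It also assumes the sign convention τ1k = t_k - t_1, meaning microphone k hears the sound τ1k seconds after microphone 1. Under these assumptions, cτ13 = 2d cosφ cosθ and c(τ14 - τ12) = 2d cosφ sinθ, where φ is the elevation. This may differ from the patent's exact formulas, and the function name is hypothetical.

```python
import math
from typing import Tuple

def direction_from_delays(tau12: float, tau13: float, tau14: float,
                          d: float, c: float = 343.0) -> Tuple[float, float]:
    """Azimuth and elevation (degrees) from pairwise delays, far-field sketch.

    Assumed geometry: m1(d,0,0), m2(0,d,0), m3(-d,0,0), m4(0,-d,0).
    Assumed convention: tau_1k = t_k - t_1 (positive if mic k hears the sound later).
    """
    ux = c * tau13 / (2.0 * d)            # = cos(phi) * cos(theta)
    uy = c * (tau14 - tau12) / (2.0 * d)  # = cos(phi) * sin(theta)
    azimuth = math.degrees(math.atan2(uy, ux))
    # A planar array cannot tell "above" from "below", so elevation is reported in [0, 90].
    cos_phi = min(1.0, math.hypot(ux, uy))  # clamp: noisy delays can push this past 1
    elevation = math.degrees(math.acos(cos_phi))
    return azimuth, elevation
```

Using `atan2` instead of `atan` keeps the azimuth in the correct quadrant over the full 360 degrees.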
Optionally, before performing synchronous framing on the four channels of sound source speech signals to obtain the frame signal set, the method further includes:
Perform speech enhancement on each channel of the sound source speech signal to obtain enhanced signals;
Perform band-pass filtering on the enhanced signals to obtain band-pass filtered signals;
Perform wavelet threshold denoising on the band-pass filtered signals to obtain the preprocessed sound source speech signals.
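The patent names wavelet threshold denoising but does not specify the wavelet, the decomposition depth or the threshold rule. The sketch below assumes:

- a Haar wavelet with three decomposition levels;
- soft thresholding;
- the universal threshold σ·sqrt(2 ln N), with σ estimated from the finest detail coefficients (median absolute value / 0.6745).

Speech enhancement and band-pass filtering are not shown, and all function names are hypothetical.

```python
import math
import statistics
from typing import List, Tuple

def haar_dwt(x: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar transform (len(x) must be even)."""
    r = math.sqrt(2.0)
    approx = [(x[2 * k] + x[2 * k + 1]) / r for k in range(len(x) // 2)]
    detail = [(x[2 * k] - x[2 * k + 1]) / r for k in range(len(x) // 2)]
    return approx, detail

def haar_idwt(approx: List[float], detail: List[float]) -> List[float]:
    """Inverse of haar_dwt."""
    r = math.sqrt(2.0)
    out: List[float] = []
    for a, dv in zip(approx, detail):
        out.extend([(a + dv) / r, (a - dv) / r])
    return out

def soft_threshold(v: float, t: float) -> float:
    """Shrink v toward zero by t (soft thresholding)."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def wavelet_denoise(x: List[float], levels: int = 3) -> List[float]:
    """Haar wavelet soft-threshold denoising; len(x) must be divisible by 2**levels."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(detail)  # details[0] is the finest scale
    # Robust noise estimate from the finest-scale coefficients
    sigma = statistics.median([abs(v) for v in details[0]]) / 0.6745
    t = sigma * math.sqrt(2.0 * math.log(len(x)))  # universal threshold
    for detail in reversed(details):  # rebuild from the coarsest scale upward
        approx = haar_idwt(approx, [soft_threshold(v, t) for v in detail])
    return approx
```

The Haar transform is orthonormal, and soft thresholding never increases a coefficient's magnitude. As a result, the noise energy in the detail bands can only shrink; the approximation band is left untouched.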
A sound source localization system, comprising:
A sound source speech signal acquisition module, configured to acquire four channels of sound source speech signals with a four-element microphone array. The array comprises four microphones, and each microphone acquires one channel of the sound source speech signal;
A framing module, configured to perform synchronous framing on the four channels of sound source speech signals to obtain a frame signal set. Each frame signal in the set comprises four channel frame signals: the first, second, third and fourth channel frame signals;
A valid frame signal subset acquisition module, configured to judge the validity of each frame signal in the frame signal set to obtain a valid frame signal subset;
An averaged fused-function acquisition module, configured to obtain, according to the valid frame signal subset, the frame-averaged secondary-correlation-fused generalized spectral-subtraction modified phase transform (PHAT) function of any two channels of the valid frame signals;
A time delay calculation module, configured to take the time point corresponding to the maximum peak of this frame-averaged function for any two channels as the time delay value between the sound source signals of the corresponding two microphones;
A direction determination module, configured to determine the direction of the sound source according to the geometry of the four-element microphone array and the time delay values between any two microphones.
Optionally, the framing module specifically includes:
A framing submodule, configured to use a window function to perform synchronous windowing and framing on the four channels of sound source speech signals to obtain frame signals $x_{ij}(n)$. Here n denotes the n-th sampling point, n = 1, 2, ..., N, N is the frame length, and $x_{ij}(n)$ denotes the j-th channel of the i-th frame signal, j = 1, 2, 3, 4;
A combining submodule, configured to combine all frame signals into the frame signal set.
Optionally, the valid frame signal subset acquisition module specifically includes:
A short-time frame energy calculation submodule, configured to calculate the short-time frame energy of the j-th channel of the i-th frame signal using the formula $E_{ij}=\sum_{n=1}^{N}x_{ij}^{2}(n)$. Here $E_{ij}$ denotes the short-time frame energy of the j-th channel of the i-th frame signal, n denotes the n-th sampling point, n = 1, 2, ..., N, and N is the frame length;
A first judgment submodule, configured to judge whether the short-time frame energy of the j-th channel of the i-th frame signal is greater than a first preset threshold to obtain a first judgment result;
A first judgment result processing submodule, configured as follows. If the first judgment result indicates that the short-time frame energy is not greater than the first preset threshold, it increases i by 1 and calls the short-time frame energy calculation submodule to calculate the short-time frame energy of the j-th channel of the i-th frame signal. If the first judgment result indicates that the short-time frame energy is greater than the first preset threshold, it sets the i-th frame signal as the starting point and increases i by 1;
A zero-crossing rate calculation submodule, configured to calculate the zero-crossing rate of the j-th channel of the i-th frame signal;
A second judgment submodule, configured to judge whether the zero-crossing rate is greater than a second preset threshold to obtain a second judgment result;
A second judgment result processing submodule, configured as follows. If the second judgment result indicates that the zero-crossing rate is greater than the second preset threshold, it sets the flag $T_{ij}$ of the j-th channel of the i-th frame signal to 1. If the second judgment result indicates that the zero-crossing rate is not greater than the second preset threshold, it sets the flag $T_{ij}$ to 0;
An overall state value SS(i) calculation submodule, configured to calculate the overall state value SS(i) of the flags of the four channels of the i-th frame signal using the formula $SS(i)=T_{i1}\,\&\&\,T_{i2}\,\&\&\,T_{i3}\,\&\&\,T_{i4}$ (logical AND). Here $T_{i1}$, $T_{i2}$, $T_{i3}$ and $T_{i4}$ denote the flags of the 1st, 2nd, 3rd and 4th channels of the i-th frame signal, respectively;
A third judgment submodule, configured to judge whether the overall state value SS(i) equals 1 to obtain a third judgment result;
A third judgment result processing submodule, configured to set the i-th frame signal as a valid frame signal if the third judgment result indicates that SS(i) equals 1;
A fourth judgment submodule, configured to judge whether the short-time frame energy of the j-th channel of the i-th frame signal is less than a third preset threshold to obtain a fourth judgment result;
A fourth judgment result processing submodule, configured as follows. If the fourth judgment result indicates that the short-time frame energy of the j-th channel of the i-th frame signal is less than the third preset threshold, it sets the i-th frame signal as the end point of the speech signal and obtains the valid frame signal subset. If the short-time frame energy is not less than the third preset threshold, it increases i by 1 and calls the zero-crossing rate calculation submodule to calculate the zero-crossing rate of the j-th channel of the i-th frame signal.
Optionally, the module that acquires the frame-averaged secondary-correlation-fused generalized spectral-subtraction modified phase transform (PHAT) function specifically includes:
A secondary correlation calculation submodule, configured to calculate, according to the valid frame signal subset, the secondary correlation of any two channels of each valid frame signal;
A power spectrum calculation submodule, configured to calculate, according to the valid frame signal subset, the power spectrum of each channel of each valid frame signal;
A noise masking function acquisition submodule, configured to obtain, according to the power spectrum of each channel, the noise masking function of each channel of each valid frame signal:
Here $z_{pq}(\omega)$ denotes the noise masking function of the q-th channel of the p-th valid frame signal, and $X_{pq}(\omega)$ denotes the power spectrum of the q-th channel of the p-th valid frame signal, q = 1, 2, 3, 4. $N(\omega)$ denotes the noise power spectrum, α denotes a first coefficient, and β denotes a second coefficient;
A fused-function acquisition submodule, configured to obtain, according to the noise masking function of each channel of each valid frame signal and the secondary correlation of any two channels, the secondary-correlation-fused generalized spectral-subtraction modified PHAT function of any two channels of each valid frame signal:
Here $\varphi_{ls\_p}(\omega)$ denotes the secondary-correlation-fused generalized spectral-subtraction modified PHAT function of the l-th and s-th channels of the p-th valid frame signal, with l = 1, 2, 3, 4, s = 1, 2, 3, 4 and l ≠ s. $X_{pl}(\omega)$ and $X_{ps}(\omega)$ denote the power spectra of the l-th and s-th channels of the p-th valid frame signal, respectively, and ρ denotes a third coefficient;
An averaged fused-function acquisition submodule, configured to obtain, from this function for any two channels of each valid frame signal, the frame-averaged secondary-correlation-fused generalized spectral-subtraction modified PHAT function of any two channels:
$\bar{\varphi}_{ls}(\omega)=\frac{1}{P}\sum_{p=1}^{P}\varphi_{ls\_p}(\omega)$
Here $\bar{\varphi}_{ls}(\omega)$ denotes the frame-averaged secondary-correlation-fused generalized spectral-subtraction modified PHAT function of the l-th and s-th channels of the valid frames, and P denotes the number of valid frame signals in the valid frame signal subset.
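The secondary correlation itself is not spelled out in this text. One common formulation of secondary (quadratic) correlation delay estimation is assumed here for illustration. It correlates the autocorrelation R11 of channel 1 with the cross-correlation R12 of channels 1 and 2. When channel 2 lags channel 1 by D samples, R12(k) ≈ R11(k - D), so the second-level correlation still peaks at D; the aim is to further suppress uncorrelated noise. The helper names are hypothetical, and the direct sums favour clarity over speed.

```python
from typing import Dict, Sequence

def xcorr(a: Sequence[float], b: Sequence[float], max_lag: int) -> Dict[int, float]:
    """r[lag] = sum_n a[n] * b[n + lag] for lag in [-max_lag, max_lag]."""
    return {lag: sum(a[n] * b[n + lag] for n in range(len(a)) if 0 <= n + lag < len(b))
            for lag in range(-max_lag, max_lag + 1)}

def secondary_correlation_delay(x1: Sequence[float], x2: Sequence[float], max_lag: int) -> int:
    """Delay (in samples) of x2 relative to x1 via secondary correlation.

    First level: R11 (autocorrelation of x1) and R12 (cross-correlation of x1, x2).
    Second level: correlate R11 with R12; since R12(k) ~ R11(k - D), the peak is at D.
    """
    L = 2 * max_lag  # first-level lags wide enough for every second-level shift
    r11 = xcorr(x1, x1, L)
    r12 = xcorr(x1, x2, L)

    def second_level(tau: int) -> float:
        # correlate R11 with R12 at lag tau, over the lags where both are available
        return sum(r11[t] * r12[t + tau] for t in range(-L, L + 1) if -L <= t + tau <= L)

    return max(range(-max_lag, max_lag + 1), key=second_level)
```

In a frequency-domain implementation, the same idea corresponds to multiplying the cross-power spectrum by the auto-power spectrum before weighting. This is presumably why the per-frame function above combines the secondary correlation with a separate weighting term.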
According to the specific embodiments provided by the present invention, the invention achieves the following technical effects:
The invention discloses a sound source localization method and system. The method first applies windowing and framing to the sound source speech signals acquired by the four-element microphone array. It then detects the valid frames of the signals and computes the secondary-correlation-fused generalized spectral-subtraction modified phase transform (PHAT) function for the selected valid frames. To further improve time-delay accuracy, the time delay values are calculated from the frame-averaged version of this function. Finally, the sound source direction is estimated from the geometry of the microphone array and the calculated time delays, which improves the accuracy of sound source localization.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings required in the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention. A person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flow chart of a sound source localization method provided by the present invention;
Fig. 2 is a model diagram of the four-element microphone array provided by the present invention;
Fig. 3 compares the per-frame time-delay estimation accuracy of different algorithms in a -5 dB noise environment;
Fig. 4 compares the per-frame time-delay estimation accuracy of different algorithms in a 5 dB noise environment;
Fig. 5 compares the per-frame time-delay estimation accuracy of different algorithms in an environment with 5 dB noise and a reverberation time of 750 ms;
Fig. 6 is a photograph of the acquisition card used in the present invention;
Fig. 7 is a photograph of the microphone used in the present invention;
Fig. 8 is a photograph of the four-element microphone array used in the present invention;
Fig. 9 is a structural diagram of a sound source localization system provided by the present invention.
Detailed description of the embodiments
The object of the present invention is to provide a sound source localization method and system that improve the accuracy of sound source localization.
To make the above objects, features and advantages of the present invention clearer and easier to understand, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Embodiment 1
Embodiment 1 of the present invention provides a sound source localization method.
As shown in Fig. 1, the sound source localization method comprises the following steps:
Step 101: acquire four channels of sound source speech signals with a four-element microphone array. The array comprises four microphones, and each microphone acquires one channel of the sound source speech signal.
Step 102: perform synchronous framing on the four channels of sound source speech signals to obtain a frame signal set. Each frame signal in the set comprises four channel frame signals: the first, second, third and fourth channel frame signals.
Step 103: judge the validity of each frame signal in the frame signal set to obtain a valid frame signal subset.
Step 104: according to the valid frame signal subset, obtain the frame-averaged secondary-correlation-fused generalized spectral-subtraction modified phase transform (PHAT) function of any two channels of the valid frame signals.
Step 105: take the time point corresponding to the maximum peak of this frame-averaged function for any two channels as the time delay value between the sound source signals of the corresponding two microphones.
Step 106: determine the direction of the sound source according to the geometry of the four-element microphone array and the time delay values between any two microphones.
Embodiment 2
Embodiment 2 of the present invention provides a preferred implementation of the sound source localization method. The implementation of the invention is not limited to the one defined in Embodiment 2.
The four-element microphone array of step 101 is shown in Fig. 2. The coordinates of the four microphones are m1(d, 0, 0), m2(0, d, 0), m3(-d, 0, 0) and m4(0, -d, 0), where d is the distance from each microphone element to the origin.
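The far-field delays implied by this geometry can be written out as a worked sketch. It rests on assumptions not stated in the source: a plane wave arriving from azimuth θ and elevation φ, a speed of sound c ≈ 343 m/s, and the convention τ1k = t_k - t_1. The last line evaluates the largest possible delay for an illustrative d = 0.1 m and a 16 kHz sampling rate. That bound limits the range of lags worth searching for the correlation peak.

```latex
\mathbf{u}=(\cos\varphi\cos\theta,\ \cos\varphi\sin\theta,\ \sin\varphi),\qquad
\tau_{1k}=t_k-t_1=\frac{(\mathbf{m}_1-\mathbf{m}_k)\cdot\mathbf{u}}{c}

\tau_{12}=\frac{d\cos\varphi\,(\cos\theta-\sin\theta)}{c},\qquad
\tau_{13}=\frac{2d\cos\varphi\cos\theta}{c},\qquad
\tau_{14}=\frac{d\cos\varphi\,(\cos\theta+\sin\theta)}{c}

\theta=\operatorname{atan2}\!\left(\tau_{14}-\tau_{12},\ \tau_{13}\right),\qquad
\cos\varphi=\frac{c}{2d}\sqrt{\tau_{13}^{2}+(\tau_{14}-\tau_{12})^{2}}

|\tau_{1k}|\le\frac{2d}{c}=\frac{2\times 0.1\ \text{m}}{343\ \text{m/s}}\approx 0.583\ \text{ms}
\;\Rightarrow\; 0.583\ \text{ms}\times 16\,000\ \text{Hz}\approx 9.3\ \text{samples}
```

With these illustrative values, only lags of about ±9 samples can correspond to a physical source direction.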
After the four channels of sound source speech signals are acquired, speech enhancement is performed on each channel. The enhanced signals are then band-pass filtered, and wavelet threshold denoising is applied to the band-pass filtered signals to obtain the preprocessed sound source speech signals.
In step 102, performing synchronous framing on the four channels of sound source speech signals to obtain the frame signal set specifically includes two operations. First, a window function is used to perform synchronous windowing and framing on the four channels to obtain frame signals $x_{ij}(n)$. Here n denotes the n-th sampling point, n = 1, 2, ..., N, N is the frame length, and $x_{ij}(n)$ denotes the j-th channel of the i-th frame signal, j = 1, 2, 3, 4. Second, all frame signals are combined into the frame signal set.
In step 103, judging the validity of each frame signal in the frame signal set to obtain the valid frame signal subset specifically includes the following steps.
Calculate the short-time frame energy of the j-th channel of the i-th frame signal using the formula $E_{ij}=\sum_{n=1}^{N}x_{ij}^{2}(n)$. Here $E_{ij}$ denotes the short-time frame energy of the j-th channel of the i-th frame signal, n denotes the n-th sampling point, n = 1, 2, ..., N, and N is the frame length.
Judge whether this short-time frame energy is greater than a first preset threshold to obtain a first judgment result. If the first judgment result indicates that the energy is not greater than the first preset threshold, increase i by 1 and return to the step of calculating the short-time frame energy of the j-th channel of the i-th frame signal. If it indicates that the energy is greater than the first preset threshold, set the i-th frame signal as the starting point and increase i by 1.
Calculate the zero-crossing rate of the j-th channel of the i-th frame signal, and judge whether it is greater than a second preset threshold to obtain a second judgment result. If the second judgment result indicates that the zero-crossing rate is greater than the second preset threshold, set the flag $T_{ij}$ of the j-th channel of the i-th frame signal to 1. Otherwise, set $T_{ij}$ to 0.
Calculate the overall state value SS(i) of the flags of the four channels of the i-th frame signal using the formula $SS(i)=T_{i1}\,\&\&\,T_{i2}\,\&\&\,T_{i3}\,\&\&\,T_{i4}$ (logical AND). Here $T_{i1}$, $T_{i2}$, $T_{i3}$ and $T_{i4}$ denote the flags of the 1st, 2nd, 3rd and 4th channels of the i-th frame signal, respectively.
Judge whether the overall state value SS(i) equals 1 to obtain a third judgment result. If SS(i) equals 1, set the i-th frame signal as a valid frame signal.
Judge whether the short-time frame energy of the j-th channel of the i-th frame signal is less than a third preset threshold to obtain a fourth judgment result. If it is less, set the i-th frame signal as the end point of the speech signal and obtain the valid frame signal subset. If it is not less, increase i by 1 and return to the step of calculating the zero-crossing rate of the j-th channel of the i-th frame signal.
Obtaining, in step 104, the averaged fused secondary-correlation generalized spectral-subtraction ameliorated phase transform function of any two channels of valid frame signals according to the valid frame signal subset specifically includes: calculating, according to the valid frame signal subset, the secondary correlation of any two channels of each valid frame signal; calculating, according to the valid frame signal subset, the power spectrum of each channel of each valid frame signal; and obtaining, according to the power spectrum of each channel, the noise masking function $z_{pq}(\omega)$ of each channel of each valid frame signal, in which $z_{pq}(\omega)$ denotes the noise masking function of channel q of the p-th valid frame signal, $X_{pq}(\omega)$ denotes the power spectrum of channel q of the p-th valid frame signal, q = 1, 2, 3, 4, $N(\omega)$ denotes the noise power spectrum, α denotes a first coefficient, and β denotes a second coefficient.
Then, according to the noise masking function of each channel of each valid frame signal and the secondary correlation of any two channels, the fused secondary-correlation generalized spectral-subtraction ameliorated phase transform function $\varphi_{ls\_p}(\omega)$ of any two channels of each valid frame signal is obtained, in which $\varphi_{ls\_p}(\omega)$ denotes this function for channel l and channel s of the p-th valid frame signal, l = 1, 2, 3, 4, s = 1, 2, 3, 4, l ≠ s, $X_{pl}(\omega)$ and $X_{ps}(\omega)$ denote the power spectra of channel l and channel s of the p-th valid frame signal, respectively, and ρ denotes a third coefficient.
Finally, according to the fused secondary-correlation generalized spectral-subtraction ameliorated phase transform functions of any two channels of each valid frame signal, the averaged function is obtained as $\bar{\varphi}_{ls}(\omega)=\frac{1}{P}\sum_{p=1}^{P}\varphi_{ls\_p}(\omega)$, where $\bar{\varphi}_{ls}(\omega)$ denotes the averaged fused secondary-correlation generalized spectral-subtraction ameliorated phase transform function of the valid frame signals of channel l and channel s, and P denotes the number of valid frame signals in the valid frame signal subset.
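The expressions for the noise masking function and the fused secondary-correlation weighting are not reproduced in this text, so the following Python sketch only illustrates the surrounding structure of steps 104 and 105: a weighted cross-power spectrum is formed for each valid frame, averaged over the P valid frames, transformed back to the lag domain, and the lag of the maximum peak is taken as the time delay. As a stand-in it uses the standard PHAT weighting $1/|G(\omega)|$, not the APHAT weighting of this invention; the function name `averaged_gcc_delay` and the parameter `eps` are hypothetical.

```python
import numpy as np


def averaged_gcc_delay(frames_l, frames_s, fs, eps=1e-12):
    """Delay (s) of channel l relative to channel s from P valid frames.

    frames_l, frames_s: arrays of shape (P, N), already windowed.
    A positive result means channel l lags channel s.
    PHAT weighting is used here as a stand-in for the APHAT weighting.
    """
    frames_l = np.asarray(frames_l, dtype=float)
    frames_s = np.asarray(frames_s, dtype=float)
    _, n = frames_l.shape
    nfft = 2 * n  # zero-padding avoids circular wrap-around of the lags
    spec_l = np.fft.rfft(frames_l, n=nfft, axis=1)
    spec_s = np.fft.rfft(frames_s, n=nfft, axis=1)
    cross = spec_l * np.conj(spec_s)           # cross-power spectrum per frame
    weighted = cross / (np.abs(cross) + eps)   # PHAT: keep phase only
    averaged = weighted.mean(axis=0)           # average over the P valid frames
    cc = np.fft.irfft(averaged, n=nfft)
    # Reorder so that index k corresponds to lag k - (n - 1).
    cc = np.concatenate((cc[-(n - 1):], cc[:n]))
    lag = int(np.argmax(cc)) - (n - 1)
    return lag / fs
```

Averaging in the frequency domain before the inverse transform, as the averaged function above does, lets the consistent phase across frames reinforce the true delay peak while frame-to-frame noise partly cancels.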
Determining, in step 105, the direction of the sound source according to the geometric positions of the four-element microphone array and the time delays between the sound source signals of any two microphones specifically includes:
calculating, according to the geometric positions of the four-element microphone array and the time delays between the sound source signals of any two microphones, the azimuth angle θ of the sound source with respect to the coordinate origin;
calculating, according to the geometric positions of the four-element microphone array and the time delays between the sound source signals of any two microphones, the pitch angle φ of the sound source with respect to the coordinate origin;
where c is the speed of sound, d is the distance from each microphone element to the coordinate origin, $\tau_{12}$ denotes the time delay between the sound source signals of microphone 1 and microphone 2, $\tau_{13}$ denotes the time delay between the sound source signals of microphone 1 and microphone 3, and $\tau_{14}$ denotes the time delay between the sound source signals of microphone 1 and microphone 4. Specifically, the azimuth angle θ and the pitch angle φ are solved from the geometry of the four-element microphone array (microphone coordinates m1(d, 0, 0), m2(0, d, 0), m3(-d, 0, 0) and m4(0, -d, 0)), the spherical-coordinate relation $x^2+y^2+z^2=r^2$, the distance formula between two points, $\sqrt{(x_1-x_2)^2+(y_1-y_2)^2+(z_1-z_2)^2}$, and the velocity formula.
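Because the azimuth and pitch formulas are not reproduced above, the following Python sketch shows one way to recover θ and φ from $\tau_{12}$, $\tau_{13}$ and $\tau_{14}$ for the stated array geometry. It assumes a far-field (plane-wave) source, $\tau_{1k}$ defined as the arrival time at microphone 1 minus the arrival time at microphone k, φ measured from the +z axis, and a source above the array plane. These assumptions, the default c = 343 m/s and the function name are illustrative; the patent's own derivation, which is based on the spherical-coordinate and two-point distance relations, may differ.

```python
import numpy as np


def direction_from_delays(tau12, tau13, tau14, d, c=343.0):
    """Far-field azimuth and pitch (degrees) for mics at (d,0,0), (0,d,0),
    (-d,0,0), (0,-d,0).

    Assumptions: tau1k = t1 - tk (arrival at mic 1 minus arrival at mic k);
    plane wave from direction (sin(phi)cos(theta), sin(phi)sin(theta), cos(phi)),
    with phi measured from the +z axis and the source above the array plane.
    """
    # Under these assumptions:
    #   c * tau13 = -2 d sin(phi) cos(theta)
    #   c * (tau12 - tau14) = 2 d sin(phi) sin(theta)
    ux = -c * tau13 / (2.0 * d)            # sin(phi) cos(theta)
    uy = c * (tau12 - tau14) / (2.0 * d)   # sin(phi) sin(theta)
    theta = np.degrees(np.arctan2(uy, ux))
    sin_phi = min(1.0, float(np.hypot(ux, uy)))  # clip values pushed above 1 by noise
    phi = np.degrees(np.arcsin(sin_phi))
    return float(theta), float(phi)
```

With only a planar array, a source below the array plane cannot be distinguished from its mirror image above it, which is why φ is restricted to [0°, 90°] here.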
To illustrate the effect of the sound source localization method of the present invention, simulation comparisons were carried out under different signal-to-noise ratios and reverberation conditions. As can be seen from Figs. 3 and 4, in a moderately noisy environment (SNR (signal-to-noise ratio) = 5 dB), the time delay estimation accuracy of the phase transform (PHAT) algorithm is far inferior to that of the modified cross-power spectrum phase (MCPSP) algorithm and of the generalized spectral-subtraction ameliorated phase transform cross-correlation method (GCC-APHAT, Generalized spectral subtraction ameliorated phase transformation), and APHAT is substantially better than MCPSP. In a strongly noisy environment (SNR = -5 dB), the performance of PHAT drops sharply, and only MCPSP and APHAT maintain good performance. As can be seen from Fig. 5, under conditions where strong reverberation and strong noise coexist (T60 = 750 ms, SNR = -5 dB), the APHAT algorithm achieves better time delay accuracy than the PHAT and MCPSP algorithms. Taken together, these comparisons verify that the APHAT algorithm is robust to noise and reverberation.
To further illustrate the effect of the sound source localization method of the present invention, an experiment was set up in a real environment. The sound source signals were recorded with a Q801 multi-channel data acquisition board from Beijing Sheng Kece Acoustic Technology Co., Ltd. (SKC), as shown in Fig. 6; the array bracket and the MP40 microphones of the four-element microphone array are also SKC products, as shown in Figs. 7 and 8.
The experiment was conducted in a 7.2 m × 6 m × 3.2 m room with all doors and windows closed. The room contains some background noise and reverberation, including host computer fan noise, sound reflections from tables, chairs and other objects, and other human interference. The sound source is a female speaker's utterance ("I am going to Beijing"), and all speech was recorded in the actual environment. The sampling rate is 8 kHz, the frame length is 256 samples, the frame shift is 128 samples, and a Hamming window is applied. The coordinates of the four-element microphone array are m1(25 cm, 0, 0), m2(0, 25 cm, 0), m3(-25 cm, 0, 0) and m4(0, -25 cm, 0), and the array is placed 70 cm above the floor. For the comparison in the real environment, the sound source positions listed in Table 1 were selected, and 10 groups of data were collected at each position. With the PHAT and MCPSP algorithms as references, the experiments compare the performance of the proposed APHAT algorithm against these two algorithms. The real-environment localization results of the improved APHAT algorithm and of PHAT and MCPSP are compared in Table 1, the localization errors are given in Table 2, and the localization root-mean-square errors are given in Table 3.
Table 1 Comparison of localization results
No. | Source position S(x, y, z) (m) | True position (r, θ, φ) | PHAT (θ, φ) | MCPSP (θ, φ) | APHAT (θ, φ)
1 | (1, 1, 0.76) | (1.6, 45°, 61.6°) | (45°, 58.6°) | (45°, 58.6°) | (45°, 57.3°)
2 | (2, 1, 0.76) | (2.36, 26.6°, 71.2°) | (26.6°, 74.6°) | (26.6°, 74.6°) | (26.6°, 71.9°)
3 | (2, 2, 0.76) | (2.93, 45°, 75°) | (45°, 77.4°) | (45°, 77.4°) | (45°, 74.1°)
4 | (-2, 1, 0.76) | (2.36, -26.6°, 71.2°) | (-26.6°, 74.6°) | (-26.6°, 74.6°) | (-26.6°, 71.9°)
5 | (-2, 2, 0.76) | (2.93, -45°, 75°) | (-45°, 77.4°) | (-45°, 77.4°) | (-45°, 74.1°)
6 | (1.2, 0.6, 0.76) | (1.54, 26.6°, 60.4°) | (37.9°, 79.5°) | (29.1°, 62.6°) | (29.1°, 61.1°)
7 | (-2.4, 2.4, 0.76) | (3.48, -45°, 77.4°) | (-45°, 77.4°) | (-45°, 77.4°) | (-45°, 74.1°)
8 | (1.5, 1.2, 0.76) | (2.07, 38.7°, 68.5°) | (41.2°, 66.5°) | (37.9°, 79.5°) | (41.2°, 64.6°)
9 | (1.8, 1.2, 0.76) | (2.29, 33.7°, 70.6°) | (0°, 234°) | (37.9°, 79.5°) | (37.9°, 75.7°)
10 | (2, 1.2, 0.76) | (2.45, 31°, 71.9°) | (-36.9°, 59.6°) | (30.1°, 90.2°) | (31°, 82.4°)
11 | (1.2, 0, 0.76) | (1.42, 0°, 57.7°) | (0°, 59.6°) | (0°, 59.6°) | (0°, 58.2°)
12 | (0, 1.8, 0.76) | (1.95, 90°, 67.1°) | (0°, 234°) | (-90°, 71.6°) | (90°, 69.2°)
13 | (1.2, 2.4, 0.76) | (2.79, 63.4°, 74.2°) | (63.4°, 74.6°) | (63.4°, 74.6°) | (63.4°, 71.9°)
14 | (0, 1.2, 0.76) | (1.42, 90°, 57.7°) | (-90°, 59.6°) | (-90°, 59.6°) | (90°, 58.2°)
15 | (-1.2, 0, 0.76) | (1.42, 180°, 57.7°) | (180°, 234°) | (180°, 59.6°) | (180°, 58.7°)
16 | (0, -1.2, 0.76) | (1.42, -90°, 57.7°) | (90°, 59.6°) | (90°, 59.6°) | (-90°, 58.2°)
17 | (-0.6, -1.2, 0.76) | (1.54, 63.4°, 60.4°) | (60.9°, 62.6°) | (60.9°, 62.6°) | (60.9°, 61.1°)
18 | (0.6, -1.2, 0.76) | (1.54, -63.4°, 60.4°) | (-63.4°, 74.6°) | (-68.2°, 68.3°) | (-68.2°, 66.3°)
Table 2
Table 3 Root-mean-square error of localization (°)
Metric | PHAT | MCPSP | APHAT
Azimuth θ RMSE | 59.4 | 68.6 | 1.6
Pitch angle φ RMSE | 63.2 | 4.6 | 2.7
The experimental results in Tables 1 and 2 show that, in the real environment, the azimuth and pitch angles estimated by the PHAT algorithm are unstable and have large errors. The MCPSP algorithm produces reversed azimuth estimates when the sound source lies on the X or Y axis of the coordinate system (for example, -90° instead of 90° at positions 12 and 14, and 90° instead of -90° at position 16). By contrast, the proposed APHAT algorithm localizes stably and with higher accuracy: Table 3 shows that its azimuth RMSE is 1.6 and its pitch-angle RMSE is 2.7, so its angular deviations are essentially within an acceptable range and its precision is relatively high. This further verifies the effectiveness of the proposed algorithm.
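For reference, a root-mean-square error such as those in Table 3 is conventionally computed as $\sqrt{\frac{1}{K}\sum_{k=1}^{K}(\hat{\theta}_k-\theta_k)^2}$ over K estimates. The short Python sketch below implements this definition; the optional wrap-around of azimuth differences into [-180°, 180°) is an assumption added for illustration, since the text does not state whether Table 3 applies it, and the function name is hypothetical.

```python
import numpy as np


def angle_rmse(estimated_deg, true_deg, wrap=False):
    """Root-mean-square error of angle estimates in degrees.

    wrap=True maps each difference into [-180, 180) first, so that
    179 deg vs -179 deg counts as a 2 deg error (an assumed convention).
    """
    err = np.asarray(estimated_deg, dtype=float) - np.asarray(true_deg, dtype=float)
    if wrap:
        err = (err + 180.0) % 360.0 - 180.0
    return float(np.sqrt(np.mean(err ** 2)))
```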
Embodiment 3
Embodiment 3 of the present invention provides a sound source localization system.
As shown in Fig. 9, the present invention provides a sound source localization system, comprising: a sound source speech signal acquisition module 901, configured to collect four channels of sound source speech signals with a four-element microphone array, the four-element microphone array comprising four microphones, each of which acquires one channel of the sound source speech signal; a framing module 902, configured to perform synchronous framing on the four channels of sound source speech signals to obtain a frame signal set, each frame signal in the frame signal set comprising four channel frame signals, namely a first, a second, a third and a fourth channel frame signal; a valid frame signal subset acquisition module 903, configured to judge the validity of each frame signal in the frame signal set to obtain a valid frame signal subset; an averaged fused secondary-correlation generalized spectral-subtraction ameliorated phase transform function acquisition module 904, configured to obtain, according to the valid frame signal subset, the averaged fused secondary-correlation generalized spectral-subtraction ameliorated phase transform function of any two channels of valid frame signals; a time delay calculation module 905, configured to obtain the time point corresponding to the maximum peak of that averaged function, thereby obtaining the time delay between the sound source signals of any two microphones; and a direction determination module 906, configured to determine the direction of the sound source according to the geometric positions of the four-element microphone array and the time delays between the sound source signals of any two microphones.
Embodiment 4
Embodiment 4 of the present invention provides a preferred implementation of the sound source localization system.
The framing module 902 specifically includes: a framing submodule, configured to apply the window function $w(n)=0.54-0.46\cos\left(\frac{2\pi(n-1)}{N-1}\right)$ synchronously to the four channels of sound source speech signals for windowed framing, obtaining frame signals $x_{ij}(n)$, where n denotes the n-th sampling point, n = 1, 2, ..., N, N denotes the frame length, and $x_{ij}(n)$ denotes the signal of channel j of the i-th frame signal, j = 1, 2, 3, 4; and a synthesis submodule, configured to combine all frame signals into the frame signal set.
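The synchronous windowed framing can be sketched as follows, using the parameters of the real-environment experiment (frame length 256 and frame shift 128 samples, i.e. 32 ms and 16 ms at 8 kHz). This is a minimal sketch that assumes the four channels are already time-aligned and stored as a NumPy array of shape (4, L) and that an incomplete final frame is discarded; the function name `frame_channels` is hypothetical. The window is the Hamming window $w(n)$, which coincides with NumPy's `np.hamming(N)`.

```python
import numpy as np


def frame_channels(signals, frame_len=256, hop=128):
    """Split synchronized multichannel audio into Hamming-windowed frames.

    signals: array of shape (num_channels, L).
    Returns x_ij(n) as an array of shape (I, num_channels, frame_len);
    an incomplete final frame is discarded.
    """
    signals = np.asarray(signals, dtype=float)
    num_ch, length = signals.shape
    if length < frame_len:
        return np.empty((0, num_ch, frame_len))
    num_frames = 1 + (length - frame_len) // hop
    n = np.arange(1, frame_len + 1)
    # w(n) = 0.54 - 0.46 cos(2*pi*(n - 1)/(N - 1)), n = 1..N (Hamming window)
    window = 0.54 - 0.46 * np.cos(2 * np.pi * (n - 1) / (frame_len - 1))
    frames = np.empty((num_frames, num_ch, frame_len))
    for i in range(num_frames):
        start = i * hop
        # Same frame boundaries and window on every channel (synchronous framing).
        frames[i] = signals[:, start:start + frame_len] * window
    return frames
```

Using identical frame boundaries and the same window on all four channels preserves the inter-channel time relationships on which the later delay estimation depends.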
The valid frame signal subset acquisition module 903 specifically includes: a short-time frame energy calculation submodule, configured to calculate the short-time frame energy of channel j of the i-th frame signal with the formula $E_{ij}=\sum_{n=1}^{N}x_{ij}^{2}(n)$, where $E_{ij}$ denotes the short-time frame energy of channel j of the i-th frame signal, n denotes the n-th sampling point, n = 1, 2, ..., N, and N denotes the frame length; a first judgment submodule, configured to judge whether the short-time frame energy of channel j of the i-th frame signal is greater than a first preset threshold, to obtain a first judgment result; a first judgment result processing submodule, configured to increase i by 1 and call the short-time frame energy calculation submodule to recalculate $E_{ij}$ if the first judgment result indicates that the short-time frame energy is not greater than the first preset threshold, and to set the i-th frame signal as the starting point and increase i by 1 if the first judgment result indicates that the short-time frame energy is greater than the first preset threshold; a zero-crossing rate calculation submodule, configured to calculate the zero-crossing rate of channel j of the i-th frame signal with the formula $Z_{ij}=\frac{1}{2}\sum_{n=2}^{N}\left|\operatorname{sgn}[x_{ij}(n)]-\operatorname{sgn}[x_{ij}(n-1)]\right|$, where $\operatorname{sgn}[x]=1$ for $x\ge 0$ and $\operatorname{sgn}[x]=-1$ for $x<0$; a second judgment submodule, configured to judge whether the zero-crossing rate is greater than a second preset threshold, to obtain a second judgment result; and a second judgment result processing submodule, configured to set the label $T_{ij}$ of channel j of the i-th frame signal to 1 if the second judgment result indicates that the zero-crossing rate is greater than the second preset threshold, and to 0 if it indicates that the zero-crossing rate is not greater than the second preset threshold;
A total state value SS(i) calculation submodule, configured to calculate the total state value SS(i) of the labels of the four channels of the i-th frame signal with the formula $SS(i)=T_{i1}\wedge T_{i2}\wedge T_{i3}\wedge T_{i4}$ (logical AND), where $T_{i1}$, $T_{i2}$, $T_{i3}$ and $T_{i4}$ denote the labels of channels 1, 2, 3 and 4 of the i-th frame signal, respectively; a third judgment submodule, configured to judge whether SS(i) equals 1, to obtain a third judgment result; a third judgment result processing submodule, configured to set the i-th frame signal as a valid frame if the third judgment result indicates that SS(i) equals 1; a fourth judgment submodule, configured to judge whether the short-time frame energy of channel j of the i-th frame signal is less than a third preset threshold, to obtain a fourth judgment result; and a fourth judgment result processing submodule, configured to set the i-th frame signal as the end point of the speech signal, thereby obtaining the valid frame signal subset, if the fourth judgment result indicates that the short-time frame energy is less than the third preset threshold, and to increase i by 1 and call the zero-crossing rate calculation submodule to recalculate $Z_{ij}$ if the fourth judgment result indicates that the short-time frame energy is not less than the third preset threshold.
The averaged fused secondary-correlation generalized spectral-subtraction ameliorated phase transform function acquisition module 904 specifically includes: a secondary correlation calculation submodule, configured to calculate, according to the valid frame signal subset, the secondary correlation of any two channels of each valid frame signal; a power spectrum calculation submodule, configured to calculate, according to the valid frame signal subset, the power spectrum of each channel of each valid frame signal; and a noise masking function acquisition submodule, configured to obtain, according to the power spectrum of each channel, the noise masking function $z_{pq}(\omega)$ of each channel of each valid frame signal, in which $z_{pq}(\omega)$ denotes the noise masking function of channel q of the p-th valid frame signal, $X_{pq}(\omega)$ denotes the power spectrum of channel q of the p-th valid frame signal, q = 1, 2, 3, 4, $N(\omega)$ denotes the noise power spectrum, α denotes a first coefficient, and β denotes a second coefficient.
A fused secondary-correlation generalized spectral-subtraction ameliorated phase transform function acquisition submodule, configured to obtain, according to the noise masking function of each channel of each valid frame signal and the secondary correlation of any two channels, the fused secondary-correlation generalized spectral-subtraction ameliorated phase transform function $\varphi_{ls\_p}(\omega)$ of any two channels of each valid frame signal, in which $\varphi_{ls\_p}(\omega)$ denotes this function for channel l and channel s of the p-th valid frame signal, l = 1, 2, 3, 4, s = 1, 2, 3, 4, l ≠ s, $X_{pl}(\omega)$ and $X_{ps}(\omega)$ denote the power spectra of channel l and channel s of the p-th valid frame signal, respectively, and ρ denotes a third coefficient.
An averaged fused secondary-correlation generalized spectral-subtraction ameliorated phase transform function acquisition submodule, configured to obtain, according to the fused secondary-correlation generalized spectral-subtraction ameliorated phase transform functions of any two channels of each valid frame signal, the averaged function $\bar{\varphi}_{ls}(\omega)=\frac{1}{P}\sum_{p=1}^{P}\varphi_{ls\_p}(\omega)$, where $\bar{\varphi}_{ls}(\omega)$ denotes the averaged fused secondary-correlation generalized spectral-subtraction ameliorated phase transform function of the valid frame signals of channel l and channel s, and P denotes the number of valid frame signals in the valid frame signal subset.
According to the specific embodiments provided herein, the present invention achieves the following technical effects:
The present invention discloses a sound source localization method and system. The method first applies windowed framing to the sound source speech signals acquired by the four-element microphone array, then detects the valid frames of the signals and computes, for the selected valid frames, the fused secondary-correlation generalized spectral-subtraction ameliorated phase transform function. To further improve time delay accuracy, the time delays are calculated with the averaged fused secondary-correlation generalized spectral-subtraction ameliorated phase transform function. Finally, the sound source direction is estimated from the geometric positions of the microphone array and the calculated time delays, which improves the accuracy of sound source localization.
The embodiments in this specification are described progressively; each embodiment focuses on its differences from the others, and the embodiments may refer to one another for their same or similar parts. Since the system disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for relevant details, refer to the description of the method.
Specific examples are used herein to describe the principles and implementations of the present invention; the above embodiments are only intended to help understand the method and core idea of the present invention. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.

Claims (10)

1. A sound source localization method, characterized in that the sound source localization method comprises the following steps:
collecting four channels of sound source speech signals with a four-element microphone array, the four-element microphone array comprising four microphones, each of which acquires one channel of the sound source speech signal;
performing synchronous framing on the four channels of sound source speech signals to obtain a frame signal set, each frame signal in the frame signal set comprising four channel frame signals, namely a first, a second, a third and a fourth channel frame signal;
judging the validity of each frame signal in the frame signal set to obtain a valid frame signal subset;
obtaining, according to the valid frame signal subset, the averaged fused secondary-correlation generalized spectral-subtraction ameliorated phase transform function of any two channels of valid frame signals;
obtaining the sample point corresponding to the maximum peak of the averaged fused secondary-correlation generalized spectral-subtraction ameliorated phase transform function of any two channels of valid frame signals, to obtain the time delay between the sound source signals of any two microphones;
determining the direction of the sound source according to the geometric positions of the four-element microphone array and the time delays between the sound source signals of any two microphones.
2. The sound source localization method according to claim 1, characterized in that performing synchronous framing on the four channels of sound source speech signals to obtain a frame signal set specifically comprises:
applying the window function $w(n)=0.54-0.46\cos\left(\frac{2\pi(n-1)}{N-1}\right)$ synchronously to the four channels of sound source speech signals for windowed framing, obtaining frame signals $x_{ij}(n)$, where n denotes the n-th sampling point, n = 1, 2, ..., N, N denotes the frame length, and $x_{ij}(n)$ denotes the signal of channel j of the i-th frame signal, j = 1, 2, 3, 4;
combining all frame signals into the frame signal set.
3. The sound source localization method according to claim 1, characterized in that judging the validity of each frame signal in the frame signal set to obtain a valid frame signal subset specifically comprises:
calculating the short-time frame energy of channel j of the i-th frame signal with the formula $E_{ij}=\sum_{n=1}^{N}x_{ij}^{2}(n)$, where $E_{ij}$ denotes the short-time frame energy of channel j of the i-th frame signal, n denotes the n-th sampling point, n = 1, 2, ..., N, and N denotes the frame length;
judging whether the short-time frame energy of channel j of the i-th frame signal is greater than a first preset threshold, to obtain a first judgment result;
if the first judgment result indicates that the short-time frame energy is not greater than the first preset threshold, increasing i by 1 and returning to the step of calculating the short-time frame energy $E_{ij}$ of channel j of the i-th frame signal;
if the first judgment result indicates that the short-time frame energy is greater than the first preset threshold, setting the i-th frame signal as the starting point and increasing i by 1;
calculating the zero-crossing rate of channel j of the i-th frame signal with the formula $Z_{ij}=\frac{1}{2}\sum_{n=2}^{N}\left|\operatorname{sgn}[x_{ij}(n)]-\operatorname{sgn}[x_{ij}(n-1)]\right|$, where $\operatorname{sgn}[x]=1$ for $x\ge 0$ and $\operatorname{sgn}[x]=-1$ for $x<0$;
judging whether the zero-crossing rate is greater than a second preset threshold, to obtain a second judgment result;
if the second judgment result indicates that the zero-crossing rate is greater than the second preset threshold, setting the label $T_{ij}$ of channel j of the i-th frame signal to 1;
if the second judgment result indicates that the zero-crossing rate is not greater than the second preset threshold, setting the label $T_{ij}$ of channel j of the i-th frame signal to 0;
calculating the total state value SS(i) of the labels of the four channels of the i-th frame signal with the formula $SS(i)=T_{i1}\wedge T_{i2}\wedge T_{i3}\wedge T_{i4}$ (logical AND), where $T_{i1}$, $T_{i2}$, $T_{i3}$ and $T_{i4}$ denote the labels of channels 1, 2, 3 and 4 of the i-th frame signal, respectively;
judging whether the total state value SS(i) equals 1, to obtain a third judgment result;
if the third judgment result indicates that SS(i) equals 1, setting the i-th frame signal as a valid frame;
judging whether the short-time frame energy of channel j of the i-th frame signal is less than a third preset threshold, to obtain a fourth judgment result;
if the fourth judgment result indicates that the short-time frame energy of channel j of the i-th frame signal is less than the third preset threshold, setting the i-th frame signal as the end point of the speech signal, to obtain the valid frame signal subset;
if the fourth judgment result indicates that the short-time frame energy of channel j of the i-th frame signal is not less than the third preset threshold, increasing i by 1 and returning to the step of calculating the zero-crossing rate $Z_{ij}$ of channel j of the i-th frame signal.
4. The sound source localization method according to claim 1, characterized in that obtaining, according to the valid frame signal subset, the averaged fused secondary-correlation generalized spectral-subtraction ameliorated phase transform function of any two channels of valid frame signals specifically comprises:
calculating, according to the valid frame signal subset and by combining auto-correlation and cross-correlation, the secondary correlation of any two channels of each valid frame signal;
calculating, according to the valid frame signal subset, the power spectrum of each channel of each valid frame signal;
obtaining, according to the power spectrum of each channel, the noise masking function $z_{pq}(\omega)$ of each channel of each valid frame signal,
in which $z_{pq}(\omega)$ denotes the noise masking function of channel q of the p-th valid frame signal, $X_{pq}(\omega)$ denotes the power spectrum of channel q of the p-th valid frame signal, q = 1, 2, 3, 4, $N(\omega)$ denotes the noise power spectrum, α denotes a first coefficient, and β denotes a second coefficient;
obtaining, according to the noise masking function of each channel of each valid frame signal and the secondary correlation of any two channels, the fused secondary-correlation generalized spectral-subtraction ameliorated phase transform function $\varphi_{ls\_p}(\omega)$ of any two channels of each valid frame signal,
in which $\varphi_{ls\_p}(\omega)$ denotes this function for channel l and channel s of the p-th valid frame signal, l = 1, 2, 3, 4, s = 1, 2, 3, 4, l ≠ s, $X_{pl}(\omega)$ and $X_{ps}(\omega)$ denote the power spectra of channel l and channel s of the p-th valid frame signal, respectively, and ρ denotes a third coefficient;
obtaining, according to the fused secondary-correlation generalized spectral-subtraction ameliorated phase transform functions of any two channels of each valid frame signal, the averaged function $\bar{\varphi}_{ls}(\omega)=\frac{1}{P}\sum_{p=1}^{P}\varphi_{ls\_p}(\omega)$ of any two channels of valid frame signals,
where $\bar{\varphi}_{ls}(\omega)$ denotes the averaged fused secondary-correlation generalized spectral-subtraction ameliorated phase transform function of the valid frame signals of channel l and channel s, and P denotes the number of valid frame signals in the valid frame signal subset.
5. The sound source localization method according to claim 1, characterized in that determining the direction of the sound source according to the geometric positions of the four-element microphone array and the time delays between the sound source signals of any two microphones specifically comprises:
calculating, according to the geometric positions of the four-element microphone array and the time delays between the sound source signals of any two microphones, the azimuth angle θ of the sound source with respect to the coordinate origin;
calculating, according to the geometric positions of the four-element microphone array and the time delays between the sound source signals of any two microphones, the pitch angle φ of the sound source with respect to the coordinate origin;
where c is the speed of sound, d is the distance from each microphone element to the coordinate origin, $\tau_{12}$ denotes the time delay between the sound source signals of microphone 1 and microphone 2, $\tau_{13}$ denotes the time delay between the sound source signals of microphone 1 and microphone 3, and $\tau_{14}$ denotes the time delay between the sound source signals of microphone 1 and microphone 4.
6. The sound source localization method according to claim 1, characterized in that, before performing synchronous framing on the four channels of sound source speech signals to obtain a frame signal set, the method further comprises:
performing speech enhancement on each channel of the sound source speech signal, to obtain speech-enhanced signals;
performing band-pass filtering on the speech-enhanced signals, to obtain band-pass filtered signals;
performing wavelet threshold denoising on the band-pass filtered signals, to obtain preprocessed sound source speech signals.
7. A sound source localization system, characterized in that the sound source localization system comprises:
a sound source speech signal acquisition module, configured to collect four channels of sound source speech signals with a four-element microphone array, the four-element microphone array comprising four microphones, each of which acquires one channel of the sound source speech signal;
a framing module, configured to perform synchronous framing on the four channels of sound source speech signals to obtain a frame signal set, each frame signal in the frame signal set comprising four channel frame signals, namely a first, a second, a third and a fourth channel frame signal;
a valid frame signal subset acquisition module, configured to judge the validity of each frame signal in the frame signal set to obtain a valid frame signal subset;
an averaged fused secondary-correlation generalized spectral-subtraction ameliorated phase transform function acquisition module, configured to obtain, according to the valid frame signal subset, the averaged fused secondary-correlation generalized spectral-subtraction ameliorated phase transform function of any two channels of valid frame signals;
a time delay calculation module, configured to obtain the time point corresponding to the maximum peak of the averaged fused secondary-correlation generalized spectral-subtraction ameliorated phase transform function of any two channels of valid frame signals, to obtain the time delay between the sound source signals of any two microphones;
a direction determination module, configured to determine the direction of the sound source according to the geometric positions of the four-element microphone array and the time delays between the sound source signals of any two microphones.
8. The sound source localization system according to claim 7, wherein the framing module specifically comprises:
A framing submodule, configured to apply a window function to the four channels of sound source speech signals for synchronous windowing and framing, obtaining frame signals $x_{ij}(n)$, where $n$ denotes the $n$-th sampling point, $n = 1, 2, \ldots, N$, $N$ denotes the frame length, and $x_{ij}(n)$ denotes the signal on the $j$-th channel of the $i$-th frame signal, $j = 1, 2, 3, 4$;
A synthesis submodule, configured to combine all the frame signals into the frame signal set.
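The claim does not reproduce its window formula. This sketch therefore assumes a Hamming window, with a frame length `N` and a hop size `hop` chosen by the caller (both are hypothetical parameters). What the sketch shows is the synchronous framing itself: every channel is cut at the same start index, so `frames[i][j]` corresponds to $x_{ij}(n)$ in the claim.

```python
import math
from typing import List

def hamming(N: int) -> List[float]:
    """Hamming window of length N (N > 1); an assumed choice of window."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def frame_channels(channels: List[List[float]], N: int, hop: int) -> List[List[List[float]]]:
    """Synchronously window and frame every channel.

    Returns frames with frames[i][j][n] = x_ij(n), i.e. sample n of channel j
    in frame i (0-based here, whereas the claim counts from 1)."""
    length = min(len(ch) for ch in channels)  # use only the span all channels share
    w = hamming(N)
    frames = []
    for start in range(0, length - N + 1, hop):
        # The same start index for every channel keeps the frames time-aligned.
        frames.append([[ch[start + n] * w[n] for n in range(N)] for ch in channels])
    return frames
```

As an example, four 1000-sample channels with N = 256 and hop = 128 yield 6 frames, each holding 4 × 256 windowed samples.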
9. The sound source localization system according to claim 7, wherein the valid frame signal subset acquisition module specifically comprises:
A short-time frame energy calculation submodule, configured to calculate the short-time frame energy of the $j$-th channel frame signal of the $i$-th frame signal using the formula $E_{ij} = \sum_{n=1}^{N} x_{ij}^{2}(n)$, where $E_{ij}$ denotes the short-time frame energy of the $j$-th channel frame signal of the $i$-th frame signal, $n$ denotes the $n$-th sampling point, $n = 1, 2, \ldots, N$, and $N$ denotes the frame length;
A first judgment submodule, configured to judge whether the short-time frame energy of the $j$-th channel frame signal of the $i$-th frame signal is greater than a first preset threshold, obtaining a first judgment result;
A first judgment result processing submodule, configured to act on the first judgment result as follows. If it indicates that the short-time frame energy is not greater than the first preset threshold, the submodule increases the value of $i$ by 1 and calls the short-time frame energy calculation submodule to re-execute the step of calculating the short-time frame energy of the $j$-th channel frame signal of the $i$-th frame signal. If it indicates that the short-time frame energy is greater than the first preset threshold, the submodule sets the $i$-th frame signal as the starting point and increases the value of $i$ by 1;
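A zero-crossing rate calculation submodule, configured to calculate the zero-crossing rate of the $j$-th channel frame signal of the $i$-th frame signal using the formula $Z_{ij} = \frac{1}{2}\sum_{n=2}^{N}\left|\operatorname{sgn}[x_{ij}(n)] - \operatorname{sgn}[x_{ij}(n-1)]\right|$, where $\operatorname{sgn}[x] = 1$ if $x \geq 0$ and $\operatorname{sgn}[x] = -1$ if $x < 0$;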
A second judgment submodule, configured to judge whether the zero-crossing rate is greater than a second preset threshold, obtaining a second judgment result;
A second judgment result processing submodule, configured to set the label $T_{ij}$ of the $j$-th channel frame signal of the $i$-th frame signal to 1 if the second judgment result indicates that the zero-crossing rate is greater than the second preset threshold, and to set $T_{ij}$ to 0 if the second judgment result indicates that the zero-crossing rate is not greater than the second preset threshold;
A total state value calculation submodule, configured to calculate the total state value $SS(i)$ of the labels of the four channel frame signals of the $i$-th frame signal using the formula $SS(i) = T_{i1} \land T_{i2} \land T_{i3} \land T_{i4}$ (logical AND), where $T_{i1}$, $T_{i2}$, $T_{i3}$ and $T_{i4}$ denote the labels of the 1st, 2nd, 3rd and 4th channel frame signals of the $i$-th frame signal, respectively;
A third judgment submodule, configured to judge whether the total state value $SS(i)$ equals 1, obtaining a third judgment result;
A third judgment result processing submodule, configured to set the $i$-th signal frame as a valid signal frame if the third judgment result indicates that $SS(i)$ equals 1;
A fourth judgment submodule, configured to judge whether the short-time frame energy of the $j$-th channel frame signal of the $i$-th frame signal is less than a third preset threshold, obtaining a fourth judgment result;
A fourth judgment result processing submodule, configured to act on the fourth judgment result as follows. If it indicates that the short-time frame energy of the $j$-th channel frame signal of the $i$-th frame signal is less than the third preset threshold, the submodule sets the $i$-th signal frame as the terminating point of the speech signal, thereby obtaining the valid frame signal subset. If it indicates that the short-time frame energy is not less than the third preset threshold, the submodule increases the value of $i$ by 1 and calls the zero-crossing rate calculation submodule to re-execute the step of calculating the zero-crossing rate of the $j$-th channel frame signal of the $i$-th frame signal.
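The decision logic of claim 9 can be sketched as a small state machine. The sketch makes three assumptions:

- Short-time energy uses the standard definition, the sum of squared samples.
- Zero-crossing rate uses the standard definition, with sgn(0) taken as +1.
- A single reference channel `ref` drives the energy tests, because the claim does not say which channel j does. This choice is a hypothetical one.

Threshold values are left to the caller, and the function names are illustrative.

```python
from typing import List

Frame = List[List[float]]  # frame[j][n]: sample n of channel j

def short_time_energy(x: List[float]) -> float:
    """Sum of squared samples (assumed standard definition)."""
    return sum(v * v for v in x)

def zero_crossing_rate(x: List[float]) -> float:
    """0.5 * sum |sgn x(n) - sgn x(n-1)|, with sgn(0) taken as +1."""
    sgn = [1 if v >= 0 else -1 for v in x]
    return 0.5 * sum(abs(sgn[n] - sgn[n - 1]) for n in range(1, len(x)))

def valid_frame_indices(frames: List[Frame], th1: float, th2: float,
                        th3: float, ref: int = 0) -> List[int]:
    """Indices of valid frames between the detected start and end points.

    th1: start energy threshold, th2: zero-crossing threshold,
    th3: end energy threshold, ref: channel used for the energy tests
    (an assumption; the claim only says "the j-th channel")."""
    i = 0
    # Starting point: advance until the reference-channel energy exceeds th1.
    while i < len(frames) and short_time_energy(frames[i][ref]) <= th1:
        i += 1
    if i == len(frames):
        return []  # no starting point found
    i += 1  # as in the claim, move past the starting-point frame
    valid = []
    while i < len(frames):
        # T_ij is True when channel j's zero-crossing rate exceeds th2.
        labels = [zero_crossing_rate(ch) > th2 for ch in frames[i]]
        if all(labels):  # SS(i) = T_i1 && T_i2 && T_i3 && T_i4
            valid.append(i)
        if short_time_energy(frames[i][ref]) < th3:
            break  # terminating point of the speech signal
        i += 1
    return valid
```

This sketch follows the claim literally in two respects. First, it skips the starting-point frame before zero-crossing labelling begins. Second, the terminating frame can still be marked valid when all four of its labels are 1, because the SS(i) check comes before the end-point test.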
10. The sound source localization system according to claim 7, wherein the fused secondary-correlation average generalized spectral-subtraction modified phase transform function acquisition module specifically comprises:
A secondary correlation calculation submodule, configured to calculate, according to the valid frame signal subset, the secondary correlation of any two channel frame signals of each valid frame signal by combining autocorrelation and cross-correlation;
A power spectrum calculation submodule, configured to calculate, according to the valid frame signal subset, the power spectrum of each channel frame signal of each valid frame signal;
A noise masking function acquisition submodule, configured to obtain, according to the power spectrum of each channel frame signal, the noise masking function of each channel frame signal of each valid frame signal:
where $z_{pq}(\omega)$ denotes the noise masking function of the $q$-th channel frame signal of the $p$-th valid frame signal, $X_{pq}(\omega)$ denotes the power spectrum of the $q$-th channel frame signal of the $p$-th valid frame signal, $q = 1, 2, 3, 4$, $N(\omega)$ denotes the noise power spectrum, $\alpha$ denotes a first coefficient, and $\beta$ denotes a second coefficient;
A fused secondary-correlation generalized spectral-subtraction modified phase transform function acquisition submodule, configured to obtain, according to the noise masking function of each channel frame signal of each valid frame signal and the secondary correlation of any two channel frame signals, the fused secondary-correlation generalized spectral-subtraction modified phase transform function of any two channel frame signals of each valid frame signal:
where $\varphi_{ls\_p}(\omega)$ denotes the fused secondary-correlation generalized spectral-subtraction modified phase transform function of the $l$-th and $s$-th channel frame signals of the $p$-th valid frame signal, $l, s \in \{1, 2, 3, 4\}$, $l \neq s$, $X_{pl}(\omega)$ and $X_{ps}(\omega)$ denote the power spectra of the $l$-th and $s$-th channel frame signals of the $p$-th valid frame signal, respectively, and $\rho$ denotes a third coefficient;
A fused secondary-correlation average generalized spectral-subtraction modified phase transform function acquisition submodule, configured to obtain, according to the fused secondary-correlation generalized spectral-subtraction modified phase transform function of any two channel frame signals of each valid frame signal, the fused secondary-correlation average generalized spectral-subtraction modified phase transform function of any two channels of valid frame signals:
$\bar{\varphi}_{ls}(\omega) = \frac{1}{P}\sum_{p=1}^{P}\varphi_{ls\_p}(\omega)$, where $\bar{\varphi}_{ls}(\omega)$ denotes the fused secondary-correlation average generalized spectral-subtraction modified phase transform function of the $l$-th and $s$-th channel valid frame signals, and $P$ denotes the number of valid frame signals in the valid frame signal subset.
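The noise-masking and spectral-subtraction formulas of claim 10 are not reproduced in this text, so the sketch below does not implement the patent's function. In its place it uses the standard GCC-PHAT weighting, $X_l(\omega)X_s^{*}(\omega)/|X_l(\omega)X_s^{*}(\omega)|$. With that substitution, it shows the two steps the claims do describe explicitly:

- averaging a per-frame weighting function over the P valid frames;
- reading the time delay off the lag of the maximum peak, as in claim 7.

A plain O(N²) DFT keeps the sketch dependency-free. All function names are hypothetical.

```python
import cmath
from typing import List

def dft(x: List[float]) -> List[complex]:
    """Plain O(N^2) discrete Fourier transform."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft_real(X: List[complex]) -> List[float]:
    """Inverse DFT, keeping only the real part."""
    N = len(X)
    return [(sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N).real
            for n in range(N)]

def averaged_gcc_phat_delay(frames_l: List[List[float]], frames_s: List[List[float]],
                            eps: float = 1e-12) -> int:
    """Delay in samples of channel l relative to channel s, estimated from
    P pairs of valid frames (frames_l[p], frames_s[p]) of equal length N."""
    N = len(frames_l[0])
    P = len(frames_l)
    avg = [0j] * N
    for xl, xs in zip(frames_l, frames_s):
        Xl, Xs = dft(xl), dft(xs)
        for k in range(N):
            cross = Xl[k] * Xs[k].conjugate()
            # PHAT weighting (unit magnitude), accumulated into a mean over P frames.
            avg[k] += cross / (abs(cross) + eps) / P
    r = idft_real(avg)                         # averaged generalized cross-correlation
    peak = max(range(N), key=lambda n: r[n])   # lag of the maximum peak
    return peak if peak <= N // 2 else peak - N  # circular lag -> signed lag
```

A positive result means channel l lags channel s. Dividing the result by the sampling rate converts samples to seconds. Lags above N/2 are mapped to negative values because a DFT-based correlation is circular.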
CN201910312565.6A 2019-04-18 2019-04-18 Sound source positioning method and system Active CN110007276B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910312565.6A CN110007276B (en) 2019-04-18 2019-04-18 Sound source positioning method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910312565.6A CN110007276B (en) 2019-04-18 2019-04-18 Sound source positioning method and system

Publications (2)

Publication Number Publication Date
CN110007276A true CN110007276A (en) 2019-07-12
CN110007276B CN110007276B (en) 2021-01-12

Family

ID=67172766

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910312565.6A Active CN110007276B (en) 2019-04-18 2019-04-18 Sound source positioning method and system

Country Status (1)

Country Link
CN (1) CN110007276B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110703198A (en) * 2019-10-22 2020-01-17 哈尔滨工程大学 Quaternary cross array envelope spectrum estimation method based on frequency selection
CN110706717A (en) * 2019-09-06 2020-01-17 西安合谱声学科技有限公司 Microphone array panel-based human voice detection orientation method

Citations (19)

Publication number Priority date Publication date Assignee Title
CN101901602A (en) * 2010-07-09 2010-12-01 中国科学院声学研究所 Method for reducing noise by using hearing threshold of impaired hearing
CN102110441A (en) * 2010-12-22 2011-06-29 中国科学院声学研究所 Method for generating sound masking signal based on time reversal
CN102707262A (en) * 2012-06-20 2012-10-03 太仓博天网络科技有限公司 Sound localization system based on microphone array
CN103235287A (en) * 2013-04-17 2013-08-07 华北电力大学(保定) Sound source localization camera shooting tracking device
KR20130114437A (en) * 2012-04-09 2013-10-17 주식회사 센서웨이 The time delay estimation method based on cross-correlation and apparatus thereof
EP2680263A1 (en) * 2012-06-27 2014-01-01 Orange Estimation of low complexity coupling
CN103607361A (en) * 2013-06-05 2014-02-26 西安电子科技大学 Time frequency overlap signal parameter estimation method under Alpha stable distribution noise
EP2543037B1 (en) * 2010-03-29 2014-03-05 Fraunhofer Gesellschaft zur Förderung der angewandten Wissenschaft E.V. A spatial audio processor and a method for providing spatial parameters based on an acoustic input signal
CN104076331A (en) * 2014-06-18 2014-10-01 南京信息工程大学 Sound source positioning method for seven-element microphone array
US9081083B1 (en) * 2011-06-27 2015-07-14 Amazon Technologies, Inc. Estimation of time delay of arrival
CN104991573A (en) * 2015-06-25 2015-10-21 北京品创汇通科技有限公司 Locating and tracking method and apparatus based on sound source array
CN106098077A (en) * 2016-07-28 2016-11-09 浙江诺尔康神经电子科技股份有限公司 Artificial cochlea's speech processing system of a kind of band noise reduction and method
CN106226739A (en) * 2016-07-29 2016-12-14 太原理工大学 Merge the double sound source localization method of Substrip analysis
CN107102296A (en) * 2017-04-27 2017-08-29 大连理工大学 A kind of sonic location system based on distributed microphone array
CN107644650A (en) * 2017-09-29 2018-01-30 山东大学 A kind of improvement sound localization method based on progressive serial orthogonalization blind source separation algorithm and its realize system
US20180074163A1 (en) * 2016-09-08 2018-03-15 Nanjing Avatarmind Robot Technology Co., Ltd. Method and system for positioning sound source by robot
CN108198568A (en) * 2017-12-26 2018-06-22 太原理工大学 A kind of method and system of more auditory localizations
CN108333575A (en) * 2018-02-02 2018-07-27 浙江大学 Moving sound time delay filtering method based on Gaussian prior and Operations of Interva Constraint
US20180359563A1 (en) * 2017-06-12 2018-12-13 Ryo Tanaka Method for accurately calculating the direction of arrival of sound at a microphone array

Non-Patent Citations (5)

Title
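LIXIA HUANG et al.: "Classification of Improved Cross-correlation Function to Determine Speaker Location from Microphone Array", 2019 IEEE Fifth International Conference on Big Data Computing Service and Applications (BigDataService) *
YUNMEI GONG et al.: "Time delays of arrival estimation for sound source location based on coherence method in correlated noise environments", 2010 Second International Conference on Communication Systems, Networks and Applications *
张传义 et al.: "Sound source localization technology based on the generalized cross-power spectrum phase method", Journal of Northeastern University (Natural Science) *
程方晓 et al.: "Sound source localization algorithm based on improved time delay estimation", Journal of Jilin University (Science Edition) *
黄丽霞 et al.: "Dual sound source localization fusing a smoothing filter and subband analysis", Computer Simulation *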

Cited By (4)

Publication number Priority date Publication date Assignee Title
CN110706717A (en) * 2019-09-06 2020-01-17 西安合谱声学科技有限公司 Microphone array panel-based human voice detection orientation method
CN110706717B (en) * 2019-09-06 2021-11-09 西安合谱声学科技有限公司 Microphone array panel-based human voice detection orientation method
CN110703198A (en) * 2019-10-22 2020-01-17 哈尔滨工程大学 Quaternary cross array envelope spectrum estimation method based on frequency selection
CN110703198B (en) * 2019-10-22 2022-03-22 哈尔滨工程大学 Quaternary cross array envelope spectrum estimation method based on frequency selection

Also Published As

Publication number Publication date
CN110007276B (en) 2021-01-12

Similar Documents

Publication Publication Date Title
CN104076331B (en) A kind of sound localization method of seven yuan of microphone arrays
CN107102296B (en) Sound source positioning system based on distributed microphone array
CN106872944B (en) Sound source positioning method and device based on microphone array
CN104459625B (en) The sound source locating device and method of two-microphone array are moved based on track
CN105068048B (en) Distributed microphone array sound localization method based on spatial sparsity
CN111239687B (en) Sound source positioning method and system based on deep neural network
CN109839612A (en) Sounnd source direction estimation method based on time-frequency masking and deep neural network
CN103308889B (en) Passive sound source two-dimensional DOA (direction of arrival) estimation method under complex environment
CN110444208A (en) A kind of speech recognition attack defense method and device based on gradient estimation and CTC algorithm
CN109036467B (en) TF-LSTM-based CFFD extraction method, voice emotion recognition method and system
CN104991573A (en) Locating and tracking method and apparatus based on sound source array
CN108877827A (en) Voice-enhanced interaction method and system, storage medium and electronic equipment
CN103901401A (en) Binaural sound source positioning method based on binaural matching filter
CN105204001A (en) Sound source positioning method and system
CN110007276A (en) A kind of sound localization method and system
CN108877809A (en) A kind of speaker's audio recognition method and device
CN107167770A (en) A kind of microphone array sound source locating device under the conditions of reverberation
CN110534126A (en) A kind of auditory localization and sound enhancement method and system based on fixed beam formation
CN103901400B (en) A kind of based on delay compensation and ears conforming binaural sound source of sound localization method
CN108896962A (en) Iteration localization method based on sound position fingerprint
CN106371057B (en) Voice sound source direction-finding method and device
CN107144818A (en) Binaural sound sources localization method based on two-way ears matched filter Weighted Fusion
CN104535964A (en) Helmet type microphone array sound source positioning method based on low-frequency diffraction delay inequalities
CN106886010A (en) A kind of sound bearing recognition methods based on mini microphone array
CN109461447A (en) A kind of end-to-end speaker's dividing method and system based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant