CN110491411A - Method for separating speakers by combining microphone sound-source angle and speech-feature similarity - Google Patents

Method for separating speakers by combining microphone sound-source angle and speech-feature similarity Download PDF

Info

Publication number
CN110491411A
CN110491411A CN201910908195.2A
Authority
CN
China
Prior art keywords
speaker
thres
threshold
variable rate
angle variable
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910908195.2A
Other languages
Chinese (zh)
Other versions
CN110491411B (en)
Inventor
汪俊
李索恒
张志齐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Yitu Information Technology Co Ltd
Original Assignee
Shanghai Yitu Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Yitu Information Technology Co Ltd filed Critical Shanghai Yitu Information Technology Co Ltd
Priority to CN201910908195.2A priority Critical patent/CN110491411B/en
Publication of CN110491411A publication Critical patent/CN110491411A/en
Application granted granted Critical
Publication of CN110491411B publication Critical patent/CN110491411B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272Voice signal separating
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272Voice signal separating
    • G10L21/028Voice signal separating using properties of sound source
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272Voice signal separating
    • G10L21/0308Voice signal separating characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Abstract

The invention discloses a method for separating speakers that combines the microphone sound-source angle with speech-feature similarity. The method comprises: computing, in real time, the angle-change rate of the microphone sound-source signal relative to the microphone; computing, in real time, the speaker probability-change value from the feature similarity of the speech signal received by the microphone; and combining the angle-change rate and the probability-change value to decide, in real time, whether the speaker has changed. By separating speakers through the combination of the sound-source angle and the speech signal, the invention not only improves the flexibility and accuracy of speaker separation but also reduces its restrictive conditions.

Description

Method for separating speakers by combining microphone sound-source angle and speech-feature similarity
Technical field
The present invention relates to the field of computer technology, in particular to speech-separation technology, and more specifically to a method for separating speakers that combines the microphone sound-source angle with the feature similarity of the speech signal.
Background technique
Current speaker-separation techniques generally use one of the following two methods:
1. Separating speakers by the difference in their angles relative to the microphone. The drawback of this method is that speakers whose angles to the microphone are close are hard to distinguish; moreover, it requires that within one recording the sound-source angle obtained by the microphone stay constant (neither the source nor the microphone moves) to guarantee precision, so it is inflexible.
2. Separating speakers from the speech signal alone. Its advantage is independence from the hardware (the microphone); its disadvantage is sensitivity to speech quality (noise and reverberation degrade it), so its accuracy is poor, and performance deteriorates when many people are present or several people speak at once.
Summary of the invention
The technical problem the present invention solves is to provide a method for separating speakers that combines the microphone sound-source angle with speech-feature similarity, a method with few restrictive conditions and high flexibility and accuracy.
To solve the above problem, the method of the invention for separating speakers by combining the microphone sound-source angle with speech-feature similarity comprises the steps of:
computing, in real time, the angle-change rate of the microphone sound-source signal relative to the microphone;
computing, in real time, the speaker probability-change value from the feature similarity of the speech signal received by the microphone;
combining the angle-change rate and the probability-change value to decide, in real time, whether the speaker has changed.
The threshold thres of the angle-change rate is computed as:
thres = v / r
where v is the speaker's movement speed and r is the distance between the speaker and the microphone.
When v is the maximum speed of slow human walking, the threshold of the angle-change rate is thres_1; when v is the maximum speed of brisk human walking, the threshold of the angle-change rate is thres_2. The probability-change value has two thresholds, threshold_1 and threshold_2. The decision rules are:
when the angle-change rate is below thres_1 and the probability-change value is below threshold_2, the speaker is judged to be the same;
when the angle-change rate is below thres_1 but the probability-change value is at or above threshold_2, the speaker is judged to have changed;
when the angle-change rate is at or above thres_1 but below thres_2 and the probability-change value is below threshold_1, the speaker is judged to be the same;
when the angle-change rate is at or above thres_1 but below thres_2 and the probability-change value is at or above threshold_1, the speaker is judged to have changed;
when the angle-change rate is at or above thres_2, the speaker is judged to have changed.
The value of r preferably ranges over 0.2–0.5 m; the value of thres_1 then preferably ranges over 0.17–0.43°/ms and that of thres_2 over 0.23–0.57°/ms.
threshold_1 is preferably 0.3 and threshold_2 is preferably 0.5.
The features of the speech signal include voiceprint features.
By synchronously tracking and mutually correcting the microphone sound-source angle and the speech-feature similarity, and by combining the change in sound-source angle with the speech-recognition result to separate speakers, the present invention not only improves the flexibility and accuracy of speaker separation but also reduces its restrictive conditions.
Detailed description of the invention
Fig. 1 plots the angle of the sounding position relative to the microphone against time.
Specific embodiment
To give a more concrete understanding of the technical content, features, and effects of the invention, its technical solution is described in further detail below with reference to a specific embodiment.
1. Locating and separating speakers with the microphone sound-source signal
The principle of separating speakers with the microphone sound-source signal is as follows. The microphone contains several receivers at different orientations; at any instant these receivers pick up the same source signal with different phase differences, from which the specific bearing of the source relative to the microphone can be computed, and speakers can then be separated by bearing (same bearing, same speaker; different bearings, different speakers). The confidence of the algorithm is computed as 1 minus the relative change of the bearing from the same person's previous general bearing. The algorithm supports streaming computation, so the speaker at each moment can be determined in real time.
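The bearing computation from inter-receiver delays can be sketched as follows. This is an illustrative far-field, two-microphone model rather than the patent's implementation; the function name, the microphone spacing, and the speed-of-sound constant are assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C (assumed)

def doa_from_delay(tau: float, mic_spacing: float) -> float:
    """Estimate the direction of arrival (degrees) of a far-field source
    from the time delay tau (seconds) between two microphones separated
    by mic_spacing (meters). 0 degrees means the source is broadside."""
    s = SPEED_OF_SOUND * tau / mic_spacing
    s = max(-1.0, min(1.0, s))  # clamp against numerical noise
    return math.degrees(math.asin(s))
```

In practice the delay tau would itself be estimated from the phase difference between the receivers (e.g. by cross-correlation); this sketch only covers the delay-to-angle step.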
The specific method is as follows:
Recording with the microphone yields the angle of the sounding position relative to the microphone over a period of time: θ = (θ_1, θ_2, …, θ_T).
If the speaker adjusts position while speaking, assume the speaker's movement speed does not exceed v (unit: m/s) and the speaker-microphone distance is r (unit: m). The threshold thres of the angle-change rate dθ is then given by:
thres = v / r
where the speaker's movement speed v and the speaker-microphone distance r are adjustable parameters.
Under normal circumstances the speaker's movement speed should not exceed the speed of slow human walking, i.e. v ≤ 1.5 m/s. Taking v = 1.5 m/s therefore gives the smaller angle-change-rate threshold thres_1:
thres_1 = 1.5 / r
If the angle-change rate between adjacent moments does not exceed thres_1, the speaker is judged to be the same person.
In addition, brisk human walking reaches v = 2 m/s, which gives the larger angle-change-rate threshold thres_2:
thres_2 = 2 / r
If the angle-change rate between adjacent moments exceeds thres_2, the speaker is judged to have changed.
If the angle-change rate between adjacent moments falls between the two thresholds thres_1 and thres_2, the result of the speech-recognition separation algorithm is consulted to decide whether the speaker at the two moments is the same.
Under normal circumstances r ranges over [0.2, 0.5] m, from which approximate ranges of thres_1 and thres_2 can be estimated (not necessarily suitable for all situations; unit: degrees per millisecond):
thres_1 ∈ [0.17, 0.43] °/ms
thres_2 ∈ [0.23, 0.57] °/ms
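The quoted ranges follow from converting the angular rate v/r (radians per second) to degrees per millisecond. A minimal sketch (the function name is illustrative):

```python
import math

def angle_rate_threshold(v: float, r: float) -> float:
    """Angle-change-rate threshold in degrees per millisecond: a speaker
    moving at speed v (m/s) at distance r (m) from the microphone sweeps
    an angle of at most v/r radians per second."""
    return (v / r) * (180.0 / math.pi) / 1000.0

# Endpoints of the ranges quoted above, for r in [0.2, 0.5] m:
thres_1_range = (angle_rate_threshold(1.5, 0.5), angle_rate_threshold(1.5, 0.2))
thres_2_range = (angle_rate_threshold(2.0, 0.5), angle_rate_threshold(2.0, 0.2))
```

Evaluating the endpoints reproduces thres_1 ∈ [0.17, 0.43] and thres_2 ∈ [0.23, 0.57] °/ms after rounding to two decimals.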
For example, with the speaker-microphone distance set to r = 0.3 m, angle changes of no more than 0.3°/ms are attributed to the same person. As shown in Fig. 1, the rate of angle change stays below 0.3°/ms over the interval 10–180 ms, so the speaker is judged to be the same; between 180 and 210 ms the speaking angle rises abruptly and the rate of change exceeds 0.3°/ms, so the speaker is considered to have changed.
2. Separating speakers with the speech signal
The principle of separating speakers with the speech signal is as follows. Features (for example, voiceprint-related features) are extracted from the speech signal at each moment, and an incremental streaming clustering algorithm (each new sample is either assigned to a previously known cluster according to its similarity to the existing clusters, or output as a new cluster) groups them; the resulting clusters correspond to individual people, realizing speaker separation. The confidence of the speech-recognition separation algorithm is negatively correlated with the average distance between the current moment and all clusters (the smaller the distance, the higher the confidence). The algorithm supports streaming computation, so the speaker at each moment can be determined in real time.
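The incremental streaming clustering step can be sketched as follows. This is a minimal centroid-based sketch, not the patent's algorithm; the similarity threshold and function name are assumptions.

```python
import numpy as np

def stream_cluster(embeddings, sim_threshold=0.7):
    """Incremental streaming clustering: each new voice embedding is
    assigned to the closest existing cluster (by cosine similarity to
    its running centroid) if similar enough, otherwise it starts a new
    cluster. Returns one cluster label per frame."""
    centroids, counts, labels = [], [], []
    for x in embeddings:
        x = np.asarray(x, dtype=float)
        x = x / np.linalg.norm(x)
        best, best_sim = -1, -1.0
        for i, c in enumerate(centroids):
            sim = float(np.dot(x, c / np.linalg.norm(c)))
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= sim_threshold:
            # running mean update of the matched centroid
            centroids[best] = (centroids[best] * counts[best] + x) / (counts[best] + 1)
            counts[best] += 1
            labels.append(best)
        else:
            centroids.append(x)
            counts.append(1)
            labels.append(len(centroids) - 1)
    return labels
```

Because each frame is processed as it arrives, the sketch matches the streaming property described above.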
The specific method is as follows:
Through a softmax layer, the speech-recognition separation algorithm outputs a probability for each candidate speaker; the probabilities sum to 1, and the speaker with the highest probability is taken as the separated speaker:
p_i = exp(z_i) / Σ_j exp(z_j), speaker = argmax_i p_i
The probability output by the speech-recognition separation algorithm is allowed to fluctuate within a certain range between adjacent moments. Two thresholds are set, at 0.3 and 0.5 respectively. If the change of the probability p_{t+1} at the later moment relative to the probability p_t at the earlier moment is less than 0.3, the speakers at the two moments are still considered the same person; if the change exceeds 0.5, the speaker at the two moments is judged to have changed; if the change lies between 0.3 and 0.5, the microphone sound-source localization result is needed to decide whether the speakers at the two moments are the same.
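The speech-channel decision just described can be sketched as a small function (names are illustrative; the defaults are the thresholds from the text):

```python
def speech_decision(p_prev: float, p_curr: float,
                    threshold_1: float = 0.3, threshold_2: float = 0.5) -> str:
    """Decide, from the speech channel alone, whether the speaker changed
    between two adjacent moments, based on the change in the softmax
    probability of the dominant speaker."""
    change = abs(p_curr - p_prev)
    if change < threshold_1:
        return "same"
    if change > threshold_2:
        return "changed"
    return "uncertain"  # defer to the microphone localization result
```

The "uncertain" branch is exactly the case the fusion step in the next section resolves.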
Table 1. Probability output of the speech-recognition separation algorithm
3. Fusing microphone sound-source localization with the speech-recognition result
Combining the result of the microphone sound-source localization algorithm with that of the speech-recognition separation algorithm, and letting the two results correct each other, yields a more accurate speaker-separation result. The specific decision rules are as follows (see Table 2):
if the angle-change rate of the microphone sound bearing between adjacent moments stays within thres_2 and the probability change output by the speech-recognition separation algorithm between those moments is below 0.3, the speaker is judged to be the same;
if the probability change output by the speech-recognition separation algorithm between adjacent moments is within 0.5 and the angle-change rate of the microphone sound source between those moments is within thres_1, the speaker is judged to be the same;
if the probability change output by the speech-recognition separation algorithm exceeds 0.5, or the angle-change rate of the microphone sound source exceeds thres_2, the speaker is judged to have changed;
if the speech-recognition separation result and the microphone sound-source localization result both fluctuate in the intermediate range, i.e. the probability change lies between 0.3 and 0.5 and the angle-change rate lies between thres_1 and thres_2, the speakers at the two moments are judged to be different.
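The fused decision rules above can be sketched as one function (an illustrative sketch; the function name is an assumption, and the defaults are the thresholds from the text):

```python
def fused_decision(d_angle: float, d_prob: float,
                   thres_1: float, thres_2: float,
                   threshold_1: float = 0.3, threshold_2: float = 0.5) -> bool:
    """Fuse the angle-change rate (d_angle) and the probability-change
    value (d_prob) per the rules above. Returns True when the speaker
    is judged to have changed."""
    if d_angle >= thres_2:
        return True                   # angle moved too fast: new speaker
    if d_angle < thres_1:
        return d_prob >= threshold_2  # angle stable: lenient speech test
    return d_prob >= threshold_1      # borderline angle: strict speech test
```

Only when the angle-change rate is in the borderline band [thres_1, thres_2) does the stricter speech threshold threshold_1 apply, which is how the two channels correct each other.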
Table 2. Speaker decision criteria after result fusion

Claims (6)

1. A method for separating speakers by combining the microphone sound-source angle and speech-feature similarity, characterized in that the steps comprise:
computing, in real time, the angle-change rate of the microphone sound-source signal relative to the microphone;
computing, in real time, the speaker probability-change value from the feature similarity of the speech signal received by the microphone;
combining the angle-change rate and the probability-change value to decide, in real time, whether the speaker has changed.
2. The method according to claim 1, characterized in that the threshold thres of the angle-change rate is computed as thres = v / r, where v is the speaker's movement speed and r is the distance between the speaker and the microphone.
3. The method according to claim 2, characterized in that when v is the maximum speed of slow human walking, the threshold of the angle-change rate is thres_1; when v is the maximum speed of brisk human walking, the threshold of the angle-change rate is thres_2; the probability-change value has two thresholds, threshold_1 and threshold_2; and the decision rules are:
when the angle-change rate is below thres_1 and the probability-change value is below threshold_2, the speaker is judged to be the same;
when the angle-change rate is below thres_1 but the probability-change value is at or above threshold_2, the speaker is judged to have changed;
when the angle-change rate is at or above thres_1 but below thres_2 and the probability-change value is below threshold_1, the speaker is judged to be the same;
when the angle-change rate is at or above thres_1 but below thres_2 and the probability-change value is at or above threshold_1, the speaker is judged to have changed;
when the angle-change rate is at or above thres_2, the speaker is judged to have changed.
4. The method according to claim 1, characterized in that r ranges over 0.2–0.5 m, thres_1 ranges over 0.17–0.43°/ms, and thres_2 ranges over 0.23–0.57°/ms.
5. The method according to claim 1, characterized in that threshold_1 is 0.3 and threshold_2 is 0.5.
6. The method according to claim 1, characterized in that the features of the speech signal include voiceprint features.
CN201910908195.2A 2019-09-25 2019-09-25 Method for separating speaker by combining microphone sound source angle and voice characteristic similarity Active CN110491411B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910908195.2A CN110491411B (en) 2019-09-25 2019-09-25 Method for separating speaker by combining microphone sound source angle and voice characteristic similarity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910908195.2A CN110491411B (en) 2019-09-25 2019-09-25 Method for separating speaker by combining microphone sound source angle and voice characteristic similarity

Publications (2)

Publication Number Publication Date
CN110491411A true CN110491411A (en) 2019-11-22
CN110491411B CN110491411B (en) 2022-05-17

Family

ID=68544207

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910908195.2A Active CN110491411B (en) 2019-09-25 2019-09-25 Method for separating speaker by combining microphone sound source angle and voice characteristic similarity

Country Status (1)

Country Link
CN (1) CN110491411B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111524527A (en) * 2020-04-30 2020-08-11 合肥讯飞数码科技有限公司 Speaker separation method, device, electronic equipment and storage medium
CN112382306A (en) * 2020-12-02 2021-02-19 苏州思必驰信息科技有限公司 Method and device for separating speaker audio
CN113362831A (en) * 2021-07-12 2021-09-07 科大讯飞股份有限公司 Speaker separation method and related equipment thereof

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100198598A1 (en) * 2009-02-05 2010-08-05 Nuance Communications, Inc. Speaker Recognition in a Speech Recognition System
US20120045066A1 (en) * 2010-08-17 2012-02-23 Honda Motor Co., Ltd. Sound source separation apparatus and sound source separation method
US20160111112A1 (en) * 2014-10-17 2016-04-21 Fujitsu Limited Speaker change detection device and speaker change detection method
CN107297745A (en) * 2017-06-28 2017-10-27 上海木爷机器人技术有限公司 voice interactive method, voice interaction device and robot
CN108305615A (en) * 2017-10-23 2018-07-20 腾讯科技(深圳)有限公司 A kind of object identifying method and its equipment, storage medium, terminal
US20180286411A1 (en) * 2017-03-29 2018-10-04 Honda Motor Co., Ltd. Voice processing device, voice processing method, and program


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111524527A (en) * 2020-04-30 2020-08-11 合肥讯飞数码科技有限公司 Speaker separation method, device, electronic equipment and storage medium
CN111524527B (en) * 2020-04-30 2023-08-22 合肥讯飞数码科技有限公司 Speaker separation method, speaker separation device, electronic device and storage medium
CN112382306A (en) * 2020-12-02 2021-02-19 苏州思必驰信息科技有限公司 Method and device for separating speaker audio
CN112382306B (en) * 2020-12-02 2022-05-10 思必驰科技股份有限公司 Method and device for separating speaker audio
CN113362831A (en) * 2021-07-12 2021-09-07 科大讯飞股份有限公司 Speaker separation method and related equipment thereof

Also Published As

Publication number Publication date
CN110491411B (en) 2022-05-17

Similar Documents

Publication Publication Date Title
US9818431B2 (en) Multi-speaker speech separation
CN110970053B (en) Multichannel speaker-independent voice separation method based on deep clustering
CN110491411A (en) In conjunction with the method for microphone sound source angle and phonetic feature similarity separation speaker
US20180358003A1 (en) Methods and apparatus for improving speech communication and speech interface quality using neural networks
US8996367B2 (en) Sound processing apparatus, sound processing method and program
Nakadai et al. Improvement of recognition of simultaneous speech signals using av integration and scattering theory for humanoid robots
CN107621625B (en) Sound source positioning method based on double micro microphones
CN110610718B (en) Method and device for extracting expected sound source voice signal
CN106887233B (en) Audio data processing method and system
US11222652B2 (en) Learning-based distance estimation
US20100111290A1 (en) Call Voice Processing Apparatus, Call Voice Processing Method and Program
TW202147862A (en) Robust speaker localization in presence of strong noise interference systems and methods
KR20210137146A (en) Speech augmentation using clustering of queues
Ochi et al. Multi-Talker Speech Recognition Based on Blind Source Separation with ad hoc Microphone Array Using Smartphones and Cloud Storage.
JP3925734B2 (en) Target sound detection method, signal input delay time detection method, and sound signal processing apparatus
CN103901400A (en) Binaural sound source positioning method based on delay compensation and binaural coincidence
Araki et al. Meeting recognition with asynchronous distributed microphone array
KR102580828B1 (en) Multi-channel voice activity detection
JP2005227512A (en) Sound signal processing method and its apparatus, voice recognition device, and program
US11528571B1 (en) Microphone occlusion detection
WO2010127489A1 (en) Detection signal delay method, detection device and encoder
Zhu et al. Long-term speech information based threshold for voice activity detection in massive microphone network
WO2021021814A3 (en) Acoustic zoning with distributed microphones
CN113327589B (en) Voice activity detection method based on attitude sensor
Lee et al. End-to-End Multi-Channel Speech Enhancement Using Inter-Channel Time-Restricted Attention on Raw Waveform.

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant