CN110491411A - Method for separating speakers by combining microphone sound-source angle and voice-feature similarity - Google Patents
- Publication number
- CN110491411A (application CN201910908195.2A)
- Authority
- CN
- China
- Prior art keywords
- speaker
- thres
- threshold
- variable rate
- angle variable
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
- G10L21/028—Voice signal separating using properties of sound source
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
- G10L21/0308—Voice signal separating characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
Abstract
The invention discloses a method for separating speakers by combining microphone sound-source angle and voice-feature similarity. The steps of the method include: calculating in real time the angle variable rate of the microphone sound-source signal relative to the microphone; calculating in real time the probability change value of the speaker according to the feature similarity of the voice signal input to the microphone; and combining the angle variable rate and the probability change value to judge in real time whether the speaker has changed. By combining the microphone sound-source angle with the voice signal for speaker separation, the invention not only improves the flexibility and accuracy of speaker separation but also reduces its restrictive conditions.
Description
Technical field
The present invention relates to the field of computer technology, in particular to speech-separation technology, and more specifically to a method for separating speakers by combining the microphone sound-source angle with the feature similarity of the voice signal.
Background technique
Current speaker-separation techniques generally use one of the following two methods:
1. Separating speakers by the difference in their angles in front of the microphone. The drawback of this method is that when several speakers sit at similar angles to the microphone it is difficult to tell them apart; moreover, the method requires that within one recording the sound-source angle obtained by the microphone stay constant (neither the source nor the microphone moves) to guarantee precision, so its flexibility is poor.
2. Separating speakers from the voice signal. The advantage of this method is that it does not depend on the hardware (the microphone); the drawback is that it is strongly affected by signal quality (both noise and reverberation degrade it), so its accuracy is poor, and a large number of speakers or overlapping speech leads to poor performance.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method for separating speakers by combining microphone sound-source angle and voice-feature similarity that has few restrictive conditions and high flexibility and accuracy.
To solve the above technical problem, the steps of the method of the present invention for separating speakers by combining microphone sound-source angle and voice-feature similarity include:
calculating in real time the angle variable rate of the microphone sound-source signal relative to the microphone;
calculating in real time the probability change value of the speaker according to the feature similarity of the voice signal input to the microphone;
combining the angle variable rate and the probability change value to judge in real time whether the speaker has changed.
The threshold thres of the angle variable rate is calculated as thres = v / r, where v is the speaker's movement speed and r is the distance between the speaker and the microphone.
When v is the maximum speed of slow human walking, the threshold of the angle variable rate is thres_1; when v is the maximum speed of brisk human walking, the threshold of the angle variable rate is thres_2. The two thresholds of the probability change value are threshold_1 and threshold_2. The judgment method is:
when the angle variable rate is less than thres_1 and the probability change value is less than threshold_2, the speaker is judged to be the same;
when the angle variable rate is less than thres_1 but the probability change value is threshold_2 or more, the speaker is judged to be different;
when the angle variable rate is thres_1 or more but less than thres_2 and the probability change value is less than threshold_1, the speaker is judged to be the same;
when the angle variable rate is thres_1 or more but less than thres_2 and the probability change value is threshold_1 or more, the speaker is judged to be different;
when the angle variable rate is thres_2 or more, the speaker is judged to be different.
The value range of r is preferably 0.2–0.5 m, the value range of thres_1 is preferably 0.17–0.43 °/ms, and the value range of thres_2 is preferably 0.23–0.57 °/ms.
threshold_1 is preferably 0.3 and threshold_2 is preferably 0.5.
The features of the voice signal include voiceprint features.
By tracking the microphone sound-source angle and the voice-feature similarity synchronously and correcting each in real time with the other, and by combining the angle-change behaviour with the voice-recognition result to separate speakers, the present invention not only improves the flexibility and accuracy of speaker separation but also reduces its restrictive conditions.
Description of the drawings
Fig. 1 shows the angle of the sounding position relative to the microphone as it changes over time.
Specific embodiments
To give a more concrete understanding of the technical content, features and effects of the present invention, the technical solution of the invention is described in further detail below in conjunction with specific embodiments.
1. Locating and separating speakers using the microphone sound-source signal
The principle of separating speakers with the microphone sound-source signal is: the microphone contains sound-pickup units facing several different directions, and at any instant these units receive the same source signal with different phase differences. From these phase differences the specific bearing of the sound source relative to the microphone can be computed, and speakers can be separated by bearing (same bearing, same speaker; different bearing, different speaker). The algorithm's confidence is computed as 1 minus the relative change of the bearing with respect to the same person's previous general bearing. The algorithm supports streaming computation, so the speaker at each instant can be determined in real time.
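The patent does not spell out how the phase differences are turned into a bearing. A common textbook approach, shown here as an illustrative sketch (the two-microphone far-field model, function names, and the plain cross-correlation delay estimator are assumptions, not the patent's method), estimates the inter-microphone time delay and converts it to an angle via sin(θ) = c·τ/d:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, in air at roughly 20 °C

def doa_from_tdoa(tau_s: float, mic_spacing_m: float) -> float:
    """Far-field direction of arrival (degrees from broadside) for a
    two-microphone pair, given the inter-microphone delay tau_s."""
    # sin(theta) = c * tau / d; clamp for numerical safety
    s = max(-1.0, min(1.0, SPEED_OF_SOUND * tau_s / mic_spacing_m))
    return math.degrees(math.asin(s))

def tdoa_by_cross_correlation(x, y, fs):
    """Estimate how many seconds y lags x (positive lag => y is delayed)
    by finding the lag that maximises the cross-correlation.
    Plain O(N^2) loop for clarity; real systems use FFT-based GCC-PHAT."""
    n = len(x)
    best_lag, best_val = 0, float("-inf")
    for lag in range(-(n - 1), n):
        v = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                v += x[i] * y[j]
        if v > best_val:
            best_val, best_lag = v, lag
    return best_lag / fs
```

A source exactly broadside to the pair produces zero delay and therefore a 0° bearing; a one-sample delay at 8 kHz with 5 cm spacing maps to roughly 59°.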
The specific method is as follows:
Sound is picked up by the microphone, giving the angle of the sounding position relative to the microphone over a period of time: θ = (θ1, θ2, …, θT).
If the speaker moves while speaking, assume the speaker's movement speed does not exceed v (unit: m/s) and the distance between the speaker and the microphone is r (unit: m). The threshold thres of the angle variable rate dθ then follows from the formula:
thres = v / r
where the speaker's movement speed v and the speaker–microphone distance r are adjustable parameters.
Under normal circumstances a speaker's movement speed does not exceed a slow human walking pace, i.e. v ≤ 1.5 m/s. Taking v = 1.5 m/s therefore gives a smaller threshold for the angle variable rate:
thres_1 = 1.5 / r
If the angle variable rate between consecutive instants does not exceed thres_1, the two instants are judged to be the same person.
In addition, the speed of brisk human walking is about v = 2 m/s, which gives a larger threshold for the angle variable rate:
thres_2 = 2 / r
If the angle variable rate between consecutive instants exceeds thres_2, the speaker is judged to have changed.
If the angle variable rate lies between the two thresholds thres_1 and thres_2, the result of the voice-recognition separation algorithm is consulted to judge whether the speakers at the two instants are the same.
Under normal circumstances r ranges over [0.2, 0.5] m, so rough value ranges for thres_1 and thres_2 (not necessarily suitable for all situations), converted to degrees per millisecond, are:
thres_1 ∈ [0.17, 0.43] °/ms
thres_2 ∈ [0.23, 0.57] °/ms
For example, with a speaker–microphone distance of r = 0.3 m, an angle change of no more than 0.3 °/ms is judged as the same person. As shown in Fig. 1, between 10 and 180 ms the rate of angle change stays below 0.3 °/ms, so the speaker is judged to be the same in that period; between 180 and 210 ms the speaking angle rises abruptly and the rate of change exceeds 0.3 °/ms, so the speaker is considered to have changed.
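The threshold arithmetic above can be sketched as follows (a minimal illustration; the helper names and the strict/inclusive handling at the exact threshold boundaries are my assumptions):

```python
import math

def angle_rate_thresholds(r_m, v_slow=1.5, v_brisk=2.0):
    """Angle-variable-rate thresholds in degrees per millisecond.
    thres = v / r is the largest angular rate (rad/s) a speaker at
    distance r moving at speed v can produce; convert to deg/ms."""
    to_deg_per_ms = math.degrees(1.0) / 1000.0  # rad/s -> deg/ms
    thres_1 = v_slow / r_m * to_deg_per_ms
    thres_2 = v_brisk / r_m * to_deg_per_ms
    return thres_1, thres_2

def classify_by_angle(d_theta, thres_1, thres_2):
    """'same' / 'different' / 'ambiguous' per the angle rule above."""
    if d_theta <= thres_1:
        return "same"
    if d_theta > thres_2:
        return "different"
    return "ambiguous"  # defer to the voice-similarity result
```

With r = 0.5 m this reproduces the lower ends of the stated ranges, thres_1 ≈ 0.17 °/ms and thres_2 ≈ 0.23 °/ms.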
2. Separating speakers using the voice signal
The principle of separating speakers with the voice signal is: extract features (for example, voiceprint-related features) from the signal at each instant and apply an incremental streaming clustering algorithm (each new sample is either assigned, according to its similarity to the existing clusters, to a previously seen cluster, or output as a new cluster); the clusters that emerge correspond to individual people, which realizes speaker separation. The confidence of the voice-recognition separation algorithm is negatively correlated with the average distance between the current instant and all clusters (the smaller the distance, the higher the confidence). The algorithm supports streaming computation, so the speaker at each instant can be determined in real time.
The specific method is as follows:
The voice-recognition separation algorithm outputs, through a softmax layer, the probability that the current speech belongs to each person; the probabilities sum to 1, and the person with the highest probability is taken as the separated speaker.
The probability output by the algorithm is allowed to fluctuate within a certain range between two consecutive instants. Two thresholds (threshold) are set at 0.3 and 0.5: if the probability value p_{t+1} at the later instant differs from the probability value p_t at the earlier instant by less than 0.3, the speakers at the two instants are still considered the same person; if the change exceeds 0.5, the speaker is judged to have changed; if the probability change lies between 0.3 and 0.5, the microphone sound-source localization result must be consulted to judge whether the speakers at the two instants are the same.
Table 1: probability output of the voice-recognition separation algorithm
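A minimal sketch of the probability rule, together with a toy incremental streaming clusterer in the spirit described above (the 1-D "embedding", similarity threshold, and all names are illustrative assumptions; a real system would cluster voiceprint vectors):

```python
def classify_by_probability(p_prev, p_curr, low=0.3, high=0.5):
    """'same' / 'different' / 'ambiguous' per the probability rule."""
    d = abs(p_curr - p_prev)
    if d < low:
        return "same"
    if d > high:
        return "different"
    return "ambiguous"  # defer to the sound-source angle result

def stream_cluster(samples, sim_threshold=0.8):
    """Toy incremental streaming clustering over 1-D 'embeddings':
    each sample joins the nearest existing cluster centroid if it is
    close enough, otherwise it starts a new cluster.
    Returns one cluster label (speaker id) per sample."""
    centroids, counts, labels = [], [], []
    for x in samples:
        best, best_d = None, float("inf")
        for k, c in enumerate(centroids):
            d = abs(x - c)
            if d < best_d:
                best, best_d = k, d
        if best is not None and best_d <= sim_threshold:
            counts[best] += 1
            centroids[best] += (x - centroids[best]) / counts[best]
            labels.append(best)
        else:
            centroids.append(float(x))
            counts.append(1)
            labels.append(len(centroids) - 1)
    return labels
```

Because each sample only touches the current centroids, the clusterer runs in a streaming fashion, matching the real-time claim in the text.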
3. Fusing microphone sound-source localization with the voice-recognition result
Combining the result of the microphone sound-source localization algorithm with that of the voice-recognition separation algorithm, so that the two results correct each other, yields a more accurate speaker separation. The specific judgment method is as follows (see Table 2):
If the microphone sound bearing moves within the thres_2 threshold, i.e. the angle variable rate between consecutive instants stays below thres_2, and the probability change at those instants output by the voice-recognition separation algorithm is less than 0.3, the instants are judged to belong to the same speaker;
If the probability change output by the voice-recognition separation algorithm is within 0.5 and the angle variable rate between consecutive instants is within thres_1, the instants are judged to belong to the same speaker;
If the probability change exceeds 0.5, or the angle variable rate between consecutive instants exceeds thres_2, the speaker is judged to have changed;
If the voice-recognition result and the sound-source localization result both fluctuate in the larger range, i.e. the probability change lies between 0.3 and 0.5 and the angle variable rate lies between thres_1 and thres_2, the speakers at the two instants are judged to be different.
Table 2: speaker judgment criteria after fusing the two results
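The fused judgment rules can be sketched as a single function (an illustrative reading of the rules; the inclusive/exclusive conventions at the exact threshold boundaries are my assumption, since the text and the claims differ slightly on them):

```python
def fuse_decision(d_theta, d_p, thres_1, thres_2,
                  threshold_1=0.3, threshold_2=0.5):
    """Combined speaker-change decision.
    d_theta: angle variable rate between consecutive instants
             (same units as thres_1 / thres_2, e.g. deg/ms);
    d_p:     probability change between the same instants."""
    if d_theta >= thres_2 or d_p >= threshold_2:
        return "different"   # either cue is decisive on its own
    if d_theta < thres_1:
        return "same"        # angle stable: any d_p below threshold_2 is tolerated
    if d_p < threshold_1:
        return "same"        # angle in the grey zone: needs a stable probability
    return "different"       # both cues sit in their ambiguous bands
```

For instance, with thres_1 = 0.17 and thres_2 = 0.23 °/ms, an angle rate of 0.20 °/ms is accepted as the same speaker only while the probability change stays below 0.3.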
Claims (6)
1. A method for separating speakers by combining microphone sound-source angle and voice-feature similarity, characterized in that the steps include:
calculating in real time the angle variable rate of the microphone sound-source signal relative to the microphone;
calculating in real time the probability change value of the speaker according to the feature similarity of the voice signal input to the microphone;
combining the angle variable rate and the probability change value to judge in real time whether the speaker has changed.
2. The method according to claim 1, characterized in that the threshold thres of the angle variable rate is calculated as thres = v / r, where v is the speaker's movement speed and r is the distance between the speaker and the microphone.
3. The method according to claim 2, characterized in that when v is the maximum speed of slow human walking, the threshold of the angle variable rate is thres_1; when v is the maximum speed of brisk human walking, the threshold of the angle variable rate is thres_2; the two thresholds of the probability change value are threshold_1 and threshold_2; and the judgment method is:
when the angle variable rate is less than thres_1 and the probability change value is less than threshold_2, the speaker is judged to be the same;
when the angle variable rate is less than thres_1 but the probability change value is threshold_2 or more, the speaker is judged to be different;
when the angle variable rate is thres_1 or more but less than thres_2 and the probability change value is less than threshold_1, the speaker is judged to be the same;
when the angle variable rate is thres_1 or more but less than thres_2 and the probability change value is threshold_1 or more, the speaker is judged to be different;
when the angle variable rate is thres_2 or more, the speaker is judged to be different.
4. The method according to claim 1, characterized in that the value range of r is 0.2–0.5 m, the value range of thres_1 is 0.17–0.43 °/ms, and the value range of thres_2 is 0.23–0.57 °/ms.
5. The method according to claim 1, characterized in that threshold_1 is 0.3 and threshold_2 is 0.5.
6. The method according to claim 1, characterized in that the features of the voice signal include voiceprint features.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910908195.2A CN110491411B (en) | 2019-09-25 | 2019-09-25 | Method for separating speaker by combining microphone sound source angle and voice characteristic similarity |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910908195.2A CN110491411B (en) | 2019-09-25 | 2019-09-25 | Method for separating speaker by combining microphone sound source angle and voice characteristic similarity |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110491411A true CN110491411A (en) | 2019-11-22 |
CN110491411B CN110491411B (en) | 2022-05-17 |
Family
ID=68544207
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910908195.2A Active CN110491411B (en) | 2019-09-25 | 2019-09-25 | Method for separating speaker by combining microphone sound source angle and voice characteristic similarity |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110491411B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100198598A1 (en) * | 2009-02-05 | 2010-08-05 | Nuance Communications, Inc. | Speaker Recognition in a Speech Recognition System |
US20120045066A1 (en) * | 2010-08-17 | 2012-02-23 | Honda Motor Co., Ltd. | Sound source separation apparatus and sound source separation method |
US20160111112A1 (en) * | 2014-10-17 | 2016-04-21 | Fujitsu Limited | Speaker change detection device and speaker change detection method |
CN107297745A (en) * | 2017-06-28 | 2017-10-27 | 上海木爷机器人技术有限公司 | voice interactive method, voice interaction device and robot |
CN108305615A (en) * | 2017-10-23 | 2018-07-20 | 腾讯科技(深圳)有限公司 | A kind of object identifying method and its equipment, storage medium, terminal |
US20180286411A1 (en) * | 2017-03-29 | 2018-10-04 | Honda Motor Co., Ltd. | Voice processing device, voice processing method, and program |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111524527A (en) * | 2020-04-30 | 2020-08-11 | 合肥讯飞数码科技有限公司 | Speaker separation method, device, electronic equipment and storage medium |
CN111524527B (en) * | 2020-04-30 | 2023-08-22 | 合肥讯飞数码科技有限公司 | Speaker separation method, speaker separation device, electronic device and storage medium |
CN112382306A (en) * | 2020-12-02 | 2021-02-19 | 苏州思必驰信息科技有限公司 | Method and device for separating speaker audio |
CN112382306B (en) * | 2020-12-02 | 2022-05-10 | 思必驰科技股份有限公司 | Method and device for separating speaker audio |
CN113362831A (en) * | 2021-07-12 | 2021-09-07 | 科大讯飞股份有限公司 | Speaker separation method and related equipment thereof |
Also Published As
Publication number | Publication date |
---|---|
CN110491411B (en) | 2022-05-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9818431B2 (en) | Multi-speaker speech separation | |
CN110970053B (en) | Multichannel speaker-independent voice separation method based on deep clustering | |
CN110491411A (en) | In conjunction with the method for microphone sound source angle and phonetic feature similarity separation speaker | |
US20180358003A1 (en) | Methods and apparatus for improving speech communication and speech interface quality using neural networks | |
US8996367B2 (en) | Sound processing apparatus, sound processing method and program | |
Nakadai et al. | Improvement of recognition of simultaneous speech signals using av integration and scattering theory for humanoid robots | |
CN107621625B (en) | Sound source positioning method based on double micro microphones | |
CN110610718B (en) | Method and device for extracting expected sound source voice signal | |
CN106887233B (en) | Audio data processing method and system | |
US11222652B2 (en) | Learning-based distance estimation | |
US20100111290A1 (en) | Call Voice Processing Apparatus, Call Voice Processing Method and Program | |
TW202147862A (en) | Robust speaker localization in presence of strong noise interference systems and methods | |
KR20210137146A (en) | Speech augmentation using clustering of queues | |
Ochi et al. | Multi-Talker Speech Recognition Based on Blind Source Separation with ad hoc Microphone Array Using Smartphones and Cloud Storage. | |
JP3925734B2 (en) | Target sound detection method, signal input delay time detection method, and sound signal processing apparatus | |
CN103901400A (en) | Binaural sound source positioning method based on delay compensation and binaural coincidence | |
Araki et al. | Meeting recognition with asynchronous distributed microphone array | |
KR102580828B1 (en) | Multi-channel voice activity detection | |
JP2005227512A (en) | Sound signal processing method and its apparatus, voice recognition device, and program | |
US11528571B1 (en) | Microphone occlusion detection | |
WO2010127489A1 (en) | Detection signal delay method, detection device and encoder | |
Zhu et al. | Long-term speech information based threshold for voice activity detection in massive microphone network | |
WO2021021814A3 (en) | Acoustic zoning with distributed microphones | |
CN113327589B (en) | Voice activity detection method based on attitude sensor | |
Lee et al. | End-to-End Multi-Channel Speech Enhancement Using Inter-Channel Time-Restricted Attention on Raw Waveform. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||