CN108520756A - Method and device for speaker speech separation - Google Patents

Method and device for speaker speech separation

Info

Publication number
CN108520756A
CN108520756A (application CN201810231676.XA)
Authority
CN
China
Prior art keywords
audio signal
audio
obtains
speaker
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810231676.XA
Other languages
Chinese (zh)
Other versions
CN108520756B (en)
Inventor
孙学京
刘恩
张晨
张兴涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Tuoling Inc
Original Assignee
Beijing Tuoling Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Tuoling Inc filed Critical Beijing Tuoling Inc
Priority to CN201810231676.XA priority Critical patent/CN108520756B/en
Publication of CN108520756A publication Critical patent/CN108520756A/en
Application granted granted Critical
Publication of CN108520756B publication Critical patent/CN108520756B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 Voice signal separating
    • G10L21/028 Voice signal separating using properties of sound source
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming

Abstract

The invention discloses a method and device for speaker speech separation. The method includes: obtaining an audio signal in a preset format; preprocessing the audio signal to obtain a processed first audio signal; performing audio separation on the first audio signal to obtain second audio signals of speakers in different directions; performing enhancement on the second audio signals to obtain enhanced third audio signals of the speakers in different directions; and outputting the third audio signals. The technical solution of the invention achieves fast and accurate separation of the audio signals of multiple speakers located in different directions.

Description

Method and device for speaker speech separation
Technical field
The present invention relates to the technical field of speech recognition, and in particular to a method and device for speaker speech separation.
Background technology
With the development of science and technology, every field demands ever higher audio quality. The ways of acquiring audio documents of all kinds are increasingly rich, and the volume of data is growing explosively, making the management of audio documents more and more difficult. In recent years, people have begun to study audio retrieval techniques to manage multimedia speech documents such as call speech, broadcast speech, and conference speech. Among these, conference speech is the most difficult to retrieve, because a conference speech document contains multiple channels and multiple speakers.
Existing audio separation methods are broadly divided into single-channel (single-microphone) techniques and multi-channel (multi-microphone) techniques. Single-microphone techniques mainly include model-based audio separation methods and distance-metric-based separation methods; multi-microphone techniques mainly include beamforming separation methods and blind source separation methods.
The model-based audio separation method includes two steps, training and recognition. In the training process, features are extracted from the input audio, a model is trained on them, and the trained model is stored. In the recognition process, features are extracted from the input audio, speaker separation and speaker clustering are performed, and the results are then matched against the stored models to decide which speaker is which, finally yielding the separated audio signals. The distance-metric-based separation method computes, at every point, the distance between the two signal segments of a certain window length to its left and right, compares that distance with a preset threshold to find the change points of the audio signal, and thereby obtains the separated audio signals. The beamforming separation method performs real-time sound source localization on the input audio and then applies enhancement according to each speaker's direction, obtaining the audio signal of each speaker. The blind source separation method applies blind source separation processing to the input audio to obtain the audio signal of each speaker.
However, the model-based separation method requires each speaker in the conversation to speak continuously for a fairly long time, and its algorithmic complexity is too high; the distance-metric-based separation method suffers from problems such as an excess of redundant candidate change points. Beamforming and blind source separation methods are designed mainly for linear or planar microphone arrays, and their performance in complex environments has certain shortcomings.
Therefore, separating the audio signals of multiple speakers in different directions quickly and accurately in complex environments is a technical problem that urgently needs to be solved.
Summary of the invention
The purpose of the present invention is to provide a method and device for speaker speech separation that quickly and accurately separates the audio signals of multiple speakers in different directions.
To achieve the above object, the present invention provides a method of speaker speech separation, including:
obtaining an audio signal in a preset format;
preprocessing the audio signal to obtain a processed first audio signal;
performing audio separation on the first audio signal to obtain second audio signals of speakers in different directions;
performing enhancement on the second audio signals to obtain enhanced third audio signals of the speakers in different directions;
outputting the third audio signals.
Further, in the method described above, preprocessing the audio signal to obtain the processed first audio signal includes:
obtaining the placement parameters of the microphone array and the environmental parameters;
transforming the audio signal according to the placement parameters of the microphone array to obtain transformed audio signals lying in the same plane;
applying a time-frequency transform to the transformed audio signals to obtain the corresponding frequency-domain signals;
applying audio enhancement to the frequency-domain signals according to the environmental parameters to obtain enhanced frequency-domain signals;
applying an inverse time-frequency transform to the enhanced frequency-domain signals to obtain time-domain signals, which serve as the first audio signal.
Further, in the method described above, performing audio separation on the first audio signal to obtain the second audio signals of speakers in different directions includes:
obtaining, from the first audio signal, the corresponding sound source localization result and speaker recognition result;
performing audio separation on the first audio signal according to the sound source localization result and the speaker recognition result to obtain the second audio signals.
Further, in the method described above, obtaining the sound source localization result and speaker recognition result corresponding to the first audio signal includes:
performing speech detection on the first audio signal to obtain a detection result;
performing sound source localization on the first audio signal according to the detection result to obtain the sound source localization result;
performing speaker recognition on the first audio signal according to a preset recognition model to obtain the speaker recognition result.
Further, in the method described above, performing audio separation on the first audio signal according to the sound source localization result and the speaker recognition result to obtain the second audio signals includes:
performing audio separation on the first audio signal using a beamforming method, according to the sound source localization result and the speaker recognition result, to obtain the second audio signals.
Further, in the method described above, performing audio separation on the first audio signal according to the sound source localization result and the speaker recognition result to obtain the second audio signals includes:
choosing an audio separation method corresponding to the sound source localization result;
performing audio separation on the first audio signal with the chosen audio separation method, according to the speaker recognition result, to obtain the second audio signals.
Further, in the method described above, enhancing the second audio signals to obtain the enhanced third audio signals includes:
smoothing the second audio signals and correcting the audio transition points based on the speaker recognition result, to obtain the third audio signals.
The present invention also provides a device for speaker speech separation, including:
an acquisition module for obtaining an audio signal in a preset format;
a preprocessing module for preprocessing the audio signal to obtain a processed first audio signal;
an audio separation module for performing audio separation on the first audio signal to obtain second audio signals of speakers in different directions;
an enhancement processing module for enhancing the second audio signals to obtain enhanced third audio signals;
an output module for outputting the third audio signals.
The method and device for speaker speech separation of the present invention preprocess the audio signal in a preset format to obtain a processed first audio signal, perform audio separation on the first audio signal to obtain second audio signals of speakers in different directions, enhance the second audio signals to obtain enhanced third audio signals of the speakers in different directions, and output the third audio signals, thereby achieving fast and accurate separation of the audio signals of multiple speakers in different directions.
Description of the drawings
Fig. 1 is a flowchart of an embodiment of the speaker speech separation method of the present invention;
Fig. 2 is a schematic diagram of the microphone-array placement used to acquire the four-channel audio signals of the present invention;
Fig. 3 is a structural schematic diagram of an embodiment of the speaker speech separation device of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the present invention clearer, the technical solutions of the embodiments are described clearly and completely below in conjunction with specific embodiments of the invention and the corresponding drawings. Obviously, the described embodiments are only a part of the embodiments, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Terms such as "first" and "second" (if present) in the specification, claims, and drawings are used to distinguish similar parts, not to describe a specific order or precedence. It should be understood that data so labelled are interchangeable where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated.
The following examples are used to illustrate the present invention but are not intended to limit its scope.
Embodiment 1
Fig. 1 is a flowchart of an embodiment of the speaker speech separation method of the present invention. As shown in Fig. 1, the speaker speech separation method of this embodiment may specifically include the following steps:
100. Obtain an audio signal in a preset format.
The audio signal in a preset format in this embodiment may be an Ambisonic A-format audio signal. An Ambisonic A-format audio signal consists of four audio channels: Left-Front-Up (LFU), Right-Front-Down (RFD), Left-Back-Down (LBD), and Right-Back-Up (RBU). Fig. 2 is a schematic diagram of the microphone-array placement used to acquire the four channels.
101. Preprocess the acquired audio signal to obtain a processed first audio signal.
In one specific implementation, when the audio signal in the preset format is obtained, the placement parameters of the microphone array and the environmental parameters can also be obtained. The acquired audio signal is then transformed according to the placement parameters so that the transformed audio signals lie in the same plane; a time-frequency transform is applied to the transformed signals to obtain the corresponding frequency-domain signals; audio enhancement is applied to the frequency-domain signals according to the environmental parameters to obtain enhanced frequency-domain signals; and an inverse time-frequency transform is applied to the enhanced frequency-domain signals to obtain time-domain signals, which serve as the first audio signal.
For example, after the placement of the microphone array is obtained, the audio signals can be rotated according to formula (1) based on that placement, so that the resulting signals lie in the same plane.
Here A is the transform matrix, θh is the heading (yaw) angle, θp is the pitch angle, θb is the roll angle, and f(θh, θp, θb) is a function of θh, θp, and θb.
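The patent's rotation formula (1) appears only as an image in the source, so its exact form is not available here. As a hedged illustration of the idea, the sketch below builds a 3-D rotation matrix from heading, pitch, and roll angles and applies it to first-order Ambisonic B-format channels (W, X, Y, Z); the function names, the Z-Y-X rotation order, and the use of B-format rather than A-format are this sketch's own assumptions, not the patent's transform.

```python
import numpy as np

def rotation_matrix(yaw, pitch, roll):
    """3-D rotation matrix built from heading (yaw), pitch, and roll,
    applied in Z-Y-X order. A hypothetical stand-in for the patent's
    transform matrix A in formula (1)."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    return Rz @ Ry @ Rx

def rotate_bformat(wxyz, yaw, pitch, roll):
    """Rotate first-order Ambisonic B-format signals (rows W, X, Y, Z):
    W is rotation-invariant; X, Y, Z rotate as a 3-D vector."""
    out = wxyz.copy()
    out[1:4] = rotation_matrix(yaw, pitch, roll) @ wxyz[1:4]
    return out
```

A zero rotation leaves the signals unchanged, and any rotation preserves the energy in the X/Y/Z channels, which is a quick sanity check on the matrix.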
After the transformed signals are obtained, methods such as the discrete Fourier transform (DFT) or the fast Fourier transform (FFT) can be used to apply the time-frequency transform channel by channel. Taking the DFT as an example, the time-frequency transform can be applied to the transformed signals according to formula (2):
Here n is the time-domain index, k is the frequency-domain index, L is the audio processing frame length, Lf is the length of the time-frequency transform, j is the imaginary unit, M is the number of channels, x(n) is an audio time-domain sample, and X(k) is an audio frequency coefficient.
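Formula (2) itself is an image in the source, but a framed per-channel DFT is standard. A minimal numpy sketch of the channel-by-channel time-frequency transform follows; the frame length, hop, and Hann window are illustrative choices, not values from the patent.

```python
import numpy as np

def stft_per_channel(x, frame_len=512, hop=256):
    """Frame a multichannel time-domain signal (channels x samples) and
    apply a windowed DFT per channel per frame, in the spirit of
    formula (2)."""
    win = np.hanning(frame_len)
    n_frames = 1 + (x.shape[1] - frame_len) // hop
    spec = np.empty((x.shape[0], n_frames, frame_len // 2 + 1), dtype=complex)
    for m in range(x.shape[0]):          # channel index (M channels)
        for t in range(n_frames):        # frame index
            frame = x[m, t * hop:t * hop + frame_len] * win
            spec[m, t] = np.fft.rfft(frame)   # X(k), k = frequency index
    return spec
```

A 1000 Hz tone sampled at 8 kHz lands exactly on bin 1000/8000 x 512 = 64, which makes the transform easy to verify.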
After the frequency-domain signals are obtained, the noise energy spectrum can be estimated from the four audio channels, and the reverberation energy spectrum can be estimated from the reverberation time (RT60) parameter and the direct-to-reverberant energy ratio (DRR) parameter. Based on the estimated noise and reverberation energy spectra, audio enhancement, such as denoising and dereverberation, is then applied channel by channel to the frequency-domain signals, yielding the enhanced frequency-domain signals.
In this embodiment, the received multi-channel audio signal can thus be preprocessed according to the placement parameters of the microphone array and the environmental parameters, reducing the influence of the environment on the subsequent audio separation.
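The RT60/DRR-based noise and reverberation estimators are not spelled out in this text. As a hedged stand-in for the denoising step, here is a basic spectral-subtraction gain applied to one channel's frequency-domain frames; the flooring constant is an arbitrary choice, and the patent's actual estimators may differ.

```python
import numpy as np

def spectral_subtract(spec, noise_psd, floor=0.05):
    """Simple power-domain noise reduction: attenuate each bin by the
    ratio of the estimated noise power to the observed power, keeping
    the original phase and flooring the gain to avoid musical noise."""
    mag2 = np.abs(spec) ** 2
    gain = np.maximum(1.0 - noise_psd / np.maximum(mag2, 1e-12), floor)
    return spec * gain
```

With a zero noise estimate the signal passes through untouched; with an overwhelming noise estimate the gain bottoms out at the floor, so the output never flips sign or vanishes entirely.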
102. Perform audio separation on the first audio signal to obtain second audio signals of speakers in different directions.
In this embodiment, after the first audio signal is obtained, the corresponding sound source localization result and speaker recognition result can be derived from it, and audio separation can then be performed on the first audio signal according to these results to obtain the second audio signals of the speakers in different directions.
In one specific implementation, speech detection can be applied to the first audio signal to obtain a detection result; sound source localization is then performed on the first audio signal according to the detection result to obtain the localization result; and speaker recognition is performed on the first audio signal according to a preset recognition model to obtain the speaker recognition result.
For example, methods such as the multiple signal classification (MUSIC) algorithm or generalized cross-correlation (GCC) can be used for sound source localization. Taking GCC as an example, it can be implemented as follows:
a) Compute the cross-correlation of each pair of audio channels according to formula (3), where K1 is the starting frequency bin and K2 is the ending frequency bin.
b) Smooth the result according to formula (4), based on the speech detection result:
Gsm(i, j) = Gsm(i, j) * fsm + (1 - fsm) * G(i, j)   (4)
where fsm is the smoothing factor, whose value depends on the speech detection result vad.
c) Process the smoothed cross-correlation function further to obtain the sound source localization result.
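Steps a) through c) can be sketched concretely. Formula (3) is an image in the source, so the widely used GCC-PHAT weighting is substituted here as an assumption; the smoothing factors 0.9 (speech) and 1.0 (silence) in the formula (4) stand-in are likewise illustrative.

```python
import numpy as np

def gcc_phat(xi, xj, k1=1, k2=None):
    """Generalized cross-correlation between two channels with phase
    transform, restricted to frequency bins [k1, k2) as in formula (3).
    The index of the peak of the returned sequence gives the TDOA."""
    n = len(xi)
    Xi, Xj = np.fft.rfft(xi), np.fft.rfft(xj)
    cross = Xi * np.conj(Xj)
    if k2 is None:
        k2 = len(cross)
    mask = np.zeros_like(cross)
    mask[k1:k2] = cross[k1:k2] / np.maximum(np.abs(cross[k1:k2]), 1e-12)
    return np.fft.irfft(mask, n)

def smooth_gcc(prev, cur, vad, f_speech=0.9, f_silence=1.0):
    """Recursive smoothing of the correlation in the spirit of formula
    (4); the smoothing factor depends on the speech-detection result."""
    f = f_speech if vad else f_silence
    return prev * f + (1.0 - f) * cur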
In this embodiment, speaker recognition can be performed with a model-based approach, such as a Gaussian mixture model (GMM), a hidden Markov model (HMM), or a deep neural network (DNN), to obtain the speaker recognition result.
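To make the model-based recognition concrete, here is a minimal diagonal-covariance GMM log-likelihood scorer and a best-model picker. Training (e.g. EM), feature extraction (e.g. MFCCs), and the model parameters themselves are all outside this sketch and assumed given; the function names are this sketch's own.

```python
import numpy as np

def diag_gmm_loglik(x, weights, means, variances):
    """Total log-likelihood of feature frames x (frames x dims) under a
    diagonal-covariance Gaussian mixture, via log-sum-exp over
    components for numerical stability."""
    x = np.atleast_2d(x)
    comp = []
    for w, mu, var in zip(weights, means, variances):
        log_norm = -0.5 * np.sum(np.log(2 * np.pi * var))
        log_exp = -0.5 * np.sum((x - mu) ** 2 / var, axis=1)
        comp.append(np.log(w) + log_norm + log_exp)
    comp = np.stack(comp)                      # components x frames
    m = comp.max(axis=0)
    return float(np.sum(m + np.log(np.sum(np.exp(comp - m), axis=0))))

def identify_speaker(x, models):
    """Pick the speaker whose GMM gives the highest log-likelihood."""
    scores = {name: diag_gmm_loglik(x, *p) for name, p in models.items()}
    return max(scores, key=scores.get)
```

In a real system each enrolled speaker would contribute one trained model to the `models` dictionary, and each separated segment would be scored against all of them.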
After the sound source localization result and the speaker recognition result are obtained, a beamforming approach can be used to perform audio separation on the first audio signal, obtaining the second audio signals of the speakers in different directions.
Alternatively, an audio separation method corresponding to the sound source localization result can be chosen, and audio separation can be performed on the first audio signal with that method according to the speaker recognition result, obtaining the second audio signals of the speakers in different directions.
For example, audio separation can be performed using formula (5) to obtain the second audio signals of the speakers in different directions.
Here Vdoa is the weighting factor in the sound source direction, τ is the time delay, S is the number of sound sources, and Vspe is the weighting factor for a single source.
When S > 1, a beamforming method can be used to obtain the audio signal in each source direction. When S ≤ 1, Vdoa = Vspe; for example, setting it to (1, 0, 0, 0) means the first audio channel is used as the separated audio signal.
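Formula (5)'s direction-dependent weighting is shown only as an image in the source. For the S > 1 case, a basic frequency-domain delay-and-sum beamformer is one concrete instance of obtaining the audio signal in a source direction; the per-channel delays are assumed to come from the localization step, and the specific weighting in the patent may differ.

```python
import numpy as np

def delay_and_sum(spec, delays, sr, n_fft):
    """Frequency-domain delay-and-sum beamforming toward one source.
    `spec` holds one frame's spectra (channels x bins); `delays` are
    per-channel arrival delays in seconds. Each channel is phase-aligned
    and the channels are averaged, reinforcing the target direction."""
    k = np.arange(spec.shape[1])
    freqs = k * sr / n_fft
    steer = np.exp(2j * np.pi * freqs[None, :] * np.asarray(delays)[:, None])
    return np.mean(spec * steer, axis=0)
```

If every channel really is a delayed copy of the same source, aligning and averaging recovers the source spectrum exactly, which is the identity the test below checks.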
103. Enhance the second audio signals of the speakers in different directions to obtain enhanced third audio signals of the speakers in different directions.
For example, based on the speaker recognition result, the second audio signals of the speakers in different directions can be smoothed, and the audio transition points can be corrected, to obtain the third audio signals of the speakers in different directions, ensuring the continuity of the audio.
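One simple way to realize the smoothing and transition-point correction described above is a linear crossfade at each detected switch point; the fade length and the linear shape are this sketch's assumptions, not details taken from the patent.

```python
import numpy as np

def crossfade(prev_tail, next_head):
    """Smooth an audio switch point by linearly cross-fading the tail
    of the previous segment into the head of the next one, avoiding an
    audible click at the transition."""
    n = min(len(prev_tail), len(next_head))
    fade = np.linspace(0.0, 1.0, n)
    return prev_tail[:n] * (1.0 - fade) + next_head[:n] * fade
```

The output starts exactly on the old segment and ends exactly on the new one, changing monotonically in between.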
104. Output the third audio signals.
The execution subject of the speaker speech separation method of this embodiment may be a speaker speech separation device, which may be implemented in software; for example, the device may be an application. The present invention places no particular limitation on this.
The speaker speech separation method of this embodiment obtains an audio signal in a preset format, preprocesses the audio signal to obtain a processed first audio signal, performs audio separation on the first audio signal to obtain second audio signals of speakers in different directions, enhances the second audio signals to obtain enhanced third audio signals of the speakers in different directions, and outputs the third audio signals, thereby achieving fast and accurate separation of the audio signals of multiple speakers in different directions.
Embodiment 2
Fig. 3 is a structural schematic diagram of an embodiment of the speaker speech separation device of the present invention. As shown in Fig. 3, the speaker speech separation device of this embodiment may include an acquisition module 10, a preprocessing module 11, an audio separation module 12, an enhancement processing module 13, and an output module 14.
The acquisition module 10 is used to obtain an audio signal in a preset format.
The audio signal in a preset format in this embodiment may be an Ambisonic A-format audio signal, consisting of four audio channels: Left-Front-Up (LFU), Right-Front-Down (RFD), Left-Back-Down (LBD), and Right-Back-Up (RBU). Fig. 2 is a schematic diagram of the microphone-array placement used to acquire the four channels.
The preprocessing module 11 is used to preprocess the received audio signal to obtain a processed first audio signal. Specifically, the preprocessing module 11 can obtain the placement parameters of the microphone array and the environmental parameters; transform the multi-channel audio signal according to the placement parameters to obtain transformed audio signals lying in the same plane; apply a time-frequency transform to the transformed signals to obtain the corresponding frequency-domain signals; apply audio enhancement to the frequency-domain signals according to the environmental parameters to obtain enhanced frequency-domain signals; and apply an inverse time-frequency transform to the enhanced signals to obtain time-domain signals, which serve as the first audio signal.
The audio separation module 12 is used to perform audio separation on the first audio signal to obtain second audio signals of speakers in different directions. Specifically, the audio separation module 12 can obtain the sound source localization result and speaker recognition result corresponding to the first audio signal, for example by performing speech detection on the first audio signal to obtain a detection result; performing sound source localization on the first audio signal according to the detection result to obtain the localization result; and performing speaker recognition on the first audio signal according to a preset recognition model to obtain the speaker recognition result.
The audio separation module 12 can then perform audio separation on the first audio signal according to the sound source localization result and the speaker recognition result to obtain the second audio signals of the speakers in different directions, for example using a beamforming technique, or by choosing an audio separation method corresponding to the localization result and applying it to the first audio signal according to the speaker recognition result.
The enhancement processing module 13 is used to enhance the second audio signals of the speakers in different directions to obtain enhanced third audio signals of the speakers in different directions. Specifically, the enhancement processing module 13 can smooth the second audio signals and correct the audio transition points based on the speaker recognition result to obtain the third audio signals of the speakers in different directions.
The output module 14 is used to output the third audio signals of the speakers in different directions.
The mechanism by which the device of this embodiment separates audio signals with the above modules is the same as that of the embodiment shown in Fig. 1; for details, refer to the description of that embodiment, which is not repeated here.
The speaker speech separation device of this embodiment obtains an audio signal in a preset format, preprocesses the audio signal to obtain a processed first audio signal, performs audio separation on the first audio signal to obtain second audio signals of speakers in different directions, enhances the second audio signals to obtain enhanced third audio signals of the speakers in different directions, and outputs the third audio signals, thereby achieving fast and accurate separation of the audio signals of multiple speakers in different directions.
Although the present invention has been described in detail above through general explanations and specific embodiments, modifications and improvements based on the present invention will be apparent to those skilled in the art. Therefore, such modifications and improvements made without departing from the spirit of the present invention fall within the scope of protection claimed by the present invention.

Claims (8)

1. A method of speaker speech separation, characterized by including:
obtaining an audio signal in a preset format;
preprocessing the audio signal to obtain a processed first audio signal;
performing audio separation on the first audio signal to obtain second audio signals of speakers in different directions;
performing enhancement on the second audio signals to obtain enhanced third audio signals of the speakers in different directions;
outputting the third audio signals.
2. The method according to claim 1, characterized in that preprocessing the audio signal to obtain the processed first audio signal includes:
obtaining the placement parameters of the microphone array and the environmental parameters;
transforming the audio signal according to the placement parameters of the microphone array to obtain transformed audio signals lying in the same plane;
applying a time-frequency transform to the transformed audio signals to obtain the corresponding frequency-domain signals;
applying audio enhancement to the frequency-domain signals according to the environmental parameters to obtain enhanced frequency-domain signals;
applying an inverse time-frequency transform to the enhanced frequency-domain signals to obtain time-domain signals, which serve as the first audio signal.
3. The method according to claim 1 or 2, characterized in that performing audio separation on the first audio signal to obtain the second audio signals of speakers in different directions includes:
obtaining, from the first audio signal, the corresponding sound source localization result and speaker recognition result;
performing audio separation on the first audio signal according to the sound source localization result and the speaker recognition result to obtain the second audio signals.
4. The method according to claim 3, characterized in that obtaining the sound source localization result and speaker recognition result corresponding to the first audio signal includes:
performing speech detection on the first audio signal to obtain a detection result;
performing sound source localization on the first audio signal according to the detection result to obtain the sound source localization result;
performing speaker recognition on the first audio signal according to a preset recognition model to obtain the speaker recognition result.
5. The method according to claim 3, characterized in that performing audio separation on the first audio signal according to the sound source localization result and the speaker recognition result to obtain the second audio signals includes:
performing audio separation on the first audio signal using a beamforming method, according to the sound source localization result and the speaker recognition result, to obtain the second audio signals.
6. The method according to claim 3, characterized in that performing audio separation on the first audio signal according to the sound source localization result and the speaker recognition result to obtain the second audio signals includes:
choosing an audio separation method corresponding to the sound source localization result;
performing audio separation on the first audio signal with the chosen audio separation method, according to the speaker recognition result, to obtain the second audio signals.
7. The method according to claim 3, wherein performing enhancement processing on the second audio signal to obtain the enhanced third audio signal comprises:
performing smoothing processing and audio transition point correction processing on the second audio signal based on the speaker recognition result, to obtain the third audio signal.
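The smoothing and transition-point correction of claim 7 can be approximated by a crossfade at each detected speaker-change point, so the separated stream does not click at the seam. The overlap length and linear fade shape are illustrative assumptions:

```python
import numpy as np


def smooth_transition(prev_tail, next_head):
    """Crossfade two separated segments at a speaker change point: the outgoing
    segment fades out while the incoming one fades in over the overlap."""
    n = min(len(prev_tail), len(next_head))
    fade = np.linspace(0.0, 1.0, n)
    return prev_tail[:n] * (1.0 - fade) + next_head[:n] * fade
```

In a full system, the speaker recognition result decides where the change points are; this helper only repairs the waveform around each one.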
8. A speaker speech separation apparatus, comprising:
an acquisition module, configured to acquire an audio signal in a preset format;
a preprocessing module, configured to preprocess the audio signal to obtain a processed first audio signal;
an audio separation module, configured to perform audio separation processing on the first audio signal to obtain second audio signals of speakers in different directions;
an enhancement processing module, configured to perform enhancement processing on the second audio signal to obtain an enhanced third audio signal;
an output module, configured to output the third audio signal.
CN201810231676.XA 2018-03-20 2018-03-20 Method and device for separating speaker voice Active CN108520756B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810231676.XA CN108520756B (en) 2018-03-20 2018-03-20 Method and device for separating speaker voice

Publications (2)

Publication Number Publication Date
CN108520756A true CN108520756A (en) 2018-09-11
CN108520756B CN108520756B (en) 2020-09-01

Family

ID=63433795

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810231676.XA Active CN108520756B (en) 2018-03-20 2018-03-20 Method and device for separating speaker voice

Country Status (1)

Country Link
CN (1) CN108520756B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1818909A1 (en) * 2004-12-03 2007-08-15 HONDA MOTOR CO., Ltd. Voice recognition system
CN101720558A (en) * 2007-04-19 2010-06-02 埃波斯开发有限公司 Voice and position localization
CN102831898A (en) * 2012-08-31 2012-12-19 厦门大学 Microphone array voice enhancement device with sound source direction tracking function and method thereof
CN103456312A (en) * 2013-08-29 2013-12-18 太原理工大学 Single channel voice blind separation method based on computational auditory scene analysis
CN103811020A (en) * 2014-03-05 2014-05-21 东北大学 Smart voice processing method
CN104049235A (en) * 2014-06-23 2014-09-17 河北工业大学 Microphone array in sound source orienting device
CN104936091A (en) * 2015-05-14 2015-09-23 科大讯飞股份有限公司 Intelligent interaction method and system based on circle microphone array
CN105120421A (en) * 2015-08-21 2015-12-02 北京时代拓灵科技有限公司 Method and apparatus of generating virtual surround sound
CN105355203A (en) * 2015-11-03 2016-02-24 重庆码头联智科技有限公司 Method for speech judgment using a gravity-sensor smart wearable device
CN105872940A (en) * 2016-06-08 2016-08-17 北京时代拓灵科技有限公司 Virtual reality sound field generating method and system
CN106098075A (en) * 2016-08-08 2016-11-09 腾讯科技(深圳)有限公司 Audio collection method and apparatus based on microphone array
CN106816156A (en) * 2017-02-04 2017-06-09 北京时代拓灵科技有限公司 Audio quality enhancement method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHANG Xiongwei, LI Yinan, SHI Wenhua, HU Yonggang, CHEN Xushan: "Nonnegative combination model and its application in sound source separation", Journal of Data Acquisition and Processing *
CHEN Jie: "Design and implementation of an automatic background music separation system", Modern Electronics Technique *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110021302A (en) * 2019-03-06 2019-07-16 厦门快商通信息咨询有限公司 Intelligent office conference system and meeting minutes method
CN110459239A (en) * 2019-03-19 2019-11-15 深圳壹秘科技有限公司 Role analysis method, apparatus and computer readable storage medium based on voice data
CN111899758A (en) * 2020-09-07 2020-11-06 腾讯科技(深圳)有限公司 Voice processing method, device, equipment and storage medium
CN111899758B (en) * 2020-09-07 2024-01-30 腾讯科技(深圳)有限公司 Voice processing method, device, equipment and storage medium
CN112382306A (en) * 2020-12-02 2021-02-19 苏州思必驰信息科技有限公司 Method and device for separating speaker audio
CN112382306B (en) * 2020-12-02 2022-05-10 思必驰科技股份有限公司 Method and device for separating speaker audio
CN112634935A (en) * 2021-03-10 2021-04-09 北京世纪好未来教育科技有限公司 Voice separation method and device, electronic equipment and readable storage medium
CN112634935B (en) * 2021-03-10 2021-06-11 北京世纪好未来教育科技有限公司 Voice separation method and device, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
CN108520756B (en) 2020-09-01

Similar Documents

Publication Publication Date Title
CN108520756A Method and device for speaker speech separation
Chen et al. Continuous speech separation: Dataset and analysis
Yoshioka et al. Multi-microphone neural speech separation for far-field multi-talker speech recognition
CN110120227B (en) Voice separation method of deep stack residual error network
CN110970053B (en) Multichannel speaker-independent voice separation method based on deep clustering
Kingsbury et al. Recognizing reverberant speech with RASTA-PLP
CN106782565A Voiceprint feature recognition method and system
CN102565759B (en) Binaural sound source localization method based on sub-band signal to noise ratio estimation
CN107346664A Binaural speech separation method based on critical bands
Huang et al. Audio replay spoof attack detection using segment-based hybrid feature and densenet-LSTM network
Cai et al. Multi-Channel Training for End-to-End Speaker Recognition Under Reverberant and Noisy Environment.
CN110858476A (en) Sound collection method and device based on microphone array
Sainath et al. Reducing the Computational Complexity of Multimicrophone Acoustic Models with Integrated Feature Extraction.
Venkatesan et al. Binaural classification-based speech segregation and robust speaker recognition system
CN107895582A Speaker-adaptive speech emotion recognition method for the multi-source information field
Taherian et al. Multi-channel conversational speaker separation via neural diarization
Kamble et al. Teager energy subband filtered features for near and far-field automatic speech recognition
Huang et al. Audio-replay Attacks Spoofing Detection for Automatic Speaker Verification System
Martín-Doñas et al. Multi-channel block-online source extraction based on utterance adaptation
CN113345421B (en) Multi-channel far-field target voice recognition method based on angle spectrum characteristics
Gaffar et al. A multi-frame blocking for signal segmentation in voice command recognition
CN114189781A (en) Noise reduction method and system for double-microphone neural network noise reduction earphone
Yang et al. A target speaker separation neural network with joint-training
Melhem et al. Improving Deep Attractor Network by BGRU and GMM for Speech Separation
Yoshioka et al. Picknet: Real-time channel selection for ad hoc microphone arrays

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant