CN106531179A - Multi-channel speech enhancement method based on semantic prior selective attention - Google Patents
Multi-channel speech enhancement method based on semantic prior selective attention
- Publication number
- CN106531179A
- Authority
- CN
- China
- Prior art keywords
- voice
- signal
- activation word
- target
- speech
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Circuit For Audible Band Transducer (AREA)
- Obtaining Desirable Characteristics In Audible-Bandwidth Transducers (AREA)
Abstract
The invention provides a multi-channel speech enhancement method based on selective attention with a semantic prior. The method comprises the following steps: a multi-microphone array picks up speech signals arriving from any direction in a reverberant environment, and the multi-channel speech signals are collected and pre-processed; an activation-word speech recognition model detects a specific activation word in the pre-processed speech signals; the uncut signal containing the activation-word segment is processed to obtain the complete activation-word segment; the activation-word segment is analyzed with a reverberation-robust multi-channel phase-difference sound source localization method to obtain the direction of arrival of the target sound source; and speech from that direction is enhanced while noise from other directions and room reverberation in far-field (distant-talking) scenarios are suppressed, yielding enhanced speech from the target direction. The method is suitable for applications that require far-field speech input and interaction, such as smart home appliances, smart homes, in-vehicle systems and wearable devices, and is particularly suitable for environments with complex acoustic noise and interference.
Description
Technical field
The present invention relates to the field of speech processing, and in particular to a multi-channel speech enhancement method based on selective attention with a semantic prior.
Background art
With the continuing spread of voice communication and human-machine voice interaction systems, people increasingly expect to do away with cumbersome equipment such as microphones and headsets and to converse with machines as naturally as they do with other people. However, speech is a sound wave, and its propagation through air is affected in many ways: by attenuation, by multiple reflections from walls and obstacles (reverberation), and by other simultaneously active sound sources and ambient noise. When several voice systems and several speakers share the same environment, whether a system can receive the speech information correctly determines whether it can be put to practical use. Speech enhancement is an effective means of extracting the target speech signal in a complex noisy environment, and is divided into single-channel speech enhancement and multi-channel speech enhancement.
Single-channel speech enhancement removes noise mainly by exploiting the differences between the time-frequency distributions of speech and noise. Its two key problems are noise estimation and a priori SNR estimation: the former is the key factor in reducing noise, while the latter governs the level of residual "musical noise". Single-channel enhancement algorithms can significantly improve the signal-to-noise ratio in many cases, and are especially effective at removing stationary noise (white noise, car noise).
Multi-channel speech enhancement exploits the ability of a microphone array to capture spatial information, and can combine time-domain, frequency-domain and spatial information to obtain spatially selective reception. In general, multi-channel speech enhancement needs the direction of the incoming wave as prior information in order to form a reliable steering vector; spatial filtering theory is then used to suppress interference from non-target directions. Compared with single-channel speech enhancement, multi-channel speech enhancement offers better noise suppression.
Human hearing can cope with multiple sound sources and reverberation, and can even detect and track a voice of interest while many people are talking, mainly because it has a specific selective attention ability. When a person is interested in a target sound, they select, according to the task and the environment, the features that best distinguish the target speech from the ambient sound, compare and screen candidates against prior knowledge, exclude interfering sounds and obtain the target speech.
In speech applications, real scenes such as the home, the car and the outdoors can contain many kinds of noise and interference. Existing speech enhancement or separation methods find it very difficult to pick up the target speech without distortion while also eliminating or suppressing non-target signals, particularly when several coherent sound sources are present simultaneously, reverberation is strong and the signal-to-noise ratio is low.
Multi-channel (microphone array) speech enhancement uses the amplitude and phase differences between the signals received by multiple microphones to achieve spatial selectivity toward the target direction, so that beamforming (Beamforming, BM) and directional speech activity detection (Directive Speech Activity Detection, DSAD) algorithms point at the target direction and suppress or reject interfering signals from non-target directions. However, the direction of arrival (DOA) of the target sound source still cannot be known in advance. Under a single-source assumption, the DOA of the target sound source can be determined with sound source localization (Source Location, SL) techniques, but this assumption is rarely met in real application environments. In most cases several sound sources are present simultaneously and their number is unknown. In a reverberant field with room reflections the situation is more complicated still, leaving the target sound source subject to excessive noise.
Summary of the invention
The object of the present invention is to overcome the above drawbacks of current multi-channel speech enhancement methods. By combining semantics-based sound source identification with signal-processing-based sound source localization, and incorporating the "spatial filtering" property of microphone arrays, the invention proposes a multi-channel speech enhancement method based on selective attention with a semantic prior that can effectively overcome noise and interference.
To achieve the above object, the invention provides a multi-channel speech enhancement method based on selective attention with a semantic prior, the method comprising: a multi-microphone array picks up speech signals arriving from any direction in a reverberant environment, and the multi-channel speech signals are collected and pre-processed; an activation-word speech recognition model detects the specific activation word present in the pre-processed speech signals; the uncut signal containing the activation-word segment is processed to obtain the complete activation-word segment; the activation-word segment is processed with a reverberation-robust multi-channel phase-difference sound source localization method to obtain the direction of arrival of the target sound source; and speech from that direction is enhanced while noise from other directions and room reverberation in far-field scenarios are suppressed, yielding enhanced speech from the target direction.
In the above technical solution, the method specifically comprises:
Step 1) A multi-microphone array picks up speech signals arriving from any direction in a reverberant environment, and the multi-channel speech signals are collected;
Step 2) The multi-channel speech signals collected in step 1) are pre-processed;
Step 3) An activation-word speech recognition model is used to detect whether a specific activation word is present in the pre-processed speech signals; if the result is positive, the uncut signal containing the activation-word segment is retained and the method proceeds to step 4); otherwise, the method returns to step 1);
Step 4) Voice activity detection is applied to the uncut signal containing the activation-word segment to obtain the complete activation-word segment; the activation-word segment is analyzed with a reverberation-robust multi-channel phase-difference sound source localization method to obtain the direction of arrival of the target sound source; speech from that direction is enhanced, and residual directional noise, diffuse noise from the environment and room reverberation in far-field scenarios are suppressed, yielding enhanced speech from the target direction.
In the above technical solution, step 2) proceeds as follows: if acoustic echo is present in the multi-channel speech signals, echo cancellation, diffuse background noise suppression and gain control are applied to the picked-up multi-channel speech signals; otherwise, only diffuse background noise suppression and gain control are applied to the multi-channel speech signals.
In the above technical solution, the detection in step 3) of whether a specific activation word is present in the pre-processed speech signals proceeds as follows: a speaker-dependent or speaker-independent activation-word speech recognition model is trained on a large amount of a priori activation-word data or on data from a specific speaker; a recognition decoding strategy is used to detect the activation-word content and compute a confidence score, thereby completing the classification decision; speech recognition is combined with a keyword search algorithm to detect the activation word.
In the above technical solution, step 4) specifically comprises:
Step 4-1) Voice activity detection is used to detect the start point and end point of the activation word, giving the complete multi-channel activation-word segment;
Step 4-2) The activation-word segment is analyzed with a reverberation-robust multi-channel phase-difference sound source localization method to obtain the direction-of-arrival information of the target sound source, that is, the direction of the target speaker who uttered the specific semantic content; speech from that direction is enhanced according to the direction-of-arrival information;
Step 4-3) Multi-channel post-filtering is used to further suppress residual directional noise, diffuse noise from the environment and room reverberation in far-field scenarios, yielding enhanced speech from the target direction.
In the above technical solution, step 4-2) specifically comprises:
Step 4-2-1) The activation-word segment is transformed to the time-frequency domain, and at each frequency bin the coherent part and the incoherent part of the signal are tracked separately;
Step 4-2-2) Statistics are collected on the time-frequency points occupied by the direct sound;
Step 4-2-3) At the time-frequency points occupied by the direct sound, the distribution of the signal time difference of arrival is obtained in the low-frequency part, which is free of spatial aliasing;
Step 4-2-4) In the high-frequency part, the time-difference-of-arrival information obtained at low frequencies is used to remove the effect of spatial aliasing, giving the time-difference-of-arrival information for the full band; the direction-of-arrival information is then obtained;
Step 4-2-5) Speech from that direction is enhanced according to the direction-of-arrival information.
In the above technical solution, the speech in step 4-2-5) can be enhanced in two ways:
First way: according to the direction-of-arrival information, speech from the known direction is enhanced with a beamforming method, suppressing coherent sound sources from other directions;
Second way: spatial target speech signal detection is carried out using the known direction, accepting speech from the target region and rejecting sound sources from other directions.
The advantages of the present invention are:
1. The method of the present invention can be used in applications that require far-field speech input and interaction, such as smart appliances, smart homes, in-vehicle systems and wearable devices, and is particularly suitable for environments with complex acoustic noise and interference;
2. The method of the present invention can selectively pick up the target signal under far-field hands-free conditions while suppressing interference and noise.
Description of the drawings
Fig. 1 is a flowchart of the multi-channel speech enhancement method based on selective attention with a semantic prior according to the present invention;
Fig. 2 is a flowchart of spatial target speech signal detection using a known direction according to the present invention.
Detailed description
The target speech differs from other sounds in many features. To make full use of such features for detection, the features with the most and the most reliable prior knowledge should be considered first. For example, when a loudspeaker is playing sound, any sound correlated with the loudspeaker signal can be regarded as echo interference; if the semantics of the target speech are known, the semantics are an obvious distinguishing feature; and if the direction of arrival (DOA) of the target speech is known, DOA detection can be used to remove a large amount of irrelevant sound. By detecting and comparing these various distinguishing cues, the influence of interfering sounds can ultimately be suppressed and the target speech segments filtered out of the mixture.
The present invention is described below with reference to the accompanying drawings.
As shown in Fig. 1, a multi-channel speech enhancement method based on selective attention with a semantic prior comprises:
Step 1) A multi-microphone array picks up speech signals arriving from any direction in a reverberant environment, and the multi-channel speech signals are collected;
Step 2) The multi-channel speech signals collected in step 1) are pre-processed;
If acoustic echo is present in the speech signals, echo cancellation, diffuse background noise suppression and gain control are applied to the picked-up multi-channel speech signals; otherwise, only diffuse background noise suppression and any necessary gain control are applied to the multi-channel speech signals;
Step 3) An activation-word speech recognition model is used to detect whether a specific activation word is present in the pre-processed speech signals; if the result is positive, the uncut signal containing the activation-word segment is retained and the method proceeds to step 4); otherwise, the method returns to step 1);
A speaker-dependent or speaker-independent activation-word speech recognition model is trained on a large amount of a priori activation-word data or on data from a specific speaker; a recognition decoding strategy is used to detect the activation-word content and compute a confidence score, thereby completing the classification decision; speech recognition is combined with a keyword search algorithm to detect the activation word.
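The confidence-based decision in step 3) can be illustrated with a minimal sketch. The patent does not specify the recognizer or how the confidence is computed, so everything here is an assumption for illustration: `posteriors` stands for hypothetical per-frame probabilities from some activation-word model, and the window length and threshold are arbitrary.

```python
import numpy as np

def detect_activation_word(posteriors, window=30, threshold=0.8):
    """Decide whether an activation word is present.

    posteriors: per-frame probabilities in [0, 1] that a frame belongs to
    the activation word (hypothetical recognizer output, not from the patent).
    Returns (detected, confidence).
    """
    p = np.asarray(posteriors, dtype=float)
    if p.size == 0:
        return False, 0.0
    window = min(window, p.size)
    # Moving average: a short spike should not count as a whole word.
    smoothed = np.convolve(p, np.ones(window) / window, mode="valid")
    confidence = float(smoothed.max())
    return confidence >= threshold, confidence
```

Smoothing over a window roughly as long as the activation word is one simple way to keep isolated high-probability frames from triggering steps 4-1) onward.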
Step 4) Speech enhancement is applied to the uncut signal containing the activation-word segment; this specifically comprises:
Step 4-1) Voice activity detection (VAD: Voice Activity Detection) is used to detect the start point and end point of the activation word, giving the complete multi-channel activation-word segment;
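As a rough illustration of the endpoint detection in step 4-1), the sketch below uses a plain frame-energy detector. The patent does not say which VAD is used, so the energy criterion, the frame sizes and the `margin_db` parameter are all assumptions.

```python
import numpy as np

def find_endpoints(x, frame_len=400, hop=160, margin_db=10.0):
    """Return (start_sample, end_sample) of the active region, or None.

    Simple energy-based VAD (an assumed stand-in for the patent's VAD):
    a frame is active if its energy exceeds the estimated noise floor
    by margin_db decibels.
    """
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    if n_frames < 1:
        return None
    energy_db = np.array([
        10.0 * np.log10(np.mean(x[i * hop:i * hop + frame_len] ** 2) + 1e-12)
        for i in range(n_frames)
    ])
    # The 10th percentile of frame energies serves as the noise floor.
    noise_floor = np.percentile(energy_db, 10)
    active = np.flatnonzero(energy_db > noise_floor + margin_db)
    if active.size == 0:
        return None
    return int(active[0] * hop), int(active[-1] * hop + frame_len)
```

For a multi-channel recording, one option is to run the detector on a single reference channel (or the channel average) and cut every channel at the same sample indices, so that the inter-channel phase relations needed by step 4-2) are preserved.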
Step 4-2) The activation-word segment is analyzed with a reverberation-robust multi-channel phase-difference sound source localization method to obtain the DOA information of the target sound source, that is, the direction of the target speaker who uttered the specific semantic content; this specifically comprises:
Step 4-2-1) The activation-word segment is transformed to the time-frequency domain, and at each frequency bin the coherent part and the incoherent part of the signal are tracked separately;
Step 4-2-2) Statistics are collected on the time-frequency points occupied by the direct sound;
Step 4-2-3) At the time-frequency points occupied by the direct sound, the distribution of the time difference of arrival (TDOA: Time Difference Of Arrival) is obtained in the low-frequency part, which is free of spatial aliasing;
Step 4-2-4) In the high-frequency part, the TDOA information obtained at low frequencies is used to remove the effect of spatial aliasing, giving the TDOA of the signal over the full band; the DOA information is then obtained;
Step 4-2-5) Speech from the known direction is enhanced according to the DOA information; in step 4-2-5), speech from the known direction can be enhanced in two ways:
First way: according to the DOA information, speech from the known direction is enhanced with a beamforming method, suppressing coherent sound sources from other directions;
In this embodiment, a multi-channel minimum variance distortionless response (MVDR) beamforming method with diagonal loading (Diagonal Loading) is used to suppress coherent sound sources from other directions; in other embodiments, directional interference may also be suppressed with sub-band-based blind source separation (Blind Source Separation) techniques.
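The diagonally loaded MVDR beamformer named in this embodiment can be sketched for a single frequency bin as follows. The standard MVDR solution is w = R⁻¹d / (dᴴR⁻¹d); the linear-array geometry, the broadside angle convention and the way the loading level is scaled are assumptions of this sketch, not details given in the patent.

```python
import numpy as np

def steering_vector(freq_hz, mic_x_m, doa_deg, c=343.0):
    """Far-field steering vector for a linear array (assumed geometry).

    mic_x_m: microphone positions along the array axis, in metres.
    doa_deg: arrival angle measured from broadside.
    """
    delays = np.asarray(mic_x_m, dtype=float) * np.sin(np.deg2rad(doa_deg)) / c
    return np.exp(-2j * np.pi * freq_hz * delays)

def mvdr_weights(R, d, loading=1e-2):
    """MVDR weights w = R^-1 d / (d^H R^-1 d) with diagonal loading.

    R: (M, M) spatial covariance matrix for one frequency bin.
    d: (M,) steering vector toward the target DOA.
    """
    M = R.shape[0]
    # Loading scaled by the average channel power (an assumed heuristic).
    R_loaded = R + loading * (np.trace(R).real / M) * np.eye(M)
    Rinv_d = np.linalg.solve(R_loaded, d)
    return Rinv_d / (d.conj() @ Rinv_d)
```

The beamformer output for a snapshot `x` of that bin is `np.vdot(w, x)`, i.e. wᴴx. Diagonal loading trades a little interference rejection for robustness when the covariance estimate or the steering vector is imperfect, which is the usual motivation for using it in reverberant rooms.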
Second way: spatial target speech signal detection (DSAD) is carried out using the known direction, accepting speech from the target region and rejecting sound sources from other directions.
As shown in Fig. 2, taking two-channel DSAD as an example, a decision is made at each time-frequency point using the beam-to-reference energy ratio (Beam-to-Reference Ratio, BRR) and the signal-to-noise ratio (SNR). For the BRR decision threshold, a direct-to-reverberant energy ratio (Direct-to-Reverberate Ratio, DRR) tracking mechanism is incorporated so that the detection threshold of each time-frequency point adapts to the environment, improving the accuracy of the likelihood estimate at each time-frequency point; a sidelobe suppression mechanism reduces the effect of high-frequency aliasing, which in turn improves the completeness and accuracy of the decision.
Step 4-3) Multi-channel post-filtering is used to further suppress residual directional noise, diffuse noise from the environment and room reverberation in far-field scenarios, yielding the enhanced speech.
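The idea behind steps 4-2-3) and 4-2-4), estimating the TDOA where the phase difference cannot wrap and then using it to resolve wrapping at higher frequencies, can be sketched for two microphones as follows. This is a simplified illustration under stated assumptions: it takes the direct-sound mask of step 4-2-2) as an optional given input, replaces the TDOA "distribution" with a median, and does not model the coherent/incoherent tracking of step 4-2-1).

```python
import numpy as np

def estimate_tdoa(X1, X2, freqs, mic_distance, c=343.0, direct_mask=None):
    """Estimate the TDOA (seconds) between two channels from STFT phases.

    X1, X2: complex STFTs, shape (n_freqs, n_frames).
    freqs: bin centre frequencies in Hz, shape (n_freqs,).
    direct_mask: optional boolean array of direct-sound points (assumed given).
    A positive result means the sound reaches microphone 1 first.
    """
    phase = np.angle(X1 * np.conj(X2))  # wrapped to (-pi, pi]
    f = np.broadcast_to(np.asarray(freqs, dtype=float)[:, None], phase.shape)
    valid = f > 0
    if direct_mask is not None:
        valid = valid & direct_mask
    # Below c / (2 d) the phase difference cannot wrap (no spatial aliasing).
    low = valid & (f < c / (2.0 * mic_distance))
    tau_low = np.median(phase[low] / (2 * np.pi * f[low]))
    # Use the low-band estimate to choose the 2*pi multiple in every bin.
    n_wraps = np.round(f * tau_low - phase / (2 * np.pi))
    tau_all = (phase[valid] + 2 * np.pi * n_wraps[valid]) / (2 * np.pi * f[valid])
    return float(np.median(tau_all))

def tdoa_to_doa_deg(tau, mic_distance, c=343.0):
    """Convert a TDOA to an angle from broadside for a two-microphone pair."""
    return float(np.degrees(np.arcsin(np.clip(c * tau / mic_distance, -1.0, 1.0))))
```

With 10 cm spacing the unaliased band ends near 1.7 kHz, so most of a 16 kHz-sampled spectrum needs the unwrapping step; using the full band afterwards gives many more points for the final estimate than the low band alone.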
Claims (7)
1. A multi-channel speech enhancement method based on selective attention with a semantic prior, the method comprising: picking up, with a multi-microphone array, speech signals arriving from any direction in a reverberant environment, and collecting and pre-processing the multi-channel speech signals; detecting, with an activation-word speech recognition model, a specific activation word present in the pre-processed speech signals; processing the uncut signal containing the activation-word segment to obtain the complete activation-word segment; analyzing the activation-word segment with a reverberation-robust multi-channel phase-difference sound source localization method to obtain the direction of arrival of the target sound source; and enhancing speech from that direction while suppressing noise from other directions and room reverberation in far-field scenarios, to obtain enhanced speech from the target direction.
2. The multi-channel speech enhancement method based on selective attention with a semantic prior according to claim 1, characterized in that the method specifically comprises:
Step 1) A multi-microphone array picks up speech signals arriving from any direction in a reverberant environment, and the multi-channel speech signals are collected;
Step 2) The multi-channel speech signals collected in step 1) are pre-processed;
Step 3) An activation-word speech recognition model is used to detect whether a specific activation word is present in the pre-processed speech signals; if the result is positive, the uncut signal containing the activation-word segment is retained and the method proceeds to step 4); otherwise, the method returns to step 1);
Step 4) Voice activity detection is applied to the uncut signal containing the activation-word segment to obtain the complete activation-word segment; the activation-word segment is analyzed with a reverberation-robust multi-channel phase-difference sound source localization method to obtain the direction of arrival of the target sound source; speech from that direction is enhanced, and residual directional noise, diffuse noise from the environment and room reverberation in far-field scenarios are suppressed, yielding enhanced speech from the target direction.
3. The multi-channel speech enhancement method based on selective attention with a semantic prior according to claim 2, characterized in that step 2) proceeds as follows: if acoustic echo is present in the multi-channel speech signals, echo cancellation, diffuse background noise suppression and gain control are applied to the picked-up multi-channel speech signals; otherwise, only diffuse background noise suppression and gain control are applied to the multi-channel speech signals.
4. The multi-channel speech enhancement method based on selective attention with a semantic prior according to claim 2, characterized in that the detection in step 3) of whether a specific activation word is present in the pre-processed speech signals proceeds as follows: a speaker-dependent or speaker-independent activation-word speech recognition model is trained on a large amount of a priori activation-word data or on data from a specific speaker; a recognition decoding strategy is used to detect the activation-word content and compute a confidence score, thereby completing the classification decision; speech recognition is combined with a keyword search algorithm to detect the activation word.
5. The multi-channel speech enhancement method based on selective attention with a semantic prior according to claim 2, characterized in that step 4) specifically comprises:
Step 4-1) Voice activity detection is used to detect the start point and end point of the activation word, giving the complete multi-channel activation-word segment;
Step 4-2) The activation-word segment is analyzed with a reverberation-robust multi-channel phase-difference sound source localization method to obtain the direction-of-arrival information of the target sound source, that is, the direction of the target speaker who uttered the specific semantic content; speech from that direction is enhanced according to the direction-of-arrival information;
Step 4-3) Multi-channel post-filtering is used to further suppress residual directional noise, diffuse noise from the environment and room reverberation in far-field scenarios, yielding enhanced speech from the target direction.
6. The multi-channel speech enhancement method based on selective attention with a semantic prior according to claim 5, characterized in that step 4-2) specifically comprises:
Step 4-2-1) The activation-word segment is transformed to the time-frequency domain, and at each frequency bin the coherent part and the incoherent part of the signal are tracked separately;
Step 4-2-2) Statistics are collected on the time-frequency points occupied by the direct sound;
Step 4-2-3) At the time-frequency points occupied by the direct sound, the distribution of the signal time difference of arrival is obtained in the low-frequency part, which is free of spatial aliasing;
Step 4-2-4) In the high-frequency part, the time-difference-of-arrival information obtained at low frequencies is used to remove the effect of spatial aliasing, giving the time-difference-of-arrival information for the full band; the direction-of-arrival information is then obtained;
Step 4-2-5) Speech from that direction is enhanced according to the direction-of-arrival information.
7. The multi-channel speech enhancement method based on selective attention with a semantic prior according to claim 6, characterized in that the speech in step 4-2-5) can be enhanced in two ways:
First way: according to the direction-of-arrival information, speech from the known direction is enhanced with a beamforming method, suppressing coherent sound sources from other directions;
Second way: spatial target speech signal detection is carried out using the known direction, accepting speech from the target region and rejecting sound sources from other directions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510574907.3A CN106531179B (en) | 2015-09-10 | 2015-09-10 | Multi-channel speech enhancement method based on semantic prior selective attention |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106531179A true CN106531179A (en) | 2017-03-22 |
CN106531179B CN106531179B (en) | 2019-08-20 |
Family
ID=58346225
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510574907.3A Active CN106531179B (en) | Multi-channel speech enhancement method based on semantic prior selective attention | 2015-09-10 | 2015-09-10 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106531179B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020116196A1 (en) * | 1998-11-12 | 2002-08-22 | Tran Bao Q. | Speech recognizer |
CN102819009A (en) * | 2012-08-10 | 2012-12-12 | 汽车零部件研究及发展中心有限公司 | Driver sound localization system and method for automobile |
CN204390737U (en) * | 2014-07-29 | 2015-06-10 | 科大讯飞股份有限公司 | A kind of home voice disposal system |
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106960672A (en) * | 2017-03-30 | 2017-07-18 | 国家计算机网络与信息安全管理中心 | The bandwidth expanding method and device of a kind of stereo audio |
CN107146614A (en) * | 2017-04-10 | 2017-09-08 | 北京猎户星空科技有限公司 | A kind of audio signal processing method, device and electronic equipment |
CN108877827B (en) * | 2017-05-15 | 2021-04-20 | 福州瑞芯微电子股份有限公司 | Voice-enhanced interaction method and system, storage medium and electronic equipment |
CN108877827A (en) * | 2017-05-15 | 2018-11-23 | 福州瑞芯微电子股份有限公司 | Voice-enhanced interaction method and system, storage medium and electronic equipment |
CN107346661A (en) * | 2017-06-01 | 2017-11-14 | 李昕 | A kind of distant range iris tracking and acquisition method based on microphone array |
CN107346661B (en) * | 2017-06-01 | 2020-06-12 | 伊沃人工智能技术(江苏)有限公司 | Microphone array-based remote iris tracking and collecting method |
CN108122563A (en) * | 2017-12-19 | 2018-06-05 | 北京声智科技有限公司 | Improve voice wake-up rate and the method for correcting DOA |
CN108447483A (en) * | 2018-05-18 | 2018-08-24 | 深圳市亿道数码技术有限公司 | Speech recognition system |
CN108447483B (en) * | 2018-05-18 | 2023-11-21 | 深圳市亿道数码技术有限公司 | speech recognition system |
WO2020029882A1 (en) * | 2018-08-06 | 2020-02-13 | 腾讯科技(深圳)有限公司 | Azimuth estimation method, device, and storage medium |
CN110164423A (en) * | 2018-08-06 | 2019-08-23 | 腾讯科技(深圳)有限公司 | A kind of method, equipment and the storage medium of orientation angular estimation |
US11908456B2 (en) | 2018-08-06 | 2024-02-20 | Tencent Technology (Shenzhen) Company Limited | Azimuth estimation method, device, and storage medium |
CN110164423B (en) * | 2018-08-06 | 2023-01-20 | 腾讯科技(深圳)有限公司 | Azimuth angle estimation method, azimuth angle estimation equipment and storage medium |
EP3836136A4 (en) * | 2018-08-06 | 2021-09-08 | Tencent Technology (Shenzhen) Company Limited | Azimuth estimation method, device, and storage medium |
CN110875045A (en) * | 2018-09-03 | 2020-03-10 | 阿里巴巴集团控股有限公司 | Voice recognition method, intelligent device and intelligent television |
WO2020048431A1 (en) * | 2018-09-03 | 2020-03-12 | 阿里巴巴集团控股有限公司 | Voice processing method, electronic device and display device |
CN111081234A (en) * | 2018-10-18 | 2020-04-28 | 珠海格力电器股份有限公司 | Voice acquisition method, device, equipment and storage medium |
CN110047494A (en) * | 2019-04-15 | 2019-07-23 | 北京小米智能科技有限公司 | Equipment response method, equipment and storage medium |
CN112289335A (en) * | 2019-07-24 | 2021-01-29 | 阿里巴巴集团控股有限公司 | Voice signal processing method and device and pickup equipment |
CN110992977B (en) * | 2019-12-03 | 2021-06-22 | 北京声智科技有限公司 | Method and device for extracting target sound source |
CN110992977A (en) * | 2019-12-03 | 2020-04-10 | 北京声智科技有限公司 | Method and device for extracting target sound source |
CN113257251A (en) * | 2021-05-11 | 2021-08-13 | 深圳优地科技有限公司 | Robot user identification method, apparatus and storage medium |
CN113823311A (en) * | 2021-08-19 | 2021-12-21 | 安徽创变信息科技有限公司 | Voice recognition method and device based on audio enhancement |
CN113823311B (en) * | 2021-08-19 | 2023-11-21 | 广州市盛为电子有限公司 | Voice recognition method and device based on audio enhancement |
CN113643714A (en) * | 2021-10-14 | 2021-11-12 | 阿里巴巴达摩院(杭州)科技有限公司 | Audio processing method, device, storage medium and computer program |
CN113643714B (en) * | 2021-10-14 | 2022-02-18 | 阿里巴巴达摩院(杭州)科技有限公司 | Audio processing method, device, storage medium and computer program |
Also Published As
Publication number | Publication date |
---|---|
CN106531179B (en) | 2019-08-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106531179A (en) | Multi-channel speech enhancement method based on semantic prior selective attention | |
CN110556103B (en) | Audio signal processing method, device, system, equipment and storage medium | |
CN110503970B (en) | Audio data processing method and device and storage medium | |
US11158333B2 (en) | Multi-stream target-speech detection and channel fusion | |
CN106782563B (en) | Smart home voice interaction system | |
EP3360250B1 (en) | A sound signal processing apparatus and method for enhancing a sound signal | |
CN102164328B (en) | Audio input system used in home environment based on microphone array | |
CN101828407B (en) | Based on the microphone array processor of spatial analysis | |
CN108962272A (en) | Sound pick-up method and system | |
CN108122563A (en) | Improve voice wake-up rate and the method for correcting DOA | |
US10957338B2 (en) | 360-degree multi-source location detection, tracking and enhancement | |
Brutti et al. | Multiple source localization based on acoustic map de-emphasis | |
US11264017B2 (en) | Robust speaker localization in presence of strong noise interference systems and methods | |
CN110610718B (en) | Method and device for extracting expected sound source voice signal | |
CN110875056B (en) | Speech transcription device, system, method and electronic device | |
CN107889001B (en) | Expandable microphone array and establishing method thereof | |
CN106328130A (en) | Robot voice addressed rotation system and method | |
CN106992010A (en) | Without the microphone array speech enhancement device under the conditions of direct sound wave | |
CN110120217A (en) | A kind of audio data processing method and device | |
CN110992967A (en) | Voice signal processing method and device, hearing aid and storage medium | |
WO2020118290A1 (en) | System and method for acoustic localization of multiple sources using spatial pre-filtering | |
WO2013132216A1 (en) | Method and apparatus for determining the number of sound sources in a targeted space | |
CN113223544A (en) | Audio direction positioning detection device and method and audio processing system | |
CN116343808A (en) | Flexible microphone array voice enhancement method and device, electronic equipment and medium | |
Lee et al. | Space-time voice activity detection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||