CN106531179A - Multi-channel speech enhancement method based on semantic prior selective attention - Google Patents

Info

Publication number
CN106531179A
Authority
CN
China
Prior art keywords
voice
signal
activation word
target
speech
Legal status
Granted
Application number
CN201510574907.3A
Other languages
Chinese (zh)
Other versions
CN106531179B (en)
Inventor
Fu Qiang (付强)
Wang Xiaofei (王晓飞)
Guo Yanmeng (国雁萌)
Yan Yonghong (颜永红)
Current Assignee
Institute of Acoustics CAS
Original Assignee
Institute of Acoustics CAS
Priority date
2015-09-10
Filing date
2015-09-10
Publication date
2017-03-22
Application filed by Institute of Acoustics CAS
Priority to CN201510574907.3A (patent/CN106531179B/en)
Publication of CN106531179A
Application granted
Publication of CN106531179B
Legal status: Active
Anticipated expiration

Landscapes

  • Circuit For Audible Band Transducer (AREA)
  • Obtaining Desirable Characteristics In Audible-Bandwidth Transducers (AREA)

Abstract

The invention provides a multi-channel speech enhancement method based on semantic-prior selective attention. The method comprises: picking up speech signals arriving from any direction in a reverberant environment with a multi-microphone array, collecting the multi-channel speech signals and pre-processing them; detecting a specific activation word in the pre-processed speech signals with an activation-word speech recognition model; processing the unsegmented signal containing the activation word to obtain a complete activation-word segment; analyzing the activation-word segment with a reverberation-robust multi-channel phase-difference sound source localization method to obtain the direction of arrival of the target sound source; and enhancing speech from that direction while suppressing noise from other directions and room reverberation in far-field speaking scenarios, thereby obtaining enhanced speech from the target direction. The method is applicable to scenarios that require far-field speech input and interaction, such as intelligent household appliances, smart homes, in-vehicle devices and wearable devices, and is especially suited to environments with complex acoustic noise and interference.

Description

Multi-channel speech enhancement method based on semantic-prior selective attention
Technical field
The present invention relates to the field of speech processing, and in particular to a multi-channel speech enhancement method based on semantic-prior selective attention.
Background
With the continuing spread of voice communication and human-machine voice interaction systems, people increasingly expect to do without cumbersome equipment such as close-talking microphones and headsets, and to talk with machines as naturally as in a conversation between people. However, speech is an acoustic wave, and as it propagates through air it is affected in many ways: the wave attenuates, walls and obstacles cause multiple reflections (reverberation), and other sound sources and ambient noise are present at the same time. When several voice systems and several speakers share the same environment, whether a system can reliably receive the intended speech determines whether it can be put to practical use. Speech enhancement is an effective means of extracting the target speech signal in a complex noise environment, and is divided into single-channel speech enhancement and multi-channel speech enhancement.
Single-channel speech enhancement removes noise mainly by exploiting the different distributions of speech and noise in the time-frequency domain. Its two key problems are noise estimation and a priori signal-to-noise ratio (SNR) estimation: the former is the key factor in reducing noise, while the latter determines how much residual "musical noise" remains. In many cases single-channel enhancement algorithms can substantially improve the SNR, and they are particularly effective against stationary noise (e.g. white noise and car noise).
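To make the role of the a priori SNR concrete, here is a minimal Python sketch of one common single-channel approach: the decision-directed a priori SNR estimator combined with a Wiener gain. The patent does not prescribe this estimator; the function name `dd_wiener_gains`, the smoothing factor `alpha` and the SNR floor `xi_min` are illustrative assumptions, and the noise PSD is assumed to be estimated elsewhere.

```python
import numpy as np

def dd_wiener_gains(noisy_power, noise_power, alpha=0.98, xi_min=10 ** (-25 / 10)):
    """Wiener gains per frame using a decision-directed a priori SNR estimate.

    noisy_power: array (frames, bins) of |Y(l, k)|^2
    noise_power: array (bins,), noise PSD estimate assumed to come from elsewhere
    """
    frames, bins = noisy_power.shape
    gains = np.empty((frames, bins))
    prev_clean_power = np.zeros(bins)            # |S_hat(l-1, k)|^2
    for l in range(frames):
        gamma = noisy_power[l] / noise_power     # a posteriori SNR
        # Blend the previous clean-speech estimate with the current ML estimate.
        xi = alpha * prev_clean_power / noise_power + (1 - alpha) * np.maximum(gamma - 1, 0)
        xi = np.maximum(xi, xi_min)              # floor on the a priori SNR
        g = xi / (1 + xi)                        # Wiener gain
        gains[l] = g
        prev_clean_power = g ** 2 * noisy_power[l]
    return gains
```

The floor `xi_min` is the usual lever against musical noise: raising it leaves more residual noise but fewer isolated spectral peaks.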
Multi-channel speech enhancement exploits the ability of a microphone array to capture spatial information, combining time-domain, frequency-domain and spatial information to achieve spatially selective reception. It generally requires prior knowledge of the direction of arrival, from which a reliable steering vector is formed; spatial filtering theory is then used to suppress background sound from non-target directions. Compared with single-channel enhancement, multi-channel enhancement offers better noise suppression.
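The steering vector mentioned above can be illustrated for microphones on a line under a far-field assumption. The sketch below (hypothetical helper names, an assumed speed of sound of 343 m/s) builds the steering vector for a given direction and the corresponding delay-and-sum weights, which have unit gain toward the look direction and reduced gain elsewhere. Delay-and-sum is used only as the simplest spatial filter; it is not the beamformer the patent specifies.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value

def steering_vector(mic_x, theta, freq):
    """Far-field steering vector for microphones on a line.

    mic_x: microphone positions along the array axis (m)
    theta: direction of arrival measured from broadside (rad)
    freq:  frequency (Hz)
    """
    delays = np.asarray(mic_x, dtype=float) * np.sin(theta) / SPEED_OF_SOUND  # seconds
    return np.exp(-2j * np.pi * freq * delays)

def delay_and_sum_weights(mic_x, theta, freq):
    d = steering_vector(mic_x, theta, freq)
    return d / len(d)                    # w^H d = 1: undistorted toward theta

def beam_response(w, mic_x, theta, freq):
    return np.vdot(w, steering_vector(mic_x, theta, freq))  # w^H d(theta)
```

For example, `delay_and_sum_weights([0.0, 0.05, 0.10, 0.15], 0.0, 1000.0)` steers a 4-microphone, 5 cm spaced array to broadside at 1 kHz; evaluating `beam_response` at other angles traces out its beam pattern.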
Human hearing can cope with multiple sound sources and reverberation, and can even pick out and follow a voice of interest while several people are speaking, mainly because it has a specific selective attention ability. When a person is interested in a particular target sound, they select, according to the task and environment, the features that best distinguish the target sound from ambient sounds, compare and screen them against prior knowledge, and thereby reject interfering sounds and obtain the target sound.
In voice applications, real-world scenes such as homes, vehicles and outdoor settings may contain many kinds of noise and interference. Existing speech enhancement and separation methods struggle to pick up the target speech without distortion while also eliminating or suppressing non-target signals, especially when multiple coherent sound sources are present simultaneously, reverberation is strong and the SNR is low.
Multi-channel (microphone-array) speech enhancement uses the amplitude and phase differences of the signals received by multiple microphones to create spatial selectivity toward the target direction, so that beamforming (Beamforming, BM) and directional speech activity detection (Directive speech activity detection, DSAD) algorithms can be pointed at the target direction to suppress or reject interference from non-target directions. However, the direction of arrival (DOA) of the target sound source cannot be known in advance. Under a single-source assumption, sound source localization (Source Location, SL) techniques can determine the DOA of the target source, but in real application environments this assumption is rarely met: in most cases several sound sources, of unknown number, are present at the same time. In a reverberant field with room reflections the situation is even more complicated, and the interference affecting the target source becomes excessive.
Summary of the invention
The object of the present invention is to overcome the above drawbacks of current multi-channel speech enhancement methods. By combining semantics-based sound source identification with signal-processing-based sound source localization, and incorporating the "spatial filtering" property of microphone arrays, it proposes a multi-channel speech enhancement method based on semantic-prior selective attention that can effectively overcome noise and interference.
To achieve this object, the invention provides a multi-channel speech enhancement method based on semantic-prior selective attention, the method comprising: picking up, with a multi-microphone array, speech signals arriving from any direction in a reverberant environment, collecting the multi-channel speech signals and pre-processing them; detecting, with an activation-word speech recognition model, a specific activation word present in the pre-processed speech signals; processing the unsegmented signal containing the activation-word segment to obtain a complete activation-word segment; processing the activation-word segment with a reverberation-robust multi-channel phase-difference sound source localization method to obtain the direction of arrival of the target sound source; and enhancing speech from that direction while suppressing noise from other directions and room reverberation in far-field speaking scenarios, thereby obtaining enhanced speech from the target direction.
In the above technical solution, the method specifically comprises:
Step 1) picking up, with a multi-microphone array, speech signals arriving from any direction in a reverberant environment, and collecting the multi-channel speech signals;
Step 2) pre-processing the multi-channel speech signals collected in step 1);
Step 3) detecting, with an activation-word speech recognition model, whether a specific activation word is present in the pre-processed speech signals; if so, retaining the unsegmented signal containing the activation-word segment and proceeding to step 4); otherwise, returning to step 1);
Step 4) performing voice activity detection on the unsegmented signal containing the activation-word segment to obtain a complete activation-word segment; analyzing the activation-word segment with the reverberation-robust multi-channel phase-difference sound source localization method to obtain the direction of arrival of the target sound source; and enhancing speech from that direction while suppressing residual directional noise, diffuse noise from the environment and room reverberation in far-field speaking scenarios, thereby obtaining enhanced speech from the target direction.
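The control flow of steps 1) to 4), including the return to step 1) when no activation word is found, can be sketched as a loop. Every stage is passed in as a hypothetical callable because the patent does not define interfaces; `buffer_len` (how many pre-processed blocks are kept so that the unsegmented activation-word signal is still available after detection) is an assumed parameter.

```python
from typing import Callable, Iterable, List, Optional, Sequence

Block = Sequence[Sequence[float]]  # one multi-channel block: channels x samples

def selective_attention_loop(
    blocks: Iterable[Block],
    preprocess: Callable[[Block], Block],
    detect_activation_word: Callable[[List[Block]], bool],
    enhance: Callable[[List[Block]], object],
    buffer_len: int = 50,
) -> Optional[object]:
    """Steps 1) to 4): returns the enhanced output for the first detection, else None."""
    history: List[Block] = []
    for block in blocks:                          # step 1: multi-channel pickup
        history.append(preprocess(block))         # step 2: AEC, noise suppression, gain control
        history = history[-buffer_len:]           # keep the unsegmented signal around the word
        if detect_activation_word(history):       # step 3: activation-word detection
            return enhance(history)               # step 4: VAD, localization, enhancement
    return None                                   # stream ended without an activation word
```

Keeping a rolling buffer rather than only the current block matters because the detector typically fires near the end of the word, while localization in step 4) needs the whole word.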
In the above technical solution, step 2) proceeds as follows: if acoustic echo is present in the multi-channel speech signals, echo cancellation, diffuse background noise suppression and gain control are applied to the picked-up multi-channel speech signals; otherwise, only diffuse background noise suppression and gain control are applied.
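The patent does not name an echo-cancellation algorithm for step 2). As an illustration only, a normalized LMS (NLMS) adaptive filter is a standard choice: it models the loudspeaker-to-microphone path from the far-end (loudspeaker) reference and subtracts the predicted echo. In this sketch `taps`, `mu` and `eps` are assumed values, and one such filter would run per microphone channel.

```python
import numpy as np

def nlms_echo_cancel(mic, far_end, taps=64, mu=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the loudspeaker echo from one microphone channel."""
    w = np.zeros(taps)              # echo-path estimate
    x_buf = np.zeros(taps)          # most recent far-end samples, newest first
    out = np.zeros(len(mic))
    for n in range(len(mic)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = far_end[n]
        e = mic[n] - w @ x_buf                          # microphone minus predicted echo
        w += mu * e * x_buf / (x_buf @ x_buf + eps)     # normalized LMS update
        out[n] = e
    return out
```

In a real device the update would also be frozen during double talk so that near-end speech does not corrupt the echo-path estimate; that detector is omitted here.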
In the above technical solution, the specific process in step 3) of detecting, with the activation-word speech recognition model, whether a specific activation word is present in the pre-processed speech signals is: training a speaker-dependent or speaker-independent activation-word speech recognition model from a large amount of prior activation-word data or data from a particular speaker; detecting the activation-word content with a recognition decoding strategy and computing a confidence score, thereby completing the classification decision; and combining speech recognition with a keyword search algorithm to detect the activation word.
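One simple way to turn frame-level recognizer outputs into the confidence score described above is posterior smoothing followed by a geometric mean over the keyword's sub-units. This is a swapped-in, deliberately simple scoring rule, not the decoding strategy of the patent; it assumes a recognizer that outputs per-frame posteriors for each keyword unit, ignores unit order, and uses an assumed decision threshold of 0.5.

```python
import numpy as np

def keyword_confidence(posteriors, smooth=5):
    """Confidence that the keyword is present, from per-frame unit posteriors.

    posteriors: array (frames, units); column i holds the posterior of keyword unit i
    """
    frames, units = posteriors.shape
    smoothed = np.empty((frames, units))
    for t in range(frames):                        # moving average over the last `smooth` frames
        smoothed[t] = posteriors[max(0, t - smooth + 1): t + 1].mean(axis=0)
    peak = smoothed.max(axis=0)                    # strongest evidence for each unit
    return float(np.prod(peak) ** (1.0 / units))   # geometric mean over units

def is_activation_word(posteriors, threshold=0.5):
    return keyword_confidence(posteriors) >= threshold
```

The geometric mean means a single missing unit pulls the score down sharply, which is the behaviour a wake-word detector usually wants.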
In the above technical solution, step 4) specifically comprises:
Step 4-1) detecting the start and end points of the activation word by voice activity detection to obtain a complete multi-channel activation-word segment;
Step 4-2) analyzing the activation-word segment with the reverberation-robust multi-channel phase-difference sound source localization method to obtain the direction-of-arrival information of the target sound source, i.e. the direction of the target speaker who uttered the specific semantic content; and enhancing speech from that direction according to the direction-of-arrival information;
Step 4-3) further suppressing residual directional noise, diffuse noise from the environment and room reverberation in far-field speaking scenarios with multi-channel post-filtering, thereby obtaining enhanced speech from the target direction.
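Step 4-1) only requires the start and end points of the activation word. A minimal energy-based endpoint detector that pools energy over all channels is sketched below; the frame length, the threshold relative to the loudest frame and the safety margin are assumptions, and practical systems typically use more robust VAD features.

```python
import numpy as np

def endpoint_detect(x, frame_len=160, threshold_db=-30.0, margin=2):
    """Return (start_sample, end_sample) of the active region, or None if x is too short.

    x: array (channels, samples); a frame is active if its energy, averaged over
    channels, lies within `threshold_db` of the loudest frame.
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    n_frames = x.shape[1] // frame_len
    if n_frames == 0:
        return None
    frames = x[:, : n_frames * frame_len].reshape(x.shape[0], n_frames, frame_len)
    energy_db = 10 * np.log10(np.mean(frames ** 2, axis=(0, 2)) + 1e-12)
    active = np.flatnonzero(energy_db > energy_db.max() + threshold_db)
    start = max(active[0] - margin, 0) * frame_len            # pad to keep weak onsets
    end = min(active[-1] + 1 + margin, n_frames) * frame_len  # pad to keep weak tails
    return int(start), int(end)
```

The returned sample indices are applied to every channel (for example `x[:, start:end]`) so the segment stays multi-channel for the localization step.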
In the above technical solution, step 4-2) specifically comprises:
Step 4-2-1) transforming the activation-word segment into the time-frequency domain and, in each frequency bin, tracking the coherent and incoherent parts of the signal separately;
Step 4-2-2) counting the time-frequency points occupied by the direct sound;
Step 4-2-3) among the time-frequency points occupied by the direct sound, obtaining the distribution of the signal time difference of arrival in the low-frequency part, which is free of spatial aliasing;
Step 4-2-4) in the high-frequency part, removing the effect of spatial aliasing using the time-difference-of-arrival information obtained at low frequencies, so as to obtain full-band time-difference-of-arrival information for the signal; and then obtaining the direction-of-arrival information;
Step 4-2-5) enhancing speech from that direction according to the direction-of-arrival information.
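Steps 4-2-3) and 4-2-4) can be illustrated for a single microphone pair. Below the aliasing frequency c/(2d) the inter-channel phase difference maps to a unique TDOA; above it, the low-band TDOA is used to choose the number of 2π wraps before converting the phase back to a delay. This sketch assumes the direct-sound time-frequency mask of steps 4-2-1) and 4-2-2) is supplied by the caller, summarizes each TDOA distribution by its median (the patent does not say how the distribution is summarized), and uses an assumed speed of sound of 343 m/s.

```python
import numpy as np

C = 343.0  # speed of sound (m/s), assumed value

def estimate_doa(X1, X2, freqs, d, mask=None):
    """Phase-difference TDOA/DOA for one microphone pair.

    X1, X2: complex STFTs (frames, bins) of the activation-word segment
    freqs:  bin centre frequencies (Hz)
    d:      microphone spacing (m)
    mask:   boolean (frames, bins), True at direct-sound-dominated points (assumed given)
    Returns (tdoa_seconds, doa_degrees_from_broadside).
    """
    if mask is None:
        mask = np.ones(X1.shape, dtype=bool)
    f = np.broadcast_to(freqs, X1.shape)
    phi = np.angle(X1 * np.conj(X2))            # wrapped phase difference in (-pi, pi]
    f_alias = C / (2 * d)                       # phase cannot wrap below this frequency

    low = mask & (f > 0) & (f < f_alias)
    tau_low_all = phi[low] / (2 * np.pi * f[low])
    tau_low = np.median(tau_low_all)            # unambiguous low-band TDOA

    high = mask & (f >= f_alias)
    wraps = np.round((2 * np.pi * f[high] * tau_low - phi[high]) / (2 * np.pi))
    tau_high = (phi[high] + 2 * np.pi * wraps) / (2 * np.pi * f[high])

    tau = np.median(np.concatenate([tau_low_all, tau_high]))   # full-band TDOA
    doa = np.degrees(np.arcsin(np.clip(C * tau / d, -1.0, 1.0)))
    return float(tau), float(doa)
```

Using the high band after unwrapping is what gives the full-band estimate its precision: at high frequencies a given phase error corresponds to a much smaller delay error.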
In the above technical solution, there are two ways of enhancing speech in step 4-2-5):
First way: according to the direction-of-arrival information, enhancing speech from the known direction with a beamforming method and suppressing coherent sound sources from other directions;
Second way: performing directional target speech detection using the known direction, accepting speech from the target region and rejecting sound sources from other directions.
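For the first way, the embodiment of this description names a minimum variance distortionless response (MVDR) beamformer with diagonal loading. A per-frequency-bin sketch follows; how the noise-plus-interference covariance `R` is estimated and the loading level (here 1% of the average sensor power) are assumptions.

```python
import numpy as np

def mvdr_weights(R, d, loading=1e-2):
    """MVDR weights with diagonal loading for one frequency bin.

    R: (M, M) Hermitian noise-plus-interference covariance
    d: (M,) steering vector toward the DOA found by localization
    loading: diagonal load relative to the average sensor power (assumed value)
    """
    M = len(d)
    R_loaded = R + loading * np.real(np.trace(R)) / M * np.eye(M)
    Rinv_d = np.linalg.solve(R_loaded, d)      # R^-1 d without forming the inverse
    return Rinv_d / np.vdot(d, Rinv_d)         # w = R^-1 d / (d^H R^-1 d), so w^H d = 1
```

Loading the diagonal keeps the solve well conditioned and makes the beamformer less sensitive to small DOA or array errors, at the cost of somewhat shallower nulls on interferers.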
The advantages of the present invention are:
1. The method of the invention can be used in applications that require far-field speech input and interaction, such as intelligent appliances, smart homes, in-vehicle devices and wearable devices, and is particularly suited to complex acoustic noise and interference environments;
2. The method of the invention can selectively pick up the target signal under far-field hands-free conditions while suppressing interference and noise.
Description of the drawings
Fig. 1 is a flow chart of the multi-channel speech enhancement method based on semantic-prior selective attention of the present invention;
Fig. 2 is a flow chart of directional target speech detection using the known direction according to the present invention.
Specific embodiments
Target speech differs from other sounds in many features. To exploit these features fully for detection, priority should be given to the features for which the most prior knowledge is available and which are most reliable. For example, when a loudspeaker is playing sound, any sound correlated with the loudspeaker signal can be treated as echo interference; if the semantics of the target speech are known, the semantics are a clearly discriminative feature; and if the direction of arrival (Direction of Arrival, DOA) of the target speech is known, detecting DOA information can remove a large amount of irrelevant sound. By detecting and comparing these various discriminative cues, the influence of interfering sounds can ultimately be suppressed and the target speech segment filtered out of the sound mixture.
The present invention is described in detail below with reference to the accompanying drawings.
As shown in Fig. 1, a multi-channel speech enhancement method based on semantic-prior selective attention comprises:
Step 1) picking up, with a multi-microphone array, speech signals arriving from any direction in a reverberant environment, and collecting the multi-channel speech signals;
Step 2) pre-processing the multi-channel speech signals collected in step 1);
If acoustic echo is present in the speech signals, echo cancellation, diffuse background noise suppression and gain control are applied to the picked-up multi-channel speech signals; otherwise, only diffuse background noise suppression and the necessary gain control are applied to the multi-channel speech signals;
Step 3) detecting, with an activation-word speech recognition model, whether a specific activation word is present in the pre-processed speech signals; if so, retaining the unsegmented signal containing the activation-word segment and proceeding to step 4); otherwise, returning to step 1);
A speaker-dependent or speaker-independent activation-word speech recognition model is trained from a large amount of prior activation-word data or data from a particular speaker; a recognition decoding strategy is used to detect the activation-word content and compute a confidence score, thereby completing the classification decision. Combining speech recognition with a keyword search algorithm realizes activation-word detection.
Step 4) performing speech enhancement on the unsegmented signal containing the activation-word segment, which specifically comprises:
Step 4-1) detecting the start point and end point of the activation word by voice activity detection (VAD: Voice Activity Detection) to obtain a complete multi-channel activation-word segment;
Step 4-2) analyzing the activation-word segment with the reverberation-robust multi-channel phase-difference sound source localization method to obtain the DOA information of the target sound source, i.e. the direction of the target speaker who uttered the specific semantic content; this specifically comprises:
Step 4-2-1) transforming the activation-word segment into the time-frequency domain and, in each frequency bin, tracking the coherent and incoherent parts of the signal separately;
Step 4-2-2) counting the time-frequency points occupied by the direct sound;
Step 4-2-3) among the time-frequency points occupied by the direct sound, obtaining the distribution of the time difference of arrival (TDOA: Time Difference Of Arrival) in the low-frequency part, which is free of spatial aliasing;
Step 4-2-4) in the high-frequency part, removing the effect of spatial aliasing using the TDOA information obtained at low frequencies to obtain the full-band TDOA of the signal, and then obtaining the DOA information;
Step 4-2-5) enhancing speech from the known direction according to the DOA information; there are two ways of enhancing speech from the known direction in step 4-2-5):
First way: according to the DOA information, enhancing speech from the known direction with a beamforming method and suppressing coherent sound sources from other directions;
In this embodiment, a minimum variance distortionless response (MVDR) beamforming method with diagonal loading (Diagonal Loading) is used to suppress coherent sound sources from other directions; in other embodiments, directional interference may also be suppressed with a subband-based blind source separation (Blind Source Separation) technique.
Second way: performing directional target speech detection (DSAD) using the known direction, accepting speech from the target region and rejecting sound sources from other directions.
As shown in Fig. 2, taking two-channel DSAD as an example, a decision is made at each time-frequency point using the beam-to-reference ratio (Beam-to-Reference Ratio, BRR) and the signal-to-noise ratio (SNR). For the BRR decision threshold, a direct-to-reverberant ratio (Direct-to-Reverberant Ratio, DRR) tracking mechanism is incorporated so that the detection threshold of each time-frequency point adapts to the environment, improving the accuracy of the likelihood estimate at each point; a sidelobe suppression mechanism reduces the influence of high-frequency aliasing, further improving the accuracy and completeness of the decision.
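A simplified two-channel version of this per-point decision is sketched below. It assumes that the beam is the delay-compensated sum of the two channels and the reference is their difference (which nulls the target direction), that a background-noise PSD estimate is available, and that both thresholds are fixed; the DRR-based threshold adaptation and the sidelobe suppression described above are omitted.

```python
import numpy as np

def dsad_mask(X1, X2, freqs, tau, noise_power, brr_db=3.0, snr_db=0.0):
    """Two-channel directional speech activity decision per time-frequency point.

    X1, X2: complex STFTs (frames, bins); freqs: (bins,) in Hz
    tau: target TDOA (s) from the localization step
    noise_power: (bins,) background-noise PSD estimate (assumed available)
    Returns a boolean (frames, bins) mask: True = accept as target-direction speech.
    """
    X2_aligned = X2 * np.exp(2j * np.pi * freqs * tau)   # undo the target's inter-mic delay
    beam = 0.5 * (X1 + X2_aligned)                       # passes the target direction
    ref = 0.5 * (X1 - X2_aligned)                        # blocks the target direction
    eps = 1e-12
    brr = 10 * np.log10((np.abs(beam) ** 2 + eps) / (np.abs(ref) ** 2 + eps))
    snr = 10 * np.log10((np.abs(beam) ** 2 + eps) / (noise_power + eps))
    return (brr > brr_db) & (snr > snr_db)
```

A point dominated by the target has almost no energy in the reference, so its BRR is large; a point dominated by a source from another direction leaks into the reference and is rejected.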
Step 4-3) further suppressing residual directional noise, diffuse noise from the environment and room reverberation in far-field speaking scenarios with multi-channel post-filtering, thereby obtaining the enhanced speech.
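The patent does not specify which multi-channel post-filter is used in step 4-3). A classic option is the Zelinski post-filter, which estimates the target PSD from the averaged cross-spectra of time-aligned channels (spatially uncorrelated noise averages out; diffuse noise does so only at higher frequencies) and divides it by the averaged auto-spectra. The sketch assumes the channels have already been aligned toward the DOA; the smoothing constant `alpha` and gain floor `g_min` are assumed values.

```python
import numpy as np

def zelinski_postfilter(X_aligned, alpha=0.8, g_min=0.1):
    """Zelinski-style post-filter gains for time-aligned channels.

    X_aligned: complex (channels, frames, bins), delay-compensated toward the target
    Returns gains (frames, bins) to apply to the beamformer output.
    """
    M, frames, bins = X_aligned.shape
    phi = np.zeros((M, M, bins), dtype=complex)   # recursively smoothed cross/auto spectra
    iu = np.triu_indices(M, k=1)                  # all microphone pairs i < j
    idx = np.arange(M)
    gains = np.empty((frames, bins))
    for l in range(frames):
        x = X_aligned[:, l, :]                                   # (M, bins)
        inst = x[:, None, :] * np.conj(x[None, :, :])            # (M, M, bins)
        phi = alpha * phi + (1 - alpha) * inst
        cross = np.mean(np.real(phi[iu[0], iu[1], :]), axis=0)   # target PSD estimate
        auto = np.mean(np.real(phi[idx, idx, :]), axis=0)        # total PSD estimate
        gains[l] = np.clip(cross / (auto + 1e-12), g_min, 1.0)
    return gains
```

The gain floor trades residual noise against speech distortion, in the same way as the SNR floor of single-channel enhancement.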

Claims (7)

1. A multi-channel speech enhancement method based on semantic-prior selective attention, the method comprising: picking up, with a multi-microphone array, speech signals arriving from any direction in a reverberant environment, collecting the multi-channel speech signals and pre-processing them; detecting, with an activation-word speech recognition model, a specific activation word present in the pre-processed speech signals; processing the unsegmented signal containing the activation-word segment to obtain a complete activation-word segment; analyzing the activation-word segment with a reverberation-robust multi-channel phase-difference sound source localization method to obtain the direction of arrival of the target sound source; and enhancing speech from that direction while suppressing noise from other directions and room reverberation in far-field speaking scenarios, thereby obtaining enhanced speech from the target direction.
2. The multi-channel speech enhancement method based on semantic-prior selective attention according to claim 1, characterized in that the method specifically comprises:
Step 1) picking up, with a multi-microphone array, speech signals arriving from any direction in a reverberant environment, and collecting the multi-channel speech signals;
Step 2) pre-processing the multi-channel speech signals collected in step 1);
Step 3) detecting, with an activation-word speech recognition model, whether a specific activation word is present in the pre-processed speech signals; if so, retaining the unsegmented signal containing the activation-word segment and proceeding to step 4); otherwise, returning to step 1);
Step 4) performing voice activity detection on the unsegmented signal containing the activation-word segment to obtain a complete activation-word segment; analyzing the activation-word segment with the reverberation-robust multi-channel phase-difference sound source localization method to obtain the direction of arrival of the target sound source; and enhancing speech from that direction while suppressing residual directional noise, diffuse noise from the environment and room reverberation in far-field speaking scenarios, thereby obtaining enhanced speech from the target direction.
3. The multi-channel speech enhancement method based on semantic-prior selective attention according to claim 2, characterized in that step 2) specifically comprises: if acoustic echo is present in the multi-channel speech signals, performing echo cancellation, diffuse background noise suppression and gain control on the picked-up multi-channel speech signals; otherwise, performing only diffuse background noise suppression and gain control on the multi-channel speech signals.
4. The multi-channel speech enhancement method based on semantic-prior selective attention according to claim 2, characterized in that, in step 3), the specific process of detecting, with the activation-word speech recognition model, whether a specific activation word is present in the pre-processed speech signals is: training a speaker-dependent or speaker-independent activation-word speech recognition model from a large amount of prior activation-word data or data from a particular speaker; detecting the activation-word content with a recognition decoding strategy and computing a confidence score, thereby completing the classification decision; and combining speech recognition with a keyword search algorithm to detect the activation word.
5. The multi-channel speech enhancement method based on semantic-prior selective attention according to claim 2, characterized in that step 4) specifically comprises:
Step 4-1) detecting the start and end points of the activation word by voice activity detection to obtain a complete multi-channel activation-word segment;
Step 4-2) analyzing the activation-word segment with the reverberation-robust multi-channel phase-difference sound source localization method to obtain the direction-of-arrival information of the target sound source, i.e. the direction of the target speaker who uttered the specific semantic content; and enhancing speech from that direction according to the direction-of-arrival information;
Step 4-3) further suppressing residual directional noise, diffuse noise from the environment and room reverberation in far-field speaking scenarios with multi-channel post-filtering, thereby obtaining enhanced speech from the target direction.
6. The multi-channel speech enhancement method based on semantic-prior selective attention according to claim 5, characterized in that step 4-2) specifically comprises:
Step 4-2-1) transforming the activation-word segment into the time-frequency domain and, in each frequency bin, tracking the coherent and incoherent parts of the signal separately;
Step 4-2-2) counting the time-frequency points occupied by the direct sound;
Step 4-2-3) among the time-frequency points occupied by the direct sound, obtaining the distribution of the signal time difference of arrival in the low-frequency part, which is free of spatial aliasing;
Step 4-2-4) in the high-frequency part, removing the effect of spatial aliasing using the time-difference-of-arrival information obtained at low frequencies, so as to obtain full-band time-difference-of-arrival information for the signal; and then obtaining the direction-of-arrival information;
Step 4-2-5) enhancing speech from that direction according to the direction-of-arrival information.
7. The multi-channel speech enhancement method based on semantic-prior selective attention according to claim 6, characterized in that there are two ways of enhancing speech in step 4-2-5):
First way: according to the direction-of-arrival information, enhancing speech from the known direction with a beamforming method and suppressing coherent sound sources from other directions;
Second way: performing directional target speech detection using the known direction, accepting speech from the target region and rejecting sound sources from other directions.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510574907.3A CN106531179B (en) 2015-09-10 2015-09-10 Multi-channel speech enhancement method based on semantic-prior selective attention

Publications (2)

Publication Number Publication Date
CN106531179A true CN106531179A (en) 2017-03-22
CN106531179B CN106531179B (en) 2019-08-20

Family

ID=58346225

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510574907.3A Active CN106531179B (en) 2015-09-10 2015-09-10 Multi-channel speech enhancement method based on semantic-prior selective attention

Country Status (1)

Country Link
CN (1) CN106531179B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020116196A1 (en) * 1998-11-12 2002-08-22 Tran Bao Q. Speech recognizer
CN102819009A (en) * 2012-08-10 2012-12-12 汽车零部件研究及发展中心有限公司 Driver sound localization system and method for automobile
CN204390737U (en) * 2014-07-29 2015-06-10 科大讯飞股份有限公司 Home voice processing system

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106960672A (en) * 2017-03-30 2017-07-18 国家计算机网络与信息安全管理中心 Bandwidth extension method and device for stereo audio
CN107146614A (en) * 2017-04-10 2017-09-08 北京猎户星空科技有限公司 Audio signal processing method, device and electronic equipment
CN108877827B (en) * 2017-05-15 2021-04-20 福州瑞芯微电子股份有限公司 Voice-enhanced interaction method and system, storage medium and electronic equipment
CN108877827A (en) * 2017-05-15 2018-11-23 福州瑞芯微电子股份有限公司 Voice-enhanced interaction method and system, storage medium and electronic equipment
CN107346661A (en) * 2017-06-01 2017-11-14 李昕 Remote iris tracking and acquisition method based on a microphone array
CN107346661B (en) * 2017-06-01 2020-06-12 伊沃人工智能技术(江苏)有限公司 Microphone array-based remote iris tracking and collecting method
CN108122563A (en) * 2017-12-19 2018-06-05 北京声智科技有限公司 Method for improving voice wake-up rate and correcting DOA
CN108447483A (en) * 2018-05-18 2018-08-24 深圳市亿道数码技术有限公司 Speech recognition system
CN108447483B (en) * 2018-05-18 2023-11-21 深圳市亿道数码技术有限公司 Speech recognition system
WO2020029882A1 (en) * 2018-08-06 2020-02-13 腾讯科技(深圳)有限公司 Azimuth estimation method, device, and storage medium
CN110164423A (en) * 2018-08-06 2019-08-23 腾讯科技(深圳)有限公司 Azimuth angle estimation method, device and storage medium
US11908456B2 (en) 2018-08-06 2024-02-20 Tencent Technology (Shenzhen) Company Limited Azimuth estimation method, device, and storage medium
CN110164423B (en) * 2018-08-06 2023-01-20 腾讯科技(深圳)有限公司 Azimuth angle estimation method, azimuth angle estimation equipment and storage medium
EP3836136A4 (en) * 2018-08-06 2021-09-08 Tencent Technology (Shenzhen) Company Limited Azimuth estimation method, device, and storage medium
CN110875045A (en) * 2018-09-03 2020-03-10 阿里巴巴集团控股有限公司 Voice recognition method, intelligent device and intelligent television
WO2020048431A1 (en) * 2018-09-03 2020-03-12 阿里巴巴集团控股有限公司 Voice processing method, electronic device and display device
CN111081234A (en) * 2018-10-18 2020-04-28 珠海格力电器股份有限公司 Voice acquisition method, device, equipment and storage medium
CN110047494A (en) * 2019-04-15 2019-07-23 北京小米智能科技有限公司 Equipment response method, equipment and storage medium
CN112289335A (en) * 2019-07-24 2021-01-29 阿里巴巴集团控股有限公司 Voice signal processing method and device and pickup equipment
CN110992977B (en) * 2019-12-03 2021-06-22 北京声智科技有限公司 Method and device for extracting target sound source
CN110992977A (en) * 2019-12-03 2020-04-10 北京声智科技有限公司 Method and device for extracting target sound source
CN113257251A (en) * 2021-05-11 2021-08-13 深圳优地科技有限公司 Robot user identification method, apparatus and storage medium
CN113823311A (en) * 2021-08-19 2021-12-21 安徽创变信息科技有限公司 Voice recognition method and device based on audio enhancement
CN113823311B (en) * 2021-08-19 2023-11-21 广州市盛为电子有限公司 Voice recognition method and device based on audio enhancement
CN113643714A (en) * 2021-10-14 2021-11-12 阿里巴巴达摩院(杭州)科技有限公司 Audio processing method, device, storage medium and computer program
CN113643714B (en) * 2021-10-14 2022-02-18 阿里巴巴达摩院(杭州)科技有限公司 Audio processing method, device, storage medium and computer program

Also Published As

Publication number Publication date
CN106531179B (en) 2019-08-20

Similar Documents

Publication Title
CN106531179A (en) Multi-channel speech enhancement method based on semantic prior selective attention
CN110556103B (en) Audio signal processing method, device, system, equipment and storage medium
CN110503970B (en) Audio data processing method and device and storage medium
US11158333B2 (en) Multi-stream target-speech detection and channel fusion
CN106782563B (en) Smart home voice interaction system
EP3360250B1 (en) A sound signal processing apparatus and method for enhancing a sound signal
CN102164328B (en) Audio input system used in home environment based on microphone array
CN101828407B (en) Based on the microphone array processor of spatial analysis
CN108962272A (en) Sound pick-up method and system
CN108122563A (en) Improve voice wake-up rate and the method for correcting DOA
US10957338B2 (en) 360-degree multi-source location detection, tracking and enhancement
Brutti et al. Multiple source localization based on acoustic map de-emphasis
US11264017B2 (en) Robust speaker localization in presence of strong noise interference systems and methods
CN110610718B (en) Method and device for extracting expected sound source voice signal
CN110875056B (en) Speech transcription device, system, method and electronic device
CN107889001B (en) Expandable microphone array and establishing method thereof
CN106328130A (en) Robot voice addressed rotation system and method
CN106992010A (en) Without the microphone array speech enhancement device under the conditions of direct sound wave
CN110120217A (en) A kind of audio data processing method and device
CN110992967A (en) Voice signal processing method and device, hearing aid and storage medium
WO2020118290A1 (en) System and method for acoustic localization of multiple sources using spatial pre-filtering
WO2013132216A1 (en) Method and apparatus for determining the number of sound sources in a targeted space
CN113223544A (en) Audio direction positioning detection device and method and audio processing system
CN116343808A (en) Flexible microphone array voice enhancement method and device, electronic equipment and medium
Lee et al. Space-time voice activity detection

Legal Events

Code Title
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant