CN111653272A - Vehicle-mounted voice enhancement algorithm based on deep belief network - Google Patents

Vehicle-mounted voice enhancement algorithm based on deep belief network

Info

Publication number
CN111653272A
CN111653272A (application CN202010484415.6A)
Authority
CN
China
Prior art keywords
vehicle
belief network
deep belief
layer
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010484415.6A
Other languages
Chinese (zh)
Inventor
周伟
钱龙
施建阳
张英鹏
李鹏华
董莉娜
计超
易军
郑福建
汪彦
郭鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Science and Technology
Original Assignee
Chongqing University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Science and Technology filed Critical Chongqing University of Science and Technology
Priority to CN202010484415.6A priority Critical patent/CN111653272A/en
Publication of CN111653272A publication Critical patent/CN111653272A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10K SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K 11/00 Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K 11/16 Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K 11/175 Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound
    • G10K 11/178 Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound by electro-acoustically regenerating the original acoustic waves in anti-phase
    • G10K 11/1785 Methods, e.g. algorithms; Devices
    • G10K 11/17853 Methods, e.g. algorithms; Devices of the filter
    • G10K 11/17854 Methods, e.g. algorithms; Devices of the filter, the filter being an adaptive filter
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L 21/0208 Noise filtering
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention provides a vehicle-mounted voice enhancement algorithm based on a deep belief network, comprising the following steps. Step 1: dividing the vehicle-mounted voice signal into a training sample signal and a test sample signal. Step 2: optimizing the learning rate, the initial weight and the number of hidden nodes of the DBN with a QPSO algorithm. Step 3: replacing the sigmoid function with a hyperbolic tangent (tanh) activation function to optimize the deep belief network model. Step 4: performing greedy layer-by-layer unsupervised learning on the optimized deep belief network to obtain abstract voice feature vectors of the input vehicle-mounted voice signal. Step 5: inputting the abstract voice signal into a least mean square (LMS) algorithm to obtain an enhanced voice signal. The invention combines the deep belief network with the traditional least mean square algorithm for voice enhancement, exploiting both the strong learning and feature-extraction capabilities of the deep belief network and the efficiency of the traditional voice enhancement algorithm.

Description

Vehicle-mounted voice enhancement algorithm based on deep belief network
Technical Field
The invention relates to voice enhancement technology, and in particular to a vehicle-mounted voice enhancement algorithm based on a deep belief network.
Background
In recent years, with rapid economic development, people's consumption level has risen steadily and the automobile has become a primary means of transport. Statistics show that in the first half of 2019 the number of motor vehicles nationwide reached 340 million, with 12.42 million newly registered automobiles and 14.08 million newly licensed drivers. With ever-increasing market competition, automotive manufacturers are constantly upgrading in-vehicle electronic devices, such as in-vehicle multimedia systems, in-vehicle navigation systems, in-vehicle hands-free systems and in-vehicle control systems, to meet the diverse demands of users during driving.
Ideally, the driver issues a voice command to control the in-vehicle electronic equipment, and the vehicle-mounted voice recognition system invokes the corresponding device according to the recognized content. However, a real vehicle-mounted environment contains various background noises, such as engine noise, tire noise, wind noise, air-conditioning noise, and human noise from the passenger compartment. At present, voice recognition accuracy in a quiet environment reaches about 98%, but in a real environment, particularly in a complex vehicle-mounted noise environment, the accuracy drops sharply.
Disclosure of Invention
The invention aims to provide a vehicle-mounted voice enhancement algorithm that maintains high voice recognition accuracy in a real environment, particularly in a complex vehicle-mounted noise environment.
The invention provides a vehicle-mounted voice enhancement algorithm based on a deep belief network, which comprises the following steps:
step 1: dividing the vehicle-mounted voice signal into a training sample signal and a test sample signal;
step 2: optimizing the learning rate, the initial weight and the number of hidden nodes of the DBN by adopting a QPSO algorithm;
QPSO denotes the quantum-behaved particle swarm optimization algorithm, and DBN denotes the deep belief network.
step 3: replacing the sigmoid function with a tanh activation function to optimize the deep belief network model;
step 4: performing greedy layer-by-layer unsupervised learning on the optimized deep belief network to obtain abstract voice feature vectors of the input vehicle-mounted voice signal;
step 5: inputting the abstract voice signal into a least mean square (LMS) algorithm to obtain an enhanced voice signal.
Further, step 2 includes training the DBN with a restricted Boltzmann machine; the restricted Boltzmann machine represents the current state of the system through energy, and the energy expression is as follows:

E(v, h \mid \theta) = -\sum_{i=1}^{n} a_i v_i - \sum_{j=1}^{m} b_j h_j - \sum_{i=1}^{n} \sum_{j=1}^{m} v_i W_{ij} h_j

where n represents the number of nodes of the visible layer v, m represents the number of nodes of the hidden layer h, a represents the bias of the visible layer, b represents the bias of the hidden layer, W_{ij} represents the weight from visible-layer node i to hidden-layer node j, and θ = {W, a, b} represents the set of all parameters of the system;
the probability distribution of the whole system is as follows:
P(v, h \mid \theta) = \frac{e^{-E(v, h \mid \theta)}}{\sum_{v, h} e^{-E(v, h \mid \theta)}}

the logarithmic derivative of the marginal probability distribution is taken using the following equation:

\frac{\partial \ln P(v \mid \theta)}{\partial \theta} = \left\langle \frac{\partial (-E(v, h \mid \theta))}{\partial \theta} \right\rangle_{\mathrm{data}} - \left\langle \frac{\partial (-E(v, h \mid \theta))}{\partial \theta} \right\rangle_{\mathrm{model}}

For the training samples, the subscript "data" denotes the distribution P(h | v, θ) and "model" denotes the distribution P(v, h | θ), where ⟨·⟩_P denotes the mathematical expectation with respect to the distribution P;

the logarithmic derivative formulas of the marginal probability distribution are expressed as:

\frac{\partial \ln P(v \mid \theta)}{\partial W_{ij}} = \langle v_i h_j \rangle_{\mathrm{data}} - \langle v_i h_j \rangle_{\mathrm{model}}

\frac{\partial \ln P(v \mid \theta)}{\partial a_i} = \langle v_i \rangle_{\mathrm{data}} - \langle v_i \rangle_{\mathrm{model}}

\frac{\partial \ln P(v \mid \theta)}{\partial b_j} = \langle h_j \rangle_{\mathrm{data}} - \langle h_j \rangle_{\mathrm{model}}

The RBM parameters are updated by adopting the following formula:

W_{ij}^{(k+1)} = W_{ij}^{(k)} + \varepsilon \left( \langle v_i h_j \rangle_{\mathrm{data}} - \langle v_i h_j \rangle_{\mathrm{model}} \right)

wherein k denotes the number of Gibbs sampling steps and ε denotes the learning rate;
RBM denotes a restricted Boltzmann machine;
optimizing the learning rate, the initial weight and the number of hidden nodes by adopting the following formulas:
Z_b = \frac{1}{Z} \sum_{i=1}^{Z} p_i

P = \mu p_i + (1 - \mu) p_j

X(t+1) = P \pm \alpha_p \left| Z_b - X(t) \right| \ln(1/u)

in the formulas: Z is the size of the population; μ and u are random numbers uniformly distributed on the interval [0,1]; Z_b is the mean point of the individual best positions of all particles; p_i is the individual best position of particle i and p_j is the global best position of the swarm; X(t) is the position of particle i at the t-th iteration; and α_p is the contraction-expansion factor.
Further, in step 3, the sigmoid function expression adopts the following formula:
f(x) = \frac{1}{1 + e^{-x}}
the tanh activation function expression adopts the following formula:
f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}
the invention has the beneficial effects that: the deep belief network is combined with the traditional minimum mean square error algorithm to carry out voice enhancement, so that the strong learning capability and the feature extraction capability of the deep belief network are utilized, and the high efficiency of the traditional voice enhancement algorithm is combined. And (3) performing feature learning on the vehicle-mounted voice signal through a deep belief network, and inputting the feature learning into an LMS algorithm to obtain the optimal voice enhancement effect. Speech enhancement, as a pre-processing scheme, is an effective and necessary way to suppress noise and interference, providing convenience for subsequent speech recognition.
Drawings
FIG. 1 is a system block diagram of the present invention.
FIG. 2 is a flow chart of the QPSO algorithm.
Fig. 3 is a schematic diagram of an LMS algorithm adaptive noise canceller.
Detailed Description of the Embodiments
the following provides a more detailed description of the embodiments and the operation of the present invention with reference to the accompanying drawings.
As shown in FIG. 1, the invention provides a vehicle-mounted voice enhancement algorithm based on a deep belief network. The specific steps are as follows:
Step 1: Preprocess the acquired vehicle-mounted voice signals, i.e., remove the mean and normalize, and divide the samples into training samples and test samples.
Step 2: First, a deep belief network model is constructed; the DBN model consists of several RBMs and automatically extracts high-level features. The RBM represents the current state of the system through energy, with the energy expression:

E(v, h \mid \theta) = -\sum_{i=1}^{n} a_i v_i - \sum_{j=1}^{m} b_j h_j - \sum_{i=1}^{n} \sum_{j=1}^{m} v_i W_{ij} h_j

wherein n denotes the number of nodes of the visible layer v, m denotes the number of nodes of the hidden layer h, a and b denote the biases of the visible and hidden layers respectively, and W_{ij} denotes the weight from visible-layer node i to hidden-layer node j. The set of all parameters of the system is denoted by θ = {W, a, b}.
The probability distribution of the whole system is as follows:
P(v, h \mid \theta) = \frac{e^{-E(v, h \mid \theta)}}{\sum_{v, h} e^{-E(v, h \mid \theta)}}

The denominator is called the normalization factor; it keeps the system probability within the range [0,1]. From the above equation the marginal probability distribution is obtained, which is estimated by the maximum likelihood method; its logarithmic derivative is:

\frac{\partial \ln P(v \mid \theta)}{\partial \theta} = \left\langle \frac{\partial (-E(v, h \mid \theta))}{\partial \theta} \right\rangle_{\mathrm{data}} - \left\langle \frac{\partial (-E(v, h \mid \theta))}{\partial \theta} \right\rangle_{\mathrm{model}}

For the training samples, "data" denotes the distribution P(h | v, θ) and "model" denotes the distribution P(v, h | θ), where ⟨·⟩_P denotes the mathematical expectation with respect to the distribution P. Since θ = {W, a, b}, combining this with the derivative of the energy definition, the above equation can be expressed as:

\frac{\partial \ln P(v \mid \theta)}{\partial W_{ij}} = \langle v_i h_j \rangle_{\mathrm{data}} - \langle v_i h_j \rangle_{\mathrm{model}}

\frac{\partial \ln P(v \mid \theta)}{\partial a_i} = \langle v_i \rangle_{\mathrm{data}} - \langle v_i \rangle_{\mathrm{model}}

\frac{\partial \ln P(v \mid \theta)}{\partial b_j} = \langle h_j \rangle_{\mathrm{data}} - \langle h_j \rangle_{\mathrm{model}}

The log-likelihood is approximated by Gibbs sampling, and the gradient used to update the RBM parameters is obtained. The calculation expression is:

W_{ij}^{(k+1)} = W_{ij}^{(k)} + \varepsilon \left( \langle v_i h_j \rangle_{\mathrm{data}} - \langle v_i h_j \rangle_{\mathrm{model}} \right)

where k denotes the number of Gibbs sampling steps and ε denotes the learning rate.
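For concreteness, this update can be illustrated with a minimal NumPy sketch of a single CD-k step with k = 1. The code below is written for this description rather than taken from the patent, and its names (cd1_update, with lr standing for ε) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, a, b, lr=0.01):
    """One contrastive-divergence update (k = 1) of the RBM parameters {W, a, b}."""
    ph0 = sigmoid(v0 @ W + b)                  # P(h|v) on the data ("data" term)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    pv1 = sigmoid(h0 @ W.T + a)                # one Gibbs step: reconstruct v
    ph1 = sigmoid(pv1 @ W + b)                 # hidden probabilities ("model" term)
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n   # <v_i h_j>_data - <v_i h_j>_model
    a += lr * (v0 - pv1).mean(axis=0)          # <v_i>_data - <v_i>_model
    b += lr * (ph0 - ph1).mean(axis=0)         # <h_j>_data - <h_j>_model
    return W, a, b

# Toy usage: n = 6 visible nodes, m = 4 hidden nodes, batch of 8 binary vectors.
W = 0.01 * rng.standard_normal((6, 4))
a, b = np.zeros(6), np.zeros(4)
v = (rng.random((8, 6)) < 0.5).astype(float)
W, a, b = cd1_update(v, W, a, b)
```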
The learning rate, the initial weights and the number of hidden-layer nodes are optimized with the QPSO algorithm; the calculation expressions are:

Z_b = \frac{1}{Z} \sum_{i=1}^{Z} p_i

P = \mu p_i + (1 - \mu) p_j

X(t+1) = P \pm \alpha_p \left| Z_b - X(t) \right| \ln(1/u)

in the formulas: Z is the size of the population; μ and u are random numbers uniformly distributed on the interval [0,1]; Z_b is the mean point of the individual best positions of all particles; p_i is the individual best position of particle i and p_j is the global best position of the swarm; X(t) is the position of particle i at the t-th iteration; and α_p is the contraction-expansion factor.
The flow of the QPSO algorithm is shown in FIG. 2; the process of optimizing the DBN with the QPSO algorithm can be described as follows (an illustrative code sketch follows this list):
1) Initialize the QPSO algorithm, including the particle positions and search ranges, the contraction-expansion factor, the number of iterations, etc.; the DBN learning rate, initial weights and number of hidden-layer nodes to be optimized are mapped to the particle positions;
2) Compute the fitness of each particle in the population to obtain the individual best position of each particle and the global best position of the population;
3) Compute the mean point of the individual best positions of all particles in the population, then update the particle positions;
4) Repeat 2)-3) until the iteration stop condition is met; the output optimization result gives the parameters of the DBN.
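As noted above, the following rough sketch implements steps 1)-4) for a generic fitness function. In the patent, each particle position would encode the DBN learning rate, initial weight and hidden-node count, and the fitness would be the DBN training error; all function and parameter names here are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def qpso(fitness, dim, n_particles=20, n_iter=100, alpha=0.75, lo=-5.0, hi=5.0):
    """Minimize `fitness` with quantum-behaved PSO; returns the global best position."""
    X = rng.uniform(lo, hi, (n_particles, dim))      # particle positions
    pbest = X.copy()                                 # individual best positions p_i
    pbest_f = np.array([fitness(x) for x in X])
    g = int(pbest_f.argmin())
    gbest, gbest_f = pbest[g].copy(), pbest_f[g]     # global best position
    for _ in range(n_iter):
        zb = pbest.mean(axis=0)                      # mean best point Z_b
        for i in range(n_particles):
            mu = rng.random(dim)
            u = np.maximum(rng.random(dim), 1e-12)   # avoid log(1/0)
            P = mu * pbest[i] + (1.0 - mu) * gbest   # local attractor
            sign = np.where(rng.random(dim) < 0.5, 1.0, -1.0)
            X[i] = np.clip(P + sign * alpha * np.abs(zb - X[i]) * np.log(1.0 / u),
                           lo, hi)
            f = fitness(X[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = X[i].copy(), f
                if f < gbest_f:
                    gbest, gbest_f = X[i].copy(), f
    return gbest

# Toy usage: minimize the sphere function in 3 dimensions.
best = qpso(lambda x: float(np.sum(x ** 2)), dim=3)
```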
Step 3: Set the basic parameters of the DBN according to the dimension of the voice signal, the size of the sample set and the result optimized by the QPSO algorithm in step 2, including the number of hidden-layer units, the number of model layers, the number of training epochs, the batch size, the momentum, the learning rate, the penalty rate, the initial biases and the initial weights. The traditional sigmoid function, whose expression is:

f(x) = \frac{1}{1 + e^{-x}}

is replaced with the tanh activation function, whose expression is:

f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}

This substitution alleviates the vanishing-gradient problem of the traditional DBN activation function and effectively improves the convergence and stability of the network.
Step 4: High-level representations of the input vehicle-mounted voice signals are obtained by layer-by-layer feature extraction, and the weights of the network connections are optimized. First, an unsupervised training mode is adopted and the RBMs are trained layer by layer, preserving the features of the voice signals as much as possible; then the network is fine-tuned by back-propagation to learn ideal high-level abstract voice feature signals. The whole process is as follows (see the sketch after this list):
1) Parameter initialization. Set the parameters according to step 3, including the weight matrix, the visible-layer bias vector, the hidden-layer bias vector, the number of sampling steps, the number of iterations, the learning rate, etc.
2) Parameter update. Perform Gibbs sampling several times with the contrastive divergence algorithm, and update the parameters with the parameter update formula.
3) Layer-by-layer training. Train each RBM layer by layer until all RBMs are trained.
4) Fine-tuning. Adjust the weights and biases of each network layer using the error back-propagation mechanism of the network model.
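The sketch referenced above illustrates steps 1)-3), i.e., greedy layer-wise CD-1 pretraining in which each RBM's hidden output feeds the next RBM; the back-propagation fine-tuning of step 4) is omitted for brevity. It is an illustrative reconstruction with arbitrary sizes and names, not code from the patent.

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, lr=0.01, epochs=10):
    """Train one RBM with CD-1 and return its parameters (W, a, b)."""
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    a, b = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        ph0 = sigmoid(data @ W + b)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        pv1 = sigmoid(h0 @ W.T + a)               # reconstruction
        ph1 = sigmoid(pv1 @ W + b)
        n = data.shape[0]
        W += lr * (data.T @ ph0 - pv1.T @ ph1) / n
        a += lr * (data - pv1).mean(axis=0)
        b += lr * (ph0 - ph1).mean(axis=0)
    return W, a, b

def pretrain_dbn(data, layer_sizes):
    """Greedy layer-wise pretraining: stack RBMs, feeding features upward."""
    params, x = [], data
    for n_hidden in layer_sizes:
        W, a, b = train_rbm(x, n_hidden)
        params.append((W, a, b))
        x = sigmoid(x @ W + b)    # abstract features passed to the next layer
    return params, x              # x: high-level abstract feature vectors

# Toy usage: 32 "frames" of 16-dimensional features through a 3-layer DBN.
frames = rng.random((32, 16))
params, features = pretrain_dbn(frames, layer_sizes=[12, 8, 4])
```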
Step 5: Input the high-level abstract voice signal into the adaptive filtering algorithm to perform voice enhancement. In the schematic diagram of FIG. 3, s(n) denotes the original voice signal; x(n) denotes the noisy voice signal; y(n) denotes the output signal of the filter; v(n) denotes the noise signal; v_0(n) and v_1(n) denote two noise signals, each uncorrelated with the speech; e(n) denotes the error signal, expressed as follows:
e(n) = x(n) - y(n) = s(n) + v_0(n) - y(n)
squaring both sides and then taking the mathematical expectation, one can obtain
E[e^2(n)] = E[s^2(n)] + E[(v_0(n) - y(n))^2] + 2E[s(n)(v_0(n) - y(n))]
The adaptive process of the LMS algorithm automatically adjusts the tap weights W(n) of the filter so that the error E[e^2(n)] reaches a minimum. Since s(n) and v_0(n) are independent of each other, minimizing E[e^2(n)] only requires minimizing E[(v_0(n) - y(n))^2].
The iterative formulas of the least mean square (LMS) algorithm based on the steepest descent method are:

\nabla(n) = \frac{\partial E[e^2(n)]}{\partial W(n)}

W(n+1) = W(n) - \mu \nabla(n)

so that one obtains:

W(n+1) = W(n) + 2\mu e(n) X(n)
where μ denotes a step factor.
The order of the adaptive filter is M, the filter coefficient vector is F, and the input signal sequence is X; the output is:

y(n) = \sum_{i=0}^{M-1} w_i x(n - i)

e(n) = d(n) - y(n)
from which one obtains:

e(n) = d(n) - \sum_{i=0}^{M-1} w_i x(n - i)
let F be [ w ]0w1…wM-1]T,Xj=[x1jx2j...xnj]Then the output of the filter can be written in matrix form:
Figure BDA0002518600590000086
Figure BDA0002518600590000087
the cost function is defined as:
J(F) = E[e_j^2]
when the cost function in the above equation is minimized, it is considered that optimal filtering is achieved, and such adaptive filtering becomes least mean square adaptive filtering.
For least mean square adaptive filtering, the filter coefficients that minimize the mean square error need to be determined, and gradient descent methods are generally used to solve such problems. The iterative formula of the filter coefficient vector is:
F_{j+1} = F_j - \frac{\mu}{2} \nabla_j

wherein

\nabla_j = \frac{\partial E[e_j^2]}{\partial F_j}

is the gradient of the cost function.
Since the instantaneous gradient -2 e_j X_j is an unbiased estimate of the true gradient, in practical applications the instantaneous gradient can be used instead of the true gradient, that is:

\hat{\nabla}_j = -2 e_j X_j

F_{j+1} = F_j + \mu e_j X_j
Through successive iterations, the optimal filter coefficients are obtained, realizing adaptive filtering of the input vehicle-mounted voice signal for voice enhancement.
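The derivation above corresponds to the short LMS sketch below (illustrative code written for this description, not taken from the patent): F is the coefficient vector of order M, X_j holds the most recent M input samples, and the update is F_{j+1} = F_j + μ e_j X_j.

```python
import numpy as np

def lms_filter(x, d, order=8, mu=0.01):
    """LMS adaptive filter: y_j = F^T X_j, e_j = d_j - y_j, F += mu * e_j * X_j.

    x: filter input sequence; d: desired signal; returns (y, e, F).
    """
    n = len(x)
    F = np.zeros(order)                        # F = [w_0, w_1, ..., w_{M-1}]^T
    y, e = np.zeros(n), np.zeros(n)
    for j in range(order - 1, n):
        X_j = x[j - order + 1 : j + 1][::-1]   # current and previous M-1 samples
        y[j] = F @ X_j                         # filter output
        e[j] = d[j] - y[j]                     # error signal
        F += mu * e[j] * X_j                   # instantaneous-gradient update
    return y, e, F
```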
The specific steps of the LMS-based voice noise reduction algorithm are as follows (an illustrative usage sketch follows the steps):
1) First, the voice signal x(n) containing noise is expressed as:

x(n) = s(n) + v_0(n)
2) Then, the bionic wavelet transform is applied to the noisy voice signal x(n) to obtain the noisy wavelet coefficients, as follows:

W_x(a, \tau) = W_s(a, \tau) + W_{v_0}(a, \tau)

wherein W_x(a, τ) represents the coefficients produced after the noisy voice signal is transformed using the bionic wavelet.
3) Adaptive filtering processing. The input signal of the adaptive canceller is N = [n_0, n_1, \ldots, n_{M-1}]^{\mathrm{T}}, and the output signal of the filter is:

y = F^{\mathrm{T}} N
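Putting the pieces together, the canceller can be exercised on synthetic data as sketched below; all signals and names are illustrative, and the bionic wavelet stage is omitted. The reference noise v1(n) drives the filter, the primary input x(n) = s(n) + v0(n) serves as the desired signal, and the error e(n) approximates the clean speech s(n).

```python
import numpy as np

rng = np.random.default_rng(3)

def lms_cancel(ref, primary, order=16, mu=0.005):
    """Adaptive noise canceller: e(n) = primary(n) - F^T ref_window -> clean speech."""
    n, F = len(ref), np.zeros(order)
    e = np.zeros(n)
    for j in range(order - 1, n):
        X_j = ref[j - order + 1 : j + 1][::-1]
        e[j] = primary[j] - F @ X_j
        F += mu * e[j] * X_j
    return e

t = np.arange(4000)
s = np.sin(2 * np.pi * 0.01 * t)                    # stand-in "speech" s(n)
v1 = 0.5 * rng.standard_normal(t.size)              # reference noise v1(n)
v0 = np.convolve(v1, [0.6, 0.3, 0.1])[: t.size]     # noise path to the primary mic
x = s + v0                                          # primary input x(n) = s(n) + v0(n)
enhanced = lms_cancel(v1, x)                        # e(n) approximates s(n)
```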
according to the method, after noise filtering is carried out through a vehicle-mounted voice enhancement algorithm based on a deep belief network, an enhanced vehicle-mounted voice signal is output, and finally, the calculated signal-to-noise ratio and the calculated PESQ value are improved compared with those of a traditional voice enhancement algorithm. The algorithm can effectively eliminate the noise of the original voice signal, furthest reserve the information of the original voice signal and effectively enhance the processed voice signal.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (3)

1. A vehicle-mounted voice enhancement algorithm based on a deep belief network is characterized by comprising the following steps:
step 1: dividing the vehicle-mounted voice signal into a training sample signal and a test sample signal;
step 2: optimizing the learning rate, the initial weight and the number of hidden nodes of the DBN by adopting a QPSO algorithm;
step 3: replacing the sigmoid function with a tanh activation function to optimize the deep belief network model;
step 4: performing greedy layer-by-layer unsupervised learning on the optimized deep belief network to obtain abstract voice feature vectors of the input vehicle-mounted voice signal;
step 5: inputting the abstract voice signal into a least mean square (LMS) algorithm to obtain an enhanced voice signal.
2. The vehicle-mounted speech enhancement algorithm based on the deep belief network according to claim 1, characterized in that: the step 2 comprises training the DBN by adopting a restricted Boltzmann machine;
the restricted Boltzmann machine represents the current state of the system through energy, and the energy expression is as follows:
E(v, h \mid \theta) = -\sum_{i=1}^{n} a_i v_i - \sum_{j=1}^{m} b_j h_j - \sum_{i=1}^{n} \sum_{j=1}^{m} v_i W_{ij} h_j

where n represents the number of nodes of the visible layer v, m represents the number of nodes of the hidden layer h, a represents the bias of the visible layer, b represents the bias of the hidden layer, W_{ij} represents the weight from visible-layer node i to hidden-layer node j, and θ = {W, a, b} represents the set of all parameters of the system;
the probability distribution of the whole system is as follows:
P(v, h \mid \theta) = \frac{e^{-E(v, h \mid \theta)}}{\sum_{v, h} e^{-E(v, h \mid \theta)}}

the logarithmic derivative of the marginal probability distribution is taken using the following equation:

\frac{\partial \ln P(v \mid \theta)}{\partial \theta} = \left\langle \frac{\partial (-E(v, h \mid \theta))}{\partial \theta} \right\rangle_{\mathrm{data}} - \left\langle \frac{\partial (-E(v, h \mid \theta))}{\partial \theta} \right\rangle_{\mathrm{model}}
for the training samples, "data" is used to represent the distribution P(h | v, θ), and "model" is used to represent the distribution P(v, h | θ), where ⟨·⟩_P represents the mathematical expectation with respect to the distribution P;

the logarithmic derivative formulas of the marginal probability distribution are expressed as:

\frac{\partial \ln P(v \mid \theta)}{\partial W_{ij}} = \langle v_i h_j \rangle_{\mathrm{data}} - \langle v_i h_j \rangle_{\mathrm{model}}

\frac{\partial \ln P(v \mid \theta)}{\partial a_i} = \langle v_i \rangle_{\mathrm{data}} - \langle v_i \rangle_{\mathrm{model}}

\frac{\partial \ln P(v \mid \theta)}{\partial b_j} = \langle h_j \rangle_{\mathrm{data}} - \langle h_j \rangle_{\mathrm{model}}
the RBM parameter is obtained by adopting the following formula:
W_{ij}^{(k+1)} = W_{ij}^{(k)} + \varepsilon \left( \langle v_i h_j \rangle_{\mathrm{data}} - \langle v_i h_j \rangle_{\mathrm{model}} \right)

wherein k represents the number of Gibbs sampling steps and ε represents the learning rate;
Optimizing the learning rate, the initial weight and the number of hidden nodes by adopting the following formulas:
Z_b = \frac{1}{Z} \sum_{i=1}^{Z} p_i

P = \mu p_i + (1 - \mu) p_j

X(t+1) = P \pm \alpha_p \left| Z_b - X(t) \right| \ln(1/u)

in the formulas: Z is the size of the population; μ and u are random numbers uniformly distributed on the interval [0,1]; Z_b is the mean point of the individual best positions of all particles; p_i is the individual best position of particle i and p_j is the global best position of the swarm; X(t) is the position of particle i at the t-th iteration; and α_p is the contraction-expansion factor.
3. The vehicle-mounted speech enhancement algorithm based on the deep belief network according to claim 1, characterized in that: in the step 3, the sigmoid function expression adopts the following formula:
f(x) = \frac{1}{1 + e^{-x}}
the tanh activation function expression adopts the following formula:
f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}
CN202010484415.6A 2020-06-01 2020-06-01 Vehicle-mounted voice enhancement algorithm based on deep belief network Pending CN111653272A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010484415.6A CN111653272A (en) 2020-06-01 2020-06-01 Vehicle-mounted voice enhancement algorithm based on deep belief network


Publications (1)

Publication Number Publication Date
CN111653272A true CN111653272A (en) 2020-09-11

Family

ID=72352034

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010484415.6A Pending CN111653272A (en) 2020-06-01 2020-06-01 Vehicle-mounted voice enhancement algorithm based on deep belief network

Country Status (1)

Country Link
CN (1) CN111653272A (en)



Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106024001A (en) * 2016-05-03 2016-10-12 电子科技大学 Method used for improving speech enhancement performance of microphone array
CN109492746A (en) * 2016-09-06 2019-03-19 青岛理工大学 Deepness belief network parameter optimization method based on GA-PSO Hybrid Algorithm
CN107358966A (en) * 2017-06-27 2017-11-17 北京理工大学 Based on deep learning speech enhan-cement without reference voice quality objective evaluation method
CN108615533A (en) * 2018-03-28 2018-10-02 天津大学 A kind of high-performance sound enhancement method based on deep learning
CN109086817A (en) * 2018-07-25 2018-12-25 西安工程大学 A kind of Fault Diagnosis for HV Circuit Breakers method based on deepness belief network
CN109671433A (en) * 2019-01-10 2019-04-23 腾讯科技(深圳)有限公司 A kind of detection method and relevant apparatus of keyword

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
朱萌 (Zhu Meng), "Big-data-oriented state information mining and fault diagnosis of high-voltage circuit breakers", China Master's Theses Full-text Database, Engineering Science and Technology II *
王伟军 (Wang Weijun) et al., "A speech noise reduction method based on adaptive filtering", Modern Electronics Technique *
王涛 (Wang Tao), "Research on speech noise reduction processing technology", China Master's Theses Full-text Database, Information Science and Technology *
邢传玺 (Xing Chuanxi), 宋扬 (Song Yang), "Shallow-Sea Environmental Parameter Inversion and Acoustic Signal Processing Technology", 30 June 2018, Beijing Institute of Technology Press *
黄家华 (Huang Jiahua) et al., "TE process fault diagnosis based on a parameter-optimized deep belief network", Abstracts of the 30th Chinese Process Control Conference (CPCC 2019) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112687294A (en) * 2020-12-21 2021-04-20 重庆科技学院 Vehicle-mounted noise identification method

Similar Documents

Publication Publication Date Title
CN109841226B (en) Single-channel real-time noise reduction method based on convolution recurrent neural network
CN108766419B (en) Abnormal voice distinguishing method based on deep learning
Droppo et al. Evaluation of the SPLICE algorithm on the Aurora2 database.
CN110867181A (en) Multi-target speech enhancement method based on SCNN and TCNN joint estimation
Dharanipragada et al. A nonlinear unsupervised adaptation technique for speech recognition.
CN112331224A (en) Lightweight time domain convolution network voice enhancement method and system
Hilger et al. Quantile based histogram equalization for noise robust speech recognition
CN108735199B (en) Self-adaptive training method and system of acoustic model
CN110634476B (en) Method and system for rapidly building robust acoustic model
CN113936681B (en) Speech enhancement method based on mask mapping and mixed cavity convolution network
WO2022012206A1 (en) Audio signal processing method, device, equipment, and storage medium
WO2005098820A1 (en) Speech recognition device and speech recognition method
CN109346084A (en) Method for distinguishing speek person based on depth storehouse autoencoder network
CN111899757A (en) Single-channel voice separation method and system for target speaker extraction
CN113539293B (en) Single-channel voice separation method based on convolutional neural network and joint optimization
CN112270405A (en) Filter pruning method and system of convolution neural network model based on norm
CN113763965A (en) Speaker identification method with multiple attention characteristics fused
CN111653272A (en) Vehicle-mounted voice enhancement algorithm based on deep belief network
CN114678030A (en) Voiceprint identification method and device based on depth residual error network and attention mechanism
CN111798828A (en) Synthetic audio detection method, system, mobile terminal and storage medium
CN111091809B (en) Regional accent recognition method and device based on depth feature fusion
CN114863938A (en) Bird language identification method and system based on attention residual error and feature fusion
Xu et al. Robust speech recognition based on noise and SNR classification-a multiple-model framework.
CN109871448B (en) Short text classification method and system
CN111667836B (en) Text irrelevant multi-label speaker recognition method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200911)