WO2022036761A1 - Deep learning noise reduction method and device integrating an in-ear microphone and an out-of-ear microphone - Google Patents

Deep learning noise reduction method and device integrating an in-ear microphone and an out-of-ear microphone

Info

Publication number
WO2022036761A1
Authority
WO
WIPO (PCT)
Prior art keywords
ear microphone
audio signal
amplitude spectrum
ear
noise reduction
Prior art date
Application number
PCT/CN2020/112890
Other languages
English (en)
French (fr)
Inventor
闫永杰
Original Assignee
大象声科(深圳)科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 大象声科(深圳)科技有限公司
Publication of WO2022036761A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00 Details of transducers, loudspeakers or microphones
    • H04R1/10 Earpieces; Attachments therefor; Earphones; Monophonic headphones
    • H04R1/1083 Reduction of ambient noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10K SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K11/00 Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/16 Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/175 Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound
    • G10K11/178 Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound by electro-acoustically regenerating the original acoustic waves in anti-phase
    • G10K11/1781 Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound by electro-acoustically regenerating the original acoustic waves in anti-phase characterised by the analysis of input or output signals, e.g. frequency range, modes, transfer functions
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10K SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K11/00 Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/16 Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/175 Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound
    • G10K11/178 Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound by electro-acoustically regenerating the original acoustic waves in anti-phase
    • G10K11/1785 Methods, e.g. algorithms; Devices
    • G10K11/17853 Methods, e.g. algorithms; Devices of the filter

Definitions

  • the present application relates to the technical field of speech noise reduction for electronic devices, and in particular, to a deep learning noise reduction method and device that integrates in-ear microphones and out-of-ear microphones.
  • Speech noise reduction technology refers to separating or extracting the target speech signal from the noisy speech signal.
  • Current noise reduction schemes usually use single out-of-ear microphone and out-of-ear microphone array noise reduction techniques to separate or extract the target speech signal from the noisy speech signal. When the noise reduction device is in a very noisy environment, separating or extracting the target speech signal becomes extremely difficult, making voice calls impossible.
  • The in-ear microphone can physically and effectively isolate external environmental noise, ensuring that the picked-up signal has a high signal-to-noise ratio.
  • However, the in-ear microphone picks up the wearer's voice through the ear canal, so the target speech it collects lacks high-frequency content. Therefore, using the in-ear microphone or the out-of-ear microphone alone for speech noise reduction has significant limitations.
  • the present application provides at least a deep learning noise reduction method and device that integrates the signals of the in-ear microphone and the out-of-ear microphone, which can effectively improve the quality of the call, especially the intelligibility of speech in a strong noise environment.
  • a first aspect of the present application provides a deep learning noise reduction method integrating an in-ear microphone and an out-of-ear microphone, the noise reduction method comprising:
  • the audio signal of the in-ear microphone after filtering and the audio signal of the out-of-ear microphone are respectively input into the network model, and the predicted amplitude spectrum of the network model output is obtained;
  • the predicted amplitude spectrum is resynthesized and then output as a signal after noise reduction.
  • the noise reduction method further includes:
  • the step of respectively inputting the filtered audio signal of the in-ear microphone and the audio signal of the out-of-ear microphone into the network model includes:
  • the audio amplitude spectrum of the in-ear microphone and the audio amplitude spectrum of the out-of-ear microphone are input into the network model.
  • the method further includes:
  • the filter-processed audio signal of the in-ear microphone is reconstructed at a high frequency, and the frequency of the audio signal of the in-ear microphone is widened to a preset signal frequency.
  • the method includes:
  • the high-frequency reconstructed audio signal of the in-ear microphone and the audio signal of the out-of-ear microphone are respectively input into the network model to obtain a predicted amplitude spectrum output by the network model.
  • the step of acquiring the target amplitude spectrum of the network model includes:
  • the standard audio signal is subjected to short-time Fourier transform to obtain the target amplitude spectrum of the network model.
  • a second aspect of the present application provides a deep learning noise reduction device integrating an in-ear microphone and an out-of-ear microphone, the noise-reduction device comprising a body, a data processing module, and the in-ear microphone and the out-of-ear microphone described in any one of the above;
  • the in-ear microphone, the out-of-ear microphone and the data processing module are arranged in the body part;
  • the data processing module is respectively connected with the in-ear microphone and the out-of-ear microphone;
  • the in-ear microphone is arranged on the side of the body portion facing the human external auditory canal;
  • the extra-ear microphone is arranged on the inner side of the body portion away from the human ear canal;
  • the in-ear microphone is used to acquire audio signals in the ear canal
  • the extra-ear microphone is used to acquire audio signals outside the ear canal
  • the data processing module is configured to perform high-pass filtering on the acquired audio signal of the in-ear microphone, input the filtered audio signal of the in-ear microphone and the audio signal of the out-of-ear microphone into the network model respectively, obtain the predicted amplitude spectrum output by the network model, and, when the error between the target amplitude spectrum and the predicted amplitude spectrum is within a preset range, resynthesize the predicted amplitude spectrum and output it as the noise-reduced signal predicted by the algorithm.
  • the noise reduction device further includes a handle portion connected to the body portion;
  • the extra-ear microphone includes a first extra-ear microphone and a second extra-ear microphone;
  • the second extra-ear microphone is disposed at one end of the handle portion away from the body portion.
  • a third aspect of the present application provides an electronic device, including a memory and a processor coupled to each other, the processor being configured to execute program instructions stored in the memory, so as to implement the deep learning noise reduction method integrating the in-ear microphone and the out-of-ear microphone in the first aspect above.
  • a fourth aspect of the present application provides a computer storage medium on which program instructions are stored, and when the program instructions are executed by a processor, the deep learning noise reduction method incorporating an in-ear microphone and an out-of-ear microphone in the first aspect above is implemented.
  • The noise reduction device acquires the audio signal of the in-ear microphone and the audio signal of the out-of-ear microphone; acquires the target amplitude spectrum of the network model; and uses a high-pass filter to perform high-pass filtering on the audio signal of the in-ear microphone.
  • The filtered audio signal of the in-ear microphone and the audio signal of the out-of-ear microphone are respectively input into the network model, and the predicted amplitude spectrum output by the network model is obtained; when the error between the target amplitude spectrum and the predicted amplitude spectrum is within the preset range, the predicted amplitude spectrum is resynthesized and output as the noise-reduced signal predicted by the algorithm.
  • The in-ear microphone can be used to naturally filter out air noise.
  • High-pass filtering of the audio signal of the in-ear microphone suppresses the influence of its low-frequency components on noise reduction and improves the quality of voice calls in a noisy environment.
  • FIG. 1 is a schematic flowchart of a first embodiment of a deep learning noise reduction method that integrates an in-ear microphone and an out-of-ear microphone provided by the present application;
  • FIG. 2 is a schematic flowchart of a second embodiment of a deep learning noise reduction method that integrates an in-ear microphone and an out-of-ear microphone provided by the present application;
  • FIG. 3 is a schematic flowchart of high-frequency reconstruction in the deep learning noise reduction method provided by the present application that integrates an in-ear microphone and an out-of-ear microphone;
  • FIG. 4 is a schematic structural diagram of a first embodiment of a deep learning noise reduction device integrating an in-ear microphone and an out-of-ear microphone provided by the present application;
  • FIG. 5 is a schematic structural diagram of a second embodiment of a deep learning noise reduction device integrating an in-ear microphone and an out-of-ear microphone provided by the present application;
  • FIG. 6 is a schematic structural diagram of a third embodiment of a deep learning noise reduction device integrating an in-ear microphone and an out-of-ear microphone provided by the present application;
  • FIG. 7 is a schematic diagram of a framework of an embodiment of an electronic device provided by the present application.
  • FIG. 8 is a schematic diagram of a framework of an embodiment of a computer storage medium provided by the present application.
  • FIG. 1 is a schematic flowchart of the first embodiment of the deep learning noise reduction method provided by the present application that integrates the in-ear microphone and the out-of-ear microphone.
  • the deep learning noise reduction method integrating the in-ear microphone and the out-of-ear microphone in this embodiment can be applied to a noise reduction device, and can also be applied to a server with data processing capability. This application takes the noise reduction device as an example for description.
  • the deep learning noise reduction method integrating the in-ear microphone and the out-of-ear microphone in this embodiment includes the following steps:
  • S101 Acquire the audio signal of the in-ear microphone and the audio signal of the out-of-ear microphone.
  • the noise reduction device in this embodiment is provided with an in-ear microphone and an out-of-ear microphone, wherein the in-ear microphone is arranged in a position facing the user's ear canal, has a natural suppression effect on air noise, and is used to obtain the audio signal in the user's ear canal;
  • the out-of-ear microphone is disposed at a position facing the external environment, and is used to acquire audio signals of the environment where the user is wearing the noise reduction device.
  • the noise reduction device reduces the influence of noise in the audio signal on the quality of the voice call by processing the audio signal of the in-ear microphone and the audio signal of the out-of-ear microphone.
  • the in-ear microphone acquires the audio signal in the user's ear canal, and this part of the audio signal mainly includes low-frequency noise signals and the user's voice signal, and the in-ear microphone may be an air transmission microphone.
  • the audio signal of the out-of-ear microphone includes the ambient noise during the user's call and the audio signal generated during the user's call.
  • The training target of the neural network model needs to be set in advance, so that the network model is trained toward that target on the audio signals input to it.
  • the standard audio signal of the network model is obtained, and the standard audio signal is subjected to short-time Fourier transform to obtain the target amplitude spectrum of the network model.
  • the standard audio signal is an audio signal in an ideal state, that is, when the user is in a noise-free environment, the audio signal obtained by the noise reduction device is used as the standard audio signal.
  • the network model is a convolutional recurrent neural network. In other embodiments, it may also be a long short-term memory neural network or a deep fully convolutional neural network, which is not limited in this embodiment.
  • S103 Use a high-pass filter to perform high-pass filter processing on the audio signal of the in-ear microphone.
  • This embodiment uses a high-pass filter to perform high-pass filtering on the audio signal of the in-ear microphone, in order to suppress the low-frequency noise in that signal.
  • The high-pass filter suppresses the components of the audio signal below the preset frequency; everything below the preset frequency is suppressed, whether it is speech or noise.
  • the preset frequency is 100 Hz.
  • the filtering processing of the audio signal of the in-ear microphone may also be implemented by digital filtering.
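
The high-pass step of S103 can be illustrated with a short digital filter. This is only a sketch: the source states a 100 Hz preset frequency and allows digital filtering, but does not name a filter design, so the first-order RC-style filter, the function name `high_pass`, and the 16 kHz sample rate used in the usage note are all assumptions made for illustration.

```python
import math


def high_pass(samples, sample_rate_hz, cutoff_hz=100.0):
    """First-order high-pass filter (assumed design, not taken from the source).

    Components well below cutoff_hz are attenuated; components well above
    it pass almost unchanged.
    """
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)   # time constant of the analog prototype
    dt = 1.0 / sample_rate_hz                # sampling period
    alpha = rc / (rc + dt)                   # smoothing coefficient in (0, 1)

    filtered = []
    prev_in = 0.0
    prev_out = 0.0
    for i, x in enumerate(samples):
        # y[0] = x[0]; y[i] = alpha * (y[i-1] + x[i] - x[i-1])
        y = x if i == 0 else alpha * (prev_out + x - prev_in)
        filtered.append(y)
        prev_in, prev_out = x, y
    return filtered
```

For example, `high_pass(in_ear_samples, 16000)` would return the in-ear signal with content below roughly 100 Hz suppressed, where `in_ear_samples` is a hypothetical list of floats. A steeper filter (for instance a higher-order Butterworth design) may be preferable in practice; the first-order form is used here only because it is easy to read.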
  • S104 Input the filtered audio signal of the in-ear microphone and the audio signal of the out-of-ear microphone into the network model, respectively, to obtain a predicted amplitude spectrum output by the network model.
  • the noise reduction device inputs the audio signal of the out-of-ear microphone obtained in S101 and the filtered audio signal of the in-ear microphone obtained in S103, respectively, into the network model for training, and obtains the predicted amplitude spectrum output by the network model.
  • the noise reduction device subjects the filtered audio signal of the in-ear microphone to a short-time Fourier transform (STFT) to obtain the audio amplitude spectrum of the in-ear microphone.
  • the noise reduction device performs short-time Fourier transform on the audio signal of the out-of-ear microphone to obtain the audio amplitude spectrum of the out-of-ear microphone.
  • the noise reduction device inputs the audio amplitude spectrum of the in-ear microphone and the audio amplitude spectrum of the out-of-ear microphone respectively into the network model for training, and obtains the predicted amplitude spectrum (Estimated Magnitude Spectrogram) of the network model.
  • S107 Resynthesize the predicted amplitude spectrum and output it as a signal predicted by the algorithm after noise reduction.
  • the error between the target amplitude spectrum and the predicted amplitude spectrum needs to be calculated.
  • the mean square error between the target amplitude spectrum and the predicted amplitude spectrum can be calculated to determine whether the mean square error is within a preset range, and if so, perform S107, and output the predicted amplitude spectrum after resynthesis as a signal predicted by the algorithm after noise reduction; If not, the network parameters of the network model are updated based on the mean square error, until the error between the predicted amplitude spectrum output by the updated network model and the target amplitude spectrum is within a preset range.
  • the mean square error reflects the degree of difference between the target amplitude spectrum and the predicted amplitude spectrum.
  • the network parameters of the network model may be updated by means of backpropagation-gradient descent.
  • The noise reduction device acquires the audio signal of the in-ear microphone and the audio signal of the out-of-ear microphone; acquires the target amplitude spectrum of the network model; uses a high-pass filter to perform high-pass filtering on the audio signal of the in-ear microphone; respectively inputs the filtered audio signal of the in-ear microphone and the audio signal of the out-of-ear microphone into the network model to obtain the predicted amplitude spectrum output by the network model; and, when the error between the target amplitude spectrum and the predicted amplitude spectrum is within the preset range, resynthesizes the predicted amplitude spectrum and outputs it as the noise-reduced signal predicted by the algorithm.
  • The in-ear microphone is used to filter air noise naturally: the audio signal in the human ear canal is obtained through the in-ear microphone, and the audio signal of the in-ear microphone and the audio signal of the out-of-ear microphone are input into the network model for training, so as to achieve an ideal noise reduction effect at a low signal-to-noise ratio. High-pass filtering of the audio signal of the in-ear microphone suppresses the influence of its low-frequency components on noise reduction and improves the quality of voice calls in a noisy environment.
  • FIG. 2 is a schematic flowchart of a second embodiment of a deep learning noise reduction method that integrates an in-ear microphone and an out-of-ear microphone provided by the present application. Specifically, the method of the embodiment of the present disclosure may include the following steps:
  • S201 Acquire the audio signal of the in-ear microphone and the audio signal of the out-of-ear microphone.
  • S203 Use a high-pass filter to perform high-pass filter processing on the audio signal of the in-ear microphone.
  • S204 Reconstruct the audio signal of the in-ear microphone after filtering at high frequency, and widen the frequency of the audio signal of the in-ear microphone to a preset signal frequency.
  • The filtered audio signal of the in-ear microphone can be reconstructed at high frequency to widen the frequency of the audio signal of the in-ear microphone to the preset signal frequency, wherein the preset signal frequency is a signal frequency range that can be clearly and comfortably recognized by the human ear.
  • FIG. 3 is a schematic flowchart of high-frequency reconstruction in the deep learning noise reduction method provided by the present application that integrates the in-ear microphone and the out-of-ear microphone.
  • the high-frequency reconstruction process in this embodiment may include the following steps:
  • S1 subject the filtered audio signal of the in-ear microphone to short-time Fourier transform to obtain the audio amplitude spectrum of the in-ear microphone.
  • S2 Input the audio amplitude spectrum of the in-ear microphone into the network model to obtain the predicted amplitude spectrum of the in-ear microphone.
  • the standard audio signal of the in-ear microphone is acquired, and the standard audio of the in-ear microphone is subjected to short-time Fourier transform to obtain the target amplitude spectrum of the network model.
  • S5 Determine whether the error is within the preset range of the in-ear microphone.
  • The acquired in-ear microphone audio signal is subjected to a short-time Fourier transform and then input into the network model for training, and the predicted amplitude spectrum of the in-ear microphone is compared with the target amplitude spectrum of the in-ear microphone to determine whether the error is within the preset range.
  • If so, S6 is performed and the predicted amplitude spectrum of the in-ear microphone is used as the widened amplitude spectrum. If not, the network parameters of the network model are updated based on the error until the error between the predicted amplitude spectrum of the in-ear microphone output by the updated network model and the target amplitude spectrum is within the preset range, and the predicted amplitude spectrum is then output as the target amplitude spectrum of the in-ear microphone.
  • the network model in the high-frequency reconstruction is a long short-term memory neural network, and in other embodiments, it may also be a convolutional recurrent neural network or a deep fully convolutional neural network, or the like.
  • S205 Input the high-frequency reconstructed audio signal of the in-ear microphone and the audio signal of the out-of-ear microphone into the network model to obtain a predicted amplitude spectrum output by the network model.
  • the network model in S205 in this embodiment is different from the high-frequency reconstruction network model in S204.
  • S208 Resynthesize the predicted amplitude spectrum and output it as a signal predicted by the algorithm after noise reduction.
  • The audio signal of the out-of-ear microphone and the target audio signal of the in-ear microphone are respectively input into the network model to obtain the predicted amplitude spectrum after the target audio signal of the in-ear microphone is fused with the audio signal of the out-of-ear microphone. The error between the target amplitude spectrum and the predicted amplitude spectrum is calculated, and it is determined whether the error is within a preset range.
  • If so, the predicted amplitude spectrum is resynthesized and output as the noise-reduced signal predicted by the algorithm; if not, the network parameters of the network model are updated until the error between the predicted amplitude spectrum output by the network model and the target amplitude spectrum is within the preset range, and the predicted amplitude spectrum is then resynthesized and output as the noise-reduced signal predicted by the algorithm.
  • The in-ear microphone is used to naturally filter air noise: the audio signal in the ear canal is obtained through the in-ear microphone, and the audio signal of the in-ear microphone and the audio signal of the out-of-ear microphone are input into the network model for training, so as to achieve an ideal noise reduction effect at a low signal-to-noise ratio.
  • High-pass filtering of the audio signal of the in-ear microphone suppresses the influence of its low-frequency components on noise reduction and improves the quality of voice calls in a noisy environment.
  • High-frequency reconstruction widens the frequency of the audio signal of the in-ear microphone to the preset signal frequency, which optimizes the noise reduction process.
  • The order in which the steps are written does not imply a strict execution order and does not limit the implementation process in any way; the specific execution order of the steps should be determined by their functions and possible internal logic.
  • FIG. 4 is a schematic structural diagram of a first embodiment of a deep learning noise reduction device integrating an in-ear microphone and an out-of-ear microphone provided by the present application.
  • The noise reduction device 40 of this embodiment includes a body part 41, a data processing module (not shown in the figure), an in-ear microphone 42 and an out-of-ear microphone 43; the in-ear microphone 42, the out-of-ear microphone 43 and the data processing module are arranged in the body part 41.
  • The data processing module is connected with the in-ear microphone 42 and the out-of-ear microphone 43 respectively; the in-ear microphone 42 is arranged on the side of the body part 41 facing the inside of the human ear; the out-of-ear microphone 43 is arranged on the side of the body part 41 facing away from the human ear.
  • The in-ear microphone 42 is used to acquire the audio signal of the in-ear microphone 42; the out-of-ear microphone 43 is used to acquire the audio signal of the out-of-ear microphone 43; the data processing module is used to high-pass filter the acquired audio signal of the in-ear microphone 42, and to input the filtered audio signal of the in-ear microphone 42 and the audio signal of the out-of-ear microphone 43 into the network model respectively to obtain the predicted amplitude spectrum output by the network model. When the error between the target amplitude spectrum and the predicted amplitude spectrum is within the preset range, the predicted amplitude spectrum is resynthesized and output as the noise-reduced signal predicted by the algorithm.
  • FIG. 5 is a schematic structural diagram of a second embodiment of a deep learning noise reduction device integrating an in-ear microphone and an out-of-ear microphone provided by the present application.
  • the noise reduction device 50 of this embodiment includes a body portion 51 , a data processing module (not shown in the figure), an in-ear microphone 52 , an out-of-ear microphone 53 and a handle portion 54 .
  • the body portion 51 is connected with the handle portion 54 .
  • the extra-ear microphone 53 includes a first extra-ear microphone 531 and a second extra-ear microphone 532.
  • the second extra-ear microphone 532 is disposed at one end of the handle portion 54 away from the body portion 51, so that the second extra-ear microphone 532 is close to the mouth of the human body. It is used to obtain audio signals from the human mouth and noise signals in the environment.
  • FIG. 6 is a schematic structural diagram of a third embodiment of a deep learning noise reduction device integrating an in-ear microphone and an out-of-ear microphone provided by the present application.
  • the noise reduction device 60 in this embodiment can also be of a neck-hung type, including two body parts 61 , and the two body parts 61 are communicatively connected.
  • the body portion 61 is provided with an in-ear microphone 62 .
  • the in-ear microphone 62 includes a first in-ear microphone 621 and a second in-ear microphone 622 .
  • the out-of-ear microphone 63 includes a first out-of-ear microphone 631, a second out-of-ear microphone 632 and a third out-of-ear microphone 633.
  • the first out-of-ear microphone 631 is arranged on the side away from the first in-ear microphone 621
  • the second out-of-ear microphone 632 is arranged on the side away from the first in-ear microphone 621.
  • the third out-of-ear microphone 633 can be disposed on the side close to the first in-ear microphone 621, or can be disposed close to the second in-ear microphone 622, so as to directly pick up the voice from the wearer's mouth.
  • FIG. 7 is a schematic diagram of a framework of an embodiment of an electronic device provided by the present application.
  • the electronic device 70 includes a memory 71 and a processor 72 that are coupled to each other, and the processor 72 is configured to execute program instructions stored in the memory 71, so as to implement the steps of any of the above-mentioned embodiments of the deep learning noise reduction method integrating an in-ear microphone and an out-of-ear microphone.
  • the electronic device 70 may include, but is not limited to, a microcomputer and a server.
  • the electronic device 70 may also include mobile devices such as a notebook computer, a tablet computer, a headset, and a mobile phone, which is not limited herein.
  • the processor 72 is configured to control itself and the memory 71 to implement the steps of any of the above-mentioned embodiments of the deep learning noise reduction method integrating the in-ear microphone and the out-of-ear microphone.
  • the processor 72 may also be referred to as a CPU (Central Processing Unit, central processing unit).
  • the processor 72 may be an integrated circuit chip with signal processing capability.
  • the processor 72 may also be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field-Programmable Gate Array, FPGA) or other Programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.
  • a general purpose processor may be a microprocessor or any conventional processor or the like.
  • the processor 72 may be jointly implemented by an integrated circuit chip.
  • FIG. 8 is a schematic diagram of a framework of an embodiment of a computer storage medium provided by the present application.
  • the computer-readable storage medium 80 stores program instructions 801 that can be executed by the processor, and the program instructions 801 are used to implement the steps of any of the above-mentioned embodiments of the deep learning noise reduction method integrating an in-ear microphone and an out-of-ear microphone.
  • the functions or modules included in the apparatus provided in this embodiment may be used to execute the methods described in the above method embodiments; for the specific implementation, refer to the descriptions of those method embodiments, which are not repeated here for brevity.
  • the disclosed method and apparatus may be implemented in other manners.
  • the device implementations described above are only illustrative.
  • the division of modules or units is only a logical function division. In actual implementation, there may be other divisions.
  • units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, which may be in electrical, mechanical or other forms.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
  • the integrated unit if implemented as a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium.
  • The technical solution of the present application, in essence, or the part that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the methods of the various embodiments of the present application.
  • The aforementioned storage medium includes a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

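The present application discloses a deep learning noise reduction method and device integrating an in-ear microphone and an out-of-ear microphone. The noise reduction method includes: acquiring an audio signal of the in-ear microphone and an audio signal of the out-of-ear microphone; acquiring a target amplitude spectrum of a network model; filtering the audio signal of the in-ear microphone using high-pass filtering; inputting the filtered audio signal of the in-ear microphone and the audio signal of the out-of-ear microphone separately into the network model to obtain a predicted amplitude spectrum output by the network model; and, when the error between the target amplitude spectrum and the predicted amplitude spectrum is within a preset range, resynthesizing the predicted amplitude spectrum and outputting it as the noise-reduced signal predicted by the algorithm. This scheme improves voice call quality in noisy environments.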

Description

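Deep learning noise reduction method and device integrating an in-ear microphone and an out-of-ear microphone
Technical Field
The present application relates to the technical field of voice noise reduction for electronic devices, and in particular to a deep learning noise reduction method and device integrating an in-ear microphone and an out-of-ear microphone.
Background
Voice noise reduction refers to separating or extracting a target speech signal from a noisy speech signal. Current noise reduction schemes usually rely on a single out-of-ear microphone or an out-of-ear microphone array to separate or extract the target speech signal from the noisy speech signal. When the noise reduction device is in a very noisy environment, separating or extracting the target speech signal becomes extremely difficult, making voice calls impossible. An in-ear microphone physically isolates external ambient noise effectively, so the signal it picks up has a high signal-to-noise ratio. However, because the in-ear microphone picks up the wearer's voice through the ear canal, the target speech it captures lacks high-frequency content. Using either the in-ear microphone or the out-of-ear microphone alone for voice noise reduction is therefore severely limited.
Summary
The present application provides at least a deep learning noise reduction method and device integrating in-ear microphone and out-of-ear microphone signals, which can effectively improve call quality and, in particular, improve speech intelligibility in strong-noise environments.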
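A first aspect of the present application provides a deep learning noise reduction method integrating an in-ear microphone and an out-of-ear microphone, the noise reduction method including:
acquiring an audio signal of the in-ear microphone and an audio signal of the out-of-ear microphone;
acquiring a target amplitude spectrum of a network model;
filtering the audio signal of the in-ear microphone using high-pass filtering;
inputting the filtered audio signal of the in-ear microphone and the audio signal of the out-of-ear microphone separately into the network model to obtain a predicted amplitude spectrum output by the network model;
when the error between the target amplitude spectrum and the predicted amplitude spectrum is within a preset range, resynthesizing the predicted amplitude spectrum and outputting it as the noise-reduced signal.
In some embodiments, the noise reduction method further includes:
when the error between the target amplitude spectrum and the predicted amplitude spectrum is outside the preset range, updating network parameters of the network model based on the error until the error between the predicted amplitude spectrum output by the updated network model and the target amplitude spectrum is within the preset range.
In some embodiments, the step of inputting the filtered audio signal of the in-ear microphone and the audio signal of the out-of-ear microphone separately into the network model includes:
applying a short-time Fourier transform to the filtered audio signal of the in-ear microphone to obtain an audio amplitude spectrum of the in-ear microphone;
applying the short-time Fourier transform to the audio signal of the out-of-ear microphone to obtain an audio amplitude spectrum of the out-of-ear microphone;
inputting the audio amplitude spectrum of the in-ear microphone and the audio amplitude spectrum of the out-of-ear microphone into the network model.
In some embodiments, after the step of high-pass filtering the audio signal of the in-ear microphone using a high-pass filter, the method further includes:
performing high-frequency reconstruction on the filtered audio signal of the in-ear microphone to widen the frequency range of the audio signal of the in-ear microphone to a preset signal frequency.
In some embodiments, after the step of performing high-frequency reconstruction on the filtered audio signal of the in-ear microphone to widen the frequency range of the audio signal of the in-ear microphone to the preset signal frequency, the method includes:
inputting the high-frequency-reconstructed audio signal of the in-ear microphone and the audio signal of the out-of-ear microphone separately into the network model to obtain the predicted amplitude spectrum output by the network model.
In some embodiments, the step of acquiring the target amplitude spectrum of the network model includes:
acquiring a standard audio signal;
applying a short-time Fourier transform to the standard audio signal to obtain the target amplitude spectrum of the network model.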
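A second aspect of the present application provides a deep learning noise reduction device integrating an in-ear microphone and an out-of-ear microphone, the noise reduction device including a body portion, a data processing module, and the in-ear microphone and out-of-ear microphone described in any of the above;
the in-ear microphone, the out-of-ear microphone, and the data processing module are arranged in the body portion;
the data processing module is connected to the in-ear microphone and to the out-of-ear microphone;
the in-ear microphone is arranged on the side of the body portion facing the external auditory canal of the human body;
the out-of-ear microphone is arranged on the side of the body portion facing away from the inside of the ear canal;
the in-ear microphone is configured to acquire the audio signal inside the ear canal;
the out-of-ear microphone is configured to acquire the audio signal outside the ear canal;
the data processing module is configured to high-pass filter the acquired audio signal of the in-ear microphone, input the filtered audio signal of the in-ear microphone and the audio signal of the out-of-ear microphone separately into the network model to obtain the predicted amplitude spectrum output by the network model, and, when the error between the target amplitude spectrum and the predicted amplitude spectrum is within the preset range, resynthesize the predicted amplitude spectrum and output it as the noise-reduced signal predicted by the algorithm.
In some embodiments, the noise reduction device further includes a stem portion connected to the body portion; the out-of-ear microphone includes a first out-of-ear microphone and a second out-of-ear microphone;
the second out-of-ear microphone is arranged at the end of the stem portion away from the body portion.
A third aspect of the present application provides an electronic device including a memory and a processor coupled to each other, the processor being configured to execute program instructions stored in the memory to implement the deep learning noise reduction method integrating an in-ear microphone and an out-of-ear microphone of the first aspect.
A fourth aspect of the present application provides a computer storage medium storing program instructions which, when executed by a processor, implement the deep learning noise reduction method integrating an in-ear microphone and an out-of-ear microphone of the first aspect.
In the above scheme, the noise reduction device acquires the audio signal of the in-ear microphone and the audio signal of the out-of-ear microphone; acquires the target amplitude spectrum of the network model; high-pass filters the audio signal of the in-ear microphone using a high-pass filter; inputs the filtered audio signal of the in-ear microphone and the audio signal of the out-of-ear microphone separately into the network model to obtain the predicted amplitude spectrum output by the network model; and, when the error between the target amplitude spectrum and the predicted amplitude spectrum is within the preset range, resynthesizes the predicted amplitude spectrum and outputs it as the noise-reduced signal predicted by the algorithm. The scheme exploits the in-ear microphone's natural filtering of airborne noise: by acquiring the audio signal of the in-ear microphone and inputting it together with the audio signal of the out-of-ear microphone into the network model for training, the desired noise reduction effect can be achieved even at extremely low signal-to-noise ratios. High-pass filtering the audio signal of the in-ear microphone suppresses the influence of its low-frequency components on noise reduction, which improves voice call quality in noisy environments.
It should be understood that the above general description and the following detailed description are exemplary and explanatory only and do not limit the present application.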
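Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the specification, serve to explain the technical solutions of the present application.
FIG. 1 is a schematic flowchart of a first embodiment of the deep learning noise reduction method integrating an in-ear microphone and an out-of-ear microphone provided by the present application;
FIG. 2 is a schematic flowchart of a second embodiment of the deep learning noise reduction method integrating an in-ear microphone and an out-of-ear microphone provided by the present application;
FIG. 3 is a schematic flowchart of high-frequency reconstruction in the deep learning noise reduction method integrating an in-ear microphone and an out-of-ear microphone provided by the present application;
FIG. 4 is a schematic structural diagram of a first embodiment of the deep learning noise reduction device integrating an in-ear microphone and an out-of-ear microphone provided by the present application;
FIG. 5 is a schematic structural diagram of a second embodiment of the deep learning noise reduction device integrating an in-ear microphone and an out-of-ear microphone provided by the present application;
FIG. 6 is a schematic structural diagram of a third embodiment of the deep learning noise reduction device integrating an in-ear microphone and an out-of-ear microphone provided by the present application;
FIG. 7 is a schematic block diagram of an embodiment of the electronic device provided by the present application;
FIG. 8 is a schematic block diagram of an embodiment of the computer storage medium provided by the present application.
Detailed Description
The solutions of the embodiments of the present application are described in detail below with reference to the accompanying drawings.
In the following description, specific details such as particular system structures, interfaces, and technologies are set forth for purposes of explanation rather than limitation, to provide a thorough understanding of the present application.
The term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it. Furthermore, "multiple" herein means two or more. The term "at least one" herein means any one of multiple items or any combination of at least two of them; for example, including at least one of A, B, and C may mean including any one or more elements selected from the set consisting of A, B, and C.
The present application proposes a deep learning noise reduction method integrating an in-ear microphone and an out-of-ear microphone. It is applicable to voice call scenarios involving devices that fit against the user's ear, such as earbuds and in-ear earphones, and can improve voice call quality in noisy environments. Refer to FIG. 1, which is a schematic flowchart of the first embodiment of the method provided by the present application. The method of this embodiment may be applied to a noise reduction device or to a server with data processing capability; the present application uses a noise reduction device as the example.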
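Specifically, the deep learning noise reduction method integrating an in-ear microphone and an out-of-ear microphone of this embodiment includes the following steps:
S101: Acquire the audio signal of the in-ear microphone and the audio signal of the out-of-ear microphone.
The noise reduction device in this embodiment is provided with an in-ear microphone and an out-of-ear microphone. The in-ear microphone is positioned facing into the user's ear canal, naturally suppresses airborne noise, and acquires the audio signal inside the user's ear canal. The out-of-ear microphone is positioned facing the external environment and acquires the audio signal of the environment in which the user wears the noise reduction device. By processing the audio signals of both microphones, the noise reduction device reduces the impact of noise on voice call quality.
The audio signal that the in-ear microphone acquires inside the user's ear canal mainly consists of low-frequency noise and the user's speech; the in-ear microphone may be an air-conduction microphone. The audio signal of the out-of-ear microphone includes the ambient noise during the call and the audio produced by the user during the call.
S102: Acquire the target amplitude spectrum of the network model.
Because deep neural networks suppress noise very strongly, and in order to faithfully restore the human voice during a voice call, a training target must be set for the neural network model in advance so that the model, given the input audio signals, is trained toward that target. Specifically, this embodiment acquires a standard audio signal for the network model and applies a short-time Fourier transform to it to obtain the target amplitude spectrum of the network model.
The standard audio signal is the audio signal under ideal conditions; that is, the audio signal acquired by the noise reduction device when the user is in a noise-free environment serves as the standard audio signal.
It should be noted that the network model is a convolutional recurrent neural network; in other embodiments it may instead be a long short-term memory neural network, a deep fully convolutional neural network, or the like, which this embodiment does not limit.
S103: High-pass filter the audio signal of the in-ear microphone using a high-pass filter.
Because the in-ear microphone acquires the audio signal inside the ear canal, which contains low-frequency noise, this embodiment high-pass filters the audio signal of the in-ear microphone with a high-pass filter to suppress that low-frequency noise and keep it from degrading the noise reduction result.
The high-pass filter suppresses the components of the audio signal below a preset frequency; that is, every component below the preset frequency is suppressed, whether speech or noise. In a specific embodiment, the preset frequency is 100 Hz.
It should be noted that in other embodiments the audio signal of the in-ear microphone may also be filtered by digital filtering.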
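The high-pass step in S103 can be illustrated with a minimal Python sketch. The application states only the 100 Hz preset frequency; the first-order RC design, the function names `high_pass`, `rms`, and `sine`, and the 16 kHz sample rate used below are assumptions made for illustration, not the filter actually used by the applicant.

```python
import math


def high_pass(samples, sample_rate, cutoff_hz=100.0):
    """First-order RC high-pass filter (an assumed design, for illustration only)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        # Standard discrete RC high-pass recurrence.
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out


def rms(values):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def sine(freq_hz, sample_rate, n):
    """Unit-amplitude test tone of n samples."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]
```

With these assumptions, a 20 Hz tone is strongly attenuated while a 1 kHz tone passes almost unchanged. A first-order filter rolls off gently; a steeper design would likely be preferred in practice.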
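S104: Input the filtered audio signal of the in-ear microphone and the audio signal of the out-of-ear microphone separately into the network model to obtain the predicted amplitude spectrum output by the network model.
The noise reduction device inputs the audio signal of the out-of-ear microphone acquired in S101 and the filtered audio signal of the in-ear microphone obtained in S103 separately into the network model for training, and obtains the predicted amplitude spectrum output by the network model.
Specifically, the noise reduction device applies a short-time Fourier transform (STFT) to the filtered audio signal of the in-ear microphone to obtain the audio amplitude spectrum of the in-ear microphone, and applies the STFT to the audio signal of the out-of-ear microphone to obtain the audio amplitude spectrum of the out-of-ear microphone. It then inputs the two audio amplitude spectra separately into the network model for training and obtains the predicted amplitude spectrum (estimated magnitude spectrogram) of the network model.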
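As a rough illustration of how the two amplitude spectra in S104 might be computed, the sketch below implements a small STFT magnitude routine using only the Python standard library. The frame length, hop size, and Hann window are assumed values, since the application does not specify them. The helper `stack_inputs`, which concatenates the two spectra per frame, is likewise only one hypothetical way of presenting both inputs to a network.

```python
import cmath
import math


def stft_magnitude(samples, frame_len=64, hop=32):
    """Return one list of frame_len // 2 + 1 magnitudes per analysis frame."""
    # Periodic Hann window (an assumed choice).
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            # Direct DFT of the windowed frame; an FFT library would be used in practice.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def stack_inputs(in_ear_mag, out_ear_mag):
    """Concatenate the per-frame magnitudes of both microphones (hypothetical layout)."""
    return [a + b for a, b in zip(in_ear_mag, out_ear_mag)]
```

The direct DFT keeps the example dependency-free but is slow; it is meant to show the shape of the data (frames by frequency bins), not a production front end.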
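S105: Calculate the error between the target amplitude spectrum and the predicted amplitude spectrum.
S106: Determine whether the error is within the preset range.
S107: Resynthesize the predicted amplitude spectrum and output it as the noise-reduced signal predicted by the algorithm.
To determine whether the predicted amplitude spectrum obtained from the network model meets the requirement, this embodiment calculates the error between the target amplitude spectrum and the predicted amplitude spectrum. Specifically, the mean squared error between the two may be calculated and checked against the preset range. If it is within the range, S107 is executed: the predicted amplitude spectrum is resynthesized and output as the noise-reduced signal predicted by the algorithm. If not, the network parameters of the network model are updated based on the mean squared error until the error between the predicted amplitude spectrum output by the updated network model and the target amplitude spectrum is within the preset range.
The mean squared error reflects how much the target and predicted amplitude spectra differ: the smaller it is, the closer the predicted amplitude spectrum is to the target amplitude spectrum; the larger it is, the greater the difference between them.
It should be noted that, in a specific embodiment, the network parameters of the network model may be updated by backpropagation with gradient descent.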
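The loop formed by S105 to S107 (compute the mean squared error, compare it with the preset range, otherwise update the parameters by gradient descent) can be sketched as follows. To keep the example self-contained, a two-weight linear mix of the two amplitude spectra stands in for the convolutional recurrent network. That stand-in model, the tolerance, the learning rate, and all function names are assumptions made for illustration only.

```python
def mse(target, predicted):
    """Mean squared error between two spectrograms given as lists of frames."""
    total = 0.0
    count = 0
    for t_frame, p_frame in zip(target, predicted):
        for t, p in zip(t_frame, p_frame):
            total += (t - p) ** 2
            count += 1
    return total / count


def predict(weights, in_ear, out_ear):
    """Stand-in model: a weighted sum of the two input amplitude spectra."""
    w_in, w_out = weights
    return [[w_in * a + w_out * b for a, b in zip(fa, fb)]
            for fa, fb in zip(in_ear, out_ear)]


def train(in_ear, out_ear, target, tolerance=1e-6, lr=0.1, max_steps=10000):
    """Update the weights until the MSE is within the tolerance (the preset range)."""
    w_in, w_out = 0.0, 0.0
    for _ in range(max_steps):
        pred = predict((w_in, w_out), in_ear, out_ear)
        if mse(target, pred) <= tolerance:
            break
        grad_in = grad_out = 0.0
        count = 0
        for fa, fb, ft, fp in zip(in_ear, out_ear, target, pred):
            for a, b, t, p in zip(fa, fb, ft, fp):
                # Gradient of the squared error with respect to each weight.
                grad_in += 2.0 * (p - t) * a
                grad_out += 2.0 * (p - t) * b
                count += 1
        w_in -= lr * grad_in / count
        w_out -= lr * grad_out / count
    weights = (w_in, w_out)
    return weights, mse(target, predict(weights, in_ear, out_ear))
```

In a real system the gradient would be obtained by backpropagation through the network rather than written out by hand, but the stopping rule has the same structure.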
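In the above scheme, the noise reduction device acquires the audio signal of the in-ear microphone and the audio signal of the out-of-ear microphone; acquires the target amplitude spectrum of the network model; high-pass filters the audio signal of the in-ear microphone using a high-pass filter; inputs the filtered audio signal of the in-ear microphone and the audio signal of the out-of-ear microphone separately into the network model to obtain the predicted amplitude spectrum output by the network model; and, when the error between the target amplitude spectrum and the predicted amplitude spectrum is within the preset range, resynthesizes the predicted amplitude spectrum and outputs it as the noise-reduced signal predicted by the algorithm. This embodiment exploits the in-ear microphone's natural filtering of airborne noise: by acquiring the audio signal inside the ear canal through the in-ear microphone and inputting it together with the audio signal of the out-of-ear microphone into the network model for training, the desired noise reduction effect can be achieved even at extremely low signal-to-noise ratios. High-pass filtering the audio signal of the in-ear microphone suppresses the influence of its low-frequency components on noise reduction, which improves voice call quality in noisy environments.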
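The application does not describe how the resynthesis step turns the predicted amplitude spectrum back into a waveform. One common approach, assumed here purely for illustration, is to pair each predicted magnitude with a phase (for example the phase of the noisy out-of-ear signal) and apply an inverse DFT frame by frame. The sketch below shows that single-frame step; `dft_half` and `resynthesize_frame` are hypothetical helper names.

```python
import cmath
import math


def dft_half(frame):
    """One-sided DFT (bins 0 .. N/2) of a real frame of even length N."""
    n_len = len(frame)
    return [sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_len)
                for n in range(n_len))
            for k in range(n_len // 2 + 1)]


def resynthesize_frame(magnitudes, phases, frame_len):
    """Inverse real DFT from one-sided magnitudes and phases (frame_len must be even)."""
    half = [m * cmath.exp(1j * p) for m, p in zip(magnitudes, phases)]
    # Rebuild the negative-frequency bins by conjugate symmetry.
    full = half + [half[k].conjugate() for k in range(frame_len // 2 - 1, 0, -1)]
    return [sum(full[k] * cmath.exp(2j * math.pi * k * n / frame_len)
                for k in range(frame_len)).real / frame_len
            for n in range(frame_len)]
```

A complete implementation would additionally window the frames and overlap-add them to form the output signal.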
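Refer next to FIG. 2, which is a schematic flowchart of the second embodiment of the deep learning noise reduction method integrating an in-ear microphone and an out-of-ear microphone provided by the present application. Specifically, the method of this embodiment may include the following steps:
S201: Acquire the audio signal of the in-ear microphone and the audio signal of the out-of-ear microphone.
S202: Acquire the target amplitude spectrum of the network model.
S203: High-pass filter the audio signal of the in-ear microphone using a high-pass filter.
For details of S201 to S203 in this embodiment, refer to the descriptions of S101 to S103 in the above embodiment; they are not repeated here.
S204: Perform high-frequency reconstruction on the filtered audio signal of the in-ear microphone to widen the frequency range of the audio signal of the in-ear microphone to the preset signal frequency.
Because the audio signal acquired by the in-ear microphone mainly contains low-frequency speech and low-frequency noise, this embodiment may perform high-frequency reconstruction on the filtered audio signal of the in-ear microphone to widen its frequency range to the preset signal frequency, where the preset signal frequency is the range of signal frequencies that the human ear can recognize clearly and comfortably.
Specifically, the high-frequency reconstruction of the filtered in-ear microphone audio signal is shown in FIG. 3, which is a schematic flowchart of high-frequency reconstruction in the deep learning noise reduction method integrating an in-ear microphone and an out-of-ear microphone provided by the present application. The high-frequency reconstruction of this embodiment may include the following steps:
S1: Apply a short-time Fourier transform to the filtered audio signal of the in-ear microphone to obtain the audio amplitude spectrum of the in-ear microphone.
S2: Input the audio amplitude spectrum of the in-ear microphone into the network model to obtain the predicted amplitude spectrum of the in-ear microphone.
S3: Acquire the target amplitude spectrum of the network model.
This embodiment acquires a standard audio signal of the in-ear microphone and applies a short-time Fourier transform to it to obtain the target amplitude spectrum of the network model.
S4: Calculate the error between the target amplitude spectrum and the predicted amplitude spectrum of the in-ear microphone in the network model.
S5: Determine whether the error is within the preset range for the in-ear microphone.
S6: Use the predicted amplitude spectrum of the in-ear microphone as the widened amplitude spectrum.
To widen the frequency range of the in-ear microphone audio signal, this embodiment applies a short-time Fourier transform to the acquired in-ear microphone audio signal, inputs the result into the network model for training, and checks whether the error between the predicted amplitude spectrum and the target amplitude spectrum of the in-ear microphone is within the preset range. If it is, S6 is executed and the predicted amplitude spectrum of the in-ear microphone is used as the widened amplitude spectrum. If not, the network parameters of the network model are updated based on the error until the error between the target amplitude spectrum and the predicted amplitude spectrum output by the updated network model is within the preset range, and the predicted amplitude spectrum is then output as the widened amplitude spectrum of the in-ear microphone.
The network model used in high-frequency reconstruction is a long short-term memory neural network; in other embodiments, it may instead be a convolutional recurrent neural network, a deep fully convolutional neural network, or the like.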
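S205: Input the high-frequency-reconstructed audio signal of the in-ear microphone and the audio signal of the out-of-ear microphone into the network model to obtain the predicted amplitude spectrum output by the network model.
It should be noted that the network model in S205 of this embodiment is different from the network model used for high-frequency reconstruction in S204.
S206: Calculate the error between the target amplitude spectrum and the predicted amplitude spectrum.
S207: Determine whether the error is within the preset range.
S208: Resynthesize the predicted amplitude spectrum and output it as the noise-reduced signal predicted by the algorithm.
Based on the target audio signal of the in-ear microphone obtained in S204, that is, the high-frequency-reconstructed audio signal of the in-ear microphone, the audio signal of the out-of-ear microphone and the target audio signal of the in-ear microphone are input separately into the network model to obtain a predicted amplitude spectrum that fuses the two. The error between the target amplitude spectrum and the predicted amplitude spectrum is calculated and checked against the preset range. If it is within the range, S208 is executed and the predicted amplitude spectrum is resynthesized and output as the noise-reduced signal predicted by the algorithm. If not, the network parameters of the network model are updated until the error between the predicted amplitude spectrum output by the network model and the target amplitude spectrum is within the preset range, and the predicted amplitude spectrum is then resynthesized and output as the noise-reduced signal predicted by the algorithm.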
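The data flow of the second embodiment (S203 to S208) can be summarized as a composition of stages. In the sketch below every stage is passed in as a callable, because the application does not disclose the concrete networks. The function name `denoise_second_embodiment` and its parameter names are hypothetical, and the training branches (S206, S207) are omitted so that only the inference path is shown.

```python
def denoise_second_embodiment(in_ear, out_ear, high_pass, to_magnitude,
                              extend_bandwidth, fuse, resynthesize):
    """Chain the stages of the second embodiment; every stage is a caller-supplied callable."""
    filtered = high_pass(in_ear)                     # S203: high-pass the in-ear signal
    in_ear_mag = to_magnitude(filtered)              # amplitude spectrum of the in-ear signal
    widened_mag = extend_bandwidth(in_ear_mag)       # S204: first network (high-frequency reconstruction)
    out_ear_mag = to_magnitude(out_ear)              # amplitude spectrum of the out-of-ear signal
    predicted_mag = fuse(widened_mag, out_ear_mag)   # S205: second, separate fusion network
    return resynthesize(predicted_mag)               # S208: back to a noise-reduced signal
```

Keeping `extend_bandwidth` and `fuse` as separate arguments mirrors the statement that the model in S205 differs from the high-frequency reconstruction model in S204.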
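The above scheme exploits the in-ear microphone's natural filtering of airborne noise: by acquiring the audio signal at the ear through the in-ear microphone and inputting it together with the audio signal of the out-of-ear microphone into the network model for training, the desired noise reduction effect can be achieved even at extremely low signal-to-noise ratios. High-pass filtering the audio signal of the in-ear microphone suppresses the influence of its low-frequency components on noise reduction, which improves voice call quality in noisy environments. High-frequency reconstruction widens the frequency range of the in-ear microphone audio signal to the preset signal frequency, which further improves the noise reduction process.
Those skilled in the art will understand that, in the above methods of the specific embodiments, the order in which the steps are written does not imply a strict execution order and places no limitation on the implementation; the actual execution order of the steps should be determined by their functions and possible internal logic.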
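Refer to FIG. 4, which is a schematic structural diagram of the first embodiment of the deep learning noise reduction device integrating an in-ear microphone and an out-of-ear microphone provided by the present application.
The noise reduction device 40 of this embodiment includes a body portion 41, a data processing module (not shown), an in-ear microphone 42, and an out-of-ear microphone 43. The in-ear microphone 42, the out-of-ear microphone 43, and the data processing module are arranged in the body portion 41; the data processing module is connected to the in-ear microphone 42 and to the out-of-ear microphone 43; the in-ear microphone 42 is arranged on the side of the body portion 41 facing the inside of the ear; and the out-of-ear microphone 43 is arranged on the side of the body portion 41 facing away from the inside of the ear.
The in-ear microphone 42 is configured to acquire the audio signal of the in-ear microphone 42, and the out-of-ear microphone 43 is configured to acquire the audio signal of the out-of-ear microphone 43. The data processing module is configured to high-pass filter the acquired audio signal of the in-ear microphone 42, input the filtered audio signal of the in-ear microphone 42 and the audio signal of the out-of-ear microphone 43 separately into the network model to obtain the predicted amplitude spectrum output by the network model, and, when the error between the target amplitude spectrum and the predicted amplitude spectrum is within the preset range, resynthesize the predicted amplitude spectrum and output it as the noise-reduced signal predicted by the algorithm.
Refer next to FIG. 5, which is a schematic structural diagram of the second embodiment of the deep learning noise reduction device integrating an in-ear microphone and an out-of-ear microphone provided by the present application. The noise reduction device 50 of this embodiment includes a body portion 51, a data processing module (not shown), an in-ear microphone 52, an out-of-ear microphone 53, and a stem portion 54.
The body portion 51 is connected to the stem portion 54. The out-of-ear microphone 53 includes a first out-of-ear microphone 531 and a second out-of-ear microphone 532. The second out-of-ear microphone 532 is arranged at the end of the stem portion 54 away from the body portion 51, so that it is close to the wearer's mouth and acquires the audio signal produced at the mouth together with the noise in the environment.
Refer next to FIG. 6, which is a schematic structural diagram of the third embodiment of the deep learning noise reduction device integrating an in-ear microphone and an out-of-ear microphone provided by the present application. The noise reduction device 60 of this embodiment may also be of the neckband type and includes two body portions 61 that are communicatively connected. The body portions 61 are provided with an in-ear microphone 62, which includes a first in-ear microphone 621 and a second in-ear microphone 622, each arranged on the side of a body portion 61 close to the inside of the ear. The out-of-ear microphone 63 includes a first out-of-ear microphone 631, a second out-of-ear microphone 632, and a third out-of-ear microphone 633. The first out-of-ear microphone 631 is arranged on the side away from the first in-ear microphone 621, and the second out-of-ear microphone 632 is arranged on the side away from the second in-ear microphone 622. The third out-of-ear microphone 633 may be arranged on the side near the first in-ear microphone 621 or near the second in-ear microphone 622, and directly picks up the voice from the wearer's mouth.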
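Refer to FIG. 7, which is a schematic block diagram of an embodiment of the electronic device provided by the present application. The electronic device 70 includes a memory 71 and a processor 72 coupled to each other, and the processor 72 is configured to execute program instructions stored in the memory 71 to implement the steps of any of the above embodiments of the deep learning noise reduction method integrating an in-ear microphone and an out-of-ear microphone. In a specific implementation scenario, the electronic device 70 may include, but is not limited to, a microcomputer or a server; it may also be a mobile device such as a notebook computer, a tablet computer, an earphone, or a mobile phone, which is not limited here.
Specifically, the processor 72 is configured to control itself and the memory 71 to implement the steps of any of the above embodiments of the deep learning noise reduction method integrating an in-ear microphone and an out-of-ear microphone. The processor 72 may also be referred to as a CPU (Central Processing Unit). The processor 72 may be an integrated circuit chip with signal processing capability. The processor 72 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor or any conventional processor. In addition, the processor 72 may be jointly implemented by multiple integrated circuit chips.
Refer to FIG. 8, which is a schematic block diagram of an embodiment of the computer storage medium provided by the present application. The computer-readable storage medium 80 stores program instructions 801 that can be run by a processor, and the program instructions 801 are used to implement the steps of any of the above embodiments of the deep learning noise reduction method integrating an in-ear microphone and an out-of-ear microphone.
In some embodiments, the functions or modules of the apparatus provided in this embodiment may be used to execute the methods described in the above method embodiments; for the specific implementation, refer to the descriptions of those method embodiments, which are not repeated here for brevity.
The above descriptions of the embodiments tend to emphasize the differences between them; for what is the same or similar, the embodiments may be referred to one another, and for brevity this is not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the device implementations described above are merely illustrative: the division into modules or units is only a division by logical function, and other divisions are possible in actual implementation; for instance, units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the present application, in essence, or the parts that contribute over the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.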

Claims (10)

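  1. A deep learning noise reduction method integrating an in-ear microphone and an out-of-ear microphone, characterized in that the noise reduction method comprises:
    acquiring an audio signal of the in-ear microphone and an audio signal of the out-of-ear microphone;
    acquiring a target amplitude spectrum of a network model;
    high-pass filtering the audio signal of the in-ear microphone using a high-pass filter;
    inputting the filtered audio signal of the in-ear microphone and the audio signal of the out-of-ear microphone separately into the network model to obtain a predicted amplitude spectrum output by the network model;
    when the error between the target amplitude spectrum and the predicted amplitude spectrum is within a preset range, resynthesizing the predicted amplitude spectrum and outputting it as the noise-reduced signal predicted by the algorithm.
  2. The deep learning noise reduction method integrating an in-ear microphone and an out-of-ear microphone according to claim 1, characterized in that
    the noise reduction method further comprises:
    when the error between the target amplitude spectrum and the predicted amplitude spectrum is outside the preset range, updating network parameters of the network model based on the error until the error between the predicted amplitude spectrum output by the updated network model and the target amplitude spectrum is within the preset range.
  3. The deep learning noise reduction method integrating an in-ear microphone and an out-of-ear microphone according to claim 1, characterized in that
    the step of inputting the filtered audio signal of the in-ear microphone and the audio signal of the out-of-ear microphone separately into the network model comprises:
    applying a short-time Fourier transform to the filtered audio signal of the in-ear microphone to obtain an audio amplitude spectrum of the in-ear microphone;
    applying the short-time Fourier transform to the audio signal of the out-of-ear microphone to obtain an audio amplitude spectrum of the out-of-ear microphone;
    inputting the audio amplitude spectrum of the in-ear microphone and the audio amplitude spectrum of the out-of-ear microphone into the network model.
  4. The deep learning noise reduction method integrating an in-ear microphone and an out-of-ear microphone according to claim 1, characterized in that
    after the step of high-pass filtering the audio signal of the in-ear microphone using a high-pass filter, the method further comprises:
    performing high-frequency reconstruction on the filtered audio signal of the in-ear microphone to widen the frequency range of the audio signal of the in-ear microphone to a preset signal frequency.
  5. The deep learning noise reduction method integrating an in-ear microphone and an out-of-ear microphone according to claim 4, characterized in that, after the step of performing high-frequency reconstruction on the filtered audio signal of the in-ear microphone to widen the frequency range of the audio signal of the in-ear microphone to the preset signal frequency, the method comprises:
    inputting the high-frequency-reconstructed audio signal of the in-ear microphone and the audio signal of the out-of-ear microphone separately into the network model to obtain the predicted amplitude spectrum output by the network model.
  6. The deep learning noise reduction method integrating an in-ear microphone and an out-of-ear microphone according to claim 1, characterized in that the step of acquiring the target amplitude spectrum of the network model comprises:
    acquiring a standard audio signal;
    applying a short-time Fourier transform to the standard audio signal to obtain the target amplitude spectrum of the network model.
  7. A deep learning noise reduction device integrating an in-ear microphone and an out-of-ear microphone, characterized in that the noise reduction device comprises a body portion, a data processing module, and the in-ear microphone and out-of-ear microphone of any of the above;
    the in-ear microphone, the out-of-ear microphone, and the data processing module are arranged in the body portion;
    the data processing module is connected to the in-ear microphone and to the out-of-ear microphone;
    the in-ear microphone is arranged on the side of the body portion facing the external auditory canal of the human body;
    the out-of-ear microphone is arranged on the side of the body portion facing away from the inside of the ear canal;
    the in-ear microphone is configured to acquire the audio signal inside the ear canal;
    the out-of-ear microphone is configured to acquire the audio signal outside the ear canal;
    the data processing module is configured to high-pass filter the acquired audio signal of the in-ear microphone, input the filtered audio signal of the in-ear microphone and the audio signal of the out-of-ear microphone separately into the network model to obtain the predicted amplitude spectrum output by the network model, and, when the error between the target amplitude spectrum and the predicted amplitude spectrum is within the preset range, resynthesize the predicted amplitude spectrum and output it as the noise-reduced signal predicted by the algorithm.
  8. The noise reduction device according to claim 7, characterized in that the noise reduction device further comprises a stem portion connected to the body portion; the out-of-ear microphone comprises a first out-of-ear microphone and a second out-of-ear microphone;
    the second out-of-ear microphone is arranged at the end of the stem portion away from the body portion.
  9. An electronic device, characterized in that the device comprises a memory and a processor coupled to the memory;
    wherein the memory is configured to store program data, and the processor is configured to execute the program data to implement the deep learning noise reduction method integrating an in-ear microphone and an out-of-ear microphone according to any one of claims 1 to 6.
  10. A computer storage medium, characterized in that the computer storage medium is configured to store program data which, when executed by a processor, implements the deep learning noise reduction method integrating an in-ear microphone and an out-of-ear microphone according to any one of claims 1 to 6.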
PCT/CN2020/112890 2020-08-17 2020-09-01 Deep learning noise reduction method and device integrating an in-ear microphone and an out-of-ear microphone WO2022036761A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010825493.8A CN112055278B (zh) 2020-08-17 2020-08-17 Deep learning noise reduction device integrating an in-ear microphone and an out-of-ear microphone
CN202010825493.8 2020-08-17

Publications (1)

Publication Number Publication Date
WO2022036761A1 true WO2022036761A1 (zh) 2022-02-24

Family

ID=73599198

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/112890 WO2022036761A1 (zh) 2020-08-17 2020-09-01 Deep learning noise reduction method and device integrating an in-ear microphone and an out-of-ear microphone

Country Status (2)

Country Link
CN (1) CN112055278B (zh)
WO (1) WO2022036761A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
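CN113163286A (zh) * 2021-03-22 2021-07-23 九音(南京)集成电路技术有限公司 Call noise reduction method, earphone, and computer storage medium
CN115884032B (zh) * 2023-02-20 2023-07-04 深圳市九音科技有限公司 Intelligent call noise reduction method and system for a feedback earphone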

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
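WO2014187332A1 (zh) * 2013-05-22 2014-11-27 歌尔声学股份有限公司 Earphone communication method for a strong-noise environment, and earphone
US20190325887A1 (en) * 2018-04-18 2019-10-24 Nokia Technologies Oy Enabling in-ear voice capture using deep learning
CN110931031A (zh) * 2019-10-09 2020-03-27 大象声科(深圳)科技有限公司 Deep learning speech extraction and noise reduction method fusing bone vibration sensor and microphone signals
CN111131947A (zh) * 2019-12-05 2020-05-08 北京小鸟听听科技有限公司 Earphone signal processing method and system, and earphone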

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7447630B2 (en) * 2003-11-26 2008-11-04 Microsoft Corporation Method and apparatus for multi-sensory speech enhancement
DK179837B1 (en) * 2017-12-30 2019-07-29 Gn Audio A/S MICROPHONE APPARATUS AND HEADSET
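TWI729404B (zh) * 2018-08-17 2021-06-01 宏達國際電子股份有限公司 Method, electronic device, and recording medium for compensating in-ear audio signals
CN111432303B (zh) * 2020-03-19 2023-01-10 交互未来(北京)科技有限公司 Monaural earphone, intelligent electronic device, method, and computer-readable medium
CN111510807A (zh) * 2020-03-30 2020-08-07 广州酷狗计算机科技有限公司 Earphone and method for acquiring a voice signal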

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHENGYOU ZHANG, ZICHENG LIU, SINCLAIR M., ACERO A., LI DENG, DROPPO J., XUEDONG HUANG, YANLI ZHENG: "Multi-sensory microphones for robust speech detection,enhancement and recognition", ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2004. PROCEEDINGS. (ICASSP ' 04). IEEE INTERNATIONAL CONFERENCE ON MONTREAL, QUEBEC, CANADA 17-21 MAY 2004, PISCATAWAY, NJ, USA,IEEE, PISCATAWAY, NJ, USA, vol. 3, 17 May 2004 (2004-05-17) - 21 May 2004 (2004-05-21), Piscataway, NJ, USA , pages 781 - 784, XP010718306, ISBN: 978-0-7803-8484-2 *

Also Published As

Publication number Publication date
CN112055278A (zh) 2020-12-08
CN112055278B (zh) 2022-03-08

Similar Documents

Publication Publication Date Title
US11363390B2 (en) Perceptually guided speech enhancement using deep neural networks
JP6150988B2 (ja) Audio device including means for denoising a speech signal by fractional delay filtering, in particular for a "hands-free" telephony system
EP3453189B1 (en) Device and method for improving the quality of in- ear microphone signals in noisy environments
AU2015349054B2 (en) Method and apparatus for fast recognition of a user's own voice
US20060206320A1 (en) Apparatus and method for noise reduction and speech enhancement with microphones and loudspeakers
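KR101660670B1 (ko) Heart rate detection method applied to an earphone, and earphone capable of heart rate detection
KR101660671B1 (ko) Heart rate detection method applied to an earphone, and earphone capable of heart rate detection
CN112087701B (zh) Loudspeaker emulation of a microphone for wind detection
WO2022036761A1 (zh) Deep learning noise reduction method and device integrating an in-ear microphone and an out-of-ear microphone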
US8948424B2 (en) Hearing device and method for operating a hearing device with two-stage transformation
KR20220062598A (ko) System and method for generating an audio signal
US20190043518A1 (en) Capture and extraction of own voice signal
CN113507662B (zh) Noise reduction processing method, apparatus, device, storage medium, and program
US8280062B2 (en) Sound corrector, sound measurement device, sound reproducer, sound correction method, and sound measurement method
TWI397057B (zh) Audio separation device and operating method thereof
US10972844B1 (en) Earphone and set of earphones
EP4342188A1 (en) Wearable hearing assist device with artifact remediation
WO2019079948A1 (en) HEADER AND METHOD FOR PERFORMING AN ADAPTIVE SELF-ACCORD FOR A HEADPHONES
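KR101850693B1 (ko) Apparatus and method for bandwidth extension of an earset having an in-ear microphone
WO2017207286A1 (fr) Combined microphone/headset audio device comprising multiple voice activity detection means with a supervised classifier
TWI534796B (zh) Noise-resistant earmuff device and sound processing method thereof
WO2022140927A1 (zh) Audio noise reduction method and system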
US11955133B2 (en) Audio signal processing method and system for noise mitigation of a voice signal measured by an audio sensor in an ear canal of a user
US11330376B1 (en) Hearing device with multiple delay paths
CN107172516A (zh) Earphone and heart rate detection method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20949967

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 05.07.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20949967

Country of ref document: EP

Kind code of ref document: A1