WO2023165565A1 - Audio enhancement method and apparatus, and computer storage medium - Google Patents

Audio enhancement method and apparatus, and computer storage medium

Info

Publication number
WO2023165565A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
update interval
audio
matrix
microphone
Prior art date
Application number
PCT/CN2023/079312
Other languages
English (en)
French (fr)
Inventor
李林锴
陆丛希
孙鸿程
Original Assignee
上海又为智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海又为智能科技有限公司
Publication of WO2023165565A1

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/26Pre-filtering or post-filtering
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0264Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R25/00Deaf-aid sets, i.e. electro-acoustic or electro-mechanical hearing aids; Electric tinnitus maskers providing an auditory perception
    • H04R25/43Electronic input selection or mixing based on input signal analysis, e.g. mixing or selection between microphone and telecoil or between microphones with different directivity characteristics
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166Microphone arrays; Beamforming
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2430/00Signal processing covered by H04R, not provided for in its groups
    • H04R2430/20Processing of the output signals of the acoustic transducers of an array for obtaining a desired directivity characteristic
    • H04R2430/25Array processing for suppression of unwanted side-lobes in directivity characteristics, e.g. a blocking matrix


Abstract

The present application discloses an audio enhancement method and apparatus, and a computer storage medium. The method includes: generating a group of audio collection signals by a microphone array; performing delay-and-sum processing on the group of audio collection signals to generate a delayed-sum signal; performing blocking-matrix processing on the group of audio collection signals to generate a blocking-matrix signal; and filtering the blocking-matrix signal with an adaptive filter matrix and removing the filtered blocking-matrix signal from the delayed-sum signal to obtain an enhanced audio output signal. The adaptive filter matrix is based on at least one attenuation function, and each of the at least one attenuation function is updated at a corresponding predetermined update interval T.

Description

Audio enhancement method and apparatus, and computer storage medium
Technical Field
The present application relates to beamforming technology and, more specifically, to an audio enhancement method and apparatus, and a computer storage medium.
Background Art
Beamforming algorithms are often applied to audio devices such as earphones, hearing aids, and loudspeakers. Their basic principle is to pick up sound with two or more microphones and to compute the time at which the same sound reaches the different microphones, thereby determining the direction of the sound source. Subsequent processing can then algorithmically retain or eliminate the sound coming from a given direction. For example, a Bluetooth wireless earphone with an ambient noise reduction function can place its two microphones one above the other, so that the wearer's mouth lies roughly on the straight line connecting them. Picking up the wearer's speech in this way helps eliminate ambient noise and thus improves sound quality during calls. Hearing aids currently on the market are generally equipped with two microphones, which can be placed one in front of the other, so that a beamforming algorithm can extract sound from the front (relative to the wearer's orientation, likewise below) and eliminate sound from the rear, allowing the wearer to focus better on the voice in front during a conversation.
However, a typical beamforming algorithm can only retain sound from one preset direction and cuts all sound from other directions. This is unsuitable for application scenarios such as simulating the sound-collecting effect of the human ear with two or more microphones on a hearing aid. It is therefore necessary to provide an improved beamforming algorithm.
Summary of the Invention
An object of the present application is to provide an audio enhancement method and apparatus, and a computer storage medium, so as to solve the problem that beamforming algorithms over-suppress sound from non-target directions.
In one aspect of the present application, an audio enhancement method is provided. The method includes: generating a group of audio collection signals by a microphone array, where each audio collection signal in the group is generated by one microphone of the microphone array and the microphones of the microphone array are spaced apart from one another; performing delay-and-sum processing on the group of audio collection signals to generate a delayed-sum signal Y_DSB(k, l), where k denotes the frequency bin and l denotes the frame index; performing blocking-matrix processing on the group of audio collection signals to generate a blocking-matrix signal Y_BM(k, l); and filtering the blocking-matrix signal Y_BM(k, l) with an adaptive filter matrix W_ANC and removing the filtered blocking-matrix signal from the delayed-sum signal Y_DSB(k, l) to obtain an enhanced audio output signal Y_OUT(k, l). The adaptive filter matrix W_ANC is a weight coefficient matrix that is based on at least one attenuation function μ(t) and varies with the audio output signal Y_OUT(k, l) and the blocking-matrix signal Y_BM(k, l), and each of the at least one attenuation function μ(t) is updated at a corresponding predetermined update interval T.
In some embodiments, optionally, the microphone array includes at least two microphones located on the same audio processing device.
In some embodiments, optionally, the audio processing device is adapted to be worn in the pinna of a person.
In some embodiments, optionally, one of the at least two microphones is oriented towards the pinna, while the other of the at least two microphones is oriented away from the pinna.
In some embodiments, optionally, the audio output signal is determined by the following equation:
Y_OUT(k, l) = Y_DSB(k, l) − W*_ANC(k, l)·Y_BM(k, l)
and the adaptive filter matrix W_ANC is determined by the following equation:
W_ANC(k, l+1) = W_ANC(k, l) + μ(t)·Y_BM(k, l)·Y*_OUT(k, l)/P_est(k, l)
where P_est(k, l) is determined by the following equation:
P_est(k, l) = α·P_est(k, l−1) + (1 − α)·(Σ_{m=1}^{M−1} |Y_BM,m(k, l)|² + |Y_OUT(k, l)|²)
where α is a forgetting factor, M is the number of microphones in the microphone array, and Y_BM,m denotes the m-th channel of the blocking-matrix signal.
In some embodiments, optionally, the at least one attenuation function includes a first attenuation function and a second attenuation function, the first attenuation function being updated at a first predetermined update interval and the second attenuation function being updated at a second predetermined update interval; the first attenuation function corresponds to high-frequency signals at or above a predetermined frequency threshold, the second attenuation function corresponds to low-frequency signals below the predetermined frequency threshold, and the first predetermined update interval is shorter than the second predetermined update interval.
In some embodiments, optionally, each of the attenuation functions μ(t) is updated within the current update interval based on its values within the first update interval.
In some embodiments, optionally, each point of each attenuation function μ(t) within the current update interval is updated by assigning a varying weight between 0 and 1 to the value of the corresponding point within the first update interval.
In some embodiments, optionally, the weight is a linear function of time within the current update interval.
In some embodiments, optionally, the weight is a linearly increasing function of time within the current update interval.
In some embodiments, optionally, the weight is a nonlinear function of time within the current update interval.
In some embodiments, optionally, each of the attenuation functions μ(t) is further updated within the current update interval based on its value at the end of the previous update interval.
In some embodiments, optionally, each of the attenuation functions μ(t) satisfies the following equation within the current update interval (NT, (N+1)T]:
μ(t) = (1 − (t − NT)/T)·μ(NT) + ((t − NT)/T)·μ(t − NT)
where N is a positive integer.
In another aspect of the present application, an audio enhancement apparatus is further provided. The apparatus includes a non-transitory computer storage medium storing one or more executable instructions which, when executed by a processor, perform any of the audio enhancement methods described above.
In some embodiments, optionally, the audio enhancement apparatus may be a hearing aid device.
In yet another aspect of the present application, a non-transitory computer storage medium is further provided, storing one or more executable instructions which, when executed by a processor, perform any of the audio enhancement methods described above.
The above is an overview of the present application; it may simplify, generalize, and omit details, so those skilled in the art should recognize that this part is merely illustrative and is not intended to limit the scope of the application in any way. This overview is neither intended to identify key or essential features of the claimed subject matter, nor intended to serve as an aid in determining the scope of the claimed subject matter.
Brief Description of the Drawings
The above and other features of the present disclosure will be understood more fully and clearly from the following specification and the appended claims, taken in conjunction with the accompanying drawings. It will be appreciated that these drawings depict only several embodiments of the present disclosure and should therefore not be regarded as limiting its scope. By means of the accompanying drawings, the present disclosure will be described more clearly and in more detail.
Fig. 1 shows a schematic diagram of a beamforming algorithm according to an example;
Fig. 2 shows a schematic diagram of a beamforming algorithm according to an example;
Fig. 3 shows a schematic diagram of a beamforming algorithm according to an embodiment of the present application;
Fig. 4 shows an audio enhancement method according to an embodiment of the present application;
Fig. 5 shows a schematic diagram of a beamforming algorithm according to an embodiment of the present application;
Fig. 6 shows a schematic diagram of a beamforming algorithm according to an embodiment of the present application;
Fig. 7 shows a schematic diagram of the effect of a beamforming algorithm according to an embodiment of the present application;
Fig. 8 shows a schematic diagram of the effect of a beamforming algorithm according to an embodiment of the present application;
Fig. 9 shows a schematic diagram of the effect of a beamforming algorithm according to an embodiment of the present application.
Before any embodiments of the invention are explained in detail, it is to be understood that the application of the invention is not limited to the details of construction and the arrangement of components set forth in the following description or illustrated in the following drawings. The invention is capable of other embodiments and of being practiced or carried out in various ways. It is also to be understood that the phraseology and terminology used herein are for the purpose of description and should not be regarded as limiting.
Detailed Description
In the following detailed description, reference is made to the accompanying drawings, which form a part hereof. In the drawings, similar symbols typically identify similar components, unless context dictates otherwise. The illustrative embodiments described in the detailed description, the drawings, and the claims are not meant to be limiting. Other embodiments may be utilized and other changes may be made without departing from the spirit or scope of the subject matter of the present application. It will be understood that the aspects of the present disclosure, as generally described herein and illustrated in the drawings, can be configured, substituted, combined, and designed in a wide variety of different ways, all of which explicitly form part of this disclosure.
Figs. 1 and 2 illustrate beamforming algorithms according to some examples. As shown in Fig. 1, sound emitted by a sound source 101 can be picked up by a microphone 102-1 and a microphone 102-2 of, for example, a hearing aid. The microphone 102-1 and the microphone 102-2 can be arranged on the left and right sides of the hearing aid wearer 103 (for example, placed in the pinnae on both sides), with a constant distance d between them. For example, the distance d may depend on the inter-ear distance of the wearer 103. The wearer 103 faces upwards in Fig. 1 (i.e., towards the front of the wearer) at the illustrated angle of 0°. The sound source 101 is located to the front left of the wearer 103, at an angle θ to the midline of the wearer's field of view. Since the distance between the sound source 101 and the wearer 103 (and both ears) far exceeds the distance between the two ears, the sound source 101 can be considered to be at approximately the illustrated angle θ with respect to both the microphone 102-1 and the microphone 102-2. From the geometry, assuming that sound travels in air at speed v and the signal received by the microphone 102-1 is y_1(t), the signal received by the microphone 102-2 is y_2(t) = y_1(t − τ), where τ = d·sin(θ)/v.
A short-time Fourier transform is applied to the sound signals received by the microphone 102-1 and the microphone 102-2. Let the transform of y_1(t) be Y_1(k, l) and the transform of y_2(t) be Y_2(k, l), where k denotes the frequency bin and l denotes the frame index. Then Y_1(k, l) and Y_2(k, l) satisfy the relation Y_2(k, l) = Y_1(k, l)·e^{−jωτ}.
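To make the delay model above concrete, the following Python sketch computes the inter-microphone delay τ and the per-bin phase factor that relates Y_2(k, l) to Y_1(k, l). It is only an illustration of the geometry: the distance d, angle θ, sampling rate, and FFT size are assumed values, not parameters taken from the application.

```python
import numpy as np

v = 343.0                  # speed of sound in air, m/s
d = 0.18                   # assumed inter-microphone distance, m
theta = np.deg2rad(30.0)   # assumed source angle theta
fs = 16000                 # assumed sampling rate, Hz
n_fft = 256                # assumed STFT size

tau = d * np.sin(theta) / v              # inter-microphone delay tau = d*sin(theta)/v
k = np.arange(n_fft // 2 + 1)            # frequency-bin indices
omega = 2.0 * np.pi * k * fs / n_fft     # angular frequency of each bin, rad/s
phase = np.exp(-1j * omega * tau)        # factor in Y2(k, l) = Y1(k, l) * e^{-j*omega*tau}
```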
Turning to Fig. 2, a delay beamformer 201 and a blocking matrix 202 respectively receive and process the signals from the microphone 102-1 and the microphone 102-2. In some schemes, the signal Y_DSB obtained by the delay beamformer 201 may, for example, satisfy Y_DSB = (Y_1(k, l) + Y_2(k, l)·e^{jωτ})/2, and the signal Y_BM obtained by the blocking matrix 202 may, for example, satisfy Y_BM = Y_1(k, l) − Y_2(k, l)·e^{jωτ}. A least-mean-squares adaptive filter (LMS filter) 203 with adjustable parameters further processes Y_BM and sends the result to a summation unit 204; the signal Y_GSC(k, l) output from the summation unit 204 satisfies Y_GSC(k, l) = Y_DSB(k, l) − W*_ANC(k, l)·Y_BM(k, l), where W_ANC(k, l) is the iteration coefficient of the LMS filter 203 and * denotes the complex conjugate.
Further, W_ANC(k, l) satisfies the following relations:
W_ANC(k, l+1) = W_ANC(k, l) + μ·Y_BM(k, l)·Y*_GSC(k, l)/P_est(k, l)    (1)
P_est(k, l) = α·P_est(k, l−1) + (1 − α)·(|Y_BM(k, l)|² + |Y_GSC(k, l)|²)    (2)
If the hearing aid includes M microphones for collecting sound signals, the blocking matrix yields M − 1 channels Y_BM,m, and equation (2) can be expressed as:
P_est(k, l) = α·P_est(k, l−1) + (1 − α)·(Σ_{m=1}^{M−1} |Y_BM,m(k, l)|² + |Y_GSC(k, l)|²)    (2')
In the above equations (2) and (2'), α is a forgetting factor. As will be appreciated, the introduction of the forgetting factor α emphasizes the amount of information provided by new data and gradually reduces the influence of older data, preventing data saturation.
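As an illustration of equations (1) and (2), the following is a minimal per-frame Python sketch of the two-microphone chain described above: delayed sum, blocking matrix, and the normalized LMS coefficient update with a fixed step size μ, as in equation (1). The function and variable names, the default values of mu and alpha, and the small floor on P_est (to avoid division by zero) are our own assumptions.

```python
import numpy as np

def gsc_frame(Y1, Y2, W, P_est, omega, tau, mu=0.05, alpha=0.9):
    """Process one STFT frame; all arrays are indexed by frequency bin k."""
    e = np.exp(1j * omega * tau)
    Y_dsb = 0.5 * (Y1 + Y2 * e)            # delayed-sum signal Y_DSB
    Y_bm = Y1 - Y2 * e                     # blocking-matrix signal Y_BM
    Y_gsc = Y_dsb - np.conj(W) * Y_bm      # output Y_GSC = Y_DSB - W*_ANC * Y_BM
    # Recursive power estimate, equation (2).
    P_est = alpha * P_est + (1 - alpha) * (np.abs(Y_bm) ** 2 + np.abs(Y_gsc) ** 2)
    # Normalized LMS update of the iteration coefficient, equation (1).
    W = W + mu * Y_bm * np.conj(Y_gsc) / np.maximum(P_est, 1e-12)
    return Y_gsc, W, P_est
```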
However, as noted above, the beamforming algorithm described here can only retain sound from one preset direction and cuts all sound from other directions. For example, returning to Fig. 1, if the retained direction is set to 90°, the algorithm will retain almost all of the sound from the 90° direction but eliminate almost all of the signal from the 0° direction, and sound between the 0° and 90° directions will also be attenuated depending on the angle. For application scenarios such as simulating the sound-collecting effect of the human ear with two or more microphones on a hearing aid, this direction-only style of signal processing may be unsatisfactory. In real life, the structure of the pinna of the human ear assists sound collection, so that people hear sound from the front better than from the rear, with different effects on sounds of different frequencies. Therefore, to reproduce the effect of the human pinna on a hearing aid, a beamforming method is needed that can apply customized adjustment to sound from different directions. It is further desirable that such a method can also be tuned specifically for sounds of different frequencies.
The present application proposes an algorithm that can control the degree of attenuation, and/or control the degree of attenuation of signals of different frequencies, at low power consumption, so that applications based on the algorithm better match the auditory experience of the human ear.
Fig. 3 shows a schematic diagram of a beamforming algorithm according to an embodiment of the present application. Unlike the scheme described above with respect to Figs. 1 and 2, the configuration of the iteration coefficient of the LMS filter 303 changes in the beamforming algorithm according to some examples of the present application: in equation (1) above, the coefficient μ is set to a fixed value, whereas in the beamforming algorithm according to some examples of the present application the coefficient μ is set to a function μ(t) that can vary over time, and in some examples different functions μ_1(t), μ_2(t), ... can also be set for different frequencies (or frequency bands). The configuration of this coefficient is described in detail below.
As shown in Fig. 3, compared with the scheme shown in Fig. 2, a delay unit 305 is added. The delay unit 305 can delay a series of coefficients U by a period of time (referred to as an update interval in the context of this application, denoted T) and then use them to compute the attenuation function μ(t) for the LMS filter 303, thereby updating the parameters of the LMS filter 303. As described below, the coefficients U may be the values of the attenuation function μ(t) within the first update interval, and the delay unit 305 may repeatedly delay and output this portion of the coefficients U. This portion of the coefficients U is also referred to as the reduction coefficient U in the context of the present application.
According to some examples of the present application, after each update interval, the beamforming reduction coefficient U is re-iterated to form the time-varying attenuation function μ(t). In this way, the strength of the attenuation of the sound signal can be controlled, preventing excessive suppression of sound from non-target directions. Fig. 5 shows a schematic diagram of a beamforming algorithm according to an embodiment of the present application. As shown in Fig. 5, curves A, B, and C represent the reduction coefficient U updated in periods #1, #2, and #3, respectively. Curves A, B, and C in Fig. 5 have the same shape, which means that the reduction coefficient U is the same in periods #1, #2, and #3. Specifically, the reduction coefficient U represented by curve A is the initial portion of the attenuation function μ(t), and curve A can be continuously update-copied with period T by a delay unit such as the delay unit 305 shown in Fig. 3, yielding curves B and C and subsequent curves (not shown). This process of update-copying is equivalent to delaying and outputting curve A multiple times.
On the other hand, to keep the audio attenuation function μ(t) continuous, the updated portion of the reduction coefficient U is not applied immediately; it is applied gradually to the attenuation function μ(t) only after a delay of one update interval T. As shown in Fig. 5, the reduction coefficient U copied in the previous update is applied within the next update interval. Specifically, the updated curves A, B, and C generated in periods #1, #2, and #3 are applied in periods #2, #3, and #4, respectively, forming the corresponding curves A', B', and C'. Curves A', B', and C' serve as the corresponding portions of the attenuation function μ(t).
The value of each point of the attenuation function μ(t) within the current update interval can be updated based on the value of the corresponding point of the reduction coefficient U; for example, the value of the corresponding point of the reduction coefficient U can be assigned a weight between 0 and 1. In this way, the updated values of the points within the current update interval are confined to a controllable range. It should be noted that, in the context of the present application, the points within the current update interval and their corresponding points in the reduction coefficient U are paired one-to-one in temporal order. In some examples, the assigned weight may be a linear function of time within the current update interval. In some other examples, the assigned weight may also be a nonlinear function of time within the current update interval.
As described above, in some examples the weight assigned in the attenuation function μ(t) may be a linear function of time or a nonlinear function of time. For example, where the weight is a linear (linearly increasing) function of time, the attenuation function μ(t) can be expressed over (NT, (N+1)T] by equation (3) as:
μ(t) = (1 − (t − NT)/T)·μ(NT) + ((t − NT)/T)·μ(t − NT)    (3)
where N denotes the index of the update closest to the current time point. For example, within period #3 (from 2T to 3T), the attenuation function μ(t) can be expressed by equation (4) as:
μ(t) = (1 − (t − 2T)/T)·μ(2T) + ((t − 2T)/T)·μ(t − 2T)    (4)
As can be seen from equations (3) and (4), setting the weight to be a linearly increasing function of time offsets, to some extent, the "over-convergence" characteristic of μ(t − N·T), thereby providing a compensation mechanism.
In some examples, the weight assigned in the attenuation function μ(t) may be a nonlinear function of time. For example, with a nonlinear increasing weight w(·) rising from 0 to 1 over one update interval, the attenuation function μ(t) can be expressed as:
μ(t) = (1 − w(t − NT))·μ(NT) + w(t − NT)·μ(t − NT)
where N denotes the index of the update closest to the current time point.
The mathematical description of the attenuation function μ(t) above helps explain how μ(t) is generated, but in practice μ(t) can still be produced by means of the delay unit 305 shown in Fig. 3. From equation (4), the values of μ(t) over (2T, 3T] are related to the values of μ(t) over (0, T] and to the value μ(2T) at the end of the previous update interval. The values of μ(t) over (2T, 3T] (in other words, the shape of curve B') are therefore related to the values of μ(t) over (0, T] (in other words, the shape of curve A). Since curves A, B, and C in Fig. 5 are updated in periods #1, #2, and #3 respectively, the shape of curve B is consistent with the shape of curve A; in other words, the shape of curve B' is related to the shape of curve B. Curve B is the update-copy of curve A in period #2, so the updated coefficients can be used over 2T to 3T to adjust the LMS filter 303. This continuous copying and updating of the curve at the update interval T causes the attenuation function μ(t) to be generated and updated at the interval T, preventing over-convergence of the filter from excessively suppressing sound from non-target directions. On the other hand, since the values of μ(t) over (2T, 3T] are related to the value μ(2T) at the end of the previous update interval, μ(t) does not jump sharply around time 2T. The smoothness of μ(t) spares the wearer of, for example, a hearing aid from unexpected volume fluctuations.
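The copy-and-blend mechanism can be sketched as follows in Python for the linear-weight case of equations (3) and (4). The sampled representation of the reduction coefficient U and its example shape are assumptions made for illustration only.

```python
import numpy as np

def build_mu(U, n_intervals):
    """Sampled attenuation function mu(t) over n_intervals intervals of length T.

    U holds the samples of mu(t) on the first update interval (0, T]; the
    delay unit re-issues this reduction coefficient once per interval. In
    each later interval (N*T, (N+1)*T], the end value mu(N*T) of the previous
    interval is blended with the delayed copy of U under a weight that rises
    linearly from 0 to 1, per equations (3) and (4).
    """
    U = np.asarray(U, dtype=float)
    w = np.arange(1, len(U) + 1) / len(U)    # linear weight (t - N*T)/T, in (0, 1]
    pieces = [U]                             # curve A: initial portion of mu(t)
    for _ in range(1, n_intervals):
        mu_end = pieces[-1][-1]              # mu(N*T); keeps mu(t) continuous
        pieces.append((1.0 - w) * mu_end + w * U)
    return np.concatenate(pieces)

# Example: U decays from 0.1 towards 0 across the first interval.
mu = build_mu(U=0.1 * np.exp(-np.linspace(0.0, 3.0, 64)), n_intervals=4)
```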
It was explained above that curves B and C are copies of curve A, so at the start of each predetermined update interval the reduction coefficient can have the same value (the starting values of curves B and C). In some other examples, curves B and C may also be fine-tuned relative to curve A, in which case the reduction coefficient can have different values at the start of each predetermined update interval (the starting values of curves B and C).
In addition, because of factors such as the pinna, the human ear responds differently to sounds of different frequencies from different directions, so the beamforming algorithm is also expected to respond differently to sounds of different frequencies. In some examples of the present application, this response adjustment can be achieved by setting different update intervals for sound signals of different frequencies. For example, by separately setting the update intervals for low-frequency and high-frequency sound, the degrees of attenuation of low-frequency and high-frequency sound can be controlled separately, so that the frequency response of the human pinna can be simulated.
Fig. 6 shows a schematic diagram of a beamforming algorithm according to an embodiment of the present application. As shown in Fig. 6, an update interval T_1 = 5T_0 can be configured for low-frequency sound (for example, frequencies below 4000 Hz), and an update interval T_2 = T_0 for high-frequency sound (for example, frequencies at or above 4000 Hz). The update interval T_1 of the low-frequency sound is greater than the update interval T_2 of the high-frequency sound, so that the attenuation function μ(t) exerts stronger suppression on low-frequency sound. This is done because low-frequency sound diffracts better than high-frequency sound, and low-frequency sound from sources outside the target direction reaches the microphones more easily than high-frequency sound. In addition, this configuration also better suppresses low-frequency noise from non-target directions.
In other examples, the threshold distinguishing low-frequency from high-frequency sound can be a frequency other than 4000 Hz, or customized thresholds can be configured for, for example, different hearing aid wearers, so as to better match the wearer's physiological characteristics. These customized thresholds can be determined by, for example, actual tests, or from statistical data. In other examples, other schemes can be used to distinguish low-frequency from high-frequency sound, and the scheme is not limited to dividing the audible range into two intervals; correspondingly, the number of attenuation functions is not limited to two. For example, with thresholds of 2000 Hz and 6000 Hz, audio can be divided into three intervals: low-frequency sound (for example, frequencies below 2000 Hz), mid-frequency sound (for example, between 2000 Hz and 6000 Hz), and high-frequency sound (for example, frequencies at or above 6000 Hz), and a different update interval can be configured for the audio of each interval, for example an update interval T_3 = 5T_0 for low-frequency sound, T_4 = 3T_0 for mid-frequency sound, and T_5 = T_0 for high-frequency sound (a sketch of this banding appears below).
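The three-band configuration just described can be sketched as follows in Python. The band edges follow the example above; the base interval T0, the sampling rate, and the FFT size are illustrative assumptions.

```python
# Assumed base update interval, seconds.
T0 = 0.1

# Band edges from the example above, with per-band update intervals.
BANDS = (
    (0.0,    2000.0, 5 * T0),   # low-frequency sound: slowest update
    (2000.0, 6000.0, 3 * T0),   # mid-frequency sound
    (6000.0, 1e12,   1 * T0),   # high-frequency sound: fastest update
)

def update_interval_for_bin(k, fs=48000.0, n_fft=512):
    """Return the attenuation-function update interval for frequency bin k."""
    f = k * fs / n_fft
    for f_lo, f_hi, T in BANDS:
        if f_lo <= f < f_hi:
            return T
    return T0
```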
In some examples of the present application, the hearing aid device is adapted to be worn in the pinna of a person; for example, one microphone of the hearing aid can be oriented towards the pinna while the other microphone is oriented away from the pinna.
Fig. 4 shows an audio enhancement method 40 according to an embodiment of the present application; the audio enhancement method 40 includes the illustrated steps S402, S404, S406, and S408. It should be noted that although Fig. 4 shows one feasible order schematically, the execution of steps S402, S404, S406, and S408 is not limited to it, and these steps can also be executed in other feasible orders. The following focuses on the working principles of steps S402, S404, S406, and S408 of the audio enhancement method 40 in Fig. 4; the corresponding examples described above in conjunction with the other figures are incorporated here by reference and, for brevity, are not repeated.
As shown in Fig. 4, the audio enhancement method 40 generates audio collection signals in step S402. In some examples, as described above, sound emitted by, for example, the sound source 101 can be picked up by the microphone 102-1 and the microphone 102-2 of, for example, a hearing aid. The microphone 102-1 and the microphone 102-2 can be arranged on the left and right sides of the hearing aid wearer 103, with a constant distance d between them. For example, the distance d may depend on the inter-ear distance of the wearer 103. The wearer 103 faces upwards in Fig. 1 at the illustrated angle of 0°. The sound source 101 is located to the front left of the wearer 103, at an angle θ to the midline of the wearer's field of view. Since the distance between the sound source 101 and the wearer 103 (and both ears) far exceeds the distance between the two ears, the sound source 101 can be considered to be at the illustrated angle θ with respect to both the microphone 102-1 and the microphone 102-2. From the geometry, assuming that sound travels in air at speed v and the signal received by the microphone 102-1 is y_1(t), the signal received by the microphone 102-2 is y_2(t) = y_1(t − τ), where τ = d·sin(θ)/v.
A short-time Fourier transform is applied to the signals received by the microphone 102-1 and the microphone 102-2. Let the transform of y_1(t) be Y_1(k, l) and the transform of y_2(t) be Y_2(k, l), where k denotes the frequency bin and l denotes the frame index. The generated audio collection signals Y_1(k, l) and Y_2(k, l) satisfy the relation Y_2(k, l) = Y_1(k, l)·e^{−jωτ}.
The audio enhancement method 40 performs delay-and-sum processing on the audio collection signals in step S404. Turning to Fig. 3, as described above, the delay beamformer 201 can receive and process the signals from the microphone 102-1 and the microphone 102-2. In some schemes, the signal Y_DSB obtained by the delay beamformer 201 may, for example, satisfy Y_DSB = (Y_1(k, l) + Y_2(k, l)·e^{jωτ})/2.
The audio enhancement method 40 performs blocking-matrix processing on the audio collection signals in step S406. Continuing with Fig. 3, as described above, the blocking matrix 202 can receive and process the signals from the microphone 102-1 and the microphone 102-2. In some schemes, the signal Y_BM obtained by the blocking matrix 202 may, for example, satisfy Y_BM = Y_1(k, l) − Y_2(k, l)·e^{jωτ}.
The audio enhancement method 40 filters the blocking-matrix signal Y_BM(k, l) in step S408. Continuing with Fig. 3, as described above, the parameter-adjustable LMS filter 303 further processes Y_BM and sends the result to the summation unit 204; the signal Y_GSC(k, l) output from the summation unit 204 satisfies Y_GSC(k, l) = Y_DSB(k, l) − W*_ANC(k, l)·Y_BM(k, l), where W_ANC(k, l) is the iteration coefficient of the LMS filter 303 and * denotes the complex conjugate.
Further, W_ANC(k, l) satisfies the relations defined by equations (5) and (6) below:
W_ANC(k, l+1) = W_ANC(k, l) + μ(t)·Y_BM(k, l)·Y*_GSC(k, l)/P_est(k, l)    (5)
P_est(k, l) = α·P_est(k, l−1) + (1 − α)·(|Y_BM(k, l)|² + |Y_GSC(k, l)|²)    (6)
Here the attenuation function μ(t) satisfies the relation defined by equation (3). As described above, the delay unit 305 causes μ(t) to be updated at the predetermined update interval T, which is not repeated here.
Figs. 7, 8, and 9 show the effect of testing the beamforming algorithm according to some examples of the present application in the three directions of 90°, 0°, and −90° shown in Fig. 1, respectively. As can be seen from the figures, the beamforming algorithm according to some examples of the present application can derive the illustrated beamforming frequency response curve from the frequency response curves of microphone 1 and microphone 2 of the microphone array, and the resulting frequency response curve agrees well with the frequency response curve of the real human ear. The simulation results show that the frequency response curve obtained by the beamforming algorithm does not over-suppress particular directions, so the beamforming algorithm according to some examples of the present application adapts well to applications that need to simulate the response characteristics of the human ear. On the basis of good noise suppression, the beamforming algorithm according to some examples of the present application also takes the response characteristics of the human ear into account, and is therefore particularly suited to application scenarios, such as hearing aids, that require a faithful reflection of the physical world.
Another aspect of the present application further proposes an audio enhancement apparatus. The apparatus includes a non-transitory computer storage medium storing one or more executable instructions which, when executed by a processor, perform any of the audio enhancement methods described above. In some examples, such an audio enhancement apparatus can be a hearing aid device.
Another aspect of the present application further proposes a non-transitory computer storage medium storing one or more executable instructions which, when executed by a processor, perform any of the audio enhancement methods described above.
Embodiments of the present invention can be realized by hardware, by software, or by a combination of software and hardware. The hardware part can be implemented using dedicated logic; the software part can be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those of ordinary skill in the art will understand that the above devices and methods can be implemented using computer-executable instructions and/or processor control code, such code being provided, for example, on a carrier medium such as a magnetic disk, CD, or DVD-ROM, on a programmable memory (firmware) such as a read-only memory, or on a data carrier such as an optical or electronic signal carrier. The device and its modules of the present invention can be implemented by hardware circuits such as very-large-scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field-programmable gate arrays and programmable logic devices; they can also be implemented by software executed by various types of processors, or by a combination of the above hardware circuits and software, such as firmware.
It should be noted that although several steps or modules of the audio enhancement method, apparatus, and storage medium are mentioned in the detailed description above, this division is merely exemplary and not mandatory. Indeed, according to embodiments of the present application, the features and functions of two or more modules described above may be embodied in one module; conversely, the features and functions of one module described above may be further divided and embodied by multiple modules.
Those of ordinary skill in the art can understand and implement other changes to the disclosed embodiments by studying the specification, the disclosure, the drawings, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the words "a" and "an" do not exclude the plural. In the practical application of the present application, one part may perform the functions of several technical features recited in the claims. Any reference signs in the claims shall not be construed as limiting the scope.

Claims (16)

  1. An audio enhancement method, characterized in that the method comprises:
    generating a group of audio collection signals by a microphone array, wherein each audio collection signal in the group of audio collection signals is generated by one microphone of the microphone array, and the microphones of the microphone array are spaced apart from one another;
    performing delay-and-sum processing on the group of audio collection signals to generate a delayed-sum signal Y_DSB(k, l), wherein k denotes the frequency bin and l denotes the frame index;
    performing blocking-matrix processing on the group of audio collection signals to generate a blocking-matrix signal Y_BM(k, l);
    filtering the blocking-matrix signal Y_BM(k, l) with an adaptive filter matrix W_ANC, and removing the filtered blocking-matrix signal from the delayed-sum signal Y_DSB(k, l) to obtain an enhanced audio output signal Y_OUT(k, l);
    wherein the adaptive filter matrix W_ANC is a weight coefficient matrix that is based on at least one attenuation function μ(t) and varies with the audio output signal Y_OUT(k, l) and the blocking-matrix signal Y_BM(k, l), and each of the at least one attenuation function μ(t) is updated at a corresponding predetermined update interval T.
  2. The method according to claim 1, characterized in that the microphone array comprises at least two microphones located on the same audio processing device.
  3. The method according to claim 2, characterized in that the audio processing device is adapted to be worn in the pinna of a person.
  4. The method according to claim 3, characterized in that one of the at least two microphones is oriented towards the pinna, while the other of the at least two microphones is oriented away from the pinna.
  5. The method according to claim 1, characterized in that the audio output signal is determined by the following equation:
    Y_OUT(k, l) = Y_DSB(k, l) − W*_ANC(k, l)·Y_BM(k, l)
    and the adaptive filter matrix W_ANC is determined by the following equation:
    W_ANC(k, l+1) = W_ANC(k, l) + μ(t)·Y_BM(k, l)·Y*_OUT(k, l)/P_est(k, l)
    wherein P_est(k, l) is determined by the following equation:
    P_est(k, l) = α·P_est(k, l−1) + (1 − α)·(Σ_{m=1}^{M−1} |Y_BM,m(k, l)|² + |Y_OUT(k, l)|²)
    wherein α is a forgetting factor, M is the number of microphones in the microphone array, and Y_BM,m denotes the m-th channel of the blocking-matrix signal.
  6. The method according to claim 1, characterized in that the at least one attenuation function comprises a first attenuation function and a second attenuation function, the first attenuation function being updated at a first predetermined update interval and the second attenuation function being updated at a second predetermined update interval; wherein the first attenuation function corresponds to high-frequency signals at or above a predetermined frequency threshold, the second attenuation function corresponds to low-frequency signals below the predetermined frequency threshold, and the first predetermined update interval is shorter than the second predetermined update interval.
  7. The method according to claim 1, characterized in that each of the attenuation functions μ(t) is updated within the current update interval based on its values within the first update interval.
  8. The method according to claim 7, characterized in that each point of each attenuation function μ(t) within the current update interval is updated by assigning a varying weight between 0 and 1 to the value of the corresponding point within the first update interval.
  9. The method according to claim 8, characterized in that the weight is a linear function of time within the current update interval.
  10. The method according to claim 9, characterized in that the weight is a linearly increasing function of time within the current update interval.
  11. The method according to claim 8, characterized in that the weight is a nonlinear function of time within the current update interval.
  12. The method according to claim 9 or 10, characterized in that each of the attenuation functions μ(t) is further updated within the current update interval based on its value at the end of the previous update interval.
  13. The method according to claim 12, characterized in that each of the attenuation functions μ(t) satisfies the following equation within the current update interval (NT, (N+1)T]:
    μ(t) = (1 − (t − NT)/T)·μ(NT) + ((t − NT)/T)·μ(t − NT)
    wherein N is a positive integer.
  14. An audio enhancement apparatus, characterized in that the apparatus comprises a non-transitory computer storage medium storing one or more executable instructions which, when executed by a processor, perform the following steps:
    generating a group of audio collection signals by a microphone array, wherein each audio collection signal in the group of audio collection signals is generated by one microphone of the microphone array, and the microphones of the microphone array are spaced apart from one another;
    performing delay-and-sum processing on the group of audio collection signals to generate a delayed-sum signal Y_DSB(k, l), wherein k denotes the frequency bin and l denotes the frame index;
    performing blocking-matrix processing on the group of audio collection signals to generate a blocking-matrix signal Y_BM(k, l);
    filtering the blocking-matrix signal Y_BM(k, l) with an adaptive filter matrix W_ANC, and removing the filtered blocking-matrix signal from the delayed-sum signal Y_DSB(k, l) to obtain an enhanced audio output signal Y_OUT(k, l);
    wherein the adaptive filter matrix W_ANC is a weight coefficient matrix that is based on at least one attenuation function μ(t) and varies with the audio output signal Y_OUT(k, l) and the blocking-matrix signal Y_BM(k, l), and each of the at least one attenuation function μ(t) is updated at a corresponding predetermined update interval T.
  15. The apparatus according to claim 14, characterized in that the apparatus is a hearing aid.
  16. A non-transitory computer storage medium storing one or more executable instructions which, when executed by a processor, perform an audio enhancement method, the method comprising the following steps:
    generating a group of audio collection signals by a microphone array, wherein each audio collection signal in the group of audio collection signals is generated by one microphone of the microphone array, and the microphones of the microphone array are spaced apart from one another;
    performing delay-and-sum processing on the group of audio collection signals to generate a delayed-sum signal Y_DSB(k, l), wherein k denotes the frequency bin and l denotes the frame index;
    performing blocking-matrix processing on the group of audio collection signals to generate a blocking-matrix signal Y_BM(k, l);
    filtering the blocking-matrix signal Y_BM(k, l) with an adaptive filter matrix W_ANC, and removing the filtered blocking-matrix signal from the delayed-sum signal Y_DSB(k, l) to obtain an enhanced audio output signal Y_OUT(k, l);
    wherein the adaptive filter matrix W_ANC is a weight coefficient matrix that is based on at least one attenuation function μ(t) and varies with the audio output signal Y_OUT(k, l) and the blocking-matrix signal Y_BM(k, l), and each of the at least one attenuation function μ(t) is updated at a corresponding predetermined update interval T.
PCT/CN2023/079312 2022-03-02 2023-03-02 Audio enhancement method and apparatus, and computer storage medium WO2023165565A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210199889.5 2022-03-02
CN202210199889.5A CN114550734A (zh) 2022-03-02 2022-03-02 Audio enhancement method and apparatus, and computer storage medium

Publications (1)

Publication Number Publication Date
WO2023165565A1 (zh)

Family

ID=81661145

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/079312 WO2023165565A1 (zh) 2022-03-02 2023-03-02 Audio enhancement method and apparatus, and computer storage medium

Country Status (2)

Country Link
CN (1) CN114550734A (zh)
WO (1) WO2023165565A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114550734A (zh) 2022-03-02 2022-05-27 上海又为智能科技有限公司 Audio enhancement method and apparatus, and computer storage medium

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040161121A1 (en) * 2003-01-17 2004-08-19 Samsung Electronics Co., Ltd Adaptive beamforming method and apparatus using feedback structure
US20100171662A1 (en) * 2006-04-20 2010-07-08 Nec Corporation Adaptive array control device, method and program, and adaptive array processing device, method and program using the same
CN101903948A (zh) * 2007-12-19 2010-12-01 高通股份有限公司 用于基于多麦克风的语音增强的系统、方法及设备
US20120099732A1 (en) * 2010-10-22 2012-04-26 Qualcomm Incorporated Systems, methods, apparatus, and computer-readable media for far-field multi-source tracking and separation
CN109389991A (zh) * 2018-10-24 2019-02-26 中国科学院上海微系统与信息技术研究所 一种基于麦克风阵列的信号增强方法
CN110689900A (zh) * 2019-09-29 2020-01-14 北京地平线机器人技术研发有限公司 信号增强方法和装置、计算机可读存储介质、电子设备
CN110782913A (zh) * 2019-10-30 2020-02-11 通用微(深圳)科技有限公司 一种基于通用mcu的波束成形语音增强算法的实现
CN110855269A (zh) * 2019-11-06 2020-02-28 韶关学院 一种自适应滤波的系数更新方法
CN110706719A (zh) * 2019-11-14 2020-01-17 北京远鉴信息技术有限公司 一种语音提取方法、装置、电子设备及存储介质
CN114550734A (zh) * 2022-03-02 2022-05-27 上海又为智能科技有限公司 音频增强方法和装置、计算机存储介质

Also Published As

Publication number Publication date
CN114550734A (zh) 2022-05-27

Similar Documents

Publication Publication Date Title
US10657950B2 (en) Headphone transparency, occlusion effect mitigation and wind noise detection
JP6279570B2 (ja) 指向性音マスキング
CN107533838B (zh) 使用多个麦克风的语音感测
JP4359599B2 (ja) 補聴器
CN105530580B (zh) 听力系统
JP4469898B2 (ja) 外耳道共鳴補正装置
CN107801139B (zh) 包括反馈检测单元的听力装置
EP2202998A1 (en) A device for and a method of processing audio data
WO2006037156A1 (en) Acoustically transparent occlusion reduction system and method
TW200835379A (en) Ambient noise reduction
EP3873105B1 (en) System and methods for audio signal evaluation and adjustment
WO2023165565A1 (zh) 音频增强方法和装置、计算机存储介质
US11825269B2 (en) Feedback elimination in a hearing aid
JP6301508B2 (ja) 通信ヘッドセットにおける自己音声フィードバック
WO2017004039A1 (en) External ear insert for hearing enhancement
WO2022218093A1 (zh) 音频信号补偿方法及装置、耳机、存储介质
CN113994711A (zh) 对主动降噪设备中多个前馈麦克风的动态控制
WO2021055415A1 (en) Enhancement of audio from remote audio sources
EP4064730A1 (en) Motion data based signal processing
EP3993445A1 (en) Hearing aid device
CN116325804A (zh) 可穿戴音频设备前馈不稳定性检测
CN111683322A (zh) 前馈降噪耳机及降噪方法、系统、设备、计算机存储介质
EP3955594B1 (en) Feedback control using a correlation measure
WO2021129196A1 (zh) 一种语音信号处理方法及装置
Hohmann Signal processing in hearing aids

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23762963

Country of ref document: EP

Kind code of ref document: A1