CN116982106A - Active noise reduction audio device and method for active noise reduction - Google Patents

Active noise reduction audio device and method for active noise reduction

Info

Publication number
CN116982106A
Authority
CN
China
Prior art keywords
signal
period
noise reduction
active noise
sound
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202180095625.9A
Other languages
Chinese (zh)
Inventor
张立斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of CN116982106A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10K SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K11/00 Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/16 Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/175 Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound
    • G10K11/178 Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound by electro-acoustically regenerating the original acoustic waves in anti-phase
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00 Details of transducers, loudspeakers or microphones
    • H04R1/10 Earpieces; Attachments therefor; Earphones; Monophonic headphones

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Soundproofing, Sound Blocking, And Sound Damping (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The disclosure relates to the field of audio processing and discloses an active noise reduction audio device and an active noise reduction method. The method includes: acquiring a first signal representing a first ambient sound captured by a first microphone of an active noise reduction audio device during a first period; acquiring an evaluation signal corresponding to the first period; determining a time length of a second period based on the evaluation signal, the second period being subsequent to the first period; and generating, based on the first signal and the time length of the second period, a second signal to be played by a first speaker of the active noise reduction audio device, the second signal representing an inverse estimate of the first ambient sound during the second period. With this method, the active noise reduction audio device estimates the inverted sound for a subsequent time from the current sound and plays that inverted sound. By using the residual signal to adjust the time length over which the subsequent inverted sound is estimated, the accuracy of the inverted-signal estimation can be dynamically adjusted and the noise reduction effect improved.

Description

Active noise reduction audio device and method for active noise reduction
Technical Field
The present disclosure relates to the field of audio processing, and more particularly, to a method for denoising ambient sound in an active noise reduction audio device and an active noise reduction audio device.
Background
As portable devices such as smartphones become more widely used, so do the audio devices paired with them. When a user uses an audio device in a noisy environment, the user often wants to filter out or reduce the ambient sound so that the audio from the portable device played by the audio device can be heard clearly, or simply wants a quiet environment. In that situation, even ambient sound such as music amounts to noise for the user.
Audio devices typically filter out or reduce ambient sound in one of two ways (noise reduction modes). The first is passive noise reduction, which is typically accomplished by enclosing the ear to form a closed space, or by using a sound-insulating material such as a silicone earplug to block ambient noise. Passive noise reduction has a limited effect: it generally blocks only high-frequency noise and does little against low-frequency noise. In addition, passive noise reduction audio devices can cause discomfort to the ear.
The second is active noise reduction. Here the audio device generates sound waves that are equal in amplitude and opposite in phase to the external noise, so that the noise is cancelled and a noise reduction effect is achieved. For example, an active noise reduction audio device typically has an ambient microphone on its outermost surface for detecting ambient sound. The device must detect the ambient sound and compute and generate the inverted signal before the ambient sound reaches the inside of the ear. Ideally, when the inverted signal is calculated accurately and reaches the inside of the ear at the same time as the direct sound, a good noise reduction effect is obtained. However, limited by factors such as the computation of the inverted signal, conventional active noise reduction audio devices still reduce ambient sound poorly in some cases, and in some cases require costly circuitry.
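The dependence of cancellation on timing can be illustrated with a small numerical sketch. It is not taken from the disclosure: the pure-tone noise, the sample rate, and the function name `residual_rms` are assumptions chosen only to show that an anti-phase copy cancels a tone exactly when it arrives on time, and that the same delay hurts more at higher frequencies.

```python
import math

def residual_rms(freq_hz, delay_s, sample_rate=48000, duration_s=0.05):
    """RMS of a pure tone plus its anti-phase copy delayed by delay_s (hypothetical helper)."""
    n = int(sample_rate * duration_s)
    total = 0.0
    for i in range(n):
        t = i / sample_rate
        noise = math.sin(2 * math.pi * freq_hz * t)
        # Anti-phase signal that reaches the ear delay_s seconds late.
        anti = -math.sin(2 * math.pi * freq_hz * (t - delay_s))
        total += (noise + anti) ** 2
    return math.sqrt(total / n)

on_time = residual_rms(200.0, 0.0)        # anti-phase sound arrives with the direct sound
late_low = residual_rms(200.0, 0.0005)    # 0.5 ms late, 200 Hz tone
late_high = residual_rms(1000.0, 0.0005)  # 0.5 ms late, 1 kHz tone
print(on_time, late_low, late_high)
```

With these assumed values, the on-time case leaves no residual, the 200 Hz tone is only partly cancelled, and the 1 kHz tone ends up louder than with no anti-phase sound at all, since a 0.5 ms delay is half its period.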
Disclosure of Invention
In view of the foregoing, embodiments of the present disclosure aim to provide a technical solution for noise reduction of ambient sounds.
According to a first aspect of the present disclosure, a method for active noise reduction is provided. The method includes acquiring a first signal representing a first ambient sound captured by a first microphone of the active noise reduction audio device during a first period, and acquiring an evaluation signal corresponding to the first period. The method further includes determining a time length of a second period based on the evaluation signal, the second period being subsequent to the first period. The method further includes generating, based on the first signal and the time length of the second period, a second signal to be played by a first speaker of the active noise reduction audio device. The second signal represents an inverse estimate of the first ambient sound during the second period. The evaluation signal represents the effect of the active noise reduction and/or the accuracy of the estimation. In one implementation, the evaluation signal may be obtained by a residual microphone, as described in detail below. Alternatively or additionally, in another implementation, the evaluation signal may be determined from the signal estimated by the processor in the initial period and the ambient sound acquired in the first period. Because the inverted signal of the second period is estimated in advance and then played, the inverted sound can reach the human ear in synchronization with the direct sound at the subsequent moment; the delay of the inverted sound is shortened, and the inverted sound cancels the direct sound, which effectively improves the noise reduction effect. By using the residual signal to adjust the time length over which the subsequent inverted sound is estimated, the accuracy of the inverted-signal estimation can be dynamically adjusted and the noise reduction effect improved. For example, when the residual signal indicates that the active noise reduction effect is good, the estimated time length may be extended. Conversely, when the residual signal indicates that the active noise reduction effect is not ideal, the estimated time length can be shortened to improve the accuracy of the estimation.
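The adjustment of the estimated time length can be sketched as a simple rule. The disclosure does not state how "good" or "not ideal" is measured, so everything here is an assumption: the function name `next_horizon`, the residual-to-ambient RMS ratio as the quality measure, and the thresholds, step, and limits.

```python
def next_horizon(current_ms, residual_rms, ambient_rms,
                 good_ratio=0.1, poor_ratio=0.5,
                 step_ms=0.5, min_ms=0.5, max_ms=10.0):
    """Return the time length (ms) of the next estimated period.

    All thresholds and limits are hypothetical illustration values.
    """
    if ambient_rms <= 0.0:
        return current_ms  # nothing to cancel, keep the current length
    ratio = residual_rms / ambient_rms
    if ratio <= good_ratio:
        # Noise reduction is working well: estimate further ahead.
        return min(current_ms + step_ms, max_ms)
    if ratio >= poor_ratio:
        # Noise reduction is poor: shorten the period to improve accuracy.
        return max(current_ms - step_ms, min_ms)
    return current_ms

print(next_horizon(2.0, 0.05, 1.0))  # small residual, period extended
print(next_horizon(2.0, 0.8, 1.0))   # large residual, period shortened
```

Clamping to a minimum and maximum is a design choice of this sketch; it keeps the period from collapsing to zero or growing beyond what a predictor could plausibly cover.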
In some implementations, generating a second signal to be played by a first speaker of the active noise reduction device based on the first signal and the time length of the second period includes generating a third signal based on the first signal and the time length of the second period, the third signal representing an estimate of the first ambient sound during the second period; and inverting the third signal to generate a second signal. By estimating first and then inverting to generate an inverted signal, the inverted signal can be obtained more accurately.
In some implementations, generating a second signal to be played by a first speaker of the active noise reduction device based on the first signal and a length of time of the second period includes inverting the first signal to generate a fourth signal representative of an inverted sound of the first ambient sound during the first period; and generating a second signal based on the fourth signal and a time length of the second period.
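The two orderings described above (estimate then invert, or invert then estimate) can be compared in a short sketch. The fixed linear predictor, its coefficients, and the sample values are assumptions for illustration; the disclosure does not prescribe this estimator. For a linear estimator of this kind, both orderings give the same second signal.

```python
def extrapolate(history, coeffs, steps):
    """Predict `steps` future samples with a fixed linear predictor.

    coeffs[k] multiplies the sample k + 1 steps in the past (hypothetical model).
    """
    buf = list(history)
    out = []
    for _ in range(steps):
        nxt = sum(c * buf[-1 - k] for k, c in enumerate(coeffs))
        buf.append(nxt)
        out.append(nxt)
    return out

def invert(signal):
    """Flip the sign of every sample."""
    return [-x for x in signal]

first_signal = [0.0, 0.5, 0.8, 0.9, 0.7]  # made-up samples from the first period
coeffs = [1.6, -0.8]                      # made-up predictor coefficients

# Third signal (estimate) first, then inversion.
estimate_then_invert = invert(extrapolate(first_signal, coeffs, 4))
# Fourth signal (inverted first signal) first, then estimation.
invert_then_estimate = extrapolate(invert(first_signal), coeffs, 4)
print(estimate_then_invert)
print(invert_then_estimate)
```

A nonlinear estimator would not necessarily give identical results for the two orderings, which may be one reason to treat them as separate implementations.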
In some implementations, acquiring the evaluation signal corresponding to the first period includes acquiring, as the evaluation signal, a residual signal acquired by a residual microphone of the active noise reduction audio device during the first period. The residual microphone is different from the first microphone. By using the residual microphone to collect residual sound as an evaluation signal, the effect of active noise reduction can be more accurately and intuitively obtained, thereby more effectively adjusting active noise reduction.
In some implementations, acquiring the evaluation signal corresponding to the first period includes determining the evaluation signal based on the first signal and an inverted estimated signal corresponding to the first period that was estimated in a period prior to the first period, or based on the first signal and an estimated signal corresponding to the first period that was estimated in a period prior to the first period. Alternatively, acquiring the evaluation signal corresponding to the first period includes determining the evaluation signal based on the inverted signal of the first signal and the estimated signal corresponding to the first period that was estimated in a period preceding the first period, or based on the inverted signal of the first signal and the inverted estimated signal corresponding to the first period that was estimated in a period preceding the first period. Because the residual signal can be obtained in several ways, the current noise reduction effect can be evaluated more comprehensively and accurately.
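One way to read the first option above is that the evaluation signal is the sample-wise sum of the sound actually measured in the first period and the inverted estimate that was produced for that period earlier. The text only says "based on", so the summation, the function names, and the sample values below are assumptions.

```python
import math

def evaluation_signal(first_signal, prior_inverted_estimate):
    """Sum the measured sound and the earlier inverted estimate, sample by sample."""
    if len(first_signal) != len(prior_inverted_estimate):
        raise ValueError("signals must cover the same period")
    return [x + y for x, y in zip(first_signal, prior_inverted_estimate)]

def rms(signal):
    """Root mean square level of a signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

measured = [0.2, 0.5, 0.4, -0.1]              # made-up first signal
inverted_estimate = [-0.25, -0.45, -0.4, 0.1]  # made-up estimate from a prior period
evaluation = evaluation_signal(measured, inverted_estimate)
print(evaluation, rms(evaluation), rms(measured))
```

An evaluation signal that is small relative to the measured sound indicates that the earlier estimate was accurate, without requiring a separate residual microphone.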
In some implementations, generating the second signal to be played by the first speaker of the active noise reduction audio device based on the first signal and the time length of the second period includes determining a category of the first ambient sound during the first period based on the first signal; determining an estimation model corresponding to the category of the first ambient sound; and generating the second signal based on the estimation model, the time length of the second period, and the first signal. By classifying the ambient sound, the inverse estimation can be targeted to the kind of sound present. By using an estimation model that matches the ambient sound, the accuracy of the estimation can be improved and the noise reduction width and the noise reduction depth can be improved accordingly.
In some implementations, generating the second signal based on the estimation model, the time duration of the second period, and the first signal includes: estimating the first signal using a speech model for generating a second signal if the determined class of ambient sounds indicates that the ambient sounds are speech; estimating the first signal using a neural network model for generating a second signal if the determined class of ambient sound indicates that the ambient sound is noise; estimating the first signal using a pool model for generating a second signal if the determined class of ambient sound indicates that the ambient sound is music; and if the determined class of ambient sounds indicates that the ambient sounds are mixed sounds, estimating the first signal using a weighted model for generating a second signal. By specifically classifying the environmental sounds into voices, noises, music, and mixed sounds, and estimating them using the corresponding voice model, neural network model, pool model, and weighting model, respectively, it is possible to more accurately obtain estimated signals and to correspondingly improve the active noise reduction effect.
In some implementations, if the speech is voiced, estimating the first signal using a pool model for generating a second signal; and if the speech is unvoiced, estimating the first signal using a linear prediction model for generating the second signal. By subdividing the unvoiced and voiced sounds even further, the estimation accuracy for the speech can be further improved and the active noise reduction effect for the speech can be correspondingly improved.
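The class-to-model mapping described in the two paragraphs above can be written as a lookup table. The enum, the dictionary, and the function name are hypothetical; the model names are copied from the text, and the models themselves are represented only by their names.

```python
from enum import Enum

class SoundClass(Enum):
    VOICED_SPEECH = "voiced speech"
    UNVOICED_SPEECH = "unvoiced speech"
    NOISE = "noise"
    MUSIC = "music"
    MIXED = "mixed"

# Mapping taken from the description; speech is split into voiced and unvoiced.
MODEL_FOR_CLASS = {
    SoundClass.VOICED_SPEECH: "pool model",
    SoundClass.UNVOICED_SPEECH: "linear prediction model",
    SoundClass.NOISE: "neural network model",
    SoundClass.MUSIC: "pool model",
    SoundClass.MIXED: "weighted model",
}

def select_model(sound_class):
    """Return the name of the estimation model used for a sound class."""
    return MODEL_FOR_CLASS[sound_class]

print(select_model(SoundClass.UNVOICED_SPEECH))
```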
In some implementations, the method further includes: adjusting the estimation model corresponding to the category of the ambient sound based on the evaluation signal; and generating a subsequent audio signal to be played by the first speaker based on a subsequently acquired audio signal of the first microphone and the adjusted model. For a single estimation model, adjusting the model includes replacing that model and/or adding an estimation model. For multiple estimation models, adjusting the model includes replacing one or more models, adjusting model weights, and/or adding or removing estimation models. By using the residual signal to adjust the estimation model for the subsequent inverted sound, the accuracy of the inverted-signal estimation can be dynamically adjusted and the noise reduction effect can be improved.
In some implementations, the method further includes: determining a filter corresponding to the ambient sound based on the category of the ambient sound; and generating the second signal based on the first signal and the filter. By filtering the collected audio signals, noise in the sound can be effectively filtered out. Different filtering may be used for different sound categories. In addition, a weighted combination of several types of filtering may be used, and the weighted filtering may have different filter weights for different sounds. Similarly, the filtering may be adjusted through a feedback mechanism, for example one that uses the residual signal. Adjusting the filtering includes adding or removing filter types and/or adjusting the weight of each filter in the weighted filtering.
In some implementations, if the ambient sound is determined to be music, the first signal is filtered using a finite impulse response filter for generating the second signal. If the determined class of ambient sound indicates that the ambient sound is speech, the first signal is filtered using an infinite impulse response filter for generating a second signal. By adopting different filtering modes for different sounds, noise can be filtered more effectively for voice and music, so that a better active noise reduction effect is obtained.
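The difference between the two filter types can be seen from their impulse responses. The sketch below assumes a three-tap moving-average FIR filter and a one-pole IIR filter purely for illustration; the disclosure does not specify filter orders or coefficients, and the function names are hypothetical.

```python
def fir_filter(signal, taps):
    """Finite impulse response: each output is a weighted sum of recent inputs."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, b in enumerate(taps):
            if n - k >= 0:
                acc += b * signal[n - k]
        out.append(acc)
    return out

def iir_one_pole(signal, alpha):
    """Infinite impulse response: y[n] = alpha * x[n] + (1 - alpha) * y[n - 1]."""
    out = []
    prev = 0.0
    for x in signal:
        prev = alpha * x + (1.0 - alpha) * prev
        out.append(prev)
    return out

def filter_for_class(sound_class, signal):
    """Pick FIR for music and IIR for speech, as in the text; coefficients are made up."""
    if sound_class == "music":
        return fir_filter(signal, [0.25, 0.5, 0.25])
    if sound_class == "speech":
        return iir_one_pole(signal, 0.5)
    return list(signal)

impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
print(filter_for_class("music", impulse))   # response ends after three samples
print(filter_for_class("speech", impulse))  # response decays but never reaches zero
```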
In some implementations, determining an estimation model corresponding to the category of the first ambient sound based on the category of the first ambient sound includes determining a weighting model corresponding to the first ambient sound based on the category of the first ambient sound, the weighting model including: the first estimation model, the weight of the first estimation model, the second estimation model and the weight of the second estimation model. By using a weighted estimation model, an estimation of the ambient sound can be made more efficiently and more targeted to correspondingly increase the noise reduction effect.
In some implementations, the method further includes adjusting the estimation model based on the residual signal. In one implementation, generating a second signal to be played by the first speaker based on the first signal and a time length of the second period includes: a second signal to be played by the first speaker is generated based on the first signal, the time length of the second time period, and the adjusted estimation model. By using the residual signal to adjust the subsequent estimation model, an appropriate estimation model may be selected based on the noise reduction effect to enhance the noise reduction effect.
In some implementations, determining the category of the ambient sound during the first period based on the first signal includes determining the category based on at least one of a frame rate of a first energy range, a frame rate of a first frequency range, and a zero-crossing rate of the first signal. The first energy range includes a low energy range. The first frequency range includes a low frequency range. By identifying the low-energy frame rate, the low-frequency frame rate, and the zero-crossing rate, the sound signal can be effectively classified as speech, music, noise, silence, or mixed sound. This allows the appropriate estimation model to be determined more accurately, thereby improving the accuracy of the estimation and correspondingly increasing the noise reduction width and the noise reduction depth. In addition, speech can be further divided into unvoiced and voiced sound, which further improves the accuracy of speech estimation and correspondingly increases the noise reduction width and the noise reduction depth.
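Two of the features named above can be computed as follows. This sketch assumes that the "frame rate of a low energy range" means the fraction of frames whose energy falls below some fraction of the mean frame energy; that reading, the `factor` value, and the function names are not stated in the disclosure. The low-frequency frame rate would need a spectrum estimate and is left out.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0.0) != (b >= 0.0))
    return crossings / (len(frame) - 1)

def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def low_energy_frame_ratio(signal, frame_len, factor=0.5):
    """Fraction of frames with energy below factor * mean frame energy (assumed definition)."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    energies = [frame_energy(f) for f in frames]
    mean_energy = sum(energies) / len(energies)
    return sum(1 for e in energies if e < factor * mean_energy) / len(energies)

alternating = [1.0, -1.0] * 4      # sign flips on every sample
steady = [1.0] * 8                 # no sign changes
bursty = [1.0] * 4 + [0.0] * 12    # one loud frame followed by three silent frames
print(zero_crossing_rate(alternating), zero_crossing_rate(steady))
print(low_energy_frame_ratio(bursty, frame_len=4))
```

A classifier would then compare such features against thresholds or feed them to a trained model; neither is specified here.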
In some implementations, generating the second signal to be played by the first speaker of the active noise reduction audio device based on the first signal and the time length of the second period includes determining a filter corresponding to the first ambient sound based on the category of the first ambient sound; and generating the second signal based on the first signal, the time length of the second period, and the filter. Different sound categories may contain different noise. By adopting a corresponding filtering mechanism for each category of sound, noise can be filtered out more effectively.
In some implementations, the method further includes adjusting the filtering based on the evaluation signal. By using the evaluation signal to adjust the subsequent filtering, a suitable filter may be selected based on the noise reduction effect to enhance the noise reduction effect.
In some implementations, generating the second signal includes: estimating the first signal using a weighted estimation model or a pool model to generate an estimated signal, the estimated signal representing an estimate of the ambient sound during the first period of time; and inverting the estimated signal to generate the second signal. Alternatively, the first signal is inverted to generate an inverted signal, and the inverted signal is estimated using a weighted estimation model or a pool model to generate the second signal. In addition, the evaluation signal may be used to dynamically adjust the coefficients of the individual terms in the weighted model or to increase or decrease the number of models used. By using a default model for estimation, the amount of computation is reduced and the estimation speed is increased.
In some implementations, the method further includes obtaining a fifth signal representative of a second ambient sound captured by a second microphone in the active noise reduction audio device during the first period of time; and generating a sixth signal to be played by a second speaker in the active noise reduction audio device based on the fifth signal and the length of time of the second period, the sixth signal representing an inverse estimate of the second ambient sound during the second period. In some implementations, different microphones have different acquisition effects for different frequency bands. For example, the first microphone may be able to pick up more detail for low frequency sounds, while the second microphone may pick up more detail for medium and high frequency sounds. Thus, by employing different acquisition schemes for ambient sound using different microphones, individual details of the ambient sound may be more effectively acquired. In addition, the active noise reduction can be ensured even if one microphone fails. In some implementations, different speakers have different performance effects at different frequency bands. For example, the first speaker may be more prominent in the low frequency band, while the second speaker may be more prominent in the medium and high frequency band. Therefore, by using different speakers to play different inverted signals, different portions of the direct sound can be more accurately canceled out to obtain better noise reduction depth and noise reduction width.
In some implementations, the method further includes performing a first filtering on the first signal to generate a first filtered signal; generating the second signal based on the first filtered signal; performing a second filtering, different from the first filtering, on the fifth signal to generate a second filtered signal; and generating the sixth signal based on the second filtered signal. By using multiple filters, performing multiple estimations, and playing multiple sounds, speakers suited to different sound categories may be used to achieve better noise reduction.
In some implementations, the method further includes: obtaining a first residual signal representing a sum of a first sound and a direct sound, wherein the first sound is based on the second signal; obtaining a second residual signal representing a sum of a second sound and the direct sound, wherein the second sound is based on the sixth signal; adjusting the first filtering based on the first residual signal; and adjusting the second filtering based on the second residual signal. By using a plurality of residual signals, inaccurate residual acquisition caused by the position or performance of a single residual microphone can be avoided, so that a more stable noise reduction effect is obtained continuously.
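The disclosure does not say how a residual signal adjusts the filtering. One common technique for this kind of feedback is the least-mean-squares (LMS) update, shown here as a substituted illustration rather than as the method of the disclosure. The sketch assumes a single filter tap, a synthetic direct sound, and no acoustic path between speaker and residual microphone.

```python
import math

def lms_step(tap, x, residual, step_size):
    """One LMS update: move the tap against the residual, scaled by the input."""
    return tap - step_size * residual * x

tap = 0.0  # single filter coefficient, starts with no cancellation
for n in range(2000):
    x = math.sin(0.3 * n)       # reference sample from the ambient microphone (synthetic)
    direct = 0.8 * x            # direct sound reaching the ear (assumed gain 0.8)
    played = tap * x            # sound produced from the filtered reference
    residual = direct + played  # residual = played sound + direct sound
    tap = lms_step(tap, x, residual, step_size=0.1)

print(tap)  # approaches -0.8, the gain that cancels the direct sound
```

In a real device the path from speaker to residual microphone would also have to be modelled, which is what filtered-x variants of LMS address.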
In some implementations, generating the second signal based on the first signal includes: inverting the first signal to generate a first inverted signal; estimating the first inverted signal using a first estimation model to generate a first component signal; estimating the first inverted signal using a second estimation model different from the first estimation model to generate a second component signal different from the first component signal; and weighting the first component signal and the second component signal using a weighting model to generate the second signal. By using a default weighting model for estimation, the amount of computation is reduced, the estimation speed is increased, and the estimation can also be dynamically adjusted to increase its accuracy.
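The invert, estimate twice, then weight sequence can be sketched with two deliberately simple estimators. The estimators (`hold_last`, `linear_trend`), the equal weights, and the sample values are placeholders; the disclosure does not specify which two models or which weights are combined.

```python
def hold_last(history, steps):
    """First estimator (hypothetical): repeat the most recent sample."""
    return [history[-1]] * steps

def linear_trend(history, steps):
    """Second estimator (hypothetical): continue the slope of the last two samples."""
    slope = history[-1] - history[-2]
    return [history[-1] + slope * (k + 1) for k in range(steps)]

def weighted_inverse_estimate(first_signal, steps, w1, w2):
    """Invert the first signal, estimate with both models, and weight the results."""
    inverted = [-x for x in first_signal]     # first inverted signal
    component_1 = hold_last(inverted, steps)  # first component signal
    component_2 = linear_trend(inverted, steps)  # second component signal
    return [w1 * a + w2 * b for a, b in zip(component_1, component_2)]

second_signal = weighted_inverse_estimate([0.0, 1.0, 2.0], steps=2, w1=0.5, w2=0.5)
print(second_signal)
```

Adjusting `w1` and `w2` from the evaluation signal would be one way to realise the dynamic adjustment mentioned in the text.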
According to a second aspect of the present disclosure, there is provided a computer-readable storage medium storing one or more programs. The one or more programs are configured to be executed by the one or more processors. The one or more programs include instructions for performing the method according to the first aspect. By reading and executing one or more programs in the computer-readable storage medium, the inverted signal of the second period can be estimated in advance. By playing the reverse phase signal, the reverse phase sound can reach the human ear in synchronization with the direct sound at the subsequent moment, the time delay of the reverse phase sound is shortened, and the reverse phase sound and the direct sound are counteracted to effectively improve the noise reduction effect. By adjusting the estimated time length of the subsequent inverted sound using the residual signal, the accuracy of the inverted signal estimation can be dynamically adjusted and the noise reduction effect can be improved. For example, when the residual signal indicates that the active noise reduction effect is good, the estimated time length may be prolonged. Conversely, when the residual signal indicates that the active noise reduction effect is not ideal, the estimated time length can be shortened to improve the estimated accuracy.
According to a third aspect of the present disclosure, a computer program product is provided. The computer program product includes one or more programs. The one or more programs are configured to be executed by the one or more processors. The one or more programs include instructions for performing the method according to the first aspect. By executing the computer program product, the inverted signal of the second period can be estimated in advance. By playing the reverse phase signal, the reverse phase sound can reach the human ear in synchronization with the direct sound at the subsequent moment, the time delay of the reverse phase sound is shortened, and the reverse phase sound and the direct sound are counteracted to effectively improve the noise reduction effect. By adjusting the estimated time length of the subsequent inverted sound using the residual signal, the accuracy of the inverted signal estimation can be dynamically adjusted and the noise reduction effect can be improved. For example, when the residual signal indicates that the active noise reduction effect is good, the estimated time length may be prolonged. Conversely, when the residual signal indicates that the active noise reduction effect is not ideal, the estimated time length can be shortened to improve the estimated accuracy.
According to a fourth aspect of the present disclosure, an active noise reduction audio device is provided. The active noise reduction audio device comprises an acquisition module for acquiring a first signal and an evaluation signal corresponding to a first period, wherein the first signal represents a first ambient sound captured by a first microphone of the active noise reduction audio device during the first period; and an inverse estimation signal generation module for determining a time length of a second period based on the evaluation signal, the second period being subsequent to the first period, and for generating, based on the first signal and the time length of the second period, a second signal to be played by the first speaker of the active noise reduction audio device, the second signal representing an inverse estimate of the first ambient sound during the second period. The active noise reduction audio device may estimate the inverted signal of the second period in advance. By playing the inverted signal, the inverted sound can reach the human ear in synchronization with the direct sound at the subsequent moment; the delay of the inverted sound is shortened, and the inverted sound cancels the direct sound, which effectively improves the noise reduction effect. By using the residual signal to adjust the time length over which the subsequent inverted sound is estimated, the accuracy of the inverted-signal estimation can be dynamically adjusted and the noise reduction effect can be improved. For example, when the residual signal indicates that the active noise reduction effect is good, the estimated time length may be extended. Conversely, when the residual signal indicates that the active noise reduction effect is not ideal, the estimated time length can be shortened to improve the accuracy of the estimation.
In some implementations, the inverse estimation signal generation module is further configured to generate a third signal based on the first signal, the third signal representing an estimate of the ambient sound during the second period, and to invert the third signal to generate the second signal. By estimating first and then inverting to generate an inverted signal, the inverted signal can be obtained more accurately.
In some implementations, the acquisition module is further configured to acquire a first signal corresponding to the first microphone, the first signal representing ambient sound acquired during the first period of time. The inverse estimate signal generation module is further configured to determine a time length of a second period, the second period being subsequent to the first period, based on the first signal and the inverse estimate signal of the previous period estimate or based on the first signal and the estimated signal of the previous period estimate. Alternatively, the inverse estimation signal generation module is further configured to determine a time length of a second period, which is subsequent to the first period, based on the inverse signal of the first signal and the inverse estimation signal of the previous period or based on the inverse signal of the first signal and the estimation signal of the previous period. Residual signals are obtained in various modes, so that the current noise reduction effect can be estimated more comprehensively and accurately.
In some implementations, the acquisition module is further configured to acquire a residual signal corresponding to a residual microphone of the active noise reduction audio device, the residual microphone being different from the first microphone. Because the residual signal can be obtained in several ways, the current noise reduction effect can be evaluated more comprehensively and accurately.
In some implementations, the inverse estimation signal generation module is further configured to determine a category of the first ambient sound during the first period based on the first signal; determine an estimation model corresponding to the category of the first ambient sound; and generate the second signal based on the estimation model, the first signal, and the time length of the second period. By classifying the ambient sound, the inverse estimation can be targeted to the kind of sound present. By using an estimation model that matches the ambient sound, the accuracy of the estimation can be improved and the noise reduction width and the noise reduction depth can be improved accordingly.
In some implementations, the inverse estimation signal generation module is further configured to determine a weighting model corresponding to the ambient sound based on the category of the ambient sound, the weighting model including: the first estimation model, the weight of the first estimation model, the second estimation model, and the weight of the second estimation model. By using a weighted estimation model, the ambient sound can be estimated more efficiently and in a more targeted way, which correspondingly increases the noise reduction effect.
In some implementations, the inverse estimation signal generation module is further configured to adjust the estimation model based on the evaluation signal. By using the evaluation signal to adjust the subsequent estimation model, a suitable estimation model may be selected based on the noise reduction effect to enhance the noise reduction effect.
According to a fifth aspect of the present disclosure, an active noise reduction audio device is provided. The active noise reduction audio device includes one or more processors; a memory storing one or more programs configured to be executed by one or more processors, the one or more programs comprising instructions for performing the method according to the first aspect. By using the residual signal to adjust the subsequent estimation model, an appropriate estimation model may be selected based on the noise reduction effect to enhance the noise reduction effect. By adjusting the estimated time length of the subsequent inverted sound using the residual signal, the accuracy of the inverted signal estimation can be dynamically adjusted and the noise reduction effect can be improved. For example, when the residual signal indicates that the active noise reduction effect is good, the estimated time length may be prolonged. Conversely, when the residual signal indicates that the active noise reduction effect is not ideal, the estimated time length can be shortened to improve the estimated accuracy.
According to a sixth aspect of the present disclosure, an active noise reduction audio device is provided. The active noise reduction audio device includes a first microphone, one or more processors, and a first speaker. The first microphone is configured to collect a first ambient sound during a first period of time and generate a first signal. The one or more processors are configured to obtain an evaluation signal corresponding to the first time period; determining a time length of a second period based on the evaluation signal, the second period being subsequent to the first period; and generating a second signal representative of an inverse estimate of the first ambient sound during the second period of time. The first speaker is configured to play the second signal during the second period. By using the residual signal to adjust the subsequent estimation model, an appropriate estimation model may be selected based on the noise reduction effect to enhance the noise reduction effect. By adjusting the estimated time length of the subsequent inverted sound using the residual signal, the accuracy of the inverted signal estimation can be dynamically adjusted and the noise reduction effect can be improved. For example, when the residual signal indicates that the active noise reduction effect is good, the estimated time length may be prolonged. Conversely, when the residual signal indicates that the active noise reduction effect is not ideal, the estimated time length can be shortened to improve the estimated accuracy.
In some implementations, the active noise reduction audio device further includes a residual microphone. The residual microphone is configured to collect residual sound to generate a residual signal as an evaluation signal. By using the residual signal to adjust the subsequent sound estimate, the estimate can be dynamically adjusted in a feedback manner so that a better dynamic active noise reduction effect is achieved.
In some implementations, the active noise reduction audio device further includes: a second microphone and a second speaker. The second microphone is configured to collect a second ambient sound during the first period of time and generate a fifth signal, the fifth signal being different from the first signal. The second speaker is configured to play a sixth signal, wherein the sixth signal is generated by the one or more processors based on the fifth signal. The sixth signal represents a second inverse estimate of the second ambient sound during the second period and is different from the second signal.
It should be understood that what is described in this summary is intended neither to identify key or essential features of the embodiments of the disclosure nor to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The above and other features, advantages and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. In the drawings, like or similar reference numerals designate like or similar elements, wherein:
FIG. 1 illustrates a schematic diagram of an active noise reduction audio device in which embodiments of the present disclosure may be implemented;
FIG. 2 illustrates a schematic block diagram of an active noise reduction audio device according to one embodiment of the present disclosure;
FIG. 3 is a schematic flow chart of a method for filtering or reducing ambient sound according to one embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a method of classifying and inverting an estimate of ambient sound according to one embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a process of active noise reduction according to one embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a process of active noise reduction according to another embodiment of the present disclosure;
FIG. 7 illustrates a schematic diagram of another active noise reduction audio device in which embodiments of the present disclosure may be implemented;
FIG. 8 is a schematic diagram of a process of active noise reduction according to yet another embodiment of the present disclosure;
FIG. 9 is a schematic block diagram of a computer-readable storage medium according to one embodiment of the present disclosure; and
FIG. 10 is a schematic block diagram of an apparatus for noise reduction of ambient sound according to one embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the accompanying drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
In describing embodiments of the present disclosure, the term "comprising" and its like should be taken to be open-ended, i.e., including, but not limited to. The term "based on" should be understood as "based at least in part on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". The terms "first," "second," and the like, may refer to different or the same object. The term "period" means a continuous or discrete period of time, the smallest unit of which comprises a single point in time. In other words, the term "period" may include at least one point in time. Other explicit and implicit definitions are also possible below. The term "representation" herein means that there is a direct or indirect relationship between the two and that one may characterize a certain property and/or a change in property of the other. Herein, a "microphone" refers to a device for capturing sound and converting it into a corresponding electrical signal. "speaker" refers to a device for converting electrical signals into sound based on audio data. The term "ambient sound" refers herein to sound that is present in the external environment in which the active noise reduction audio device is located and that is captured by the active noise reduction audio device, which may be a combination of one or more sounds, such as speech, music, noise, and the like.
As described above, in some cases, the ambient sound may be noise to the user, and the user wishes to filter out or reduce the ambient sound by using the active noise reduction audio device. Filtering or reducing ambient sound is referred to herein collectively as noise reduction. Existing active noise reduction audio devices are limited by factors such as the computation of the inverse signal and do not filter or reduce the ambient sound well. For example, the time required for real-time computation of the inverted signal is greater than the propagation time of the ambient sound signal from the external sampling microphone of the active noise reduction audio device to the inside of the ear, which can cause mismatch of the inverted signal and the direct signal, and possibly even an increase in noise.
In response to the above-identified problems, as well as other potential problems, embodiments of the present disclosure provide a method, an active noise reduction audio device, and a computer-readable storage medium for reducing ambient sound. In one embodiment of the present disclosure, noise reduction of ambient sound is achieved by capturing ambient sound using an ambient sound microphone of an active noise reduction audio device, predicting or estimating, based on a signal representing the ambient sound, an inverted signal representing the possible ambient sound for a subsequent period, and playing the inverted signal using a built-in speaker of the active noise reduction audio device so that the played inverted sound and the direct ambient sound cancel each other at the subsequent time. More specifically, compared to the conventional scheme of generating a real-time inverted signal to cancel the direct sound, the scheme according to the embodiments of the present disclosure can obtain a better noise reduction width, that is, it can widen the frequency range over which noise reduction is effective. At the same time, the scheme according to the embodiments of the present disclosure can obtain a better noise reduction depth (which may be expressed in dB, for example), thereby increasing the degree to which noise is effectively suppressed or cancelled. Further, in one embodiment of the present disclosure, obtaining an evaluation signal makes the effect of the current active noise reduction known, and the time length of the subsequent signal estimate may be adjusted based on the evaluation signal representing the current active noise reduction effect, thereby obtaining a more accurate subsequent inverted signal estimate to enhance the effect of the subsequent active noise reduction.
Fig. 1 shows a schematic diagram of an active noise reduction audio device 10 in which embodiments of the present disclosure may be implemented. In one embodiment, the active noise reduction audio device 10 may be, for example, an audio playback device in contact with the ear, such as a Truly Wireless Stereo (TWS) headset. The active noise reduction audio device 10 may include a pair of headphones 11 and 12 that are configured substantially identically to each other, so only the earphone 11 is described here. The earphone 11 includes an external first microphone 13, a processor 17 located inside the earphone 11, a first speaker 15 located at the in-ear or ear-contact portion of the earphone 11 (as opposed to the external first microphone 13, which is exposed to the environment), and a residual microphone 14. The first microphone 13 is configured to detect or collect sounds of the external environment.
In some embodiments, the residual microphone 14 may not be present. In this case, dynamic adjustment of the estimation may not be required. While a possible configuration of the active noise reduction audio device 10 is shown in fig. 1, this is merely illustrative and not limiting of the scope of the present disclosure. For example, in some embodiments, the two headphones 11 and 12 may have only one processor 17, and signals are transmitted by wireless transmission means such as Bluetooth so that the two headphones 11 and 12 share the single processor 17. In another embodiment, the two headphones 11 and 12 may also share a single first microphone 13.
In one embodiment, the external first microphone 13 of the active noise reduction audio device 10 captures ambient sound, performs an acousto-electric conversion to generate a continuous electrical signal, and transmits the signal to the processor 17. The processor 17 predicts or estimates the ambient sound at a subsequent moment based on the received signal, generates an inverted signal representing the ambient sound at the subsequent moment, and transmits the inverted signal to the first speaker 15. In this context, an "inverted signal" means a signal obtained by applying an inversion operation to an audio signal, for example by directly inverting the sign of each audio sample point, optionally with further processing. In contrast, a signal that is not inverted may be referred to as a "positive phase signal". The inverted signal played by the speaker is used to cancel, to some extent, the direct sound (normal sound) that reaches the inside of the active noise reduction audio device 10, so as to reduce the sound perceived inside the ear. The first speaker 15 plays the inverted sound based on the received inverted signal to cancel the ambient sound that directly reaches the inside of the active noise reduction audio device 10 at the subsequent moment, thereby achieving noise reduction. Although fig. 1 shows the active noise reduction audio device 10 schematically as a TWS headset, it is to be understood that the scope of the present disclosure is not limited in this regard. For example, fig. 7 below shows another possible configuration of an active noise reduction audio device, namely an earmuff-type headset. Alternatively, the active noise reduction audio device 10 may also be, for example, an active noise reduction audio device that delivers audio through bone conduction.
Fig. 2 shows a schematic block diagram of an active noise reduction audio device 100 according to one embodiment of the present disclosure. It should be understood that the active noise reduction audio device 100 illustrated in fig. 2 is merely exemplary, for example, to illustrate one possible implementation of the active noise reduction audio device 10 of fig. 1, and should not be construed as limiting the functionality and scope of the implementations described in this disclosure. In one embodiment, the active noise reduction audio device 100 may include a processor 110, a wireless communication module 160, an antenna 1, an audio module 170, a speaker module 170A, a microphone module 170C, keys 190, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, a charge management module 140, a power management module 141, shown in solid line boxes and solid lines.
Processor 110 may be, for example, processor 17 of fig. 1, and may include one or more processing units, such as: the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural network processor (neural-network processing unit, NPU), etc. In some embodiments, the different processing units may be separate devices. In other embodiments, different processing units may also be integrated in one or more processors. The controller can generate operation control signals according to the instruction operation codes and the time sequence signals to finish the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. The internal memory 121 may be used to store computer executable program code including instructions. The internal memory 121 may include a storage program area and a storage data area. The processor 110 performs various functional applications and data processing of the active noise reduction audio device 100 by executing instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
The active noise reduction audio device 100 may implement audio functions through an audio module 170, a speaker module 170A, a microphone module 170C, an application processor, and the like. Such as music playing, recording, etc. The audio module 170 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or a portion of the functional modules of the audio module 170 may be disposed in the processor 110.
The speaker module 170A, also referred to as a "horn," is used to convert audio electrical signals into sound signals. A user of the active noise reduction audio device 100 may listen to music or to a hands-free call through the speaker module 170A. The microphone module 170C, also referred to as a "mic" or "microphone", is used to convert sound signals into electrical signals. When making a call or transmitting voice information, the user can speak with the mouth close to the microphone module 170C, thereby inputting a sound signal to the microphone module 170C.
In other embodiments, the active noise reduction audio device 100 may also include an antenna 2 and a mobile communication module 150. In addition to the components described above, the active noise reduction audio device 100 may also include one or more of an external memory interface 120, a battery 142, a receiver 170B, an earphone interface 170D, a sensor module 180, a motor 191, an indicator 192, a camera 193, a display 194, and a subscriber identity module (subscriber identification module, SIM) card interface 195, shown in dashed boxes and dashed lines. The sensor module 180 may include one or more of a pressure sensor 180A, a gyroscope sensor 180B, a barometric pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, and a bone conduction sensor 180M. The sensor module 180 may also include other types of sensors not listed.
It is to be understood that the illustrated structure of the embodiments of the present application does not constitute a specific limitation on the active noise reduction audio device 100. In other embodiments of the present application, the active noise reduction audio device 100 may include more or fewer components than illustrated, certain components may be combined, certain components may be split, or the components may be arranged differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Fig. 3 is a schematic flow chart of a method 300 for noise reduction of ambient sound according to one embodiment of the present disclosure. In one embodiment, the method 300 may be performed by the active noise reduction audio device 10 or the active noise reduction audio device 100. In the case where the active noise reduction audio device 10 performs the method 300, the processor 17 of the active noise reduction audio device 10 may execute computer program code or instructions implementing the method 300. In the case where the active noise reduction audio device 100 performs the method 300, the processor 110 of the active noise reduction audio device 100 may execute computer program code or instructions implementing the method 300. The method 300 is described below using the active noise reduction audio device 10 as an example; it should be understood that this is by way of example only and not by way of limitation.
The processor 17 acquires an initial input signal corresponding to the first microphone 13 in the active noise reduction audio device 10, the initial input signal representing ambient sound collected by the active noise reduction audio device 10 during an initial period. The initial input signal is acquired, for example, by the first microphone 13 and transmitted to the processor 17. It will be appreciated that the initial input signal may also be processed (e.g., filtered, gain amplified, etc.) by intermediate audio signal processing circuitry before being transmitted to the processor 17 for subsequent processing. The scope of embodiments of the present disclosure is not limited in this respect.
The ambient sound is generated continuously, but the active noise reduction audio device 10 typically processes the received signal in blocks of a fixed or variable data length, which corresponds to the duration of the sampled ambient sound. For example, the initial input signal received by the active noise reduction audio device 10 includes a microphone signal for one or more sampling time points. Thus, in one embodiment, the active noise reduction audio device 10 captures and processes one period of microphone signal at a time. For example, in one processing cycle, the active noise reduction audio device 10 acquires and processes the microphone signal for an initial period. In a subsequent processing cycle, the active noise reduction audio device 10 acquires and processes the microphone signal for a first period after the initial period, and so on. The length of the initial period may be the same as or different from the length of the first period. For example, the first period may include a first plurality of points in time, and the initial period may include a second plurality of points in time, where the two pluralities differ in number.
The active noise reduction audio device 10 generates, based on the initial input signal, an initial output signal to be played by the first speaker 15 in the active noise reduction audio device 10. The initial output signal represents an inverse estimate of the ambient sound during a first period after the initial period. In an embodiment of the present disclosure, the active noise reduction audio device 10 predicts or estimates an ambient sound signal for a subsequent period (e.g., the first period) based on a sound sample signal for a previous period (e.g., the initial period), and then inverts the estimated ambient sound signal to generate the initial output signal, which represents the inverse of the sound that the environment may produce during the first period. Alternatively, the initial input signal may first be inverted to generate an inverted signal, and the prediction or estimation may then be performed on the inverted signal to generate the initial output signal.
By playing the initial output signal, the first speaker 15 generates an initial inverted sound inside the ear of the user at the time when the ambient sound actually generated during the first period reaches the inside of the ear directly. The initial inverted sound and the direct sound are summed; because they are in opposite phases, they effectively cancel each other and reduce the net volume, so that a noise reduction effect is obtained. In the present embodiment, predicting or estimating the sound of the subsequent period is equivalent to extending the time available for sound processing, so the processing circuit has enough time to generate the inverted signal and the noise reduction effect is improved. Moreover, since the processing circuit has enough time to generate the inverted signal, a high-speed processing circuit is not needed, which reduces the circuit design complexity and manufacturing cost of the active noise reduction audio device 10.
In one exemplary embodiment, a pool model or weighted model, which will be described in more detail below, may be used to perform estimation based on the initial input signal to generate the initial output signal. For example, the active noise reduction audio device 10 may directly invert the initial input signal to generate a seventh signal, perform estimation on the seventh signal using a first estimation model, such as a speech model, to generate a first component signal, and perform estimation on the seventh signal using a second estimation model, such as a neural network model, that is different from the first estimation model, to generate a second component signal that is different from the first component signal. The active noise reduction audio device 10 then weights the first component signal and the second component signal using a weighting model to generate the initial output signal, e.g., multiplies the first component signal by a first weight coefficient to obtain a first product, multiplies the second component signal by a second weight coefficient to obtain a second product, and adds the first product and the second product to obtain the initial output signal. It will be appreciated that in the weighting model, the weighting coefficients for the first component signal and the second component signal may be dynamically adjusted, for example by using a residual signal described later.
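By way of illustration only, the weighting of two component signals can be sketched in Python as follows. The two component estimators (`hold_last_sample` and `linear_extrapolation`) are hypothetical placeholders standing in for the first and second estimation models; these functions, all names, and the weight values are assumptions made for this sketch and are not taken from the disclosure.

```python
def invert(signal):
    """Invert a signal by flipping the sign of every sample."""
    return [-sample for sample in signal]


def hold_last_sample(signal, horizon):
    """Hypothetical first estimation model: repeat the last sample."""
    return [signal[-1]] * horizon


def linear_extrapolation(signal, horizon):
    """Hypothetical second estimation model: continue the last slope."""
    slope = signal[-1] - signal[-2]
    return [signal[-1] + slope * (step + 1) for step in range(horizon)]


def weighted_output(initial_input, first_weight, second_weight, horizon):
    """Weight two component signals estimated from the inverted input."""
    seventh_signal = invert(initial_input)
    first_component = hold_last_sample(seventh_signal, horizon)
    second_component = linear_extrapolation(seventh_signal, horizon)
    # First product plus second product, sample by sample.
    return [
        first_weight * a + second_weight * b
        for a, b in zip(first_component, second_component)
    ]


# Example weights (assumed); in practice they could be adjusted dynamically.
initial_output = weighted_output([1.0, 2.0, 3.0], 0.25, 0.75, horizon=2)
print(initial_output)
```

In this sketch the weights are plain arguments, so a caller could change them from one period to the next, which mirrors the dynamic adjustment of the weighting coefficients mentioned in the text.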
Other models may also be used to estimate the initial input signal, such as other neural network models, linear prediction models, etc., provided that the model is capable of predicting a subsequent audio signal based on the current audio signal. It will be appreciated that in some embodiments, the model used may be dynamically replaced during the noise reduction of the active noise reduction audio device 10, e.g., the predictive model used by the active noise reduction audio device 10 may be switched from a first model, such as a pool model, to a second model, such as a linear predictive model, in some cases. In other embodiments, the parameters or configuration of the model used may be dynamically adjusted. For example, the weighting coefficients of the individual terms of the weighting model may be dynamically adjusted based on the frequency content or class of the audio signal. Alternatively, a default or fixed model may also be used to generate the initial output signal.
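As a non-limiting illustration of a linear prediction model of the kind mentioned above, the following Python sketch fits predictor coefficients to the samples of one period by least squares, extrapolates the next period, and inverts the result. The function names, the model order, and the test tone are assumptions chosen for this sketch; the disclosure does not specify how such a model would be fitted.

```python
import math


def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def fit_linear_predictor(frame, order):
    """Least-squares coefficients a[k] such that x[n] ~ sum(a[k] * x[n-1-k])."""
    rows = range(order, len(frame))
    gram = [
        [sum(frame[n - 1 - i] * frame[n - 1 - j] for n in rows)
         for j in range(order)]
        for i in range(order)
    ]
    target = [sum(frame[n] * frame[n - 1 - i] for n in rows)
              for i in range(order)]
    return solve_linear_system(gram, target)


def predict_inverted(frame, coeffs, horizon):
    """Extrapolate `horizon` samples past the frame and return their inverse."""
    history = list(frame)
    for _ in range(horizon):
        history.append(sum(c * history[-1 - k] for k, c in enumerate(coeffs)))
    return [-sample for sample in history[len(frame):]]


# Assumed test tone: a single sinusoid sampled over one 64-sample period.
omega = 0.3
previous_period = [math.sin(omega * n) for n in range(64)]
coeffs = fit_linear_predictor(previous_period, order=2)
inverted_estimate = predict_inverted(previous_period, coeffs, horizon=16)

# Direct sound in the next period summed with the played inverted estimate.
direct = [math.sin(omega * n) for n in range(64, 80)]
residual = [d + e for d, e in zip(direct, inverted_estimate)]
print(max(abs(r) for r in residual))
```

A single sinusoid obeys an exact second-order recurrence, so the residual in this example is close to zero; real ambient sound is not this predictable, and the residual would generally grow with the length of the predicted period.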
Because the environment in which the user wearing the active noise reduction audio device 10 is located may be varied and the user may shuttle through different environments, in some embodiments the initial output signal may be generated targeted according to the type of environment in which the user is located or based on the type of ambient sound, thereby achieving a better noise reduction effect. For example, the active noise reduction audio device 10 may automatically determine the category of ambient sound based on the initial input signal captured by the first microphone 13. Alternatively, the user may set the category of the environment by a portable electronic device in communication with the active noise reduction audio device 10, or by voice to send a set command to the active noise reduction audio device 10. In other embodiments, the user may set the level of noise reduction effect of the active noise reduction audio device 10. For example, the noise reduction effect of the active noise reduction audio device 10 is set to a low noise reduction level to avoid missing warning sounds in ambient sounds.
At 302, the processor 17 acquires a first signal. The first signal is representative of a first ambient sound captured by the first microphone 13 of the active noise reduction device during a first period of time. The processing of the processor 17 is similar to that for the initial input signal and will not be described in detail here. The corresponding description with respect to the initial input signal may apply here.
At 304, the processor 17 obtains an evaluation signal corresponding to the first period. In the present disclosure, the evaluation signal represents the effect of active noise reduction and/or the accuracy of the estimation. In one embodiment, the evaluation signal may be obtained by a residual microphone, as described in detail below. Alternatively or additionally, in another embodiment, the evaluation signal may be determined from the estimated signal generated by the processor 17 during the initial period and the ambient sound acquired during the first period. For example, the processor 17 generates an initial estimated signal during the initial period, the initial estimated signal representing the estimated ambient sound during the first period. The processor 17 receives, from the first microphone 13, the first signal acquired during the first period. The processor 17 then subtracts the initial estimated signal from the first signal to determine the evaluation signal. Alternatively, the processor 17 generates an initial inverse estimated signal during the initial period, the initial inverse estimated signal representing the inverse of the estimated ambient sound during the first period, i.e. an inverse estimate of the first signal. The processor 17 receives, from the first microphone 13, the first signal acquired during the first period. The processor 17 then adds the initial inverse estimated signal to the first signal to determine the evaluation signal. It will be appreciated that the residual signal may be obtained in other ways, such as subtracting the inverse of the first signal from the initial inverse estimated signal and inverting the result, or adding the inverse of the first signal to the initial estimated signal; the present disclosure is not limited in this regard.
At 306, the processor 17 determines a time length of a second period based on the evaluation signal, the second period being subsequent to the first period. After acquiring the evaluation signal, the processor 17 can determine how well the previous estimation performed. The processor 17 may therefore adjust subsequent estimates accordingly, for example the time length of the subsequent period, to obtain a more accurate estimate and thereby improve the effect of active noise reduction. For example, when the residual signal indicates that the estimation effect is good, the time length of the estimation period may be maintained or increased. When the residual signal indicates that the estimation effect is not ideal, the time length of the subsequent estimation period may be reduced. Furthermore, as described below, the residual signal may be used to adjust, add, or remove estimation models, to adjust the weighting coefficients of the terms in the weighting model, and to adjust the filtering.
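The two computations described for step 304 can be sketched as follows, by way of example only. The function names and sample values are assumptions made for this sketch; the mean-square summary at the end is likewise an assumed way of reducing the evaluation signal to a single number and is not prescribed by the disclosure.

```python
def evaluation_from_estimate(first_signal, initial_estimate):
    """Subtract the estimated signal from the measured first signal."""
    return [x - e for x, e in zip(first_signal, initial_estimate)]


def evaluation_from_inverse_estimate(first_signal, initial_inverse_estimate):
    """Add the inverse estimated signal to the measured first signal."""
    return [x + e for x, e in zip(first_signal, initial_inverse_estimate)]


def mean_square(signal):
    """Assumed scalar summary: average energy per sample."""
    return sum(sample * sample for sample in signal) / len(signal)


# Assumed sample values for the first period.
first_signal = [0.5, -0.25, 1.0]
initial_estimate = [0.25, -0.25, 0.5]
initial_inverse_estimate = [-e for e in initial_estimate]

by_subtraction = evaluation_from_estimate(first_signal, initial_estimate)
by_addition = evaluation_from_inverse_estimate(
    first_signal, initial_inverse_estimate
)
print(by_subtraction, by_addition, mean_square(by_subtraction))
```

Both routes yield the same evaluation signal, because adding the inverse of the estimate is the same operation as subtracting the estimate.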
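One possible way of turning the evaluation signal into a time length for the second period is sketched below, purely as an assumption-laden example. The ratio thresholds, the doubling and halving steps, and the length limits are hypothetical values chosen for illustration; the disclosure only states that the length may be maintained or increased when the estimation effect is good and reduced when it is not ideal.

```python
def next_period_length(
    current_length,
    residual_energy,
    reference_energy,
    good_ratio=0.1,    # assumed: residual at or below 10% of ambient energy
    poor_ratio=0.5,    # assumed: residual at or above 50% of ambient energy
    min_length=1,      # assumed lower bound, in samples
    max_length=64,     # assumed upper bound, in samples
):
    """Return the number of samples to estimate in the second period."""
    if reference_energy <= 0:
        return current_length  # nothing to compare against; keep the length
    ratio = residual_energy / reference_energy
    if ratio <= good_ratio:
        # Estimation effect is good: predict further ahead.
        return min(max_length, current_length * 2)
    if ratio >= poor_ratio:
        # Estimation effect is not ideal: shorten the estimated period.
        return max(min_length, current_length // 2)
    return current_length


print(next_period_length(16, residual_energy=0.01, reference_energy=1.0))
print(next_period_length(16, residual_energy=0.8, reference_energy=1.0))
```

Bounding the result keeps the period from collapsing to zero samples or growing without limit when the evaluation signal stays good or poor for many consecutive periods.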
At 308, the processor 17 generates a second signal to be played by the first speaker 15 based on the first signal and the length of time of the second period. The second signal represents an inverse estimate of the first ambient sound during the second period. The first speaker 15 of the active noise reduction device 10 may play the second signal. The processing by the processor 17 is similar to that for the initial output signal and will not be described here again. The corresponding description with respect to the initial output signal may apply here.
Fig. 4 is a schematic diagram of a method 400 of classifying and inverting an estimate of ambient sound according to one embodiment of the disclosure. In general, the environmental sounds of the environment in which a user is located can be classified into five categories: silence class, speech class, noise class, music class and mix class, wherein mix class includes, for example, a mix of speech, music and/or noise. The active noise reduction audio device 10 may determine a category of ambient sound based on the first signal and generate a third signal based on the determined category and the first signal and invert the third signal to generate a second signal. Alternatively, the first signal may be inverted to generate the fourth signal, and classified and estimated based on the inverted fourth signal to generate the second signal.
While the environmental sounds are divided into the five categories described above, this is merely illustrative and not limiting of the scope of the present disclosure. For example, ambient sound can also be divided into three categories, music, noise, and silence. In this case, the speech is classified into a noise class. Alternatively, in some cases, classification may also be based on the frequency of the ambient sound. By classifying the environmental sound signals of different categories, the environmental sound can be more effectively reduced in noise so as to obtain better noise reduction effect.
In one embodiment, the active noise reduction audio device 10 may determine the category of the ambient sound based on at least one of a frame rate of a first energy range, a frame rate of a first frequency range, and a zero-crossing rate of the first signal. Specifically, at 402, the active noise reduction audio device 10 acquires a first signal. The active noise reduction audio device 10 performs a feature analysis on the first signal after receiving the first signal to determine a classification of the first signal. For example, the first signal may be analyzed based on an energy frame rate and a zero crossing rate of the first signal.
The silence class consists of segments of an audio signal that are not perceived by the human ear, although they may sometimes contain a small amount of noise. To detect such segments as accurately as possible, the first signal can be evaluated using a short-time energy threshold and a short-time zero-crossing rate threshold. For example, at 404, the active noise reduction audio device 10 analyzes the first signal to determine whether both its energy frame rate and zero crossing rate are below respective given thresholds. In this manner, similar to endpoint detection, the active noise reduction audio device 10 may determine that the first signal is a mute signal at 406 when both the short-time energy and the short-time zero-crossing rate of the first signal are less than the respective given thresholds. This means that the environment is in a quiet state, and accordingly, at 408, the active noise reduction audio device 10 need not generate the third signal and need not generate the second signal. This may save power consumption and extend the life of the active noise reduction audio device 10.
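The silence check at 404 to 408 can be sketched as follows. This is only an illustrative sketch: the function names, the per-frame definitions of short-time energy and zero-crossing rate, and the threshold values are assumptions, since the disclosure does not specify them.

```python
def short_time_energy(frame):
    """Mean squared amplitude of one frame (needs at least one sample)."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (needs >= 2 samples)."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def is_silence(frame, energy_threshold=1e-4, zcr_threshold=0.1):
    """True when both features are below their thresholds (values are assumed)."""
    return (
        short_time_energy(frame) < energy_threshold
        and zero_crossing_rate(frame) < zcr_threshold
    )


quiet_frame = [0.001] * 100
loud_frame = [0.5 if i % 2 == 0 else -0.5 for i in range(100)]
print(is_silence(quiet_frame), is_silence(loud_frame))
```

When `is_silence` returns true, the device in this sketch would skip generating the third and second signals for that frame.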
Returning to 404, when the active noise reduction audio device 10 determines that at least one of the energy frame rate and the zero crossing rate is not below the threshold, the active noise reduction audio device 10 further analyzes whether the frame rate of the first energy range is above the threshold at 410. In the present embodiment, the first energy range is, for example, a low energy range, and the first frequency range is, for example, a low frequency range. It will be appreciated that the specific ranges of the low energy range and the low frequency range may be set accordingly depending on the performance design of the active noise reduction audio device 10. Alternatively, it is also possible to have different energy ranges or frequency ranges for classifying the ambient sound, as long as the ambient sound has corresponding characteristics at a specific energy and/or frequency range.
Speech signals and noise signals typically have a higher low energy frame rate than music and mixed sounds. If the frame rate of the first energy range is above a given threshold, the active noise reduction audio device 10 may determine that the first signal representing ambient sound is a speech signal or a noise signal. Although speech signals and noise signals both have a higher frame rate in the low energy range, they differ in terms of the frequency of the sound. For example, speech signals typically have a higher low frequency frame rate, whereas noise signals are more scattered in frequency due to their randomness and have a lower frame rate at low frequencies. Accordingly, at 412, the active noise reduction audio device 10 determines whether the frame rate of the first frequency range of the first signal is above a threshold. If it is above the threshold, the active noise reduction audio device 10 determines at 413 that the first signal is a speech signal. The active noise reduction audio device 10 may then generate a third signal using a speech model for the speech signal. For example, at 415, the active noise reduction audio device 10 may use a speech model, such as a pool model or a linear prediction (LPC) model, to generate a third signal based on the first signal. It will be appreciated that other speech models besides a pool model or a linear prediction model may also be used.
In particular, the alternating occurrence of unvoiced and voiced sounds is a characteristic property of a speech signal. The unvoiced portion has a higher zero crossing rate and the voiced portion has a lower zero crossing rate, so the zero crossing rate of a speech signal alternates. For unvoiced signals, the pool model gives good estimation results and can therefore be used to estimate the third signal for unvoiced signals. The pool model, or reservoir model, is also known as an echo state network. Reservoir computing simplifies the training process of the network, solves the problems that the structure of a traditional recurrent neural network is difficult to determine and that its training algorithm is too complex, and also solves the memory fading problem of the recurrent network. For voiced signals, the inventors have found that the LPC model has a better estimation effect, and it can therefore be used to estimate the third signal for voiced signals. In one example embodiment, the active noise reduction audio device 10 may alternately use a pool model and an LPC model to estimate the speech signal. Alternatively, the active noise reduction audio device 10 may also estimate the speech signal using only a pool model or only an LPC model. By using a targeted model for unvoiced and voiced speech, the accuracy of the speech estimation can be further improved.
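As an illustration of how a linear prediction model can extrapolate samples for a subsequent period, the following sketch fits LPC coefficients with the autocorrelation method (Levinson-Durbin recursion) and then predicts forward. The disclosure does not state which LPC fitting method or model order is used; the function names, the choice of method, and the order are assumptions made for this example.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation of x for lags 0..max_lag."""
    n = len(x)
    return [
        sum(x[i] * x[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def lpc_coefficients(x, order):
    """Levinson-Durbin recursion.

    Returns a[1..order] such that x[n] is approximated by
    sum(a[k] * x[n - k]) for k = 1..order.
    """
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def predict_next(history, coeffs, steps):
    """Extrapolate `steps` samples; history must hold at least len(coeffs) samples."""
    buf = list(history)
    out = []
    for _ in range(steps):
        nxt = sum(c * buf[-k] for k, c in enumerate(coeffs, start=1))
        out.append(nxt)
        buf.append(nxt)
    return out


first_period = [0.5 ** n for n in range(50)]   # toy decaying signal
coeffs = lpc_coefficients(first_period, order=1)
print(coeffs, predict_next([4.0], coeffs, steps=2))
```

In this sketch, the predicted samples would play the role of the third signal for the second period, and negating them would give the second signal.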
Returning to 412, if the frame rate of the first frequency range is not above the threshold, the active noise reduction audio device 10 determines the first signal to be a noise signal at 414. The active noise reduction audio device 10 may then process the noise signal using an appropriate model to generate a third signal. In one embodiment, the first signal may be estimated using a neural network model to generate a third signal. Although the estimation above uses a pool model, an LPC model, and a neural network model for the speech signal and the noise signal to generate the third signal, this is merely illustrative and does not limit the scope of the present disclosure. Other suitable models, such as the weighted model described below, may also be used to generate the third signal.
Returning to 410, if the frame rate of the first energy range is not above the given threshold, the active noise reduction audio device 10 may determine that the ambient sound is music or mixed sound. Music is characterized by harmony and a wide range of timbres or frequencies, and may be played by various instruments. The audio components contained in music are complex and the energy value is also large. In addition, music lacks the abrupt changes caused by unvoiced sounds in a speech signal, and its energy value does not vary as much as that of a speech signal, so the low energy frame rate of music is low. Similarly, mixed sounds also have a lower low energy frame rate. Therefore, a low energy frame rate is an effective feature in judging music and mixed sounds.
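The low energy frame rate is not defined numerically in this description. One common way to compute such a feature, assumed here purely for illustration, is the fraction of frames whose short-time energy falls below a fixed ratio of the mean frame energy. The function names and the ratio of 0.5 are assumptions.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def low_energy_frame_rate(samples, frame_len, ratio=0.5):
    """Fraction of frames with energy below ratio * mean energy (assumed definition)."""
    energies = frame_energies(samples, frame_len)
    mean_energy = sum(energies) / len(energies)
    low = sum(1 for e in energies if e < ratio * mean_energy)
    return low / len(energies)


steady = [0.5, -0.5] * 40                    # constant energy, music-like
bursty = ([0.5] * 10 + [0.01] * 30) * 2      # short bursts with pauses, speech-like
print(low_energy_frame_rate(steady, 10), low_energy_frame_rate(bursty, 10))
```

Under this assumed definition, the steady signal yields a low value and the bursty signal a high value, which matches the qualitative behavior described for music versus speech and noise.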
After this, the active noise reduction audio device 10 further distinguishes between music and mixed sounds. Mixtures of speech and music are common in daily life, usually with the music part in the background to set the atmosphere and the speech part as the main part (i.e., in terms of energy). The alternating occurrence of unvoiced and voiced sounds is a characteristic property of a speech signal: the unvoiced portion has a higher zero crossing rate and the voiced portion has a lower zero crossing rate, so the zero crossing rate of a speech signal alternates. A music signal has no alternation of unvoiced and voiced sounds, so its zero crossing rate changes steadily. One effective parameter for measuring how quickly the zero crossing rate changes is the zero crossing rate variance. Therefore, the music signal and the speech-music mixed signal can be distinguished by the zero-crossing rate variance. In one embodiment, the active noise reduction audio device 10 may distinguish between music and mixed sounds based on zero crossing rate variance. For example, at 420, the active noise reduction audio device 10 analyzes the zero crossing rate variance of the first signal to determine whether the zero crossing rate variance of the first signal is above a given threshold.
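The zero crossing rate variance used at 420 can be sketched as the variance of per-frame zero crossing rates. The framing scheme (non-overlapping frames) and the function names below are assumptions for illustration only.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def zcr_variance(samples, frame_len):
    """Population variance of the zero crossing rate over non-overlapping frames."""
    rates = [
        zero_crossing_rate(samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    mean = sum(rates) / len(rates)
    return sum((r - mean) ** 2 for r in rates) / len(rates)


# Steady zero crossing rate (music-like) versus alternating high and low
# zero crossing rate (speech-like unvoiced/voiced alternation).
steady = [1.0, -1.0] * 40
alternating = ([1.0, -1.0] * 5 + [1.0] * 10) * 4
print(zcr_variance(steady, 10), zcr_variance(alternating, 10))
```

A signal whose variance exceeds the chosen threshold would be treated as mixed sound in this sketch, and otherwise as music.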
If the zero-crossing rate variance is not above the given threshold, the active noise reduction audio device 10 determines that the first signal is a music signal at 422. The inventors found that the pool model also has a good estimation effect for music signals. Accordingly, the active noise reduction audio device 10 may estimate, at 424, the first signal using the pool model to generate a third signal. Although the pool model is used herein for estimation based on music signals, this is merely illustrative and does not limit the scope of the present disclosure. In some embodiments, a weighted model may be used to estimate the third signal based on the music signal.
If the zero crossing rate variance is above the given threshold, the active noise reduction audio device 10 determines at 421 that the first signal is a mixed signal. For mixed signals, the active noise reduction audio device 10 may estimate, at 423, the first signal using a weighted model to generate a third signal. In one embodiment, the active noise reduction audio device 10 may estimate the first signal using at least two of a pool model, an LPC model, and a neural network model, and assign a weight coefficient to the estimation result of each model. For example, if there are more voiced signals and fewer noise signals, a higher weight coefficient, e.g., 0.75, may be assigned to the result estimated for voiced signals using the LPC model, and a lower weight coefficient, e.g., 0.25, may be assigned to the result estimated for noise signals using the neural network model. The corresponding estimation results are then multiplied by the weight coefficients and the two products are added to obtain the final third signal. In another embodiment, the weighted estimation may also be performed using all three of the pool model, the LPC model and the neural network model to obtain the final third signal. It will be appreciated that the weighting coefficient of each model in the weighted model may be dynamically adjusted. For example, the active noise reduction audio device 10 may dynamically adjust the respective coefficients based on a residual signal described later to obtain a better noise reduction effect.
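The weighted combination described above can be sketched as a sample-by-sample weighted sum of the per-model estimates. The function name and the toy estimate values are assumptions; the weights 0.75 and 0.25 are the example values given in the text.

```python
def weighted_estimate(estimates, weights):
    """Combine per-model estimates sample by sample using the given weights."""
    if len(estimates) != len(weights):
        raise ValueError("one weight is required per model estimate")
    length = len(estimates[0])
    if any(len(est) != length for est in estimates):
        raise ValueError("all estimates must have the same length")
    return [
        sum(w * est[n] for est, w in zip(estimates, weights))
        for n in range(length)
    ]


# Hypothetical per-model outputs for two time points of the second period.
lpc_estimate = [1.0, 2.0]    # stand-in for the LPC model result
nn_estimate = [3.0, -2.0]    # stand-in for the neural network model result
combined = weighted_estimate([lpc_estimate, nn_estimate], [0.75, 0.25])
print(combined)
```

Because the weights are ordinary arguments, they can be changed between periods, for example in response to a residual signal.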
It follows that by subdividing the classification of the ambient sound and pertinently employing an estimation model suitable for each classification, noise reduction can be effectively performed even if the ambient sound has a wide sound frequency range, thereby remarkably increasing the noise reduction width. On the other hand, the environmental noise with stronger decibels usually has specific sound characteristics and belongs to specific classifications, so that the noise decibels can be effectively reduced by applying the estimation model in a targeted manner so as to obtain a remarkable noise reduction depth.
While the classification process and corresponding estimation process are described above in sequence in terms of the flow shown in fig. 4, it will be appreciated that this is merely illustrative and not limiting on the scope of the present disclosure. Other classification and estimation processes are possible. For example, the active noise reduction audio device 10 may analyze various characteristic aspects of the first signal after it is acquired and directly determine the classification category of the first signal based on the analysis result without sequentially step-by-step analysis to obtain the classification category of the first signal as shown in fig. 4. Furthermore, in cases for more or fewer categories, the corresponding model estimation approach may be added or removed to more accurately generate the third signal and then inverted to generate the second signal.
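The sequential flow of fig. 4 (404, 410, 412, 420) can be summarized as a small decision function. This is a sketch only: the `Features` and `Thresholds` containers, the field names, and all threshold values are hypothetical and would depend on the performance design of the device.

```python
from dataclasses import dataclass


@dataclass
class Features:
    energy: float                  # short-time energy
    zcr: float                     # short-time zero crossing rate
    low_energy_frame_rate: float   # frame rate of the first (low) energy range
    low_freq_frame_rate: float     # frame rate of the first (low) frequency range
    zcr_variance: float            # zero crossing rate variance


@dataclass
class Thresholds:
    # All values below are placeholders chosen for the example.
    energy: float = 1e-4
    zcr: float = 0.1
    low_energy_frame_rate: float = 0.4
    low_freq_frame_rate: float = 0.5
    zcr_variance: float = 0.01


def classify(f, t=None):
    """Return one of 'silence', 'speech', 'noise', 'music', 'mixed'."""
    t = t or Thresholds()
    if f.energy < t.energy and f.zcr < t.zcr:                 # step 404
        return "silence"
    if f.low_energy_frame_rate > t.low_energy_frame_rate:     # step 410
        if f.low_freq_frame_rate > t.low_freq_frame_rate:     # step 412
            return "speech"
        return "noise"
    if f.zcr_variance > t.zcr_variance:                       # step 420
        return "mixed"
    return "music"


print(classify(Features(0.1, 0.3, 0.6, 0.8, 0.05)))
```

A direct (non-sequential) classifier, as mentioned above, could replace the body of `classify` without changing how the returned category selects an estimation model.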
Fig. 5 is a schematic diagram of a process 500 of active noise reduction according to one embodiment of the present disclosure. The first microphone 13 of the active noise reduction audio device 10 captures the ambient sound, converts it into a first signal 502, and outputs it to the processor 17. The processor 17 may selectively filter 520 the first signal 502 to generate a first filtered signal 504. In this context, filtering means operations performed on the signal other than the estimation and inversion operations. Filtering 520 may include, for example, adjusting gain, band filtering, signal noise reduction, and other existing or future signal processing. Alternatively, filtering 520 may not be present in some embodiments. The processor 17 may then estimate 530 the first filtered signal 504, for example using the method 300 of fig. 3 and/or the process 400 of fig. 4, to generate and output the second signal 506 to the built-in first speaker 15 of the active noise reduction audio device 10. Note that in fig. 5, the estimation 530 includes the step of inverting, and thus the inverting operation is not described separately here. Although filtering 520 and estimation 530 are described above by way of example with respect to processor 17, filtering 520 and estimation 530 may be performed by other components. For example, filtering 520 may be performed by a separate filter. Furthermore, although fig. 5 shows the filtering 520 being performed first and the estimation 530 second, it is also possible to perform the estimation 530 first and then the filtering 520.
The first speaker 15 plays the received second signal to produce a first sound 512 within the ear. At the same time, the direct sound 514 of the ambient sound during the second period also reaches the inside of the ear via the active noise reduction audio device 10. Since the direct sound 514 is in opposite phase to the first sound 512, the two effectively cancel each other, such that the net volume of the sound 516 perceived inside the ear is reduced compared to the direct sound 514 and the first sound 512. In this way, the active noise reduction audio device 10 may implement active noise reduction. In embodiments of the present disclosure, since the inverted sound used for cancellation is the "predicted" inverted sound for the second period, it is not necessary to provide the corresponding inverted sound for the first period instantaneously when the direct sound of the first period reaches the inside of the ear. This is equivalent to shifting the generation of the anti-phase sound backward by one period, which relaxes the stringent requirement on the processing speed of the processor 17 and avoids or alleviates the problem of non-ideal noise reduction caused by a mismatch between the direct sound and the anti-phase sound in the same period. In addition, hardware design complexity, and correspondingly hardware cost, may also be reduced because there is no need to use ultra-high-speed computing circuitry in the active noise reduction audio device 10.
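The cancellation of the direct sound by the played anti-phase estimate can be illustrated numerically. In the sketch below, the estimate is deliberately imperfect (scaled to 90% of the true signal) to show that a residual remains; the signal parameters, the scaling factor, and the function names are assumptions made for the example.

```python
import math


def invert(signal):
    """Phase-invert a signal by negating every sample."""
    return [-x for x in signal]


def rms(signal):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def residual_after_cancellation(direct, estimate):
    """Sound remaining in the ear: direct sound plus the inverted estimate."""
    return [d + a for d, a in zip(direct, invert(estimate))]


# Toy direct sound for the second period: 100 Hz tone sampled at 8 kHz.
direct = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(80)]
estimate = [0.9 * x for x in direct]   # imperfect prediction of the same period
residual = residual_after_cancellation(direct, estimate)
print(rms(direct), rms(residual))
```

The better the estimate for the second period matches the direct sound, the smaller the residual level becomes.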
In one embodiment, the active noise reduction audio device 10 also has a built-in residual microphone 14. The residual microphone 14 is configured to collect residual sound 516 inside the ear. The residual sound 516 is the sound remaining after noise reduction. The residual microphone 14 generates a residual signal 508 representing the residual sound 516 based on the acquired residual sound 516, and feeds back the residual signal 508 as an evaluation signal to the processor 17. The processor 17 may adjust at least one of the filtering 520 and the estimating 530 based on the residual signal 508. For example, in one embodiment, the filtering 520 is adaptive filtering, which may be automatically adjusted based on the residual signal 508. Furthermore, the processor 17 may also adjust the estimate 530 based on the residual signal 508. For example, the time length of the second period after the first period, for which the processor 17 estimates an inverted audio signal based on the sound collected during the first period, may be adjusted. If the residual signal 508 is small, this indicates that the estimation is good. Accordingly, the processor 17 may increase the time length of the second period based on the residual signal 508, for example by increasing the number of estimated time points comprised by the second period from 1 to 2, 3 or 4. If the residual signal 508 is large, this indicates that the estimation effect is less than ideal. Accordingly, the processor 17 may reduce the time length of the second period based on the residual signal 508, for example by reducing the number of estimated time points comprised by the second period from 4 to 3, 2, 1, or even 0 (i.e., no estimation).
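The adjustment of the second period length based on the residual signal can be sketched as a simple rule with two thresholds. The function name, the thresholds, the step size of one time point, and the use of a single residual level as input are assumptions; the range of 0 to 4 time points follows the example values in the text.

```python
def adjust_period_length(current_points, residual_level,
                         good_threshold, bad_threshold, max_points=4):
    """Return the number of estimated time points for the next period.

    A small residual lengthens the period (up to max_points); a large
    residual shortens it (down to 0, meaning no estimation); anything in
    between keeps the current length.
    """
    if residual_level <= good_threshold:
        return min(current_points + 1, max_points)
    if residual_level >= bad_threshold:
        return max(current_points - 1, 0)
    return current_points


points = 1
for level in [0.01, 0.01, 0.01, 0.5, 0.1]:   # hypothetical residual levels
    points = adjust_period_length(points, level,
                                  good_threshold=0.05, bad_threshold=0.3)
    print(points)
```

The same residual level could also be used in this kind of rule to switch estimation models or to update weight coefficients.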
On the other hand, if the residual signal 508 is large, the processor 17 may also change the estimation model or adjust the model parameters, for example by switching from the LPC model to the pool model, or by adjusting the parameters of the weighting model. In addition, the processor 17 may adjust the filtering 520, because the non-ideal noise reduction effect may be caused by the filtering 520. It will be appreciated that the processor 17 may adjust at least one of the second period length, the estimation model, the model parameters, and the filtering 520 described above to achieve better noise reduction performance. Furthermore, in the case where the noise reduction performance is deliberately set to a non-optimal level in order not to miss important environmental sounds (e.g., subway station announcements), the processor 17 may also adjust at least one of the second period length, the estimation model, the model parameters, and the filtering 520 according to the setting, for example by avoiding the use of the pool model and the LPC model or by reducing their weight coefficients in the weighting model.
Although a residual microphone 14 and adaptive filtering and estimation adjustments based on residual signal 508 are shown in fig. 5, this is merely illustrative and not limiting on the scope of the present disclosure. In some embodiments, there may be no residual microphone 14, the filtering 520 is a fixed filtering and the estimation 530 of the second signal is not adjusted accordingly. Furthermore, while only the first microphone 13, the first speaker 15, and the residual microphone 14 are shown in fig. 5, this is merely illustrative and the active noise reduction audio device 10 may have more microphones and/or speakers.
In some embodiments, the first microphone 13 may collect ambient sound during a first period of time to generate a first signal. The processor 17 may determine whether the previous estimation is accurate based on the previously estimated initial inverse estimation signal for the first period and the acquired first signal representing the real environmental sound of the first period, and adjust the time length of the subsequent estimation based on the result. For example, the time length of the second period after the first period is adjusted. If the sum of the first signal and the initial inverse estimate signal is small, this indicates that the estimate is good, and the length of time for the subsequent estimate can be maintained or increased. Conversely, if the sum of the first signal and the initial inverse estimated signal is large, this indicates that the estimated result is poor, and the length of time for the subsequent estimation can be reduced accordingly. Alternatively, the processor 17 may also determine whether the previous estimate is accurate based on the previously estimated (non-inverted) signal for the first period and the first signal. If the difference between the first signal and the estimated signal is small, this indicates that the estimation result is good, and the length of time for the subsequent estimation can be maintained or increased. Conversely, if the difference between the first signal and the estimated signal is large, this indicates that the estimated result is poor, and the time length of the subsequent estimation can be reduced accordingly. Furthermore, as with the residual signal, the above approach may be used to adjust the estimation model and/or the filtering in addition to the time length of the subsequent estimation.
Fig. 6 is a schematic diagram of a process 600 of active noise reduction according to another embodiment of the present disclosure. The first microphone 13 of the active noise reduction audio device 10 captures the ambient sound, converts it into a first signal 602, and outputs it to the processor 17. The processor 17 may perform selective first filtering 620 on the first signal 602 to generate a first filtered signal 604. The first filtering 620 may include, for example, adjusting gain, band filtering, noise reduction, and the like. The processor 17 may then make a first estimate 630 of the first filtered signal 604, for example using the method 300 of fig. 3 and/or the process 400 of fig. 4, to generate and output a second signal 606 to the built-in first speaker 15 of the active noise reduction audio device 10. Note that in fig. 6, the first estimation 630 includes the step of inverting, and thus the inverting operation is not described separately here. The first speaker 15 plays the received second signal 606 to produce a first sound 612 within the ear.
Fig. 6 differs from fig. 5 in that fig. 6 also has a second branch that performs the second filtering 622 and the second estimation 632. Similarly, the first microphone 13 outputs a first signal 602 to the processor 17. The processor 17 may perform selective second filtering 622 on the first signal 602 to generate a second filtered signal 605. The second filtering 622 may include, for example, adjusting gain, band filtering, noise reduction, and the like, and the second filtering 622 may be the same as or different from the first filtering 620. The processor 17 may then make a second estimate 632 of the second filtered signal 605, for example using the method 300 of fig. 3 and/or the process 400 of fig. 4, to generate and output a sixth signal 607 to the built-in second speaker 18 of the active noise reduction audio device 10. Note that in fig. 6, the second estimation 632 includes the step of inverting, and thus the inverting operation is not described separately here. The second speaker 18 plays the received sixth signal 607 to produce a second sound 613 in the ear.
At the same time, the direct sound 614 of the ambient sound during the second period also reaches inside the ear via the active noise reduction audio device 10. Because the direct sound 614 is in opposite phase to the first sound 612 and the second sound 613, the direct sound 614 and the first sound 612 and the second sound 613 effectively cancel each other to such an extent that the net volume of the sound 616 perceived inside the ear is reduced compared to the direct sound 614, the first sound 612 and the second sound 613. In this way, the active noise reduction audio device 10 may implement active noise reduction.
In one embodiment, the second filter 622 is different from the first filter 620 and the second estimate 632 is different from the first estimate 630. For example, the first filter 620 may be for a speech signal and the first estimate 630 is also for a speech signal, while the second filter 622 is for a music signal and the second estimate 632 is also for a music signal. As another example, the first filter 620 is for a low frequency audio signal and the first estimate 630 is also for a low frequency signal, while the second filter 622 is for a medium and high frequency signal and the second estimate 632 is also for a medium and high frequency signal. In this case, optimized settings can be made for each kind of signal. Furthermore, it will be appreciated that the first speaker 15 and the second speaker 18 may also be selected for different classes of sound to achieve better noise reduction depth and noise reduction width, as speakers designed for a particular class tend to reproduce that class of sound better than general wide-range speakers.
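As a rough illustration of the two-branch arrangement in which one branch handles low frequencies and the other handles medium and high frequencies, the following sketch splits a signal with a one-pole low-pass filter and its complement. The disclosure does not specify the filter type; the one-pole design, the smoothing factor `alpha`, and the function name are assumptions.

```python
def split_bands(samples, alpha=0.1):
    """Split a signal into a low band and a complementary high band.

    The low band is a one-pole low-pass output; the high band is the
    input minus the low band, so the two bands always sum to the input.
    """
    low, high = [], []
    state = 0.0
    for x in samples:
        state += alpha * (x - state)   # one-pole low-pass update
        low.append(state)
        high.append(x - state)
    return low, high


slow = [1.0] * 200            # very low frequency content (constant level)
fast = [1.0, -1.0] * 100      # highest representable frequency
low_s, high_s = split_bands(slow)
low_f, high_f = split_bands(fast)
print(low_s[-1], high_s[-1], max(abs(v) for v in low_f))
```

In this sketch, the low band would be passed to the first estimation and the high band to the second estimation, each of which could then use settings tuned for its band.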
Fig. 7 shows a schematic diagram of another active noise reduction audio device 20 in which embodiments of the present disclosure may be implemented. The active noise reduction audio device 20 may be, for example, a headset. The active noise reduction audio device 20 may include a pair of earmuff portions that are configured substantially identically to one another, so only one earmuff portion is schematically depicted. The earmuff portion includes an external first microphone 13 and second microphone 19, and a processor 17 located inside the earmuff portion. The earmuff portion also includes a first residual microphone 14, a second residual microphone 16, a first speaker 15, and a second speaker 18 located inside the earmuff portion (relative to the first microphone 13 and the second microphone 19, which are exposed to the environment). The first microphone 13 and the second microphone 19 are each configured to detect or collect sounds of the external environment, may operate simultaneously or alternately, and may collect the same or different sounds. In one embodiment, the first microphone 13 may have an internal first filter so as to collect only sound at a first frequency, and the second microphone 19 may have an internal second filter so as to collect only sound at a second frequency. For example, the first frequency is a low frequency and the second frequency is a medium-high frequency. By capturing sound at different frequencies, more details of the ambient sound can be obtained to achieve better sound estimation, and thus better noise reduction width and noise reduction depth.
In one embodiment, the external first and second microphones 13, 19 of the active noise reduction audio device 20 collect ambient sound, perform acoustic-electric conversion to generate a continuous electrical signal, and transmit it to the processor 17. The processor 17 predicts or estimates the ambient sound at a subsequent time based on the received signal, and generates and transmits an inverted signal representing the ambient sound at the subsequent time to the built-in first speaker 15 and second speaker 18. The first speaker 15 and the second speaker 18 play the inverted sound based on the received inverted signal to cancel the ambient sound that directly reaches the interior of the active noise reduction audio device 20 at the subsequent time, thereby achieving the effect of noise reduction.
Fig. 8 is a schematic diagram of a process 800 of active noise reduction according to yet another embodiment of the present disclosure. In one embodiment, process 800 may be implemented in active noise reduction audio device 20 shown in FIG. 7. After the first microphone 13 of the active noise reduction audio device 20 collects ambient sound during the first period of time, it converts it into a first signal 802 and outputs it to the processor 17. Processor 17 may perform selective first filtering 820 on first signal 802 to generate filtered first filtered signal 804. The filtering 820 may include, for example, processing to adjust gain, band filtering, noise reduction, and the like. The processor 17 may then make a first estimate 830 of the first filtered signal 804, for example using the method 300 of fig. 3 and/or the process 400 of fig. 4, to generate and output a second signal 806 corresponding to the second period of time to the built-in first speaker 15 of the active noise reduction audio device 20. Note that in fig. 8, the estimation 830 includes the step of inverting, and thus the inverting operation is not described here. Although the first filtering 820 and the first estimation 830 are described above by way of example with respect to the processor 17, the first filtering 820 and the first estimation 830 may also be performed by other components. For example, the first filtering 820 may be performed by a separate filter. Further, although it is shown in fig. 8 that the first filtering 820 is performed first and then the first estimation 830 is performed, the first estimation 830 may be performed first and then the first filtering 820 is performed.
Similarly, after the second microphone 19 of the active noise reduction audio device 20 captures the ambient sound during the first period of time, it converts the sound into a fifth signal and outputs it to the processor 17. The processor 17 may perform selective second filtering 822 on the fifth signal to generate a second filtered signal 805. The second filtering 822 may include, for example, adjusting gain, band filtering, noise reduction, and the like. The processor 17 may then make a second estimate 832 of the second filtered signal 805, for example using the method 300 of fig. 3 and/or the process 400 of fig. 4, to generate and output a sixth signal 807 corresponding to the second period of time to the built-in second speaker 18 of the active noise reduction audio device 20. Note that in fig. 8, the second estimation 832 includes the step of inverting, so the inverting operation is not described separately here.
The first speaker 15 plays the received second signal 806 to produce the first sound 823 in the ear and the second speaker 18 plays the received sixth signal 807 to produce the second sound 825 in the ear. The first sound 823 and the second sound 825 may be played simultaneously or alternately. The direct sound 824 of the ambient sound during the second period also is directed to the inside of the ear via the active noise reduction audio device 20. Because the direct sound 824 is in opposite phase to the first sound 823 and the second sound 825, the direct sound 824 and the first sound 823 and the second sound 825 effectively cancel each other to some extent such that the net volume of the received sounds 826 and 827 inside the ear is reduced compared to the direct sound 824, the first sound 823 and the second sound 825. In this way, the active noise reduction audio device 20 may implement active noise reduction.
In one embodiment, the active noise reduction audio device 20 also has a built-in first residual microphone 14 and second residual microphone 16. The first residual microphone 14 is configured to collect a first residual sound 826 inside the ear. The first residual sound 826 is the sound remaining after noise reduction. The first residual microphone 14 generates a first residual signal 808 representing the first residual sound 826 based on the acquired first residual sound 826 and feeds the first residual signal 808 back to the processor 17. The processor 17 may adjust at least one of the first filtering 820 and the first estimation 830 based on the first residual signal 808. For example, in one embodiment, the first filter 820 is an adaptive filter that may be automatically adjusted based on the first residual signal 808. Furthermore, the processor 17 may also adjust the first estimate 830 based on the first residual signal 808. For example, the time length of the second period after the first period, for which the processor 17 estimates an inverted audio signal based on the sound signal acquired during the first period, may be adjusted. If the first residual signal 808 is small, this indicates that the estimation is good. Accordingly, the processor 17 may increase the time length of the second period based on the first residual signal 808, for example by increasing the number of estimated time points comprised by the second period from 1 to 2, 3 or 4. If the first residual signal 808 is large, this indicates that the estimation effect is less than ideal. Accordingly, the processor 17 may reduce the time length of the second period based on the first residual signal 808, for example by reducing the number of estimated time points comprised by the second period from 4 to 3, 2, 1, or even 0 (i.e., no estimation).
On the other hand, if the first residual signal 808 is large, the processor 17 may also change the estimation model or adjust the model parameters, for example by switching from the LPC model to the reservoir model, or by adjusting the parameters of the weighted model. The processor 17 may also adjust the first filter 820, because a non-ideal noise reduction effect may be caused by the first filter 820. It will be appreciated that the processor 17 may adjust at least one of the second period length, the estimation model, the model parameters, and the first filter 820 described above to achieve better noise reduction performance. Furthermore, where the noise reduction performance is deliberately set to a non-optimal level so that important environmental sounds (e.g., subway station announcements) are not missed, the processor 17 may also adjust at least one of the second period length, the estimation model, the model parameters, and the first filter 820 according to that setting, for example by avoiding the use of the reservoir model and the LPC model or by reducing their weighting coefficients in the weighted model.
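One possible way of adjusting the parameters of a weighted model is sketched below. The inverse-error update rule, the `step` parameter, and the function names `reweight` and `blend` are assumptions introduced for illustration only; the disclosure does not specify how the weights are updated. The model names used as dictionary keys are simply labels.

```python
def reweight(weights, errors, step=0.5):
    """Shift weight toward the models whose recent prediction error was smaller.

    weights -- mapping model name -> current weight (assumed to sum to 1)
    errors  -- mapping model name -> recent mean absolute prediction error
    step    -- hypothetical adaptation rate between 0 and 1
    """
    # Inverse-error scores; the small constant avoids division by zero.
    scores = {name: 1.0 / (errors[name] + 1e-9) for name in weights}
    norm = sum(scores.values())
    return {
        name: (1 - step) * weights[name] + step * scores[name] / norm
        for name in weights
    }

def blend(predictions, weights):
    """Weighted sum of the per-model predictions for the second period."""
    length = len(next(iter(predictions.values())))
    return [
        sum(weights[m] * predictions[m][k] for m in weights)
        for k in range(length)
    ]
```

With equal starting weights and a smaller recent error for the LPC model, `reweight` moves weight from the reservoir model to the LPC model while keeping the weights summing to 1. Lowering a model's weight by hand in the same mapping would correspond to the reduced weighting coefficients mentioned for the non-optimal setting.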
Similarly, the processor 17 may adjust at least one of the length of the second period, the estimation model, the model parameters, and the second filter 822 based on the second residual signal 810 to obtain better noise reduction performance. Although the first residual microphone 14, the second residual microphone 16, and the corresponding adaptive filtering and estimation adjustments based on the first residual signal 808 and the second residual signal 810 are shown in fig. 8, this is merely illustrative and does not limit the scope of the present disclosure. In some embodiments, the first residual microphone 14 and/or the second residual microphone 16 may be absent; for example, the first filter 820 and/or the second filter 822 are fixed filters, and the first estimate 830 for the second signal 806 and/or the second estimate 832 for the sixth signal 807 are accordingly not adjusted. Furthermore, it is understood that the active noise reduction audio device 20 may have more microphones and/or speakers.
Although fig. 8 shows the first sound 823 and the direct sound 824 canceling each other and the second sound 825 and the direct sound 824 canceling each other, this is merely an illustrative example for the case where the first sound 823 and the second sound 825 are played alternately. When the first sound 823 and the second sound 825 are played simultaneously, the first sound 823, the second sound 825, and the direct sound 824 may be summed together, and the same residual sound is generated for collection by the first residual microphone 14 and the second residual microphone 16. In this case, the first residual signal 808 and the second residual signal 810 may be the same. Alternatively, the first residual signal 808 and the second residual signal 810 may also differ in this case, for example due to the performance and location of the first residual microphone 14 and the second residual microphone 16. The processor 17 may then average the first residual signal 808 and the second residual signal 810 for use by the first filter 820, the second filter 822, the first estimate 830, and the second estimate 832. This can avoid noise reduction deviations caused by microphone position or performance, thereby providing a more stable noise reduction effect.
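The averaging of the two residual signals may, for example, be a sample-wise mean, as in the sketch below. The function name `average_residuals` and the assumption that both signals are time-aligned and of equal length are illustrative and not taken from the disclosure.

```python
def average_residuals(first, second):
    """Sample-wise mean of two time-aligned residual signals of equal length."""
    if len(first) != len(second):
        raise ValueError("residual signals must have the same length")
    return [(a + b) / 2.0 for a, b in zip(first, second)]
```

The averaged signal can then be supplied to both filters and both estimates, so that a bias in one microphone influences the adjustment only at half strength.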
In embodiments of the present disclosure, since the inverted sound used for cancellation is the "predicted" inverted sound for the second period, it is not necessary to provide the inverted sound for the first period at the very instant the direct sound of the first period reaches the inside of the ear. This is equivalent to shifting the generation of the anti-phase sound back by one period, which relaxes the stringent requirement on the processing speed of the processor 17 and avoids or alleviates the non-ideal noise reduction performance caused by a mismatch between the direct sound and the anti-phase sound in the same period. In addition, because a very high-speed computing device is not required, hardware design complexity and, correspondingly, hardware cost may also be reduced.
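The prediction of an inverted signal for the second period from samples of the first period may be sketched with linear predictive coding (LPC), one of the estimation models named in this disclosure. The sketch below is a minimal illustration under stated assumptions: the autocorrelation method with the Levinson-Durbin recursion, a model order of 2, a pure 200 Hz test tone at 8 kHz, and the function names `autocorr`, `lpc`, and `predict_inverse` are all choices made for the example and are not taken from the source.

```python
import math

def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of the frame x."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]

def lpc(x, order):
    """Levinson-Durbin recursion; returns a[1..order] with x[n] ~ sum a[i]*x[n-i]."""
    r = autocorr(x, order)
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or fully predictable frame: keep remaining coefficients at 0
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:]

def predict_inverse(frame, horizon, order=2):
    """Extrapolate `horizon` future samples from `frame` and return them inverted."""
    a = lpc(frame, order)
    history = list(frame[-order:])
    predicted = []
    for _ in range(horizon):
        nxt = sum(a[i] * history[-1 - i] for i in range(order))
        predicted.append(nxt)
        history.append(nxt)
    return [-p for p in predicted]

# Hypothetical test tone: 400 samples form the first period, 4 the second period.
fs, f0 = 8000, 200.0
tone = [math.sin(2 * math.pi * f0 * k / fs) for k in range(404)]
first_period, second_period = tone[:400], tone[400:]
anti = predict_inverse(first_period, horizon=4)
leftover = [d + s for d, s in zip(second_period, anti)]
```

Because `anti` is computed entirely from the first period, it is available before the corresponding direct sound arrives, which is the timing advantage described above. For this idealized tone the leftover samples are small; real ambient sound is less predictable, which is why the second period length is tied to the evaluation signal.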
It is to be appreciated that the first filter 820 and the second filter 822 may be the same or different, and the first estimate 830 and the second estimate 832 may be the same or different. In one embodiment, the second filter 822 is different from the first filter 820 and the second estimate 832 is different from the first estimate 830. For example, the first filter 820 and the first estimate 830 may be for a speech signal, while the second filter 822 and the second estimate 832 are for a music signal. As another example, the first filter 820 and the first estimate 830 are for a low frequency signal, while the second filter 822 and the second estimate 832 are for a medium and high frequency signal. Accordingly, the first sound 823 is a low frequency sound, and the second sound 825 is a medium-high frequency sound. In this case, the settings can be optimized separately for each kind of signal. Furthermore, it will be appreciated that the first speaker 15 and the second speaker 18 may each be selected for a different class of sound to achieve better noise reduction depth and noise reduction width, since a speaker designed for a particular class of sound tends to reproduce that class better than a general wide-range speaker.
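A division of the ambient signal into a low frequency path and a medium-high frequency path may be sketched as follows. The one-pole low-pass filter, the smoothing factor `alpha`, and the function name `split_bands` are assumptions for illustration only; the disclosure does not specify how the bands are separated.

```python
def split_bands(samples, alpha=0.1):
    """Split a signal into a low band and its complementary mid-high band.

    alpha is the smoothing factor of a one-pole low-pass filter
    (a hypothetical value, not taken from the source).
    """
    low, high = [], []
    state = 0.0
    for x in samples:
        state += alpha * (x - state)   # one-pole low-pass
        low.append(state)
        high.append(x - state)         # remainder carries the faster variations
    return low, high
```

Because the mid-high band is defined as the remainder, the two bands add back up to the input, so each path can be filtered and estimated with its own settings without losing any part of the signal.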
Fig. 9 is a schematic block diagram of a computer-readable storage medium 900 according to one embodiment of the present disclosure. The computer-readable storage medium 900 is, for example, a cache in the processor 17, the internal memory 121 in fig. 2, or the like. The computer-readable storage medium 900 stores one or more programs 902 … 906, the one or more programs 902 … 906 being configured to be executed by one or more processors of the active noise reduction audio device. The one or more programs 902 … 906 may individually or collectively include instructions executable by the processor 17 to implement the methods or processes described herein, such as the method 300 shown in fig. 3, the method 400 shown in fig. 4, the process 500 shown in fig. 5, the process 600 shown in fig. 6, and/or the process 800 shown in fig. 8. It will be appreciated that the computer-readable storage medium 900 may also include programs for implementing other methods and steps.
Fig. 10 is a schematic block diagram of an apparatus 1000 for noise reduction of ambient sound according to one embodiment of the disclosure. The apparatus 1000 may be applied to an active noise reduction audio device. The apparatus 1000 comprises an acquisition module and an inverse estimation signal generation module. The acquisition module is used for acquiring a first signal and an evaluation signal corresponding to a first period. The first signal represents a first ambient sound captured by a first microphone of the active noise reduction device during the first period. The inverse estimation signal generation module is used for determining the time length of a second period based on the evaluation signal, the second period being after the first period, and for generating a second signal to be played by the first speaker of the active noise reduction device based on the first signal and the time length of the second period, the second signal representing an inverse estimate of the first ambient sound during the second period. By predicting sound at a future time based on the audio signal at the present time, a better noise reduction width and noise reduction depth can be obtained, the circuit design can be simplified, and the need for a high-speed processing circuit can be reduced, thereby reducing the cost of the active noise reduction audio device.
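The two-module structure of the apparatus 1000 may be sketched in software as follows. This is an illustrative outline only: the class names, the reader callables, the energy threshold, and the placeholder estimator (a straight-line extrapolation from the last two samples) are assumptions and do not represent the estimation models of the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

@dataclass
class AcquisitionModule:
    """Provides the first signal and the evaluation signal for the first period."""
    read_first_signal: Callable[[], Sequence[float]]  # hypothetical microphone reader
    read_evaluation: Callable[[], Sequence[float]]    # hypothetical evaluation source

    def acquire(self) -> Tuple[List[float], List[float]]:
        return list(self.read_first_signal()), list(self.read_evaluation())

@dataclass
class InverseEstimationModule:
    """Sets the second period length, then generates the inverted estimate."""
    horizon: int = 1          # number of time points in the second period
    max_horizon: int = 4
    threshold: float = 0.05   # hypothetical mean-square threshold

    def update_horizon(self, evaluation: Sequence[float]) -> int:
        energy = sum(e * e for e in evaluation) / max(len(evaluation), 1)
        if energy < self.threshold:
            self.horizon = min(self.horizon + 1, self.max_horizon)
        else:
            self.horizon = max(self.horizon - 1, 0)
        return self.horizon

    def generate(self, first_signal: Sequence[float]) -> List[float]:
        # Placeholder estimator: extend the line through the last two samples,
        # then invert. A real device would use its chosen estimation model here.
        if not first_signal:
            return [0.0] * self.horizon
        last = first_signal[-1]
        slope = last - first_signal[-2] if len(first_signal) > 1 else 0.0
        return [-(last + slope * (k + 1)) for k in range(self.horizon)]
```

A typical cycle would call `acquire()`, pass the evaluation signal to `update_horizon()`, and then call `generate()` on the first signal to obtain the samples to be played during the second period.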
Although only two modules are shown in fig. 10, it is to be understood that this is merely illustrative and does not limit the scope of the present disclosure. The apparatus 1000 may also include corresponding modules for performing the various steps of the method 300, the method 400, the process 500, the process 600, and/or the process 800 described above.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims.

Claims (21)

  1. A method for active noise reduction, comprising:
    acquiring a first signal representing a first ambient sound acquired by a first microphone of the active noise reduction device during a first period of time;
    acquiring an evaluation signal corresponding to the first period;
    determining a time length of a second period based on the evaluation signal, the second period being subsequent to the first period; and
    generating a second signal to be played by a first speaker of the active noise reduction device based on the first signal and the time length of the second period, the second signal representing an inverse estimate of the first ambient sound during the second period.
  2. The method of claim 1, wherein generating a second signal to be played by a first speaker of the active noise reduction device based on the first signal and a time length of a second period comprises:
    generating a third signal based on the first signal and a time length of a second period, the third signal representing an estimate of a first ambient sound during the second period; and
    inverting the third signal to generate the second signal.
  3. The method of claim 1, wherein generating a second signal to be played by a first speaker of the active noise reduction device based on the first signal and a time length of a second period comprises:
    inverting the first signal to generate a fourth signal representing an inverted sound of the first ambient sound during the first period; and
    generating the second signal based on the fourth signal and the time length of the second period.
  4. A method according to any of claims 1-3, wherein obtaining an evaluation signal corresponding to the first period of time comprises:
    acquiring, as the evaluation signal, a residual signal acquired by a residual microphone of the active noise reduction audio device during the first period, wherein the residual microphone is different from the first microphone.
  5. A method according to any of claims 1-3, wherein obtaining an evaluation signal corresponding to the first period of time comprises:
    determining the evaluation signal based on the first signal and an inverse estimation signal corresponding to the first period that was estimated in a period preceding the first period, or based on the first signal and an estimation signal corresponding to the first period that was estimated in a period preceding the first period.
  6. The method of any of claims 1-5, wherein generating a second signal to be played by a first speaker of the active noise reduction device based on the first signal and a length of time of a second period of time comprises:
    determining a category of a first ambient sound during the first period of time based on the first signal;
    determining an estimation model corresponding to the category of the first environmental sound based on the category of the first environmental sound; and
    generating the second signal based on the estimation model, the first signal, and the time length of the second period.
  7. The method of claim 6, wherein determining an estimation model corresponding to the category of the first ambient sound based on the category of the first ambient sound comprises:
    determining a weighted model corresponding to the first ambient sound based on the category of the first ambient sound, the weighted model comprising: a first estimation model, a weight of the first estimation model, a second estimation model, and a weight of the second estimation model.
  8. The method of claim 6 or 7, further comprising:
    adjusting the estimation model based on the evaluation signal.
  9. The method of any of claims 6-8, wherein generating a second signal to be played by a first speaker of the active noise reduction device based on the first signal and a time length of a second period comprises:
    determining a filter corresponding to the first ambient sound based on the category of the first ambient sound; and
    generating the second signal based on the first signal, the time length of the second period, and the filter.
  10. The method of any of claims 1-9, further comprising:
    obtaining a fifth signal representative of a second ambient sound captured by a second microphone in the active noise reduction audio device during the first period of time; and
    generating a sixth signal to be played by a second speaker in the active noise reduction audio device based on the fifth signal and a length of time of the second period, the sixth signal representing an inverse estimate of the second ambient sound during the second period.
  11. A computer readable storage medium storing one or more programs configured for execution by one or more processors, the one or more programs comprising instructions for performing the method of any of claims 1-10.
  12. A computer program product comprising one or more programs configured for execution by one or more processors, the one or more programs comprising instructions for performing the method of any of claims 1-10.
  13. An active noise reduction audio device comprising:
    an acquisition module for acquiring a first signal representing a first ambient sound acquired by a first microphone of an active noise reduction device during a first period of time and an evaluation signal corresponding to the first period of time; and
    an inverse estimation signal generation module for:
    determining a time length of a second period based on the evaluation signal, the second period being subsequent to the first period; and
    generating a second signal to be played by a first speaker of the active noise reduction device based on the first signal and the time length of the second period, the second signal representing an inverse estimate of the first ambient sound during the second period.
  14. The active noise reduction audio device of claim 13, wherein the acquisition module is further to:
    determining the evaluation signal based on the first signal and an inverse estimation signal corresponding to the first period that was estimated in a period preceding the first period, or based on the first signal and an estimation signal corresponding to the first period that was estimated in a period preceding the first period.
  15. The active noise reduction audio device of claim 13, wherein
    the acquisition module is further configured to acquire, as the evaluation signal, a residual signal acquired during the first period by a residual microphone of the active noise reduction audio device, the residual microphone being different from the first microphone.
  16. The active noise reduction audio device of any of claims 13-15, wherein the inverse estimation signal generation module is further to:
    determining a category of a first ambient sound during the first period of time based on the first signal;
    determining an estimation model corresponding to the category of the first environmental sound based on the category of the first environmental sound; and
    the second signal is generated based on the estimation model, the first signal, and a time length of a second period.
  17. The active noise reduction audio device of claim 16, wherein the inverse estimation signal generation module is further to:
    determining a weighted model corresponding to the first ambient sound based on the category of the first ambient sound, the weighted model comprising: a first estimation model, a weight of the first estimation model, a second estimation model, and a weight of the second estimation model.
  18. The active noise reduction audio device of claim 16 or 17, wherein the inverse estimation signal generation module is further configured to adjust the estimation model based on the evaluation signal.
  19. An active noise reduction audio device comprising:
    one or more processors;
    a memory storing one or more programs configured to be executed by the one or more processors, the one or more programs comprising instructions for performing the method of any of claims 1-10.
  20. An active noise reduction audio device comprising:
    a first microphone configured to:
    collecting a first ambient sound during a first period of time and generating a first signal;
    one or more processors configured to:
    acquiring an evaluation signal corresponding to the first period;
    determining a time length of a second period based on the evaluation signal, the second period being subsequent to the first period; and
    generating, based on the first signal and the time length of the second period, a second signal representing an inverse estimate of the first ambient sound during the second period; and
    a first speaker configured to play the second signal during the second period.
  21. The active noise reduction audio device of claim 20, further comprising:
    a residual microphone configured to collect residual sound to generate a residual signal as the evaluation signal.
CN202180095625.9A 2021-03-25 2021-03-25 Active noise reduction audio device and method for active noise reduction Pending CN116982106A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/082870 WO2022198538A1 (en) 2021-03-25 2021-03-25 Active noise reduction audio device, and method for active noise reduction

Publications (1)

Publication Number Publication Date
CN116982106A true CN116982106A (en) 2023-10-31

Family

ID=83395091

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180095625.9A Pending CN116982106A (en) 2021-03-25 2021-03-25 Active noise reduction audio device and method for active noise reduction

Country Status (2)

Country Link
CN (1) CN116982106A (en)
WO (1) WO2022198538A1 (en)

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2445984B (en) * 2007-01-25 2011-12-07 Sonaptic Ltd Ambient noise reduction
JP4631939B2 (en) * 2008-06-27 2011-02-16 ソニー株式会社 Noise reducing voice reproducing apparatus and noise reducing voice reproducing method
US9402132B2 (en) * 2013-10-14 2016-07-26 Qualcomm Incorporated Limiting active noise cancellation output
EP3496089A1 (en) * 2015-05-08 2019-06-12 Huawei Technologies Co., Ltd. Active noise cancellation device
JP6928865B2 (en) * 2017-03-16 2021-09-01 パナソニックIpマネジメント株式会社 Active noise reduction device and active noise reduction method
CN107564538A (en) * 2017-09-18 2018-01-09 武汉大学 The definition enhancing method and system of a kind of real-time speech communicating
CN111050250B (en) * 2020-01-15 2021-11-02 北京声智科技有限公司 Noise reduction method, device, equipment and storage medium
CN112188340B (en) * 2020-09-22 2022-08-02 泰凌微电子(上海)股份有限公司 Active noise reduction method, active noise reduction device and earphone

Also Published As

Publication number Publication date
WO2022198538A1 (en) 2022-09-29

Similar Documents

Publication Publication Date Title
EP3682651B1 (en) Low latency audio enhancement
CN109040932B (en) Microphone system and hearing device comprising a microphone system
RU2595636C2 (en) System and method for audio signal generation
RU2605522C2 (en) Device containing plurality of audio sensors and operation method thereof
KR101444100B1 (en) Noise cancelling method and apparatus from the mixed sound
JP6150988B2 (en) Audio device including means for denoising audio signals by fractional delay filtering, especially for "hands free" telephone systems
CN109493877B (en) Voice enhancement method and device of hearing aid device
GB2581596A (en) Headset on ear state detection
KR20130055650A (en) Systems, methods, apparatus, and computer-readable media for multi-microphone location-selective processing
JP2011511571A (en) Improve sound quality by intelligently selecting between signals from multiple microphones
KR20170053623A (en) Method and apparatus for enhancing sound sources
WO2020097828A1 (en) Echo cancellation method, delay estimation method, echo cancellation apparatus, delay estimation apparatus, storage medium, and device
CN113949956B (en) Noise reduction processing method and device, electronic equipment, earphone and storage medium
CN113949955B (en) Noise reduction processing method and device, electronic equipment, earphone and storage medium
JP6265903B2 (en) Signal noise attenuation
US20230209283A1 (en) Method for audio signal processing on a hearing system, hearing system and neural network for audio signal processing
CN116982106A (en) Active noise reduction audio device and method for active noise reduction
CN114023352A (en) Voice enhancement method and device based on energy spectrum depth modulation
JP2013078117A (en) Noise reduction device, audio input device, radio communication device, and noise reduction method
CN118251717A (en) Active noise reduction system, method and equipment based on FxLMS structure
JP6221463B2 (en) Audio signal processing apparatus and program
CN114697785A (en) Audio signal processing method and system for suppressing echo

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination