CN116419111A

CN116419111A - Earphone control method, parameter generation method, device, storage medium and earphone

Info

Publication number: CN116419111A
Application number: CN202111670836.9A
Authority: CN
Inventors: 练添富; 周依鸽
Original assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Current assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date: 2021-12-31
Filing date: 2021-12-31
Publication date: 2023-07-11

Abstract

The embodiments of the application disclose an earphone control method, a parameter generation method, a device, a storage medium and an earphone. The method comprises the following steps: acquiring a residual audio signal; judging whether a target audio signal exists in the residual audio signal; and if the target audio signal exists in the residual audio signal, performing suppression processing on the target audio signal. With the scheme of the embodiments of the application, the target audio signal can be actively detected and suppressed without manual adjustment by the user, which improves control efficiency.

Description

Earphone control method, parameter generation method, device, storage medium and earphone

Technical Field

The application relates to the technical field of audio processing, in particular to a control method and a parameter generation method and device of an earphone, a storage medium and the earphone.

Background

If a user listens to music or watches video while wearing in-ear earphones, certain audio signals may interfere with the listening. For example, when the user eats while wearing in-ear earphones, the sound of chewing food may be amplified in the user's ear and interfere with listening; when the chewing sound is too loud, it may even mask the audio played by the earphones. For another example, when a user wears in-ear earphones to listen to music on a vehicle, noise signals generated by the vehicle may interfere with the listening. These interfering signals may differ from general ambient noise and may cause more significant interference than general ambient noise.

When this occurs, the user can only reduce the interference of these signals by manual adjustment, which is inefficient and disrupts the immersive listening and viewing experience.

Disclosure of Invention

The embodiment of the application provides a control method and a parameter generation method and device of an earphone, a storage medium and the earphone, and can realize active detection and suppression processing of specific interference signals.

In a first aspect, an embodiment of the present application provides a method for controlling an earphone, including:

acquiring a residual audio signal;

judging whether a target audio signal exists in the residual audio signal;

and if the target audio signal exists in the residual audio signal, performing suppression processing on the target audio signal.

In a second aspect, an embodiment of the present application further provides a method for generating parameters of an earphone, where an in-ear end of the earphone is provided with an audio acquisition component and an audio output component, and the method includes:

playing the test audio signal through the audio output assembly, and collecting the in-ear audio signal through the audio collecting assembly;

acquiring an initial time domain response parameter, and generating a second cancellation signal of the test audio signal according to the initial time domain response parameter and the test audio signal;

and adjusting the initial time domain response parameter according to the error of the second cancellation signal and the in-ear audio signal until a preset stop condition is met, so as to obtain the target time domain response parameter of the earphone.

In a third aspect, an embodiment of the present application further provides a control device for an earphone, including:

the signal acquisition module is used for acquiring a residual audio signal;

the target detection module is used for judging whether a target audio signal exists in the residual audio signal or not;

and the target suppression module is used for performing suppression processing on the target audio signal if the target audio signal exists in the residual audio signal.

In a fourth aspect, embodiments of the present application further provide a parameter generating device of an earphone, where an in-ear end of the earphone is provided with an audio acquisition component and an audio output component, where the device includes:

the second control module is used for playing the test audio signal through the audio output assembly and collecting the in-ear audio signal through the audio collecting assembly;

In a fifth aspect, embodiments of the present application further provide a computer-readable storage medium, on which a computer program is stored, which when run on a computer causes the computer to perform a method for controlling a headset as provided in any embodiment of the present application, or a method for generating parameters of a headset as provided in any embodiment of the present application.

In a sixth aspect, embodiments of the present application further provide an earphone, including an audio acquisition component, an audio output component, a processor, and a memory, where the memory has a computer program, and the processor is configured to execute a method for controlling the earphone provided in any embodiment of the present application or a method for generating parameters of the earphone provided in any embodiment of the present application by calling the computer program.

According to the technical scheme provided by the embodiments of the application, a residual audio signal is acquired, whether a target audio signal exists in the residual audio signal is detected, and if so, suppression processing is performed on the target audio signal. With this scheme, specific interference signals can be actively detected and suppressed without manual adjustment by the user, which improves control efficiency.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flow chart of a method for controlling an earphone according to an embodiment of the present application.

Fig. 2 is a schematic structural diagram of an ANC earphone according to an embodiment of the present application.

Fig. 3 is a schematic diagram of a calculation principle of a time domain response parameter in an embodiment of the present application.

Fig. 4 is a schematic diagram of acquiring a residual audio signal in an embodiment of the present application.

Fig. 5 is a schematic diagram showing the comparison of time domain signals of masticatory sounds produced by different users.

Fig. 6 is a graphical illustration of spectral comparisons of masticatory sounds produced by different users.

Fig. 7 is a schematic diagram illustrating adjustment of feedback noise reduction in the method for controlling an earphone according to the embodiment of the present application.

Fig. 8 is a schematic diagram of a flow of increasing a play volume of an audio output assembly according to an embodiment of the present application.

Fig. 9 is a schematic diagram of a scenario of masticatory sound suppression in an embodiment of the present application.

Fig. 10 is a flowchart of a method for generating parameters of an earphone according to an embodiment of the present application.

Fig. 11 is a schematic structural diagram of a control device of an earphone according to an embodiment of the present application.

Fig. 12 is a schematic structural diagram of an earphone according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by a person skilled in the art without any inventive effort, are intended to be within the scope of the present application based on the embodiments herein.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.

The embodiment of the application provides a method for controlling an earphone, and an execution main body of the method for controlling an earphone may be a control device of the earphone provided by the embodiment of the application, or an earphone integrated with the control device of the earphone, where the control device of the earphone may be implemented in a hardware or software manner.

Referring to fig. 1, fig. 1 is a flowchart illustrating a method for controlling an earphone according to an embodiment of the present disclosure. The specific flow of the earphone control method provided by the embodiment of the application may be as follows:

101. A residual audio signal is acquired.

The earphone in the embodiments of the application may be an earphone with a noise reduction function, for example an ANC (Active Noise Cancellation) earphone, or a common earphone without a noise reduction function. Fig. 2 is a schematic structural diagram of an earphone according to an embodiment of the present application. Fig. 2 shows an in-ear earphone; in other embodiments, the earphone may also be a headphone. The earphone is provided with an audio acquisition component and an audio output component, wherein the audio output component is used for playing a first audio signal, and the audio acquisition component is used for acquiring in-ear sound of a user. For example, the audio output component is a speaker built into the earphone, and the audio acquisition component is a microphone arranged on the side of the earphone close to the ear canal.

The residual audio signal is an audio signal that leaks into the ear from sound in the external environment, or an audio signal obtained by processing an audio signal collected by an in-ear microphone. For example, an in-ear audio signal is obtained, and an audio signal played by the earphone is eliminated from the in-ear audio signal, so as to obtain a corresponding residual audio signal.

102. Whether a target audio signal exists in the residual audio signal is judged.

After the residual audio signal is obtained, whether a target audio signal exists in the residual audio signal is detected. Wherein the target audio signal comprises at least one of a masticatory sound signal, a frictional sound signal, and a vehicle noise signal.

For example, in one embodiment, determining whether a target audio signal is present in the residual audio signal includes: extracting acoustic features of the residual audio signal; and classifying and detecting the acoustic characteristics according to the audio recognition model so as to detect whether a target audio signal exists in the residual audio signal.

Typically, different types of audio signals have different characteristics. In this embodiment, the target audio signal is identified based on the acoustic features of the residual audio signal: if the target audio signal is present in the residual audio signal, the acoustic features of the residual audio signal include features that match those of the target audio signal. Based on this principle, a pre-trained audio recognition model detects whether the target audio signal exists in the residual audio signal, and the audio recognition model can be trained on pre-recorded target audio signals. Taking a masticatory sound as the target audio signal, masticatory sound signals of a plurality of different people are collected in advance as positive samples, common noise signals without masticatory sound are collected as negative samples, corresponding labels are added to the positive and negative samples, a pre-constructed audio recognition model based on a convolutional neural network is trained using the positive and negative samples, and the model parameters are determined. The audio recognition model with the determined model parameters is used for on-line detection of masticatory sound signals.

103. If the target audio signal exists in the residual audio signal, suppression processing is performed on the target audio signal.

When the presence of the target audio signal in the residual audio signal is detected according to the above detection method, suppression processing is performed on the target audio signal. For example, the playing volume of the first audio signal currently being played is increased to mask the target audio signal, so that the target audio signal does not interfere with the user's listening.

In particular, the present application is not limited by the order of execution of the steps described, and certain steps may be performed in other orders or concurrently without conflict.

As can be seen from the above, the method for controlling the earphone according to the embodiments of the present application acquires a residual audio signal, detects whether a target audio signal exists in the residual audio signal, and if so, performs suppression processing on the target audio signal. With this scheme, specific interference signals can be actively detected and suppressed without manual adjustment by the user, which improves control efficiency.
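The three steps above can be sketched as a single control pass. This is a minimal illustration, not the method as implemented in the earphone: `control_step`, `peak_detector` and the 0.5 threshold are hypothetical names and values introduced here, and the detector is only a stand-in for whichever detection embodiment is used.

```python
from typing import Callable, Sequence


def control_step(
    residual: Sequence[float],
    detect: Callable[[Sequence[float]], bool],
    suppress: Callable[[], None],
) -> bool:
    """Apply steps 102 and 103 to a residual signal obtained in step 101."""
    found = detect(residual)   # step 102: is a target audio signal present?
    if found:
        suppress()             # step 103: suppression processing
    return found


def peak_detector(residual: Sequence[float], threshold: float = 0.5) -> bool:
    """Hypothetical stand-in detector: peak amplitude above a threshold."""
    return max((abs(s) for s in residual), default=0.0) > threshold


actions = []  # records each time suppression is triggered
quiet = control_step([0.01, -0.02, 0.015], peak_detector,
                     lambda: actions.append("suppress"))
loud = control_step([0.1, 0.9, -0.7], peak_detector,
                    lambda: actions.append("suppress"))
```

Passing the detector and the suppression action as callables keeps the pass independent of the particular detection and suppression embodiments described later.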

In one embodiment, the earphone is an active noise reduction earphone; performing suppression processing on a target audio signal, including: according to the target amplitude or the first preset proportion, reducing the noise reduction of the earphone; or turn off the noise reduction mode of the headset.

In this embodiment, the earphone is an active noise reduction earphone. A noise reduction earphone provided with an audio acquisition component can perform feedback noise reduction on the signal acquired by that component, and the masticatory sound signal may be amplified during feedback noise reduction, as described in more detail below. In this case, when the presence of the masticatory sound signal in the in-ear audio signal is detected, the target amplitude or the first preset ratio is determined, and the noise reduction amount of the earphone is reduced accordingly. For example, the first preset ratio may be 50%, in which case the noise reduction amount of the earphone is reduced to 50% of the current noise reduction amount.

Alternatively, the feedback noise reduction amount of the earphone can be reduced by adjusting the parameters of the feedback noise reduction filter. This prevents the chewing sound picked up by the audio acquisition component from being played back through the loudspeaker, which would further amplify the chewing sound.
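The arithmetic of reducing the noise reduction amount can be illustrated with a short sketch. The function name and parameters are hypothetical, the noise reduction amount is treated as a unitless number, and the target amplitude is read here as an amount to subtract from the current value; that reading is an assumption, since the description does not define it further.

```python
from typing import Optional


def reduced_noise_reduction(
    current: float,
    target_amplitude: Optional[float] = None,
    preset_ratio: Optional[float] = None,
) -> float:
    """Return the new noise reduction amount after one reduction."""
    if (target_amplitude is None) == (preset_ratio is None):
        raise ValueError("give exactly one of target_amplitude or preset_ratio")
    if preset_ratio is not None:
        if not 0.0 <= preset_ratio <= 1.0:
            raise ValueError("preset_ratio must lie in [0, 1]")
        # Reduce to a fraction of the current amount (50% in the example above).
        return current * preset_ratio
    if target_amplitude < 0.0:
        raise ValueError("target_amplitude must not be negative")
    # Assumed reading: subtract the target amplitude, never going below zero.
    return max(0.0, current - target_amplitude)
```

With a current amount of 30 and a first preset ratio of 50%, the sketch returns 15, matching the example in the description.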

In the following examples, for ease of understanding, the scheme is described in detail with the masticatory sound signal as the target audio signal.

In an embodiment, an in-ear end of the earphone is provided with an audio acquisition component and an audio output component, and the method includes: playing a first audio signal through the audio output component, and collecting an in-ear audio signal through the audio acquisition component; and eliminating a second audio signal from the in-ear audio signal to obtain a residual audio signal, wherein the second audio signal is the audio signal obtained after the first audio signal is transmitted from the audio output component to the audio acquisition component.

For the noise reduction earphone, a feedforward microphone and/or a feedback microphone are/is generally arranged, the feedforward microphone is arranged on the outer side of the earphone and used for detecting external environmental noise, feedforward noise reduction is realized through a feedforward filter, the feedback microphone is used for detecting noise residues in ears of a user wearing the earphone and carrying out feedback noise reduction through the feedback filter, and a built-in loudspeaker of the earphone realizes the noise reduction purpose by playing an anti-phase noise signal. The specific noise reduction principle is not described in detail herein. The embodiment of the application can directly multiplex the in-ear audio signals collected by the feedback microphone without adding new hardware.

For a common earphone without a noise reduction function, an audio acquisition component can be added at a position close to the ear canal, and the in-ear audio signal is acquired through that component. Compared with an earphone without a noise reduction function, the chewing sound may have a greater influence when a user chews food while wearing a noise reduction earphone to watch video or listen to music. The reason is that the audio acquisition component of the noise reduction earphone acquires the in-ear audio signal as input data of the feedback noise reduction algorithm. If a chewing sound exists, it is picked up by the audio acquisition component and becomes part of the in-ear audio signal. When the gain of the feedback noise reduction filter is large, this part of the chewing sound is amplified; the mid and high frequency components of the amplified chewing sound signal are difficult for the noise reduction algorithm to cancel and are instead played through the audio output component, so that the chewing sound is amplified.

The scheme of the embodiment of the application can be used for any scene using the earphone, such as a scene using the earphone to carry out voice communication, a scene wearing the earphone to play music, or a scene wearing the earphone to watch video.

Next, in order to facilitate the reader to understand the present solution, the present solution is described in detail by taking the earphone as a noise reduction earphone, and taking a scene that the user wears the noise reduction earphone to play music as an example. For example, the user plays the first audio signal through headphones, the first audio signal may be music or the like. The earphone collects in-ear sound signals through the audio collection assembly while playing music through the audio output assembly.

Since the audio output component is outputting the first audio signal, such as a music signal, the sound signal in the ear canal collected by the audio collection component contains the music signal (the collected signal after playing). In order to accurately determine whether there is a masticatory sound signal in the in-ear audio signal, it is necessary to cancel the music signal in the in-ear audio signal.

It should be noted that, after the first audio signal passes through the transmission path between the audio output component and the audio acquisition component, a second audio signal is obtained, where the second audio signal is a music signal actually detected by the audio acquisition component. The second audio signal may be calculated based on the first audio signal and a time domain response parameter of a transfer path between the audio output component and the audio acquisition component. The transfer function between the audio output component and the audio acquisition component can be obtained through testing and is arranged in a memory of the earphone before the earphone leaves the factory.

After the first audio signal is acquired, a second audio signal is calculated based on the transfer function and the first audio signal, the second audio signal is eliminated from the in-ear audio signal, and the audio signal remaining after elimination is determined as the in-ear residual audio signal.

After the music signal in the in-ear audio signal is eliminated, the residual audio signal is examined to judge whether a chewing sound signal exists in it. For example, the amplitude of the residual audio signal is detected, and when the amplitude is greater than a predetermined threshold, it is determined that a masticatory sound signal is present in the residual audio signal. For another example, the presence of the masticatory sound signal in the residual audio signal is detected by a pre-trained binary classification model, which may be trained using pre-recorded masticatory sound signals. For another example, the residual audio signal may be pre-emphasized, framed, windowed and FFT (Fast Fourier Transform) processed, its spectral energy is then calculated, and whether the spectral energy is greater than a predetermined energy threshold is determined. For another example, a feedforward microphone may further be disposed on the earphone to collect external environmental noise; the residual audio signal is denoised based on the environmental noise, the intensity, spectral energy or time domain energy of the denoised residual audio signal is detected, and if it is greater than a preset threshold, it is determined that a masticatory sound signal exists in the residual audio signal.

It will be appreciated that, in addition to the detection methods listed above, other detection methods based on the residual audio signal may be used; they are also included in the schemes of the embodiments of the present application and are not described in detail herein.

When the presence of the masticatory sound signal in the residual audio signal is detected according to the above detection method, suppression processing is performed on the masticatory sound. The suppression can be implemented in various ways, for example by increasing the play volume of the audio output component to mask the chewing sound. Alternatively, when the earphone is a noise reduction earphone, the gain of the feedback noise reduction filter of the earphone is reduced so as to reduce the magnitude of the residual audio signal fed through the filter, which in turn prevents the chewing sound from being amplified by the audio output component.

It can be understood that the first audio signal played by the user changes in real time and the masticatory sound signal also changes dynamically. The scheme of the application therefore continuously collects the in-ear audio signal per unit time and judges whether a masticatory sound is present throughout the playing of the first audio signal; the process is dynamic and continues until the user terminates the audio playback. The specific length of the unit time can be preset as required.

In addition, if no masticatory sound signal exists in the residual audio signal, no other processing is needed: playback of the first audio signal continues, and the method returns to the step of playing the first audio signal through the audio output component and collecting the in-ear audio signal through the audio acquisition component.

In some embodiments, eliminating the second audio signal from the in-ear audio signal to obtain the residual audio signal comprises: acquiring a current time domain response parameter; generating a first cancellation signal corresponding to the first audio signal according to the time domain response parameter and the first audio signal; and eliminating the second audio signal from the in-ear audio signal based on the first cancellation signal to obtain the residual audio signal.

In this embodiment, after the first audio signal and the in-ear audio signal are acquired, the time domain response parameter $w^T(n)$ of the transfer path between the audio output component and the audio acquisition component is acquired. The first cancellation signal $y(n)$ is then calculated from the time domain response parameter and the first audio signal $x(n)$ according to the following formula.

$y(n) = w^T(n)\,x(n)$    formula (1)

The time domain response parameter may be obtained through testing before the earphone leaves the factory and stored in a memory of the earphone. Alternatively, the calculation of the parameter may be actively triggered by the user in a quiet scene, so that the calculated time domain response parameter is personalized to the user's ear canal. The parameter is calculated as follows. In a quiet environment, an input test audio signal $x_1(n)$ is played through the audio output component, and the signal collected by the audio acquisition component is $d(n)$. The test audio signal $x_1(n)$ is used as the input of an adaptive filter, and the output signal of the adaptive filter is $y_1(n)$.

The filtering is expressed as $y_1(n) = w^T(n)\,x_1(n)$, where $w^T(n)$ is the filter coefficient vector characterizing the time domain response of the transfer path between the audio output component and the audio acquisition component. Because there is no external noise in a quiet environment, the signal $d(n)$ acquired by the audio acquisition component is $x_1(n)$ after it has passed through the transfer path. The error between $d(n)$ and $y_1(n)$ is:

$e(n) = d(n) - y_1(n) = d(n) - w^T(n)\,x_1(n)$    formula (2)

The adaptive filter comprises N filter coefficients with an initial value of 0. The filter coefficients $w^T(n)$ are updated through iterative calculation so that, after the signal $x_1(n)$ is input to the filter, the error signal $e(n)$ between the output signal $y_1(n)$ and the signal $d(n)$ acquired by the audio acquisition component is minimized. Referring to fig. 3, fig. 3 is a schematic diagram illustrating a calculation principle of a time domain response parameter in an embodiment of the present application. Specifically, iterative calculation based on the minimum mean square error can be used to obtain the $w^T(n)$ with the minimum error, and this $w^T(n)$ is stored as the time domain response parameter of the transfer path between the audio output component and the audio acquisition component.

The above process can be performed before the earphone leaves the factory, and the calculated time domain response parameter is stored in a memory. Alternatively, in the use stage, the user sets the time domain response parameter in a quiet environment. For example, the user triggers an update instruction for the time domain response parameter through a preset operation and plays any audio signal as the test audio signal; the earphone collects the signal $d(n)$ through the audio acquisition component and performs iterative calculation according to the minimum mean square error to obtain the $w^T(n)$ with the minimum error, which is stored as the time domain response parameter corresponding to the current user.
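The iterative calculation can be sketched with a least-mean-squares (LMS) update, $w \leftarrow w + \mu\,e(n)\,x_1(n)$. The description only calls for iteration toward the minimum mean square error, so the choice of plain LMS, the step size `mu`, the number of passes and the three-tap demo path are all assumptions made for illustration.

```python
import random
from typing import List, Sequence


def lms_identify(x: Sequence[float], d: Sequence[float], num_taps: int,
                 mu: float = 0.05, passes: int = 20) -> List[float]:
    """Estimate coefficients w so that w^T x1(n) tracks the collected d(n)."""
    w = [0.0] * num_taps  # initial value 0, as in the description
    for _ in range(passes):
        for n in range(len(x)):
            # x1(n) = [x[n], x[n-1], ...], zero-padded before the first sample
            frame = [x[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
            y = sum(wk * xk for wk, xk in zip(w, frame))  # y1(n) = w^T x1(n)
            e = d[n] - y                                  # formula (2)
            w = [wk + mu * e * xk for wk, xk in zip(w, frame)]
    return w


rng = random.Random(0)
true_path = [0.6, -0.3, 0.1]  # hypothetical transfer path, for the demo only
test_signal = [rng.uniform(-1.0, 1.0) for _ in range(500)]
# Quiet environment: the collected signal is the test signal through the path.
collected = [
    sum(true_path[k] * test_signal[n - k]
        for k in range(len(true_path)) if n - k >= 0)
    for n in range(len(test_signal))
]
estimated = lms_identify(test_signal, collected, num_taps=3)
```

In this noise-free demo the estimate converges to the demo path; on real hardware the step size and filter length would need tuning to the actual transfer path.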

After the time domain response parameter $w^T(n)$ of the transfer path is acquired, the first cancellation signal is calculated. For example, in an embodiment, a preset filter is provided and the time domain response parameter is used as the filter coefficient of the preset filter. The first audio signal is then used as input data of the preset filter, and the first cancellation signal corresponding to the first audio signal is calculated from the input data and the filter coefficient in the same way as formula (1), with the first audio signal taken as $x(n)$.

In this embodiment, adaptive filtering is adopted: a first cancellation signal capable of cancelling the first audio signal is calculated according to the time domain response parameter of the transfer path between the audio output component and the audio acquisition component. The first cancellation signal has the same amplitude as, and opposite phase to, the first audio signal, so it can eliminate the music signal in the in-ear audio signal acquired by the audio acquisition component. The remaining part is the residual audio signal, which can be used for the subsequent detection of the chewing sound signal. Referring to fig. 4, fig. 4 is a schematic diagram of acquiring a residual audio signal according to an embodiment of the present application.
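Formula (1) and the elimination step can be sketched as an FIR filter followed by a subtraction. Subtracting $y(n)$ from the in-ear signal is equivalent to adding a signal of the same amplitude and opposite phase. The function names, the two-tap path and the sample values are hypothetical and chosen only to make the example easy to follow.

```python
from typing import List, Sequence


def fir_output(w: Sequence[float], x: Sequence[float]) -> List[float]:
    """Formula (1): y(n) = w^T x(n), with zeros assumed before the first sample."""
    return [
        sum(w[k] * x[n - k] for k in range(len(w)) if n - k >= 0)
        for n in range(len(x))
    ]


def residual_signal(in_ear: Sequence[float], first_audio: Sequence[float],
                    w: Sequence[float]) -> List[float]:
    """Remove the played signal, as seen at the microphone, from the in-ear signal."""
    y = fir_output(w, first_audio)
    return [d - yn for d, yn in zip(in_ear, y)]


path = [0.5, 0.25]             # hypothetical time domain response parameter
music = [1.0, 0.0, -1.0, 0.5]  # first audio signal x(n)
chew = [0.0, 0.2, 0.0, -0.1]   # interference present in the ear canal
in_ear = [m + c for m, c in zip(fir_output(path, music), chew)]
residual = residual_signal(in_ear, music, path)
```

When the stored parameter matches the real transfer path, as it does by construction here, the residual equals the interference alone.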

In some embodiments, determining whether the target audio signal is present in the residual audio signal comprises: calculating an energy value of the residual audio signal over all frequency bands or part of the frequency bands; and if the energy value is greater than a first preset threshold, determining that the target audio signal exists in the residual audio signal.

In this embodiment, the target audio signal is still taken as a masticatory sound signal as an example, and the spectral energy value of the residual audio signal is calculated to determine whether the masticatory sound signal exists in the residual audio signal. The first preset threshold may be preset, for example, a residual audio signal in an environment where no masticatory sound signal is present is acquired, and a spectral energy value of the residual audio signal in the environment is calculated as the first preset threshold. It will be appreciated that, in different environments, the noise may be different, and the spectral energy values of a plurality of residual audio signals in a plurality of different environments may be obtained, and an average value of the spectral energy values of the plurality of residual audio signals may be calculated as the first preset threshold.

Or, setting different first preset thresholds for different degrees of environmental noise. For example, when the spectral energy value is greater than a first preset threshold, the step of determining that a masticatory sound signal is present in the residual audio signal may include: collecting environmental noise through a feedforward microphone, and calculating the amplitude of the environmental noise; acquiring a preset threshold corresponding to the amplitude as a first preset threshold; when the spectral energy value is greater than the first preset threshold, determining that a masticatory sound signal is present in the residual audio signal.
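The two ways of choosing the first preset threshold described above can be sketched as follows. The function names and every number in `NOISE_TABLE` are hypothetical; the description does not give concrete amplitudes or thresholds.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def baseline_threshold(calibration_energies: Sequence[float]) -> float:
    """Average the spectral energies measured in several chew-free environments."""
    return mean(calibration_energies)


def threshold_for_noise(noise_amplitude: float,
                        table: List[Tuple[float, float]]) -> float:
    """Pick the preset threshold whose amplitude range contains the noise level.

    `table` holds (upper amplitude bound, threshold) pairs sorted by bound.
    """
    for upper_bound, threshold in table:
        if noise_amplitude <= upper_bound:
            return threshold
    return table[-1][1]  # louder than every bound: use the last threshold


# Hypothetical mapping from ambient noise amplitude to first preset threshold.
NOISE_TABLE = [(0.05, 2.0), (0.2, 5.0), (1.0, 12.0)]
```

For instance, `threshold_for_noise(0.1, NOISE_TABLE)` selects the second entry, so a noisier environment leads to a higher detection threshold.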

The manner in which the first preset threshold is obtained is described above. After the residual audio signal is acquired, a spectral energy value of the residual audio signal is calculated. In some embodiments, the residual audio signal may be pre-emphasized, framed, windowed and FFT processed, and the spectral energy is then calculated. The spectral energy is the squared modulus of the signal at each frequency point in the frequency domain. The result of the FFT calculation on the residual audio signal Se is Sef, and the spectral energy $E_{sef}$ is calculated from the frequency point signals of Sef, where j is the total number of frequency point signals and $k \in [1, j]$ indexes them.

After the spectrum energy is obtained through calculation, whether the spectrum energy is larger than a first preset threshold Sthd or not is judged. If so, it may be determined that a masticatory sound signal is present in the residual audio signal.
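The energy-threshold check described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: it assumes the spectral energy is the sum of |Sef(k)|^2 over all j frequency bins, and the pre-emphasis coefficient 0.97, the Hamming window, the naive DFT, and the names `spectral_energy` and `has_target_signal` are illustrative choices that do not appear in the text.

```python
import cmath
import math


def spectral_energy(frame, pre_emphasis=0.97):
    """Spectral energy of one frame of the residual audio signal Se.

    Assumption: the energy is the sum of |Sef(k)|^2 over all bins.
    """
    # Pre-emphasis: boost high frequencies (0.97 is an assumed coefficient).
    emphasized = [frame[0]] + [
        frame[i] - pre_emphasis * frame[i - 1] for i in range(1, len(frame))
    ]
    n = len(emphasized)
    # Hamming window (an assumed window type).
    windowed = [
        s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
        for i, s in enumerate(emphasized)
    ]
    energy = 0.0
    for k in range(n):
        # Naive DFT bin Sef(k); a real device would use an FFT.
        sef_k = sum(
            windowed[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n)
        )
        energy += abs(sef_k) ** 2
    return energy


def has_target_signal(frame, sthd):
    """True when the spectral energy exceeds the first preset threshold Sthd."""
    return spectral_energy(frame) > sthd


# Threshold taken from a frame recorded without the target signal.
quiet = [0.001 * math.sin(2 * math.pi * 5 * i / 64) for i in range(64)]
loud = [0.5 * math.sin(2 * math.pi * 5 * i / 64) for i in range(64)]
sthd = spectral_energy(quiet)
print(has_target_signal(loud, sthd), has_target_signal(quiet, sthd))
```

In this sketch the threshold is derived from a single quiet frame; averaging over several environments, as described earlier, would only change how `sthd` is computed.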

Alternatively, in another embodiment, determining whether the target audio signal is present in the residual audio signal includes: extracting acoustic features of the residual audio signal; and classifying and detecting the acoustic characteristics according to the audio recognition model so as to detect whether a target audio signal exists in the residual audio signal.

In general, different people chew different foods with different chewing sounds and different chewing frequencies. As shown in fig. 5 and 6, fig. 5 is a schematic diagram of comparing time domain signals of masticatory sounds emitted by different users, and fig. 6 is a schematic diagram of comparing frequency spectrums of masticatory sounds emitted by different users. It can be seen that the time domain signal and the frequency spectrum of the masticatory acoustic signal of different persons are different. Based on this, in this embodiment, a deep learning artificial neural network is used to accurately identify masticatory acoustic signals.

For example, an audio recognition model based on a convolutional neural network is trained in advance. The model may be a binary classification model that judges whether a masticatory sound signal exists in the residual audio signal. For example, masticatory sound signals of a plurality of different people are collected in advance as positive samples, and common noise signals without masticatory sound are collected as negative samples. Corresponding labels are added to the positive samples and the negative samples, the pre-built convolutional neural network-based audio recognition model is trained with these samples, and the model parameters are determined. The audio recognition model with the determined model parameters is then used for online detection of the masticatory sound signal.

In one embodiment, to improve the detection accuracy, the step of extracting the acoustic features of the residual audio signal may include: performing endpoint detection on the residual audio signal to remove the silent parts of the signal, performing noise reduction on the residual audio signal after the silent parts are removed, and performing feature extraction on the noise-reduced residual audio signal to obtain the acoustic features. The acoustic features may be, for example, MFCC (Mel-Frequency Cepstral Coefficients).

After the acoustic features are acquired, they are input into the audio recognition model for detection. Because the model is a binary classification model, whether a masticatory sound signal exists in the residual audio signal can be determined directly from the output result. For example, if the output result is "1", it is determined that a masticatory sound signal is present in the residual audio signal; if the output result is "0", it is determined that no masticatory sound signal is present in the residual audio signal.

For another example, in another embodiment, the masticatory acoustic signal may be doubly detected in both ways to improve the accuracy of the detection. Before extracting the acoustic features of the residual audio signal, the method further comprises: calculating the energy value of the residual audio signal in all frequency bands or partial frequency bands; in case the energy value is larger than a first preset threshold, extracting acoustic features of the residual audio signal is performed. The specific implementation manner of each step is referred to above, and will not be described herein.

In the embodiment, the spectrum energy value of the residual audio signal is firstly detected preliminarily, and when larger residual noise exists, whether masticatory sound exists or not is judged according to the trained audio recognition model based on the convolutional neural network so as to save the operation amount and further reduce the power consumption of the related chip.
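The two-stage detection above can be sketched as a simple gate in front of the classifier. This is only an illustration: `stub_classifier` is a hypothetical stand-in for the trained convolutional model, the feature extraction step is omitted, and the time domain sum of squares is used as a stand-in for the full-band energy value.

```python
def detect_target(frame, first_threshold, classify):
    """Run the costly classifier only when the energy gate is exceeded."""
    energy = sum(s * s for s in frame)  # stand-in for the full-band energy value
    if energy <= first_threshold:
        return False  # classifier skipped, saving computation
    return classify(frame) == 1


calls = []  # records every classifier invocation


def stub_classifier(frame):
    """Hypothetical stand-in for the trained audio recognition model."""
    calls.append(len(frame))
    return 1


quiet_result = detect_target([0.01] * 32, first_threshold=1.0, classify=stub_classifier)
loud_result = detect_target([0.5] * 32, first_threshold=1.0, classify=stub_classifier)
print(quiet_result, loud_result, len(calls))
```

The classifier is invoked only for the frame that passes the energy gate, which is the source of the computation and power savings mentioned above.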

Wherein in some embodiments, if a masticatory sound signal is present in the residual audio signal, then masticatory sound suppression processing is performed, including: if the chewing sound signal exists in the residual audio signal, the feedback noise reduction amount of the earphone is reduced by adjusting the parameters of the feedback noise reduction filter, wherein the feedback noise reduction filter is used for carrying out feedback noise reduction processing on the earphone through the in-ear audio signal.

For the noise reduction earphone provided with the audio acquisition component, feedback noise reduction can be performed on the signal acquired by the audio acquisition component, and the masticatory sound signal can be amplified in the feedback noise reduction process, which is specifically described above. In this case, when the presence of the masticatory sound signal in the in-ear audio signal is detected, the feedback noise reduction amount of the earphone can be reduced by adjusting the parameters of the feedback noise reduction filter, so that the masticatory sound is prevented from being picked up by the audio acquisition component and played out of the loudspeaker to further amplify the masticatory sound volume. As shown in fig. 7, fig. 7 is a schematic diagram illustrating adjustment of feedback noise reduction in the method for controlling an earphone according to the embodiment of the present application.

The feedback noise reduction amount of the feedback noise reduction filter is mainly influenced by the gain value and the filter coefficient: the larger the gain value, the larger the feedback noise reduction amount, and the larger the filter coefficient, the larger the feedback noise reduction amount. Based on this, the gain value and/or the filter coefficient of the feedback noise reduction filter may be reduced when the presence of a masticatory sound signal in the in-ear audio signal is detected. In an embodiment, when the masticatory sound signal is detected in the in-ear audio signal, the feedback noise reduction function of the earphone can even be turned off directly, which prevents the masticatory sound from being picked up by the audio acquisition component and played back through the loudspeaker, further amplifying it.

For another example, in another embodiment, adjusting the amount of noise reduction of the headset includes: determining an amplitude of the residual audio signal; determining a target adjustment amount corresponding to the amplitude, wherein the amplitude is positively correlated with the target adjustment amount; and adjusting the noise reduction amount of the earphone according to the target adjustment amount.

In this embodiment, in order to adjust the feedback noise reduction amount more accurately and to balance the chewing sound suppression and feedback noise reduction functions, a mapping relation between different residual audio signal amplitudes and different preset adjustment amounts is set in advance, in which the amplitude is positively correlated with the preset adjustment amount. When a chewing sound signal exists in the residual audio signal, the amplitude of the residual audio signal is determined, the preset adjustment amount corresponding to the current amplitude is determined according to the mapping relation and taken as the target adjustment amount, and the noise reduction amount of the earphone is adjusted according to the target adjustment amount.

For example, in one embodiment, adjusting the amount of noise reduction of the headset according to the target adjustment amount includes: and reducing parameters of the feedback noise reduction filter according to the target adjustment amount to reduce the noise reduction amount of the earphone, wherein the parameters of the feedback noise reduction filter comprise at least one of a filter gain value and a filter coefficient.
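A minimal sketch of such an amplitude-to-adjustment mapping is shown below. The breakpoints, the decibel values, and the choice to apply the adjustment to the filter gain are all hypothetical; the text only requires that the adjustment amount grow with the residual amplitude.

```python
import bisect

# Hypothetical mapping: residual amplitude ranges -> gain reduction in dB.
AMPLITUDE_BREAKPOINTS = [0.1, 0.3, 0.6]
GAIN_REDUCTION_DB = [0.0, 3.0, 6.0, 12.0]


def target_adjustment_db(residual_amplitude):
    """Look up the preset adjustment amount for the current amplitude."""
    index = bisect.bisect_right(AMPLITUDE_BREAKPOINTS, residual_amplitude)
    return GAIN_REDUCTION_DB[index]


def reduce_feedback_gain(current_gain, residual_amplitude):
    """Lower the feedback filter gain by the target adjustment amount."""
    reduction_db = target_adjustment_db(residual_amplitude)
    return current_gain * 10 ** (-reduction_db / 20)


print(target_adjustment_db(0.2), reduce_feedback_gain(1.0, 0.7))
```

A lookup table keeps the run-time cost low on an earphone chip; a continuous function of the amplitude would satisfy the same positive correlation.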

Alternatively, in other embodiments, the suppressing the target audio signal includes: and increasing the volume of the first audio signal played by the earphone according to the preset gain value or the second preset proportion.

In addition to the above-described manner of reducing the feedback noise reduction amount, this embodiment also provides a way to mask the chewing sound by increasing the play volume of the first audio signal.

In some embodiments, the value of the second preset proportion may be set according to actual needs, for example, may be set to 30%.

Wherein, in some embodiments, if the masticatory sound signal exists in the residual audio signal, after calculating the time domain energy value of the first audio signal, the method further comprises: when the time domain energy value is greater than or equal to a second preset threshold, acquiring a preset volume reduction factor, and reducing the playing volume of the first audio signal according to the preset volume reduction factor.

In this embodiment, before the playing volume of the audio output assembly is adjusted, the time domain energy value of the first audio signal is calculated. When the time domain energy value is smaller than the second preset threshold, the playing volume is increased. Otherwise, when the time domain energy value is already greater than or equal to the second preset threshold, the chewing sound can be masked by the played audio even if the volume is not increased, so the volume is not increased further, which avoids damage to the user's hearing caused by excessive volume.

For example, the long-term time domain energy t(n) of the first audio signal is calculated, and when t(n) < Th0, the volume is increased to raise the signal-to-noise ratio. See formula (3) below:

t(n) = (1-a)*x(n-1) + a*x(n)

T(n) = 20*log10(t(n))

T1(n) = G*T(n)

z(n) = 10^(T1(n)/20)

where G is a gain value, which is a preset value with G > 1; x(n) is the first audio signal of the nth time unit, and x(n-1) is the first audio signal of the (n-1)th time unit; a is a smoothing factor with a value range of (0, 1). Because the audio playing circuit adjusts the volume on a linear scale, whereas the gain G is applied to the logarithmic value when amplifying the signal, the linear value t(n) is converted to the logarithmic value T(n) before T1(n) is calculated, and after T1(n) is calculated it is converted from a logarithmic value back to the linear value z(n), as shown in the above formula. z(n) is the final target volume. Referring to fig. 8, fig. 8 is a flow chart of increasing the playing volume of the audio output assembly according to the embodiment of the present application.
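Formula (3) can be traced step by step with a short Python sketch. The function name `target_volume` and the sample values are illustrative, and the sketch assumes x(n) is a positive magnitude greater than 1 so that the logarithmic value T(n) is positive; under that assumption the chain is algebraically equivalent to z(n) = t(n)^G.

```python
import math


def target_volume(x_prev, x_curr, a, gain):
    """Apply formula (3): smooth, convert to log scale, apply G, convert back."""
    t = (1 - a) * x_prev + a * x_curr  # smoothed time domain energy t(n)
    log_value = 20 * math.log10(t)     # T(n): linear value -> logarithmic value
    boosted = gain * log_value         # T1(n) = G * T(n)
    return 10 ** (boosted / 20)        # z(n): logarithmic value -> linear value


# Illustrative values: t(n) = 100, T(n) = 40, T1(n) = 60, z(n) = 1000.
z = target_volume(100.0, 100.0, a=0.5, gain=1.5)
print(z)
```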

Wherein in some embodiments, obtaining the preset gain value comprises: determining an amplitude of the residual audio signal; and determining the candidate gain value corresponding to the amplitude as a preset gain value, wherein the preset gain value is positively correlated with the amplitude.

In this embodiment, the gain value G may be a value dynamically adjusted according to the amplitude of the current residual audio signal. The mapping relation between different residual audio signal amplitudes and different candidate gain values is preset, and the amplitudes are positively correlated with the candidate gain values. When the presence of the masticatory sound signal in the residual audio signal is detected, determining the amplitude of the residual audio signal, determining a candidate gain value corresponding to the current amplitude according to the mapping relation as a preset gain value, and increasing the playing volume of the audio output assembly according to the formula (3) based on the preset gain value. By the scheme of the embodiment, the playing volume can be reasonably increased according to the amplitude of the residual audio signal, so that the hearing of a user is prevented from being damaged due to overlarge volume.

Wherein, in some embodiments, when the presence of the masticatory sound signal in the residual audio signal is detected, after calculating the time domain energy value of the first audio signal, the method further comprises: when the time domain energy value is greater than or equal to the second preset threshold, acquiring a preset volume reduction factor, and reducing the playing volume of the first audio signal according to the preset volume reduction factor.

In this embodiment, if a masticatory sound signal is present in the residual audio signal and the calculated time domain energy value of the first audio signal is already large, the volume may instead be reduced to protect the hearing of the user. For example, the portion exceeding the second preset threshold may be scaled down based on the preset volume reduction factor R as follows.

It will be appreciated that the first audio signal changes dynamically: even if the volume setting of the device is unchanged, the sound level rises and falls during the playing of a song. With the scheme of this embodiment, when chewing sound is detected and the time domain energy value of the music in the current time unit is small, the music heard by the user at that moment is quiet, and the volume can be amplified according to the gain value G to mask the chewing sound. Conversely, if the time domain energy value of the music in the current time unit is large, the music heard by the user at that moment is loud, and the volume can be reduced appropriately to protect the user's hearing. Over the whole duration of music playback, chewing sound suppression is balanced against hearing protection. The user does not need to manually adjust the earphone volume while listening to music and eating, which reduces user operations and interruptions. Meanwhile, the volume is adjusted intelligently according to the current volume, which reduces the interference of chewing sound with the user's listening, prevents hearing damage caused by excessive volume, and improves the user experience.
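The per-time-unit decision can be sketched as below. The amplification branch follows formula (3). The reduction branch is purely an assumption made for illustration, since the reduction formula is not reproduced in this text: it scales only the part above Th0 by a hypothetical factor R in (0, 1). The function name and sample values are likewise illustrative.

```python
import math


def adjust_volume(t_n, th0, gain, reduction):
    """Return the adjusted level for the current time unit."""
    if t_n < th0:
        # Quiet passage: amplify on the logarithmic scale, as in formula (3).
        return 10 ** (gain * 20 * math.log10(t_n) / 20)
    # Loud passage: shrink only the part above Th0 (assumed reduction rule).
    return th0 + reduction * (t_n - th0)


quiet_out = adjust_volume(100.0, th0=1000.0, gain=1.2, reduction=0.5)
loud_out = adjust_volume(2000.0, th0=1000.0, gain=1.2, reduction=0.5)
print(quiet_out, loud_out)
```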

In an embodiment, when the presence of the masticatory sound signal in the residual audio signal is detected, the masticatory sound can be suppressed by using both of the above modes at the same time: the play volume of the first audio signal is increased while the feedback noise reduction amount is reduced, which reduces the interference of the masticatory sound with the user's listening and improves the user experience.

Next, the present embodiment is described in a specific application scenario. Referring to fig. 9, fig. 9 is a schematic diagram of a masticatory sound suppression scenario in an embodiment of the present application. The earphone in this scenario is a noise reduction earphone provided with a feedforward microphone and an audio acquisition component. The feedforward microphone collects ambient sound as the input of a feedforward filter for feedforward noise reduction, and the audio acquisition component collects in-ear sound as the input of a feedback filter for feedback noise reduction. The earphone receives a first audio signal sent by the electronic device through a communication module such as Bluetooth. On this basis, the scheme adds a preset filter and divides the in-ear audio signal collected by the audio acquisition component into two paths: one path is used for feedback noise reduction, and the other is used for detecting chewing sound. The first audio signal is used as the input data of the preset filter, and the preset filter calculates, from its filter coefficients, a first cancellation signal corresponding to the first audio signal. The first cancellation signal is used to cancel the second audio signal in the in-ear audio signal to obtain a residual audio signal, and chewing sound is then detected in the residual audio signal. When chewing sound is present in the residual audio signal, the feedback noise reduction amount of the feedback filter is reduced according to the detection result, which prevents the chewing sound from being picked up by the audio acquisition component and played back through the loudspeaker, further amplifying it. In addition, the playing volume of the audio output component is dynamically adjusted according to the detection result to mask the chewing sound in the ear, so that the user can hear the music clearly. For the specific implementation of the volume adjustment, refer to the above embodiments; details are not repeated here.

In addition, the embodiment of the application further provides a method for generating parameters of the earphone, refer to fig. 10, and fig. 10 is a flowchart of the method for generating parameters of the earphone according to the embodiment of the application. The method comprises the following steps:

201. playing the test audio signal through the audio output assembly, and collecting the in-ear audio signal through the audio collecting assembly;

202. acquiring an initial time domain response parameter, and generating a second cancellation signal of the test audio signal according to the initial time domain response parameter and the test audio signal;

203. and adjusting the initial time domain response parameter according to the error of the second cancellation signal and the in-ear audio signal until a preset stop condition is met, so as to obtain the target time domain response parameter of the earphone.

In this embodiment, the in-ear end of the earphone is provided with an audio acquisition component and an audio output component. The user may actively trigger the calculation of the time domain response parameters in a quiet scene. The time domain response parameters thus calculated are parameters that are individually adapted to the ear canal of the user. The calculation principle of the parameter is as follows:

in a quiet environment, the input test audio signal x1 (n) is played through the audio output assembly, and the signal collected through the audio collection assembly is d (n). The test audio signal x1 (n) is used as the input of the adaptive filter, and the processed output signal of the adaptive filter is y1 (n).

For the adaptive filter, an initial time domain response parameter is set. The initial value may be an arbitrary value, for example 0. A second cancellation signal of the test audio signal is generated according to the initial time domain response parameter and the test audio signal.

e(n) = d(n) - y1(n) = d(n) - w^T(n) x1(n)    formula (2)

For example, the adaptive filter includes N filter coefficients whose initial values are 0, and the filter coefficients w^T(n) are updated by iterative calculation such that, after the signal x1(n) is input to the filter, the error signal e(n) between the output signal y1(n) and the signal d(n) acquired by the audio acquisition component satisfies a preset stop condition.

For example, as one embodiment, adjusting the initial time domain response parameter based on the error of the second cancellation signal and the in-ear audio signal includes: obtaining the mean square error of the second cancellation signal and the in-ear audio signal; and adjusting the initial time domain response parameter by taking the reduction of the mean square error as a constraint.

In this embodiment, iterative calculation is performed with the reduction of the mean square error as a constraint to obtain the w^T(n) at which the preset stop condition is satisfied, and this w^T(n) is stored as the time domain response parameter of the transfer path between the audio output component and the audio acquisition component. The preset stop condition may be that the number of adjustments of the initial time domain response parameter reaches a preset number of adjustments, or that the mean square error is less than or equal to an error threshold.
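The iterative adjustment can be illustrated with a normalized LMS (NLMS) update. The text does not name a specific update rule, so NLMS is an assumption chosen here because it reduces the mean square error of formula (2). The step size `mu`, the simulated transfer path `true_path`, and counting adjustments in passes over the test signal are also illustrative choices.

```python
import random


def estimate_response(x1, d, num_taps, mu=0.5, max_passes=50, mse_threshold=1e-6):
    """Adapt w so that y1(n) = w^T(n) x1(n) tracks d(n); returns (w, mse)."""
    w = [0.0] * num_taps  # initial time domain response parameter
    mse = float("inf")
    for _ in range(max_passes):  # stop condition: preset number of adjustments
        sq_err = 0.0
        for n in range(num_taps - 1, len(x1)):
            frame = x1[n - num_taps + 1 : n + 1][::-1]  # x1(n), x1(n-1), ...
            y1 = sum(wi * xi for wi, xi in zip(w, frame))
            e = d[n] - y1  # error signal of formula (2)
            norm = sum(xi * xi for xi in frame) + 1e-8
            for i in range(num_taps):
                w[i] += mu * e * frame[i] / norm  # NLMS update (assumed rule)
            sq_err += e * e
        mse = sq_err / (len(x1) - num_taps + 1)
        if mse <= mse_threshold:  # stop condition: error threshold reached
            break
    return w, mse


random.seed(0)
true_path = [0.6, -0.3, 0.1]  # hypothetical speaker-to-microphone path
x1 = [random.gauss(0.0, 1.0) for _ in range(2000)]  # test audio signal
d = [
    sum(true_path[i] * x1[n - i] for i in range(len(true_path)) if n - i >= 0)
    for n in range(len(x1))
]
w, mse = estimate_response(x1, d, num_taps=3)
print([round(wi, 3) for wi in w])
```

In this noise-free simulation the estimated coefficients converge to the simulated path; on a real earphone the recording d(n) would also contain residual noise, so the achievable error threshold would be higher.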

In an embodiment, after obtaining the target time domain response parameter of the earphone, the method further comprises: playing a first audio signal through an audio output assembly, and collecting an in-ear audio signal through an audio collecting assembly; generating a third cancellation signal corresponding to the first audio signal according to the target time domain response parameter and the first audio signal; and eliminating a second audio signal in the in-ear audio signal based on the third cancellation signal to obtain a residual audio signal, wherein the second audio signal is an audio signal obtained by transmitting the first audio signal to the audio acquisition assembly through the audio output assembly.

In this embodiment, after the target time domain response parameter is obtained, the first audio signal played by the earphone is processed according to the target time domain response parameter, so as to obtain the residual audio signal in the ear.

After the target time domain response parameter is acquired, a third cancellation signal is calculated. For example, a preset filter is set, the target time domain response parameter is updated to a filter coefficient of the preset filter, then the first audio signal is used as input data of the preset filter, and a third cancellation signal corresponding to the first audio signal is obtained through calculation according to the input data and the filter coefficient, and the calculation mode is the same as that of the formula (1).

According to the scheme of this embodiment, adaptive filtering is adopted, and a third cancellation signal capable of cancelling the first audio signal is calculated according to the target time domain response parameter. The third cancellation signal is identical in amplitude and opposite in phase to the first audio signal, so it can eliminate the second audio signal in the in-ear audio signal collected by the audio acquisition component. For example, the first audio signal is a music signal, and the second audio signal is the audio signal collected by the audio acquisition component after the music signal is played by the audio output component. After the cancellation process, the remaining portion is the residual audio signal.
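The cancellation step can be sketched as an FIR filter followed by a subtraction. Formula (1) is not reproduced here, so the sketch assumes it has the same form as formula (2), that is y(n) = w^T x(n) and e(n) = d(n) - y(n). Subtracting the in-phase estimate is equivalent to adding an opposite-phase cancellation signal. All signal values and names are illustrative.

```python
def cancellation_signal(first_audio, w):
    """FIR-filter the first audio signal with the time domain response w."""
    return [
        sum(w[i] * first_audio[n - i] for i in range(len(w)) if n - i >= 0)
        for n in range(len(first_audio))
    ]


def residual_signal(in_ear, first_audio, w):
    """Remove the second audio signal from the in-ear audio signal."""
    y = cancellation_signal(first_audio, w)
    return [d_n - y_n for d_n, y_n in zip(in_ear, y)]


w = [0.6, -0.3, 0.1]                   # hypothetical target response parameter
music = [1.0, 0.5, -0.25, 0.0, 0.75]   # first audio signal
chewing = [0.0, 0.0, 0.2, -0.1, 0.0]   # interference inside the ear canal
second_audio = cancellation_signal(music, w)  # music as picked up in the ear
in_ear = [s + c for s, c in zip(second_audio, chewing)]
residual = residual_signal(in_ear, music, w)
print([round(r, 6) for r in residual])
```

With a perfectly estimated response the residual equals the interference alone, which is the signal passed on to the detection step.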

By the parameter generation method, a test audio signal is input to the earphone in a quiet environment, a time domain response parameter which is personally adapted to the auditory canal of the user is generated, and accuracy of the time domain response parameter is improved.

In an embodiment, a control device of the earphone is also provided. Referring to fig. 11, fig. 11 is a schematic structural diagram of a control device 300 of an earphone according to an embodiment of the present disclosure. Wherein the control device 300 of the earphone is applied to the earphone, the earphone comprises an audio acquisition component and an audio output component, and the control device 300 of the earphone comprises:

a signal acquisition module 301, configured to acquire a residual audio signal;

a target detection module 302, configured to determine whether a target audio signal exists in the residual audio signal;

the target suppression module 303 is configured to perform suppression processing on the target audio signal if the target audio signal exists in the residual audio signal.

It should be noted that, the control device of the earphone provided in the embodiment of the present application belongs to the same concept as the control method of the earphone in the above embodiment, and any method provided in the control method embodiment of the earphone may be implemented by using the control device of the earphone, and detailed implementation processes of the method are shown in the control method embodiment of the earphone, which is not repeated herein.

As can be seen from the above, the control device for the earphone provided in the embodiment of the present application acquires the residual audio signal, detects whether a target audio signal exists in the residual audio signal, and if so, performs suppression processing on the target audio signal. Through the scheme of the embodiment of the application, the active detection and suppression of specific interference signals can be realized, manual adjustment by the user is not needed, and the control efficiency is improved.

The embodiment of the application also provides an earphone. The earphone can be a smart phone, a tablet computer and other devices. Referring to fig. 12, fig. 12 is a schematic structural diagram of an earphone according to an embodiment of the present application. The headset 400 includes a front speaker 401 and an audio acquisition component 402, as well as a processor 403 and a memory 404. The processor 403 is electrically connected to the memory 404.

Processor 403 is the control center of headset 400, and uses various interfaces and lines to connect the various parts of the overall headset, and performs various functions and processes of the headset by running or invoking computer programs stored in memory 404, and invoking data stored in memory 404, thereby performing overall monitoring of the headset.

Memory 404 may be used to store computer programs and data. The memory 404 stores computer programs that include instructions executable in the processor. The computer program may constitute various functional modules. The processor 403 executes various functional applications and data processing by calling computer programs stored in the memory 404.

In this embodiment, the processor 403 in the headset 400 loads instructions corresponding to the processes of one or more computer programs into the memory 404 according to the following steps, and the processor 403 executes the computer programs stored in the memory 404, so as to implement various functions:

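acquiring a residual audio signal;

judging whether a target audio signal exists in the residual audio signal;

and if the target audio signal exists in the residual audio signal, performing suppression processing on the target audio signal.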

Alternatively, in another embodiment, the processor 403 in the headset 400 loads the instructions corresponding to the processes of one or more computer programs into the memory 404 according to the following steps, and the processor 403 executes the computer programs stored in the memory 404, thereby implementing various functions:

As can be seen from the above, the earphone provided in the embodiments of the present application acquires a residual audio signal, detects whether a target audio signal exists in the residual audio signal, and if so, performs suppression processing on the target audio signal. Through the scheme of the embodiment of the application, the active detection and suppression of specific interference signals can be realized, manual adjustment by the user is not needed, and the control efficiency is improved.

The present application also provides a computer readable storage medium having a computer program stored therein, which when run on a computer performs the method of any of the above embodiments.

It should be noted that those skilled in the art will understand that all or part of the steps in the various methods of the above embodiments may be implemented by a computer program, which may be stored in a computer readable storage medium, and the computer readable storage medium may include, but is not limited to: Read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disk, and the like.

Furthermore, the terms "first" and "second," and the like, herein, are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or modules is not limited to the particular steps or modules listed and certain embodiments may include additional steps or modules not listed or inherent to such process, method, article, or apparatus.

The above describes the control method, the parameter generating method, the device, the storage medium and the earphone of the earphone provided by the embodiment of the application in detail. The principles and embodiments of the present application are described herein with specific examples, the above examples being provided only to assist in understanding the methods of the present application and their core ideas; meanwhile, those skilled in the art will have variations in the specific embodiments and application scope in light of the ideas of the present application, and the present description should not be construed as limiting the present application in view of the above.

Claims

1. A method for controlling an earphone, the method comprising:

acquiring a residual audio signal;

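judging whether a target audio signal exists in the residual audio signal;

and if the target audio signal exists in the residual audio signal, performing suppression processing on the target audio signal.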

2. The method of claim 1, wherein the determining whether a target audio signal is present in the residual audio signal comprises:

extracting acoustic features of the residual audio signal;

and classifying and detecting the acoustic features according to an audio recognition model so as to detect whether a target audio signal exists in the residual audio signal.

3. The method of claim 2, wherein prior to the extracting the acoustic features of the residual audio signal, the method further comprises:

calculating the energy value of the residual audio signal in all frequency bands or partial frequency bands;

and executing the step of extracting the acoustic features of the residual audio signal in case the energy value is greater than a first preset threshold.

4. The method of any one of claims 1 to 3, wherein the target audio signal comprises at least one of a masticatory acoustic signal, a frictional acoustic signal, a vehicle noise signal.

5. The method of claim 4, wherein the suppressing the target audio signal comprises:

reducing the noise reduction amount of the earphone according to the target amplitude or the first preset proportion; or

turning off the noise reduction mode of the earphone.

6. The method of claim 4, wherein the suppressing the target audio signal comprises:

and increasing the volume of the first audio signal played by the earphone according to a preset gain value or a second preset proportion.

7. The method according to any one of claims 1 to 6, wherein the in-ear end of the earphone is provided with an audio acquisition component and an audio output component, the acquiring the residual audio signal comprising:

playing the first audio signal through the audio output assembly, and collecting an in-ear audio signal through the audio collecting assembly;

and eliminating a second audio signal in the in-ear audio signal to obtain a residual audio signal, wherein the second audio signal is an audio signal obtained by transmitting the first audio signal to the audio acquisition assembly through the audio output assembly.

8. The method of claim 7, wherein said removing the second audio signal from the in-ear audio signal results in a residual audio signal, comprising:

acquiring a current time domain response parameter;

generating a first cancellation signal corresponding to the first audio signal according to the time domain response parameter and the first audio signal;

and eliminating a second audio signal in the in-ear audio signal based on the first cancellation signal to obtain a residual audio signal.

9. The method of claim 8, wherein the generating a first cancellation signal corresponding to the first audio signal based on the time domain response parameter and the first audio signal comprises:

taking the time domain response parameter as a filter coefficient of a preset filter;

and taking the first audio signal as input data of the preset filter, and calculating a first cancellation signal corresponding to the first audio signal according to the input data and the filter coefficient.
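Claims 7 to 9 together describe a playback-echo canceller: the time domain response parameter is used as filter coefficients, the played signal is filtered to predict what reaches the in-ear microphone, and the prediction is subtracted. The sketch below assumes the preset filter is a causal FIR filter, which the claims do not state explicitly; the function names are hypothetical.

```python
def fir_filter(coeffs, signal):
    """Causal FIR filter: out[n] = sum_k coeffs[k] * signal[n - k],
    with samples before the start of `signal` treated as zero."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(coeffs):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out

def residual_signal(in_ear, first_audio, time_domain_response):
    """Subtract the predicted playback component (the first cancellation
    signal) from the in-ear microphone signal."""
    cancellation = fir_filter(time_domain_response, first_audio)
    return [m - c for m, c in zip(in_ear, cancellation)]

# Example: the microphone hears the filtered playback plus some other sound.
response = [0.5, 0.25]              # assumed speaker-to-microphone response
played = [1.0, 2.0, 3.0, 4.0]       # first audio signal
other = [0.1, -0.2, 0.3, 0.0]       # e.g. a chewing sound
echo = fir_filter(response, played) # second audio signal
mic = [e + o for e, o in zip(echo, other)]
residual = residual_signal(mic, played, response)
```

When the response matches the true acoustic path, the residual contains only the sound that did not come from playback, which is what the detection step then analyses.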

10. A parameter generation method for an earphone, wherein an in-ear end of the earphone is provided with an audio acquisition component and an audio output component, the method comprising:

playing a test audio signal through the audio output component, and collecting an in-ear audio signal through the audio acquisition component;

acquiring an initial time domain response parameter, and generating a second cancellation signal of the test audio signal according to the initial time domain response parameter and the test audio signal;

and adjusting the initial time domain response parameter according to the error between the second cancellation signal and the in-ear audio signal until a preset stop condition is met, so as to obtain the target time domain response parameter of the earphone.

11. The method of claim 10, wherein the adjusting the initial time domain response parameter according to the error between the second cancellation signal and the in-ear audio signal comprises:

acquiring a mean square error between the second cancellation signal and the in-ear audio signal;

and adjusting the initial time domain response parameter subject to the constraint that the mean square error is reduced.

12. The method of claim 11, wherein the preset stop condition comprises:

the number of adjustments of the initial time domain response parameter reaches a preset number of adjustments; or

the mean square error is less than or equal to an error threshold.
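The iterative fitting of claims 10 to 12 can be sketched as below. The claims only require that each adjustment reduce the mean square error and that iteration stop on an adjustment count or an error threshold; the update rule used here, batch gradient descent on an FIR model, is an assumption chosen for illustration, as are the function names, the zero initial parameter, the step size and the default limits.

```python
import random

def fir_filter(coeffs, signal):
    """Causal FIR filter with zero initial conditions."""
    return [sum(h * signal[n - k] for k, h in enumerate(coeffs) if n - k >= 0)
            for n in range(len(signal))]

def mean_square_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def fit_time_domain_response(test_audio, in_ear, num_taps, step=0.5,
                             max_adjustments=1000, error_threshold=1e-10):
    """Adjust the response until the cancellation signal matches the in-ear
    signal, stopping on either preset condition from claim 12."""
    h = [0.0] * num_taps  # assumed initial time domain response parameter
    n_samples = len(test_audio)
    for _ in range(max_adjustments):          # stop condition 1: adjustment count
        cancellation = fir_filter(h, test_audio)
        if mean_square_error(cancellation, in_ear) <= error_threshold:
            break                             # stop condition 2: error threshold
        for k in range(num_taps):
            # Gradient of the MSE with respect to tap k.
            grad = (2.0 / n_samples) * sum(
                (cancellation[n] - in_ear[n]) * test_audio[n - k]
                for n in range(k, n_samples))
            h[k] -= step * grad
    return h

# Example: recover a known 3-tap response from a noise-like test signal.
rng = random.Random(0)
test_audio = [rng.uniform(-1.0, 1.0) for _ in range(200)]
true_response = [0.6, -0.3, 0.1]              # assumed acoustic path
in_ear = fir_filter(true_response, test_audio)
target_response = fit_time_domain_response(test_audio, in_ear, num_taps=3)
```

In this example the in-ear signal contains only the played test signal, so the fitted parameter should approach the assumed acoustic path. A sample-by-sample adaptive filter such as LMS would be another way to satisfy the same claim language.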

13. The method of claim 12, further comprising, after obtaining the target time domain response parameter of the earphone:

playing a first audio signal through the audio output component, and collecting an in-ear audio signal through the audio acquisition component;

generating a third cancellation signal corresponding to the first audio signal according to the target time domain response parameter and the first audio signal;

and eliminating a second audio signal in the in-ear audio signal based on the third cancellation signal to obtain a residual audio signal, wherein the second audio signal is the first audio signal as played by the audio output component and received at the audio acquisition component.

14. The method of claim 13, wherein the generating a third cancellation signal corresponding to the first audio signal based on the target time domain response parameter and the first audio signal comprises:

taking the target time domain response parameter as a filter coefficient of a preset filter;

and taking the first audio signal as input data of the preset filter, and calculating a third cancellation signal corresponding to the first audio signal according to the input data and the filter coefficient.

15. A control device for an earphone, comprising:

a signal acquisition module, configured to acquire a residual audio signal;

16. A parameter generation device for an earphone, wherein an in-ear end of the earphone is provided with an audio acquisition component and an audio output component, the device comprising:

17. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when run on a computer, causes the computer to perform the control method of an earphone according to any one of claims 1 to 9, or the parameter generation method of an earphone according to any one of claims 10 to 14.

18. An earphone comprising an audio acquisition component, an audio output component, a memory, and a processor, the memory storing a computer program, wherein the processor, by invoking the computer program, performs the control method of an earphone according to any one of claims 1 to 9, or the parameter generation method of an earphone according to any one of claims 10 to 14.