CN115550819A - Audio signal processing method, device, equipment and storage medium - Google Patents

Audio signal processing method, device, equipment and storage medium

Info

Publication number
CN115550819A
Authority
CN
China
Prior art keywords
signal
audio signal
voice
target audio
cancellation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211291158.XA
Other languages
Chinese (zh)
Inventor
韦莎丽
黄杰华
曹宇韬
宋明辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Zhongke Lanxun Technology Co ltd
Original Assignee
Shenzhen Zhongke Lanxun Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Zhongke Lanxun Technology Co ltd
Priority to CN202211291158.XA
Publication of CN115550819A

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R9/00: Transducers of moving-coil, moving-strip, or moving-wire type
    • H04R9/06: Loudspeakers
    • H04R9/02: Details
    • H04R2400/00: Loudspeakers
    • H04R2400/11: Aspects regarding the frame of loudspeaker transducers

Abstract

The application provides an audio signal processing method, apparatus, device and storage medium. The method comprises: acquiring a target audio signal, wherein the target audio signal comprises a left channel audio signal and a right channel audio signal and contains human voice and background sound; mutually cancelling the left and right channel voice signals of the target audio signal to obtain a voice cancellation signal corresponding to the target audio signal; inputting the voice cancellation signal into a background sound elimination system for background sound elimination to obtain a voice residual signal corresponding to the target audio signal, wherein the background sound elimination system is used for eliminating background sound; and performing signal cancellation on the voice cancellation signal and the voice residual signal to obtain a voice elimination signal corresponding to the target audio signal. This scheme eliminates the human voice as far as possible, so that the resulting voice elimination signal is cleaner.

Description

Audio signal processing method, device, equipment and storage medium
Technical Field
The present application relates to the field of signal processing, and in particular, to an audio signal processing method, apparatus, device, and storage medium.
Background
Singing software is a kind of music content community application that has become popular with the development of the internet in recent years. It provides background music playback and recording functions, so that a user can sing online through a mobile phone. The accompaniment is important for singing software, and in some cases it is obtained by processing a song that contains human voice and eliminating the voice from it.
Existing voice elimination techniques generally rely on the fact that the human voice is essentially the same in the left and right channels: the signal in one channel is inverted and added to the signal in the other channel. Because the voice in the two channels is not completely identical, part of the voice remains after elimination, so the voice is not eliminated cleanly.
Disclosure of Invention
The application provides an audio signal processing method, an audio signal processing device, audio signal processing equipment and a storage medium, which are used for solving the technical problem of unclean voice elimination in the existing voice elimination scheme.
In a first aspect, an audio signal processing method is provided, including:
acquiring a target audio signal, wherein the target audio signal comprises a left channel audio signal and a right channel audio signal, and the target audio signal is an audio signal containing human voice and background sound;
mutually cancelling the left and right channel voice signals of the target audio signal to obtain a voice cancellation signal corresponding to the target audio signal;
inputting the voice cancellation signal into a background sound elimination system for background sound elimination to obtain a voice residual signal corresponding to the target audio signal, wherein the background sound elimination system is used for eliminating background sound;
and performing signal cancellation on the voice cancellation signal and the voice residual signal to obtain a voice elimination signal corresponding to the target audio signal.
In this technical scheme, after a left channel audio signal and a right channel audio signal containing human voice and background sound are acquired, a voice cancellation signal is obtained by mutually cancelling the left and right channel voice signals, which achieves a preliminary elimination of the voice. The voice cancellation signal is then input into the background sound elimination system for background sound elimination, which yields the voice residual signal contained in the voice cancellation signal. Finally, signal cancellation is performed on the voice cancellation signal and the voice residual signal to obtain the voice elimination signal, which eliminates the residual voice. Eliminating the voice twice in this way removes it as far as possible, so that the resulting voice elimination signal is cleaner.
With reference to the first aspect, in a possible implementation manner, before the voice cancellation signal is input into the background sound elimination system for background sound elimination to obtain the voice residual signal corresponding to the target audio signal, the method further includes: performing signal cancellation on the voice cancellation signal and the target audio signal to obtain a voice signal corresponding to the target audio signal; and determining the background sound elimination system according to the target audio signal and the voice signal. Because the background sound elimination system is determined from the audio signal and the voice signal within it, the system is matched to the target audio signal, can better eliminate the background sound in the voice cancellation signal, and yields a more accurate voice residual signal.
With reference to the first aspect, in a possible implementation manner, determining the background sound elimination system according to the target audio signal and the voice signal includes: performing adaptive filtering fitting with the target audio signal as the input signal and the voice signal as the output signal to obtain an objective function, wherein the objective function represents the relationship between the input signal and the output signal; and taking the objective function as the transfer function of the background sound elimination system. Determining the transfer function between the audio signal and the voice in it through an adaptive filtering fitting algorithm optimizes the background sound elimination system and ensures that it eliminates background sound more effectively.
With reference to the first aspect, in a possible implementation manner, after signal cancellation is performed on the voice cancellation signal and the voice residual signal to obtain the voice elimination signal corresponding to the target audio signal, the method further includes: performing frequency compensation on the target audio signal to obtain a frequency compensation signal corresponding to the target audio signal; and mixing the voice elimination signal and the frequency compensation signal to obtain a background sound signal corresponding to the target audio signal. Mixing the frequency compensation signal with the voice elimination signal compensates the frequencies missing from the voice elimination signal, so that the background sound signal has fewer gaps and is more complete.
With reference to the first aspect, in a possible implementation manner, the frequency compensating the target audio signal to obtain a frequency compensated signal corresponding to the target audio signal includes: inputting the target audio signal into a first filter to obtain a first frequency compensation signal corresponding to the target audio signal, wherein the cut-off frequency of the first filter is smaller than a first preset cut-off frequency; and/or inputting the target audio signal into a second filter to obtain a second frequency compensation signal corresponding to the target audio signal, wherein the cut-off frequency of the second filter is greater than a second preset cut-off frequency; the second preset cutoff frequency is greater than the first preset cutoff frequency. The first frequency compensation signal and the second frequency compensation signal are obtained through low-frequency compensation and high-frequency compensation respectively, and the low-frequency part and the high-frequency part which are missed in the human voice eliminating signal can be compensated, so that the frequency spectrum of the background sound signal is complete enough.
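The two compensation filters can be illustrated with a minimal Python sketch. The application does not state the filter type or the cut-off values, so everything concrete here is an assumption: the first filter is taken to be a first-order low-pass, the second a complementary first-order high-pass, and the names `one_pole_lowpass`, `one_pole_highpass`, `LOW_CUTOFF_HZ` and `HIGH_CUTOFF_HZ` are hypothetical.

```python
import math


def one_pole_lowpass(signal, cutoff_hz, sample_rate_hz):
    """First-order low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    out, prev = [], 0.0
    for x in signal:
        prev = prev + a * (x - prev)
        out.append(prev)
    return out


def one_pole_highpass(signal, cutoff_hz, sample_rate_hz):
    """Complementary high-pass: the input minus its low-passed version."""
    low = one_pole_lowpass(signal, cutoff_hz, sample_rate_hz)
    return [x - l for x, l in zip(signal, low)]


# Hypothetical values, chosen only for illustration
SAMPLE_RATE_HZ = 44100.0
LOW_CUTOFF_HZ = 100.0     # first filter (low-frequency compensation)
HIGH_CUTOFF_HZ = 8000.0   # second filter (high-frequency compensation)

target = [1.0] * 2000     # a constant (0 Hz) test signal
low_comp = one_pole_lowpass(target, LOW_CUTOFF_HZ, SAMPLE_RATE_HZ)
high_comp = one_pole_highpass(target, HIGH_CUTOFF_HZ, SAMPLE_RATE_HZ)
```

With a constant input, the low-pass output settles at the input level while the high-pass output decays to zero, which is the behaviour expected of a low-frequency and a high-frequency compensation path respectively.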
With reference to the first aspect, in a possible implementation manner, before the voice elimination signal and the frequency compensation signal are mixed to obtain the background sound signal corresponding to the target audio signal, the method further includes: performing gain adjustment on the voice elimination signal and the frequency compensation signal to obtain a voice elimination gain signal and a frequency compensation gain signal corresponding to the target audio signal. Mixing the voice elimination signal and the frequency compensation signal to obtain the background sound signal then includes: mixing the voice elimination gain signal and the frequency compensation gain signal to obtain the background sound signal corresponding to the target audio signal. Mixing the two signals after gain adjustment makes the resulting background sound signal more natural and complete.
With reference to the first aspect, in a possible implementation manner, performing gain adjustment on the voice elimination signal and the frequency compensation signal includes: performing the gain adjustment according to the signal correlation between the left channel audio signal and the right channel audio signal to obtain the voice elimination gain signal and the frequency compensation gain signal corresponding to the target audio signal. Adjusting the gains according to the correlation between the left and right channel audio signals ensures that the resulting gain signals conform to the characteristics of the signal.
In a second aspect, an audio signal processing apparatus is provided, including:
an acquisition module, configured to acquire a target audio signal, wherein the target audio signal comprises a left channel audio signal and a right channel audio signal, and the target audio signal is an audio signal containing human voice and background sound;
a first elimination module, configured to mutually cancel the left and right channel voice signals of the target audio signal to obtain a voice cancellation signal corresponding to the target audio signal;
a second elimination module, configured to input the voice cancellation signal into a background sound elimination system for background sound elimination to obtain a voice residual signal corresponding to the target audio signal, wherein the background sound elimination system is used for eliminating background sound;
and a third elimination module, configured to perform signal cancellation on the voice cancellation signal and the voice residual signal to obtain a voice elimination signal corresponding to the target audio signal.
In a third aspect, an audio device is provided, comprising a memory and one or more processors, the memory being connected to the one or more processors, and the one or more processors being configured to execute one or more computer programs stored in the memory; when executing the one or more computer programs, the one or more processors cause the audio device to implement the audio signal processing method of the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, in which a computer program is stored, the computer program comprising program instructions which, when executed by a processor, cause the processor to carry out the audio signal processing method of the first aspect.
The application can achieve the following technical effect: by eliminating the voice twice, the voice is removed as far as possible, so that the resulting voice elimination signal is cleaner.
Drawings
Fig. 1 is a schematic flowchart of an audio signal processing method according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of another audio signal processing method according to an embodiment of the present application;
Fig. 3 is a schematic flowchart of another audio signal processing method according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an audio signal processing apparatus according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
The technical scheme of the application is applicable to audio processing scenarios, in particular scenarios in which audio containing human voice needs to be processed into audio without human voice. For example, in a karaoke scenario, audio containing the original vocals may be processed into accompaniment audio; in an audio playback scenario, audio containing human voice may be processed into pure accompaniment audio and then played. The scheme is not limited to these examples.
The technical scheme of the application can be applied to audio equipment with an audio processing function, wherein the audio equipment comprises but is not limited to a mobile phone, a notebook computer, a microphone and the like.
The technical principle of the application is as follows. After a two-channel audio signal containing human voice and background sound is acquired, a first voice elimination is performed using the common approach of cancelling the left and right channel voice signals against each other, which yields a voice cancellation signal and eliminates the voice that is the same in both channels. The voice cancellation signal is then input into a background sound elimination system, which eliminates the background sound in it and thereby separates out the voice residual signal. Finally, signal cancellation is performed on the voice cancellation signal and the voice residual signal to eliminate the voice that differs between the left and right channels. Because the background sound elimination system separates out the residual voice in the voice cancellation signal, that is, the voice that differs between the two channels, the scheme first eliminates the voice common to both channels and then the voice that differs between them. The voice can therefore be eliminated almost completely, and the resulting voice elimination signal is cleaner.
The technical solution of the present application is specifically described below.
Referring to fig. 1, fig. 1 is a schematic flowchart of an audio signal processing method provided by an embodiment of the present application, which may be applied to the aforementioned audio device, as shown in fig. 1, and the method includes the following steps:
S101, acquiring a target audio signal.
The target audio signal refers to an audio signal containing human voice and background sound, and may be understood as an audio signal in which human voice and background sound are mixed together. The target audio signal includes a left channel audio signal and a right channel audio signal. The target audio signal may be an audio signal containing the original voice of the singer in an audio playback scene.
S102, mutually cancelling the left and right channel voice signals of the target audio signal to obtain a voice cancellation signal corresponding to the target audio signal.
Here, mutually cancelling the left and right channel voice signals means subtracting the left channel audio signal and the right channel audio signal from each other to obtain a left channel voice cancellation signal and a right channel voice cancellation signal.
Specifically, the left channel audio signal may be taken as the main signal and the right channel audio signal subtracted from it, which cancels the voice common to both channels and yields the left channel voice cancellation signal, i.e. $X_{L1} = S_L - S_R$. Likewise, the right channel audio signal may be taken as the main signal and the left channel audio signal subtracted from it, which yields the right channel voice cancellation signal, i.e. $X_{R1} = S_R - S_L$. Here $X_{L1}$ is the left channel voice cancellation signal, $X_{R1}$ is the right channel voice cancellation signal, $S_L$ is the left channel audio signal, and $S_R$ is the right channel audio signal.
In a specific implementation, the left channel audio signal and the right channel audio signal may be input to a subtractor to realize signal subtraction between the left channel audio signal and the right channel audio signal. The same human voice part in the left channel audio signal and the right channel audio signal can be eliminated by subtracting the left channel audio signal and the right channel audio signal, so that the initial elimination of the human voice signal is realized.
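The channel subtraction of step S102 can be sketched in a few lines of Python. This is a minimal illustration, assuming the two channels are equal-length lists of float samples; the function name `cancel_common_voice` and the toy sample values are hypothetical and not taken from the application.

```python
def cancel_common_voice(s_left, s_right):
    """Return (x_l1, x_r1) with X_L1 = S_L - S_R and X_R1 = S_R - S_L."""
    if len(s_left) != len(s_right):
        raise ValueError("channels must have the same number of samples")
    x_l1 = [l - r for l, r in zip(s_left, s_right)]
    x_r1 = [r - l for l, r in zip(s_left, s_right)]
    return x_l1, x_r1


# Toy data: a "voice" that is identical in both channels, plus background
# sound that differs between the channels.
voice = [0.5, -0.25, 0.75]
bg_left = [0.125, 0.0, -0.5]
bg_right = [0.0, 0.25, 0.5]
s_left = [v + b for v, b in zip(voice, bg_left)]
s_right = [v + b for v, b in zip(voice, bg_right)]

x_l1, x_r1 = cancel_common_voice(s_left, s_right)
```

In this toy case the shared voice term drops out entirely, and each cancellation signal contains only the difference between the two backgrounds.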
S103, inputting the voice cancellation signal corresponding to the target audio signal into a background sound elimination system for background sound elimination to obtain a voice residual signal corresponding to the target audio signal.
The background sound elimination system is a transmission system for background sound: it eliminates the background sound signal in an audio signal so as to obtain the voice signal in that audio signal.
The background sound elimination system may be any transmission system capable of eliminating background sound and retaining human voice. Specifically, the background sound elimination system may eliminate a background sound signal in the left channel vocal cancellation signal to obtain a left channel vocal residual signal, and eliminate a background sound signal in the right channel vocal cancellation signal to obtain a right channel vocal residual signal. The left and right channels of voice counteracting signals are respectively input into the background sound eliminating system, so that the removal of the background sound in the left and right channels can be completed, and different parts of the voice signals in the left and right channels can be extracted.
S104, performing signal cancellation on the voice cancellation signal corresponding to the target audio signal and the voice residual signal corresponding to the target audio signal to obtain a voice elimination signal corresponding to the target audio signal.
Here, signal cancellation means subtracting the voice residual signal from the voice cancellation signal to obtain the voice elimination signal corresponding to the target audio signal.
Specifically, the left channel voice residual signal may be subtracted from the left channel voice cancellation signal to obtain the left channel voice elimination signal, i.e. $X_{L2} = X_{L1} - V_{L2}$, where $X_{L2}$ is the left channel voice elimination signal and $V_{L2}$ is the left channel voice residual signal. The right channel voice residual signal may be subtracted from the right channel voice cancellation signal to obtain the right channel voice elimination signal, i.e. $X_{R2} = X_{R1} - V_{R2}$, where $X_{R2}$ is the right channel voice elimination signal and $V_{R2}$ is the right channel voice residual signal.
In a specific implementation, the left channel voice cancellation signal and the left channel voice residual signal may be input into a subtractor to obtain the left channel voice elimination signal, and the right channel voice cancellation signal and the right channel voice residual signal may be input into a subtractor to obtain the right channel voice elimination signal. Subtracting the residual signal from the cancellation signal in each channel eliminates the voice that differs between the left and right channels, which achieves the second elimination of the voice signal.
In the technical scheme corresponding to Fig. 1, after a left channel audio signal and a right channel audio signal containing human voice and background sound are acquired, a voice cancellation signal is obtained by mutually cancelling the left and right channel voice signals, which achieves a preliminary elimination of the voice. The voice cancellation signal is then input into the background sound elimination system for background sound elimination, which yields the voice residual signal contained in the voice cancellation signal. Finally, signal cancellation is performed on the voice cancellation signal and the voice residual signal to obtain the voice elimination signal, which eliminates the residual voice. Eliminating the voice twice in this way removes it as far as possible, so that the resulting voice elimination signal is cleaner.
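Steps S101 to S104 can be strung together as the following Python sketch. It is illustrative only: the background sound elimination system is left as a caller-supplied function because Fig. 1 does not fix its form, and `eliminate_voice` and `toy_system` are hypothetical names. The stand-in `toy_system` merely scales its input and is not a real background sound eliminator.

```python
def eliminate_voice(s_left, s_right, bg_system_left, bg_system_right):
    """Two-stage voice elimination following steps S101 to S104.

    bg_system_left / bg_system_right are placeholders (assumptions) for the
    background sound elimination system: callables that map a voice
    cancellation signal to its voice residual signal.
    """
    # S102: X_L1 = S_L - S_R, X_R1 = S_R - S_L
    x_l1 = [l - r for l, r in zip(s_left, s_right)]
    x_r1 = [r - l for l, r in zip(s_left, s_right)]
    # S103: voice residual signals V_L2 and V_R2
    v_l2 = bg_system_left(x_l1)
    v_r2 = bg_system_right(x_r1)
    # S104: X_L2 = X_L1 - V_L2, X_R2 = X_R1 - V_R2
    x_l2 = [x - v for x, v in zip(x_l1, v_l2)]
    x_r2 = [x - v for x, v in zip(x_r1, v_r2)]
    return x_l2, x_r2


def toy_system(signal):
    """Stand-in for demonstration: treats a quarter of each sample as residual voice."""
    return [0.25 * x for x in signal]


x_l2, x_r2 = eliminate_voice([1.0, 2.0], [0.0, 4.0], toy_system, toy_system)
```

Passing the system in as a function keeps the two subtraction stages separate from whatever filter is eventually used to estimate the residual voice.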
Referring to fig. 2, fig. 2 is a schematic flowchart of another audio signal processing method provided by the embodiment of the present application, which can be applied to the aforementioned audio device, as shown in fig. 2, and the method includes the following steps:
S201, acquiring a target audio signal.
S202, carrying out mutual cancellation on the left and right channel voice signals of the target audio signal to obtain a voice cancellation signal corresponding to the target audio signal.
Here, for a specific implementation manner of step S201 to step S202, reference may be made to the description of step S101 to step S102, which is not described herein again.
S203, performing signal cancellation on the voice cancellation signal corresponding to the target audio signal and the target audio signal to obtain a voice signal corresponding to the target audio signal.
Here, signal cancellation means subtracting the voice cancellation signal and the target audio signal from each other to obtain the voice signal corresponding to the target audio signal.
Specifically, the left channel voice signal may be obtained by subtracting the left channel audio signal from the left channel voice cancellation signal, i.e. $V_{L1} = X_{L1} - S_L$, where $V_{L1}$ is the left channel voice signal. The right channel voice signal may be obtained by subtracting the right channel audio signal from the right channel voice cancellation signal, i.e. $V_{R1} = X_{R1} - S_R$, where $V_{R1}$ is the right channel voice signal.
In specific implementation, the left channel vocal cancellation signal and the left channel audio signal can be input to a subtractor to obtain a left channel vocal signal; the right channel vocal cancellation signal and the right channel audio signal can be input to the subtractor to obtain the right channel vocal signal. By subtracting the left channel vocal cancellation signal from the left channel audio signal and subtracting the right channel vocal cancellation signal from the right channel audio signal, vocal signals in the left and right channels can be extracted.
S204, determining a background sound elimination system according to the target audio signal and the voice signal corresponding to the target audio signal.
Here, determining the background sound elimination system means fitting, from the target audio signal and its corresponding voice signal, a transmission system that represents the transfer relationship between the two. The background sound elimination system is essentially a filter.
The target audio signal may be used as the input signal and the corresponding voice signal as the output signal, and adaptive filtering fitting may be performed to obtain an objective function representing the relationship between the input signal and the output signal. The objective function is then taken as the transfer function of the background sound elimination system.
The objective function includes a left channel objective function and a right channel objective function. The left channel audio signal can be used as an input signal, and the left channel vocal signal can be used as an output signal, and adaptive filtering fitting is carried out to obtain a left channel target function; the right channel audio signal may be used as an input signal, and the right channel vocal signal may be used as an output signal, to perform adaptive filtering fitting, so as to obtain a right channel objective function.
In a specific embodiment, the target audio signal and the voice signal corresponding to the target audio signal may be subjected to adaptive filtering fitting based on a normalized least mean square (NLMS) algorithm to obtain the objective function.
The vector form of the NLMS weight update is as follows:
$$w(n+1) = w(n) + 2\mu(n)\,x(n)\,e(n) \qquad \text{(Formula 1)}$$
[Formula 2: equation image BDA0003901438860000081 not reproduced in the text]
$$e(n) = y(n) - w(n) \times x(n) \qquad \text{(Formula 3)}$$
Here, $w(n)$ is the weight vector at the nth iteration and $w(n+1)$ is the weight vector updated from $w(n)$; every weight coefficient in $w(0)$ is 0. $x(n)$ is the input vector at the nth iteration, obtained by sampling the left channel audio signal or the right channel audio signal. $y(n)$ is the expected output at the nth iteration, obtained by sampling the left channel voice signal or the right channel voice signal. $e(n)$ is the error between the filter output $w(n) \times x(n)$ and the expected output $y(n)$ at the nth iteration. $\mu(n)$ is the step factor; $P_x(n)$ is the signal power estimated at time n, with $P_x(n) = x^2(n)$; $\alpha$ is the corrected step constant, $0 < \alpha < 2$; and $\delta$ is a very small constant, $\delta > 0$, which may be set to 0.000001.
In a specific implementation, the left channel audio signal may be sampled as the input signal $x_1(n)$ and the left channel voice signal sampled as the output signal $y_1(n)$; then, according to Formula 1 to Formula 3, the left channel weight vector $W_1(z)$ that minimizes $e^2(n)$ is solved through multiple iterations, and $W_1(z)$ is taken as the left channel objective function. Likewise, the right channel audio signal may be sampled as the input signal $x_2(n)$ and the right channel voice signal sampled as the output signal $y_2(n)$; then, according to Formula 1 to Formula 3, the right channel weight vector $W_2(z)$ that minimizes $e^2(n)$ is solved through multiple iterations, and $W_2(z)$ is taken as the right channel objective function.
The transfer function between the audio signal and the human voice in the audio signal is determined through the adaptive filtering fitting algorithm, so that the optimization of a background sound eliminating system can be realized, and the background sound eliminating system can be ensured to better eliminate the background sound.
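The adaptive fitting can be sketched in plain Python as follows. Formula 2 is not reproduced in the text, so the step factor below uses the textbook NLMS normalisation $\mu(n) = \alpha / (\delta + P_x(n))$, with $P_x(n)$ taken as the power of the current input vector; this is an assumption, not necessarily the application's exact formula. The function name `nlms_fit`, the tap count, the default `alpha`, the number of passes and the synthetic data are likewise hypothetical. In the application, `x` would hold samples of a channel audio signal and `y` samples of the matching voice signal.

```python
def nlms_fit(x, y, num_taps, alpha=0.25, delta=1e-6, passes=20):
    """Fit FIR weights w so that sum_k w[k] * x[n-k] approximates y[n]."""
    w = [0.0] * num_taps                      # w(0): every coefficient is 0
    for _ in range(passes):
        for n in range(len(x)):
            # Input vector x(n) = [x[n], x[n-1], ...], zero-padded at the start
            xn = [x[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
            out = sum(wk * xk for wk, xk in zip(w, xn))
            e = y[n] - out                    # Formula 3
            p_x = sum(xk * xk for xk in xn)   # power of the input vector
            mu = alpha / (delta + p_x)        # ASSUMED form of Formula 2
            w = [wk + 2.0 * mu * xk * e for wk, xk in zip(w, xn)]  # Formula 1
    return w


# Synthetic check: y is x passed through a known 2-tap filter.
x = [((2 * n) % 13 - 6) / 6.0 for n in range(200)]
true_w = [0.5, -0.25]
y = [true_w[0] * x[n] + (true_w[1] * x[n - 1] if n > 0 else 0.0)
     for n in range(len(x))]

w = nlms_fit(x, y, num_taps=2)
```

On this noise-free synthetic data the fitted weights approach the known filter. With real audio the fit is only approximate, and the tap count and step constant would need tuning.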
And S205, inputting the human voice counteracting signal corresponding to the target audio signal into a background sound eliminating system for background sound elimination, so as to obtain a human voice residual signal corresponding to the target audio signal.
Here, inputting the voice canceling signal corresponding to the target audio signal into the background sound canceling system for background sound canceling to obtain the voice residual signal corresponding to the target audio signal means filtering the voice canceling signal through a filter in the background sound canceling system to filter the background sound signal in the voice canceling signal to obtain the voice residual signal. Wherein the filter in the background sound cancellation system is characterized by the transfer function of the background sound system.
Specifically, the left channel voice cancellation signal may be convolved with the transfer function of the background sound elimination system to obtain the left channel voice residual signal, i.e., V_L2 = X_L1 * W_1(z); the right channel voice cancellation signal may be convolved with the transfer function of the background sound elimination system to obtain the right channel voice residual signal, i.e., V_R2 = X_R1 * W_2(z).
S206, performing signal cancellation on the voice cancellation signal corresponding to the target audio signal and the voice residual signal corresponding to the target audio signal, to obtain a voice elimination signal corresponding to the target audio signal.
Here, for a specific implementation of step S206, reference may be made to the description of step S104, and details are not repeated here.
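Steps S205 and S206 can be sketched for one channel as follows. The sketch assumes that the transfer function is applied as a causal FIR convolution with the fitted weights and that "signal cancellation" is sample-wise subtraction; the description of step S104 is not reproduced here, so the subtraction is an assumption. The function names are hypothetical.

```python
def fir_filter(signal, weights):
    """Causal FIR convolution, truncated to the length of the input."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, w_k in enumerate(weights):
            if n - k >= 0:
                acc += w_k * signal[n - k]
        out.append(acc)
    return out

def eliminate_residual_voice(x1, weights):
    """Return (v2, x2) for one channel.

    x1: voice cancellation signal (X_L1 or X_R1).
    weights: fitted filter weights standing in for W_1(z) or W_2(z).
    v2: voice residual signal, v2 = x1 * W(z)            (S205)
    x2: voice elimination signal, assumed x2 = x1 - v2   (S206)
    """
    v2 = fir_filter(x1, weights)
    x2 = [a - b for a, b in zip(x1, v2)]
    return v2, x2

# Usage with toy values.
v2, x2 = eliminate_residual_voice([1.0, 2.0, 3.0, 4.0], [0.5, 0.25])
```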
In the technical solution corresponding to fig. 2, after the voice cancellation signal that preliminarily removes the human voice is obtained, signal cancellation is first performed between the voice cancellation signal and the target audio signal to obtain the human voice signal corresponding to the target audio signal, thereby extracting the human voice from the target audio signal. The background sound elimination system is then determined according to the target audio signal and its corresponding human voice signal, and the voice cancellation signal is input into the background sound elimination system for background sound elimination to obtain the voice residual signal. Finally, signal cancellation is performed between the voice cancellation signal and the voice residual signal to obtain the voice elimination signal, so that the residual human voice is cancelled a second time. Because the background sound elimination system is determined from the target audio signal and its corresponding human voice signal, it is matched to the target audio signal and can remove the background sound more effectively, which yields a more complete voice residual signal and facilitates complete elimination of the human voice.
Referring to fig. 3, fig. 3 is a schematic flowchart of another audio signal processing method provided by the embodiment of the present application, which can be applied to the aforementioned audio device, as shown in fig. 1, and the method includes the following steps:
S301, acquiring a target audio signal.
S302, performing mutual cancellation of the left and right channel vocal signals on the target audio signal to obtain a voice cancellation signal corresponding to the target audio signal.
S303, inputting the voice cancellation signal corresponding to the target audio signal into a background sound elimination system for background sound elimination, to obtain a voice residual signal corresponding to the target audio signal.
S304, performing signal cancellation on the voice cancellation signal corresponding to the target audio signal and the voice residual signal corresponding to the target audio signal, to obtain a voice elimination signal corresponding to the target audio signal.
Here, for a specific implementation manner of step S301 to step S304, reference may be made to the description of step S101 to step S104, which is not described herein again.
S305, performing frequency compensation on the target audio signal to obtain a frequency compensation signal corresponding to the target audio signal.
Here, performing frequency compensation on the target audio signal to obtain a frequency compensation signal corresponding to the target audio signal means inputting the target audio signal to a filter for filtering to obtain the target audio signal in a preset frequency band as the frequency compensation signal. The frequency compensation signal includes a left channel compensation signal and a right channel compensation signal.
In one possible implementation, the target audio signal may be input to a first filter to obtain a first frequency compensation signal corresponding to the target audio signal, wherein a cut-off frequency of the first filter is smaller than a first preset cut-off frequency. The first preset cutoff frequency may be a minimum frequency in a frequency range of the human voice.
Specifically, the left channel audio signal may be convolved with the filter function of the first filter to obtain the left channel first frequency compensation signal, i.e., X_Llp = S_L * H_lp(z); the right channel audio signal may be convolved with the filter function of the first filter to obtain the right channel first frequency compensation signal, i.e., X_Rlp = S_R * H_lp(z); where X_Llp is the left channel first frequency compensation signal, X_Rlp is the right channel first frequency compensation signal, and H_lp(z) is the filter function of the first filter. Inputting the target audio signal into the first filter removes the components whose frequency is higher than the first preset cutoff frequency, thereby retaining the low-frequency components of the target audio signal below the first preset cutoff frequency.
In another possible implementation manner, the target audio signal may be input to a second filter to obtain a second frequency compensation signal corresponding to the target audio signal, where a cut-off frequency of the second filter is greater than a second preset cut-off frequency, and the second preset cut-off frequency is greater than the first preset cut-off frequency. The second preset cut-off frequency may be a maximum frequency in a frequency range of the human voice.
Specifically, the left channel audio signal may be convolved with the filter function of the second filter to obtain the left channel second frequency compensation signal, i.e., X_Lhp = S_L * H_hp(z); the right channel audio signal may be convolved with the filter function of the second filter to obtain the right channel second frequency compensation signal, i.e., X_Rhp = S_R * H_hp(z); where X_Lhp is the left channel second frequency compensation signal, X_Rhp is the right channel second frequency compensation signal, and H_hp(z) is the filter function of the second filter. Inputting the target audio signal into the second filter removes the components whose frequency is lower than the second preset cutoff frequency, thereby retaining the high-frequency components of the target audio signal above the second preset cutoff frequency.
In yet another possible implementation, the target audio signal may be input to a first filter to obtain a first frequency compensation signal corresponding to the target audio signal, and the target audio signal may be input to a second filter to obtain a second frequency compensation signal corresponding to the target audio signal.
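The two compensation filters can be sketched with first-order low-pass and high-pass sections. The text does not specify the filter type or order, so the first-order RC-style recursions below are an assumption chosen for brevity, and the cutoff values used in the usage lines (80 Hz and 8000 Hz) are hypothetical stand-ins for frequencies below and above the human voice range.

```python
import math

def low_pass(signal, cutoff_hz, sample_rate_hz):
    """First-order low-pass (stand-in for H_lp): keeps content below the cutoff."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    a = dt / (rc + dt)
    out, prev = [], 0.0
    for s in signal:
        prev = prev + a * (s - prev)
        out.append(prev)
    return out

def high_pass(signal, cutoff_hz, sample_rate_hz):
    """First-order high-pass (stand-in for H_hp): keeps content above the cutoff."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    a = rc / (rc + dt)
    out, prev_out, prev_in = [], 0.0, 0.0
    for s in signal:
        prev_out = a * (prev_out + s - prev_in)
        prev_in = s
        out.append(prev_out)
    return out

# Usage: a constant (0 Hz) input passes the low-pass and is blocked by the high-pass.
dc = [1.0] * 4800
x_lp = low_pass(dc, 80.0, 48000.0)     # first frequency compensation signal
x_hp = high_pass(dc, 8000.0, 48000.0)  # second frequency compensation signal
```

A practical implementation would likely use steeper filters so that less of the human voice band leaks into the compensation signals; the first-order sections only illustrate the signal flow.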
S306, mixing the voice elimination signal corresponding to the target audio signal and the frequency compensation signal corresponding to the target audio signal to obtain a background sound signal corresponding to the target audio signal.
Here, the signal mixing of the human voice eliminating signal corresponding to the target audio signal and the frequency compensation signal corresponding to the target audio signal to obtain the background sound signal corresponding to the target audio signal means that the human voice eliminating signal and the frequency compensation signal are added to obtain the background sound signal.
Specifically, the left channel voice elimination signal and the left channel frequency compensation signal may be added to obtain the left channel background sound signal, out_L = X_L2 + X_Lp, where out_L is the left channel background sound signal and X_Lp is the left channel frequency compensation signal. The left channel frequency compensation signal may be the aforementioned left channel first frequency compensation signal, giving out_L = X_L2 + X_Llp; it may be the aforementioned left channel second frequency compensation signal, giving out_L = X_L2 + X_Lhp; or it may comprise both, giving out_L = X_L2 + X_Llp + X_Lhp.
Specifically, the right channel voice elimination signal and the right channel frequency compensation signal may be added to obtain the right channel background sound signal, out_R = X_R2 + X_Rp, where out_R is the right channel background sound signal and X_Rp is the right channel frequency compensation signal. The right channel frequency compensation signal may be the aforementioned right channel first frequency compensation signal, giving out_R = X_R2 + X_Rlp; it may be the aforementioned right channel second frequency compensation signal, giving out_R = X_R2 + X_Rhp; or it may comprise both, giving out_R = X_R2 + X_Rlp + X_Rhp.
In the technical solution corresponding to fig. 3, after obtaining the human voice eliminating signal from which the human voice has been eliminated to the maximum extent, the target audio signal is input to the filter to obtain the frequency compensation signal not within the human voice frequency range, and the frequency compensation signal is mixed with the human voice eliminating signal to obtain the background audio signal, so that the frequency compensation of the human voice eliminating signal is realized, the low frequency part and/or the high frequency part missing in the human voice eliminating signal can be compensated, the frequency spectrum of the background audio signal is complete, and the background audio with higher quality can be obtained.
Optionally, in some possible cases, before signal mixing is performed on the voice elimination signal corresponding to the target audio signal and the frequency compensation signal corresponding to the target audio signal, gain adjustment may also be performed on the voice elimination signal and the frequency compensation signal, so as to obtain a voice elimination gain signal and a frequency compensation gain signal corresponding to the target audio signal.
Specifically, the left channel voice elimination signal may be gain adjusted to obtain the left channel voice elimination gain signal, i.e., cX_L2; the right channel voice elimination signal may be gain adjusted to obtain the right channel voice elimination gain signal, i.e., cX_R2; where c is the gain factor.
In a specific implementation, the left channel voice elimination signal may be input to a multiplier to obtain the left channel voice elimination gain signal; the right channel voice elimination signal may be input to a multiplier to obtain the right channel voice elimination gain signal.
Specifically, the left channel frequency compensation signal may be gain adjusted to obtain the left channel frequency compensation gain signal, i.e., aX_Llp and/or bX_Lhp; the right channel frequency compensation signal may be gain adjusted to obtain the right channel frequency compensation gain signal, i.e., aX_Rlp and/or bX_Rhp; where a and b are the low-frequency gain coefficient and the high-frequency gain coefficient, respectively.
In specific implementation, the left channel frequency compensation signal can be input into a multiplier to obtain a left channel frequency compensation gain signal; the right channel frequency compensation signal may be input to a multiplier to obtain a right channel frequency compensation gain signal. It should be noted that, if the frequency compensation signal includes the aforementioned first frequency compensation signal and second frequency compensation signal, there are two multipliers, which are respectively used for performing gain adjustment on the first frequency compensation signal and the second frequency compensation signal.
In some possible cases, gain adjustment may be performed on the voice elimination signal and the frequency compensation signal corresponding to the target audio signal according to the signal correlation between the left channel audio signal and the right channel audio signal, to obtain the voice elimination gain signal and the frequency compensation gain signal corresponding to the target audio signal. The gain factor of each multiplier is adjustable according to the signal correlation between the left channel audio signal and the right channel audio signal.
Specifically, in the case where the frequency compensation signal includes the aforementioned first frequency compensation signal and second frequency compensation signal, the aforementioned values of a and b may be adjusted according to the signal correlation between the left channel audio signal and the right channel audio signal. Wherein, the correlation calculation formula of the signal is as follows:
Figure BDA0003901438860000131
By performing gain adjustment on the voice elimination signal and the frequency compensation signal according to the signal correlation between the left and right channel audio signals, the resulting voice elimination gain signal and frequency compensation gain signal conform to the signal characteristics.
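As an illustration of correlation-driven gains, the sketch below computes a zero-lag normalized correlation between the two channels and maps it to a gain. The correlation formula of the source is given only as a figure that is not reproduced here, so the normalized form used below is an assumption. The text also does not state how a and b depend on the correlation; the linear mapping `gain_from_correlation` and its default end points are purely hypothetical.

```python
import math

def channel_correlation(left, right):
    """Zero-lag normalized correlation in [-1, 1]; 0.0 if either channel is silent.

    Assumed form: sum(l*r) / sqrt(sum(l*l) * sum(r*r)).
    """
    num = sum(l * r for l, r in zip(left, right))
    den = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return num / den if den > 0.0 else 0.0

def gain_from_correlation(rho, gain_at_zero=1.0, gain_at_one=0.5):
    """Hypothetical mapping (not from the source): interpolate linearly on |rho|."""
    t = min(abs(rho), 1.0)
    return gain_at_zero + (gain_at_one - gain_at_zero) * t

# Usage: identical channels up to scale are fully correlated.
rho = channel_correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
a = gain_from_correlation(rho)  # low-frequency gain coefficient under this mapping
```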
After the voice elimination gain signal and the frequency compensation gain signal corresponding to the target audio signal are obtained, they may be signal-mixed to obtain the background sound signal corresponding to the target audio signal.
Specifically, the left channel voice elimination gain signal and the left channel frequency compensation gain signal may be added to obtain the left channel background sound signal: out_L = cX_L2 + aX_Llp, or out_L = cX_L2 + bX_Lhp, or out_L = cX_L2 + bX_Lhp + aX_Llp. The right channel voice elimination gain signal and the right channel frequency compensation gain signal may be added to obtain the right channel background sound signal: out_R = cX_R2 + aX_Rlp, or out_R = cX_R2 + bX_Rhp, or out_R = cX_R2 + bX_Rhp + aX_Rlp.
By carrying out gain adjustment on the human sound eliminating signal and the frequency compensation signal and then mixing the signals, the background sound signal obtained by mixing can be more natural and complete.
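The gain-weighted mixing above reduces to a sample-wise weighted sum. A minimal sketch for one channel follows; the function name `mix_background` is hypothetical, and either compensation signal may be omitted to cover the three variants listed in the text.

```python
def mix_background(x2, x_lp=None, x_hp=None, c=1.0, a=1.0, b=1.0):
    """Return out = c*X2 + a*X_lp + b*X_hp for one channel.

    x2: voice elimination signal (X_L2 or X_R2).
    x_lp, x_hp: optional first and second frequency compensation signals.
    c, a, b: gain coefficients; with all three equal to 1.0 this is the
    plain mix without gain adjustment.
    """
    out = [c * s for s in x2]
    if x_lp is not None:
        out = [o + a * s for o, s in zip(out, x_lp)]
    if x_hp is not None:
        out = [o + b * s for o, s in zip(out, x_hp)]
    return out

# Usage with toy values: both compensation signals present.
out_left = mix_background([1.0, 2.0], [0.5, 0.5], [0.25, 0.25], c=2.0, a=2.0, b=4.0)
```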
The method of the present application is described above and the apparatus of the present application is described below.
Referring to fig. 4, fig. 4 is a schematic structural diagram of an audio signal processing apparatus provided in an embodiment of the present application, where the audio signal processing apparatus may be the aforementioned audio device. As shown in fig. 4, the audio signal processing apparatus 40 includes:
an obtaining module 401, configured to obtain a target audio signal, where the target audio signal includes a left channel audio signal and a right channel audio signal, and the target audio signal is an audio signal including human voice and background sound;
a first eliminating module 402, configured to perform mutual cancellation of the left and right channel vocal signals on the target audio signal to obtain a voice cancellation signal corresponding to the target audio signal;
a second eliminating module 403, configured to input the human voice cancelling signal into a background sound eliminating system for background sound elimination, so as to obtain a human voice residual signal corresponding to the target audio signal, where the background sound eliminating system is configured to eliminate background sound;
a third eliminating module 404, configured to perform signal cancellation on the voice cancellation signal and the voice residual signal to obtain a voice elimination signal corresponding to the target audio signal.
In a possible design, the audio signal processing apparatus 40 further includes a transmission system determining module 405, configured to perform signal cancellation on the human voice cancellation signal and the target audio signal to obtain a human voice signal corresponding to the target audio signal; and determining the background sound elimination system according to the target audio signal and the human voice signal.
In one possible design, the transmission system determining module 405 is specifically configured to: take the target audio signal as an input signal and the human voice signal as an output signal, and perform adaptive filtering fitting to obtain an objective function, wherein the objective function represents the relationship between the input signal and the output signal; and take the objective function as the transfer function of the background sound elimination system.
In a possible design, the audio signal processing apparatus 40 further includes a frequency compensation module 406, configured to perform frequency compensation on the target audio signal to obtain a frequency compensation signal corresponding to the target audio signal; a signal mixing module 407, configured to perform signal mixing on the human voice canceling signal and the frequency compensation signal to obtain a background sound signal corresponding to the target audio signal.
In one possible design, the frequency compensation module 406 is specifically configured to: inputting the target audio signal into a first filter to obtain a first frequency compensation signal corresponding to the target audio signal, wherein the cut-off frequency of the first filter is smaller than a first preset cut-off frequency; and/or inputting the target audio signal into a second filter to obtain a second frequency compensation signal corresponding to the target audio signal, wherein the cut-off frequency of the second filter is greater than a second preset cut-off frequency; the second preset cutoff frequency is greater than the first preset cutoff frequency.
In one possible design, the audio signal processing apparatus 40 further includes a gain adjusting module 408, configured to perform gain adjustment on the human voice eliminating signal and the frequency compensating signal to obtain a human voice eliminating gain signal and a frequency compensating gain signal corresponding to the target audio signal: the signal mixing module 407 is specifically configured to perform signal mixing on the human voice cancellation gain signal and the frequency compensation gain signal to obtain a background sound signal corresponding to the target audio signal.
In a possible design, the gain adjustment module 408 is specifically configured to perform gain adjustment on the voice elimination signal and the frequency compensation signal according to a signal correlation between the left channel audio signal and the right channel audio signal, so as to obtain a voice elimination gain signal and a frequency compensation gain signal corresponding to the target audio signal.
It should be noted that, for what is not mentioned in the embodiment corresponding to fig. 4, reference may be made to the description of the foregoing method embodiment, and details are not described here again.
After the apparatus acquires the left channel audio signal and the right channel audio signal containing human voice and background sound, it obtains a voice cancellation signal by mutually cancelling the left and right channel vocal signals, which eliminates the human voice a first time. The voice cancellation signal is then input into the background sound elimination system for background sound elimination, to obtain the voice residual signal contained in the voice cancellation signal. Finally, signal cancellation is performed on the voice cancellation signal and the voice residual signal to obtain the voice elimination signal, so that the residual human voice is cancelled again. With these two rounds of voice elimination, the human voice is eliminated to the greatest possible extent, and the resulting voice elimination signal is cleaner.
Referring to fig. 5, fig. 5 is a schematic structural diagram of an audio device according to an embodiment of the present application, where the audio device 50 includes a processor 501 and a memory 502. The memory 502 is connected to the processor 501, for example, through a bus.
The processor 501 is configured to enable the audio device 50 to perform the corresponding functions in the methods of the above method embodiments. The processor 501 may be a central processing unit (CPU), a network processor (NP), a hardware chip, or any combination thereof. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
The memory 502 is used to store program codes and the like. Memory 502 may include Volatile Memory (VM), such as Random Access Memory (RAM); the memory 502 may also include a non-volatile memory (NVM), such as a read-only memory (ROM), a flash memory (flash memory), a hard disk (HDD) or a solid-state drive (SSD); the memory 502 may also comprise a combination of memories of the kind described above.
Optionally, the audio device 50 may also include a microphone, a speaker, or other playing peripherals.
The processor 501 may call the program code to perform the following operations:
acquiring a target audio signal, wherein the target audio signal comprises a left channel audio signal and a right channel audio signal, and the target audio signal is an audio signal containing human voice and background sound;
performing mutual cancellation of the left and right channel vocal signals on the target audio signal to obtain a voice cancellation signal corresponding to the target audio signal;
inputting the voice cancellation signal into a background sound elimination system for background sound elimination to obtain a voice residual signal corresponding to the target audio signal, wherein the background sound elimination system is used for eliminating background sound;
and performing signal cancellation on the voice cancellation signal and the voice residual signal to obtain a voice elimination signal corresponding to the target audio signal.
Embodiments of the present application also provide a computer-readable storage medium, which stores a computer program, where the computer program includes program instructions, and the program instructions, when executed by a computer, cause the computer to execute the method according to the foregoing embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only memory (ROM), a Random Access Memory (RAM), or the like.
The above disclosure describes only preferred embodiments of the present application and is not to be construed as limiting its scope; equivalent variations and modifications of the present application therefore remain within its scope.

Claims (10)

1. An audio signal processing method, comprising:
acquiring a target audio signal, wherein the target audio signal comprises a left channel audio signal and a right channel audio signal, and the target audio signal is an audio signal containing human voice and background sound;
performing mutual cancellation of the left and right channel vocal signals on the target audio signal to obtain a voice cancellation signal corresponding to the target audio signal;
inputting the voice cancellation signal into a background sound elimination system for background sound elimination to obtain a voice residual signal corresponding to the target audio signal, wherein the background sound elimination system is used for eliminating background sound;
and performing signal cancellation on the voice cancellation signal and the voice residual signal to obtain a voice elimination signal corresponding to the target audio signal.
2. The method of claim 1, wherein before inputting the human voice cancellation signal into a background sound cancellation system for background sound cancellation to obtain a human voice residual signal corresponding to the target audio signal, the method further comprises:
carrying out signal cancellation on the voice cancellation signal and the target audio signal to obtain a voice signal corresponding to the target audio signal;
and determining the background sound elimination system according to the target audio signal and the human voice signal.
3. The method of claim 2, wherein determining the background sound cancellation system from the target audio signal and the human voice signal comprises:
taking the target audio signal as an input signal and the human voice signal as an output signal, and performing adaptive filtering fitting to obtain a target function, wherein the target function is used for representing the incidence relation between the input signal and the output signal;
and taking the target function as a transfer function of the background sound elimination system.
4. The method according to any one of claims 1 to 3, wherein after performing signal cancellation on the voice cancellation signal and the voice residual signal to obtain the voice elimination signal corresponding to the target audio signal, the method further comprises:
performing frequency compensation on the target audio signal to obtain a frequency compensation signal corresponding to the target audio signal;
and mixing the human voice eliminating signal and the frequency compensation signal to obtain a background voice signal corresponding to the target audio signal.
5. The method of claim 4, wherein the frequency compensating the target audio signal to obtain a frequency compensated signal corresponding to the target audio signal comprises:
inputting the target audio signal into a first filter to obtain a first frequency compensation signal corresponding to the target audio signal, wherein the cut-off frequency of the first filter is smaller than a first preset cut-off frequency; and/or
Inputting the target audio signal into a second filter to obtain a second frequency compensation signal corresponding to the target audio signal, wherein the cut-off frequency of the second filter is greater than a second preset cut-off frequency; the second preset cutoff frequency is greater than the first preset cutoff frequency.
6. The method of claim 4, wherein before mixing the voice elimination signal and the frequency compensation signal to obtain the background sound signal corresponding to the target audio signal, the method further comprises:
performing gain adjustment on the voice eliminating signal and the frequency compensation signal to obtain a voice eliminating gain signal and a frequency compensation gain signal corresponding to the target audio signal;
the signal mixing the human voice eliminating signal and the frequency compensation signal to obtain a background sound signal corresponding to the target audio signal includes:
and mixing the human voice eliminating gain signal and the frequency compensation gain signal to obtain a background sound signal corresponding to the target audio signal.
7. The method of claim 6, wherein performing gain adjustment on the voice elimination signal and the frequency compensation signal to obtain a voice elimination gain signal and a frequency compensation gain signal corresponding to the target audio signal comprises:
and according to the signal correlation between the left channel audio signal and the right channel audio signal, performing gain adjustment on the voice eliminating signal and the frequency compensation signal to obtain a voice eliminating gain signal and a frequency compensation gain signal corresponding to the target audio signal.
8. An audio signal processing apparatus, comprising:
an acquisition module, configured to acquire a target audio signal, wherein the target audio signal comprises a left channel audio signal and a right channel audio signal, and the target audio signal is an audio signal containing human voice and background sound;
a first eliminating module, configured to perform mutual cancellation of the left and right channel vocal signals on the target audio signal to obtain a voice cancellation signal corresponding to the target audio signal;
a second eliminating module, configured to input the voice cancellation signal into a background sound elimination system for background sound elimination to obtain a voice residual signal corresponding to the target audio signal, wherein the background sound elimination system is used for eliminating background sound;
and a third eliminating module, configured to perform signal cancellation on the voice cancellation signal and the voice residual signal to obtain a voice elimination signal corresponding to the target audio signal.
9. An audio device, comprising a memory and a processor, wherein the memory is connected to the processor, the processor is configured to execute one or more computer programs stored in the memory, and the processor, when executing the one or more computer programs, causes the audio device to implement the method of any one of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to carry out the method according to any one of claims 1-7.
CN202211291158.XA 2022-10-21 2022-10-21 Audio signal processing method, device, equipment and storage medium Pending CN115550819A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211291158.XA CN115550819A (en) 2022-10-21 2022-10-21 Audio signal processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211291158.XA CN115550819A (en) 2022-10-21 2022-10-21 Audio signal processing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115550819A true CN115550819A (en) 2022-12-30

Family

ID=84735623

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211291158.XA Pending CN115550819A (en) 2022-10-21 2022-10-21 Audio signal processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115550819A (en)

Similar Documents

Publication Publication Date Title
JP4161628B2 (en) Echo suppression method and apparatus
KR101183847B1 (en) Methods and apparatus for suppressing ambient noise using multiple audio signals
JP5042823B2 (en) Audio signal echo cancellation
EP1803288A1 (en) Echo cancellation
WO2007049643A1 (en) Echo suppressing method and device
JPWO2007058121A1 (en) Reverberation suppression method, apparatus, and reverberation suppression program
JP3359460B2 (en) Adaptive filter and echo canceller
JPWO2007049644A1 (en) Echo suppression method and apparatus
US8498429B2 (en) Acoustic correction apparatus, audio output apparatus, and acoustic correction method
CN109785853B (en) Echo cancellation method, device, system and storage medium
CN112562624B (en) Active noise reduction filter design method, noise reduction method, system and electronic equipment
US20140067384A1 (en) Method and apparatus for canceling vocal signal from audio signal
CN115550819A (en) Audio signal processing method, device, equipment and storage medium
JP2008072600A (en) Acoustic signal processing apparatus, acoustic signal processing program, and acoustic signal processing method
JP2003250193A (en) Echo elimination method, device for executing the method, program and recording medium therefor
US20220036910A1 (en) Filtering method, filtering device, and storage medium stored with filtering program
WO2018229821A1 (en) Signal processing device, teleconferencing device, and signal processing method
CN108986837A (en) A kind of filter update method and device
CN113763975B (en) Voice signal processing method, device and terminal
JP2010068213A (en) Echo canceler
CN113488016A (en) Coefficient determination method and device
WO2014097470A1 (en) Reverberation removal device
US9384757B2 (en) Signal processing method, signal processing apparatus, and signal processing program
WO2021131346A1 (en) Sound pick-up device, sound pick-up method and sound pick-up program
CN113453124B (en) Audio processing method, device and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination