CN116741191A - Audio signal processing method, device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN116741191A
CN116741191A (application number CN202310825895.1A)
Authority
CN
China
Prior art keywords
audio signal
sample
audio
noise reduction
signal
Prior art date
Legal status
Pending
Application number
CN202310825895.1A
Other languages
Chinese (zh)
Inventor
张旭
郑羲光
张晨
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202310825895.1A
Publication of CN116741191A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/0208: Noise filtering (speech enhancement, e.g. noise reduction or echo cancellation)
    • G10L 21/0232: Noise filtering characterised by the method used for estimating noise; processing in the frequency domain
    • G10L 25/24: Speech or voice analysis characterised by the type of extracted parameters, the extracted parameters being the cepstrum
    • G10L 25/30: Speech or voice analysis characterised by the analysis technique, using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Soundproofing, Sound Blocking, And Sound Damping (AREA)

Abstract

The disclosure relates to an audio signal processing method, an apparatus, an electronic device and a storage medium, wherein the method comprises the following steps: acquiring an audio signal to be repaired; the audio signal to be repaired is a distorted audio signal containing an interference signal; processing the audio signal to be repaired according to a pre-trained noise reduction processing model to obtain a noise reduction audio signal; the noise reduction processing model is obtained based on training of a first audio sample containing an interference signal; inputting the audio signal to be repaired and the noise reduction audio signal into a pre-trained audio repair model to obtain a repaired target audio signal; the audio repair model is obtained through training based on a second audio sample constructed by the sample distortion audio signal and the sample noise reduction audio signal corresponding to the sample distortion audio signal. By adopting the method, the accuracy and quality of audio restoration are improved.

Description

Audio signal processing method, device, electronic equipment and storage medium
Technical Field
The disclosure relates to the technical field of audio, and in particular relates to an audio signal processing method, an audio signal processing device, electronic equipment and a storage medium.
Background
With the development of audio technology, an audio restoration technology has emerged, which can realize noise reduction and restoration of problematic audio.
Current audio recovery methods take problem audio that may contain noise, reverberation, amplitude clipping or other defects, select such problem audio, and perform audio recovery training of a deep learning model using the problem audio and the normal audio corresponding to the problem audio, obtaining a trained deep learning model. Audio recovery processing is then performed on any problem audio based on the trained deep learning model to obtain recovered repair audio.
However, because problem audio generally has more than one defect, when the deep learning model is made to learn all of the various problems in the problem audio at once, the output repair audio still suffers from unclear articulation and partially missing pronunciation, so the audio repair quality is poor.
Disclosure of Invention
The disclosure provides an audio signal processing method, an audio signal processing device, electronic equipment and a storage medium, so as to at least solve the problems of low audio restoration accuracy and poor quality in the related art. The technical scheme of the present disclosure is as follows:
according to a first aspect of an embodiment of the present disclosure, there is provided an audio signal processing method including:
acquiring an audio signal to be repaired; the audio signal to be repaired is a distorted audio signal containing an interference signal;
processing the audio signal to be repaired according to a pre-trained noise reduction processing model to obtain a noise reduction audio signal; the noise reduction processing model is obtained based on training of a first audio sample containing an interference signal; the noise reduction audio signal is a distorted audio signal for eliminating the interference signal;
inputting the audio signal to be repaired and the noise reduction audio signal into a pre-trained audio repair model to obtain a repaired target audio signal; the audio repair model is obtained through training based on a second audio sample constructed by the sample distortion audio signal and the sample noise reduction audio signal corresponding to the sample distortion audio signal.
In an exemplary embodiment, the acquiring the audio signal to be repaired includes:
acquiring a first audio signal to be repaired; the first audio signal is a distorted audio signal comprising an interfering signal;
performing short-time Fourier transform on the first audio signal to obtain a first converted audio signal on a time-frequency domain;
and carrying out Mel frequency cepstrum conversion processing on the first converted audio signal to obtain an audio signal to be repaired.
In an exemplary embodiment, the processing the audio signal to be repaired according to the pre-trained noise reduction processing model to obtain a noise reduction audio signal includes:
inputting an audio signal to be repaired containing an interference signal into a pre-trained noise reduction processing model; the interference signal comprises a noise signal and a reverberation signal;
and processing the noise signal and the reverberation signal in the audio signal to be repaired through a pre-trained noise reduction processing model to obtain a noise reduction audio signal for eliminating the noise signal and the reverberation signal.
In an exemplary embodiment, after the audio signal to be repaired and the noise reduction audio signal are input into a pre-trained audio repair model to obtain a repaired target audio signal, the method includes:
inputting the target audio signal to an audio encoder, and converting the target audio signal on a mel frequency cepstrum by the audio encoder to obtain a repair audio signal; the repair audio signal is an audio signal in a time domain after the repair of the audio signal to be repaired.
In an exemplary embodiment, the training process of the noise reduction processing model includes:
acquiring a first audio sample, wherein the first audio sample comprises a first sample audio signal and an original audio signal corresponding to the first sample audio signal; the first sample audio signal comprises an interference signal;
inputting the first sample audio signal into a noise reduction processing model, and processing the first sample audio signal through the noise reduction processing model to obtain a first processed audio signal;
performing loss calculation according to the first processed audio signal and the original audio signal to obtain a loss result corresponding to the first processed audio signal;
judging whether the loss result meets a preset loss condition or not, and determining that the noise reduction processing model is trained until the loss result meets the preset loss condition; the first processed audio signal output by the trained noise reduction processing model is a noise reduction audio signal.
In an exemplary embodiment, the training process of the audio repair model includes:
acquiring a second audio sample, wherein the second audio sample comprises a second sample audio signal, a sample noise reduction audio signal corresponding to the second sample audio signal and an original audio signal corresponding to the second sample audio signal; the second sample audio signal is a sample distorted audio signal comprising an interfering signal;
inputting the second sample audio signal and the sample noise reduction audio signal into an audio restoration model, and processing the second sample audio signal and the sample noise reduction audio signal through the audio restoration model to obtain a second processed audio signal;
performing loss calculation according to the second processed audio signal and an original audio signal corresponding to the second sample audio signal to obtain a loss result corresponding to the second processed audio signal;
and judging whether the loss result meets a preset loss condition or not, and determining that the audio repair model training is completed when the loss result meets the preset loss condition.
In an exemplary embodiment, the acquiring the second audio sample includes:
acquiring a second audio signal and an original audio signal corresponding to the second audio signal; the second audio signal is a sample distorted audio signal comprising an interfering signal;
respectively carrying out short-time Fourier transform on the second audio signal and an original audio signal corresponding to the second audio signal to obtain a second sample audio signal on a time-frequency domain and an original audio signal on the time-frequency domain;
processing the second sample audio signal on the time-frequency domain according to the pre-trained noise reduction processing model to obtain a sample noise reduction audio signal corresponding to the second sample audio signal;
and constructing a second audio sample according to the second sample audio signal in the time-frequency domain, the original audio signal in the time-frequency domain and the sample noise reduction audio signal in the time-frequency domain.
In an exemplary embodiment, before the constructing obtains the second audio sample, the method further includes:
performing frequency spectrum conversion on the second sample audio signal in the time-frequency domain, the original audio signal in the time-frequency domain and the sample noise reduction audio signal respectively, to obtain a second sample audio signal on a mel frequency cepstrum, an original audio signal on the mel frequency cepstrum and a sample noise reduction audio signal on the mel frequency cepstrum;
the constructing obtains a second audio sample, including:
and constructing a second audio sample according to the second sample audio signal on the Mel frequency cepstrum, the original audio signal on the Mel frequency cepstrum and the sample noise reduction audio signal on the Mel frequency cepstrum.
According to a second aspect of embodiments of the present disclosure, there is provided an audio signal processing apparatus including:
an acquisition unit configured to perform acquisition of an audio signal to be repaired; the audio signal to be repaired is a distorted audio signal containing an interference signal;
the interference elimination unit is configured to execute processing on the audio signal to be repaired according to a pre-trained noise reduction processing model to obtain a noise reduction audio signal; the noise reduction processing model is obtained based on training of a first audio sample containing an interference signal; the noise reduction audio signal is a distorted audio signal for eliminating the interference signal;
the restoration unit is configured to input the audio signal to be restored and the noise reduction audio signal into a pre-trained audio restoration model to obtain a restored target audio signal; the audio repair model is obtained through training based on a second audio sample constructed by the sample distortion audio signal and the sample noise reduction audio signal corresponding to the sample distortion audio signal.
In an exemplary embodiment, the acquiring unit includes:
an acquisition subunit configured to perform acquisition of a first audio signal to be repaired; the first audio signal is a distorted audio signal comprising an interfering signal;
a first conversion subunit configured to perform short-time fourier transform on the first audio signal, resulting in a first converted audio signal on a time-frequency domain;
and the second conversion subunit is configured to perform conversion processing of Mel frequency cepstrum on the first converted audio signal to obtain an audio signal to be repaired.
In an exemplary embodiment, the interference cancellation unit includes:
an input subunit configured to perform inputting an audio signal to be repaired including an interference signal to a noise reduction processing model trained in advance; the interference signal comprises a noise signal and a reverberation signal;
and the interference elimination subunit is configured to perform processing on the noise signal and the reverberation signal in the audio signal to be repaired through a pre-trained noise reduction processing model so as to obtain a noise reduction audio signal for eliminating the noise signal and the reverberation signal.
In an exemplary embodiment, the apparatus further comprises:
a restoring unit configured to perform inputting the target audio signal to an audio encoder, and converting the target audio signal on a mel frequency cepstrum by the audio encoder to obtain a restored audio signal; the repair audio signal is an audio signal in a time domain after the repair of the audio signal to be repaired.
In an exemplary embodiment, the apparatus further comprises:
a first sample acquisition unit configured to perform acquisition of a first audio sample including a first sample audio signal and an original audio signal corresponding to the first sample audio signal; the first sample audio signal comprises an interference signal;
a first sample processing unit configured to perform inputting the first sample audio signal into a noise reduction processing model, and processing the first sample audio signal by the noise reduction processing model to obtain a first processed audio signal;
the first loss calculation unit is configured to perform loss calculation according to the first processed audio signal and the original audio signal to obtain a loss result corresponding to the first processed audio signal;
the first judging unit is configured to judge whether the loss result meets a preset loss condition or not, and determine that the training of the noise reduction processing model is completed when the loss result meets the preset loss condition; the first processed audio signal output by the trained noise reduction processing model is a noise reduction audio signal.
In an exemplary embodiment, the apparatus further comprises:
a second sample acquiring unit configured to perform acquiring a second audio sample, where the second audio sample includes a second sample audio signal, a sample noise reduction audio signal corresponding to the second sample audio signal, and an original audio signal corresponding to the second sample audio signal; the second sample audio signal is a sample distorted audio signal comprising an interfering signal;
a second sample processing unit configured to perform inputting the second sample audio signal and the sample noise reduction audio signal into an audio repair model, and processing the second sample audio signal and the sample noise reduction audio signal through the audio repair model to obtain a second processed audio signal;
a second loss calculation unit configured to perform loss calculation according to the second processed audio signal and an original audio signal corresponding to the second sample audio signal, so as to obtain a loss result corresponding to the second processed audio signal;
and the second judging unit is configured to judge whether the loss result meets a preset loss condition or not, and determine that the audio repair model training is completed when the loss result meets the preset loss condition.
In an exemplary embodiment, the second sample acquiring unit includes:
a second acquisition subunit configured to perform acquisition of a second audio signal and an original audio signal corresponding to the second audio signal; the second audio signal is a sample distorted audio signal comprising an interfering signal;
a third conversion subunit configured to perform short-time fourier transform on the second audio signal and an original audio signal corresponding to the second audio signal, to obtain a second sample audio signal in a time-frequency domain and an original audio signal in the time-frequency domain;
a processing subunit configured to perform processing on the second sample audio signal on a time-frequency domain according to the noise reduction processing model trained in advance, so as to obtain a sample noise reduction audio signal corresponding to the second sample audio signal;
a construction subunit configured to perform construction to obtain a second audio sample from the second sample audio signal in the time-frequency domain, the original audio signal in the time-frequency domain, and the sample noise reduction audio signal in the time-frequency domain.
In an exemplary embodiment, the apparatus further comprises:
a fourth conversion subunit configured to perform frequency spectrum conversion on the second sample audio signal in the time-frequency domain, the original audio signal in the time-frequency domain, and the sample noise reduction audio signal, respectively, to obtain a second sample audio signal on a mel frequency cepstrum, an original audio signal on the mel frequency cepstrum, and a sample noise reduction audio signal on the mel frequency cepstrum;
the construction subunit is configured to perform construction to obtain a second audio sample according to the second sample audio signal on the mel-frequency cepstrum, the original audio signal on the mel-frequency cepstrum, and the sample noise reduction audio signal on the mel-frequency cepstrum.
According to a third aspect of embodiments of the present disclosure, there is provided an electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the audio signal processing method according to any of the first aspects above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer readable storage medium whose instructions, when executed by a processor of an electronic device, cause the electronic device to perform the audio signal processing method according to any one of the first aspects above.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product which, when executed by a processor of an electronic device, enables the electronic device to perform the audio signal processing method of any one of the first aspects described above.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
according to the method, the noise reduction processing model is trained in advance to process the audio signal to be repaired; on this basis, the noise reduction audio signal obtained after that processing and the audio signal to be repaired are processed through the pre-trained audio repair model to obtain a target audio signal that, after audio repair, is free of interference signals and free of audio distortion. Thus, both the problem that an interference signal exists in the audio signal to be repaired and the problem of audio signal distortion are repaired, and the accuracy and quality of audio repair are improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.
Fig. 1 is a flowchart illustrating an audio signal processing method according to an exemplary embodiment.
Fig. 2 is a flowchart illustrating a method of acquiring an audio signal to be repaired according to an exemplary embodiment.
FIG. 3 is a flowchart illustrating a method of training a noise reduction processing model, according to an example embodiment.
FIG. 4 is a schematic diagram illustrating a noise reduction processing model training flow, according to an example embodiment.
FIG. 5 is a flowchart illustrating a method of training an audio repair model, according to an example embodiment.
Fig. 6 is a flowchart illustrating a method of constructing a second audio sample, according to an example embodiment.
FIG. 7 is a schematic diagram illustrating an audio repair model training process, according to an example embodiment.
Fig. 8 is a flowchart illustrating another method of constructing a second audio sample, according to an example embodiment.
Fig. 9 is a block diagram of an audio signal processing apparatus according to an exemplary embodiment.
Fig. 10 is a block diagram of an electronic device, according to an example embodiment.
Fig. 11 is a block diagram of another electronic device, shown in accordance with an exemplary embodiment.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
It should be further noted that, the user information (including, but not limited to, user equipment information, user personal information, etc.) and the data (including, but not limited to, data for presentation, analyzed data, etc.) related to the present disclosure are information and data authorized by the user or sufficiently authorized by each party.
Fig. 1 is a flowchart of an audio signal processing method according to an exemplary embodiment. As shown in Fig. 1, the embodiment of the disclosure is described by taking the case in which the audio signal processing method is applied to a terminal as an example. Optionally, the method may also be applied to a server, or to a system in which a terminal and a server interact, and the embodiment of the disclosure is not limited in this regard. The specific method includes the following steps:
in step S110, an audio signal to be repaired is acquired.
The audio signal to be repaired is a distorted audio signal containing an interference signal. The distorted audio signal is an audio signal with problems such as audio packet loss, holes or clipping.
In implementation, the current audio signal to be repaired often has more than one type of audio defect; for example, it may contain interference signals such as noise signals and reverberation signals, and it may also have audio loss problems such as holes and amplitude clipping. All of these types of audio loss problems in the audio signal to be repaired need to be repaired. Therefore, the computer device acquires the audio signal to be repaired in order to perform audio repair processing on it.
In step S120, the audio signal to be repaired is processed according to the pre-trained noise reduction processing model, so as to obtain a noise reduction audio signal.
The noise reduction processing model is trained based on a first audio sample containing an interference signal. The noise reduction audio signal is a distorted audio signal that cancels the interference signal.
In an implementation, a noise reduction processing model is pre-stored in the terminal, the noise reduction processing model having been trained based on a first audio sample containing the interference signal. Therefore, when the audio signal to be repaired needs to be repaired, the audio signal to be repaired is input into the noise reduction processing model, and only interference elimination processing is performed on the audio signal to be repaired, so that the noise reduction audio signal is obtained. The noise reduction audio signal is an audio signal from which the interference signals have been eliminated, but which still has audio loss problems (e.g., audio packet loss, holes, clipping, etc.).
Alternatively, the noise reduction processing model may be, but is not limited to, a deep neural network (DNN) model; details are not repeated here in the embodiments of the present disclosure.
In step S130, the audio signal to be repaired and the noise reduction audio signal are input into a pre-trained audio repair model, so as to obtain a repaired target audio signal.
The audio repair model is trained based on a second audio sample constructed from two signals: a sample distorted audio signal and the sample noise reduction audio signal corresponding to the sample distorted audio signal.
In implementation, the terminal inputs the audio signal to be repaired and the noise reduction audio signal into the pre-trained audio repair model, and the audio repair model processes the audio signal to be repaired and the noise reduction audio signal to obtain the repaired target audio signal. Specifically, the audio signal to be repaired and the noise reduction audio signal are input into the pre-trained audio repair model together, and the two audio signals cross-reference each other, which prevents the audio characteristics from being lost as they would be if a single model performed interference elimination processing and audio repair processing at the same time, and thus avoids problems such as unclear articulation and missing consonants in the repaired audio. The target audio signal is the final audio signal obtained after the interference and audio loss problems of the audio signal to be repaired have been resolved.
Alternatively, the audio repair model may be, but is not limited to, a deep neural network (DNN) model; details are not repeated here in the embodiments of the present disclosure.
According to the above audio signal processing method, a noise reduction processing model trained in advance first processes the audio signal to be repaired; on this basis, the noise reduction audio signal obtained from that processing and the audio signal to be repaired are processed by a pre-trained audio repair model to obtain a target audio signal that, after audio repair, is free of interference signals and free of audio distortion. In this way, both the interference-signal problem and the audio distortion problem of the audio signal to be repaired are repaired, improving the accuracy and quality of audio repair.
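To make the two-stage flow above concrete, the following is a minimal inference sketch assuming PyTorch and two already-trained modules named dnn1 (noise reduction) and dnn2 (audio repair); the module names, feature shapes and concatenation along the feature axis are illustrative assumptions rather than details fixed by the disclosure.

```python
# Minimal two-stage inference sketch (assumptions: PyTorch, mel features of
# shape (frames, mel_bins), models named dnn1/dnn2 - not from the disclosure).
import torch

def repair_audio(mel_to_repair: torch.Tensor,
                 dnn1: torch.nn.Module,
                 dnn2: torch.nn.Module) -> torch.Tensor:
    with torch.no_grad():
        # Stage 1: remove only the interference (noise / reverberation).
        mel_denoised = dnn1(mel_to_repair)
        # Stage 2: feed both signals so they cross-reference each other while
        # the repair model restores the distorted (lost / clipped) content.
        mel_target = dnn2(torch.cat([mel_to_repair, mel_denoised], dim=-1))
    return mel_target  # repaired target audio signal on the mel frequency cepstrum
```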
In an exemplary embodiment, when processing an audio signal to be repaired, the directly acquired time-domain audio signal may first be converted to the time-frequency domain, and further to a mel frequency cepstrum, before audio processing based on a deep learning model is performed, so as to improve the accuracy of audio signal processing. Specifically, as shown in Fig. 2, step S110 may be implemented by the following steps:
in step S211, a first audio signal to be repaired is acquired.
Wherein the first audio signal is a distorted audio signal comprising an interfering signal.
In implementation, for a first audio signal to be repaired, the terminal acquires the first audio signal to be repaired because more than one audio problem exists in the first audio signal.
Optionally, the first audio signal includes an interference signal, where the interference signal may be classified into a noise signal, a reverberation signal, and the like, and in the embodiment of the present disclosure, the kind and the number of the interference signals existing in the first audio signal to be repaired are not limited.
Optionally, the first audio signal is also a distorted audio signal, where the distorted audio signal may, but is not limited to, have problems of audio packet loss, holes, clipping, and the like, and the embodiment of the disclosure does not limit the type and number of audio distortions.
In step S212, the first audio signal is subjected to short-time fourier transform, to obtain a first converted audio signal in the time-frequency domain.
In implementation, for the first audio signal in the time domain, the terminal performs a short-time Fourier transform (STFT) on the first audio signal, converting the first audio signal to be repaired from the original time domain into the time-frequency domain to obtain a first converted audio signal in the time-frequency domain. The first converted audio signal still contains interference signals and still has audio distortion problems.
In step S213, a mel-frequency cepstrum conversion process is performed on the first converted audio signal to obtain an audio signal to be repaired.
In implementation, the terminal performs conversion processing of a mel frequency cepstrum on the first converted audio signal, and converts the first converted audio signal on the time-frequency domain spectrum to the mel frequency cepstrum to obtain an audio signal to be repaired on the mel frequency cepstrum. The audio signal to be repaired is an audio signal which can be input into a deep learning model (namely a noise reduction processing model and an audio repair model) for processing.
In this embodiment, by processing the first audio signal to be repaired, the first audio signal is converted from the time domain to the time-frequency domain and then to the mel frequency cepstrum, obtaining the audio signal to be repaired on the mel frequency cepstrum for subsequent processing; converting the audio signal in this way increases the discrimination available to the deep learning model and improves the discrimination accuracy of the model.
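As a rough illustration of this preprocessing chain, the sketch below uses librosa to go from the time domain to the time-frequency domain (STFT) and then to a mel-scale representation; the sampling rate, FFT size, hop length and number of mel bands are assumptions, not values specified by the disclosure.

```python
# Preprocessing sketch: time domain -> STFT (time-frequency domain) -> mel scale.
# Assumes librosa; all frame parameters are illustrative.
import librosa
import numpy as np

def to_mel_features(path: str, sr: int = 48000, n_fft: int = 1024,
                    hop: int = 256, n_mels: int = 128) -> np.ndarray:
    y, _ = librosa.load(path, sr=sr)                     # first audio signal, time domain
    stft = librosa.stft(y, n_fft=n_fft, hop_length=hop)  # first converted audio signal
    power = np.abs(stft) ** 2
    mel = librosa.feature.melspectrogram(S=power, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel)                      # audio signal to be repaired
```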
In one embodiment, processing the audio signal to be repaired according to the pre-trained noise reduction processing model in step S120, to obtain the noise reduction audio signal includes:
in step S120A, the audio signal to be repaired including the interference signal is input to the noise reduction processing model trained in advance.
The interference signal comprises a noise signal and a reverberation signal.
In implementation, during recording, an original audio signal is easily affected by the surrounding environment, the recording device and so on, so that an interference signal exists in the recorded audio. The interference signal may be a noise signal from the recording environment, or it may be reverberation interference caused, for example, by sound leakage of the recording device, meaning that the audio contains a reverberation signal. Therefore, when the audio signal to be repaired is repaired, the interference signal in it needs to be eliminated. The terminal stores a pre-trained noise reduction processing model; when repairing the audio signal to be repaired, the terminal inputs the audio signal to be repaired containing the interference signal into the pre-trained noise reduction processing model for processing.
In step S120B, the noise signal and the reverberation signal in the audio signal to be repaired are processed through the pre-trained noise reduction processing model to obtain a noise reduction audio signal corresponding to the audio signal to be repaired.
In implementation, the terminal processes the noise signal and the reverberation signal in the audio signal to be repaired through a pre-trained noise reduction processing model, so that the noise signal and the reverberation signal contained in the audio signal to be repaired are eliminated, and the noise reduction audio signal is obtained.
In this embodiment, the noise reduction audio signal from which the interference signal has been eliminated is obtained by preprocessing the audio signal to be repaired, so that the obtained noise reduction audio signal and the audio signal to be repaired can cross-reference each other, realizing dual repair of the audio signal to be repaired: interference elimination and distortion repair.
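One possible shape for such a noise reduction processing model is sketched below as a small fully-connected PyTorch network that predicts a per-bin suppression mask over magnitude frames; the architecture, layer sizes and masking formulation are assumptions and are not prescribed by the disclosure.

```python
# Sketch of a DNN-style noise reduction model (DNN1) operating on per-frame
# magnitude features; layer sizes and the masking approach are assumptions.
import torch
import torch.nn as nn

class DNN1(nn.Module):
    def __init__(self, n_bins: int = 513, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_bins, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_bins), nn.Sigmoid(),  # per-bin suppression mask
        )

    def forward(self, mag: torch.Tensor) -> torch.Tensor:
        # mag: (frames, n_bins) magnitude of the signal to be repaired.
        # Noise and reverberation are attenuated; packet loss, holes and
        # clipping are intentionally left for the repair model.
        return mag * self.net(mag)
```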
In an exemplary embodiment, after step S130 (inputting the audio signal to be repaired and the noise reduction audio signal into the pre-trained audio repair model to obtain the repaired target audio signal), the method further includes the following steps:
in step S131, the target audio signal is input to the audio encoder, and the audio encoder converts the target audio signal on the mel frequency cepstrum to obtain the repair audio signal.
The repair audio signal is an audio signal in a time domain after the repair of the audio signal to be repaired.
In implementation, the target audio signal output by the audio repair model is an audio signal on the mel frequency cepstrum that has undergone repair processing by the audio repair model, that is, the interference signal has been eliminated and the audio distortion problems have been repaired. Based on the repaired target audio signal, the terminal inputs the target audio signal into an audio encoder, which converts the target audio signal directly from the mel frequency cepstrum to the time domain to obtain the repair audio signal in the time domain. The repair audio signal in the time domain is a playable audio signal.
Specifically, melS for target audio signal on mel frequency cepstrum 48k (n, m) represents that the formula for performing conversion processing on the target audio signal on the mel-frequency cepstrum is as follows:
s(t)=Vocoder(MelS 48k (n,m))
where s (t) represents the restoration audio signal in the converted time domain. The Vocoder indicates that audio encoder conversion processing is performed.
Optionally, the restored repair audio signal is converted into a playable audio signal in the time domain, so that the repair accuracy and quality of the current audio signal to be repaired can be verified by performing playing processing on the repair audio signal.
In this embodiment, the audio encoder converts the repaired target audio signal on the mel frequency cepstrum to obtain the repair audio signal in the time domain, realizing the restoration of the target audio signal obtained after the interference elimination processing and the audio repair processing.
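As a stand-in for this audio encoder step, the sketch below inverts a mel spectrogram back to a time-domain waveform with librosa's Griffin-Lim based routine; in practice a neural vocoder would typically fill this role, and the parameters shown are assumptions.

```python
# Mel-cepstrum -> time-domain conversion sketch, i.e. s(t) = Vocoder(MelS_48k(n, m)).
# Uses librosa's Griffin-Lim inversion as an assumed stand-in for the vocoder.
import librosa
import numpy as np

def mel_to_waveform(mel_power: np.ndarray, sr: int = 48000,
                    n_fft: int = 1024, hop: int = 256) -> np.ndarray:
    # mel_power: (n_mels, frames) repaired target audio signal (power mel spectrum).
    return librosa.feature.inverse.mel_to_audio(
        mel_power, sr=sr, n_fft=n_fft, hop_length=hop)   # playable s(t)
```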
In an exemplary embodiment, for a noise reduction processing model applied in an audio signal processing process, the noise reduction processing model needs to be trained in advance, and as shown in fig. 3, the training process of the noise reduction processing model includes:
in step S302, a first audio sample is acquired.
The first audio sample comprises a first sample audio signal and an original audio signal corresponding to the first sample audio signal. The first sample audio signal has only the interference-signal type of audio defect; that is, the first sample audio signal contains interference signals such as noise signals and reverberation signals, but it is not a distorted audio signal.
In an implementation, when training the noise reduction processing model, the terminal first acquires a first audio sample. Specifically, the terminal first acquires a first sample audio signal of length T and the original audio signal corresponding to the first sample audio signal, where the original audio signal x1 and the first sample audio signal y1 containing the interference signal are denoted x1(t) and y1(t), respectively, in the time domain, with t representing time and 0 < t ≤ T. The terminal then performs a short-time Fourier transform on the first sample audio signal in the time domain and on its corresponding original audio signal, respectively, to obtain the first sample audio signal in the time-frequency domain (denoted Y1_48k(n, k)) and the original audio signal in the time-frequency domain (denoted X1_48k(n, k)). Specifically, this can be expressed by the following formulas:
X1_48k(n, k) = STFT(x1_48k(t))    (1)
Y1_48k(n, k) = STFT(y1_48k(t))    (2)
where n is the frame index, 0 < n ≤ N (N being the total number of frames), and k is the center frequency index, 0 < k ≤ K_48 (K_48 being the total number of frequency bands).
Then, for the first sample audio signal Y1_48k(n, k) and the original audio signal X1_48k(n, k) in the time-frequency domain, the amplitude information of each frame of each audio signal is obtained. Specifically, the amplitude of each audio signal is calculated as follows:
MagX1_48k(n, k) = abs(X1_48k(n, k))    (3)
MagY1_48k(n, k) = abs(Y1_48k(n, k))    (4)
where MagX1_48k(n, k) denotes the amplitude information (also referred to as the amplitude feature) of the original audio signal in the time-frequency domain, abs(X1_48k(n, k)) denotes taking the absolute value of the original audio signal, MagY1_48k(n, k) denotes the amplitude information of the first sample audio signal in the time-frequency domain, and abs(Y1_48k(n, k)) denotes taking the absolute value of the first sample audio signal.
And finally, the terminal constructs and obtains a first audio sample according to the amplitude information of the first sample audio signal and the amplitude information of the original audio signal corresponding to the first sample audio signal.
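A compact sketch of this sample construction, corresponding to formulas (1) to (4), is given below; it pairs the magnitude spectrum of the noisy first sample audio signal with that of the clean original audio signal. librosa is assumed and the frame parameters are illustrative.

```python
# First-audio-sample construction sketch: formulas (1)-(4).
import librosa
import numpy as np

def build_first_sample(x1: np.ndarray, y1: np.ndarray,
                       n_fft: int = 1024, hop: int = 256):
    X1 = librosa.stft(x1, n_fft=n_fft, hop_length=hop)  # X1_48k(n, k) = STFT(x1_48k(t))
    Y1 = librosa.stft(y1, n_fft=n_fft, hop_length=hop)  # Y1_48k(n, k) = STFT(y1_48k(t))
    mag_x1 = np.abs(X1)                                  # MagX1_48k(n, k)
    mag_y1 = np.abs(Y1)                                  # MagY1_48k(n, k)
    return mag_y1, mag_x1                                # (model input, training target)
```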
In step S304, the first sample audio signal is input into a noise reduction processing model, and the first sample audio signal is processed by the noise reduction processing model to obtain a first processed audio signal.
Specifically, Fig. 4 is a schematic diagram of the noise reduction processing model training flow. As shown in Fig. 4, y1_48k(t) is the first sample audio signal containing the interference signal, at the 48k sampling rate, in the time domain; a short-time Fourier transform converts it into the first sample audio signal in the time-frequency domain, denoted Y1_48k(n, k). The amplitude information MagY1_48k(n, k) of the first sample audio signal in the time-frequency domain is extracted and input into the noise reduction processing model (the DNN1 model), which performs interference elimination processing on the amplitude information of the first sample audio signal to obtain the audio signal output by the model. The amplitude information MagX1_48k(n, k) of the original audio signal is used as the target of the deep learning, and a loss calculation is performed on the audio signal output by the model so that model iteration can proceed based on the loss result until training of the noise reduction processing model is completed.
In step S306, a loss calculation is performed according to the first processed audio signal and the original audio signal, so as to obtain a loss result corresponding to the first processed audio signal.
In implementation, the terminal performs loss calculation according to the first processed audio signal and the original audio signal to obtain a loss result corresponding to the first processed audio signal. Specifically, the loss calculation may be an amplitude spectrum distance calculation, that is, the terminal may perform an amplitude spectrum distance calculation on the first processed audio signal output by the noise reduction processing model and the original audio signal, and use the calculated amplitude spectrum distance as a loss result corresponding to the first processed audio signal.
In step S308, it is determined whether the loss result satisfies a preset loss condition, until the loss result satisfies the preset loss condition, it is determined that the training of the noise reduction processing model is completed.
The first processing audio signal output by the trained noise reduction processing model is a noise reduction audio signal.
In implementation, the terminal determines whether the loss result corresponding to the current first processed audio signal meets the preset loss condition. Specifically, if the loss result meets the preset loss condition, the terminal determines that training of the noise reduction processing model is completed. If the loss result does not meet the preset loss condition, the terminal executes the processes from step S302 to step S306 again until the loss result meets the preset loss condition; the processes of steps S302 to S306 are not repeated here.
Optionally, the preset loss condition may be smaller than or equal to a preset first loss threshold, so that if the calculated loss result corresponding to the first processed audio signal is smaller than or equal to the preset first loss threshold, the loss result is represented to meet the preset loss condition, otherwise, if the loss result corresponding to the first processed audio signal is greater than the preset first loss threshold, the loss result is represented to not meet the preset loss condition.
In this embodiment, training of the noise reduction processing model is achieved through the first audio sample, yielding a trained noise reduction processing model, so that the problem that the audio signal to be repaired contains an interference signal can be handled according to the trained noise reduction processing model.
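The training procedure of steps S302 to S308 can be summarised by a loop such as the one below, assuming PyTorch, an L1 magnitude-spectrum distance as the loss, and an arbitrary loss threshold standing in for the preset loss condition; none of these specifics are fixed by the disclosure.

```python
# Noise-reduction-model training sketch (steps S302-S308); loss and threshold assumed.
import torch

def train_dnn1(dnn1, samples, loss_threshold: float = 0.01, lr: float = 1e-4):
    opt = torch.optim.Adam(dnn1.parameters(), lr=lr)
    for mag_y1, mag_x1 in samples:                   # first audio sample pairs
        pred = dnn1(mag_y1)                          # first processed audio signal
        loss = torch.mean(torch.abs(pred - mag_x1))  # amplitude-spectrum distance
        opt.zero_grad()
        loss.backward()
        opt.step()
        if loss.item() <= loss_threshold:            # preset loss condition satisfied
            break
    return dnn1
```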
In an exemplary embodiment, for an audio repair model applied in an audio signal processing process, the audio repair model needs to be trained in advance, and as shown in fig. 5, the training process of the audio repair model includes:
in step S502, a second audio sample is acquired.
The second audio sample includes a second sample audio signal, a sample noise reduction audio signal corresponding to the second sample audio signal, and an original audio signal corresponding to the second sample audio signal.
Wherein the second sample audio signal is a sample distorted audio signal containing an interference signal.
In an implementation, the terminal obtains a second audio sample. Specifically, the terminal may obtain a second sample audio signal with a resampling rate of 48k and a length of T that contains the interference signal, a sample noise reduction audio signal corresponding to the second sample audio signal, and an original audio signal corresponding to the second sample audio signal. The sample noise reduction audio signal is the noise reduction audio signal obtained after the second sample audio signal is subjected to interference elimination processing, and the original audio signal is the audio signal that corresponds to the second sample audio signal and does not contain any audio loss problem. The terminal then preprocesses the second sample audio signal, the sample noise reduction audio signal and the original audio signal, and constructs the second audio sample based on each preprocessed audio signal.
In step S504, the second sample audio signal and the sample noise reduction audio signal are input into the audio repair model, and the second sample audio signal and the sample noise reduction audio signal are processed by the audio repair model to obtain a second processed audio signal.
In implementation, the terminal inputs the second sample audio signal and the sample noise reduction audio signal in the second audio sample into an audio repair model (DNN 2 model), processes the second sample audio signal and the sample noise reduction audio signal through the audio repair model, and outputs a second processed audio signal. The second processed audio signal is a repaired audio signal on a mel frequency cepstrum.
In step S506, a loss calculation is performed according to the second processed audio signal and the original audio signal corresponding to the second sample audio signal, so as to obtain a loss result corresponding to the second processed audio signal.
The loss calculation is performed between the second processed audio signal and the original audio signal, which serves as the training target, in order to determine the loss result of this round of model training.
In implementation, the terminal performs loss calculation according to the second processed audio signal and the original audio signal corresponding to the second sample audio signal, so as to obtain a loss result corresponding to the second processed audio signal. Alternatively, the terminal may calculate an amplitude spectrum distance between the second processed audio signal and the original audio signal, and use the amplitude spectrum distance as a loss result corresponding to the second processed audio signal.
In step S508, it is determined whether the loss result satisfies a preset loss condition, until the loss result satisfies the preset loss condition, it is determined that the audio repair model training is completed.
In implementation, a preset loss condition is stored in the terminal, and then, the terminal judges whether the loss result meets the preset loss condition according to the loss result corresponding to the second processed audio signal output by the audio repair model each time until the loss result meets the preset loss condition, and determines that the audio repair model training is completed. Specifically, the preset loss condition is less than or equal to a preset second loss threshold. If the loss result corresponding to the second processed audio signal is smaller than or equal to a preset second loss threshold value, determining that the loss result meets a preset loss condition. Then, the terminal determines that the audio repair model training is completed. If the loss result corresponding to the second processed audio signal is larger than a preset second loss threshold value, determining that the loss result does not meet a preset loss condition. Further, the terminal continues to execute the steps S502 to S506 until the loss result meets the preset loss condition, and determines that the audio repair model training is completed.
In this embodiment, the audio repair model is trained through the second audio sample to obtain a trained audio repair model, so that both the problem that the audio signal to be repaired contains an interference signal and the problem of audio signal distortion can be solved according to the trained audio repair model.
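Analogously, the training of the audio repair model in steps S502 to S508 can be sketched as below, assuming PyTorch and that each second audio sample is a (distorted, noise-reduced, clean) triplet of mel-cepstrum features; the loss, threshold and concatenation scheme are assumptions.

```python
# Audio-repair-model training sketch (steps S502-S508); details are assumptions.
import torch

def train_dnn2(dnn2, samples, loss_threshold: float = 0.01, lr: float = 1e-4):
    opt = torch.optim.Adam(dnn2.parameters(), lr=lr)
    for mel_y2, mel_denoised, mel_x2 in samples:
        # The distorted and noise-reduced signals are presented together so the
        # model can cross-reference them while restoring the lost speech detail.
        pred = dnn2(torch.cat([mel_y2, mel_denoised], dim=-1))  # second processed signal
        loss = torch.mean(torch.abs(pred - mel_x2))             # vs. original audio signal
        opt.zero_grad()
        loss.backward()
        opt.step()
        if loss.item() <= loss_threshold:                        # preset loss condition
            break
    return dnn2
```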
In an exemplary embodiment, in the process of obtaining the second audio sample, the terminal preprocesses the directly obtained time-domain second audio signal, the noise reduction audio signal corresponding to the second audio signal, and the original audio signal corresponding to the second audio signal to obtain preprocessed audio signals, and then constructs the second audio sample based on the preprocessed audio signals. The specific preprocessing process, as shown in Fig. 6, implements step S502 through the following steps:
in step S602, the second audio signal and the original audio signal corresponding to the second audio signal are acquired.
Wherein the second audio signal is a sample-distorted audio signal comprising an interfering signal.
In implementation, the terminal acquires the second audio signal in the time domain and the original audio signal corresponding to the second audio signal, where the second audio signal in the time domain is denoted y2_48k(t). The original audio signal corresponding to the second audio signal is the audio signal that corresponds to the second audio signal and does not contain any audio loss problem.
In step S604, short-time fourier transform is performed on the second audio signal and the original audio signal corresponding to the second audio signal, so as to obtain a second sample audio signal in the time-frequency domain and the original audio signal in the time-frequency domain.
In implementation, the terminal performs a short-time Fourier transform on the second audio signal in the time domain and on the original audio signal corresponding to the second audio signal, respectively, to obtain the second sample audio signal in the time-frequency domain and the original audio signal in the time-frequency domain. Specifically, as shown in Fig. 7, the STFT converts y2_48k(t) into the second sample audio signal Y2_48k(n, k) in the time-frequency domain, and converts the original audio signal x2_48k(t) in the time domain into the original audio signal X2_48k(n, k) in the time-frequency domain. For the original audio signal in the time-frequency domain, its amplitude spectrum information is denoted MagX2_48k(n, k).
In step S606, the second sample audio signal in the time-frequency domain is processed according to the pre-trained noise reduction processing model, so as to obtain a sample noise reduction audio signal corresponding to the second sample audio signal.
In implementation, the terminal performs interference elimination processing on the second sample audio signal in the time-frequency domain according to the pre-trained noise reduction processing model, obtaining the sample noise reduction audio signal corresponding to the second sample audio signal. As shown in Fig. 7, the second sample audio signal is duplicated and one copy is input into the noise reduction processing model (DNN1 model) that has been trained in advance; processing by the noise reduction processing model (DNN1 model) yields the sample noise reduction audio signal corresponding to the second sample audio signal, where the sample noise reduction audio signal is amplitude spectrum information in the time-frequency domain, denoted MagY2_48k(n, k).
In step S608, a second audio sample is constructed according to the second sample audio signal in the time-frequency domain, the original audio signal in the time-frequency domain, and the sample noise reduction audio signal in the time-frequency domain.
In implementation, the terminal constructs and obtains a second audio sample according to the second sample audio signal on the time-frequency domain, the original audio signal on the time-frequency domain and the sample noise reduction audio signal on the time-frequency domain.
In this embodiment, the directly acquired time-domain second audio signal, the noise reduction audio signal corresponding to the second audio signal, and the original audio signal corresponding to the second audio signal are preprocessed to obtain preprocessed audio signals, and the second audio sample is constructed from the preprocessed second sample audio signal in the time-frequency domain, the original audio signal in the time-frequency domain, and the sample noise reduction audio signal in the time-frequency domain, so that the audio repair model can be trained according to the second audio sample.
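The construction of steps S602 to S608 can be sketched as follows, assuming librosa for the STFT and a PyTorch noise reduction model that maps magnitude frames to denoised magnitude frames; shapes, transposes and parameters are illustrative assumptions.

```python
# Second-audio-sample construction sketch (steps S602-S608).
import librosa
import numpy as np
import torch

def build_second_sample(x2: np.ndarray, y2: np.ndarray, dnn1: torch.nn.Module,
                        n_fft: int = 1024, hop: int = 256):
    mag_x2 = np.abs(librosa.stft(x2, n_fft=n_fft, hop_length=hop))  # MagX2_48k(n, k)
    mag_y2 = np.abs(librosa.stft(y2, n_fft=n_fft, hop_length=hop))  # distorted input
    with torch.no_grad():   # sample noise reduction audio signal from the trained DNN1
        mag_dn = dnn1(torch.from_numpy(mag_y2.T).float()).numpy().T
    return mag_y2, mag_dn, mag_x2   # (distorted, noise-reduced, original) triplet
```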
In an exemplary embodiment, as shown in fig. 8, before the second audio sample is constructed in step S608, the method further includes:
in step S802, the second sample audio signal on the time-frequency domain, the original audio signal on the time-frequency domain, and the sample noise reduction audio signal are respectively subjected to frequency spectrum conversion to obtain the second sample audio signal on the mel-frequency cepstrum, the original audio signal on the mel-frequency cepstrum, and the sample noise reduction audio signal on the time-frequency domain.
In implementation, the terminal performs frequency spectrum conversion on the second sample audio signal, the original audio signal and the sample noise reduction audio signal on the time-frequency domain, and converts each audio signal to a mel frequency cepstrum to obtain the second sample audio signal on the mel frequency cepstrum, the original audio signal on the mel frequency cepstrum and the sample noise reduction audio signal on the mel frequency cepstrum.
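As a sketch of this frequency spectrum conversion, the magnitude spectra can be projected onto a mel scale with a mel filterbank; the number of mel bands and the use of librosa's filterbank are assumptions, since the embodiment does not fix them.

```python
# Sketch of converting time-frequency magnitude spectra to the mel scale.
# n_mels=80 and the librosa filterbank are assumed values, not taken from the embodiment.
import numpy as np
import librosa

sr, n_fft, n_mels = 48000, 1024, 80
mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)   # shape (n_mels, 1 + n_fft // 2)

def to_mel(mag_spec):
    """Magnitude spectrum (bins, frames) -> mel-scale representation (n_mels, frames)."""
    return mel_fb @ mag_spec

mag_y2 = np.abs(np.random.randn(1 + n_fft // 2, 100))   # placeholder |y2(n, k)|
mel_y2 = to_mel(mag_y2)                                  # second sample audio signal on the mel scale
```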
Then, the processing in step S608 is updated to the following steps:
in step S804, a second audio sample is constructed according to the second sample audio signal on the mel-frequency cepstrum, the original audio signal on the mel-frequency cepstrum, and the sample noise reduction audio signal on the mel-frequency cepstrum.
In implementation, the terminal constructs and obtains a second audio sample according to the second sample audio signal on the mel frequency cepstrum, the original audio signal on the mel frequency cepstrum and the sample noise reduction audio signal on the mel frequency cepstrum, so that the audio repair model is trained according to the second audio sample.
In this embodiment, the second sample audio signal, the original audio signal and the sample noise reduction audio signal on the time-frequency domain are converted to obtain a second audio sample constructed from the second sample audio signal, the original audio signal and the sample noise reduction audio signal on the mel frequency cepstrum, so that model training is performed with the second audio sample on the mel frequency cepstrum, the speech information contained in the second audio sample is preserved, and the accuracy of the audio repair model is improved.
It should be understood that, although the steps in the flowcharts of FIGS. 1-3, 5-6, and 8 are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the execution order of the steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in FIGS. 1-3, 5-6, and 8 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments; likewise, these sub-steps or stages are not necessarily performed sequentially, but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
It should be understood that the same or similar parts of the method embodiments described above in this specification may be referred to one another; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts reference may be made to the descriptions of the related method embodiments.
Fig. 9 is a block diagram of an audio signal processing apparatus according to an exemplary embodiment. Referring to fig. 9, the apparatus includes an acquisition unit 902, an interference cancellation unit 904, and a repair unit 906.
An acquisition unit 902 configured to perform acquisition of an audio signal to be repaired; the audio signal to be repaired is a distorted audio signal containing an interference signal;
an interference elimination unit 904 configured to perform processing on an audio signal to be repaired according to a noise reduction processing model trained in advance, resulting in a noise reduction audio signal; the noise reduction processing model is obtained based on training of a first audio sample containing an interference signal;
a restoration unit 906 configured to perform inputting the audio signal to be restored and the noise reduction audio signal into a pre-trained audio restoration model, resulting in a restored target audio signal; the audio repair model is obtained through training based on a second audio sample constructed by the sample distortion audio signal and the sample noise reduction audio signal corresponding to the sample distortion audio signal.
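Taken together, the three units describe a two-stage inference path; a minimal sketch of that path is shown below, where noise_reduction_model and audio_repair_model are placeholders for the pre-trained models and their call signatures are assumptions.

```python
# Minimal sketch of the acquisition -> interference cancellation -> repair flow.
# The model call signatures are assumptions for illustration only.
def repair_audio(to_repair, noise_reduction_model, audio_repair_model):
    denoised = noise_reduction_model(to_repair)        # noise reduction audio signal
    target = audio_repair_model(to_repair, denoised)   # repaired target audio signal
    return target
```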
In an exemplary embodiment, the acquiring unit 902 includes:
an acquisition subunit configured to perform acquisition of a first audio signal to be repaired; the first audio signal is a distorted audio signal containing an interference signal;
a first conversion subunit configured to perform short-time fourier transform on the first audio signal to obtain a first converted audio signal on a time-frequency domain;
and the second conversion subunit is configured to perform conversion processing of the Mel frequency cepstrum on the first converted audio signal to obtain an audio signal to be repaired.
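A minimal sketch of this acquisition path (waveform, then STFT, then mel-scale conversion) is shown below; the frame parameters and mel band count are assumptions rather than values taken from the embodiment.

```python
# Sketch of the acquiring unit: first audio signal -> STFT -> mel-scale audio signal to be repaired.
# Parameter values (sr, n_fft, hop, n_mels) are assumed for illustration.
import numpy as np
import librosa

def prepare_to_repair(first_audio, sr=48000, n_fft=1024, hop=256, n_mels=80):
    spec = librosa.stft(first_audio, n_fft=n_fft, hop_length=hop)    # first converted audio signal
    mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
    return mel_fb @ np.abs(spec)                                     # audio signal to be repaired (mel domain)

first_audio = np.random.randn(48000).astype(np.float32)             # placeholder first audio signal
to_repair = prepare_to_repair(first_audio)
```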
In an exemplary embodiment, the interference cancellation unit 904 includes:
an input subunit configured to perform inputting an audio signal to be repaired including an interference signal to a noise reduction processing model trained in advance; the interference signal comprises a noise signal and a reverberation signal;
and the interference elimination subunit is configured to perform processing on the noise signal and the reverberation signal in the audio signal to be repaired through a pre-trained noise reduction processing model so as to obtain a noise reduction audio signal for eliminating the noise signal and the reverberation signal.
In an exemplary embodiment, the apparatus further comprises:
a restoring unit configured to perform inputting the target audio signal to an audio encoder, and converting the target audio signal on the mel frequency cepstrum by the audio encoder to obtain a restored audio signal; the repair audio signal is an audio signal in a time domain after the repair of the audio signal to be repaired.
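The embodiment does not specify the internals of the audio encoder; as a stand-in, the sketch below converts a mel-domain target signal back to a time-domain waveform using librosa's mel-to-STFT inversion and Griffin-Lim, which is only one possible realization, not the patented encoder.

```python
# Stand-in for the audio encoder: mel-domain target audio signal -> time-domain repair audio signal.
# The mel inversion + Griffin-Lim combination is an assumption for illustration.
import numpy as np
import librosa

def mel_to_waveform(mel_mag, sr=48000, n_fft=1024, hop_length=256):
    # Recover an approximate linear magnitude spectrum, then a waveform by phase reconstruction.
    mag = librosa.feature.inverse.mel_to_stft(mel_mag, sr=sr, n_fft=n_fft, power=1.0)
    return librosa.griffinlim(mag, hop_length=hop_length)

mel_target = np.abs(np.random.randn(80, 200))   # placeholder target audio signal on the mel scale
repair_audio_signal = mel_to_waveform(mel_target)
```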
In an exemplary embodiment, the apparatus further comprises:
a first sample acquisition unit configured to perform acquisition of a first audio sample containing a first sample audio signal and an original audio signal corresponding to the first sample audio signal; the first sample audio signal comprises an interference signal;
a first sample processing unit configured to perform inputting of a first sample audio signal into a noise reduction processing model, and to process the first sample audio signal through the noise reduction processing model to obtain a first processed audio signal;
the first loss calculation unit is configured to perform loss calculation according to the first processed audio signal and the original audio signal to obtain a loss result corresponding to the first processed audio signal;
the first judging unit is configured to judge whether the loss result meets a preset loss condition or not, and determine that the training of the noise reduction processing model is completed when the loss result meets the preset loss condition; the first processed audio signal output by the trained noise reduction processing model is a noise reduction audio signal.
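A minimal sketch of the noise reduction model training loop implied by these units is given below; the MSE loss, optimizer, and loss threshold are assumptions, since the embodiment only requires a loss computed against the original audio signal and a preset loss condition.

```python
# Sketch of training the noise reduction processing model on first audio samples.
# The loss function, optimizer, and threshold are assumed; only the structure follows the units above.
import torch
import torch.nn as nn

def train_noise_reduction(model, loader, loss_threshold=1e-3, max_epochs=100):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    criterion = nn.MSELoss()
    for _ in range(max_epochs):
        loss = None
        for first_sample, original in loader:        # first sample audio signal and its original signal
            processed = model(first_sample)          # first processed audio signal
            loss = criterion(processed, original)    # loss result for the first processed audio signal
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        if loss is not None and loss.item() < loss_threshold:   # preset loss condition met
            break
    return model
```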
In an exemplary embodiment, the apparatus further comprises:
a second sample acquiring unit configured to perform acquiring a second audio sample, where the second audio sample includes a second sample audio signal, a sample noise reduction audio signal corresponding to the second sample audio signal, and an original audio signal corresponding to the second sample audio signal; the second sample audio signal is a sample distorted audio signal comprising an interfering signal;
The second sample processing unit is configured to input a second sample audio signal and a sample noise reduction audio signal into the audio repair model, and to process the second sample audio signal and the sample noise reduction audio signal through the audio repair model to obtain a second processed audio signal;
the second loss calculation unit is configured to perform loss calculation according to the second processed audio signal and the original audio signal corresponding to the second sample audio signal, so as to obtain a loss result corresponding to the second processed audio signal;
the second judging unit is configured to judge whether the loss result meets a preset loss condition or not, and determine that the audio repair model training is completed when the loss result meets the preset loss condition.
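Analogously, a sketch of the audio repair model training loop over second audio samples is shown below; the two-input call signature of the repair model and the loss configuration are assumptions for illustration.

```python
# Sketch of training the audio repair model on second audio samples (distorted, denoised, original).
# The repair model's two-input interface and the MSE loss are assumed for illustration.
import torch
import torch.nn as nn

def train_repair_model(repair_model, loader, loss_threshold=1e-3, max_epochs=100):
    optimizer = torch.optim.Adam(repair_model.parameters(), lr=1e-3)
    criterion = nn.MSELoss()
    for _ in range(max_epochs):
        loss = None
        for distorted, denoised, original in loader:           # one second audio sample per element
            processed = repair_model(distorted, denoised)      # second processed audio signal
            loss = criterion(processed, original)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        if loss is not None and loss.item() < loss_threshold:  # preset loss condition met
            break
    return repair_model
```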
In an exemplary embodiment, the second sample acquiring unit includes:
a second acquisition subunit configured to perform acquisition of a second audio signal and an original audio signal corresponding to the second audio signal; the second audio signal is a sample distorted audio signal comprising an interfering signal;
a third conversion subunit configured to perform short-time fourier transform on the second audio signal and an original audio signal corresponding to the second audio signal, respectively, to obtain a second sample audio signal in a time-frequency domain and an original audio signal in the time-frequency domain;
The processing subunit is configured to execute processing on the second sample audio signal on the time-frequency domain according to a pre-trained noise reduction processing model to obtain a sample noise reduction audio signal corresponding to the second sample audio signal;
and a construction subunit configured to perform construction to obtain a second audio sample according to the second sample audio signal in the time-frequency domain, the original audio signal in the time-frequency domain, and the sample noise reduction audio signal in the time-frequency domain.
In an exemplary embodiment, the apparatus further comprises:
a fourth conversion subunit configured to perform frequency spectrum conversion on the second sample audio signal on the time-frequency domain, the original audio signal on the time-frequency domain, and the sample noise reduction audio signal, respectively, to obtain a second sample audio signal on the mel-frequency cepstrum, the original audio signal on the mel-frequency cepstrum, and the sample noise reduction audio signal on the mel-frequency cepstrum;
a construction subunit configured to perform construction to obtain a second audio sample from the second sample audio signal on the mel-frequency cepstrum, the original audio signal on the mel-frequency cepstrum, and the sample noise reduction audio signal on the mel-frequency cepstrum.
The specific manner in which the various modules of the apparatus in the above embodiments perform their operations has been described in detail in the embodiments of the method and will not be described in detail here.
Fig. 10 is a block diagram of an electronic device 1000 for an audio signal processing method according to an exemplary embodiment. For example, electronic device 1000 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, exercise device, personal digital assistant, or the like.
Referring to fig. 10, an electronic device 1000 may include one or more of the following components: a processing component 1002, a memory 1004, a power component 1006, a multimedia component 1008, an audio component 1010, an input/output (I/O) interface 1012, a sensor component 1014, and a communication component 1016.
The processing component 1002 generally controls overall operation of the electronic device 1000, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 1002 can include one or more processors 1020 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 1002 can include one or more modules that facilitate interaction between the processing component 1002 and other components. For example, the processing component 1002 can include a multimedia module to facilitate interaction between the multimedia component 1008 and the processing component 1002.
The memory 1004 is configured to store various types of data to support operations at the electronic device 1000. Examples of such data include instructions for any application or method operating on the electronic device 1000, contact data, phonebook data, messages, pictures, video, and so forth. The memory 1004 may be implemented by any type of volatile or nonvolatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, optical disk, or graphene memory.
The power supply component 1006 provides power to the various components of the electronic device 1000. The power components 1006 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 1000.
The multimedia component 1008 includes a screen that provides an output interface between the electronic device 1000 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or swipe action, but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 1008 includes a front-facing camera and/or a rear-facing camera. When the electronic device 1000 is in an operational mode, such as a shooting mode or a video mode, the front-facing camera and/or the rear-facing camera may receive external multimedia data. Each front-facing and rear-facing camera may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 1010 is configured to output and/or input audio signals. For example, the audio component 1010 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 1000 is in an operational mode, such as a call mode, a recording mode, and an audio recognition mode. The received audio signals may be further stored in memory 1004 or transmitted via communication component 1016. In some embodiments, the audio component 1010 further comprises a speaker for outputting audio signals.
The I/O interface 1012 provides an interface between the processing assembly 1002 and peripheral interface modules, which may be a keyboard, click wheel, buttons, and the like. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
The sensor assembly 1014 includes one or more sensors for providing status assessments of various aspects of the electronic device 1000. For example, the sensor assembly 1014 may detect an on/off state of the electronic device 1000 and the relative positioning of components, such as the display and keypad of the electronic device 1000. The sensor assembly 1014 may also detect a change in position of the electronic device 1000 or of a component of the electronic device 1000, the presence or absence of user contact with the electronic device 1000, the orientation or acceleration/deceleration of the electronic device 1000, and a change in temperature of the electronic device 1000. The sensor assembly 1014 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 1014 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 1014 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 1016 is configured to facilitate communication between the electronic device 1000 and other devices, either wired or wireless. The electronic device 1000 may access a wireless network based on a communication standard, such as WiFi, an operator network (e.g., 2G, 3G, 4G, or 5G), or a combination thereof. In one exemplary embodiment, the communication component 1016 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 1016 further includes a Near Field Communication (NFC) module to facilitate short range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, ultra Wideband (UWB) technology, bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 1000 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the methods described above.
In an exemplary embodiment, a computer-readable storage medium is also provided, such as memory 1004, including instructions executable by processor 1020 of electronic device 1000 to perform the above-described method. For example, the computer readable storage medium may be ROM, random Access Memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
In an exemplary embodiment, a computer program product is also provided, comprising instructions executable by the processor 1020 of the electronic device 1000 to perform the above-described method.
Fig. 11 is a block diagram illustrating an electronic device 1100 for audio signal processing according to an example embodiment. For example, the electronic device 1100 may be a server. Referring to FIG. 11, the electronic device 1100 includes a processing component 1120 that further includes one or more processors, and memory resources represented by memory 1122, for storing instructions, such as application programs, executable by the processing component 1120. An application program stored in memory 1122 may include one or more modules each corresponding to a set of instructions. Further, the processing component 1120 is configured to execute instructions to perform the above-described methods.
The electronic device 1100 may further include: a power component 1124 configured to perform power management of the electronic device 1100, a wired or wireless network interface 1126 configured to connect the electronic device 1100 to a network, and an input/output (I/O) interface 1128. The electronic device 1100 may operate based on an operating system stored in the memory 1122, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, or the like.
In an exemplary embodiment, a computer readable storage medium comprising instructions, such as the memory 1122 comprising instructions, is also provided, the instructions being executable by a processor of the electronic device 1100 to perform the above-described method. The storage medium may be a computer readable storage medium, which may be, for example, ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
In an exemplary embodiment, a computer program product is also provided, comprising instructions executable by a processor of the electronic device 1100 to perform the above-described method.
It should be noted that the descriptions of the foregoing apparatus, the electronic device, the computer readable storage medium, the computer program product, and the like according to the method embodiments may further include other implementations, and the specific implementation may refer to the descriptions of the related method embodiments and are not described herein in detail.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure that follow its general principles and include such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (11)

1. An audio signal processing method, comprising:
acquiring an audio signal to be repaired; the audio signal to be repaired is a distorted audio signal containing an interference signal;
processing the audio signal to be repaired according to a pre-trained noise reduction processing model to obtain a noise reduction audio signal; the noise reduction processing model is obtained based on training of a first audio sample containing an interference signal; the noise reduction audio signal is a distorted audio signal for eliminating the interference signal;
inputting the audio signal to be repaired and the noise reduction audio signal into a pre-trained audio repair model to obtain a repaired target audio signal; the audio repair model is obtained through training based on a second audio sample constructed by the sample distortion audio signal and the sample noise reduction audio signal corresponding to the sample distortion audio signal.
2. The audio signal processing method according to claim 1, wherein the acquiring the audio signal to be repaired includes:
Acquiring a first audio signal to be repaired; the first audio signal is a distorted audio signal comprising an interfering signal;
performing short-time Fourier transform on the first audio signal to obtain a first converted audio signal on a time-frequency domain;
and carrying out Mel frequency cepstrum conversion processing on the first converted audio signal to obtain an audio signal to be repaired.
3. The method for processing an audio signal according to claim 1, wherein the processing the audio signal to be repaired according to the pre-trained noise reduction processing model to obtain a noise reduction audio signal comprises:
inputting an audio signal to be repaired containing an interference signal into a pre-trained noise reduction processing model; the interference signal comprises a noise signal and a reverberation signal;
and processing the noise signal and the reverberation signal in the audio signal to be repaired through a pre-trained noise reduction processing model to obtain a noise reduction audio signal for eliminating the noise signal and the reverberation signal.
4. The audio signal processing method according to claim 1, wherein the audio signal to be repaired and the noise reduction audio signal are input into a pre-trained audio repair model, and after obtaining a repaired target audio signal, the method further comprises:
Inputting the target audio signal to an audio encoder, and converting the target audio signal on a mel frequency cepstrum by the audio encoder to obtain a repair audio signal; the repair audio signal is an audio signal in a time domain after the repair of the audio signal to be repaired.
5. The audio signal processing method according to claim 1, wherein the training process of the noise reduction processing model includes:
acquiring a first audio sample, wherein the first audio sample comprises a first sample audio signal and an original audio signal corresponding to the first sample audio signal; the first sample audio signal comprises an interference signal;
inputting the first sample audio signal into a noise reduction processing model, and processing the first sample audio signal through the noise reduction processing model to obtain a first processed audio signal;
performing loss calculation according to the first processed audio signal and the original audio signal to obtain a loss result corresponding to the first processed audio signal;
judging whether the loss result meets a preset loss condition or not, and determining that the training of the noise reduction processing model is completed when the loss result meets the preset loss condition; the first processed audio signal output by the trained noise reduction processing model is a noise reduction audio signal.
6. The audio signal processing method according to claim 1 or 2, wherein the training process of the audio repair model includes:
acquiring a second audio sample, wherein the second audio sample comprises a second sample audio signal, a sample noise reduction audio signal corresponding to the second sample audio signal and an original audio signal corresponding to the second sample audio signal; the second sample audio signal is a sample distorted audio signal comprising an interfering signal;
inputting the second sample audio signal and the sample noise reduction audio signal into an audio repair model, and processing the second sample audio signal and the sample noise reduction audio signal through the audio repair model to obtain a second processed audio signal;
performing loss calculation according to the second processed audio signal and an original audio signal corresponding to the second sample audio signal to obtain a loss result corresponding to the second processed audio signal;
and judging whether the loss result meets a preset loss condition or not, and determining that the audio repair model training is completed when the loss result meets the preset loss condition.
7. The method of audio signal processing according to claim 6, wherein the acquiring the second audio sample comprises:
Acquiring a second audio signal and an original audio signal corresponding to the second audio signal; the second audio signal is a sample distorted audio signal comprising an interfering signal;
respectively carrying out short-time Fourier transform on the second audio signal and an original audio signal corresponding to the second audio signal to obtain a second sample audio signal on a time-frequency domain and an original audio signal on the time-frequency domain;
processing the second sample audio signal on the time-frequency domain according to the pre-trained noise reduction processing model to obtain a sample noise reduction audio signal corresponding to the second sample audio signal;
and constructing a second audio sample according to the second sample audio signal in the time-frequency domain, the original audio signal in the time-frequency domain and the sample noise reduction audio signal in the time-frequency domain.
8. The audio signal processing method of claim 7, wherein prior to said constructing a second audio sample, the method further comprises:
performing frequency spectrum conversion on the second sample audio signal on the time-frequency domain, the original audio signal on the time-frequency domain and the sample noise reduction audio signal respectively to obtain a second sample audio signal on a mel frequency cepstrum, an original audio signal on the mel frequency cepstrum and a sample noise reduction audio signal on the mel frequency cepstrum;
The constructing obtains a second audio sample, including:
and constructing a second audio sample according to the second sample audio signal on the Mel frequency cepstrum, the original audio signal on the Mel frequency cepstrum and the sample noise reduction audio signal on the Mel frequency cepstrum.
9. An audio signal processing apparatus, comprising:
an acquisition unit configured to perform acquisition of an audio signal to be repaired; the audio signal to be repaired is a distorted audio signal containing an interference signal;
the interference elimination unit is configured to execute processing on the audio signal to be repaired according to a pre-trained noise reduction processing model to obtain a noise reduction audio signal; the noise reduction processing model is obtained based on training of a first audio sample containing an interference signal; the noise reduction audio signal is a distorted audio signal for eliminating the interference signal;
the restoration unit is configured to input the audio signal to be restored and the noise reduction audio signal into a pre-trained audio restoration model to obtain a restored target audio signal; the audio repair model is obtained through training based on a second audio sample constructed by the sample distortion audio signal and the sample noise reduction audio signal corresponding to the sample distortion audio signal.
10. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the audio signal processing method of any of claims 1 to 7.
11. A computer readable storage medium, characterized in that instructions in the computer readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the audio signal processing method of any one of claims 1 to 7.
CN202310825895.1A 2023-07-06 2023-07-06 Audio signal processing method, device, electronic equipment and storage medium Pending CN116741191A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310825895.1A CN116741191A (en) 2023-07-06 2023-07-06 Audio signal processing method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310825895.1A CN116741191A (en) 2023-07-06 2023-07-06 Audio signal processing method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116741191A true CN116741191A (en) 2023-09-12

Family

ID=87902709

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310825895.1A Pending CN116741191A (en) 2023-07-06 2023-07-06 Audio signal processing method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116741191A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117357104A (en) * 2023-12-07 2024-01-09 深圳市好兄弟电子有限公司 Audio analysis method based on user characteristics
CN117357104B (en) * 2023-12-07 2024-04-26 深圳市好兄弟电子有限公司 Audio analysis method based on user characteristics
CN117395181A (en) * 2023-12-12 2024-01-12 方图智能(深圳)科技集团股份有限公司 Low-delay multimedia audio transmission detection method and system based on Internet of things
CN117395181B (en) * 2023-12-12 2024-02-13 方图智能(深圳)科技集团股份有限公司 Low-delay multimedia audio transmission detection method and system based on Internet of things

Similar Documents

Publication Publication Date Title
US11430427B2 (en) Method and electronic device for separating mixed sound signal
CN108198569B (en) Audio processing method, device and equipment and readable storage medium
CN116741191A (en) Audio signal processing method, device, electronic equipment and storage medium
CN109887515B (en) Audio processing method and device, electronic equipment and storage medium
CN110503968B (en) Audio processing method, device, equipment and readable storage medium
CN110931028B (en) Voice processing method and device and electronic equipment
CN107945806B (en) User identification method and device based on sound characteristics
CN112185388B (en) Speech recognition method, device, equipment and computer readable storage medium
CN116129931B (en) Audio-visual combined voice separation model building method and voice separation method
CN111862995A (en) Code rate determination model training method, code rate determination method and device
CN109036404A (en) Voice interactive method and device
CN107437412B (en) Acoustic model processing method, voice synthesis method, device and related equipment
CN111583142A (en) Image noise reduction method and device, electronic equipment and storage medium
CN110970015B (en) Voice processing method and device and electronic equipment
CN112820300B (en) Audio processing method and device, terminal and storage medium
CN112185421B (en) Sound quality detection method and device, electronic equipment and storage medium
CN109754816B (en) Voice data processing method and device
CN112201267A (en) Audio processing method and device, electronic equipment and storage medium
CN110580910B (en) Audio processing method, device, equipment and readable storage medium
CN111696550A (en) Voice processing method and device for voice processing
CN111046780A (en) Neural network training and image recognition method, device, equipment and storage medium
CN111667842B (en) Audio signal processing method and device
CN112951202B (en) Speech synthesis method, apparatus, electronic device and program product
CN118038889A (en) Audio data processing method and device, electronic equipment and storage medium
CN113077807B (en) Voice data processing method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination