CN116312589A - Audio signal processing method, device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN116312589A
Authority
CN
China
Prior art keywords
audio signal
audio
processed
processing
signal
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310098138.9A
Other languages
Chinese (zh)
Inventor
韩润强
吕新亮
李楠
郑羲光
张晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202310098138.9A
Publication of CN116312589A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H04R3/02 Circuits for transducers, loudspeakers or microphones for preventing acoustic reaction, i.e. acoustic oscillatory feedback
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Abstract

The present disclosure relates to an audio signal processing method, an apparatus, an electronic device, and a storage medium, the method being applied to an audio signal feedback device, the method comprising: receiving an audio signal to be processed recorded by audio signal recording equipment, wherein the audio signal to be processed at least comprises a sound signal of a target object; processing an audio signal to be processed and a background audio signal corresponding to the audio signal to be processed based on a pre-trained audio purification model to obtain an initial audio signal; performing audio attribute processing on the initial audio signal based on a preset audio attribute processing strategy to obtain a processed target audio signal; and carrying out audio mixing processing on the target audio signal and the background audio signal to obtain a mixed audio signal, and outputting the mixed audio signal. By adopting the method, the loss of the audio signal is reduced, and the delay of the audio signal is reduced.

Description

Audio signal processing method, device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of communications technologies, and in particular, to an audio signal processing method, an apparatus, an electronic device, and a storage medium.
Background
With the development of audio technology and growing demand for everyday entertainment, more and more users engage in activities such as requesting songs and karaoke through applications on terminal devices such as mobile phones or tablets. While singing, a user can record the sound signal with a microphone while the background music is played through a loudspeaker, so that the background music and the user's sound signal are mixed. As a result, the microphone records the background music at the same time, which degrades the clarity of the recorded sound signal.
At present, to avoid direct interference from the background music, users usually wear headphones. A microphone first records the user's sound signal and transmits it to a terminal device; the terminal device mixes the user's sound signal with the background music to obtain a mixed audio signal, and then transmits the mixed audio signal to the headphones, which feed it back to the user.
However, in this audio signal processing method, the mixed audio signal passes through multiple devices before it is fed back to the user through the headphones, so audio signal loss easily occurs during transmission.
Disclosure of Invention
The disclosure provides an audio signal processing method, an audio signal processing device, an electronic device and a storage medium, so as to at least solve the problems of audio signal transmission loss and audio signal transmission delay in the related art. The technical scheme of the present disclosure is as follows:
according to a first aspect of embodiments of the present disclosure, there is provided an audio signal processing method, the method being applied to an audio signal feedback apparatus, the method comprising:
receiving an audio signal to be processed recorded by audio signal recording equipment, wherein the audio signal to be processed at least comprises a sound signal of a target object;
processing the audio signal to be processed and a background audio signal corresponding to the audio signal to be processed based on a pre-trained audio purification model to obtain an initial audio signal;
performing audio attribute processing on the initial audio signal based on a preset audio attribute processing strategy to obtain a processed target audio signal;
and carrying out audio mixing processing on the target audio signal and the background audio signal to obtain a mixed audio signal, and outputting the mixed audio signal.
In an exemplary embodiment, before the audio signal to be processed and the background audio signal corresponding to the audio signal to be processed are processed based on the pre-trained audio purification model to obtain the initial audio signal, the method further includes:
determining a background audio signal corresponding to the audio signal to be processed according to the audio signal to be processed;
and acquiring the audio data of the background audio signal from the audio storage device according to the audio signal identifier of the background audio signal.
In an exemplary embodiment, the audio purification model includes a convolution layer, a gated recurrent unit, a fully connected layer, and an activation layer, and the processing, based on the pre-trained audio purification model, of the audio signal to be processed and a background audio signal corresponding to the audio signal to be processed to obtain an initial audio signal includes:
preprocessing the audio signal to be processed and the background audio signal corresponding to the audio signal to be processed to obtain an audio feature to be processed corresponding to the audio signal to be processed and a background audio feature corresponding to the background audio signal;
and simultaneously inputting the audio feature to be processed and the background audio feature into the audio purification model, sequentially performing correlation comparison and audio feature processing on the audio feature to be processed and the background audio feature through the convolution layer, the gated recurrent unit, the fully connected layer, and the activation layer in the audio purification model, and outputting an initial audio signal obtained by audio purification of the audio signal to be processed.
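The layer order described above can be illustrated with a small, untrained sketch. Everything beyond the layer sequence is an assumption made for illustration: the class name `TinyPurifier`, the 1-D kernels of size 3 (the disclosure does not give kernel shapes here), the random weights, the omission of batch normalization, and the interpretation of the sigmoid output as per-band gains applied to the feature to be processed.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def conv1d_relu(x, kernel):
    """Zero-padded 1-D convolution along the feature axis, followed by ReLU."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [max(0.0, sum(k * padded[i + j] for j, k in enumerate(kernel)))
            for i in range(len(x))]


class TinyPurifier:
    """Hypothetical stand-in: 5 conv layers -> GRU cell -> dense -> sigmoid."""

    def __init__(self, num_bands, hidden=8, seed=0):
        rng = random.Random(seed)  # random, untrained weights (assumption)
        self.kernels = [[rng.uniform(-0.5, 0.5) for _ in range(3)]
                        for _ in range(5)]
        in_dim = 2 * num_bands  # both features are fed in together
        self.w = {g: rand_matrix(rng, hidden, in_dim) for g in "zrn"}
        self.u = {g: rand_matrix(rng, hidden, hidden) for g in "zrn"}
        self.dense = rand_matrix(rng, num_bands, hidden)
        self.h = [0.0] * hidden  # recurrent state carried across frames

    def step(self, mic_bands, bg_bands):
        x = list(mic_bands) + list(bg_bands)
        for kernel in self.kernels:  # convolution layers
            x = conv1d_relu(x, kernel)
        wz, wr, wn = (matvec(self.w[g], x) for g in "zrn")
        uz, ur, un = (matvec(self.u[g], self.h) for g in "zrn")
        z = [sigmoid(a + b) for a, b in zip(wz, uz)]  # update gate
        r = [sigmoid(a + b) for a, b in zip(wr, ur)]  # reset gate
        n = [math.tanh(a + ri * b) for a, ri, b in zip(wn, r, un)]
        self.h = [(1 - zi) * ni + zi * hi
                  for zi, ni, hi in zip(z, n, self.h)]
        gains = [sigmoid(v) for v in matvec(self.dense, self.h)]
        # Assumption: the sigmoid outputs act as per-band gains in (0, 1).
        return [g * m for g, m in zip(gains, mic_bands)]


model = TinyPurifier(num_bands=4)
purified = model.step([1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5])
```

A trained model would learn its weights from paired recordings; the sketch only shows how the two features travel through the four layer types one frame at a time.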
In an exemplary embodiment, the preprocessing the audio signal to be processed and the background audio signal corresponding to the audio signal to be processed to obtain the audio feature to be processed corresponding to the audio signal to be processed and the background audio feature corresponding to the background audio signal, includes:
performing short-time Fourier transform processing on the audio signal to be processed and the background audio signal corresponding to the audio signal to be processed to obtain a first conversion signal and a second conversion signal in the frequency domain;
sampling the first conversion signal and the second conversion signal in the frequency domain according to a preset sampling strategy to respectively obtain a plurality of sampling point data corresponding to the first conversion signal and a plurality of sampling point data corresponding to the second conversion signal;
and respectively compressing the plurality of sampling point data corresponding to the first conversion signal and the plurality of sampling point data corresponding to the second conversion signal and dividing them into frequency bands, to obtain the audio feature to be processed corresponding to the audio signal to be processed and the background audio feature corresponding to the background audio signal.
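The preprocessing steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the frame length, hop size, Hann window, the "keep every `bin_step`-th bin" sampling strategy, power-law compression, and equal-width bands are all hypothetical choices, since the disclosure only names the steps.

```python
import cmath
import math


def stft(signal, frame_len=16, hop=8):
    """Short-time Fourier transform: one list of complex bins per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        frames.append([
            sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ])
    return frames


def band_features(frames, bin_step=1, num_bands=4, power=0.5):
    """Sample the bins, compress their magnitudes, and average per band.

    Assumes at least num_bands sampled bins per frame.
    """
    features = []
    for bins in frames:
        sampled = bins[::bin_step]                       # sampling (assumed)
        compressed = [abs(b) ** power for b in sampled]  # compression (assumed)
        count = len(compressed)
        edges = [i * count // num_bands for i in range(num_bands + 1)]
        features.append([sum(compressed[lo:hi]) / (hi - lo)
                         for lo, hi in zip(edges, edges[1:])])
    return features


# Synthetic stand-ins for the two signals.
mic = [math.sin(2 * math.pi * 2 * n / 16) for n in range(64)]
background = [0.5 * math.sin(2 * math.pi * 5 * n / 16) for n in range(64)]
mic_features = band_features(stft(mic))                # feature to be processed
background_features = band_features(stft(background))  # background feature
```

Both signals go through the same transform so that their features are aligned frame by frame and band by band before they are fed to the model together.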
In an exemplary embodiment, the performing audio attribute processing on the initial audio signal based on a preset audio attribute processing policy to obtain a processed target audio signal includes:
performing tone characteristic processing and sound field characteristic processing on the initial audio signal to obtain a processed third audio signal;
and carrying out reverberation processing on the third audio signal to obtain a fourth audio signal after the reverberation processing, and carrying out dynamic compression processing on the fourth audio signal within a target amplitude range of the audio signal to obtain a target audio signal.
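The order of the attribute processing stages can be sketched as below. The concrete filters are assumptions chosen for brevity: a first-order high-frequency tilt stands in for tone and sound field processing, a single feedback comb filter stands in for reverberation, and a static hard-knee compressor with a final limit stands in for dynamic compression within a target amplitude range. All function names and parameter values are hypothetical.

```python
import math


def equalize(samples, tilt=0.2):
    """Stand-in tone shaping: y[n] = x[n] + tilt * (x[n] - x[n-1])."""
    out, prev = [], 0.0
    for x in samples:
        out.append(x + tilt * (x - prev))
        prev = x
    return out


def comb_reverb(samples, delay=4, decay=0.5):
    """Feedback comb filter: y[n] = x[n] + decay * y[n - delay]."""
    out = []
    for n, x in enumerate(samples):
        echo = decay * out[n - delay] if n >= delay else 0.0
        out.append(x + echo)
    return out


def compress(samples, threshold=0.5, ratio=4.0, limit=1.0):
    """Reduce level above the threshold, then keep |y| within the limit."""
    out = []
    for x in samples:
        mag = abs(x)
        if mag > threshold:
            mag = threshold + (mag - threshold) / ratio
        out.append(math.copysign(min(mag, limit), x))
    return out


def apply_attributes(initial):
    third = equalize(initial)    # tone / sound field processing
    fourth = comb_reverb(third)  # reverberation processing
    return compress(fourth)      # dynamic compression -> target signal


target = apply_attributes([0.0, 0.9, -0.9, 0.3, 0.0, 0.0, 0.0, 0.0])
```

Placing compression last means that any level added by the equalizer or the reverberation tail is still brought back inside the target amplitude range.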
According to a second aspect of embodiments of the present disclosure, there is provided an audio signal processing system, the system comprising:
the audio signal recording device is used for recording an audio signal to be processed and transmitting the audio signal to be processed to the audio signal feedback device, wherein the audio signal to be processed comprises a sound signal and a howling signal of a target object;
the audio signal feedback device is used for receiving the audio signal to be processed recorded by the audio signal recording device, wherein the audio signal to be processed comprises a sound signal and a howling signal of a target object; processing the audio signal to be processed and a background audio signal corresponding to the audio signal to be processed based on a pre-trained audio purification model to obtain an initial audio signal;
performing audio attribute processing on the initial audio signal based on a preset audio attribute processing strategy to obtain a processed target audio signal; and carrying out audio mixing processing on the target audio signal and the background audio signal to obtain a mixed audio signal, and outputting the mixed audio signal.
In an exemplary embodiment, the audio signal feedback apparatus includes an equalizer unit, a reverberation unit, and a dynamic compression unit;
the equalizer unit is used for performing tone characteristic processing and sound field characteristic processing on the initial audio signal to obtain a processed third audio signal;
the reverberation unit is used for carrying out reverberation processing on the third audio signal to obtain a fourth audio signal after the reverberation processing;
the dynamic compression unit is used for carrying out dynamic compression processing on the fourth audio signal within the audio signal target amplitude range to obtain a target audio signal.
In an exemplary embodiment, the audio signal feedback apparatus further includes a wireless transmission unit;
the wireless transmission unit is used for constructing a wireless transmission channel with the audio signal recording equipment and receiving the audio signal to be processed transmitted by the audio signal recording equipment through the wireless transmission channel.
According to a third aspect of embodiments of the present disclosure, there is provided an audio signal processing apparatus, the apparatus being applied to an audio signal feedback device, the apparatus comprising:
a receiving unit configured to receive an audio signal to be processed recorded by an audio signal recording device, the audio signal to be processed including a sound signal and a howling signal of a target object;
a first processing unit configured to process the audio signal to be processed and a background audio signal corresponding to the audio signal to be processed based on a pre-trained audio purification model to obtain an initial audio signal;
a second processing unit configured to perform audio attribute processing on the initial audio signal based on a preset audio attribute processing strategy to obtain a processed target audio signal;
and the audio mixing unit is configured to perform audio mixing processing on the target audio signal and the background audio signal to obtain a mixed audio signal and output the mixed audio signal.
In an exemplary embodiment, the apparatus further comprises:
a determining unit configured to determine a background audio signal corresponding to the audio signal to be processed according to the audio signal to be processed;
and an acquisition unit configured to acquire the audio data of the background audio signal from an audio storage device according to the audio signal identifier of the background audio signal.
In an exemplary embodiment, the audio purification model includes a convolution layer, a gated recurrent unit, a fully connected layer, and an activation layer, and the first processing unit includes:
a preprocessing subunit configured to preprocess the audio signal to be processed and a background audio signal corresponding to the audio signal to be processed, so as to obtain an audio feature to be processed corresponding to the audio signal to be processed and a background audio feature corresponding to the background audio signal;
and a processing subunit configured to simultaneously input the audio feature to be processed and the background audio feature into the audio purification model, sequentially perform correlation comparison and audio feature processing on the audio feature to be processed and the background audio feature through the convolution layer, the gated recurrent unit, the fully connected layer, and the activation layer in the audio purification model, and output an initial audio signal obtained by audio purification of the audio signal to be processed.
In an exemplary embodiment, the preprocessing subunit is specifically configured to perform short-time Fourier transform processing on the audio signal to be processed and a background audio signal corresponding to the audio signal to be processed, so as to obtain a first conversion signal and a second conversion signal in the frequency domain;
sample the first conversion signal and the second conversion signal in the frequency domain according to a preset sampling strategy to respectively obtain a plurality of sampling point data corresponding to the first conversion signal and a plurality of sampling point data corresponding to the second conversion signal;
and respectively compress the plurality of sampling point data corresponding to the first conversion signal and the plurality of sampling point data corresponding to the second conversion signal and divide them into frequency bands, to obtain the audio feature to be processed corresponding to the audio signal to be processed and the background audio feature corresponding to the background audio signal.
In an exemplary embodiment, the second processing unit includes:
a feature processing subunit configured to perform timbre feature processing and sound field feature processing on the initial audio signal, to obtain a processed third audio signal;
and the reverberation processing subunit is configured to perform reverberation processing on the third audio signal to obtain a fourth audio signal after the reverberation processing, and perform dynamic compression processing on the fourth audio signal within a target amplitude range of the audio signal to obtain a target audio signal.
According to a fourth aspect of embodiments of the present disclosure, there is provided an electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the audio signal processing method according to any of the first aspects above.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium storing instructions which, when executed by a processor of an electronic device, cause the electronic device to perform the audio signal processing method according to any one of the first aspect above.
According to a sixth aspect of embodiments of the present disclosure, there is provided a computer program product comprising instructions which, when executed by a processor of an electronic device, enable the electronic device to perform the audio signal processing method according to any one of the first aspect above.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
by adopting the method, the audio signal to be processed recorded by the audio signal recording equipment is received, and the audio signal to be processed and the background audio signal corresponding to the audio signal to be processed are directly subjected to audio purification processing and audio attribute processing in the audio signal feedback equipment, so that the processed target audio signal is obtained. And carrying out audio mixing processing on the target audio signal and the background audio signal to obtain a mixed audio signal. Then, the mixed audio signal is directly output and fed back. The audio signal feedback equipment processes the audio signal to be processed, reduces the transmission loss of the audio signal, simplifies the transmission path of the audio signal, and reduces the processing delay of the audio signal.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.
Fig. 1 is a system configuration diagram illustrating an audio signal processing method according to an exemplary embodiment.
Fig. 2 is a flowchart illustrating an audio signal processing method according to an exemplary embodiment.
Fig. 3 is a flowchart illustrating a background audio acquisition method according to an exemplary embodiment.
Fig. 4 is a schematic diagram showing an internal structure of an audio purification model according to an exemplary embodiment.
Fig. 5 is a flowchart illustrating steps of audio purification processing for audio to be processed according to an exemplary embodiment.
Fig. 6 is a flowchart illustrating an audio feature extraction step according to an exemplary embodiment.
Fig. 7 is a flowchart illustrating steps for processing audio attributes according to an exemplary embodiment.
Fig. 8 is a block diagram illustrating an audio signal processing apparatus according to an exemplary embodiment.
Fig. 9 is a block diagram of an electronic device, according to an example embodiment.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
It should be further noted that, the user information (including, but not limited to, user equipment information, user personal information, etc.) and the data (including, but not limited to, data for presentation, analyzed data, etc.) related to the present disclosure are information and data authorized by the user or sufficiently authorized by each party.
The audio signal processing method provided by the disclosure can be applied to an audio signal processing system as shown in fig. 1, in which the audio signal recording device 110 interacts with the audio signal feedback device 120 via a wireless transmission channel. Because the audio signal is transmitted and fed back only between the audio signal recording device 110 and the audio signal feedback device 120, the transmission path of the audio signal is simplified and the processing delay of the audio signal is reduced.
The audio signal recording device 110 may be, but is not limited to, a microphone, a recording device, or a mobile terminal with a recording function, and the audio signal feedback device 120 may be a wireless in-ear monitor, a wireless earphone, or the like. The embodiments of the present disclosure do not limit the types of the audio signal recording device and the audio signal feedback device.
Fig. 2 is a flowchart illustrating an audio signal processing method according to an exemplary embodiment. As shown in fig. 2, the method is used in an audio signal feedback device and includes the following steps.
In step S210, an audio signal to be processed recorded by the audio signal recording apparatus is received.
Wherein the audio signal to be processed comprises at least a sound signal of the target object. Optionally, the audio signal to be processed may further include a noise signal in the recording environment and a howling signal caused by output feedback of the audio signal feedback device.
In implementation, during everyday karaoke entertainment, when a target object sings a song, the audio signal recording device records the audio signal generated during singing and transmits it, as the audio signal to be processed, to the audio signal feedback device, which receives it.
In step S220, the audio signal to be processed and the background audio signal corresponding to the audio signal to be processed are processed based on the pre-trained audio purification model, so as to obtain an initial audio signal.
In implementation, an audio purification model is trained in advance in the audio signal feedback device, and the audio purification model can implement howling suppression processing on howling signals contained in the audio signal to be processed and noise reduction processing on the audio signal to be processed. Therefore, when the audio signal feedback device receives the audio signal to be processed, the audio signal feedback device inputs the audio signal to be processed and a background audio signal corresponding to the audio signal to be processed into the pre-trained audio purification model. The audio purification model takes a background audio signal of an audio signal to be processed as a reference signal, processes the audio signal to be processed and the background audio signal corresponding to the audio signal to be processed, realizes audio purification of the audio signal to be processed, and obtains an initial audio signal after audio purification processing.
In step S230, audio attribute processing is performed on the initial audio signal based on a preset audio attribute processing policy, so as to obtain a processed target audio signal.
In implementation, for the initial audio signal obtained by audio purification processing, the audio signal feedback device performs audio attribute processing based on a preset audio attribute processing strategy to obtain a processed target audio signal. Specifically, audio attribute processing such as timbre adjustment, sound field adjustment, adding a reverberation effect, and adjusting the fluctuation range of the audio spectrum may be performed on the initial audio signal to obtain the target audio signal.
In step S240, the target audio signal and the background audio signal are subjected to audio mixing processing, so as to obtain a mixed audio signal, and the mixed audio signal is output.
In implementation, the target audio signal is the audio signal obtained after audio purification processing and audio attribute processing, and it clearly reflects the sound signal of the target object. The audio signal feedback device therefore mixes the target audio signal with the pre-acquired background audio signal to obtain a mixed audio signal, and then feeds the mixed audio signal back to the target object, so that the target object hears the singing result, namely the mixed audio signal, while singing.
In the above audio signal processing method, the audio signal feedback device receives the audio signal to be processed recorded by the audio signal recording device, where the audio signal to be processed includes a sound signal of the target object and a howling signal. The audio signal feedback device then obtains an initial audio signal from the audio signal to be processed, the corresponding background audio signal, and the pre-trained audio purification model, and performs audio attribute processing on the initial audio signal based on a preset audio attribute processing strategy to obtain a processed target audio signal. Finally, the audio signal feedback device mixes the target audio signal with the background audio signal to obtain a mixed audio signal, and outputs the mixed audio signal to the target object. Because audio purification processing, audio attribute processing, and audio mixing processing are all performed within the audio signal feedback device, the audio signal does not need to be relayed through an additional third-party mixing terminal. This reduces the transmission loss of the audio signal, simplifies its transmission path, and reduces the processing delay of the mixed audio signal.
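The flow of steps S220 to S240 inside the audio signal feedback device can be sketched as a single function. This is only an outline under stated assumptions: `purify` and `apply_attributes` are hypothetical callables standing in for the pre-trained audio purification model and the preset audio attribute processing strategy, mixing is shown as a gain-weighted sum clipped to [-1, 1], and the subtraction-based `naive_purify` in the usage lines is a toy placeholder rather than the model described in the disclosure.

```python
def mix(target, background, vocal_gain=1.0, background_gain=1.0):
    """Sample-wise weighted sum of the two signals, clipped to [-1, 1]."""
    length = min(len(target), len(background))
    return [max(-1.0, min(1.0, vocal_gain * target[i]
                          + background_gain * background[i]))
            for i in range(length)]


def process(to_be_processed, background, purify, apply_attributes):
    """Run one block of received audio (S210) through S220, S230, and S240."""
    initial = purify(to_be_processed, background)  # S220: audio purification
    target = apply_attributes(initial)             # S230: attribute processing
    return mix(target, background)                 # S240: mixing and output


# Toy usage: the microphone picked up the voice plus half of the background.
voice = [0.1, 0.2, -0.1]
background = [0.2, -0.2, 0.4]
recorded = [v + 0.5 * b for v, b in zip(voice, background)]


def naive_purify(signal, reference):
    # Placeholder only: subtract the known leakage of the reference signal.
    return [s - 0.5 * r for s, r in zip(signal, reference)]


mixed = process(recorded, background, naive_purify, lambda s: list(s))
```

Since all three stages run on the same device, the only transmission hop in this outline is the one that delivers `recorded` from the recording device.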
In an exemplary embodiment, before the audio signal to be processed is processed, a background audio signal corresponding to it may be acquired. The background audio signal serves two purposes: it is the reference signal in audio purification processing, where it is input into the audio purification model together with the audio signal to be processed, and it is a necessary input for audio mixing processing, where it is mixed with the target audio signal to obtain the mixed audio signal. Specifically, as shown in fig. 3, before step S220, the audio signal processing method further includes:
in step S211, a background audio signal corresponding to the audio signal to be processed is determined from the audio signal to be processed.
In an implementation, the audio signal feedback device determines a background audio signal corresponding to the audio signal to be processed from the audio signal to be processed. Specifically, the audio signal to be processed may include a sound signal of the target object, a howling signal, and a background noise signal. The sound signal of the target object is an audio signal which needs to be subjected to audio mixing processing and fed back to the target object. Accordingly, the corresponding background audio signal is determined mainly based on the sound signal of the target object, i.e. the audio signal feedback device recognizes the sound signal in the audio signal to be processed, and the background audio signal matching therewith is determined based on the sound signal.
In step S212, audio data of the background audio signal is acquired from the audio storage device according to the audio signal identification of the background audio signal.
In an implementation, the audio signal feedback device obtains audio data of the background audio signal from the audio storage device according to the audio signal identifier of the background audio signal. Specifically, the audio storage device stores a correspondence between each audio signal identifier and the playable audio data of the corresponding audio signal. After the audio signal feedback device determines the target background audio signal, the audio signal feedback device sends an audio signal acquisition instruction to the audio storage device, wherein the audio signal acquisition instruction carries the audio signal identifier corresponding to the target background audio signal. Based on the audio signal identifier, the audio signal feedback device acquires the audio data corresponding to the background audio signal from the audio storage device.
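The identifier-based lookup can be sketched as follows. This is only an illustration: the class names `AcquisitionInstruction` and `AudioStorage`, the string identifier and the in-memory dictionary are all hypothetical and stand in for whatever storage and messaging the actual devices use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcquisitionInstruction:
    """Hypothetical acquisition instruction carrying the audio signal identifier."""
    audio_id: str


class AudioStorage:
    """Hypothetical audio storage device: maps identifiers to audio data."""

    def __init__(self):
        self._by_id = {}

    def put(self, audio_id, samples):
        # Store a copy so later changes by the caller do not affect the store.
        self._by_id[audio_id] = list(samples)

    def handle(self, instruction):
        # Return the stored audio data, or None for an unknown identifier.
        return self._by_id.get(instruction.audio_id)
```

For example, after `store.put("track-001", samples)`, calling `store.handle(AcquisitionInstruction("track-001"))` returns the stored samples.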
In this embodiment, the audio signal feedback device determines, according to the sound signal of the target object in the audio signal to be processed, the background audio signal corresponding to the audio signal to be processed, and obtains the audio data corresponding to the background audio signal from the audio storage device based on the audio signal identifier corresponding to the background audio signal. The background audio signal can then undergo audio purification processing together with the audio signal to be processed and, at the same time, serve as the basis for the subsequent audio mixing processing.
In an exemplary embodiment, as shown in fig. 4, the audio purification model includes a convolution layer, a gated recurrent unit, a fully connected layer, and an activation layer. In particular, the audio purification model may include, but is not limited to, 5 convolution layers (Conv), a gated recurrent unit (GRU), a fully connected layer (Dense), and an activation layer (Sigmoid). Each of the 5 convolution layers can be followed by a two-dimensional batch normalization and a ReLU. As shown in fig. 5, in step S220, an initial audio signal is obtained according to the audio signal to be processed, a background audio signal corresponding to the audio signal to be processed, and a pre-trained audio purification model, and the specific processing procedure includes:
In step S502, the audio signal to be processed and the background audio signal corresponding to the audio signal to be processed are preprocessed, so as to obtain the audio feature to be processed corresponding to the audio signal to be processed and the background audio feature corresponding to the background audio signal.
In implementation, the audio signal feedback device performs preprocessing on the audio signal to be processed and the background audio signal corresponding to the audio signal to be processed, so as to obtain the audio feature to be processed corresponding to the audio signal to be processed after preprocessing and the background audio feature corresponding to the background audio signal. Specifically, the audio signal feedback device converts an audio signal to be processed and a background audio signal to a frequency domain according to a preset feature extraction algorithm, and further performs feature extraction on the audio signal to be processed and the background audio signal on the converted frequency domain, namely sub-band decomposition is performed on the audio signal to be processed on the frequency domain, so as to obtain energy spectrums corresponding to a plurality of sub-band signals of the audio signal to be processed, and thus the audio feature to be processed corresponding to the audio signal to be processed is obtained; and carrying out sub-band decomposition on the background audio signal on the frequency domain to obtain energy spectrums corresponding to a plurality of sub-band signals of the background audio signal, and obtaining the background audio characteristics corresponding to the background audio signal.
In step S504, the audio feature to be processed and the background audio feature are input into the audio purification model at the same time, and the audio feature to be processed and the background audio feature are compared and processed sequentially through the convolution layer, the gated recurrent unit, the fully connected layer and the activation layer in the audio purification model, so as to output an initial audio signal after audio purification of the audio signal to be processed.
In implementation, the audio signal feedback device inputs the audio feature to be processed and the background audio feature into the audio purification model at the same time, performs correlation comparison and audio feature processing on the audio feature to be processed and the background audio feature layer by layer through the convolution layer, the gated recurrent unit, the fully connected layer and the activation layer in the audio purification model, and outputs the processed initial audio feature corresponding to the audio signal to be processed. The audio signal feedback device then performs feature restoration processing on the initial audio feature to obtain the initial audio signal corresponding to the initial audio feature.
Specifically, the audio purification model may perform a howling signal suppression process and a noise reduction process on the audio feature to be processed, wherein the howling signal suppression process may be referenced by means of the background audio feature. The audio purification model carries out howling suppression processing on the audio characteristics to be processed and the background audio characteristics as follows:
First, the 5 convolution layers in the audio purification model extract features from the audio feature to be processed and the background audio feature layer by layer, and determine the correlation between them. If the correlation between an audio feature point in the audio feature to be processed and the audio feature points in the background audio feature is high, that audio feature point is characterized as a howling signal feature point; if the correlation is low, the audio feature point is not a howling signal feature point. Then, the GRU layer following the 5 convolution layers stores information such as the time sequence relation and time sequence correlation of the audio signal features, so that further howling signal feature detection and howling suppression can be performed on the audio feature to be processed based on the information stored in the GRU layer and the background audio feature. Finally, the fully connected layer maps the result, the activation layer limits the range of the audio feature after the howling suppression processing, and the initial audio feature after the howling suppression processing is output. Specifically, this initial audio feature consists of 64 amplitude masks corresponding to 64 ERB (equivalent rectangular bandwidth) bands. Each amplitude mask represents the discrimination result of whether the point on the current ERB band is a sampling point of a non-howling signal, namely 0 or 1, where 0 denotes a howling signal and 1 denotes a non-howling signal (namely, a sound signal).
And then, based on the output judging result, retaining non-howling signal characteristic points in the audio characteristics to be processed, eliminating the howling signal characteristic points, and determining the audio characteristics after the howling suppression processing.
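The role of the activation layer and of the per-band masks can be sketched as follows. The sketch assumes that the last layer produces one raw score per band, that a Sigmoid maps it into (0, 1), and that a threshold of 0.5 turns it into the 0/1 discrimination result. The threshold, the function names and the use of plain band magnitudes are assumptions made for illustration and are not stated in the disclosure.

```python
import math


def sigmoid(x):
    """Logistic function; limits any real score to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def band_masks(scores, threshold=0.5):
    """Turn raw per-band scores into 0/1 masks (0: howling, 1: non-howling).

    The 0.5 threshold is an assumption made for this sketch.
    """
    return [1 if sigmoid(s) >= threshold else 0 for s in scores]


def apply_masks(band_values, masks):
    """Keep bands marked 1 and zero out bands marked 0."""
    if len(band_values) != len(masks):
        raise ValueError("one mask per band is required")
    return [v * m for v, m in zip(band_values, masks)]
```

In the embodiment there would be 64 scores and 64 masks, one per ERB band; the functions above work for any number of bands.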
The process of the audio purification model for carrying out noise reduction treatment on the audio characteristics to be treated is as follows: and obtaining an audio signal to be processed, and carrying out sub-band decomposition on the audio signal to be processed to obtain energy spectrums respectively corresponding to a plurality of sub-band signals, wherein the energy spectrums respectively corresponding to the sub-band signals are the audio characteristics to be processed. Then, the energy spectrum (audio feature to be processed) corresponding to each sub-band signal is input into an audio purification model, and the denoised audio features corresponding to the energy spectrum of each sub-band signal are obtained.
In this embodiment, the audio feature to be processed and the background audio feature are processed through the audio purification model, and based on the reference function of the background audio feature, the audio purification model detects howling signal information and noise signal information in the audio feature to be processed, and carries out howling suppression processing and noise reduction processing on the audio feature to be processed, so as to obtain the audio feature after the overall audio purification processing, thereby improving the definition of the sound signal in the audio signal to be processed.
In one embodiment, the audio signal feedback apparatus performs an audio purification process on an audio signal to be processed through an audio purification model, the audio purification model requiring a pre-training process, the pre-training process of the audio purification model including:
The audio signal feedback device first constructs an audio training sample. Specifically, in the audio signal recording process, the audio signal recording device records the sound signal of the target object. Because the noise signal in the environment has the characteristic that the frequency value of each frequency point is more than 800 Hz (hertz), the audio signal fed back by the audio signal feedback device can be recorded into the audio signal recording device again while the audio signal recording device is recording the sound signal of the target object, thereby forming a howling signal. Then, the audio signal feedback device acquires, based on the sound signal of the target object in the training audio signal, a background audio signal corresponding to the sound signal, and takes the background audio signal as a reference training audio signal. Then, the electronic device performs feature extraction on the training audio signal and the reference training audio signal respectively to obtain training audio features corresponding to the training audio signal and reference training audio features corresponding to the reference training audio signal, thereby constructing an audio training sample containing the training audio features and the reference training audio features.
Then, the audio signal feedback device trains the preset audio purification model based on the training audio features and the reference training audio features contained in the audio training sample. Through the convolution layer, the gated recurrent unit, the fully connected layer and the activation layer, the audio purification model performs howling suppression processing and noise reduction processing on the training audio features with the reference training audio features as a reference, and outputs the resulting audio features. Training continues until the loss result corresponding to the output audio features meets a preset loss condition, at which point the audio purification model is determined to be trained. The process by which the audio purification model performs the howling suppression processing and the noise reduction processing on the training audio features and the reference training audio features is similar to the process by which it performs the howling suppression processing and the noise reduction processing on the audio features to be processed, and is not described in detail in the embodiments of the present disclosure.
In an exemplary embodiment, as shown in fig. 6, in step S502, an audio signal to be processed and a background audio signal corresponding to the audio signal to be processed are preprocessed, so as to obtain an audio feature to be processed corresponding to the audio signal to be processed and a background audio feature corresponding to the background audio signal, where the specific processing procedure includes the following steps:
In step S602, short-time Fourier transform processing is performed on the audio signal to be processed and the background audio signal corresponding to the audio signal to be processed, so as to obtain a first conversion signal and a second conversion signal on the frequency domain after processing.
In implementation, the audio signal feedback device performs short-time Fourier transform processing on the audio signal to be processed and the background audio signal corresponding to the audio signal to be processed, so as to obtain a first conversion signal and a second conversion signal on the frequency domain after processing. Specifically, the audio signal feedback device converts the audio data contained in the audio signal to be processed (the first audio signal) and in the background audio signal (the second audio signal) from the time domain to the frequency domain, respectively, according to a preset short-time Fourier transform algorithm, thereby determining the first conversion signal and the second conversion signal on the frequency domain.
In step S604, the first conversion signal and the second conversion signal in the frequency domain are sampled according to a preset sampling strategy, so as to obtain a plurality of sampling point data corresponding to the first conversion signal and a plurality of sampling point data corresponding to the second conversion signal respectively.
In implementation, the audio signal feedback device samples the first conversion signal and the second conversion signal on the frequency domain according to a preset sampling strategy, so as to obtain a plurality of sampling point data corresponding to the first conversion signal and a plurality of sampling point data corresponding to the second conversion signal. Specifically, the audio signal feedback device samples the first conversion signal and the second conversion signal at a sampling rate of 48 kHz (kilohertz), with a frame length of 20 ms (milliseconds) and a frame shift of 10 ms, so as to obtain 960 sampling point data (i.e., FFT data of the short-time Fourier transform) corresponding to the first conversion signal and 960 sampling point data corresponding to the second conversion signal.
In step S606, compression banding is performed on the plurality of sampling point data corresponding to the first conversion signal and the plurality of sampling point data corresponding to the second conversion signal, respectively, so as to obtain the audio feature to be processed corresponding to the audio signal to be processed and the background audio feature corresponding to the background audio signal.
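The figures quoted above follow from simple arithmetic: at 48 kHz, a 20 ms frame holds 960 samples and a 10 ms frame shift corresponds to 480 samples. The sketch below reproduces that arithmetic and a plain framing loop. The constant and function names are illustrative, and dropping a trailing partial frame is an assumption of the sketch.

```python
SAMPLE_RATE_HZ = 48_000
FRAME_MS = 20
HOP_MS = 10

FRAME_LEN = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 960 samples per frame
HOP_LEN = SAMPLE_RATE_HZ * HOP_MS // 1000      # 480 samples per frame shift
NUM_BINS = FRAME_LEN // 2 + 1                  # 481 non-redundant FFT bins


def split_frames(samples, frame_len=FRAME_LEN, hop_len=HOP_LEN):
    """Split a signal into overlapping frames; a trailing partial frame is dropped."""
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop_len
    return frames
```

With these values, one second of audio (48,000 samples) yields 99 full frames, and each 960-point frame has 481 non-redundant frequency bins for a real-valued signal.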
In implementation, the audio signal feedback device performs compression banding processing on the plurality of sampling point data corresponding to the first conversion signal and the plurality of sampling point data corresponding to the second conversion signal, respectively, to obtain a first audio feature (i.e., the audio feature to be processed) corresponding to the first audio signal and a second audio feature (i.e., the background audio feature) corresponding to the second audio signal. Specifically, because the spectrum of a real-valued signal is symmetric, only about half of the 960 sampling point data corresponding to the first conversion signal needs to be kept, for example the first 481 points. Then, the electronic device performs compression banding processing on these 481 points, compressing them onto 64 ERB bands based on human hearing, determines the energy sum on each ERB band (namely, the sum of the values of the sampling point data falling on that ERB band), and performs a logarithmic operation (namely, a log operation) on the energy sum on each ERB band to obtain the first audio feature corresponding to the first conversion signal. Similarly, the second conversion signal is sampled and its features are extracted in the same manner to obtain the second audio feature corresponding to the second conversion signal; the specific processing procedure is similar to that of the first conversion signal and is not described in detail in the embodiments of the present disclosure.
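The compression banding step can be sketched as below. The disclosure does not say how the 64 ERB band edges are chosen, so this sketch assumes bands of equal width on the Glasberg-Moore ERB-rate scale, 21.4 * log10(1 + 0.00437 f). The function names and the small constant `eps` that guards the logarithm are also assumptions.

```python
import math


def erb_rate(freq_hz):
    """Glasberg-Moore ERB-rate scale value for a frequency in hertz."""
    return 21.4 * math.log10(1.0 + 0.00437 * freq_hz)


def band_index(freq_hz, nyquist_hz, num_bands):
    """Band that a frequency falls into, for bands of equal ERB-rate width."""
    idx = int(erb_rate(freq_hz) / erb_rate(nyquist_hz) * num_bands)
    return min(idx, num_bands - 1)  # the Nyquist bin belongs to the last band


def erb_log_energy(power_bins, sample_rate_hz=48_000, num_bands=64, eps=1e-10):
    """Sum per-bin energies into ERB bands and take the base-10 logarithm."""
    nyquist = sample_rate_hz / 2.0
    bin_hz = nyquist / (len(power_bins) - 1)  # 50 Hz for 481 bins at 48 kHz
    sums = [0.0] * num_bands
    for k, power in enumerate(power_bins):
        sums[band_index(k * bin_hz, nyquist, num_bands)] += power
    # eps keeps the logarithm finite for bands that received no energy.
    return [math.log10(s + eps) for s in sums]
```

With 50 Hz bin spacing, some of the narrow low-frequency bands receive no bin under this layout and end up at the `eps` floor; a practical implementation would probably enforce a minimum band width instead.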
In this embodiment, feature extraction is performed on the audio signal to be processed and the background audio signal according to a preset feature extraction algorithm, so as to obtain an audio feature to be processed corresponding to the audio signal to be processed and a background audio feature corresponding to the background audio signal respectively. The feature extraction process realizes the pretreatment of the audio signal to be processed and the background audio signal, simplifies the internal processing logic of the audio purification model, reduces the complexity of the audio purification model and improves the processing efficiency of the audio purification model.
In an exemplary embodiment, as shown in fig. 7, in step S230, audio attribute processing is performed on an initial audio signal based on a preset audio attribute processing policy to obtain a processed target audio signal, which specifically includes the following steps:
in step S702, tone characteristic processing and sound field characteristic processing are performed on the initial audio signal, and a processed third audio signal is obtained.
In an implementation, the audio signal feedback device performs timbre feature processing and sound field feature processing on the initial audio signal to obtain a processed third audio signal. Specifically, the audio signal feedback device performs audio gain or attenuation processing on different frequency bands in the audio signal, so as to achieve the purpose of adjusting the tone and the sound field of the initial audio signal, and obtain the processed third audio signal.
In step S704, the third audio signal is subjected to reverberation processing to obtain a fourth audio signal after the reverberation processing, and the fourth audio signal is subjected to dynamic compression processing of the audio signal target amplitude range to obtain a target audio signal.
In implementation, the audio signal feedback device performs reverberation processing on the audio of each frequency band included in the third audio signal, to obtain a fourth audio signal after the reverberation processing. And then, the audio signal feedback equipment performs dynamic compression processing on the fourth audio signal in the audio signal target amplitude range, dynamically adjusts the audio output amplitude, suppresses the volume to be in a preset target range when the volume is large, and properly improves the volume in the preset target range when the volume is small, so as to obtain the target audio signal.
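The dynamic compression behaviour described above (turning loud passages down and quiet passages up so that the output amplitude stays in a target range) can be sketched per frame. The peak-based gain, the range bounds `low` and `high` and the function names are assumptions of this sketch; the disclosure does not specify the compression curve.

```python
def frame_gain(frame, low=0.1, high=0.8):
    """Gain that brings the frame peak into the target range [low, high]."""
    peak = max((abs(x) for x in frame), default=0.0)
    if peak == 0.0:
        return 1.0          # silence: nothing to scale
    if peak > high:
        return high / peak  # too loud: attenuate
    if peak < low:
        return low / peak   # too quiet: boost
    return 1.0              # already inside the target range


def compress(frames, low=0.1, high=0.8):
    """Apply one gain per frame; returns new frames, inputs are not modified."""
    out = []
    for frame in frames:
        gain = frame_gain(frame, low, high)
        out.append([x * gain for x in frame])
    return out
```

A real compressor would usually smooth the gain over time (attack and release) to avoid audible jumps between frames; that is omitted here for brevity.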
In this embodiment, in the audio signal feedback device, the audio attribute processing is performed on the initial audio signal based on the preset audio attribute processing policy, so as to obtain the processed target audio signal, adjust the timbre and sound field of the initial audio signal, increase the reverberation effect, ensure that the audio output amplitude of the target audio signal is within the target range, and improve the audio mixing effect of the target audio signal.
In an exemplary embodiment, there is provided an audio signal processing system, as shown in fig. 1, including:
The audio signal recording device 110 is configured to record an audio signal to be processed, and transmit the audio signal to be processed to the audio signal feedback device.
An audio signal feedback device 120, configured to receive the audio signal to be processed recorded by the audio signal recording device 110; and processing the audio signal to be processed and the background audio signal corresponding to the audio signal to be processed based on the pre-trained audio purification model to obtain an initial audio signal. And performing audio attribute processing on the initial audio signal based on a preset audio attribute processing strategy to obtain a processed target audio signal. And carrying out audio mixing processing on the target audio signal and the background audio signal to obtain a mixed audio signal, and outputting the mixed audio signal. The output mixed audio signal is provided for a target object, and the positive feedback process of the whole audio signal is realized.
In an exemplary embodiment, an audio signal feedback apparatus includes an equalizer unit, a reverberation unit, and a dynamic compression unit;
the equalizer unit is used for performing tone characteristic processing and sound field characteristic processing on the initial audio signal to obtain a processed third audio signal;
The reverberation unit is used for carrying out reverberation processing on the third audio signal to obtain a fourth audio signal after the reverberation processing;
and the dynamic compression unit is used for carrying out dynamic compression processing on the target amplitude range of the audio signal on the fourth audio signal to obtain a target audio signal.
In implementation, the specific processing procedure by which the equalizer unit performs tone characteristic processing and sound field characteristic processing on the initial audio signal is the same as the processing procedure in step S702 above. The processing procedures by which the reverberation unit performs reverberation processing on the third audio signal to obtain the fourth audio signal, and by which the dynamic compression unit performs dynamic compression processing on the fourth audio signal, are the same as the processing procedure in step S704 above, and are not repeated in the embodiments of the present disclosure.
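The disclosure does not specify a reverberation algorithm. As one illustration of what a reverberation unit can do, the sketch below implements a single feedback comb filter, y[n] = x[n] + g * y[n - D], a classical building block of artificial reverberators. The function name, the delay `delay` (in samples) and the feedback gain `feedback` are assumptions, and this is not presented as the method used in the embodiment.

```python
def comb_reverb(samples, delay, feedback=0.5):
    """Feedback comb filter: y[n] = x[n] + feedback * y[n - delay]."""
    if delay <= 0:
        raise ValueError("delay must be a positive number of samples")
    if not 0.0 <= feedback < 1.0:
        raise ValueError("feedback must be in [0, 1) for a decaying echo")
    out = []
    for n, x in enumerate(samples):
        # Before `delay` samples have passed there is no earlier output to feed back.
        echo = feedback * out[n - delay] if n >= delay else 0.0
        out.append(x + echo)
    return out
```

Feeding a single impulse through the filter produces echoes every `delay` samples, each scaled by `feedback` relative to the previous one.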
In an exemplary embodiment, the audio signal feedback apparatus further includes a wireless transmission unit;
the wireless transmission unit is configured to construct a wireless transmission channel with the audio signal recording device 110, and receive the audio signal to be processed transmitted by the audio signal recording device 110 through the wireless transmission channel.
In implementation, the audio signal feedback device further comprises a wireless transmission unit, and the wireless transmission unit can receive the audio signal to be processed transmitted by the audio signal recording device through a wireless transmission channel, so that a communication connection between the audio signal feedback device and the audio signal recording device is realized. The wireless transmission channel may be, but is not limited to, an offline communication channel based on Bluetooth communication or a network communication channel based on wireless network transmission, which is not limited in the embodiments of the present disclosure.
It should be understood that, although the steps in the flowcharts of fig. 2, 3, and 5-7 are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to this order of execution and may be executed in other orders. Moreover, at least a portion of the steps of fig. 2, 3, and 5-7 may include multiple steps or stages, which are not necessarily performed at the same time but may be performed at different times; these steps or stages are also not necessarily performed sequentially, but may be performed in turn or alternately with other steps or with at least a portion of the steps or stages in other steps.
It should be understood that the same or similar parts of the method embodiments described above in this specification may be referred to each other; each embodiment focuses on its differences from the other embodiments, and for the remaining parts reference may be made to the descriptions of the other method embodiments.
Fig. 8 is a block diagram of an audio signal processing apparatus according to an exemplary embodiment. Referring to fig. 8, the apparatus includes a receiving unit 802, a first processing unit 804, a second processing unit 806, and a mixing unit 808.
The receiving unit 802 is configured to receive an audio signal to be processed recorded by an audio signal recording device, the audio signal to be processed including a sound signal of a target object and a howling signal.
The first processing unit 804 is configured to perform processing on an audio signal to be processed and a background audio signal corresponding to the audio signal to be processed based on a pre-trained audio purification model, so as to obtain an initial audio signal;
the second processing unit 806 is configured to perform audio attribute processing on the initial audio signal based on a preset audio attribute processing policy, so as to obtain a processed target audio signal.
The mixing unit 808 is configured to perform mixing processing of the target audio signal with the background audio signal, obtain a mixed audio signal, and output the mixed audio signal.
In an exemplary embodiment, the apparatus 800 further comprises:
a determination unit configured to perform determination of a background audio signal corresponding to the audio signal to be processed from the audio signal to be processed;
and an acquisition unit configured to acquire audio data of the background audio signal from the audio storage device according to the audio signal identifier of the background audio signal.
In an exemplary embodiment, the audio purification model includes a convolution layer, a gated recurrent unit, a fully connected layer, and an activation layer, and the first processing unit 804 includes:
the preprocessing subunit is configured to perform preprocessing on the audio signal to be processed and the background audio signal corresponding to the audio signal to be processed to obtain the audio feature to be processed corresponding to the audio signal to be processed and the background audio feature corresponding to the background audio signal;
the processing subunit is configured to input the audio feature to be processed and the background audio feature into the audio purification model at the same time, perform correlation comparison and audio feature processing on the audio feature to be processed and the background audio feature sequentially through the convolution layer, the gated recurrent unit, the fully connected layer and the activation layer in the audio purification model, and output an initial audio signal after audio purification of the audio signal to be processed.
In an exemplary embodiment, the preprocessing subunit is specifically configured to perform short-time fourier transform processing on the audio signal to be processed and a background audio signal corresponding to the audio signal to be processed, so as to obtain a first converted signal and a second converted signal on a frequency domain after processing;
sampling the first conversion signal and the second conversion signal on the frequency domain according to a preset sampling strategy to respectively obtain a plurality of sampling point data corresponding to the first conversion signal and a plurality of sampling point data corresponding to the second conversion signal;
And respectively compressing and banding the data of the plurality of sampling points corresponding to the first conversion signal and the plurality of sampling points corresponding to the second conversion signal to obtain the audio characteristics to be processed corresponding to the audio signals to be processed and the background audio characteristics corresponding to the background audio signals.
In an exemplary embodiment, the second processing unit 806 includes:
a feature processing subunit configured to perform tone feature processing and sound field feature processing on the initial audio signal, to obtain a processed third audio signal;
the reverberation processing subunit is configured to perform reverberation processing on the third audio signal to obtain a fourth audio signal after the reverberation processing, and perform dynamic compression processing on the fourth audio signal in a target amplitude range of the audio signal to obtain a target audio signal.
The specific manner in which the various modules perform the operations in the apparatus of the above embodiments have been described in detail in connection with the embodiments of the method, and will not be described in detail herein.
Fig. 9 is a block diagram illustrating an electronic device 900 for audio signal processing according to an example embodiment. For example, electronic device 900 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, exercise device, personal digital assistant, and the like.
Referring to fig. 9, an electronic device 900 may include one or more of the following components: a processing component 902, a memory 904, a power component 906, a multimedia component 908, an audio component 910, an input/output (I/O) interface 912, a sensor component 914, and a communication component 916.
The processing component 902 generally controls overall operation of the electronic device 900, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 902 may include one or more processors 920 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 902 can include one or more modules that facilitate interaction between the processing component 902 and other components. For example, the processing component 902 can include a multimedia module to facilitate interaction between the multimedia component 908 and the processing component 902.
The memory 904 is configured to store various types of data to support operations at the electronic device 900. Examples of such data include instructions for any application or method operating on the electronic device 900, contact data, phonebook data, messages, pictures, video, and so forth. The memory 904 may be implemented by any type of volatile or nonvolatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, optical disk, or graphene memory.
The power supply component 906 provides power to the various components of the electronic device 900. Power supply components 906 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for electronic device 900.
The multimedia component 908 comprises a screen that provides an output interface between the electronic device 900 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 908 includes a front-facing camera and/or a rear-facing camera. When the electronic device 900 is in an operational mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front and rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.
The audio component 910 is configured to output and/or input audio signals. For example, the audio component 910 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 900 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 904 or transmitted via the communication component 916. In some embodiments, the audio component 910 further includes a speaker for outputting audio signals.
The I/O interface 912 provides an interface between the processing component 902 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 914 includes one or more sensors for providing status assessment of various aspects of the electronic device 900. For example, the sensor assembly 914 may detect an on/off state of the electronic device 900 and the relative positioning of components, such as the display and keypad of the electronic device 900. The sensor assembly 914 may also detect a change in position of the electronic device 900 or of a component of the electronic device 900, the presence or absence of user contact with the electronic device 900, an orientation or acceleration/deceleration of the electronic device 900, and a change in temperature of the electronic device 900. The sensor assembly 914 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 914 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 914 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 916 is configured to facilitate wired or wireless communication between the electronic device 900 and other devices. The electronic device 900 may access a wireless network based on a communication standard, such as WiFi, an operator network (e.g., 2G, 3G, 4G, or 5G), or a combination thereof. In one exemplary embodiment, the communication component 916 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 916 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra-Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 900 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the methods described above.
In an exemplary embodiment, a computer-readable storage medium is also provided, such as a memory 904 including instructions executable by the processor 920 of the electronic device 900 to perform the above-described method. For example, the computer-readable storage medium may be ROM, Random Access Memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
In an exemplary embodiment, a computer program product is also provided, comprising instructions executable by the processor 920 of the electronic device 900 to perform the above-described method.
It should be noted that the foregoing apparatus, electronic device, computer-readable storage medium, computer program product, and the like may further include other implementations consistent with the method embodiments. For the specific implementations, refer to the descriptions of the related method embodiments, which are not repeated here.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following the general principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (11)

1. An audio signal processing method, characterized in that the method is applied to an audio signal feedback device, the method comprising:
receiving an audio signal to be processed recorded by audio signal recording equipment, wherein the audio signal to be processed at least comprises a sound signal of a target object;
processing the audio signal to be processed and a background audio signal corresponding to the audio signal to be processed based on a pre-trained audio purification model to obtain an initial audio signal;
performing audio attribute processing on the initial audio signal based on a preset audio attribute processing strategy to obtain a processed target audio signal;
and carrying out audio mixing processing on the target audio signal and the background audio signal to obtain a mixed audio signal, and outputting the mixed audio signal.
2. The audio signal processing method according to claim 1, wherein before the audio signal to be processed and the background audio signal corresponding to the audio signal to be processed are processed based on the pre-trained audio purification model, the method further comprises:
determining a background audio signal corresponding to the audio signal to be processed according to the audio signal to be processed;
and acquiring the audio data of the background audio signal from the audio storage equipment according to the audio signal identification of the background audio signal.
3. The audio signal processing method according to claim 1, wherein the audio purification model includes a convolution layer, a gated recurrent unit, a fully connected layer and an activation layer, and the processing the audio signal to be processed and the background audio signal corresponding to the audio signal to be processed based on the pre-trained audio purification model to obtain an initial audio signal includes:
respectively preprocessing the audio signal to be processed and a background audio signal corresponding to the audio signal to be processed to obtain an audio feature to be processed corresponding to the audio signal to be processed and a background audio feature corresponding to the background audio signal;
and simultaneously inputting the audio feature to be processed and the background audio feature into the audio purification model, and sequentially carrying out correlation comparison and audio feature processing on the audio feature to be processed and the background audio feature through the convolution layer, the gated recurrent unit, the fully connected layer and the activation layer in the audio purification model, so as to output an initial audio signal obtained by audio purification of the audio signal to be processed.
4. The audio signal processing method according to claim 3, wherein the preprocessing the audio signal to be processed and the background audio signal corresponding to the audio signal to be processed to obtain the audio feature to be processed corresponding to the audio signal to be processed and the background audio feature corresponding to the background audio signal, includes:
performing short-time Fourier transform processing on the audio signal to be processed and the background audio signal corresponding to the audio signal to be processed to obtain a first conversion signal and a second conversion signal in the frequency domain;
sampling the first conversion signal and the second conversion signal in the frequency domain according to a preset sampling strategy to respectively obtain a plurality of sampling point data corresponding to the first conversion signal and a plurality of sampling point data corresponding to the second conversion signal;
and respectively compressing and dividing into bands the plurality of sampling point data corresponding to the first conversion signal and the plurality of sampling point data corresponding to the second conversion signal to obtain the audio feature to be processed corresponding to the audio signal to be processed and the background audio feature corresponding to the background audio signal.
5. The audio signal processing method according to claim 1, wherein the performing audio attribute processing on the initial audio signal based on a preset audio attribute processing policy to obtain a processed target audio signal includes:
performing tone characteristic processing and sound field characteristic processing on the initial audio signal to obtain a processed third audio signal;
and carrying out reverberation processing on the third audio signal to obtain a fourth audio signal after the reverberation processing, and carrying out dynamic compression processing on the fourth audio signal within a target amplitude range of the audio signal to obtain a target audio signal.
6. An audio signal processing system, the system comprising:
the audio signal recording device is used for recording an audio signal to be processed and transmitting the audio signal to be processed to the audio signal feedback device, wherein the audio signal to be processed comprises a sound signal and a howling signal of a target object;
the audio signal feedback device is used for receiving the audio signal to be processed recorded by the audio signal recording device, wherein the audio signal to be processed comprises a sound signal and a howling signal of a target object; processing the audio signal to be processed and a background audio signal corresponding to the audio signal to be processed based on a pre-trained audio purification model to obtain an initial audio signal;
performing audio attribute processing on the initial audio signal based on a preset audio attribute processing strategy to obtain a processed target audio signal; and carrying out audio mixing processing on the target audio signal and the background audio signal to obtain a mixed audio signal, and outputting the mixed audio signal.
7. The audio signal processing system of claim 6, wherein the audio signal feedback device comprises an equalizer unit, a reverberation unit, and a dynamic compression unit;
the equalizer unit is used for performing tone characteristic processing and sound field characteristic processing on the initial audio signal to obtain a processed third audio signal;
the reverberation unit is used for carrying out reverberation processing on the third audio signal to obtain a fourth audio signal after the reverberation processing;
the dynamic compression unit is used for carrying out dynamic compression processing on the fourth audio signal within the audio signal target amplitude range to obtain a target audio signal.
8. The audio signal processing system of claim 6, wherein the audio signal feedback device further comprises a wireless transmission unit;
the wireless transmission unit is used for constructing a wireless transmission channel with the audio signal recording equipment and receiving the audio signal to be processed transmitted by the audio signal recording equipment through the wireless transmission channel.
9. An audio signal processing apparatus, the apparatus comprising:
a receiving unit configured to receive an audio signal to be processed recorded by an audio signal recording apparatus, the audio signal to be processed including a sound signal and a howling signal of a target object;
a first processing unit configured to process the audio signal to be processed and a background audio signal corresponding to the audio signal to be processed based on a pre-trained audio purification model to obtain an initial audio signal;
a second processing unit configured to perform audio attribute processing on the initial audio signal based on a preset audio attribute processing strategy to obtain a processed target audio signal;
and an audio mixing unit configured to perform audio mixing processing on the target audio signal and the background audio signal to obtain a mixed audio signal and to output the mixed audio signal.
10. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the audio signal processing method of any of claims 1 to 5.
11. A computer readable storage medium, characterized in that instructions in the computer readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the audio signal processing method of any one of claims 1 to 5.
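
The signal-processing steps recited in claims 4, 5 and 1 (short-time Fourier transform, band features, dynamic compression, and mixing) can be illustrated with a minimal sketch. It is not the claimed implementation: the claims do not specify a window, frame length, band layout, compression law, or mixing gains, so the Hann window, the equal-width bands, the log compression, the threshold/ratio compressor, and every function and parameter name below are assumptions chosen for illustration. The pre-trained audio purification model, the equalizer, and the reverberation stage are omitted.

```python
import cmath
import math


def stft_frames(signal, frame_len=64, hop=32):
    """Short-time Fourier transform: Hann-windowed frames, naive one-sided DFT."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = [
            sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(spectrum)
    return frames


def band_features(spectrum, num_bands=8):
    """Group frequency bins into bands and log-compress each band's energy.

    Equal-width bands and log1p compression are assumptions; the claims only
    say the sampled frequency-domain data is compressed and divided into bands.
    """
    feats = []
    for b in range(num_bands):
        lo = b * len(spectrum) // num_bands
        hi = (b + 1) * len(spectrum) // num_bands
        energy = sum(abs(c) ** 2 for c in spectrum[lo:hi])
        feats.append(math.log1p(energy))
    return feats


def compress_dynamics(signal, threshold=0.5, ratio=4.0):
    """Static compressor: the part of |x| above the threshold is divided by ratio."""
    out = []
    for x in signal:
        mag = abs(x)
        if mag > threshold:
            mag = threshold + (mag - threshold) / ratio
        out.append(math.copysign(mag, x))
    return out


def mix(target, background, target_gain=1.0, background_gain=0.5):
    """Sum the gain-weighted signals sample by sample and clip to [-1, 1]."""
    n = min(len(target), len(background))
    return [
        max(-1.0, min(1.0, target_gain * target[i] + background_gain * background[i]))
        for i in range(n)
    ]
```

In this sketch the same `stft_frames` and `band_features` calls would be applied to both the audio signal to be processed and the background audio signal, so that the two feature sequences share one layout before being passed to a model.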
CN202310098138.9A 2023-01-30 2023-01-30 Audio signal processing method, device, electronic equipment and storage medium Pending CN116312589A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310098138.9A CN116312589A (en) 2023-01-30 2023-01-30 Audio signal processing method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116312589A true CN116312589A (en) 2023-06-23

Family

ID=86784202

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310098138.9A Pending CN116312589A (en) 2023-01-30 2023-01-30 Audio signal processing method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116312589A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination