CN109887515A - Audio-frequency processing method and device, electronic equipment and storage medium - Google Patents
- Publication number: CN109887515A, CN201910086763.5A
- Authority: CN (China)
- Prior art keywords: audio, completion, spectral image, corrupted, carried out
- Legal status: Granted (status as listed by Google Patents; it is an assumption, not a legal conclusion)
Abstract
This disclosure relates to an audio processing method and apparatus, an electronic device, and a storage medium. The method includes: performing spectrum conversion on a corrupted audio to be processed to obtain a first spectral image of the corrupted audio; performing spectral completion on the first spectral image to obtain a completed second spectral image; and completing the corrupted audio according to the second spectral image to obtain a completed first audio, so that the completed first audio delivers a good listening experience.
Description
Technical field
This disclosure relates to the field of signal processing technology, and in particular to an audio processing method and apparatus, an electronic device, and a storage medium.
Background
Audio completion refers to regenerating a portion of audio that has gone missing, because of noise interference or by accident, and filling it in naturally. The technique is widely applied in audio restoration and noise reduction. Related techniques rely mainly on traditional audio processing methods: using sparse audio representations, they search the known segments around the missing segment for similar parts and use those parts to fill the gap.
Summary of the invention
The present disclosure proposes an audio processing technical solution.
According to one aspect of the disclosure, an audio processing method is provided, including: performing spectrum conversion on a corrupted audio to be processed to obtain a first spectral image of the corrupted audio; performing spectral completion on the first spectral image to obtain a completed second spectral image; and completing the corrupted audio according to the second spectral image to obtain a completed first audio.
In one possible implementation, performing spectral completion on the first spectral image to obtain the completed second spectral image includes: performing feature extraction on the first spectral image to obtain a first spectral feature; and performing spectral reconstruction on the first spectral feature to obtain the second spectral image.
In one possible implementation, performing spectral completion on the first spectral image to obtain the completed second spectral image includes: performing feature extraction on the first spectral image to obtain a second spectral feature; performing feature extraction on related information of the corrupted audio to obtain a supervision feature; aligning the second spectral feature with the supervision feature; and performing spectral reconstruction on the first spectral feature according to the aligned supervision feature to obtain the second spectral image, where the related information includes at least one of video information and optical flow information corresponding to the corrupted audio.
In one possible implementation, the corrupted audio includes a corrupted audio segment, and completing the corrupted audio according to the second spectral image to obtain the completed first audio includes: performing spectrum-to-audio conversion on the part of the second spectral image that corresponds to the corrupted audio segment to obtain a completion audio segment; and completing the corrupted audio with the completion audio segment to obtain the completed first audio.
In one possible implementation, the corrupted audio includes a corrupted audio segment and an undamaged audio segment, and completing the corrupted audio according to the second spectral image to obtain the completed first audio includes: predicting the completion audio segment according to the undamaged audio segment and the part of the second spectral image that corresponds to the corrupted audio segment; and completing the corrupted audio with the completion audio segment to obtain the completed first audio.
In one possible implementation, the operation of completing the corrupted audio according to the second spectral image to obtain the completed first audio is implemented by a WaveNet decoding network.
In one possible implementation, the first spectral image and the second spectral image include a Mel spectrogram or a Mel-cepstrum image.
According to one aspect of the disclosure, an audio processing apparatus is provided, including: a spectrum conversion module, configured to perform spectrum conversion on a corrupted audio to be processed to obtain a first spectral image of the corrupted audio; a spectral completion module, configured to perform spectral completion on the first spectral image to obtain a completed second spectral image; and an audio completion module, configured to complete the corrupted audio according to the second spectral image to obtain a completed first audio.
In one possible implementation, the spectral completion module includes: a first feature extraction submodule, configured to perform feature extraction on the first spectral image to obtain a first spectral feature; and a first spectral reconstruction submodule, configured to perform spectral reconstruction on the first spectral feature to obtain the second spectral image.
In one possible implementation, the spectral completion module includes: a second feature extraction submodule, configured to perform feature extraction on the first spectral image to obtain a second spectral feature; a third feature extraction submodule, configured to perform feature extraction on related information of the corrupted audio to obtain a supervision feature; an alignment submodule, configured to align the second spectral feature with the supervision feature; and a second spectral reconstruction submodule, configured to perform spectral reconstruction on the first spectral feature according to the aligned supervision feature to obtain the second spectral image, where the related information includes at least one of video information and optical flow information corresponding to the corrupted audio.
In one possible implementation, the corrupted audio includes a corrupted audio segment, and the audio completion module includes: a first spectrum-to-audio conversion submodule, configured to perform spectrum-to-audio conversion on the part of the second spectral image that corresponds to the corrupted audio segment to obtain a completion audio segment; and a first audio completion submodule, configured to complete the corrupted audio with the completion audio segment to obtain the completed first audio.
In one possible implementation, the corrupted audio includes a corrupted audio segment and an undamaged audio segment, and the audio completion module includes: a prediction submodule, configured to predict the completion audio segment according to the undamaged audio segment and the part of the second spectral image that corresponds to the corrupted audio segment; and a second audio completion submodule, configured to complete the corrupted audio with the completion audio segment to obtain the completed first audio.
In one possible implementation, the audio completion module is implemented by a WaveNet decoding network.
In one possible implementation, the first spectral image and the second spectral image include a Mel spectrogram or a Mel-cepstrum image.
According to one aspect of the disclosure, an electronic device is provided, including: a processor; and a memory for storing processor-executable instructions, where the processor is configured to execute the audio processing method described above.
According to one aspect of the disclosure, a computer-readable storage medium is provided, on which computer program instructions are stored; when executed by a processor, the computer program instructions implement the audio processing method described above.
In the embodiments of the present disclosure, spectrum conversion is performed on a corrupted audio to be processed to obtain a first spectral image of the corrupted audio; spectral completion is performed on the first spectral image to obtain a completed second spectral image; and the corrupted audio is completed according to the second spectral image to obtain a completed first audio. This turns the audio completion problem into a spectral completion problem and thereby reduces excessive dependence on the audio information itself. The audio processing method can complete corrupted audio such as audio affected by noise interference, audio containing popping (crackling) segments, or audio with local distortion or partially erased segments, so that the completed audio delivers a good listening experience.
It should be understood that the foregoing general description and the following detailed description are only exemplary and explanatory, and do not limit the disclosure.
Other features and aspects of the disclosure will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief description of the drawings
The accompanying drawings are incorporated into and form part of this specification. They show embodiments consistent with the disclosure and, together with the specification, serve to explain the technical solutions of the disclosure.
Fig. 1 shows a flowchart of an audio processing method according to an embodiment of the present disclosure.
Fig. 2 shows a schematic diagram of audio information and spectral images in an audio processing method according to an embodiment of the present disclosure.
Fig. 3 shows a schematic structural diagram of a neural network used in an audio processing method according to an embodiment of the present disclosure.
Fig. 4 shows a schematic structural diagram of a neural network used in an audio processing method according to an embodiment of the present disclosure.
Fig. 5 shows a block diagram of an audio processing apparatus according to an embodiment of the present disclosure.
Fig. 6 is a block diagram of an electronic device according to an exemplary embodiment.
Fig. 7 is a block diagram of an electronic device according to an exemplary embodiment.
Detailed description
Various exemplary embodiments, features, and aspects of the disclosure are described in detail below with reference to the accompanying drawings. The same reference numerals in the drawings denote elements with the same or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless otherwise specified.
The word "exemplary" is used here to mean "serving as an example, embodiment, or illustration". Any embodiment described here as "exemplary" is not necessarily to be construed as preferred over or more advantageous than other embodiments.
The term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean A alone, both A and B, or B alone. In addition, the term "at least one" herein means any one of multiple items or any combination of at least two of them; for example, "including at least one of A, B, and C" may mean including any one or more elements selected from the set consisting of A, B, and C.
In addition, numerous specific details are given in the following detailed description to better explain the disclosure. Those skilled in the art will understand that the disclosure can be implemented without some of these details. In some instances, methods, means, elements, and circuits well known to those skilled in the art are not described in detail, so as to highlight the gist of the disclosure.
Fig. 1 shows a flowchart of the audio processing method according to an embodiment of the present disclosure. The audio processing method may be executed by a terminal device or another processing device. The terminal device may be user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, an in-vehicle device, a wearable device, or the like. The other processing device may be a server, a cloud server, or the like. In some possible implementations, the audio processing method may be implemented by a processor invoking computer-readable instructions stored in a memory.
As shown in Fig. 1, the method includes:
Step S11: perform spectrum conversion on a corrupted audio to be processed to obtain a first spectral image of the corrupted audio.
In one possible implementation, the corrupted audio to be processed may be derived from the audio information of a complete song (i.e., complete audio without any damage). For example, it may result from information loss while the audio is being transmitted, from a virus carried in the audio information, or from accidental deletion of part of the information while the audio is being edited.
Implementations of the disclosure do not limit the number of audio frames in the corrupted audio to be processed. The corrupted audio may include all audio frames of a complete song (e.g., 1000 frames), in which case the corrupted audio segment may, for example, be frames 8 to 10; or it may include only some of the song's audio frames (e.g., 10 frames), in which case the corrupted audio segment may, for example, be frame 7.
In one possible implementation, the corrupted audio to be processed may be represented by an audio signal of any format. As an example, as shown in Fig. 2, the corrupted audio may be represented by spectrogram 12. Compared with spectrogram 11, which represents the intact audio, spectrogram 12 has a clearly visible missing region (the blank rectangle in spectrogram 12); this missing region indicates the damaged segment of the corrupted audio to be processed.
In one possible implementation, an audio signal represents audio information in the time domain, while a spectral image represents audio information in the frequency domain. In this implementation, the first spectral image and the corrupted audio can therefore be treated as different representations of the same information.
In one possible implementation, the first spectral image and the second spectral image include a Mel spectrogram or a Mel-cepstrum image.
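A Mel-cepstrum image is conventionally obtained by taking the logarithm of each Mel-band energy in a frame and applying a discrete cosine transform. The sketch below shows that step for a single frame; the function name `mel_cepstrum`, the orthonormal DCT-II, the choice of 13 coefficients, and the small `eps` floor are assumptions, not details taken from the disclosure.

```python
import math


def mel_cepstrum(mel_energies, n_coeffs=13, eps=1e-10):
    """Mel-cepstral coefficients for one frame (illustrative sketch).

    Applies log compression to each Mel-band energy, then an orthonormal DCT-II.
    Stacking the result for every frame gives a Mel-cepstrum image.
    """
    logs = [math.log(e + eps) for e in mel_energies]  # eps avoids log(0)
    m = len(logs)
    coeffs = []
    for k in range(n_coeffs):
        scale = math.sqrt(1.0 / m) if k == 0 else math.sqrt(2.0 / m)
        coeffs.append(scale * sum(logs[n] * math.cos(math.pi * k * (n + 0.5) / m)
                                  for n in range(m)))
    return coeffs


# A flat spectrum (every band energy e, so every log is 1) has energy only in c0.
coeffs = mel_cepstrum([math.e] * 20)
print(round(coeffs[0], 4))  # 4.4721
```

The zeroth coefficient tracks overall log energy, while the higher coefficients describe the shape of the spectral envelope.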
Step S12: perform spectral completion on the first spectral image to obtain a completed second spectral image.
The first spectral image contains at least one missing region, i.e., a region that needs to be completed, which represents a missing segment of the corrupted audio. For example, the first spectral image may contain several contiguous missing regions or several separated missing regions. In one possible implementation, the larger the total area of the missing regions in the first spectral image, the more of the corrupted audio is damaged; the smaller the area, the less of it is damaged.
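To make the notion of missing regions concrete, the sketch below scans a time-major spectral image and reports each run of frames whose magnitudes are all (near) zero, together with the total missing area in pixels. Treating all-zero frames as missing, the threshold value, and the names `find_missing_regions` and `missing_area` are illustrative assumptions; a real system might instead receive an explicit damage mask.

```python
def find_missing_regions(spec, threshold=1e-8):
    """Return [start, end) frame ranges whose magnitudes are all ~zero.

    spec is a time-major spectral image: spec[t][f].
    """
    regions, start = [], None
    for t, frame in enumerate(spec):
        missing = all(abs(v) <= threshold for v in frame)
        if missing and start is None:
            start = t  # a missing run begins
        elif not missing and start is not None:
            regions.append((start, t))  # the run ended at the previous frame
            start = None
    if start is not None:
        regions.append((start, len(spec)))  # run reaches the last frame
    return regions


def missing_area(spec, regions):
    """Number of spectral-image pixels covered by the missing regions."""
    n_bins = len(spec[0]) if spec else 0
    return sum((end - start) * n_bins for start, end in regions)


spec = [[1.0, 2.0] for _ in range(10)]
for t in (2, 3, 7):
    spec[t] = [0.0, 0.0]
regions = find_missing_regions(spec)
print(regions, missing_area(spec, regions))  # [(2, 4), (7, 8)] 6
```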
In one possible implementation, spectral completion may fill the missing regions of the first spectral image with the pixels surrounding them. Alternatively, deep learning may be used to predict (infer) each pixel in a missing region from the pixels in the other regions of the first spectral image.
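The first option, filling a missing region from its surrounding pixels, can be approximated very simply by linear interpolation along time between the nearest intact frames on either side. This is a stand-in chosen for illustration rather than the disclosure's method; regions are given as [start, end) frame ranges and `fill_by_interpolation` is a hypothetical name.

```python
def fill_by_interpolation(spec, regions):
    """Fill each missing [start, end) frame range by linear interpolation.

    Interpolates, bin by bin, between the frame just before the region and
    the frame just after it; a region at either edge copies its one neighbor.
    """
    out = [list(frame) for frame in spec]
    for start, end in regions:
        left = out[start - 1] if start > 0 else None
        right = out[end] if end < len(out) else None
        if left is None and right is None:
            continue  # nothing to interpolate from
        left = left if left is not None else right
        right = right if right is not None else left
        span = end - start + 1
        for i, t in enumerate(range(start, end), start=1):
            w = i / span  # weight moves from left (0) toward right (1)
            out[t] = [(1 - w) * a + w * b for a, b in zip(left, right)]
    return out


spec = [[0.0], [1.0], [0.0], [0.0], [4.0]]
filled = fill_by_interpolation(spec, [(2, 4)])
print([round(f[0], 6) for f in filled])  # [0.0, 1.0, 2.0, 3.0, 4.0]
```

Learned inpainting, the second option, would replace this fixed rule with a network that infers missing pixels from the whole image.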
In one possible implementation, the gap between the completed second spectral image and the spectral image of the intact audio is small. The spectral completion operation fills in the missing parts of the first spectral image, so that the completed second spectral image closely approximates the intact spectral image.
As an example, as shown in Fig. 2, spectral image 13 is the first spectral image to be completed, the darker rectangular area in spectral image 13 is the missing region of the first spectral image, and spectral image 14 is the completed second spectral image.
Step S13: complete the corrupted audio according to the second spectral image to obtain a completed first audio.
The gap between the completed first audio and the intact audio is small. The completion operation fills in the missing segments of the corrupted audio, so that the completed first audio closely approximates the intact audio.
The corrupted audio includes at least one corrupted audio segment, i.e., an audio segment that needs to be completed. For example, the corrupted audio may contain several contiguous corrupted audio segments or several separated ones. The more corrupted audio segments the corrupted audio contains, the more severe the damage; the fewer it contains, the milder the damage.
In one possible implementation, spectrum-to-audio conversion may be performed on the part of the second spectral image that corresponds to the corrupted audio segment to obtain a completion audio segment, and the completion audio segment is then used to complete the corrupted audio. Alternatively, deep learning may be used: with the second spectral image as the learning target, the information of the damaged segment is predicted (inferred) from the audio segments of the corrupted audio other than the missing segment.
In one possible implementation, the operation of completing the corrupted audio according to the second spectral image to obtain the completed first audio is implemented by a WaveNet decoding network. For example, a convolutional layer of the WaveNet decoding network may perform spectrum-to-audio conversion on the part of the second spectral image that corresponds to the corrupted audio segment, and the resulting completion audio segment is used to complete the corrupted audio. As another example, the dilated causal convolution layers of the WaveNet decoding network may be used to predict the information of the corrupted audio segment in the corrupted audio, thereby completing the audio.
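The dilated causal convolutions mentioned above can be illustrated with a single-channel sketch: each output sample depends only on the current input and earlier inputs spaced `dilation` steps apart. Stacking dilations 1, 2, and 4 with kernel size 2, following the doubling pattern of the original WaveNet design, gives a receptive field of 8 samples. The function name and the plain-list representation are assumptions; a real WaveNet also uses gated activations, residual and skip connections, and many channels.

```python
def dilated_causal_conv1d(x, weights, dilation=1):
    """y[t] = sum_k weights[k] * x[t - k * dilation], treating x[i < 0] as 0.

    Causal: y[t] never reads x at an index greater than t.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


# Push an impulse through three stacked layers to expose the receptive field.
impulse = [1.0] + [0.0] * 9
out = impulse
for d in (1, 2, 4):
    out = dilated_causal_conv1d(out, [1.0, 1.0], dilation=d)
print(out)  # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
```

The impulse influences exactly 8 outputs, which is 1 + (2 - 1) * (1 + 2 + 4).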
In the embodiments of the disclosure, spectrum conversion is performed on the corrupted audio to be processed to obtain the first spectral image of the corrupted audio; spectral completion is performed on the first spectral image to obtain the completed second spectral image; and the corrupted audio is completed according to the second spectral image to obtain the completed first audio. This turns the audio completion problem into a spectral completion problem and thereby reduces excessive dependence on the audio information itself. The audio processing method can complete corrupted audio such as audio affected by noise interference, audio containing popping (crackling) segments, or audio with local distortion or partially erased segments, so that the completed audio delivers a good listening experience.
In one possible implementation, step S12 (performing spectral completion on the first spectral image to obtain the completed second spectral image) includes: performing feature extraction on the first spectral image to obtain a first spectral feature; and performing spectral reconstruction on the first spectral feature to obtain the second spectral image.
Spectral reconstruction can be understood as filling the missing regions of the first spectral image with the pixels surrounding them. It can also be understood as predicting (inferring) each pixel in a missing region from the pixels in the other regions of the first spectral image.
In one possible implementation, a convolutional neural network may be used to perform the feature extraction and spectral reconstruction on the first spectral image. The convolutional neural network may include at least one convolutional layer, which convolves the first spectral image to extract its features.
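The convolution a convolutional layer applies to the spectral image can be sketched as a "valid" 2-D cross-correlation, which is the operation most deep learning frameworks call convolution. The kernel values below are arbitrary examples and `conv2d_valid` is a hypothetical name; in a trained network the kernel would be learned, and the layer would usually add a bias, multiple channels, and a nonlinearity.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation of a single-channel image with one kernel.

    Output size is (H - kh + 1) x (W - kw + 1); no padding is applied.
    """
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(w - kw + 1)]
            for r in range(h - kh + 1)]


# A diagonal-difference kernel on a 3 x 3 ramp responds uniformly.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
print(conv2d_valid(image, kernel))  # [[-4, -4], [-4, -4]]
```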
In one possible implementation, feature extraction may be performed on the first spectral image using methods such as the short-time Fourier transform (STFT) and Mel filters, to obtain the first spectral feature.
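A Mel filterbank of the kind referred to here maps the linear-frequency bins of an STFT frame onto a smaller set of perceptually spaced bands. The sketch below builds triangular filters on the HTK Mel scale (mel = 2595 * log10(1 + f / 700)); the band count, FFT size, sample rate, and all function names are assumptions for illustration.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate, f_min=0.0, f_max=None):
    """Triangular filters mapping n_fft // 2 + 1 STFT bins to n_mels bands."""
    f_max = f_max if f_max is not None else sample_rate / 2.0
    bin_freqs = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    # n_mels + 2 edge frequencies, equally spaced on the Mel scale.
    edges = [mel_to_hz(lo + (hi - lo) * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bank = []
    for m in range(1, n_mels + 1):
        left, center, right = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_freqs:
            if left <= f <= center:
                row.append((f - left) / (center - left))    # rising slope
            elif center < f <= right:
                row.append((right - f) / (right - center))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def apply_filterbank(bank, frame):
    """Mel-band energies for one magnitude (or power) STFT frame."""
    return [sum(w * v for w, v in zip(row, frame)) for row in bank]


bank = mel_filterbank(n_mels=10, n_fft=256, sample_rate=8000)
energies = apply_filterbank(bank, [1.0] * 129)
print(len(bank), len(bank[0]), len(energies))  # 10 129 10
```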
In one possible implementation, when the corrupted audio contains too many corrupted audio segments, it may be split into several pieces of corrupted audio, each containing fewer corrupted audio segments. Each piece is then converted into its own first spectral image, and feature extraction and spectral reconstruction are performed on each first spectral image to obtain a second spectral image for each piece of corrupted audio.
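The splitting step can be sketched as follows. The disclosure does not say where to cut; cutting midway through the clean stretch between two damaged runs, so that each piece keeps some intact context on both sides, is an assumption made here. `damaged[t]` marks frame t as corrupted, and `split_by_gap_count` is a hypothetical name.

```python
def split_by_gap_count(damaged, max_gaps=1):
    """Split frames into [start, end) pieces holding at most max_gaps damaged runs.

    When a new damaged run would exceed the limit, the cut is placed halfway
    between the end of the previous run and the start of the new one.
    """
    chunks, start, gaps, last_end = [], 0, 0, 0
    for t, bad in enumerate(damaged):
        if bad and (t == 0 or not damaged[t - 1]):  # a damaged run begins at t
            if gaps == max_gaps:
                cut = (last_end + t) // 2  # cut inside the clean stretch
                chunks.append((start, cut))
                start, gaps = cut, 0
            gaps += 1
        if bad and (t + 1 == len(damaged) or not damaged[t + 1]):  # run ends at t
            last_end = t + 1
    chunks.append((start, len(damaged)))
    return chunks


F, T = False, True
print(split_by_gap_count([F, F, T, F, F, F, T, T, F, F]))  # [(0, 4), (4, 10)]
```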
In one possible implementation, step S12 (performing spectral completion on the first spectral image to obtain the completed second spectral image) includes: performing feature extraction on the first spectral image to obtain a second spectral feature; performing feature extraction on related information of the corrupted audio to obtain a supervision feature; aligning the second spectral feature with the supervision feature; and performing spectral reconstruction on the first spectral feature according to the aligned supervision feature to obtain the second spectral image, where the related information includes at least one of video information and optical flow information corresponding to the corrupted audio.
In this implementation, the corrupted audio may come from a video with audio, in which each video frame (or section) has a corresponding audio segment whose content matches it. This implementation can therefore complete the corrupted audio using video information that is naturally aligned with it.
As an example, the corrupted audio may come from a recorded video of a cello performance. When the performer's movements in a video frame are large, i.e., the relative distance between the strings and the body is large, the audio segment corresponding to that frame is louder; when the playing in the video is fast, the corresponding audio segment has a faster rhythm. Conversely, when the performer's movements in a video frame are small and the relative distance between the strings and the body is small, the audio corresponding to that frame is quieter; and when the playing in a section of the video is slow, the corresponding audio segment has a slower rhythm.
In one possible implementation, the video information and optical flow information corresponding to the corrupted audio are complete and free of noise interference.
The video information includes the video frames corresponding to the audio information. The optical flow information describes how the pixels in the video's image sequence change over time, the correlation between adjacent video frames, and the relative motion of objects between adjacent frames. In this implementation, both the video information and the optical flow information can serve as references for the corrupted audio, making the completed first audio more complete.
In one possible implementation, feature extraction may be performed on the related information of the corrupted audio using methods such as the short-time Fourier transform (STFT) and Mel filters, to obtain the supervision feature of the corrupted audio.
In one possible implementation, the supervision feature may be a feature extracted from the video information and/or optical flow information corresponding to the corrupted audio, such as edge features, texture features, or style features of the video information and/or optical flow information.
In one possible implementation, deep learning may be used to extract features from the corrupted audio and its related information. For example, convolutional layers of a convolutional neural network may convolve the corrupted audio and its related information (i.e., perform feature extraction) to obtain the second spectral feature and the supervision feature.
In one possible implementation, aligning the second spectral feature with the supervision feature makes the distance between the two as small as possible, so that both lie in the same feature space.
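One simple way to picture "reducing the distance" between the two features is to learn a mapping from the supervision feature into the spectral-feature space by minimizing their mean squared distance. This is a swapped-in illustration, not the disclosure's mechanism (which aligns through feature fusion such as concatenation); the element-wise affine map, plain gradient descent, learning rate, step count, and function names are all assumptions.

```python
def mse(a, b):
    """Mean squared distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def fit_projection(spec_feats, sup_feats, lr=0.1, steps=200):
    """Learn scale/bias per dimension so scale * sup + bias approaches spec (MSE)."""
    n, count = len(spec_feats[0]), len(spec_feats)
    scale, bias = [1.0] * n, [0.0] * n
    for _ in range(steps):
        g_s, g_b = [0.0] * n, [0.0] * n
        for s_vec, v_vec in zip(spec_feats, sup_feats):
            for d in range(n):
                err = scale[d] * v_vec[d] + bias[d] - s_vec[d]
                g_s[d] += 2 * err * v_vec[d] / count  # dLoss/dscale
                g_b[d] += 2 * err / count             # dLoss/dbias
        scale = [s - lr * g for s, g in zip(scale, g_s)]
        bias = [b - lr * g for b, g in zip(bias, g_b)]
    return scale, bias


sup = [[0.0], [1.0], [2.0]]
spec_feat = [[1.0], [3.0], [5.0]]  # constructed so that spec = 2 * sup + 1
scale, bias = fit_projection(spec_feat, sup)
print(round(scale[0], 3), round(bias[0], 3))  # 2.0 1.0
```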
In one possible implementation, the alignment may be achieved by fusing (e.g., concatenating) the second spectral feature and the supervision feature. In this implementation, the second spectral feature may have the same width as the supervision feature, with no restriction on whether their heights match, so that the two can be concatenated along the height axis. Alternatively, the second spectral feature may have the same height as the supervision feature, with no restriction on whether their widths match, so that the two can be concatenated along the width axis.
For example, if the second spectral feature has dimensions 1 × 4 × 1 and the supervision feature has dimensions 1 × 4 × 1, the two can be concatenated along the height axis during feature fusion to obtain a feature of dimensions (1+1) × 4 × 1, i.e., 2 × 4 × 1.
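The worked example above can be checked with nested lists stored in height × width × channel order. The helper names `shape` and `concat_height` are hypothetical; a framework implementation would normally call its own concatenation routine instead.

```python
def shape(t):
    """(height, width, channels) of a nested-list H x W x C tensor."""
    return (len(t), len(t[0]), len(t[0][0]))


def concat_height(a, b):
    """Stack two H x W x C tensors along the height axis (rows are not copied)."""
    if shape(a)[1:] != shape(b)[1:]:
        raise ValueError("width and channel sizes must match to stack along height")
    return list(a) + list(b)


spectral = [[[0.1], [0.2], [0.3], [0.4]]]     # second spectral feature, 1 x 4 x 1
supervision = [[[1.0], [2.0], [3.0], [4.0]]]  # supervision feature, 1 x 4 x 1
fused = concat_height(spectral, supervision)
print(shape(fused))  # (2, 4, 1)
```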
In one possible implementation, the spectral reconstruction of the first spectral feature may be performed by deep learning. During reconstruction, the natural alignment between the corrupted audio and its related information (e.g., video information and optical flow information) can serve as self-supervision: each pixel to be reconstructed is predicted (inferred) from the pixels in the regions of the first spectral image outside the missing regions, so that the second spectral image obtained by reconstruction can better complete the corrupted audio.
In this implementation, the related information of the corrupted audio (e.g., at least one of the video information and optical flow information corresponding to the corrupted audio) can be used as supervision information to guide the completion of the corrupted audio, which makes the completed first audio more complete and improves how it sounds.
In one possible implementation, the corrupted audio includes a corrupted audio segment, and step S13 (completing the corrupted audio according to the second spectral image to obtain the completed first audio) includes: performing spectrum-to-audio conversion on the part of the second spectral image that corresponds to the corrupted audio segment to obtain a completion audio segment; and completing the corrupted audio with the completion audio segment to obtain the completed first audio.
The corrupted audio may consist of corrupted audio segments and undamaged audio segments, and the second spectral image includes an image region corresponding to each segment of the corrupted audio (both corrupted and undamaged segments). The completion audio segment replaces the corrupted audio segment in the corrupted audio. Spectrum-to-audio conversion is the process of converting a spectral image into audio and can be understood as the inverse of the spectrum conversion in step S11.
In this implementation, the spectral image in the region of the second spectral image that corresponds to the corrupted audio segment can be converted into a completion audio segment, which then replaces the corrupted audio segment in the corrupted audio, completing the audio.
As an example, suppose the corrupted audio consists of 5 audio segments, of which segments 1 to 4 are undamaged and segment 5 is corrupted. In this implementation, the spectral image in the region of the second spectral image corresponding to segment 5 is converted into a completion audio segment, which replaces segment 5 of the corrupted audio, yielding a completed first audio consisting of undamaged segments 1 to 4 and the completion audio segment.
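At the sample level, the replacement in this example amounts to splicing the completion segment into the span occupied by segment 5. The four-sample segments, the zero-filled damaged segment, the stand-in completion values, and the name `splice_completion` are illustrative assumptions; producing the completion segment itself (spectrum-to-audio conversion of the second spectral image) is outside this sketch.

```python
def splice_completion(audio, start, end, completion):
    """Return a copy of audio with samples [start, end) replaced by completion."""
    if len(completion) != end - start:
        raise ValueError("completion length must match the damaged span")
    return audio[:start] + list(completion) + audio[end:]


seg_len = 4
# Segments 1-4 are intact (filled with 1.0 .. 4.0); segment 5 is lost (zeros).
segments = [[float(i)] * seg_len for i in range(1, 5)] + [[0.0] * seg_len]
corrupted = [s for seg in segments for s in seg]
completion = [5.0] * seg_len  # stand-in for the converted completion segment
first_audio = splice_completion(corrupted, 4 * seg_len, 5 * seg_len, completion)
print(first_audio[-6:])  # [4.0, 4.0, 5.0, 5.0, 5.0, 5.0]
```

In practice a short crossfade at each boundary is often used to avoid audible clicks, although the disclosure describes only the replacement itself.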
In one possible implementation, the corrupted audio includes a corrupted audio segment and an undamaged audio segment, and step S13 (completing the corrupted audio according to the second spectral image to obtain the completed first audio) includes: predicting the completion audio segment according to the undamaged audio segment and the part of the second spectral image that corresponds to the corrupted audio segment; and completing the corrupted audio with the completion audio segment to obtain the completed first audio.
In this implementation, the undamaged audio segments of the corrupted audio can be used when determining the completed audio segment that replaces the corrupted segment. Continuing the example above, the undamaged 1st to 4th segments can be used to predict (or infer) the content of the completed audio segment, while the spectral image corresponding to the corrupted segment in the second spectral image guides its generation, yielding a more accurate completed audio segment. That segment then replaces the 5th segment of the corrupted audio, giving a completed first audio composed of the undamaged 1st to 4th segments and the completed audio segment.
In one possible implementation, steps 12 and 13 can be performed by a neural network. The operations in step 12 of performing feature extraction on the first spectral image to obtain a first spectral feature, and performing spectrum reconstruction on the first spectral feature to obtain the second spectral image, can be implemented by a first generation network composed of a first encoding network and a first decoding network. The operation in step 13 of completing the corrupted audio according to the second spectral image to obtain the completed first audio can be implemented by a second decoding network.
Fig. 3 is a schematic structural diagram of the neural network used in an audio processing method according to an embodiment of the present disclosure. In one possible implementation, as shown in Fig. 3, spectrum conversion is performed on corrupted audio 201 to obtain the conversion result (first spectral image 202). The first encoding network E_a compresses the first spectral image 202 into a feature vector and sends this feature vector, which represents the first spectral image, to the first decoding network G_a. G_a performs spectrum completion based on the feature vector and sends the completed second spectral image 203 to the second decoding network 204. The second decoding network 204 then completes the corrupted audio according to the completed second spectral image 203 and the corrupted audio 201, finally producing the completed first audio 205.
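The Fig. 3 data flow can be summarized as a composition of four stages. In the sketch below each stage is a caller-supplied callable standing in for a trained network; the class and field names are hypothetical, and the lambdas at the end are placeholders that only demonstrate the wiring, not any learned behavior.

```python
from dataclasses import dataclass
from typing import Callable
import numpy as np

@dataclass
class InpaintingPipeline:
    """Fig. 3 data flow; each field is a stand-in for a trained network."""
    to_spectrum: Callable[[np.ndarray], np.ndarray]               # spectrum conversion: audio 201 -> image 202
    encoder_a: Callable[[np.ndarray], np.ndarray]                 # E_a: spectral image -> feature vector
    decoder_a: Callable[[np.ndarray], np.ndarray]                 # G_a: feature vector -> second spectral image 203
    wave_decoder: Callable[[np.ndarray, np.ndarray], np.ndarray]  # second decoding network 204

    def __call__(self, corrupted_audio: np.ndarray) -> np.ndarray:
        first_spec = self.to_spectrum(corrupted_audio)
        feature = self.encoder_a(first_spec)
        second_spec = self.decoder_a(feature)
        # The waveform decoder is conditioned on both the completed spectrum and the corrupted audio.
        return self.wave_decoder(second_spec, corrupted_audio)  # first audio 205

# Placeholder stages that only exercise the wiring (no learning involved).
pipeline = InpaintingPipeline(
    to_spectrum=lambda audio: audio.reshape(4, -1),
    encoder_a=lambda spec: spec.ravel(),
    decoder_a=lambda feat: feat.reshape(4, -1),
    wave_decoder=lambda spec, audio: spec.ravel(),
)
```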
In one possible implementation, steps 12 and 13 can be performed by a neural network. In step 12, the operations of performing feature extraction on the first spectral image to obtain a second spectral feature, performing feature extraction on the relevant information of the corrupted audio to obtain a supervision feature, aligning the second spectral feature with the supervision feature, and performing spectrum reconstruction on the first spectral feature according to the aligned supervision feature to obtain the second spectral image can be implemented by a second generation network composed of the first encoding network, the first decoding network, and a second encoding network. The operation in step 13 of completing the corrupted audio according to the second spectral image to obtain the completed first audio can be implemented by the second decoding network.
Fig. 4 is a schematic structural diagram of the neural network used in an audio processing method according to an embodiment of the present disclosure. In one possible implementation, as shown in Fig. 4, the audio processing method can also be applied where the corrupted audio is completed using self-supervision from information that is naturally aligned with it, such as the video information and optical flow information of the corrupted audio.
The detailed process is as follows. Spectrum conversion is first performed on corrupted audio 301 to obtain first spectral image 302, which is sent to the first encoding network E_a. E_a compresses the first spectral image 302 into a feature vector. At the same time, the video information and optical flow information of the corrupted audio (not shown) are sent to the second encoding network E_v to obtain the supervision feature f_v. The feature vector representing the first spectral image is fused with the supervision feature f_v, and the fusion result is sent to the first decoding network G_av. G_av performs spectrum completion based on the feature vector and the supervision feature and sends the completed second spectral image 304 to the second decoding network 305. The second decoding network 305 completes the corrupted audio according to the completed second spectral image 304 and the corrupted audio 301, finally producing the completed first audio 306.
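The text says only that the feature vector of the first spectral image and the supervision feature f_v are fused before being passed to G_av. A common choice, assumed here and not stated in the disclosure, is concatenation along the feature axis; fuse is a hypothetical helper name.

```python
import numpy as np

def fuse(f_audio: np.ndarray, f_v: np.ndarray) -> np.ndarray:
    """Fuse the spectral feature vector with the supervision feature f_v.

    Concatenation along the last (feature) axis is an assumed fusion operator.
    """
    if f_audio.shape[:-1] != f_v.shape[:-1]:
        raise ValueError("features must share leading (batch) dimensions")
    return np.concatenate([f_audio, f_v], axis=-1)
```

Other operators (element-wise addition after a projection, or attention) would fit the same description equally well; the disclosure does not say which one is used.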
In one possible implementation, the first generation network and the second decoding network can be trained separately: the first generation network by adversarial training, and the second decoding network with a mixed discrete loss function.
The network loss function used to train the first generation network is built from the following quantities: the network loss of the first encoding network; the network loss of the first encoding network together with the first decoding network; and the total loss, which is the sum of these two losses. Here, a denotes the corrupted audio input to each network, and β denotes the weight of the network loss of the first decoding network.
In this implementation, the first generation network and the second decoding network can be trained through the following process:
Step 401: obtain from a training set multiple training samples (corrupted audio) and the annotation information corresponding to each training sample (the undamaged audio and the spectral image of the undamaged audio);
Step 402: for each training sample, perform spectrum conversion on the training sample to obtain its first spectral image;
Step 403: input the first spectral image into the first generation network, which performs spectrum completion on the first spectral image of the training sample to obtain the completed second spectral image of the training sample;
Step 404: determine the network loss of the first encoding network and the network loss of the first encoding and first decoding networks from the spectral image of the undamaged audio in the annotation information and the second spectral image of the training sample, and adjust the network parameters of each network in the first generation network according to the sum of these two losses;
Step 405: input the training sample and its second spectral image into the second decoding network to obtain the completed audio of the training sample;
Step 406: determine the network loss of the second decoding network from the undamaged audio in the annotation information and the completed audio of the training sample, and adjust the network parameters of the second decoding network according to that loss.
This implementation does not limit the execution order of steps 405 and 406.
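Steps 401 to 406 can be laid out as a training loop. The sketch below is framework-agnostic: generator_update and decoder_update are hypothetical callbacks that would wrap the adversarial update of the first generation network and the mixed-discrete-loss update of the second decoding network, neither of which is specified here in enough detail to implement.

```python
from typing import Callable, Iterable, List, Tuple
import numpy as np

# (corrupted audio, undamaged audio, undamaged spectral image)
Sample = Tuple[np.ndarray, np.ndarray, np.ndarray]

def train_epoch(
    samples: Iterable[Sample],
    to_spectrum: Callable[[np.ndarray], np.ndarray],
    generator_update: Callable[[np.ndarray, np.ndarray], np.ndarray],
    decoder_update: Callable[[np.ndarray, np.ndarray, np.ndarray], float],
) -> List[float]:
    """One pass of steps 401-406 with hypothetical update callbacks.

    generator_update(first_spec, clean_spec): completes the spectrum, applies one
    update to the first generation network, and returns the second spectral image.
    decoder_update(second_spec, corrupted, clean_audio): applies one update to the
    second decoding network and returns its loss.
    """
    decoder_losses = []
    for corrupted, clean_audio, clean_spec in samples:              # step 401
        first_spec = to_spectrum(corrupted)                          # step 402
        second_spec = generator_update(first_spec, clean_spec)       # steps 403-404
        loss = decoder_update(second_spec, corrupted, clean_audio)   # steps 405-406
        decoder_losses.append(loss)
    return decoder_losses
```

Keeping the two updates in separate callbacks mirrors the statement that the two networks can be trained separately.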
In one possible implementation, the second generation network and the second decoding network can be trained separately: the second generation network by adversarial training, and the second decoding network with a mixed discrete loss function.
The loss function used to train the second generation network includes the following terms: the network loss of the first encoding network and the first decoding network, weighted by η2, where t denotes time and η2 can decay as t increases; the network loss of the first encoding network, the first decoding network, and the second encoding network; and L_Sync, the network loss of the first encoding network and the second encoding network. The total loss is the sum of these three network losses.
In this implementation, the second generation network and the second decoding network can be trained through the following process:
Step 411: obtain from a training set multiple training samples (the corrupted audio, the spectral image of the corrupted audio, and the video information and optical flow information of the corrupted audio) and the annotation information corresponding to each training sample (the undamaged audio and the spectral image of the undamaged audio);
Step 412: for each training sample, perform spectrum conversion on its corrupted audio to obtain the first spectral image of the training sample;
Step 413: input the first spectral image into the first encoding network to obtain the second spectral feature;
Step 414: input the video information and optical flow information of the corrupted audio into the second encoding network to obtain the supervision feature;
Step 415: align the second spectral feature with the supervision feature, and determine L_Sync from the aligned second spectral feature and supervision feature;
Step 416: input the aligned second spectral feature and supervision feature into the first decoding network, so that the second generation network performs spectrum completion on the first spectral image of the training sample to obtain the completed second spectral image of the training sample;
Step 417: determine the network loss of the first encoding and first decoding networks and the network loss of the first encoding, first decoding, and second encoding networks from the completed second spectral image of the training sample and from the undamaged audio and its spectral image in the annotation information, and adjust the network parameters of each network in the second generation network according to the total loss, that is, the sum of L_Sync and these two losses;
Step 418: input the training sample and its second spectral image into the second decoding network to obtain the completed audio of the training sample;
Step 419: calculate the network loss of the second decoding network from the undamaged audio in the annotation information and the completed audio of the training sample, and adjust the network parameters of the second decoding network according to that loss.
This implementation does not limit the execution order of steps 418 and 419.
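Step 415 aligns the second spectral feature with the supervision feature and derives L_Sync from them, without saying how. Assuming both are per-time-step feature sequences with the same feature dimension, the sketch below aligns them by linearly resampling the visual sequence to the audio frame count (video and spectrogram frame rates usually differ) and takes L_Sync as the mean cosine distance. Both the alignment method and the loss form are assumptions for illustration, and the function names are hypothetical.

```python
import numpy as np

def align_by_resampling(features: np.ndarray, target_len: int) -> np.ndarray:
    """Linearly resample a (time, dim) feature sequence to target_len time steps."""
    src = np.linspace(0.0, 1.0, features.shape[0])
    dst = np.linspace(0.0, 1.0, target_len)
    return np.stack([np.interp(dst, src, features[:, d]) for d in range(features.shape[1])], axis=1)

def sync_loss(audio_feat: np.ndarray, visual_feat: np.ndarray, eps: float = 1e-8) -> float:
    """Assumed form of L_Sync: mean (1 - cosine similarity) over aligned time steps."""
    visual = align_by_resampling(visual_feat, audio_feat.shape[0])
    a = audio_feat / (np.linalg.norm(audio_feat, axis=1, keepdims=True) + eps)
    v = visual / (np.linalg.norm(visual, axis=1, keepdims=True) + eps)
    return float(np.mean(1.0 - np.sum(a * v, axis=1)))
```

The loss is near 0 for matching directions and near 2 for opposite ones, so minimizing it pulls the aligned audio and visual features together.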
In one possible implementation, the audio processing method can be applied in an audio/video repair tool: a piece of audio containing corrupted audio is input into the tool, and the tool outputs intact, completed audio.
It is understood that the method embodiments mentioned in this disclosure can be combined with one another to form combined embodiments without violating principle or logic; due to space limitations, these are not described again in this disclosure.
In addition, the present disclosure also provides an audio processing apparatus, an electronic device, a computer-readable storage medium, and a program, all of which can be used to implement any audio processing method provided by the present disclosure. For the corresponding technical solutions and descriptions, refer to the corresponding records in the method section; they are not repeated here.
Those skilled in the art will understand that, in the above method of the specific embodiments, the order in which the steps are written does not imply a strict execution order or impose any limitation on the implementation; the specific execution order of each step should be determined by its function and possible internal logic.
Fig. 5 is a block diagram of an audio processing apparatus according to an embodiment of the present disclosure. As shown in Fig. 5, the audio processing apparatus includes a spectrum conversion module 501, a spectrum completion module 502, and an audio completion module 503.
The spectrum conversion module 501 is configured to perform spectrum conversion on the corrupted audio to be processed to obtain a first spectral image of the corrupted audio;
the spectrum completion module 502 is configured to perform spectrum completion on the first spectral image to obtain a completed second spectral image;
the audio completion module 503 is configured to complete the corrupted audio according to the second spectral image to obtain the completed first audio.
In one possible implementation, the spectrum completion module includes: a first feature extraction submodule, configured to perform feature extraction on the first spectral image to obtain a first spectral feature; and a first spectrum reconstruction submodule, configured to perform spectrum reconstruction on the first spectral feature to obtain the second spectral image.
In one possible implementation, the spectrum completion module includes: a second feature extraction submodule, configured to perform feature extraction on the first spectral image to obtain a second spectral feature; a third feature extraction submodule, configured to perform feature extraction on the relevant information of the corrupted audio to obtain a supervision feature; an alignment submodule, configured to align the second spectral feature with the supervision feature; and a second spectrum reconstruction submodule, configured to perform spectrum reconstruction on the first spectral feature according to the aligned supervision feature to obtain the second spectral image, wherein the relevant information includes at least one of video information and optical flow information corresponding to the corrupted audio.
In one possible implementation, the corrupted audio includes a corrupted audio segment, and the audio completion module includes: a first spectrum-to-audio conversion submodule, configured to perform spectrum-to-audio conversion on the spectral image corresponding to the corrupted audio segment in the second spectral image to obtain a completed audio segment; and a first audio completion submodule, configured to complete the corrupted audio with the completed audio segment to obtain the completed first audio.
In one possible implementation, the corrupted audio includes a corrupted audio segment and undamaged audio segments, and the audio completion module includes: a prediction submodule, configured to predict the completed audio segment according to the undamaged audio segments and the spectral image corresponding to the corrupted audio segment in the second spectral image; and a second audio completion submodule, configured to complete the corrupted audio with the completed audio segment to obtain the completed first audio.
In one possible implementation, the audio completion module is realized by WaveNet decoding network.
In one possible implementation, the first spectral image and the second spectral image include a mel spectrogram image or a mel-cepstrum image.
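As an illustration of the two spectral-image types, the NumPy sketch below computes a log-mel spectrogram (HTK mel scale, triangular filters) and derives mel-frequency cepstral coefficients with an orthonormal DCT-II over the mel axis. The sample rate, FFT size, hop, and number of mel bands are illustrative defaults, not values from the disclosure.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)  # HTK mel scale

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)

def mel_filterbank(sr: int, n_fft: int, n_mels: int) -> np.ndarray:
    """Triangular filters with peak 1, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        lo, center, hi = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(x: np.ndarray, sr: int = 16000, n_fft: int = 512,
                        hop: int = 128, n_mels: int = 80) -> np.ndarray:
    """Log-mel image of shape (frames, n_mels)."""
    window = np.hanning(n_fft + 1)[:-1]
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2        # (frames, bins)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T        # (frames, n_mels)
    return np.log(mel + 1e-10)                               # avoid log(0)

def mfcc(log_mel: np.ndarray, n_mfcc: int = 13) -> np.ndarray:
    """Orthonormal DCT-II over the mel axis, keeping the first n_mfcc coefficients."""
    n_mels = log_mel.shape[1]
    k = np.arange(n_mfcc)[:, None]
    n = np.arange(n_mels)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels)) * np.sqrt(2.0 / n_mels)
    basis[0] /= np.sqrt(2.0)
    return log_mel @ basis.T
```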
In some embodiments, the functions of, or the modules included in, the apparatus provided by the embodiments of the present disclosure can be used to perform the methods described in the method embodiments above. For specific implementations, refer to the descriptions of the method embodiments above; for brevity, they are not repeated here.
An embodiment of the present disclosure also provides a computer-readable storage medium storing computer program instructions that, when executed by a processor, implement the audio processing method described above. The computer-readable storage medium can be a non-volatile computer-readable storage medium.
An embodiment of the present disclosure also provides an electronic device, comprising: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to perform the audio processing method described above.
Fig. 6 is a block diagram of an electronic device 800 according to an exemplary embodiment. For example, the electronic device 800 can be a terminal such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, or a personal digital assistant.
Referring to Fig. 6, the electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power supply component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 typically controls the overall operation of the electronic device 800, such as operations associated with display, phone calls, data communication, camera operation, and recording. The processing component 802 may include one or more processors 820 to execute instructions so as to perform all or some of the steps of the methods described above. In addition, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation of the electronic device 800. Examples of such data include instructions for any application or method operated on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so on. The memory 804 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
The power supply component 806 supplies power to the various components of the electronic device 800. The power supply component 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, it can be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors can sense not only the boundary of a touch or swipe action but also the duration and pressure associated with it. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front and rear camera can be a fixed optical lens system or have focusing and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC), which is configured to receive external audio signals when the electronic device 800 is in an operating mode such as a call mode, a recording mode, or a voice recognition mode. The received audio signals can be further stored in the memory 804 or sent via the communication component 816. In some embodiments, the audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, such as a keyboard, a click wheel, or buttons. The buttons may include, but are not limited to, a home button, a volume button, a start button, and a lock button.
The sensor component 814 includes one or more sensors for providing status assessments of various aspects of the electronic device 800. For example, the sensor component 814 can detect the on/off state of the electronic device 800 and the relative positioning of components, such as the display and keypad of the electronic device 800. The sensor component 814 can also detect a change in position of the electronic device 800 or of one of its components, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and temperature changes of the electronic device 800. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 can access a wireless network based on a communication standard, such as WiFi, 2G, 3G, or a combination thereof. In one exemplary embodiment, the communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module can be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 can be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the above method.
In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, such as the memory 804 including computer program instructions, which can be executed by the processor 820 of the electronic device 800 to perform the above method.
Fig. 7 is a block diagram of an electronic device 1900 according to an exemplary embodiment. For example, the electronic device 1900 can be provided as a server. Referring to Fig. 7, the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources represented by a memory 1932 for storing instructions executable by the processing component 1922, such as application programs. The application programs stored in the memory 1932 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 1922 is configured to execute instructions to perform the above method.
The electronic device 1900 may also include a power supply component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 can operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, such as the memory 1932 including computer program instructions, which can be executed by the processing component 1922 of the electronic device 1900 to perform the above method.
The present disclosure can be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement various aspects of the present disclosure.
A computer-readable storage medium can be a tangible device that can hold and store instructions for use by an instruction execution device. The computer-readable storage medium can be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (for example, light pulses passing through a fiber-optic cable), or electrical signals transmitted through wires.
The computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to respective computing/processing devices, or to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within the respective computing/processing device.
Computer program instructions for carrying out operations of the present disclosure can be assembler instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions can execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In scenarios involving a remote computer, the remote computer can be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as programmable logic circuitry, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), is personalized using state information of the computer-readable program instructions, and this electronic circuitry can execute the computer-readable program instructions to implement various aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowcharts and/or block diagrams of methods, apparatuses (systems), and computer program products according to embodiments of the present disclosure. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions can also be stored in a computer-readable storage medium and can direct a computer, a programmable data processing apparatus, and/or other devices to operate in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions that implement aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions can also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other devices to produce a computer-implemented process, such that the instructions executed on the computer, other programmable apparatus, or other devices implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to multiple embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram can represent a module, a program segment, or a portion of instructions, which includes one or more executable instructions for implementing the specified logical functions. In some alternative implementations, the functions noted in the blocks can occur out of the order noted in the drawings. For example, two consecutive blocks can in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
The embodiments of the present disclosure have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their technical improvement over technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
Claims (10)
1. An audio processing method, comprising:
performing spectral conversion on a corrupted audio to be processed to obtain a first spectral image of the corrupted audio;
performing spectral completion on the first spectral image to obtain a completed second spectral image; and
completing the corrupted audio according to the second spectral image to obtain a completed first audio.
2. The method according to claim 1, wherein performing spectral completion on the first spectral image to obtain the completed second spectral image comprises:
performing feature extraction on the first spectral image to obtain a first spectral feature; and
performing spectral reconstruction on the first spectral feature to obtain the second spectral image.
3. The method according to claim 1, wherein performing spectral completion on the first spectral image to obtain the completed second spectral image comprises:
performing feature extraction on the first spectral image to obtain a second spectral feature;
performing feature extraction on related information of the corrupted audio to obtain a supervision feature;
aligning the second spectral feature with the supervision feature; and
performing spectral reconstruction on the first spectral feature according to the aligned supervision feature to obtain the second spectral image,
wherein the related information includes at least one of video information and optical flow information corresponding to the corrupted audio.
4. The method according to any one of claims 1 to 3, wherein the corrupted audio includes a corrupted audio segment, and
completing the corrupted audio according to the second spectral image to obtain the completed first audio comprises:
performing spectrum-to-audio conversion on the part of the second spectral image corresponding to the corrupted audio segment to obtain a completion audio segment; and
completing the corrupted audio using the completion audio segment to obtain the completed first audio.
5. The method according to any one of claims 1 to 3, wherein the corrupted audio includes a corrupted audio segment and an undamaged audio segment, and
completing the corrupted audio according to the second spectral image to obtain the completed first audio comprises:
predicting a completion audio segment according to the part of the second spectral image corresponding to the corrupted audio segment and according to the undamaged audio segment; and
completing the corrupted audio using the completion audio segment to obtain the completed first audio.
6. The method according to any one of claims 1 to 5, wherein:
the operation of completing the corrupted audio according to the second spectral image to obtain the completed first audio is implemented by a WaveNet decoding network.
7. The method according to any one of claims 1 to 6, wherein:
the first spectral image and the second spectral image include a mel spectrogram image or a mel-cepstrum image.
8. An audio processing apparatus, comprising:
a spectral conversion module, configured to perform spectral conversion on a corrupted audio to be processed to obtain a first spectral image of the corrupted audio;
a spectral completion module, configured to perform spectral completion on the first spectral image to obtain a completed second spectral image; and
an audio completion module, configured to complete the corrupted audio according to the second spectral image to obtain a completed first audio.
9. An electronic device, comprising:
a processor; and
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the method according to any one of claims 1 to 7.
10. A computer-readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the method according to any one of claims 1 to 7.
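Claims 1 and 4 describe a three-step pipeline: convert the corrupted audio into a first spectral image, complete that image, then turn the completed part back into audio and splice it into the damaged region. The following is a minimal Python/NumPy sketch of that data flow under loudly stated assumptions. It is not the patented implementation: a plain complex STFT stands in for the mel spectral image of claim 7, and the hypothetical helper `complete_spectrum` (per-bin linear interpolation of magnitude plus phase-vocoder-style phase extrapolation) stands in for the learned spectral-completion network and the WaveNet decoding network of claims 2, 3 and 6. All function names and the parameters `n_fft=256` and `hop=64` are illustrative choices. The sketch handles a single contiguous gap that is preceded by at least two intact frames.

```python
import numpy as np


def stft(x, n_fft=256, hop=64):
    """Complex STFT (frames x bins) with a periodic Hann window."""
    window = np.hanning(n_fft + 1)[:-1]
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, n_fft=256, hop=64, length=None):
    """Weighted overlap-add inverse of stft(); samples no frame covers come back as 0."""
    window = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    out_len = max((len(frames) - 1) * hop + n_fft, length or 0)
    out, norm = np.zeros(out_len), np.zeros(out_len)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += window ** 2
    out /= np.maximum(norm, 1e-8)
    return out if length is None else out[:length]


def complete_spectrum(spec, missing):
    """Hypothetical stand-in for the learned spectral-completion step.

    Magnitude: linear interpolation across the missing frames, per frequency bin.
    Phase: extrapolated from the last two intact frames before the gap.
    Assumes one contiguous gap with at least two intact frames before it.
    """
    mag, phase = np.abs(spec), np.angle(spec)
    known_idx, missing_idx = np.flatnonzero(~missing), np.flatnonzero(missing)
    for b in range(spec.shape[1]):
        mag[missing_idx, b] = np.interp(missing_idx, known_idx, mag[known_idx, b])
    m0 = missing_idx[0] - 1               # last intact frame before the gap
    step = phase[m0] - phase[m0 - 1]      # per-bin phase advance per hop
    for m in missing_idx:
        phase[m] = phase[m0] + (m - m0) * step
    return mag * np.exp(1j * phase)


def complete_audio(corrupted, start, end, n_fft=256, hop=64):
    """Corrupted audio -> first spectrum -> completed second spectrum -> audio.

    Only the damaged samples [start, end) are replaced (cf. claim 4).
    """
    spec = stft(corrupted, n_fft, hop)                      # "first spectral image"
    frame_starts = np.arange(spec.shape[0]) * hop
    missing = (frame_starts < end) & (frame_starts + n_fft > start)
    completed = complete_spectrum(spec, missing)            # "second spectral image"
    recon = istft(completed, n_fft, hop, length=len(corrupted))
    restored = corrupted.copy()
    restored[start:end] = recon[start:end]                  # "completion audio segment"
    return restored


# Usage: a 440 Hz tone sampled at 8 kHz with a simulated 100-sample dropout.
sr = 8000
clean = np.sin(2 * np.pi * 440 * np.arange(4000) / sr)
corrupted = clean.copy()
corrupted[2000:2100] = 0.0
restored = complete_audio(corrupted, 2000, 2100)
err_before = np.linalg.norm(corrupted[2000:2100] - clean[2000:2100])
err_after = np.linalg.norm(restored[2000:2100] - clean[2000:2100])
```

On a steady tone the interpolated magnitudes and extrapolated phases closely match the intact frames, so the dropout is largely recovered. For speech or music whose content changes inside the gap, such hand-written rules break down, which is where a trained completion model (optionally guided by the video or optical-flow features of claim 3) would be needed.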
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910086763.5A CN109887515B (en) | 2019-01-29 | 2019-01-29 | Audio processing method and device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109887515A (en) | 2019-06-14 |
CN109887515B (en) | 2021-07-09 |
Family ID: 66927191
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910086763.5A Active CN109887515B (en) | 2019-01-29 | 2019-01-29 | Audio processing method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109887515B (en) |
Citations (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0795121A (en) * | 1993-09-20 | 1995-04-07 | Takayama:Kk | Method and device for compressing voice signal |
JP2003157097A (en) * | 2001-11-20 | 2003-05-30 | Hitachi Ltd | Coded voice decoder |
JP3603381B2 (en) * | 1995-04-07 | 2004-12-22 | ソニー株式会社 | Compressed data editing device and compressed data editing method |
CN1684371A (en) * | 2004-02-27 | 2005-10-19 | 三星电子株式会社 | Lossless audio decoding/encoding method and apparatus |
US7019749B2 (en) * | 2001-12-28 | 2006-03-28 | Microsoft Corporation | Conversational interface agent |
US7035700B2 (en) * | 2002-03-13 | 2006-04-25 | The United States Of America As Represented By The Secretary Of The Air Force | Method and apparatus for embedding data in audio signals |
EP1688916A2 (en) * | 2005-02-05 | 2006-08-09 | Samsung Electronics Co., Ltd. | Method and apparatus for recovering line spectrum pair parameter and speech decoding apparatus using same |
CN101432610A (en) * | 2006-05-05 | 2009-05-13 | 汤姆森许可贸易公司 | Method and apparatus for lossless encoding of a source signal using a lossy encoded data stream and a lossless extension data stream |
CN101437009A (en) * | 2007-11-15 | 2009-05-20 | 华为技术有限公司 | Method and system for packet loss concealment |
CN101474104A (en) * | 2009-01-14 | 2009-07-08 | 西安交通大学 | Self-adjusting pharyngeal cavity electronic larynx voice communication system and method |
CN102324236A (en) * | 2006-07-31 | 2012-01-18 | 高通股份有限公司 | Systems, methods, and apparatus for wideband encoding and decoding of active frames |
CN102522082A (en) * | 2011-12-27 | 2012-06-27 | 重庆大学 | Recognizing and locating method for abnormal sound in public places |
CN103377655A (en) * | 2012-04-16 | 2013-10-30 | 三星电子株式会社 | Apparatus and method with enhancement of sound quality |
CN103984315A (en) * | 2014-05-15 | 2014-08-13 | 成都百威讯科技有限责任公司 | Domestic multifunctional intelligent robot |
US20140229831A1 (en) * | 2012-12-12 | 2014-08-14 | Smule, Inc. | Audiovisual capture and sharing framework with coordinated user-selectable audio and video effects filters |
CN104011735A (en) * | 2011-12-26 | 2014-08-27 | 英特尔公司 | Vehicle Based Determination Of Occupant Audio And Visual Input |
WO2016024853A1 (en) * | 2014-08-15 | 2016-02-18 | 삼성전자 주식회사 | Sound quality improving method and device, sound decoding method and device, and multimedia device employing same |
CN105843474A (en) * | 2016-03-23 | 2016-08-10 | 努比亚技术有限公司 | Volume adjustment system and method |
CN107039042A (en) * | 2016-12-09 | 2017-08-11 | 电子科技大学 | Audio restoration method and system based on a low-coherence dictionary and sparse representation |
CN107077849A (en) * | 2014-11-07 | 2017-08-18 | 三星电子株式会社 | Method and apparatus for recovering audio signal |
CN107564533A (en) * | 2017-07-12 | 2018-01-09 | 同济大学 | Speech frame restoration method and device based on source prior information |
CN107749302A (en) * | 2017-10-27 | 2018-03-02 | 广州酷狗计算机科技有限公司 | Audio processing method, device, storage medium and terminal |
CN108831490A (en) * | 2013-02-05 | 2018-11-16 | 瑞典爱立信有限公司 | Method and apparatus for controlling audio frame loss concealment |
Non-Patent Citations (1)
Title |
---|
Daniel M. Lofaro et al.: "Musical tempo and beat tracking techniques in the absence of auditory cues", 2010 10th IEEE-RAS International Conference on Humanoid Robots * |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110379414A (en) * | 2019-07-22 | 2019-10-25 | 出门问问(苏州)信息科技有限公司 | Acoustic model enhancement training method and device, readable storage medium and computing equipment |
CN110379414B (en) * | 2019-07-22 | 2021-12-03 | 出门问问(苏州)信息科技有限公司 | Acoustic model enhancement training method and device, readable storage medium and computing equipment |
CN110378860A (en) * | 2019-07-30 | 2019-10-25 | 腾讯科技(深圳)有限公司 | Method, device, computer equipment and storage medium for repairing video |
CN110378860B (en) * | 2019-07-30 | 2023-08-18 | 腾讯科技(深圳)有限公司 | Method, device, computer equipment and storage medium for repairing video |
CN110781223A (en) * | 2019-10-16 | 2020-02-11 | 深圳市商汤科技有限公司 | Data processing method and device, processor, electronic equipment and storage medium |
CN111145778B (en) * | 2019-11-28 | 2023-04-04 | 科大讯飞股份有限公司 | Audio data processing method and device, electronic equipment and computer storage medium |
CN111145778A (en) * | 2019-11-28 | 2020-05-12 | 科大讯飞股份有限公司 | Audio data processing method and device, electronic equipment and computer storage medium |
CN111556254A (en) * | 2020-04-10 | 2020-08-18 | 早安科技(广州)有限公司 | Method, system, medium and intelligent device for video cutting by using video content |
CN111556254B (en) * | 2020-04-10 | 2021-04-02 | 早安科技(广州)有限公司 | Method, system, medium and intelligent device for video cutting by using video content |
CN111798866A (en) * | 2020-07-13 | 2020-10-20 | 商汤集团有限公司 | Method and device for training audio processing network and reconstructing stereo |
CN112071331A (en) * | 2020-09-18 | 2020-12-11 | 平安科技(深圳)有限公司 | Voice file repairing method and device, computer equipment and storage medium |
CN112071331B (en) * | 2020-09-18 | 2023-05-30 | 平安科技(深圳)有限公司 | Voice file restoration method and device, computer equipment and storage medium |
CN114997047A (en) * | 2022-05-26 | 2022-09-02 | 电子科技大学 | Electromagnetic spectrum information completion method based on cyclic generation countermeasure network |
CN114997047B (en) * | 2022-05-26 | 2024-05-14 | 电子科技大学 | Electromagnetic spectrum information complement method based on cyclic generation countermeasure network |
Also Published As
Publication number | Publication date |
---|---|
CN109887515B (en) | 2021-07-09 |
Similar Documents
Publication | Title |
---|---|
CN109887515A (en) | Audio processing method and device, electronic equipment and storage medium |
CN109801644B (en) | Separation method, separation device, electronic equipment and readable medium for mixed sound signal |
CN107705783A (en) | Speech synthesis method and device |
TW202022683A (en) | Method, device, storage medium, and computer equipment of processing image |
CN109618184A (en) | Video processing method and device, electronic equipment and storage medium |
CN108985176A (en) | Image generation method and device |
CN109614613A (en) | Method and device for localizing descriptive statements in images, electronic equipment and storage medium |
CN108259991A (en) | Video processing method and device |
CN111883107B (en) | Speech synthesis and feature extraction model training method, device, medium and equipment |
CN110458102A (en) | Facial image recognition method and device, electronic equipment and storage medium |
CN109005352A (en) | It is in step with the method and device of video |
CN110121083A (en) | Bullet-comment generation method and device |
CN108924644A (en) | Video clip extraction method and device |
CN104285452A (en) | Spatial audio signal filtering |
CN110519655A (en) | Video clipping method and device |
CN110322532A (en) | Dynamic image generation method and device |
CN109146789A (en) | Picture stitching method and device |
CN110209877A (en) | Video analysis method and device |
CN106791535A (en) | Video recording method and device |
CN110121106A (en) | Video playback method and device |
CN109543537A (en) | Re-identification model incremental training method and device, electronic equipment and storage medium |
CN109934240A (en) | Feature update method and device, electronic equipment and storage medium |
CN107147936A (en) | Bullet-comment display control method and device |
CN109359218A (en) | Multimedia resource display method and device |
CN110232909A (en) | Audio processing method, device, equipment and readable storage medium |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |