WO2020228107A1 - Audio repair method, device, and readable storage medium (一种音频修复方法、设备及可读存储介质) - Google Patents

Audio repair method, device, and readable storage medium (一种音频修复方法、设备及可读存储介质)

Info

Publication number
WO2020228107A1
WO2020228107A1 (PCT application PCT/CN2019/093719; priority application CN2019093719W)
Authority
WO
WIPO (PCT)
Prior art keywords
audio
target frame
interval
noise
processing unit
Prior art date
Application number
PCT/CN2019/093719
Other languages
English (en)
French (fr)
Inventor
Xu Dong (徐东)
Original Assignee
Tencent Music Entertainment Technology (Shenzhen) Co., Ltd. (腾讯音乐娱乐科技(深圳)有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Music Entertainment Technology (Shenzhen) Co., Ltd. (腾讯音乐娱乐科技(深圳)有限公司)
Priority to US17/627,103 priority Critical patent/US11990150B2/en
Publication of WO2020228107A1 publication Critical patent/WO2020228107A1/zh

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 - Noise filtering
    • G10L21/0216 - Noise filtering characterised by the method used for estimating noise
    • G10L21/0224 - Processing in the time domain
    • G10L21/0232 - Processing in the frequency domain
    • G10L21/0264 - Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L2021/02087 - Noise filtering the noise being separate speech, e.g. cocktail party
    • G10L2021/02161 - Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02163 - Only one microphone

Definitions

  • This application relates to the field of signal processing, and in particular to an audio repair method, device and readable storage medium.
  • according to the audio characteristics of the multiple audio frames in the buffer module, detecting noise points that appear as short-term high-energy pulses in the target frame;
  • the repair unit is used to repair the target frame, where repairing removes noise points in the target frame.
  • an embodiment of the present application provides a computer-readable storage medium, wherein the computer storage medium stores a computer program, the computer program includes program instructions, and the program instructions, when executed by a processor, perform the method described in the first aspect.
  • FIG. 1 is a schematic diagram of an application scenario of an audio repair method provided by an embodiment of the present application
  • FIG. 2 is a schematic flowchart of an audio repair method provided by an embodiment of the present application
  • FIG. 3 is a schematic flowchart of an audio repair method provided by another embodiment of the present application.
  • FIG. 4 is a schematic diagram of inputting multiple audio frames into a buffer module according to an embodiment of the present application
  • FIG. 5 is a schematic diagram of a cache relocation and repair provided by an embodiment of the present application.
  • This application is mainly applied to audio repair devices, which may be traditional audio repair devices or the audio repair devices described in the third and fourth embodiments of this application, which are not limited in this application.
  • when the audio repair device sends data, it records and transmits the characteristics of the data in a preset format, where the characteristics of the data include time, location, type, etc.
  • this application proposes a method for detecting and repairing noise points in audio signals.
  • an audio repair device (such as the mobile phone in the figure) obtains audio signals through microphone recording, or receives audio signals from the Internet, and then detects and repairs noise points in the audio signal that appear as short-term high-energy pulses.
  • the dashed circle shows the noise points in the unprocessed audio signal, which appear as short-term high-energy pulses.
  • the noise points circled by the dotted line are well repaired.
  • the audio repair method can be roughly divided into five stages: signal input, buffer relocation, noise point detection, noise point repair, and signal output. This application will introduce the five stages in turn.
  • the cache module is shown in Figure 4.
  • the cache module is composed of five processing units connected in sequence.
  • the first processing unit is the head processing unit, and the processing unit at the center of the five processing units is the central processing unit.
  • each processing unit can contain two audio frames, the audio frames are input by the head processing unit of the buffer module, and are transmitted to other processing units according to the order in which the processing units are connected.
  • the buffer module can contain any odd number (three or more) of processing units, and the length of a processing unit can be set to any length value; generally, it is set to the length of at least two audio frames. For example, when the length of the processing unit equals two audio frames, adjacent audio frames overlap by 50% during processing, which avoids the truncation effect and makes the signal processing result smoother.
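  As a sketch of the overlapping framing described above (the frame and hop sizes are illustrative; the patent fixes neither a sample rate nor a frame length in samples):

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames; with hop = frame_len // 2,
    adjacent frames share 50% of their samples, softening the truncation effect."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n)])

# 20 ms frames at an assumed 16 kHz sample rate -> 320 samples, 160-sample hop
sr = 16000
frame_len = int(0.020 * sr)
frames = frame_signal(np.arange(sr, dtype=float), frame_len, frame_len // 2)
```

  Each frame's second half equals the next frame's first half, which is the 50% overlap the text relies on.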
  • the multiple audio frames in the buffer module are buffer-relocated; that is, the audio signal segment to be detected is re-acquired with the point most likely to be a noise point as its center.
  • the audio frame in the central processing unit is taken as the target frame, the peak point of the target frame (the point where the absolute amplitude value is at its maximum) is determined, an audio signal segment with a length of four processing units centered on the peak point is obtained in the buffer module, and finally the audio signal segment is re-divided into multiple intervals.
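  The buffer-relocation step can be sketched as follows; the sample indices and segment length are hypothetical, since the patent describes lengths in processing units rather than samples:

```python
import numpy as np

def relocate(buffer, target_lo, target_hi, seg_len):
    """Re-centre the detection window on the target frame's peak.

    buffer        : concatenated samples of all processing units
    target_lo/hi  : sample range of the target frame inside the buffer
    seg_len       : desired segment length (e.g. four processing units)
    """
    target = buffer[target_lo:target_hi]
    # peak point: largest absolute amplitude inside the target frame
    peak = target_lo + int(np.argmax(np.abs(target)))
    lo = max(0, peak - seg_len // 2)
    hi = min(len(buffer), lo + seg_len)
    return peak, buffer[lo:hi]
```

  The returned segment is then re-divided into the processing intervals described below.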
  • Repairing the target frame mainly includes three steps.
  • the first step removes the noise points: linear prediction or superposition of adjacent sampling points is used to estimate the target frame's normal value before noise interference (that is, the normal amplitude value), and the amplitude value of the noise point is then replaced with this normal value.
  • the second step applies time-domain smoothing so that the target frame with the replaced amplitude value is smooth in the time domain.
  • the third step applies frequency-domain filtering so that the target frame with the replaced amplitude value is smooth in the frequency domain.
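  A minimal sketch of the first two repair steps; the neighbour-mean stands in for the adjacent-sampling-point estimate, and the window sizes are illustrative choices, not values from the patent:

```python
import numpy as np

def repair_frame(frame, noise_idx, k=2):
    """Step 1: replace the noise sample with the mean of its k neighbours on
    each side (a simple stand-in for estimating the pre-noise 'normal value').
    Step 2: time-domain smoothing - a 3-point moving average around the
    replaced sample so the frame stays continuous."""
    out = frame.astype(float).copy()
    neigh = np.r_[out[max(0, noise_idx - k):noise_idx],
                  out[noise_idx + 1:noise_idx + 1 + k]]
    out[noise_idx] = neigh.mean()                        # step 1: peak replacement
    for i in range(max(1, noise_idx - 1), min(len(out) - 1, noise_idx + 2)):
        out[i] = out[i - 1:i + 2].mean()                 # step 2: local smoothing
    return out
```

  Step 3 (frequency-domain smoothing) is sketched separately further below.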
  • the repaired target frame is output in a preset format, and the preset format is any one of audio format wav, audio format mp3, and audio format flac.
  • the biggest advantage of this application is that it performs efficient, complete, and accurate automatic detection and repair of noise points in the audio signal, which meets the need to rapidly process massive amounts of audio and saves substantial labor and time costs, giving it high economic value and technical advantages.
  • the content shown in FIG. 1, FIG. 4, and FIG. 5 is an example and does not limit the embodiments of this application: the number of processing units included in the buffer module, the length of the processing unit, the length of the audio signal segment obtained in buffer relocation, the lengths of the divided intervals, the source of the audio signal, and the audio repair device are all unrestricted.
  • for example, the buffer module can also include 7 processing units.
  • the length of the audio signal segment can be the length of 4 processing units or the length of 6 processing units.
  • when the audio signal segment is divided into multiple intervals, the length of the first sub-interval can be 1/4 or 1/2 of a processing unit.
  • the audio signal can also be received through the Internet and other methods.
  • the audio repair device that processes the audio signal can be any terminal device such as a computer, a server, etc., in addition to a mobile phone.
  • the audio repair method may include:
  • the buffer module is composed of a plurality of processing units in sequence, and the processing unit at the center of the multiple processing units is the central processing unit.
  • the length of the processing unit in the buffer module can be set to any length value; generally, it is set to the length of at least two audio frames. For example, when the length of the processing unit equals two audio frames, adjacent audio frames overlap by 50% during processing, which avoids the truncation effect and makes the signal processing result smoother.
  • FIG. 4 is a schematic diagram of the structure of a buffer module.
  • the processing unit in the middle position is the central processing unit
  • the processing unit where audio frames are input is the head processing unit
  • each processing unit contains two audio frames
  • the entire buffer module includes a total of 10 audio frames.
  • a single processing unit is drawn as a bold black solid-line rectangle containing a dotted line, and the two numbers in the rectangle are the numbers of the corresponding input audio frames.
  • initially, no audio frame has been input to any processing unit of the buffer module, so the signals in the buffer module are all 0.
  • the head processing unit at the right end of the buffer module receives the first audio frame
  • the head processing unit then contains a zero signal and the first audio frame
  • once the buffer module is full, the processing unit at its center contains the 5th and 6th audio frames.
  • the audio signal can be processed in the unit of audio frame in the subsequent steps.
  • the advantage of this is that it meets the needs of real-time audio processing: repaired audio frames can be output while the rest of the audio signal is still being processed.
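  One way to model the buffer module is a fixed-length deque whose centre slots hold the target frame. The unit counts follow Fig. 4, but the shift mechanics and slot ordering here are an assumption:

```python
from collections import deque

import numpy as np

class BufferModule:
    """Sketch of the five-unit buffer: frames enter at the head unit and
    shift toward the tail; the centre unit holds the target frame.
    Each unit holds `unit_frames` audio frames (two in Fig. 4)."""

    def __init__(self, n_units=5, unit_frames=2, frame_len=320):
        assert n_units % 2 == 1, "an odd unit count gives a unique centre"
        self.slots = deque([np.zeros(frame_len)] * (n_units * unit_frames),
                           maxlen=n_units * unit_frames)
        self.n_units, self.unit_frames = n_units, unit_frames

    def push(self, frame):
        # newest frame enters at the head; the oldest falls off the far end
        self.slots.appendleft(frame)

    def target_frames(self):
        c = (self.n_units // 2) * self.unit_frames
        return list(self.slots)[c:c + self.unit_frames]
```

  After ten frames have been pushed, the centre unit holds the 5th and 6th frames, matching the filled state described for Fig. 4.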
  • the audio signal to be repaired is obtained, and the audio signal is then divided into frames to obtain the multiple audio frames input into the buffer module.
  • the audio signal includes recording signals and electronic sound synthesis signals.
  • the audio signal to be repaired is acquired first, and then the multiple audio frames are obtained after framing the audio signal.
  • the audio signal may be an audio signal recorded by the audio repair device, or an audio signal obtained from other terminal devices through the Internet.
  • the audio signal includes a recording signal and an electronic sound synthesis signal.
  • the recording signal includes external sounds (such as telephone recordings) recorded by the local audio repair device or other terminal equipment through peripheral devices (such as microphones), and the electronic sound synthesis signal includes electronic sounds synthesized by audio synthesis software on the local audio repair device or other terminal equipment (such as robot singing).
  • the format, size, and number of channels of the above audio signal are not limited.
  • the format is any one of the audio formats wav, mp3, and flac, and the channel count is any of mono, dual-channel, or multi-channel.
  • each processing unit in the above-mentioned buffer module is filled with audio frames
  • all the audio frames contained in the central processing unit in the buffer module are used as target frames.
  • one processing unit may contain at least one audio frame.
  • the audio frames are input sequentially from the head processing unit to the other processing units, so when audio frames reach the central processing unit they are taken as the target frame for subsequent processing such as noise point detection and repair. The embodiment of this application can therefore process every audio frame without omission. Each audio frame is also very short (generally 20 to 50 milliseconds), which is less than the length of a phoneme yet contains enough vibration periods to meet the needs of signal processing.
  • the length of the audio frame can be set to 20, 25, 30, 32, 40, or 50 milliseconds, among other values
  • because this application processes the audio signal in units of audio frames, it can greatly improve detection efficiency in an almost exhaustive, carpet-search manner.
  • the audio characteristics of the multiple audio frames in the buffer module are extracted and compared, and whether the target frame contains noise points that appear as short-term high-energy pulses is determined from the comparison. Specifically, first determine the peak point of the target frame, that is, the point where the absolute amplitude value is at its maximum; with the peak point as the center, obtain an audio signal segment of preset length in the buffer module; then divide the audio signal segment into multiple intervals; finally, extract the audio features of the target frame and of each interval, and determine noise points in the target frame from these audio features.
  • the audio characteristics include at least one of peak value, signal energy, average power, local peak ratio, autocorrelation coefficient roll-off rate, sound intensity, and peak duration, where:
  • the peak value refers to the maximum amplitude value in the interval;
  • the signal energy refers to the integral of the square of the signal amplitude value;
  • the average power refers to the average value of the power of the signal in a limited interval or a period;
  • the local peak ratio refers to the proportion of the signal's peak value in the sum of the peak values of all the signals;
  • the roll-off rate of the autocorrelation coefficient refers to the rate of decrease of the signal's autocorrelation coefficient;
  • the sound intensity refers to the energy passing per unit time through a unit area perpendicular to the sound wave propagation direction, and is proportional to the square of the amplitude of the sound wave;
  • the peak duration refers to the time that the peak energy of the signal is greater than or equal to the preset value.
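  A few of these features can be computed as below; the exact formulas (for instance, the envelope threshold defining peak duration) are illustrative choices, since the text defines the features only verbally:

```python
import numpy as np

def audio_features(x, sr=16000, peak_thresh=0.5):
    """Compute a subset of the listed interval features."""
    x = np.asarray(x, dtype=float)
    peak = np.max(np.abs(x))              # peak value: max absolute amplitude
    energy = np.sum(x ** 2)               # signal energy: sum of squared amplitude
    avg_power = energy / len(x)           # average power over the interval
    # peak duration: time the envelope stays at or above peak_thresh * peak
    duration = np.count_nonzero(np.abs(x) >= peak_thresh * peak) / sr
    return {"peak": peak, "energy": energy,
            "avg_power": avg_power, "peak_duration": duration}
```

  These per-interval features are what the eight judgments below compare against each other and against thresholds.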
  • the multi-segment interval includes a first processing interval, a second processing interval, and an intermediate processing interval located between the first processing interval and the second processing interval.
  • the intermediate processing interval includes a first sub-interval, a second sub-interval, and a central sub-interval located between the first sub-interval and the second sub-interval.
  • the cache relocation is shown in Figure 5.
  • the peak point in the target frame in the central processing unit is determined to be point A
  • an audio signal segment with a preset length of four processing units is obtained in the buffer module with the peak point as the center, and the audio signal segment is then divided into multiple intervals: a first processing interval, an intermediate processing interval, and a second processing interval.
  • the length of the first processing interval and of the second processing interval is one audio frame
  • the length of the intermediate processing interval is two audio frames
  • the intermediate processing interval includes the first sub-interval, the central sub-interval, and the second sub-interval.
  • the length of the first sub-interval and of the second sub-interval is 1/4 of a processing unit
  • the length of the central sub-interval is 3/2 of a processing unit.
  • the first judgment is whether the amplitude value of the peak point of the target frame is greater than both the amplitude value of the peak point of the central sub-interval and the amplitude value of the peak point of the intermediate processing interval; this judgment determines whether the amplitude value of the peak point of the target frame is the unique maximum among adjacent signals;
  • the second judgment is whether the amplitude value of the peak point of the target frame is greater than both the amplitude value of the peak point of the first sub-interval and that of the second sub-interval, with the excess exceeding the first threshold; this judgment determines whether the amplitude value of the peak point of the target frame is clearly higher than adjacent signals;
  • the third judgment is to judge whether the signal energy in the intermediate processing interval is greater than the second threshold, and this judgment is used to judge whether the energy of the peak point of the target frame is too large;
  • the fourth judgment is whether the ratio of the average power of the intermediate processing interval to the average power of the audio signal segment is greater than the third threshold; this judgment determines whether the signal-to-noise ratio occupied by the peak point of the target frame is too large;
  • the fifth judgment is whether the ratio of the amplitude value of the peak point of the target frame to the sum of the peak-point amplitude values of the audio signal segment is greater than the fourth threshold; this judgment determines whether the amplitude value of the peak point of the target frame is too large a share of the sum of the peak-point amplitude values of all intervals in the audio signal segment;
  • the sixth judgment is whether the roll-off rate of the autocorrelation coefficient of the audio signal segment is greater than the fourth threshold; this judgment determines whether the peak point of the target frame appears as a sharp pulse signal rather than a continuous pulse signal;
  • the seventh judgment is whether the sound intensity of the intermediate processing interval is greater than the sound intensity of the first processing interval and that of the second processing interval; this judgment determines whether the peak point of the target frame appears as a high-energy pulse;
  • the eighth judgment is to judge whether the peak duration of the target frame is less than the fifth threshold, and this judgment is used to judge whether the peak point of the target frame appears as a short-term pulse;
  • the embodiment of the present application determines whether the peak point in the target frame is a noise point by performing the above eight judgments serially: if the results of all eight judgments are positive, the peak point of the target frame is determined to be a noise point; if the result of any one judgment is negative, the peak point of the target frame is not a noise point.
  • the embodiment of the present application mainly determines whether the peak point of the target frame is a noise point because, as noted above, the length of an audio frame is already very short, so even if the target frame includes multiple audio frames, the possibility that it contains two or more noise points is extremely small. Combined with the short-term high-energy characteristic of the noise points to be detected, this application only needs to determine whether the peak point in the target frame is a noise point, so it can locate noise points quickly and without omission, improving detection efficiency and accuracy.
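  The serial, short-circuit evaluation of the judgments can be sketched as below, with a reduced set of checks and placeholder thresholds; the patent's eight judgments and its threshold values are not reproduced exactly:

```python
import numpy as np

def is_noise_point(checks):
    """Serial evaluation: the peak is declared a noise point only if every
    check passes; any negative result stops the chain early."""
    return all(check() for check in checks)

def make_checks(target, first_sub, second_sub, amp_margin=0.2, energy_thresh=1.0):
    """Illustrative subset of the judgments (margins/thresholds are placeholders)."""
    t_peak = np.max(np.abs(target))
    return [
        # judgment 2 (in spirit): peak clearly above both neighbouring sub-intervals
        lambda: t_peak > np.max(np.abs(first_sub)) + amp_margin,
        lambda: t_peak > np.max(np.abs(second_sub)) + amp_margin,
        # judgment 3 (in spirit): energy around the peak is too large
        lambda: np.sum(np.square(target)) > energy_thresh,
    ]
```

  Short-circuiting matters here: most frames fail an early cheap check, so the more expensive features need not be computed for them.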
  • the target frame is repaired.
  • the repair process includes removing the noise point and smoothing the target frame in both the time domain and the frequency domain. Specifically, when removing noise points, first use either the linear prediction algorithm or the adjacent sampling point superposition algorithm to estimate the target frame's normal value at the noise point before it was disturbed by noise, and replace the amplitude value of the noise point with the estimated normal value; then smooth the target frame in the time domain so that it is continuous in the time domain; then perform frequency filtering on the target frame so that it is continuous in the frequency domain.
  • time-domain smoothing refers to smoothing the endpoints on both sides of the replaced noise point in the target frame; the method used is mean filtering, that is, the values of the two endpoints are replaced with the mean of their neighboring points, which makes the target frame change more smoothly over time after peak replacement.
  • the above frequency-domain filtering refers to smoothing the target frame in the frequency domain. Since the energy of the target frame at the noise point is larger than that of adjacent audio frames, sound breakage may even occur, especially in higher frequency bands; and after the peak replacement and time-domain smoothing steps, the target frame may become more abrupt in the high frequency band (such as above 16 kHz), so frequency-domain smoothing is needed after the target frame has been smoothed in the time domain.
  • the frequency domain smoothing method adopted in the embodiment of the present application is to use a zero phase shift digital filter to perform low-pass filtering on the target frame, and the cut-off frequency of the low-pass filter is equal to the average spectral height of the audio signal before framing.
  • the target frame after noise point repair thus carries no new repair traces; that is, the signal before and after processing has very good consistency in the frequency domain.
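  One numpy-only way to get zero-phase low-pass behaviour is FFT-bin zeroing, which applies no phase shift; the patent's actual zero-phase-shift digital filter design is not specified, so this is only an analogous sketch (the cutoff is a plain parameter here rather than the "average spectral height"):

```python
import numpy as np

def zero_phase_lowpass(x, cutoff_hz, sr):
    """Brick-wall low-pass: zero the spectrum above cutoff_hz and invert.
    Multiplying the spectrum by a real, non-negative mask shifts no phase."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1 / sr)
    X[freqs > cutoff_hz] = 0
    return np.fft.irfft(X, n=len(x))
```

  For a signal containing only tones on exact FFT bins, this recovers the low-frequency component exactly and unshifted.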
  • any one of the aforementioned linear prediction algorithm and adjacent sampling point superposition algorithm can be used to estimate the normal value of noise points
  • both methods have their own advantages.
  • the characteristic of the former is to obtain the predicted value based on the minimum mean square error criterion based on the past sampling points of the signal, with large calculation amount and smooth processing effect, which is suitable for the application scenarios of offline non-real-time system;
  • the characteristic of the latter is to superpose adjacent sampling points with power-exponent decay to obtain the predicted value, with a small calculation amount and moderate processing effect, suitable for the application scenarios of online real-time systems.
  • the device of the embodiment of this application can choose between the two methods according to the application scenario.
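  Both estimators can be sketched as below; the least-squares AR fit and the 2**-k decay weights are illustrative implementations of "linear prediction" and "adjacent sampling point superposition", not the patent's exact formulas:

```python
import numpy as np

def lpc_predict(past, order=2):
    """Linear prediction: least-squares AR(order) fit on the past samples,
    then a one-step forecast. Heavier but smoother - the offline choice."""
    past = np.asarray(past, dtype=float)
    # rows: [x[n-1], ..., x[n-order]] predicting x[n]
    A = np.array([past[i:i + order][::-1] for i in range(len(past) - order)])
    coef, *_ = np.linalg.lstsq(A, past[order:], rcond=None)
    return float(np.dot(past[-order:][::-1], coef))

def superpose_predict(past):
    """Adjacent-sample superposition with power-exponent decay: a cheap
    weighted average, nearest neighbour weighted most. The 2**-k weights
    are an illustrative choice, not taken from the patent."""
    past = np.asarray(past, dtype=float)
    w = 2.0 ** -np.arange(1, len(past) + 1)   # nearest neighbour first
    return float(np.dot(past[::-1], w) / w.sum())
```

  A device could switch between the two by scenario, as the text suggests: `lpc_predict` offline, `superpose_predict` online.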
  • the repaired target frame is output in a preset format
  • the preset format is any one of audio format wav, audio format mp3, and audio format flac.
  • the user can set the preset format, and this application does not limit the preset format.
  • the embodiments of the present application include at least the following inventive points.
  • the embodiments of the present application successively input multiple audio frames into the buffer module, and sequentially process the audio frames in the central processing unit of the buffer module.
  • the embodiment of this application compares the audio features of the target frame with the audio features of the audio frames adjacent to the target frame, which can accurately detect the noise points in the target frame.
  • the embodiment of the present application can also remove the above noise points. Therefore, the embodiments of the present application can automatically repair a large number of audio signals, and provide an efficient, accurate and fast audio repair method.
  • the audio repair method may include:
  • the audio signal to be repaired is acquired.
  • audio signals include recording signals and electronic sound synthesis signals.
  • the audio signal may be an audio signal recorded by the audio restoration device, or an audio signal obtained from other terminal devices via the Internet.
  • the audio signal includes a recording signal and an electronic sound synthesis signal.
  • the recording signal includes external sounds (such as telephone recordings) recorded by the local audio repair device or other terminal equipment through peripheral devices (such as microphones), and the electronic sound synthesis signal includes electronic sounds synthesized by audio synthesis software on the local audio repair device or other terminal equipment (such as robot singing).
  • the format, size, and number of channels of the above audio signal are not limited.
  • the format is any one of the audio formats wav, mp3, and flac, and the channel count is any of mono, dual-channel, or multi-channel.
  • the audio signal is framed before obtaining the multiple audio frames.
  • the buffer module is composed of multiple processing units in sequence, and the processing unit located at the center of the multiple processing units is the central processing unit.
  • a plurality of audio frames are sequentially and continuously input into the buffer module.
  • the plurality of audio frames are all or part of the audio frames obtained after framing an audio signal.
  • the audio frames are continuous, and are continuously input to the head processing unit of the buffer module in the order in the audio signal before framing, and then sequentially transmitted to the processing unit connected to the head processing unit.
  • the cache module includes multiple processing units, and the multiple processing units are connected in sequence, the processing unit at the first position is the head processing unit, and the processing unit at the center is the central processing unit.
  • the aforementioned audio signal and the aforementioned multiple audio frames are all time-domain signals.
  • the length of the processing unit in the buffer module can be set to any length value; generally, it is set to the length of at least two audio frames. For example, when the length of the processing unit equals two audio frames, adjacent audio frames overlap by 50% during processing, which avoids the truncation effect and makes the signal processing result smoother.
  • FIG. 4 is a schematic diagram of the structure of a buffer module.
  • the processing unit in the middle position is the central processing unit
  • the processing unit where audio frames are input is the head processing unit
  • each processing unit contains two audio frames
  • the entire buffer module includes a total of 10 audio frames.
  • a single processing unit is drawn as a bold black solid-line rectangle containing a dotted line, and the two numbers in the rectangle are the numbers of the corresponding input audio frames.
  • initially, no audio frame has been input to any processing unit of the buffer module, so the signals in the buffer module are all 0.
  • the head processing unit at the right end of the buffer module receives the first audio frame
  • the head processing unit then contains a zero signal and the first audio frame
  • once the buffer module is full, the processing unit at its center contains the 5th and 6th audio frames.
  • the audio signal can be processed in the unit of audio frame in the subsequent steps.
  • the advantage of this is that it meets the needs of real-time audio processing: repaired audio frames can be output while the rest of the audio signal is still being processed.
  • after each processing unit in the above-mentioned buffer module is filled with audio frames, all the audio frames contained in the central processing unit of the buffer module are used as the target frame.
  • one processing unit may contain at least one audio frame.
  • since the audio frames are passed in sequence from the head processing unit to the other processing units, each audio frame is taken as the target frame for subsequent processing such as noise point detection and repair once it is input into the central processing unit, so the embodiment of this application can process every audio frame without omission. Each audio frame is very short: the length of an audio frame is generally 20 to 50 milliseconds, which is shorter than a phoneme yet contains enough vibration periods to meet signal processing needs.
  • the length of the audio frame can be set to values such as 20, 25, 30, 32, 40, or 50 milliseconds; by processing the audio signal in units of audio frames, this application can greatly improve detection efficiency in an almost carpet-search manner.
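The framing step described above can be sketched as a simple reshape into fixed-length frames. The frame length in samples is the caller's choice (e.g. a 25 ms frame at a 44.1 kHz sampling rate would be about 1102 samples); zero-padding the tail is one assumed way of handling a signal whose length is not a frame multiple:

```python
import numpy as np


def split_into_frames(signal, frame_len):
    """Split a 1-D time-domain signal into consecutive frames of
    frame_len samples, zero-padding the last frame."""
    n_frames = -(-len(signal) // frame_len)  # ceiling division
    padded = np.zeros(n_frames * frame_len)
    padded[: len(signal)] = signal
    return padded.reshape(n_frames, frame_len)
```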
  • the peak point of the target frame is determined, that is, the point whose absolute amplitude value is the maximum.
  • the buffer module shown in FIG. 5 includes 5 processing units, and the peak point in the target frame in the central processing unit is determined as point A.
  • an audio signal segment of a preset length is acquired in the buffer module with the peak point of the target frame as the center.
  • the preset length of the audio signal segment can be any value.
  • an audio signal segment with a preset length of four processing units is acquired in the buffer module with the peak point of the target frame as the center.
  • the multi-stage interval includes a first processing interval, a second processing interval, and an intermediate processing interval located between the first processing interval and the second processing interval.
  • the intermediate processing interval includes a first sub-interval, a second sub-interval, and a central sub-interval located between the first sub-interval and the second sub-interval.
  • the audio signal segment is divided into multiple intervals, including a first processing interval, an intermediate processing interval, and a second processing interval, where the first and second processing intervals are one audio frame long and the intermediate processing interval is two audio frames long; the intermediate processing interval includes a first sub-interval, a central sub-interval, and a second sub-interval, where the first and second sub-intervals are 1/4 of a processing unit long and the central sub-interval is 3/2 of a processing unit long.
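The relocation and re-division around the peak can be sketched as index arithmetic on the buffered signal. The proportions used here (each outer interval 1/4 of the window, each sub-interval 1/8 of the intermediate part) are one reading of the example lengths above, not values fixed by the method:

```python
import numpy as np


def segment_around_peak(buffer_signal, peak_idx, seg_len):
    """Cut a seg_len window centred on the peak sample and split it into
    [first | intermediate | second], with the intermediate interval
    further split into [sub1 | centre | sub2]."""
    half = seg_len // 2
    lo = max(0, min(peak_idx - half, len(buffer_signal) - seg_len))
    segment = buffer_signal[lo:lo + seg_len]

    q = len(segment) // 4
    first, intermediate, second = segment[:q], segment[q:3 * q], segment[3 * q:]

    s = len(intermediate) // 8
    sub1, centre, sub2 = intermediate[:s], intermediate[s:-s], intermediate[-s:]
    return first, intermediate, second, sub1, centre, sub2
```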
  • the audio features of the target frame and the multiple segments are extracted respectively.
  • the audio features include at least one of peak value, signal energy, average power, local peak ratio, roll-off rate of the autocorrelation coefficient, sound intensity, and peak duration.
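A few of these features can be sketched directly from their descriptions (peak as the largest amplitude, energy as the integral of the squared amplitude, average power as energy over length). The roll-off formula below, a single-lag decay of the normalised autocorrelation, is only one plausible proxy; the exact formulas are not fixed by the method:

```python
import numpy as np


def peak_value(x):
    return float(np.max(np.abs(x)))   # largest amplitude in the interval


def signal_energy(x):
    return float(np.sum(x ** 2))      # integral of the squared amplitude


def average_power(x):
    return signal_energy(x) / len(x)  # mean power over the interval


def autocorr_rolloff(x):
    """Rate at which the normalised autocorrelation decays away from lag 0;
    a sharp isolated pulse decays much faster than a sustained tone."""
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    ac = ac / ac[0]
    return float(ac[0] - ac[1])       # single-lag proxy for the roll-off rate
```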
  • whether the peak point in the target frame is a noise point is determined according to the multi-segment interval and the audio characteristics of the target frame, and the judgment standard is as follows:
  • the first judgment is whether the amplitude value of the peak point of the target frame is greater than both the amplitude value of the peak point of the central sub-interval and the amplitude value of the peak point of the intermediate processing interval; this judgment determines whether the peak amplitude of the target frame is the unique maximum among adjacent signals;
  • the second judgment is whether the amplitude value of the peak point of the target frame is greater than both the amplitude values of the peak points of the first sub-interval and the second sub-interval, with the excess exceeding a first threshold; this judgment determines whether the peak amplitude of the target frame protrudes clearly above adjacent signals;
  • the third judgment is to judge whether the signal energy in the intermediate processing interval is greater than the second threshold, and this judgment is used to judge whether the energy of the peak point of the target frame is too large;
  • the fourth judgment is whether the ratio of the average power of the intermediate processing interval to the average power of the audio signal segment is greater than a third threshold; this judgment determines whether the signal-to-noise ratio occupied by the peak point of the target frame is too large;
  • the fifth judgment is whether the ratio of the amplitude value of the peak point of the target frame to the total amplitude value of the peak points of the audio signal segment is greater than a fourth threshold; this judgment determines whether the peak amplitude of the target frame takes too large a share of the sum of the peak amplitudes of all intervals of the audio signal segment;
  • the sixth judgment is to judge whether the roll-off rate of the autocorrelation coefficient of the audio signal segment is greater than the fourth threshold. This judgment is used to judge whether the peak point of the target frame appears as a sharp pulse signal, otherwise it is a continuous pulse signal;
  • the seventh judgment is whether the sound intensity of the intermediate processing interval is greater than the sound intensities of the first processing interval and the second processing interval; this judgment determines whether the peak point of the target frame appears as a high-energy pulse;
  • the eighth judgment is to judge whether the peak duration of the target frame is less than the fifth threshold, and this judgment is used to judge whether the peak point of the target frame appears as a short-term pulse;
  • the embodiment of the present application judges whether the peak point of the target frame is a noise point by performing the above eight judgments serially: if the results of all eight judgments are positive, the peak point of the target frame is determined to be a noise point; if the result of any one judgment is negative, the peak point of the target frame is not a noise point.
  • the embodiment of the present application mainly determines whether the peak point of the target frame is a noise point. Since it has been shown above that audio frames are already very short, even if the target frame includes multiple audio frames, the possibility that it contains two or more noise points is extremely small. Combined with the short-term high-energy characteristic of the noise points to be detected, this application only needs to determine whether the peak point of the target frame is a noise point, so it can locate noise points both without omission and very quickly, improving the efficiency and accuracy of detection.
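The serial eight-judgment decision can be sketched as below. The feature names and the threshold values T1 to T5 are placeholders, since the description leaves all five thresholds unspecified; `f` is a dict of features extracted from the target frame and the surrounding intervals:

```python
# Hypothetical threshold values; the method does not fix them.
T1, T2, T3, T4, T5 = 0.2, 1.0, 0.5, 0.3, 0.01


def is_noise_point(f):
    checks = (
        # 1: unique maximum among adjacent signals
        lambda: f["target_peak"] > f["centre_peak"] and f["target_peak"] > f["mid_peak"],
        # 2: clearly protrudes above both sub-intervals by more than T1
        lambda: f["target_peak"] > f["sub1_peak"] + T1 and f["target_peak"] > f["sub2_peak"] + T1,
        # 3: energy of the intermediate interval is too large
        lambda: f["mid_energy"] > T2,
        # 4: power of the intermediate interval relative to the whole segment
        lambda: f["mid_power"] / f["segment_power"] > T3,
        # 5: share of the peak amplitude among all interval peaks
        lambda: f["target_peak"] / f["segment_peak_sum"] > T4,
        # 6: autocorrelation rolls off sharply (isolated pulse);
        #    the description reuses the fourth threshold here
        lambda: f["rolloff"] > T4,
        # 7: intermediate interval louder than both outer intervals
        lambda: f["mid_intensity"] > f["first_intensity"] and f["mid_intensity"] > f["second_intensity"],
        # 8: the pulse is short-lived
        lambda: f["peak_duration"] < T5,
    )
    # all() over a generator evaluates serially and stops at the first
    # negative result, matching the serial evaluation described above
    return all(check() for check in checks)
```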
  • the target frame is repaired.
  • the repair process includes removing the noise point and making the target frame, with the noise point removed, smooth in both the time domain and the frequency domain. Specifically, when removing the noise point, either the linear prediction algorithm or the adjacent sampling point superposition algorithm is first used to estimate the normal value at the noise point of the target frame before it was disturbed by noise, and the amplitude value of the noise point is replaced with the estimated normal value; the target frame is then smoothed in the time domain so that it is continuous in the time domain, and frequency filtering is then applied so that the target frame is continuous in the frequency domain.
  • time-domain smoothing refers to smoothing the endpoints on both sides of the noise point whose amplitude value was replaced in the target frame.
  • the method used is mean filtering: each of the two endpoints is replaced by the mean of its neighboring values, which makes the target frame after peak replacement vary more smoothly over time.
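The endpoint mean filtering can be sketched as below. The neighbourhood width `k` is an assumed detail, since the description names mean filtering but not the window size:

```python
import numpy as np


def smooth_endpoints(frame, i_lo, i_hi, k=1):
    """Mean-filter the two endpoints flanking the replaced span
    [i_lo, i_hi]: each endpoint is replaced by the mean of its
    neighbourhood of k samples on either side, so the repaired frame
    varies smoothly in time."""
    out = frame.astype(float).copy()
    for i in (i_lo, i_hi):
        lo, hi = max(0, i - k), min(len(frame), i + k + 1)
        out[i] = np.mean(frame[lo:hi])
    return out
```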
  • the above frequency-domain filtering refers to making the target frame smooth in the frequency domain. Since the energy of the target frame at the noise point is larger than that of adjacent audio frames, sound breakage may even occur, especially in higher frequency bands; moreover, the peak replacement and time-domain smoothing steps above may make the target frame more abrupt in high frequency bands (such as above 16 kHz), so frequency-domain smoothing is needed after the target frame is smoothed in the time domain.
  • the frequency domain smoothing method adopted in the embodiment of the present application is to use a zero phase shift digital filter to perform low-pass filtering on the target frame, and the cut-off frequency of the low-pass filter is equal to the average spectral height of the audio signal before framing.
  • compared with the high-frequency bands where the pre-framing audio signal had weak or no energy, the target frame after noise point repair adds no new repair traces; that is, the recording signal before and after processing has very good consistency in the frequency domain.
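The zero-phase property can be illustrated with a forward-backward filtering sketch: filtering once, then filtering the reversed result, cancels the phase shift of the two passes. A simple moving-average kernel stands in here for the zero-phase-shift digital low-pass filter of the description, whose cut-off frequency would be set to the average spectral height of the pre-framing signal (a design choice this sketch does not implement):

```python
import numpy as np


def zero_phase_lowpass(x, taps=5):
    """Zero-phase smoothing: filter forward, then filter the reversed
    result and reverse again, so the phase shifts cancel."""
    h = np.ones(taps) / taps
    forward = np.convolve(x, h, mode="same")
    backward = np.convolve(forward[::-1], h, mode="same")[::-1]
    return backward
```

In practice a dedicated routine such as `scipy.signal.filtfilt` provides the same forward-backward zero-phase behaviour with a proper IIR design.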
  • any one of the aforementioned linear prediction algorithm and adjacent sampling point superposition algorithm can be used to estimate the normal value of noise points
  • both methods have their own advantages.
  • the characteristic of the former is to obtain the predicted value based on the minimum mean square error criterion based on the past sampling points of the signal, with large calculation amount and smooth processing effect, which is suitable for the application scenarios of offline non-real-time system;
  • the characteristic of the latter is to obtain the predicted value by applying power-exponent decay to adjacent sampling points; the calculation amount is small, the processing effect is moderate, and it is suitable for application scenarios of online real-time systems.
  • the device of the embodiment of this application can choose between the two methods according to the application scenario.
  • in an online real-time system, the method based on adjacent sampling point superposition can be selected for peak replacement; in a local offline system, where processing quality matters more than latency, the method based on linear prediction can be selected.
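The lighter of the two estimators, adjacent sampling point superposition, can be sketched as below: each hidden sample is predicted as a superposition of the preceding samples with power-exponentially decreasing weights, most recent sample weighted highest. The decay factor and context width are assumed values, since the description names the technique but not its parameters:

```python
import numpy as np


def predict_adjacent_superposition(history, n_missing, decay=0.5, context=4):
    """Estimate the samples hidden under the noise pulse from the points
    just before it, sliding the context forward as predictions are made."""
    window = list(history[-context:])
    predictions = []
    for _ in range(n_missing):
        # weight decay**1 for the newest sample, decay**2 for the next, ...
        w = decay ** np.arange(1, len(window) + 1)
        pred = float(np.dot(w[::-1], window) / w.sum())
        predictions.append(pred)
        window = window[1:] + [pred]  # slide the context forward
    return predictions
```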
  • the repaired target frame is output in a preset format
  • the preset format is any one of audio format wav, audio format mp3, and audio format flac.
  • the user can set the preset format, and this application does not limit the preset format.
  • the embodiment of this application describes the process of the audio repair method in more detail.
  • the audio signal is first acquired, and the multiple audio frames obtained by framing the audio signal are input into the buffer module; the audio frames in the central processing unit of the buffer module are then taken as the target frame, the peak point in the target frame is determined, and, centered on this peak point, an audio signal segment of a preset length is acquired in the buffer module; the audio signal segment is then re-divided into multiple intervals, the noise points in the target frame are determined according to the audio features of the other audio frames in the buffer module, and finally the target frame is repaired and output. Therefore, the embodiments of the present application can automatically repair a large number of audio signals, and provide an efficient, accurate and fast audio repair method.
  • FIG. 6 is a schematic block diagram of an audio repair device provided by an embodiment of the present application.
  • the audio repair device of this embodiment includes: an input unit 601, an acquisition unit 602, a detection unit 603, and a repair unit 604. Specifically:
  • the input unit 601 is configured to sequentially input a plurality of audio frames into a buffer module, the buffer module is composed of a plurality of processing units in sequence, and the processing unit located at the center of the multiple processing units is the central processing unit;
  • the acquiring unit 602 is configured to use at least one audio frame included in the central processing unit as a target frame;
  • the detection unit 603 is configured to detect noise points in the target frame that are short-term high-energy pulses according to the audio characteristics of the multiple audio frames in the buffer module;
  • the repair unit 604 is configured to repair the target frame, and the repair is used to remove noise points in the target frame.
  • the audio restoration device further includes a determining unit 605, configured to determine the peak point of the target frame; the acquiring unit 602 is further configured to acquire, centered on the peak point, an audio signal segment of a preset length in the buffer module; the audio repair device further includes a segmentation unit 606, configured to divide the audio signal segment into multiple intervals, including a first processing interval, a second processing interval, and an intermediate processing interval located between the first processing interval and the second processing interval, the intermediate processing interval including a first sub-interval, a second sub-interval, and a central sub-interval located between the first sub-interval and the second sub-interval; the audio repair device further includes an extracting unit 607, configured to extract the audio features of the target frame and the multiple intervals respectively, the audio features including at least one of peak value, signal energy, average power, local peak ratio, roll-off rate of the autocorrelation coefficient, sound intensity, and peak duration; the determining unit 605 is further configured to determine, according to the audio features of the target frame and the multiple intervals, whether the peak point of the target frame is a noise point.
  • the above determining unit 605 is specifically configured to determine whether the amplitude value of the peak point of the target frame is greater than both the amplitude value of the peak point of the central sub-interval and the amplitude value of the peak point of the intermediate processing interval; and determine whether the amplitude value of the peak point of the target frame is greater than both the amplitude values of the peak points of the first sub-interval and the second sub-interval, with the excess exceeding the first threshold; and determine whether the signal energy of the intermediate processing interval is greater than a second threshold; and determine whether the ratio of the average power of the intermediate processing interval to the average power of the audio signal segment is greater than a third threshold; and determine whether the ratio of the amplitude value of the peak point of the target frame to the total amplitude value of the peak points of the audio signal segment is greater than a fourth threshold; and determine whether the roll-off rate of the autocorrelation coefficient of the audio signal segment is greater than the fourth threshold; and determine whether the sound intensity of the intermediate processing interval is greater than the sound intensities of the first processing interval and the second processing interval; and determine whether the peak duration of the target frame is less than a fifth threshold.
  • the audio restoration device further includes an estimation unit 608, configured to use an estimation algorithm to estimate the normal value of the noise point of the target frame before the target frame is not disturbed by noise;
  • the audio repair device further includes a replacement unit 609, configured to replace the amplitude value of the noise point with the normal value;
  • the audio repair device further includes a smoothing unit 610, configured to perform temporal smoothing on the target frame , So that the target frame is continuous in the time domain; the smoothing unit 610 is further configured to perform frequency filtering on the target frame, so that the target frame is continuous in the frequency domain.
  • estimation algorithm includes any one of a linear prediction algorithm and an adjacent sampling point superposition algorithm.
  • the above-mentioned acquiring unit 602 is configured to acquire an audio signal to be repaired, and the audio signal includes a recording signal; the audio repairing device further includes a framing unit 611, configured to The audio signal is framed to obtain the multiple audio frames.
  • the audio device further includes an output unit 612, which is further configured to output the repaired target frame in a preset format, the preset format being any one of the audio formats wav, mp3, and flac.
  • the embodiments of the present application include at least the following invention points.
  • the present application divides the audio signal into multiple audio frames of short length and sequentially inputs them into the buffer module, so as to accurately, and without omission, locate each noise point in the audio.
  • this application can accurately detect the noise point in the target frame by comparing the audio feature of the target frame with the audio signal adjacent to the target frame.
  • besides detecting the above-mentioned noise points, this application can also remove them. Therefore, the embodiments of the present application can automatically repair a large number of audio signals, and provide an efficient, accurate and fast audio repair method.
  • FIG. 7 is a schematic block diagram of an audio repair device according to another embodiment of the present application.
  • the audio repair device in this embodiment as shown in the figure may include a processor 710, a communication interface 720, an input device 730, an output device 740, and a memory 750.
  • the aforementioned processor 710, communication interface 720, input device 730, output device 740, and memory 750 are connected through a bus 760. Specifically:
  • the aforementioned processor 710 is specifically configured to determine whether the amplitude value of the peak point of the target frame is greater than both the amplitude value of the peak point of the central sub-interval and the amplitude value of the peak point of the intermediate processing interval; and determine whether the amplitude value of the peak point of the target frame is greater than both the amplitude values of the peak points of the first sub-interval and the second sub-interval, with the excess exceeding the first threshold; and determine whether the signal energy of the intermediate processing interval is greater than a second threshold; and determine whether the ratio of the average power of the intermediate processing interval to the average power of the audio signal segment is greater than a third threshold; and determine whether the ratio of the amplitude value of the peak point of the target frame to the total amplitude value of the peak points of the audio signal segment is greater than a fourth threshold; and determine whether the roll-off rate of the autocorrelation coefficient of the audio signal segment is greater than the fourth threshold; and determine whether the sound intensity of the intermediate processing interval is greater than the sound intensities of the first processing interval and the second processing interval; and determine whether the peak duration of the target frame is less than a fifth threshold.
  • the input device 730 or the communication interface 720 is used to perform the function of the acquiring unit 602, which is used for acquiring the audio signal to be repaired, the audio signal including a recording signal; the aforementioned processor 710 is also used to perform the function of the framing unit 611, which is used for framing the audio signal to obtain the multiple audio frames.
  • the output device 740 is used to perform the function of the output unit 612, and is used to output the repaired target frame in a preset format, the preset format being any one of the audio formats wav, mp3, and flac.
  • the processor 710 may be a central processing unit (CPU), and the processor 710 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the memory 750 may include a read-only memory and a random access memory, and provides instructions and data to the processor 710. A part of the memory 750 may also include a non-volatile random access memory. For example, the memory 750 may also store device type information.
  • the computer-readable storage medium may be an internal storage unit of the audio repair device of any of the foregoing embodiments, such as the hard disk or memory of the audio repair device.
  • the computer-readable storage medium can also be an external storage device of the audio repair device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card equipped on the audio repair device.
  • the computer-readable storage medium may also include both an internal storage unit of the audio repair device and an external storage device.
  • the computer-readable storage medium is used to store computer programs and other programs and data required by the audio repair device.
  • the computer-readable storage medium can also be used to temporarily store data that has been output or will be output.
  • the disclosed audio repair device and audio repair method can be implemented in other ways.
  • the device embodiments described above are merely illustrative; for example, the division into units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may also be electrical, mechanical or other forms of connection.
  • the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of the present application.
  • each unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer readable storage medium.
  • the technical solution of this application, in essence, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to make a computer device (which may be a personal computer, an audio repair device, a network device, etc.) execute all or part of the steps of the methods in the various embodiments of the present application.
  • the aforementioned storage media include: a USB flash drive, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and other media that can store program code.


Abstract

An audio repair method, device, and readable storage medium, wherein the method includes: sequentially inputting multiple audio frames into a buffer module, the buffer module being composed of multiple processing units connected in sequence, with the processing unit located at the center of the multiple processing units being the central processing unit (201); taking at least one audio frame contained in the central processing unit as a target frame (202); detecting, according to the audio features of the multiple audio frames in the buffer module, noise points in the target frame that appear as short-term high-energy pulses (203); and repairing the target frame, the repair being used to remove the noise points in the target frame (204). The method first inputs multiple audio frames continuously into the buffer module, and then successively detects and repairs the noise points appearing as short-term high-energy pulses in the audio frames located at the center of the buffer module; it is an efficient, accurate, and fast audio repair method.

Description

An audio repair method, device, and readable storage medium. Technical Field
This application relates to the field of signal processing, and in particular to an audio repair method, device, and readable storage medium.
Background
Due to the influence of interfering signals, audio often contains a noise that is perceived as a "click". This noise is in fact a short-term high-energy pulse present in the audio, characterized by large energy and short duration.
At present, there is no satisfactory method for detecting and repairing this kind of noise, which appears in audio as a high-energy short-term pulse.
Summary of the Invention
The embodiments of this application provide an audio repair method that can detect and repair noise points in audio that appear as short-term high-energy pulses.
In a first aspect, an embodiment of this application provides an audio repair method, including:
sequentially inputting multiple audio frames into a buffer module, where the buffer module is composed of multiple processing units connected in sequence, and the processing unit located at the center of the multiple processing units is the central processing unit;
taking at least one audio frame contained in the central processing unit as a target frame;
detecting, according to the audio features of the multiple audio frames in the buffer module, noise points in the target frame that appear as short-term high-energy pulses;
repairing the target frame, the repair being used to remove the noise points in the target frame.
In a second aspect, an embodiment of this application provides an audio repair device, which includes units for performing the audio repair method of the first aspect, the audio repair device including:
an input unit, configured to sequentially input multiple audio frames into a buffer module, where the buffer module is composed of multiple processing units connected in sequence, and the processing unit located at the center of the multiple processing units is the central processing unit;
an acquiring unit, configured to take at least one audio frame contained in the central processing unit as a target frame;
a detection unit, configured to detect, according to the audio features of the multiple audio frames in the buffer module, noise points in the target frame that appear as short-term high-energy pulses;
a repair unit, configured to repair the target frame, the repair being used to remove the noise points in the target frame.
In a third aspect, an embodiment of this application provides an audio repair device, including a processor, a communication interface, an input device, an output device, and a memory that are connected to each other, where the memory is configured to store a computer program including program instructions, and the processor is configured to invoke the program instructions to perform the method of the first aspect.
In a fourth aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program, the computer program including program instructions which, when executed by a processor, perform the method of the first aspect.
This application sequentially inputs multiple audio frames into a buffer module, takes the audio frames in the central processing unit of the buffer module as the target frame, determines the noise points in the target frame according to the audio features of the multiple audio frames in the buffer module, and finally repairs the target frame. This application thus includes at least the following inventive points: first, by continuously inputting multiple audio frames into the buffer module and successively processing the audio frames in its central processing unit, it detects and repairs the noise points in every audio frame accurately and without omission; second, by comparing the audio features of the target frame with those of the audio frames adjacent to the target frame, it can accurately detect the noise points in the target frame; finally, beyond detecting the noise points, it can also remove them. Therefore, this application can automatically repair a large number of audio signals and provides an efficient, accurate, and fast audio repair method.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below.
FIG. 1 is a schematic diagram of an application scenario of an audio repair method provided by an embodiment of this application;
FIG. 2 is a schematic flowchart of an audio repair method provided by an embodiment of this application;
FIG. 3 is a schematic flowchart of an audio repair method provided by another embodiment of this application;
FIG. 4 is a schematic diagram of inputting multiple audio frames into a buffer module according to an embodiment of this application;
FIG. 5 is a schematic diagram of buffer relocation and repair according to an embodiment of this application;
FIG. 6 is a schematic block diagram of an audio repair device provided by an embodiment of this application;
FIG. 7 is a structural block diagram of an audio repair device provided by an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of the present invention.
This application is mainly applied to an audio repair device, which may be a conventional audio repair device or the audio repair device described in the third and fourth embodiments of this application; this application does not limit this. When the audio repair device sends data, it records and transmits the characteristics of the data in a preset format, where the characteristics of the data include time, location, type, and so on.
Because an audio signal is disturbed by noise, noise points appearing as short-term high-energy pulses arise, so that when the audio signal is played, a noise that sounds like a "click" is produced. To solve this problem, this application proposes a method for detecting and repairing noise points in audio signals.
For a better understanding of the embodiments of the present invention, the method of applying the embodiments will be introduced below with reference to FIG. 1. The embodiments of the present invention can be applied to a scenario in which an audio repair device detects and repairs an audio signal.
Referring to FIG. 1, the audio repair device (for example, the mobile phone in the figure) obtains an audio signal by recording through a microphone, or receives an audio signal from the Internet, and then detects and repairs the noise points in the audio signal that appear as short-term high-energy pulses. As shown in FIG. 1, the dashed circle marks a noise point in the unprocessed audio signal, which appears as a short-term high-energy pulse; after the audio signal is processed by the audio repair device, the circled noise point is well repaired. Specifically, this audio repair method can be roughly divided into five stages: signal input, buffer relocation, noise point detection, noise point repair, and signal output. These five stages are introduced in turn below.
This application first divides the obtained audio signal of any format into frames to obtain multiple audio frames, and then inputs the multiple audio frames into the buffer module sequentially and continuously. As shown in FIG. 4, the buffer module is formed by connecting 5 processing units in sequence; the processing unit at the first position is the head processing unit, and the processing unit located at the center of the 5 processing units is the central processing unit. Each processing unit can hold two audio frames; audio frames are input through the head processing unit of the buffer module and transferred to the other processing units in the order in which the processing units are connected. Generally speaking, the buffer module may contain any odd number of processing units no less than three, and the length of a processing unit may be set to any value, generally at least the length of two audio frames. For example, when the length of a processing unit is the length of two audio frames, adjacent audio frames overlap by 50% during processing, which avoids the truncation effect and makes the signal processing result smoother.
After the audio frames have been input into the buffer module and every processing unit is full of audio frames, buffer relocation is performed on the multiple audio frames in the buffer module; that is, the audio signal segment to be detected is re-acquired, centered on the point in the audio frame most likely to be a noise point. Specifically, as shown in FIG. 5, the audio frames in the central processing unit are taken as the target frame, the peak point of the target frame (the point whose absolute amplitude value is the maximum) is determined, and an audio signal segment with a length of 4 processing units is acquired in the buffer module based on this peak point. Finally, the audio signal segment is re-divided into multiple intervals, including a first processing interval, a second processing interval, and an intermediate processing interval located between them; the intermediate processing interval includes a first sub-interval, a second sub-interval, and a central sub-interval located between them. It should be noted that since the noise points this application repairs appear as short-term high-energy pulses, a noise point is most likely to be the peak point of an audio frame, and since the frame signals obtained after framing are already very short, the possibility of two noise points in one audio frame is extremely small; therefore, this application only needs to check whether the peak point is a noise point.
The audio features of the multiple intervals of the above audio signal segment are extracted, including at least one of peak value, signal energy, average power, local peak ratio, roll-off rate of the autocorrelation coefficient, sound intensity, and peak duration. Then, according to the audio features of the multiple intervals, it is judged whether the peak point of the target frame is a noise point.
After the peak point in the target frame is determined to be a noise point, the target frame is repaired. Repairing the target frame mainly includes three steps. The first step is to remove the noise point: the normal value (i.e., the normal amplitude value) of the target frame before it was disturbed by noise is estimated by linear prediction or by superposition of adjacent sampling points, and the amplitude value of the noise point is replaced with this normal value. The second step applies time-domain smoothing to make the amplitude-replaced target frame smooth in the time domain. The third step applies frequency filtering to make the amplitude-replaced target frame smooth in the frequency domain. After these three steps, the repair of the target frame is complete.
After the repair of the target frame is completed, the repaired target frame is output in a preset format, which is any one of the audio formats wav, mp3, and flac.
It can be seen that this application inputs the multiple very short audio frames obtained by framing the audio signal into the buffer module, takes the audio frames in the central processing unit of the buffer module as the target frame, and processes the target frame, so that every audio frame is processed without omission. Then, centered on the peak point of the audio frames in the central processing unit, an audio signal segment of a preset length is acquired in the buffer module and divided into multiple intervals. The audio features of these intervals are then extracted, and it is judged according to them whether the peak point of the audio frames in the central processing unit is a noise point; if so, the target frame is repaired by amplitude replacement, time-domain smoothing, and frequency-domain smoothing, and after the repair is completed the target frame is output in any format. The greatest benefit of this application is therefore that noise points in audio signals can be detected and repaired automatically, efficiently, accurately, and without omission, which suits the need for fast processing of massive amounts of audio, saves a great deal of labor and time, and has high economic value and technical advantages.
It should be noted that the contents shown in FIG. 1, FIG. 4, and FIG. 5 are examples and do not limit the embodiments of the present invention, because this application does not limit the number of processing units contained in the buffer module, the length of a processing unit, the length of the audio signal segment acquired during buffer relocation, the lengths of the divided intervals, the source of the audio signal, the audio repair device, and so on. For example, the buffer module may contain 5 processing units or 7; the length of the audio signal segment may be 4 or 6 processing units; the length of the first sub-interval obtained by dividing the audio signal segment may be 1/4 or 1/2 of a processing unit; the audio signal may be obtained by direct recording or in any other way, such as reception over the Internet; and the audio repair device that processes the audio signal may be a mobile phone, a computer, a server, or any other terminal device.
Referring to FIG. 2, which is a schematic flowchart of an audio repair method provided by an embodiment of this application, as shown in FIG. 2 the audio repair method may include:
201: Sequentially input multiple audio frames into a buffer module, where the buffer module is composed of multiple processing units connected in sequence, and the processing unit located at the center of the multiple processing units is the central processing unit.
In this embodiment, multiple audio frames are first input into the buffer module sequentially and continuously. These audio frames are all or part of the audio frames obtained by framing one audio signal, so they are consecutive; they are input continuously into the head processing unit of the buffer module in the order they had in the audio signal before framing, and then transferred in sequence to the processing units connected to the head processing unit. It should be noted that the buffer module contains multiple processing units connected in sequence; the processing unit at the first position is the head processing unit, and the processing unit at the center is the central processing unit. The audio signal and the multiple audio frames are all time-domain signals.
It should be noted that the length of a processing unit in the buffer module may be set to any value, generally at least the length of two audio frames. For example, when the length of a processing unit is the length of two audio frames, adjacent audio frames overlap by 50% during processing, which avoids the truncation effect and makes the signal processing result smoother.
For example, FIG. 4 is a schematic diagram of the structure of a buffer module. Taking a buffer module containing 5 processing units as an example, the processing unit in the middle is the central processing unit, the processing unit where audio frames are input is the head processing unit, each processing unit contains two audio frames, and the whole buffer module contains 10 audio frames in total. As shown in FIG. 4, a single processing unit is drawn as a bold black solid-line rectangle containing a dashed line, and the two numbers in the rectangle represent the numbers of the corresponding input audio frames. In the initial state, no audio frame has been input into any processing unit, so the signals in the buffer module are all 0. After the 1st audio frame is input into the head processing unit at the right end of the buffer module, the head processing unit contains 0 and the 1st audio frame; when frames have been input continuously up to the 10th frame, the processing unit at the center of the buffer contains the 5th and 6th audio frames.
It can be seen that after the audio signal is framed, subsequent steps can process the audio signal in units of audio frames. The advantage of this is that it can meet the needs of real-time audio processing: already-repaired audio frames can be output while the rest of the audio signal is still being repaired.
In another implementable embodiment, before the multiple audio frames are sequentially input into the buffer module, the audio signal to be repaired is acquired and then framed to obtain the multiple audio frames input into the buffer module. The audio signal includes recording signals and synthesized electronic sound signals.
In this embodiment, before the multiple audio frames are input into the buffer module, the audio signal to be repaired is acquired first, and the multiple audio frames are obtained only after this audio signal is framed. The audio signal may be recorded by this audio repair device or obtained from another terminal device over the Internet; it includes recording signals and synthesized electronic sound signals. A recording signal includes external sound (e.g., a telephone recording) recorded by the local audio repair device or another terminal device through a peripheral device such as a microphone; a synthesized electronic sound signal is an electronic sound synthesized by the local audio repair device or another terminal device through audio synthesis software (e.g., robot singing).
It should be noted that the format, size, and number of channels of the audio signal are not limited: the format may be any one of the audio formats wav, mp3, and flac, and the channels may be mono, stereo, or multi-channel.
202: Take at least one audio frame contained in the central processing unit as a target frame.
In this embodiment, after every processing unit in the buffer module is full of audio frames, all the audio frames contained in the central processing unit of the buffer module are taken as the target frame, where one processing unit may contain at least one audio frame.
It can be seen that since, after the multiple audio frames are input into the buffer module, the audio frames are passed in sequence from the head processing unit to the other processing units, every audio frame becomes the target frame for subsequent noise point detection and repair once it reaches the central processing unit; this embodiment can therefore process every audio frame without omission. Each audio frame is very short: the length of an audio frame is generally 20 to 50 milliseconds, which is shorter than a phoneme yet contains enough vibration periods to meet signal processing needs; the frame length may be set to 20, 25, 30, 32, 40, or 50 milliseconds. By processing the audio signal in units of audio frames, this application can greatly improve detection efficiency in an almost carpet-search manner.
203:根据上述缓存模块中的多个音频帧的音频特征,检测上述目标帧中表现为短时高能量脉冲的噪声点。
在本申请实施例中,提取上述缓存模块中的多个音频帧的音频特征,然后进行对比,然后根据对比情况判断目标帧中是否含有表现为短时高能量脉冲的噪声点。具体的,首先确定目标帧的峰值点,即幅度值为最大值的点,并以该 峰值点为中心,在缓存模块中获取预设长度的音频信号段,然后将音频信号段划分为多段区间,最后分别提取该目标帧和该多段区间的音频特征,并根据目标帧和多段区间的音频特征在目标帧中确定噪声点。
需要说明的是,音频特征包括峰值、信号能量、平均功率、局部峰值占比、自相关系数的滚降速率、声强和峰值持续时间中的至少一种。其中。峰值指的是区间内最大的幅度值;信号能量指的是信号的幅度值的平方的积分;平均功率指的是信号在有限区间或者一个周期内的功率的平均值;局部峰值比指的是信号的峰值在所有各个信号的峰值之和的占比;自相关系数的滚降速率指的是信号的自相关系数的下降的速率;声强指的是单位时间内通过垂直于声波传播方向的单位面积的能量,与声波振幅的平方成正比;峰值持续时间指的是信号的峰值的能量大于等于预设值所维持的时间。
还需要说明的是,多段区间包含第一处理区间、第二处理区间,以及位于第一处理区间和第二处理区间之间的中间处理区间,中间处理区间包括第一子区间、第二子区间,以及位于所述第一子区间和所述第二子区间的之间的中心子区间。
举例来说,缓存重定位如图5所示,假设缓存模块包含5个处理单元,确定中心处理单元中的目标帧中的峰值点为A点,以该峰值点为中心在缓存模块中获取预设长度为四个处理单元的音频信号段,然后对该音频信号段进行划分,得到多段区间,包含第一处理区间、中间处理区间和第二处理区间,其中,第一处理区间和第二处理区间的长度为一个音频帧,中间处理区间的长度为两个音频帧,而中间处理包含第一子区间、中心子区间和第二子区间,第一子区间和第二子区间的长度为1/4个处理单元长度,中心子区间的长度为3/2个处理单元长度。在划分得到该多段区间之后,提取该多段区间和目标帧的音频特征,并根据该多段区间和目标帧的音频特征来确定目标帧中的峰值点是否为噪声点,判定的标准如下:
第一个判断,判断目标帧的峰值点的幅度值是否同时大于中心子区间的峰值点的幅度值以及中间处理区间的峰值点的幅度值,该判断用于判断目标帧的峰值点的幅度值是否在相邻信号中唯一且最大;
第二个判断,判断目标帧的峰值点的幅度值是否同时大于第一子区间的峰 值点的幅度值和第二子区间的峰值点的幅度值,且大于的部分超过第一阈值,该判断用于判断目标帧的峰值点的幅度值是否比相邻信号明显凸起;
第三个判断,判断中间处理区间的信号能量是否大于第二阈值,该判断用于判断目标帧的峰值点的能量是否过大;
第四个判断,判断中间处理区间的平均功率与音频信号段的平均功率的比值是否大于第三阈值,判断用于判断目标帧的峰值点所占信噪比是否过大;
第五个判断,判断目标帧的峰值点的幅度值与音频信号段的峰值点的总幅度值的比值是否大于第四阈值,该判断用于判断目标帧的峰值点的幅度值在音频信号段的各个区间的峰值点的幅度值之和的占比是否过大;
第六个判断,判断音频信号段的自相关系数的滚降速率是否大于第四阈值,该判断用于判断目标帧的峰值点是否表现为一段尖锐脉冲信号,反之则是一段连续脉冲信号;
第七个判断,判断中心处理区间的声强是否大于第一处理区间的声强和第二处理区间的声强,该判断用于判断目标帧的峰值点是否表现为高能量脉冲;
第八个判断,判断目标帧的峰值持续时间是否小于第五阈值,该判断用于判断目标帧的峰值点是否表现为短时脉冲;
需要注意的是,本申请实施例通过串行进行上述八个判断来判断目标帧中的峰值点是否为噪声点,若上述八个判断的结果都是肯定结果,则确定目标帧的峰值点为噪声点,但只要上述任意一个判断的结果是否定结果的话,目标帧的峰值点就不为噪声点。
可见，本申请实施例主要是判断目标帧的峰值点是否为噪声点。由于前述音频帧的长度很短，即使目标帧包含多个音频帧，目标帧中包含两个及以上噪声点的可能性也极小，于是结合需要检测的噪声点短时高能量的特性，本申请只需要确定目标帧中的峰值点是否为噪声点，既可以毫无遗漏，又可以非常快速地定位到噪声点，从而提高了检测的效率和精确度。
204:修复上述目标帧,该修复用于去除目标帧中的噪声点。
本申请实施例中，在确定上述目标帧的峰值点为噪声点之后，对上述目标帧进行修复，修复的过程包括去除噪声点，以及使去除了噪声点的目标帧在时域上和频域上都变得平滑。具体的，在去除噪声点的时候，先采用线性预测算法及相邻采样点叠加算法中的任意一种估计算法，来估计目标帧在未受到噪声干扰之前噪声点处的正常值，并将该噪声点的幅度值替换为估计得到的正常值，然后对目标帧进行时域平滑，使得目标帧在时域上连续，最后再对目标帧进行频域滤波，使得目标帧在频域上连续。
需要说明的是，上述时域平滑指的是在目标帧中被替换了幅度值的噪声点两侧的端点处进行平滑，采用的方法是均值滤波，即用两侧端点各自临近采样点的均值代替该两个端点的值，通过这种方法可以使峰值替换后的目标帧随时间的变化更加平滑。
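对替换段两侧端点做均值滤波的时域平滑可示意如下（numpy 草图，窗口长度为假设值）：

```python
import numpy as np

def smooth_endpoints(x, start, end, win=2):
    """对 x[start:end] 被替换后的两个边界端点取邻近均值，使时域过渡平滑（示意）。"""
    y = x.astype(float).copy()
    for idx in (start, end - 1):                  # 替换段两侧的端点
        lo, hi = max(idx - win, 0), min(idx + win + 1, len(y))
        y[idx] = np.mean(y[lo:hi])                # 用临近采样点的均值代替端点值
    return y

x = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 0.0])      # 假设下标 2~3 为替换后的样本段
y = smooth_endpoints(x, 2, 4)
```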
还需要说明的是，上述频域滤波指的是使目标帧在频域上变得平滑。由于目标帧在噪声点的能量比相邻的音频帧的能量偏大，甚至会出现破音的情况，特别是在较高频段更加明显，并且经过上面峰值替换和时域平滑的步骤之后，可能使得目标帧在高频段（如16kHz以上）更加突兀，因此需要在对目标帧进行时域平滑之后再进行频域平滑。本申请实施例中采用的频域平滑的方法是采用零相移数字滤波器对目标帧进行低通滤波，并且低通滤波器的截止频率等于在分帧之前的音频信号的平均频谱高度。这样做的优势在于，相比于分帧之前的音频信号能量很弱、或者无能量的高频段区间，噪声点修复之后的目标帧不会增加新的修复痕迹，即处理前后录音信号在频域上有很好的一致性。
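零相移低通滤波可以借助前向-后向滤波实现（例如 scipy.signal.filtfilt），如下草图所示（采样率、截止频率与滤波器阶数均为假设值；文中的截止频率取分帧之前音频信号的平均频谱高度）：

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 44100                   # 假设采样率
cutoff = 8000                # 假设的截止频率
b, a = butter(4, cutoff / (fs / 2), btype="low")   # 4 阶巴特沃斯低通（阶数为假设）

t = np.arange(0, 0.05, 1 / fs)
# 440Hz 基音叠加一个 18kHz 的高频分量，模拟高频段突兀的情况
x = np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 18000 * t)
y = filtfilt(b, a, x)        # 前向+后向各滤一次，总相移为零，不引入时延

X = np.abs(np.fft.rfft(x))   # 对比滤波前后的频谱：440Hz 在第 22 个频点，18kHz 在第 900 个频点
Y = np.abs(np.fft.rfft(y))
```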
在另一种可实现的实施方式中,虽然上述提到的线性预测算法和相邻采样点叠加算法中的任意一种算法都可以用于估算噪声点的正常值,但是两种方法各有优点,前者的特点是将信号过去的采样点基于最小均方误差准则获得预测值,计算量大、处理效果平滑,适用于离线非实时系统的应用场景;后者的特点是对相邻采样点进行幂指数下降获得预测值,计算量小、处理效果适中,适用于在线实时系统的应用场景。基于两种方法不同的优点,本申请实施例的设备可以根据应用场景来在该两种方法中进行选择,在终端实时系统里,由于对实时性要求较高,可以选择基于相邻采样点叠加的方法进行峰值替换;在本地离线系统里,由于对实时性无过高要求,保证处理性能,可以选择基于线性预测的方法进行峰值替换。
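其中适用于实时场景的相邻采样点叠加（幂指数下降）一类估计可示意如下（衰减系数 alpha 与取点方式均为假设；线性预测算法可由基于最小均方误差准则的 LPC 实现，此处从略）：

```python
import numpy as np

def estimate_by_neighbor_decay(x, noise_idx, width=1, alpha=0.6):
    """用噪声点两侧相邻采样点按幂指数衰减叠加，估计未受干扰时的正常值（示意）。"""
    left = x[noise_idx - width] if noise_idx - width >= 0 else 0.0
    right = x[noise_idx + width] if noise_idx + width < len(x) else 0.0
    # 两侧邻点按 alpha 的幂次衰减后取平均，作为噪声点处的预测值
    return alpha ** width * (left + right) / 2

x = np.array([0.1, 0.2, 5.0, 0.2, 0.1])    # 假设下标 2 处为短时高能量脉冲
normal = estimate_by_neighbor_decay(x, 2)
x_fixed = x.copy()
x_fixed[2] = normal                         # 将噪声点的幅度值替换为估计的正常值
```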
在另一种可实现的实施方式中，上述目标帧被修复之后，以预设格式输出修复之后的目标帧，预设格式为音频格式wav、音频格式mp3、音频格式flac中的任意一种。用户可以对该预设格式进行设置，本申请不对该预设格式进行限定。
本申请实施例将多个音频帧依次输入到缓存模块中,然后把缓存模块中的中心处理单元中的音频帧作为目标帧,并根据缓存模块中的多个音频帧的音频特征,来确定目标帧中的噪声点,最后修复该目标帧。可见,本申请实施例至少包含以下几个发明点,首先本申请实施例通过将多个音频帧连续的输入到缓存模块中,并依次对缓存模块中的中心处理单元中的音频帧进行处理,来毫不遗漏且准确的检测和修复到每个音频帧中的噪声点,其次本申请实施例通过将目标帧的音频特征,与目标帧相邻的音频帧的音频特征进行比较,可以准确的检测出目标帧中的噪声点,最后本申请实施例除了可以检测出上述噪声点,还可以去除上述噪声点。因此,本申请实施例可以自动地对大量音频信号进行修复,提供了一种高效、准确以及快速的音频修复方法。
参见图3，是本申请实施例提供的另一种音频修复方法的示意流程图，如图3所示，该音频修复方法可包括：
301:获取待修复的音频信号,该音频信号包括录音信号。
本申请实施例中，获取待修复的音频信号，该音频信号包括录音信号和电子音合成信号。该音频信号可以是本音频修复设备录制的音频信号，也可以是通过互联网从其他终端设备处获取的音频信号。其中，录音信号包括本端的音频修复设备或者其他终端设备通过外设设备（例如麦克风）录制的外界声音（例如电话录音），电子音合成信号为本端的音频修复设备或者其他终端设备通过音频合成软件合成的电子音（例如机器人唱歌）。
需要说明的是，上述音频信号的格式、大小、声道数不限，格式可以为音频格式wav、音频格式mp3、音频格式flac中的任意一种，声道可以为单声道、双声道和多声道中的任意一种。
302:对上述音频信号进行分帧得到多个音频帧。
本实施方式中，在获取到需要修复的音频信号之后，对该音频信号进行分帧，得到上述多个音频帧。
303:将多个音频帧依次输入到缓存模块中,缓存模块由多个处理单元顺序组成,位于该多个处理单元的中心位置的处理单元为中心处理单元。
在本申请实施例中,首先将多个音频帧依次且连续的输入到缓存模块中,该多个音频帧是将一个音频信号分帧之后得到的全部音频帧或者部分音频帧,于是该多个音频帧之间是连续的,并按照在分帧之前在音频信号中的顺序来连续输入到缓存模块的头部处理单元,然后顺序传输到与该头部处理单元相连接的处理单元,需要说明的是,缓存模块包含多个处理单元,且该多个处理单元是顺序连接的,位于首位的处理单元为头部处理单元,位于中心位置的处理单元为中心处理单元。其中,上述音频信号和上述多个音频帧都是时域信号。
需要说明的是，缓存模块中的处理单元的长度可以被设置为任意长度值，一般来说，可以设置为至少两个音频帧的长度。例如，当处理单元的长度为两个音频帧时，在对音频帧处理的过程中相邻音频帧之间有50%的信号重叠，从而避免了截断效应，使得信号处理的结果更加平滑。
举例来说，如图4所示的是一种缓存模块的结构示意图，以缓存模块包含5块处理单元为例，位于中间位置的处理单元为中心处理单元，音频帧输入的处理单元为头部处理单元，每块处理单元包含两个音频帧，整个缓存模块共包括10个音频帧。如图4所示，单个处理单元为一块包含虚线的黑色加粗实线矩形框，框中的两个数字分别代表着对应输入的音频帧的编号。在初始状态，缓存模块各个处理单元都没有音频帧输入，因此缓存模块中的信号都为0，当缓存模块的右端处的头部处理单元输入第1个音频帧之后，头部处理单元包括了0和第1个音频帧；当连续输入到第10帧信号时，中心处理单元包括的是第5帧和第6帧音频帧。
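上述缓存的滑动过程可以用如下 Python 草图模拟（以 5 个处理单元、每单元 2 帧为例，帧编号仅作示意）：

```python
from collections import deque

UNIT_FRAMES = 2      # 每个处理单元包含的音频帧数
NUM_UNITS = 5        # 处理单元个数
# 初始状态各处理单元均无音频帧输入，信号记为 0
buf = deque([0] * (UNIT_FRAMES * NUM_UNITS), maxlen=UNIT_FRAMES * NUM_UNITS)

# 从右端头部处理单元依次输入第 1~10 帧，旧帧顺序向左传递
for frame_id in range(1, 11):
    buf.append(frame_id)

# 中心处理单元为第 3 个单元，对应缓存中下标 4、5 的两帧
center = list(buf)[4:6]
```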
可见,音频信号分帧之后,后续步骤就可以以音频帧为单位对音频信号进行处理,这样做的好处是可以满足音频实时处理的需要,能一边对音频信号进行修复,一边输出音频信号已经修复好了的音频帧。
304:将上述中心处理单元中所包含的至少一个音频帧作为目标帧。
在本申请实施例中，在上述缓存模块中每个处理单元都充满了音频帧之后，将缓存模块中的中心处理单元中包含的所有音频帧作为目标帧，其中，一个处理单元中可以包含至少一个音频帧。
可见,由于本申请实施例在将上述多个音频帧输入到缓存模块中之后,音频帧是由头部处理单元依次顺序输入到其他处理单元,于是当音频帧输入到中心处理单元之后,就被作为目标帧进行后续的噪声点检测和修复等处理,于是本申请实施例可以毫无遗漏地对每个音频帧进行处理,而每个音频帧是非常短的,一般音频帧的长度为20毫秒至50毫秒,即小于一个音素的长度,又包含足够多的振动周期,能满足信号处理的需求,音频帧的长度可以设置为20毫秒、25毫秒、30毫秒、32毫秒、40毫秒和50毫秒不等的长度,于是本申请通过以音频帧为单位来对音频信号进行处理,可以以几乎是地毯式搜索的方式来大大提高检测的效率。
305:确定上述目标帧的峰值点。
在本申请实施例中,确定目标帧的峰值点,即幅度值为最大值的点。
举例来说,如图5所示的缓存模块包含5个处理单元,确定中心处理单元中的目标帧中的峰值点为A点。
306:以上述目标帧的峰值点为中心,在缓存模块中获取预设长度的音频信号段。
在本申请实施例中，以目标帧的峰值点为中心，在缓存模块中获取预设长度的音频信号段。该预设长度可以是任意预设值。
举例来说,如图5所示,以目标帧的峰值点为中心在缓存模块中获取预设长度为四个处理单元的音频信号段。
307:将上述音频信号段划分为多段区间。
在本申请实施例中，将上述音频信号段划分为多段区间。
多段区间包含第一处理区间、第二处理区间，以及位于第一处理区间和第二处理区间之间的中间处理区间，中间处理区间包括第一子区间、第二子区间，以及位于所述第一子区间和所述第二子区间之间的中心子区间。
举例来说，如图5所示，在获取到预设长度为四个处理单元的音频信号段之后，对该音频信号段进行划分，得到多段区间，包含第一处理区间、中间处理区间和第二处理区间，其中，第一处理区间和第二处理区间的长度为一个处理单元，中间处理区间的长度为两个处理单元，而中间处理区间包含第一子区间、中心子区间和第二子区间，第一子区间和第二子区间的长度为1/4个处理单元长度，中心子区间的长度为3/2个处理单元长度。
308:分别提取上述目标帧和上述多段区间的音频特征。
本申请实施例中，在将音频信号段重新划分为多段区间之后，分别提取目标帧和该多段区间的音频特征，音频特征包括峰值、信号能量、平均功率、局部峰值占比、自相关系数的滚降速率、声强和峰值持续时间中的至少一种。其中，峰值指的是区间内最大的幅度值；信号能量指的是信号的幅度值的平方的积分；平均功率指的是信号在有限区间或者一个周期内的功率的平均值；局部峰值占比指的是信号的峰值在各个区间的峰值之和中的占比；自相关系数的滚降速率指的是信号的自相关系数下降的速率；声强指的是单位时间内通过垂直于声波传播方向的单位面积的能量，与声波振幅的平方成正比；峰值持续时间指的是信号的峰值的能量大于等于预设值所维持的时间。
309:根据上述目标帧和上述多段区间的音频特征在上述目标帧中确定噪声点。
在本申请实施例中,根据该多段区间和目标帧的音频特征来确定目标帧中的峰值点是否为噪声点,判定的标准如下:
第一个判断,判断目标帧的峰值点的幅度值是否同时大于中心子区间的峰值点的幅度值以及中间处理区间的峰值点的幅度值,该判断用于判断目标帧的峰值点的幅度值是否在相邻信号中唯一且最大;
第二个判断,判断目标帧的峰值点的幅度值是否同时大于第一子区间的峰值点的幅度值和第二子区间的峰值点的幅度值,且大于的部分超过第一阈值,该判断用于判断目标帧的峰值点的幅度值是否比相邻信号明显凸起;
第三个判断,判断中间处理区间的信号能量是否大于第二阈值,该判断用于判断目标帧的峰值点的能量是否过大;
第四个判断，判断中间处理区间的平均功率与音频信号段的平均功率的比值是否大于第三阈值，该判断用于判断目标帧的峰值点所占信噪比是否过大；
第五个判断,判断目标帧的峰值点的幅度值与音频信号段的峰值点的总幅度值的比值是否大于第四阈值,该判断用于判断目标帧的峰值点的幅度值在音频信号段的各个区间的峰值点的幅度值之和的占比是否过大;
第六个判断,判断音频信号段的自相关系数的滚降速率是否大于第四阈值,该判断用于判断目标帧的峰值点是否表现为一段尖锐脉冲信号,反之则是一段连续脉冲信号;
第七个判断,判断中心处理区间的声强是否大于第一处理区间的声强和第二处理区间的声强,该判断用于判断目标帧的峰值点是否表现为高能量脉冲;
第八个判断,判断目标帧的峰值持续时间是否小于第五阈值,该判断用于判断目标帧的峰值点是否表现为短时脉冲;
需要注意的是，本申请实施例通过串行进行上述八个判断来判断目标帧中的峰值点是否为噪声点：若上述八个判断的结果都是肯定的，则确定目标帧的峰值点为噪声点；只要其中任意一个判断的结果是否定的，则判定目标帧的峰值点不是噪声点。
可见，本申请实施例主要是判断目标帧的峰值点是否为噪声点。由于前述音频帧的长度很短，即使目标帧包含多个音频帧，目标帧中包含两个及以上噪声点的可能性也极小，于是结合需要检测的噪声点短时高能量的特性，本申请只需要确定目标帧中的峰值点是否为噪声点，既可以毫无遗漏，又可以非常快速地定位到噪声点，从而提高了检测的效率和精确度。
310:修复上述目标帧,该修复用于去除目标帧中的噪声点。
本申请实施例中，在确定上述目标帧的峰值点为噪声点之后，对上述目标帧进行修复，修复的过程包括去除噪声点，以及使去除了噪声点的目标帧在时域上和频域上都变得平滑。具体的，在去除噪声点的时候，先采用线性预测算法及相邻采样点叠加算法中的任意一种估计算法，来估计目标帧在未受到噪声干扰之前噪声点处的正常值，并将该噪声点的幅度值替换为估计得到的正常值，然后对目标帧进行时域平滑，使得目标帧在时域上连续，最后再对目标帧进行频域滤波，使得目标帧在频域上连续。
需要说明的是，上述时域平滑指的是在目标帧中被替换了幅度值的噪声点两侧的端点处进行平滑，采用的方法是均值滤波，即用两侧端点各自临近采样点的均值代替该两个端点的值，通过这种方法可以使峰值替换后的目标帧随时间的变化更加平滑。
还需要说明的是，上述频域滤波指的是使目标帧在频域上变得平滑。由于目标帧在噪声点的能量比相邻的音频帧的能量偏大，甚至会出现破音的情况，特别是在较高频段更加明显，并且经过上面峰值替换和时域平滑的步骤之后，可能使得目标帧在高频段（如16kHz以上）更加突兀，因此需要在对目标帧进行时域平滑之后再进行频域平滑。本申请实施例中采用的频域平滑的方法是采用零相移数字滤波器对目标帧进行低通滤波，并且低通滤波器的截止频率等于在分帧之前的音频信号的平均频谱高度。这样做的优势在于，相比于分帧之前的音频信号能量很弱、或者无能量的高频段区间，噪声点修复之后的目标帧不会增加新的修复痕迹，即处理前后录音信号在频域上有很好的一致性。
在另一种可实现的实施方式中,虽然上述提到的线性预测算法和相邻采样点叠加算法中的任意一种算法都可以用于估算噪声点的正常值,但是两种方法各有优点,前者的特点是将信号过去的采样点基于最小均方误差准则获得预测值,计算量大、处理效果平滑,适用于离线非实时系统的应用场景;后者的特点是对相邻采样点进行幂指数下降获得预测值,计算量小、处理效果适中,适用于在线实时系统的应用场景。基于两种方法不同的优点,本申请实施例的设备可以根据应用场景来在该两种方法中进行选择,在终端实时系统里,由于对实时性要求较高,可以选择基于相邻采样点叠加的方法进行峰值替换;在本地离线系统里,由于对实时性无过高要求,保证处理性能,可以选择基于线性预测的方法进行峰值替换。
311:以预设格式输出修复之后的目标帧。
本申请实施例中,上述目标帧被修复之后,以预设格式输出修复之后的目标帧,预设格式为音频格式wav、音频格式mp3、音频格式flac中的任意一种。用户可以对该预设格式进行设置,本申请不对该预设格式进行限定。
相比于上一申请实施例，本申请实施例更加详细地描述了本音频修复方法的过程：先获取音频信号，将音频信号分帧之后得到的多个音频帧输入到缓存模块中，然后将缓存模块中的中心处理单元中的音频帧作为目标帧，并确定目标帧中的峰值点，以该峰值点为中心在缓存模块中获取预设长度的音频信号段，然后再对该音频信号段重新进行划分，得到多段区间，根据缓存模块中其他音频帧的音频特征，来确定目标帧中的噪声点，最后修复并输出该目标帧。因此，本申请实施例可以自动地对大量音频信号进行修复，并提供了一种高效、准确以及快速的音频修复方法。
需要说明的是,上文对各个实施例的描述倾向于强调各个实施例之间的不同之处,其相同或相似之处可以互相参考,为了简洁,本文不再赘述。
本申请实施例还提供一种音频修复设备，该音频修复设备包括用于执行前述任一项音频修复方法的单元。具体地，参见图6，是本申请实施例提供的一种音频修复设备的示意框图。本实施例的音频修复设备包括：输入单元601、获取单元602、检测单元603以及修复单元604。具体的：
输入单元601,用于将多个音频帧依次输入到缓存模块中,所述缓存模块由多个处理单元顺序组成,位于所述多个处理单元的中心位置的处理单元为中心处理单元;
获取单元602,用于将所述中心处理单元中所包含的至少一个音频帧作为目标帧;
检测单元603,用于根据所述缓存模块中的多个音频帧的音频特征,检测所述目标帧中表现为短时高能量脉冲的噪声点;
修复单元604,用于修复所述目标帧,所述修复用于去除所述目标帧中的噪声点。
在另一种可实现的实施方式中,所述音频修复设备还包括确定单元605,用于确定所述目标帧的峰值点;所述获取单元602,还用于以所述峰值点为中心,在所述缓存模块中获取预设长度的音频信号段;所述音频修复设备还包括分段单元606,用于将所述音频信号段划分为多段区间,包括第一处理区间、第二处理区间,以及位于所述第一处理区间和所述第二处理区间之间的中间处理区间,所述中间处理区间包括第一子区间、第二子区间,以及位于所述第一子区间和所述第二子区间的之间的中心子区间;所述音频修复设备还包括提取单元607,用于分别提取所述目标帧和所述多段区间的音频特征,所述音频特征包括峰值、信号能量、平均功率、局部峰值占比、自相关系数的滚降速率、声强和峰值持续时间中的至少一种;所述确定单元605,还用于根据所述目标帧和所述多段区间的音频特征在所述目标帧中确定所述噪声点。
具体的,上述确定单元605,具体用于判断所述目标帧的峰值点的幅度值是否同时大于所述中心子区间的峰值点的幅度值以及所述中间处理区间的峰值点的幅度值;且,判断所述目标帧的峰值点的幅度值是否同时大于所述第一子区间的峰值点的幅度值和所述第二子区间的峰值点的幅度值,且大于的部分超过第一阈值;且,判断所述中间处理区间的信号能量是否大于第二阈值;且,判断所述中间处理区间的平均功率与所述音频信号段的平均功率的比值是否大于第三阈值;且,判断所述目标帧的峰值点的幅度值与所述音频信号段的峰值点的总幅度值的比值是否大于第四阈值;且,判断所述音频信号段的自相关系数的滚降速率是否大于第四阈值;且,判断所述中心处理区间的声强是否大于所述第一处理区间的声强和所述第二处理区间的声强;且,判断所述目标帧的峰值持续时间是否小于第五阈值;若是,则确定所述目标帧的峰值点为所述噪声点。
在另一种可实现的实施方式中,所述音频修复设备还包括估计单元608,用于采用估计算法估计所述目标帧在未受到噪声干扰之前,所述目标帧的噪声点的正常值;所述音频修复设备还包括替换单元609,用于将所述噪声点的幅度值替换为所述正常值;所述音频修复设备还包括平滑单元610,用于对所述目标帧进行时域平滑,使得所述目标帧在时域上连续;所述平滑单元610,还用于对所述目标帧进行频率滤波,使得所述目标帧在频域上连续。
需要说明的是,所述估计算法包括线性预测算法及相邻采样点叠加算法中的任意一种。
在另一种可实现的实施方式中,上述获取单元602,用于获取待修复的音频信号,所述音频信号包括录音信号;所述音频修复设备还包括分帧单元611,用于对所述音频信号进行分帧得到所述多个音频帧。
在另一种可实现的实施方式中，所述音频修复设备还包括输出单元612，用于以预设格式输出修复之后的目标帧，所述预设格式为音频格式wav、音频格式mp3、音频格式flac中的任意一种。
本申请实施例通过输入单元将多个音频帧依次输入到缓存模块中，获取单元将缓存模块中的中心处理单元中的音频帧作为目标帧，检测单元根据缓存模块中其他音频帧的音频特征，来确定目标帧中的噪声点，最后修复单元修复该目标帧。可见，本申请实施例至少包含以下几个发明点：首先本申请通过将音频信号分帧为长度很短的多个音频帧，并依次连续地输入到缓存模块中，来毫不遗漏且准确地定位到音频中各个噪声点；其次本申请通过将目标帧的音频特征，与目标帧相邻的音频信号进行比较，可以准确地检测出目标帧中的噪声点；最后本申请除了可以检测出上述噪声点，还可以去除上述噪声点。因此，本申请实施例可以自动地对大量音频信号进行修复，并提供了一种高效、准确以及快速的音频修复方法。
参见图7,是本申请另一实施例提供的一种音频修复设备示意框图。如图所示的本实施例中的音频修复设备可以包括:处理器710、通信接口720、输入设备730、输出设备740和存储器750。上述处理器710、通信接口720、输入设备730、输出设备740和存储器750通过总线760连接。具体的:
处理器710,用于执行输入单元601的功能,用于将多个音频帧依次输入到缓存模块中,所述缓存模块由多个处理单元顺序组成,位于所述多个处理单元的中心位置的处理单元为中心处理单元;还用于执行获取单元602的功能,用于将所述中心处理单元中所包含的至少一个音频帧作为目标帧;还用于执行检测单元603的功能,用于根据所述缓存模块中的多个音频帧的音频特征,检测所述目标帧中表现为短时高能量脉冲的噪声点;还用于执行修复单元604的功能,用于修复所述目标帧,所述修复用于去除所述目标帧中的噪声点。
在另一种可实现的实施方式中，上述处理器710还用于执行确定单元605的功能，用于确定所述目标帧的峰值点；还用于以所述峰值点为中心，在所述缓存模块中获取预设长度的音频信号段；还用于执行分段单元606的功能，用于将所述音频信号段划分为多段区间，包括第一处理区间、第二处理区间，以及位于所述第一处理区间和所述第二处理区间之间的中间处理区间，所述中间处理区间包括第一子区间、第二子区间，以及位于所述第一子区间和所述第二子区间之间的中心子区间；还用于执行提取单元607的功能，用于分别提取所述目标帧和所述多段区间的音频特征，所述音频特征包括峰值、信号能量、平均功率、局部峰值占比、自相关系数的滚降速率、声强和峰值持续时间中的至少一种；还用于根据所述目标帧和所述多段区间的音频特征在所述目标帧中确定所述噪声点。
具体的,上述处理器710,具体用于判断所述目标帧的峰值点的幅度值是否同时大于所述中心子区间的峰值点的幅度值以及所述中间处理区间的峰值点的幅度值;且,判断所述目标帧的峰值点的幅度值是否同时大于所述第一子区间的峰值点的幅度值和所述第二子区间的峰值点的幅度值,且大于的部分超过第一阈值;且,判断所述中间处理区间的信号能量是否大于第二阈值;且,判断所述中间处理区间的平均功率与所述音频信号段的平均功率的比值是否大于第三阈值;且,判断所述目标帧的峰值点的幅度值与所述音频信号段的峰值点的总幅度值的比值是否大于第四阈值;且,判断所述音频信号段的自相关系数的滚降速率是否大于第四阈值;且,判断所述中心处理区间的声强是否大于所述第一处理区间的声强和所述第二处理区间的声强;且,判断所述目标帧的峰值持续时间是否小于第五阈值;若是,则确定所述目标帧的峰值点为所述噪声点。
在另一种可实现的实施方式中,上述处理器710还用于执行估计单元608的功能,用于采用估计算法估计所述目标帧在未受到噪声干扰之前,所述目标帧的噪声点的正常值;上述处理器710还用于执行替换单元609的功能,用于将所述噪声点的幅度值替换为所述正常值;上述处理器710还用于执行平滑单元610的功能,对所述目标帧进行时域平滑,使得所述目标帧在时域上连续,还用于对所述目标帧进行频率滤波,使得所述目标帧在频域上连续。
需要说明的是,所述估计算法包括线性预测算法及相邻采样点叠加算法中的任意一种。
在另一种可实现的实施方式中，输入设备730或者通信接口720用于执行获取单元602的功能，用于获取待修复的音频信号，所述音频信号包括录音信号；上述处理器710还用于执行分帧单元611的功能，用于对所述音频信号进行分帧得到所述多个音频帧。
在另一种可实现的实施方式中,输出设备740用于执行输出单元612的功能,还用于以预设格式输出修复之后的目标帧,所述预设格式为音频格式wav、音频格式mp3、音频格式flac中的任意一种。
应当理解,在本申请实施例中,所称处理器710可以是中央处理单元 (Central Processing Unit,CPU),该处理器710还可以是其他通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现成可编程门阵列(Field-Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。
该存储器750可以包括只读存储器和随机存取存储器,并向处理器710提供指令和数据。存储器750的一部分还可以包括非易失性随机存取存储器。例如,存储器750还可以存储设备类型的信息。
计算机可读存储介质可以是前述任一实施例的音频修复设备的内部存储单元,例如音频修复设备的硬盘或内存。计算机可读存储介质也可以是音频修复设备的外部存储设备,例如音频修复设备上配备的插接式硬盘,智能存储卡(Smart Media Card,SMC),安全数字(Secure Digital,SD)卡,闪存卡(Flash Card)等。进一步地,计算机可读存储介质还可以既包括音频修复设备的内部存储单元也包括外部存储设备。计算机可读存储介质用于存储计算机程序以及音频修复设备所需的其他程序和数据。计算机可读存储介质还可以用于暂时地存储已经输出或者将要输出的数据。
具体实现中,本申请实施例中所描述的处理器710可执行本申请实施例提供的音频修复方法的第二实施例和第三实施例中所描述的实现方式,也可执行本申请实施例所描述的音频修复设备的实现方式,在此不再赘述。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、计算机软件或者二者的结合来实现,为了清楚地说明硬件和软件的可互换性,在上述说明中已经按照功能一般性地描述了各示例的组成及步骤。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同音频修复方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到，为了描述的方便和简洁，上述描述的音频修复设备和单元的具体工作过程，可以参考前述音频修复方法实施例中的对应过程，在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的音频修复设备和音频修复方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另外,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口、装置或单元的间接耦合或通信连接,也可以是电的,机械的或其它的形式连接。
作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本申请实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以是两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分,或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,音频修复设备,或者网络设备等)执行本申请各个实施例方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。

Claims (10)

  1. 一种音频修复方法,其特征在于,包括:
    将多个音频帧依次输入到缓存模块中,所述缓存模块由多个处理单元顺序组成,位于所述多个处理单元的中心位置的处理单元为中心处理单元;
    将所述中心处理单元中所包含的至少一个音频帧作为目标帧;
    根据所述缓存模块中的多个音频帧的音频特征,检测所述目标帧中表现为短时高能量脉冲的噪声点;
    修复所述目标帧,所述修复用于去除所述目标帧中的噪声点。
  2. 根据权利要求1所述的方法,其特征在于,所述根据所述缓存模块中的多个音频帧的音频特征,检测所述目标帧中表现为短时高能量脉冲的噪声点,包括:
    确定所述目标帧的峰值点;
    以所述峰值点为中心,在所述缓存模块中获取预设长度的音频信号段;
    将所述音频信号段划分为多段区间，包括第一处理区间、第二处理区间，以及位于所述第一处理区间和所述第二处理区间之间的中间处理区间，所述中间处理区间包括第一子区间、第二子区间，以及位于所述第一子区间和所述第二子区间之间的中心子区间；
    分别提取所述目标帧和所述多段区间的音频特征,所述音频特征包括峰值、信号能量、平均功率、局部峰值占比、自相关系数的滚降速率、声强和峰值持续时间中的至少一种;
    根据所述目标帧和所述多段区间的音频特征在所述目标帧中确定所述噪声点。
  3. 根据权利要求2所述的方法,其特征在于,所述根据所述目标帧和所述多段区间的音频特征在所述目标帧中确定所述噪声点,包括:
    判断所述目标帧的峰值点的幅度值是否同时大于所述中心子区间的峰值点的幅度值以及所述中间处理区间的峰值点的幅度值;
    且,判断所述目标帧的峰值点的幅度值是否同时大于所述第一子区间的峰值点的幅度值和所述第二子区间的峰值点的幅度值,且大于的部分超过第一阈值;
    且,判断所述中间处理区间的信号能量是否大于第二阈值;
    且,判断所述中间处理区间的平均功率与所述音频信号段的平均功率的比值是否大于第三阈值;
    且,判断所述目标帧的峰值点的幅度值与所述音频信号段的峰值点的总幅度值的比值是否大于第四阈值;
    且,判断所述音频信号段的自相关系数的滚降速率是否大于第四阈值;
    且,判断所述中心处理区间的声强是否大于所述第一处理区间的声强和所述第二处理区间的声强;
    且,判断所述目标帧的峰值持续时间是否小于第五阈值;
    若是,则确定所述目标帧的峰值点为所述噪声点。
  4. 根据权利要求1所述的方法,其特征在于,所述修复所述目标帧,包括:
    采用估计算法估计所述目标帧在未受到噪声干扰之前,所述目标帧的噪声点的正常值;
    将所述噪声点的幅度值替换为所述正常值;
    对所述目标帧进行时域平滑,使得所述目标帧在时域上连续;
    对所述目标帧进行频率滤波,使得所述目标帧在频域上连续。
  5. 根据权利要求4所述的方法,其特征在于,所述估计算法包括线性预测算法及相邻采样点叠加算法中的任意一种。
  6. 根据权利要求1至5的任意一项所述的方法，其特征在于，所述将多个音频帧依次输入到缓存模块中之前，还包括：
    获取待修复的音频信号,所述音频信号包括录音信号;
    对所述音频信号进行分帧得到所述多个音频帧。
  7. 根据权利要求1至6任意一项所述的方法,其特征在于,所述修复所述目标帧之后,还包括:
    以预设格式输出修复之后的目标帧,所述预设格式包括wav、mp3和flac中的任意一种。
  8. 一种音频修复设备,其特征在于,包括:
    输入单元,用于将多个音频帧依次输入到缓存模块中,所述缓存模块由多个处理单元顺序组成,位于所述多个处理单元的中心位置的处理单元为中心处理单元;
    获取单元,用于将所述中心处理单元中所包含的至少一个音频帧作为目标帧;
    检测单元,用于根据所述缓存模块中的多个音频帧的音频特征,检测所述目标帧中表现为短时高能量脉冲的噪声点;
    修复单元,用于修复所述目标帧,所述修复用于去除所述目标帧中的噪声点。
  9. 一种音频修复设备，其特征在于，包括处理器、通信接口、输入设备、输出设备和存储器，所述处理器、通信接口、输入设备、输出设备和存储器相互连接，其中，所述存储器用于存储计算机程序，所述计算机程序包括程序指令，所述处理器被配置用于调用所述程序指令，用以执行如权利要求1-7任一项所述的方法。
  10. 一种计算机可读存储介质，其特征在于，所述计算机可读存储介质存储有计算机程序，所述计算机程序包括程序指令，所述程序指令被处理器执行，用以执行如权利要求1-7任一项所述的方法。
PCT/CN2019/093719 2019-05-13 2019-06-28 一种音频修复方法、设备及可读存储介质 WO2020228107A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/627,103 US11990150B2 (en) 2019-05-13 2019-06-28 Method and device for audio repair and readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910397254.4 2019-05-13
CN201910397254.4A CN110136735B (zh) 2019-05-13 2019-05-13 一种音频修复方法、设备及可读存储介质

Publications (1)

Publication Number Publication Date
WO2020228107A1 true WO2020228107A1 (zh) 2020-11-19

Family

ID=67573554

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/093719 WO2020228107A1 (zh) 2019-05-13 2019-06-28 一种音频修复方法、设备及可读存储介质

Country Status (3)

Country Link
US (1) US11990150B2 (zh)
CN (1) CN110136735B (zh)
WO (1) WO2020228107A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110136735B (zh) * 2019-05-13 2021-09-28 腾讯音乐娱乐科技(深圳)有限公司 一种音频修复方法、设备及可读存储介质
CN111583943A (zh) * 2020-03-24 2020-08-25 普联技术有限公司 音频信号的处理方法、装置、安防摄像头及存储介质
CN112071331B (zh) * 2020-09-18 2023-05-30 平安科技(深圳)有限公司 语音文件修复方法、装置、计算机设备及存储介质
CN112525337B (zh) * 2020-11-18 2023-06-02 西安因联信息科技有限公司 一种针对机械压力机振动监测数据预处理方法

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160260442A1 (en) * 2015-03-02 2016-09-08 Faraday Technology Corp. Method and apparatus for detecting noise of audio signals
CN107346665A (zh) * 2017-06-29 2017-11-14 广州视源电子科技股份有限公司 音频检测的方法、装置、设备以及存储介质
CN108449497A (zh) * 2018-03-12 2018-08-24 广东欧珀移动通信有限公司 语音通话数据处理方法、装置、存储介质及移动终端
CN109087632A (zh) * 2018-08-17 2018-12-25 平安科技(深圳)有限公司 语音处理方法、装置、计算机设备及存储介质
CN109545246A (zh) * 2019-01-21 2019-03-29 维沃移动通信有限公司 一种声音处理方法及终端设备

Family Cites Families (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW224191B (zh) * 1992-01-28 1994-05-21 Qualcomm Inc
US7058572B1 (en) * 2000-01-28 2006-06-06 Nortel Networks Limited Reducing acoustic noise in wireless and landline based telephony
JP3773817B2 (ja) * 2001-07-13 2006-05-10 三洋電機株式会社 ノイズキャンセラ
JP2005197813A (ja) * 2003-12-26 2005-07-21 Pioneer Electronic Corp ノイズ除去装置および受信機
US7139701B2 (en) * 2004-06-30 2006-11-21 Motorola, Inc. Method for detecting and attenuating inhalation noise in a communication system
US8656415B2 (en) * 2007-10-02 2014-02-18 Conexant Systems, Inc. Method and system for removal of clicks and noise in a redirected audio stream
CN101477801B (zh) * 2009-01-22 2012-01-04 东华大学 一种检测和消除数字音频信号中脉冲噪声的方法
CN101882442A (zh) * 2009-05-04 2010-11-10 上海音乐学院 历史音频噪声检测与消除方法
JP5839795B2 (ja) * 2010-12-01 2016-01-06 キヤノン株式会社 撮像装置および情報処理システム
US9092337B2 (en) * 2011-01-31 2015-07-28 Intelligent Intellectual Property Holdings 2 Llc Apparatus, system, and method for managing eviction of data
US8996762B2 (en) * 2012-02-28 2015-03-31 Qualcomm Incorporated Customized buffering at sink device in wireless display system based on application awareness
US20150071463A1 (en) * 2012-03-30 2015-03-12 Nokia Corporation Method and apparatus for filtering an audio signal
CN103534755B (zh) * 2012-04-20 2017-03-01 松下电器(美国)知识产权公司 声音处理装置、声音处理方法、程序及集成电路
CN104143341B (zh) * 2013-05-23 2015-10-21 腾讯科技(深圳)有限公司 爆音检测方法和装置
US9832299B2 (en) * 2013-07-17 2017-11-28 Empire Technology Development Llc Background noise reduction in voice communication
CN104715771B (zh) * 2013-12-12 2018-08-21 展讯通信(上海)有限公司 信号降噪
US10755726B2 (en) * 2015-01-07 2020-08-25 Google Llc Detection and suppression of keyboard transient noise in audio streams with auxiliary keybed microphone
CN106157967A (zh) * 2015-04-28 2016-11-23 杜比实验室特许公司 脉冲噪声抑制
CN105118513B (zh) * 2015-07-22 2018-12-28 重庆邮电大学 一种基于混合激励线性预测MELP的1.2kb/s低速率语音编解码方法
CN107689228B (zh) * 2016-08-04 2020-05-12 腾讯科技(深圳)有限公司 一种信息处理方法及终端
CN110136735B (zh) * 2019-05-13 2021-09-28 腾讯音乐娱乐科技(深圳)有限公司 一种音频修复方法、设备及可读存储介质

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160260442A1 (en) * 2015-03-02 2016-09-08 Faraday Technology Corp. Method and apparatus for detecting noise of audio signals
CN107346665A (zh) * 2017-06-29 2017-11-14 广州视源电子科技股份有限公司 音频检测的方法、装置、设备以及存储介质
CN108449497A (zh) * 2018-03-12 2018-08-24 广东欧珀移动通信有限公司 语音通话数据处理方法、装置、存储介质及移动终端
CN109087632A (zh) * 2018-08-17 2018-12-25 平安科技(深圳)有限公司 语音处理方法、装置、计算机设备及存储介质
CN109545246A (zh) * 2019-01-21 2019-03-29 维沃移动通信有限公司 一种声音处理方法及终端设备

Also Published As

Publication number Publication date
US20220254365A1 (en) 2022-08-11
US11990150B2 (en) 2024-05-21
CN110136735B (zh) 2021-09-28
CN110136735A (zh) 2019-08-16


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19928906

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 03.03.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 19928906

Country of ref document: EP

Kind code of ref document: A1