CN110265064B - Audio frequency crackle detection method, device and storage medium - Google Patents

Info

Publication number
CN110265064B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910506938.3A
Other languages
Chinese (zh)
Other versions
CN110265064A (en
Inventor
陈洲旋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Music Entertainment Technology Shenzhen Co Ltd
Original Assignee
Tencent Music Entertainment Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Music Entertainment Technology Shenzhen Co Ltd
Priority to CN201910506938.3A
Priority to PCT/CN2019/093409 (WO2020248308A1)
Publication of CN110265064A
Application granted
Publication of CN110265064B

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals

Abstract

The embodiments of the application disclose an audio pop detection method, an audio pop detection device, and a storage medium. To detect pops, an audio signal to be detected is obtained and divided into a plurality of frame signals; the short-time energy difference between each two adjacent frame signals is calculated; the frame signals within an interval that satisfies a preset condition are acquired according to the short-time energy difference to obtain an abrupt-change audio signal; and the spectral flatness of the abrupt-change audio signal is calculated. If the spectral flatness is greater than a preset flatness value, the audio signal is determined to contain a pop. The scheme can accurately detect whether an audio signal contains a pop.

Description

Audio frequency crackle detection method, device and storage medium
Technical Field
The application relates to the field of communication technology, and in particular to an audio pop detection method, an audio pop detection device, and a storage medium.
Background
With the continuous development of internet technology, the internet hosts a large number of audio files of many kinds, such as music, speech, audiobooks, and chat recordings. Because audio passes through a series of complicated steps such as recording, processing, transmission, and storage, it may exhibit distortion such as beginning pops, glitches, and breakpoints. The beginning pop is a relatively common distortion: a short pulse at the start of the waveform that sounds like a "click". This harsh, unnatural sound gives the listener a poor experience. Statistics from one song library show that up to 10% of its audio files have a beginning pop, which degrades audio quality. It is therefore important to detect beginning pops accurately.
Disclosure of Invention
The embodiments of the application provide an audio pop detection method, an audio pop detection device, and a storage medium, which can detect whether a pop exists in an audio signal, so that audio files containing pops can be screened out effectively and quickly.
An embodiment of the application provides an audio pop detection method, which comprises the following steps:
acquiring an audio signal to be detected, and dividing the audio signal into a plurality of frame signals;
calculating the short-time energy difference between each two adjacent frame signals;
acquiring, according to the short-time energy difference, the frame signals within an interval that satisfies a preset condition, to obtain an abrupt-change audio signal;
and calculating the spectral flatness of the abrupt-change audio signal, and if the spectral flatness is greater than a preset flatness value, determining that the audio signal contains a pop.
Optionally, in some embodiments, in the audio pop detection method, the dividing the audio signal into a plurality of frame signals includes:
selecting, in the time domain, a signal of a preset duration starting from the first frame of the audio signal, to obtain a beginning audio signal;
and dividing the beginning audio signal into a plurality of frame signals.
Optionally, in some embodiments, in the audio pop detection method, the calculating a short-time energy difference between two adjacent frame signals includes:
calculating the short-time energy of each frame signal;
acquiring the time of each frame signal;
and sequentially calculating the difference between the short-time energies of two adjacent frame signals according to the time sequence of the frame signals to obtain the short-time energy difference of the two adjacent frame signals.
Optionally, in some embodiments, in the audio pop detection method, the acquiring, according to the short-time energy difference, the frame signals within an interval that satisfies a preset condition to obtain an abrupt-change audio signal includes:
acquiring two adjacent frame signals whose short-time energy difference is greater than a preset threshold, and determining the later of the two frame signals, in time order, as a start frame signal;
acquiring, after the start frame signal, two adjacent frame signals whose short-time energy difference is smaller than the negative of the preset threshold, and determining the later of the two frame signals, in time order, as an end frame signal;
and acquiring the signal between the start frame signal and the end frame signal to obtain the abrupt-change audio signal.
Optionally, in some embodiments, in the audio pop detection method, the acquiring, after the start frame signal, two adjacent frame signals whose short-time energy difference is smaller than the negative of the preset threshold, and determining the later of the two frame signals as an end frame signal includes:
after the start frame signal, judging in time order whether each short-time energy difference is smaller than the negative of the preset threshold;
and when a short-time energy difference smaller than the negative of the preset threshold is detected for the first time, determining the later of the two corresponding frame signals, in time order, as the end frame signal.
Optionally, in some embodiments, in the audio pop detection method, the calculating the spectral flatness of the abrupt-change audio signal includes:
detecting a peak position of the abrupt-change audio signal;
taking a fixed number of sampling points before and after the peak position to form a pop audio frame;
and calculating the spectral flatness of the pop audio frame.
Optionally, in some embodiments, in the audio pop detection method, the determining that the audio signal has a pop if the spectral flatness is greater than a preset flatness value includes:
judging whether the spectral flatness is greater than a preset flatness value;
if the spectral flatness is greater than the preset flatness value, determining that the audio signal contains a pop;
and if the spectral flatness is smaller than the preset flatness value, determining that the audio signal does not contain a pop.
Optionally, in some embodiments, in the audio pop detection method, after determining that the audio signal has a pop if the spectral flatness is greater than a preset flatness value, the method further includes:
and returning to the step of obtaining the frame signal meeting the preset condition interval according to the short-time energy difference to obtain the abrupt change audio signal until the detection of the audio signal to be detected is finished.
Correspondingly, an embodiment of the application further provides an audio pop detection device, comprising:
the framing module is used for acquiring an audio signal to be detected and dividing the audio signal into a plurality of frame signals;
the calculating module is used for calculating the short-time energy difference of two adjacent frame signals;
the acquisition module is used for acquiring a frame signal meeting a preset condition interval according to the short-time energy difference to obtain a sudden change audio signal;
and the judging module is used for calculating the spectral flatness of the abrupt-change audio signal, and determining that the audio signal contains a pop if the spectral flatness is greater than a preset flatness value.
Optionally, in some embodiments, in the audio pop detection apparatus, the framing module includes:
the selection submodule is used for selecting, in the time domain, a signal of a preset duration starting from the first frame of the audio signal, to obtain a beginning audio signal;
and the framing submodule is used for dividing the beginning audio signal into a plurality of frame signals.
Optionally, in some embodiments, in the audio pop detection apparatus, the calculation module includes:
the energy submodule is used for calculating the short-time energy of each frame signal;
the acquisition submodule is used for acquiring the time of each frame signal;
and the energy difference submodule is used for sequentially calculating the difference between the short-time energies of two adjacent frame signals according to the time sequence of the frame signals to obtain the short-time energy difference of the two adjacent frame signals.
Optionally, in some embodiments, in the audio pop detection device, the energy difference sub-module is specifically configured to acquire two adjacent frame signals whose short-time energy difference is greater than a preset threshold, and determine the later of the two frame signals, in time order, as a start frame signal; acquire, after the start frame signal, two adjacent frame signals whose short-time energy difference is smaller than the negative of the preset threshold, and determine the later of the two frame signals, in time order, as an end frame signal; and acquire the signal between the start frame signal and the end frame signal to obtain an abrupt-change audio signal.
Optionally, in some embodiments, in the audio pop detection device, the energy difference sub-module is specifically configured to judge in time order, after the start frame signal, whether each short-time energy difference is smaller than the negative of the preset threshold; and when a short-time energy difference smaller than the negative of the preset threshold is detected for the first time, determine the later of the two corresponding frame signals, in time order, as the end frame signal.
Optionally, in some embodiments, in the audio pop detection device, the determining module includes:
a detection submodule for detecting a peak position of the abrupt change audio signal;
the sampling submodule is used for taking a fixed number of sampling points before and after the peak position to form a pop audio frame;
and the calculating submodule is used for calculating the spectral flatness of the pop audio frame.
Optionally, in some embodiments, in the audio pop detection device, the judging module is specifically configured to judge whether the spectral flatness is greater than a preset flatness value; if the spectral flatness is greater than the preset flatness value, determine that the audio signal contains a pop; and if the spectral flatness is smaller than the preset flatness value, determine that the audio signal does not contain a pop.
Optionally, in some embodiments, in the audio pop detection apparatus, the audio pop detection apparatus further includes:
and the detection module is used for returning to execute the step of obtaining the frame signal meeting the preset condition interval according to the short-time energy difference to obtain the abrupt change audio signal until the detection of the audio signal to be detected is finished.
In addition, an embodiment of the application further provides a storage medium storing a plurality of instructions, the instructions being suitable for being loaded by a processor to perform the steps of any one of the audio pop detection methods provided in the embodiments of the present application.
When performing pop detection on an audio signal, the embodiments of the application obtain the audio signal to be detected and divide it into a plurality of frame signals, calculate the short-time energy difference between each two adjacent frame signals, acquire the frame signals within an interval that satisfies a preset condition according to the short-time energy difference to obtain an abrupt-change audio signal, and calculate the spectral flatness of the abrupt-change audio signal; if the spectral flatness is greater than a preset flatness value, the audio signal is determined to contain a pop. The scheme frames the audio signal, calculates the time-domain short-time energy of each frame, uses the short-time energy difference to locate the frames where the energy changes abruptly and thereby find the abrupt-change audio signal, and then calculates the spectral flatness of the abrupt-change audio signal, so that audio files containing pops are accurately screened out on the basis of spectral flatness.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1a is a schematic scene diagram of an audio pop detection method according to an embodiment of the present disclosure;
Fig. 1b is a first flowchart of an audio pop detection method according to an embodiment of the present disclosure;
Fig. 2a is a second flowchart of an audio pop detection method according to an embodiment of the present disclosure;
Fig. 2b is a schematic diagram of an audio signal in an audio pop detection method according to an embodiment of the present disclosure;
Fig. 3a is a schematic diagram of a first structure of an audio pop detection device according to an embodiment of the present disclosure;
Fig. 3b is a schematic diagram of a second structure of the audio pop detection device according to the embodiment of the present application;
Fig. 4 is a schematic structural diagram of a network device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first", "second", and "third", etc. in this application are used to distinguish between different objects and not to describe a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions.
The embodiments of the application provide an audio pop detection method, an audio pop detection device, and a storage medium.
For example, referring to Fig. 1a, when a user needs to perform beginning pop detection on a large number of audio files, the network device may be triggered to process the audio files. The network device may obtain an audio signal to be detected, divide the audio signal into a plurality of frame signals, calculate the short-time energy difference between each two adjacent frame signals, acquire the frame signals within an interval that satisfies a preset condition according to the short-time energy difference to obtain an abrupt-change audio signal, and calculate the spectral flatness of the abrupt-change audio signal. If the spectral flatness is greater than a preset flatness value, the network device determines that the audio signal contains a pop.
Details are given below. The order of the following embodiments is not intended to indicate a preferred order.
In the present embodiment, the description is given from the perspective of an audio pop detection device, which may be integrated in a network device. The network device may be a terminal or a server, and the terminal may include a tablet computer, a notebook computer, a personal computer (PC), or the like.
An embodiment of the application provides an audio pop detection method, which comprises the following steps: obtaining an audio signal to be detected and dividing the audio signal into a plurality of frame signals; calculating the short-time energy difference between each two adjacent frame signals; acquiring, according to the short-time energy difference, the frame signals within an interval that satisfies a preset condition to obtain an abrupt-change audio signal; calculating the spectral flatness of the abrupt-change audio signal; and determining that the audio signal contains a pop if the spectral flatness is greater than a preset flatness value.
As shown in fig. 1b, the specific process of the audio pop detection method may be as follows:
101. Obtain an audio signal to be detected, and divide the audio signal into a plurality of frame signals.
For example, an audio file may be obtained from various sources such as a network, a mobile phone, or a video, and then provided to the audio pop detection device. That is, the audio pop detection device may receive audio files obtained from various sources, extract the audio signal to be detected from an audio file, and then divide the audio signal into a plurality of frame signals.
The audio file may be a sound file or a Musical Instrument Digital Interface (MIDI) file. A sound file is the original sound recorded by a recording device and directly stores binary sample data of the real sound; a MIDI file is a sequence of musical performance instructions that can be performed using a sound output device or an electronic musical instrument connected to a computer. An audio signal is an information carrier for the frequency and amplitude variations of regular sound waves, including speech, music, and sound effects. According to the characteristics of the sound waves, audio information can be classified into regular audio and irregular sound. Regular audio can be further divided into speech, music, and sound effects; it is a continuously varying analog signal that can be represented by a continuous curve called a sound wave.
In order to improve detection efficiency, a detection time period may be set at the beginning of the audio signal in the time domain, and only the audio signal within that time period is framed. That is, the step "dividing the audio signal into a plurality of frame signals" may specifically be as follows:
selecting, in the time domain, a signal of a preset duration starting from the first frame of the audio signal, to obtain a beginning audio signal;
dividing the beginning audio signal into a plurality of frame signals.
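The selection and framing described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `frame_beginning`, the 5-second duration, the 1024-sample frame length, and the use of non-overlapping frames are all assumptions, since the text leaves the preset duration and frame size open.

```python
import numpy as np

def frame_beginning(signal, sample_rate, duration_s=5.0, frame_len=1024):
    """Keep the first `duration_s` seconds and split them into frames.

    Assumption: frames do not overlap and a trailing partial frame is dropped.
    Returns an array of shape (n_frames, frame_len).
    """
    signal = np.asarray(signal, dtype=np.float64)
    n_keep = min(len(signal), int(duration_s * sample_rate))
    n_frames = n_keep // frame_len
    return signal[: n_frames * frame_len].reshape(n_frames, frame_len)
```

Restricting the analysis to the beginning segment keeps the cost independent of the total length of the file, which matches the efficiency motivation given in the text.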
102. Calculate the short-time energy difference between each two adjacent frame signals.
For example, the short-time energy of each frame signal may be calculated, then the time of each frame signal is obtained, and the difference between the short-time energies of two adjacent frame signals is calculated sequentially according to the time sequence of the frame signal, so as to obtain the short-time energy difference between two adjacent frame signals.
The short-time energy represents the intensity of signals at different moments. The calculation of the short-time energy E of each frame signal may be as follows:
$$E(t) = \sum_{n=0}^{N-1} x_t(n)^2$$
where N is the number of sampling points in each frame signal, n is the index of a sampling point within the frame signal, x_t(n) is the n-th sampling point of the t-th frame signal, t is the position of the frame signal, and E(t) is the short-time energy of the t-th frame signal.
The short-time energy difference between two adjacent frame signals may be calculated as follows:
$$p_t = E(t) - E(t-1)$$
where t is the position of the frame and p_t is the short-time energy difference between two adjacent frame signals.
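As a hedged sketch of this step, the code below computes a per-frame energy and the difference p_t = E(t) - E(t-1). It assumes the common sum-of-squares definition of short-time energy; the function names, and the choice to store 0 for the first frame (which has no predecessor), are illustrative assumptions.

```python
import numpy as np

def short_time_energy(frames):
    """E(t): sum of squared samples of each frame (rows of `frames`)."""
    frames = np.asarray(frames, dtype=np.float64)
    return np.sum(frames ** 2, axis=1)

def energy_differences(energy):
    """p[t] = E(t) - E(t-1) for t >= 1; p[0] is set to 0 by convention."""
    energy = np.asarray(energy, dtype=np.float64)
    p = np.zeros_like(energy)
    p[1:] = energy[1:] - energy[:-1]
    return p
```

A large positive p_t marks a frame where the energy jumps up, and a large negative p_t marks a frame where it falls back, which is what the next step relies on.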
103. Acquire, according to the short-time energy difference, the frame signals within an interval that satisfies a preset condition, to obtain an abrupt-change audio signal.
The preset condition may be set in various ways. For example, it may be set flexibly according to the requirements of the actual application, or it may be set in advance and stored in the network device. In addition, the preset condition may be built into the network device, or may be stored in a memory and transmitted to the network device, and so on.
For example, two adjacent frame signals whose short-time energy difference is greater than a preset threshold may be acquired, and the later of the two, in time order, is determined as the start frame signal. After the start frame signal, two adjacent frame signals whose short-time energy difference is smaller than the negative of the preset threshold are acquired, and the later of the two, in time order, is determined as the end frame signal. The signal between the start frame signal and the end frame signal is then acquired to obtain the abrupt-change audio signal.
The preset threshold, abbreviated as Th, may be set in various ways. For example, it may be set flexibly according to the requirements of the actual application, or it may be set in advance and stored in the network device. In addition, the preset threshold may be built into the network device, or may be stored in a memory and transmitted to the network device, and so on.
To keep the subsequent spectral flatness calculation close to the true value for the interval, and to improve the accuracy of the detection result, the end frame signal may be taken at the first short-time energy difference after the start frame signal that is smaller than the negative of the preset threshold: the later of the two corresponding frame signals is used as the end frame signal. That is, the step "acquiring, after the start frame signal, two frame signals whose short-time energy difference is smaller than the negative of the preset threshold, and determining the later of the two frame signals as the end frame signal" may specifically be as follows:
after the start frame signal, judging in time order whether each short-time energy difference is smaller than the negative of the preset threshold;
and when a short-time energy difference smaller than the negative of the preset threshold is detected for the first time, determining the later of the two corresponding frame signals, in time order, as the end frame signal.
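The start and end frame search can be illustrated with the sketch below. It assumes that `p[t]` holds E(t) - E(t-1), so that the "later of the two frames" for a difference at index t is frame t itself; the function name and the return convention (`None` when no complete interval exists) are hypothetical.

```python
import numpy as np

def find_abrupt_segment(p, threshold):
    """Return (start_frame, end_frame) of the first abrupt interval, or None.

    start_frame: first t with p[t] >  threshold (energy jumps up).
    end_frame:   first later t with p[t] < -threshold (energy falls back).
    """
    p = np.asarray(p, dtype=np.float64)
    start = None
    for t in range(1, len(p)):
        if start is None:
            if p[t] > threshold:
                start = t
        elif p[t] < -threshold:
            return start, t
    return None
```

For instance, with differences `[0, 0.1, 30, 2, -1, -28, 0.5]` and a threshold of 10, the rise is found at frame 2 and the first sufficiently large drop at frame 5, so the interval is frames 2 to 5.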
104. Calculate the spectral flatness of the abrupt-change audio signal, and determine that the audio signal contains a pop if the spectral flatness is greater than a preset flatness value.
For example, the abrupt-change audio signal may be Fourier transformed to obtain a frequency-domain abrupt-change audio signal, and the spectral flatness of the frequency-domain abrupt-change audio signal is calculated. It is then judged whether the spectral flatness is greater than a preset flatness value: if so, the audio signal is determined to contain a pop; if the spectral flatness is smaller than the preset flatness value, the audio signal is determined not to contain a pop.
The preset flatness value may be set in various ways. For example, it may be set flexibly according to the requirements of the actual application, or it may be set in advance and stored in the network device. In addition, the preset flatness value may be built into the network device, or may be stored in a memory and transmitted to the network device, and so on.
Spectral flatness, also called Wiener entropy, is a metric used in digital signal processing to characterize an audio spectrum. It is measured as the ratio of the geometric mean (GM) to the arithmetic mean (AM) of the signal's spectrum, and is also commonly referred to as the Spectral Flatness Measure (SFM). Namely:
$$X(t, k) = \sum_{n=0}^{N-1} x_t(n)\, w(n)\, e^{-j 2 \pi k n / N}$$
where x_t(n) is the abrupt-change audio signal in the time domain, w(n) is a window function, k is the frequency bin of the frequency-domain abrupt-change audio signal, and X is the frequency-domain abrupt-change audio signal. The window function may be a rectangular window, a triangular window, a Hanning window, or the like.
$$GM(t) = \left( \prod_{k=0}^{K-1} |X(t, k)| \right)^{1/K}$$
$$AM(t) = \frac{1}{K} \sum_{k=0}^{K-1} |X(t, k)|$$
$$F(t) = \frac{GM(t)}{AM(t)}$$
where K is the number of frequency bins, GM(t) is the geometric mean of the frequency-domain abrupt-change audio signal, AM(t) is the arithmetic mean of the frequency-domain abrupt-change audio signal, and F(t) is the spectral flatness.
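A minimal sketch of the flatness calculation follows. Several details are assumptions rather than statements of the patented method: the means are taken over the magnitude spectrum (a power spectrum is also common), a Hanning window is chosen from the options listed above, and a small `eps` is added to avoid taking the logarithm of zero. The geometric mean is computed in the log domain for numerical stability.

```python
import numpy as np

def spectral_flatness(frame, eps=1e-12):
    """F = GM / AM of the windowed magnitude spectrum; close to 1 means flat."""
    frame = np.asarray(frame, dtype=np.float64)
    window = np.hanning(len(frame))                  # one of the windows named in the text
    magnitude = np.abs(np.fft.rfft(frame * window)) + eps
    geometric_mean = np.exp(np.mean(np.log(magnitude)))
    arithmetic_mean = np.mean(magnitude)
    return geometric_mean / arithmetic_mean
```

An isolated click has energy spread almost evenly over all frequencies, so its flatness is close to 1, whereas a pure tone concentrates energy in a few bins and yields a flatness near 0. This contrast is why a flatness above the preset value is treated as evidence of a pop.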
For example, in order to further improve detection accuracy, the peak position of the abrupt-change audio signal may be detected first, and N/2 sampling points may then be taken on each side of it to form a pop audio frame centered on the peak position, so that the pop audio frame contains N sampling points in total. The step "calculating the spectral flatness of the abrupt-change audio signal" may therefore specifically be as follows:
detecting a peak position of the abrupt-change audio signal;
taking a fixed number of sampling points before and after the peak position to form a pop audio frame;
calculating the spectral flatness of the pop audio frame.
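The peak-centered frame extraction might look like the sketch below. It assumes the peak is the sample of largest absolute amplitude within the abrupt-change interval, and it zero-pads the signal so that a peak near either end still yields N samples; both choices, like the function name and the default N of 1024, are illustrative assumptions.

```python
import numpy as np

def extract_pop_frame(signal, start, end, n=1024):
    """Return (peak_index, frame) with n//2 samples on each side of the peak.

    `start` and `end` are sample indices bounding the abrupt-change interval.
    """
    signal = np.asarray(signal, dtype=np.float64)
    peak = start + int(np.argmax(np.abs(signal[start:end])))
    half = n // 2
    padded = np.pad(signal, half)        # zeros on both sides (assumed edge handling)
    # Sample `peak` of `signal` sits at index `peak + half` of `padded`,
    # so this slice covers original indices peak - half .. peak + half - 1.
    return peak, padded[peak : peak + 2 * half]
```

Centering the frame on the peak keeps the click inside the analysis window regardless of where the frame boundaries of the earlier energy analysis happened to fall.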
After a pop is detected, to make subsequent repair accurate, detection may continue through the remaining short-time energy differences, finding further frame signals within intervals that satisfy the preset condition, until the whole audio signal to be detected has been examined. That is, after the step "if the spectral flatness is greater than a preset flatness value, determining that the audio signal contains a pop", the method may further include:
returning to the step of acquiring, according to the short-time energy difference, the frame signals within an interval that satisfies the preset condition to obtain an abrupt-change audio signal, until detection of the audio signal to be detected is finished.
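The repeated search can be sketched as a single pass over the energy differences that resumes after each interval. In this illustration `flatness_of` is a hypothetical caller-supplied function that returns the spectral flatness for a given (start, end) frame interval; how it is computed is left to the caller, and the function and parameter names are assumptions.

```python
def detect_pops(p, threshold, flatness_of, flat_threshold):
    """Scan all energy differences and return every interval judged to be a pop.

    p[t] is assumed to be E(t) - E(t-1). An interval opens at the first
    p[t] > threshold and closes at the next p[t] < -threshold.
    """
    flagged, start = [], None
    for t in range(1, len(p)):
        if start is None:
            if p[t] > threshold:
                start = t
        elif p[t] < -threshold:
            if flatness_of(start, t) > flat_threshold:
                flagged.append((start, t))
            start = None                 # resume the search after this interval
    return flagged
```

Collecting every flagged interval, rather than stopping at the first, gives a later repair step the position of each pop.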
After detection of the audio signal is finished, a detection result interface may be generated. The interface receives the detection result for the audio signal to be detected and, once detection is finished, indicates whether a pop was detected.
As can be seen from the above, when performing pop detection on an audio signal, this embodiment obtains the audio signal to be detected and divides it into a plurality of frame signals, calculates the short-time energy difference between each two adjacent frame signals, acquires the frame signals within an interval that satisfies a preset condition according to the short-time energy difference to obtain an abrupt-change audio signal, and calculates the spectral flatness of the abrupt-change audio signal; if the spectral flatness is greater than the preset flatness value, the audio signal is determined to contain a pop. The scheme frames the audio signal, calculates the time-domain short-time energy of each frame, uses the short-time energy difference to locate the frames where the energy changes abruptly and thereby find the abrupt-change audio signal, and then calculates the spectral flatness of the abrupt-change audio signal, so that audio files containing pops are accurately screened out on the basis of spectral flatness.
According to the method described in the foregoing embodiment, the following will be described in further detail by way of example in which the audio pop detection apparatus is specifically integrated in a network device.
As shown in fig. 2a, a specific process of an audio pop detection method may be as follows:
201. The network device acquires the audio signal to be detected.
For example, a user may obtain an audio file through various channels, such as a network, a mobile phone, or a video, and provide it to the network device; the network device receives the audio file and extracts the audio signal to be detected from it.
202. The network device divides the audio signal into frames to obtain frame signals.
For example, to improve detection efficiency, the network device may set a detection time period at the beginning of the audio signal in the time domain and perform framing only on the audio signal within that period. That is, the step "dividing the audio signal into a plurality of frame signals" may specifically be as follows:
selecting, in the time domain, a signal of a preset time period starting from the first frame of the audio signal, to obtain a beginning audio signal;
dividing the beginning audio signal into a plurality of frame signals.
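The two sub-steps above can be sketched as follows. This is only an illustrative sketch: the function name `split_beginning_into_frames`, the parameters `head_seconds` and `frame_len`, the use of non-overlapping frames, and the dropping of a trailing partial frame are all assumptions, since the text does not specify them.

```python
from typing import List, Sequence


def split_beginning_into_frames(
    signal: Sequence[float],
    sample_rate: int,
    head_seconds: float,
    frame_len: int,
) -> List[List[float]]:
    """Keep only the first `head_seconds` of the signal and cut that part
    into consecutive, non-overlapping frames of `frame_len` samples."""
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    # Length of the "beginning audio signal" in samples (assumed definition).
    head_len = min(len(signal), int(head_seconds * sample_rate))
    head = list(signal[:head_len])
    # A trailing partial frame is dropped in this sketch.
    return [head[i:i + frame_len]
            for i in range(0, head_len - frame_len + 1, frame_len)]


frames = split_beginning_into_frames(
    list(range(100)), sample_rate=10, head_seconds=5.0, frame_len=8
)
print(len(frames), frames[0])
```

Restricting the analysis to a short beginning segment keeps the number of frames, and therefore the later energy and spectrum computations, small.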
203. The network device calculates the short-time energy difference of two adjacent frame signals.
For example, the network device may specifically calculate the short-time energy of each frame signal, then obtain the time of each frame signal, and sequentially calculate the difference between the short-time energies of two adjacent frame signals according to the time sequence of the frame signal, so as to obtain the short-time energy difference between two adjacent frame signals.
The short-time energy represents the intensity of signals at different moments. The calculation of the short-time energy E of each frame signal may be as follows:
Figure BDA0002092128710000111
where N is the number of sampling points in each frame signal, n is the index of a sampling point within the frame signal, t is the position of the frame signal, and E(t) is the short-time energy of the t-th frame signal.
The short-time energy difference of two adjacent frame signals may be calculated as follows:
p_t = E(t) - E(t-1)
where t is the position of the frame and p_t is the short-time energy difference of two adjacent frame signals.
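As a rough illustration of step 203, the sketch below computes a per-frame energy and the difference between adjacent frames. The energy is taken as the sum of squared samples, which is the conventional definition of short-time energy and is assumed here because the formula image is not reproduced in this text; the function names and the placeholder value for the first frame are likewise assumptions.

```python
from typing import List, Sequence


def short_time_energy(frame: Sequence[float]) -> float:
    """Sum of squared samples of one frame (assumed conventional definition)."""
    return sum(x * x for x in frame)


def energy_differences(frames: Sequence[Sequence[float]]) -> List[float]:
    """Return p with p[t] = E(t) - E(t-1).

    p[0] is set to 0.0 as a placeholder, because the first frame has no
    predecessor; this choice is an assumption of the sketch.
    """
    energies = [short_time_energy(f) for f in frames]
    if not energies:
        return []
    return [0.0] + [energies[t] - energies[t - 1]
                    for t in range(1, len(energies))]


demo_frames = [[0.0, 0.0], [1.0, 2.0], [0.0, 1.0]]
print(energy_differences(demo_frames))  # [0.0, 5.0, -4.0]
```

A large positive difference marks a frame where energy jumps up, and a large negative difference marks a frame where it falls back, which is what the next step looks for.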
204. The network device obtains, according to the short-time energy difference, the frame signals meeting the preset condition interval, to obtain the abrupt change audio signal.
The preset condition may be set in various ways, for example, the preset condition may be flexibly set according to the requirements of the actual application, or may be preset and stored in the network device. In addition, the preset condition may be built in the network device, or may be stored in the memory and transmitted to the network device, and so on.
For example, the network device may obtain two frame signals whose short-time energy difference is greater than a preset threshold and determine the later of the two, in time order, as the start frame signal; then obtain, after the start frame signal, two frame signals whose short-time energy difference is smaller than the negative of the preset threshold and determine the later of the two, in time order, as the end frame signal; and take the signal between the start frame signal and the end frame signal as the abrupt change audio signal. For example, as shown in fig. 2b, the short-time energy difference p_3 between E(2) and E(3) is calculated. If p_3 > Th (Th being the preset threshold), the start frame signal is the third frame signal a. The short-time energy differences of adjacent frame signals after the third frame signal are then calculated in turn; if the short-time energy difference p_4 between E(3) and E(4) satisfies p_4 < -Th, the end frame signal is the fourth frame signal b, and the third frame signal a through the fourth frame signal b are taken as the abrupt change audio signal of the audio signal.
The preset threshold may be set in various manners, for example, the preset threshold may be flexibly set according to the requirements of the actual application, or may be preset and stored in the network device. In addition, the preset threshold may be built in the network device, or may be stored in the memory and transmitted to the network device, and so on.
In order to make the subsequently calculated spectral flatness closer to the true value for the preset condition interval, and thus make the detection result more accurate, the later of the two frame signals for which the short-time energy difference is first detected, after the start frame signal, to be smaller than the negative of the preset threshold may be taken as the end frame signal. That is, the step "obtaining, after the start frame signal, two frame signals whose short-time energy difference is smaller than the negative of the preset threshold, and determining the later of the two frame signals as the end frame signal according to the time sequence" may specifically be as follows:
sequentially determining, in time order after the start frame signal, whether each short-time energy difference is smaller than the negative of the preset threshold;
and when a short-time energy difference smaller than the negative of the preset threshold is detected for the first time, determining the later of the two corresponding frame signals, in time order, as the end frame signal.
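The start-frame and end-frame rule described above can be sketched as a single pass over the energy differences. The function name `find_abrupt_segment`, the `search_from` parameter, and the convention that list index t holds p_t (with index 0 unused) are assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple


def find_abrupt_segment(
    p: Sequence[float], threshold: float, search_from: int = 1
) -> Optional[Tuple[int, int]]:
    """Return (start_frame, end_frame) for the first abrupt segment, or None.

    p[t] is assumed to hold E(t) - E(t-1). The start frame is the first t
    with p[t] > threshold; the end frame is the first later t with
    p[t] < -threshold.
    """
    start = None
    for t in range(max(search_from, 1), len(p)):
        if start is None:
            if p[t] > threshold:
                start = t          # later frame of the rising pair
        elif p[t] < -threshold:
            return start, t        # later frame of the first falling pair
    return None


# Mirrors the fig. 2b example: p_3 > Th and p_4 < -Th.
print(find_abrupt_segment([0.0, 0.1, 0.0, 9.0, -8.5, 0.2], threshold=5.0))
```

Stopping at the first sufficiently negative difference keeps the segment as short as possible, so the later spectrum is dominated by the suspected pop rather than by surrounding audio.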
205. The network device calculates the spectral flatness of the abrupt audio signal.
For example, the network device may perform a Fourier transform on the abrupt change audio signal to obtain a frequency-domain abrupt change audio signal, and then calculate the spectral flatness of the frequency-domain abrupt change audio signal.
The preset flatness value may be set in various ways; for example, it may be flexibly set according to the requirements of the actual application, or may be preset and stored in the network device. In addition, the preset flatness value may be built into the network device, or may be stored in a memory and transmitted to the network device, and so on.
Spectral flatness, also called Wiener entropy, is a metric used in digital signal processing to characterize an audio spectrum. It can be measured as the ratio of the geometric mean (GM) to the arithmetic mean (AM) of the signal's spectrum. Namely:
Figure BDA0002092128710000121
where w(n) is a window function, k is a frequency bin of the frequency-domain abrupt change audio signal, and X is the frequency-domain abrupt change audio signal. The window function may be a rectangular window, a triangular window, a Hanning window, or the like.
Figure BDA0002092128710000122
Figure BDA0002092128710000123
F(t)=GM(t)/AM(t)
where GM(t) is the geometric mean of the frequency-domain abrupt change audio signal, AM(t) is the arithmetic mean of the frequency-domain abrupt change audio signal, and F(t) is the spectral flatness.
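A minimal sketch of the ratio F(t) = GM(t)/AM(t) is given below. Several details are assumptions, because the formula images are not reproduced in this text: the means are taken over the power spectrum |X(k)|^2, a rectangular window is used, only the non-negative frequency bins are kept, and a small `eps` is added to avoid taking the logarithm of zero. A naive DFT from the standard library is used so that the example has no external dependencies; a real implementation would normally use an FFT routine.

```python
import cmath
import math
from typing import List, Sequence


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """Naive O(N^2) DFT with a rectangular window; |X(k)|^2 for k = 0..N//2."""
    n_samples = len(frame)
    if n_samples == 0:
        raise ValueError("frame must not be empty")
    spectrum = []
    for k in range(n_samples // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * cmath.pi * k * n / n_samples)
            for n, x in enumerate(frame)
        )
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_flatness(frame: Sequence[float], eps: float = 1e-12) -> float:
    """Geometric mean divided by arithmetic mean of the power spectrum."""
    power = [v + eps for v in power_spectrum(frame)]
    # The geometric mean is computed in the log domain for numerical stability.
    geometric_mean = math.exp(sum(math.log(v) for v in power) / len(power))
    arithmetic_mean = sum(power) / len(power)
    return geometric_mean / arithmetic_mean


impulse = [1.0] + [0.0] * 15                                  # click-like frame
tone = [math.sin(2 * math.pi * 2 * n / 16) for n in range(16)]  # pure tone
print(spectral_flatness(impulse), spectral_flatness(tone))
```

An impulse spreads its energy evenly over all bins and yields a flatness close to 1, whereas a pure tone concentrates energy in one bin and yields a flatness close to 0, which is why a high flatness value is a plausible indicator of a pop.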
For example, to further improve detection accuracy and ensure that the audio heard by the user is free of flaws, the network device may first detect the peak position of the abrupt change audio signal and then, with the peak position as the center, take the same number of sampling points to its left and right to form a pop audio frame. That is, the peak position of the abrupt change audio signal may be detected; a fixed number of sampling points may be taken before and after the peak position to form a pop audio frame; and the spectral flatness of the pop audio frame may be calculated.
For example, as shown in fig. 2b, with the peak position of the abrupt change audio signal as the center, N/2 sampling points are respectively taken from the left and right to form a pop audio frame c, that is, the pop audio frame c has N sampling points in total, and then the spectral flatness of the pop audio frame c is calculated.
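The peak-centered pop audio frame can be sketched as below. The function name, the use of the largest absolute sample value as the "peak", and the shifting of the window when the peak lies too close to either end of the signal are assumptions; the text only states that N/2 sampling points are taken on each side of the peak.

```python
from typing import List, Sequence


def pop_frame_around_peak(
    signal: Sequence[float], seg_start: int, seg_end: int, frame_len: int
) -> List[float]:
    """Return `frame_len` samples centred on the largest |sample| found in
    signal[seg_start:seg_end]. Near the signal edges the window is shifted
    so that it stays inside the signal (an assumption of this sketch)."""
    if not 0 <= seg_start < seg_end <= len(signal):
        raise ValueError("invalid segment bounds")
    if not 0 < frame_len <= len(signal):
        raise ValueError("frame_len must be in 1..len(signal)")
    # Peak position within the abrupt change audio signal.
    peak = max(range(seg_start, seg_end), key=lambda i: abs(signal[i]))
    # frame_len // 2 samples before the peak, the rest from the peak onward.
    left = peak - frame_len // 2
    left = max(0, min(left, len(signal) - frame_len))
    return list(signal[left:left + frame_len])


samples = [0.0] * 20
samples[9] = -3.0
pop_frame = pop_frame_around_peak(samples, seg_start=8, seg_end=16, frame_len=8)
print(len(pop_frame), pop_frame.index(-3.0))  # 8 4
```

Centering a fixed-length frame on the peak means every candidate is analyzed with the same number of samples, so the flatness values of different candidates are directly comparable.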
206. The network device determines whether the spectral flatness is greater than a preset flatness value and, if so, determines that the audio signal has a pop.
For example, the network device may determine whether the spectral flatness is greater than the preset flatness value; if the spectral flatness is greater than the preset flatness value, it is determined that the audio signal has a pop; and if the spectral flatness is smaller than the preset flatness value, it is determined that the audio signal does not have a pop.
207. The network device determines whether detection of the audio signal to be detected is finished; if not, it returns to step 204, obtaining the frame signal meeting the preset condition interval according to the short-time energy difference to obtain the abrupt change audio signal, until detection of the audio signal to be detected is finished.
For example, after a pop is detected, the network device may, for the accuracy of subsequent repair, continue to examine the short-time energy differences to obtain further frame signals meeting the preset condition interval, returning to the step of obtaining the frame signal meeting the preset condition interval according to the short-time energy difference to obtain the abrupt change audio signal, until the entire audio signal to be detected has been examined. For example, after the spectral flatness of the abrupt change audio signal has been compared with the preset flatness value, the frame signals following the fourth frame signal may continue to be examined, regardless of the comparison result, until all frame signals have been examined and the detection result is obtained.
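The loop back to step 204 can be sketched as a scan that resumes after each end frame, whether or not the candidate was classified as a pop. The names `scan_all_segments` and `detect_pops`, and the idea of passing the flatness computation in as a callable, are assumptions made to keep the sketch short and self-contained.

```python
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[int, int]


def scan_all_segments(p: Sequence[float], threshold: float) -> List[Segment]:
    """Collect every (start_frame, end_frame) pair in time order.

    p[t] is assumed to hold E(t) - E(t-1). After an end frame is found,
    the search for the next start frame resumes at the following frame.
    """
    segments: List[Segment] = []
    start = None
    for t in range(1, len(p)):
        if start is None:
            if p[t] > threshold:
                start = t
        elif p[t] < -threshold:
            segments.append((start, t))
            start = None
    return segments


def detect_pops(
    p: Sequence[float],
    threshold: float,
    flatness_of: Callable[[Segment], float],
    flatness_threshold: float,
) -> List[Segment]:
    """Keep the segments whose spectral flatness exceeds the preset value."""
    return [seg for seg in scan_all_segments(p, threshold)
            if flatness_of(seg) > flatness_threshold]


p_demo = [0.0, 9.0, -9.0, 0.0, 8.0, 1.0, -7.0, 0.0]
fake_flatness = {(1, 2): 0.8, (4, 6): 0.1}   # stand-in values for illustration
print(detect_pops(p_demo, 5.0, fake_flatness.__getitem__, 0.5))  # [(1, 2)]
```

Returning all positions rather than stopping at the first hit is what makes a later repair step possible, since each pop location is known.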
Optionally, after detection of the audio signal is finished, an interface presenting the detection result may be generated. The interface includes a detection interface that can receive the detection result for the audio signal to be detected and, once detection is complete, indicate whether an audio pop was detected.
Optionally, after a pop at the beginning of the audio is detected, the signals with missing frequency bands may be repaired or replaced, to ensure that the user can listen to an audio file of good quality.
As can be seen from the above, when the network device of this embodiment performs pop detection on an audio signal, it may acquire the audio signal to be detected and divide it into a plurality of frame signals; then calculate the short-time energy difference between two adjacent frame signals; then obtain, according to the short-time energy difference, the frame signals meeting the preset condition interval to obtain an abrupt change audio signal; then calculate the spectral flatness of the abrupt change audio signal; and, if the spectral flatness is greater than the preset flatness value, determine that the audio signal has a pop. In this scheme, the audio signal is divided into frames, the time-domain short-time energy of each frame is calculated, the frame positions at which the energy changes abruptly are located through the short-time energy difference to find the abrupt change audio signal, the spectral flatness of the abrupt change audio signal is then calculated, and audio files with frequency band loss are accurately screened out by means of the spectral flatness.
In addition, the scheme can also repair or replace the beginning pop sound, so that the quality of the audio file can be improved, and the user experience is improved.
In order to better implement the audio pop detection method provided by the embodiments of the present application, an embodiment of the present application further provides an audio pop detection device, which may be integrated in a network device such as a mobile phone, a tablet computer, or a handheld computer. The terms used have the same meanings as in the audio pop detection method above, and for specific implementation details, refer to the description in the method embodiments.
For example, as shown in fig. 3a, the audio pop detection apparatus may include a framing module 301, a calculating module 302, an obtaining module 303, and a determining module 304, as follows:
(1) a framing module 301;
the framing module 301 is configured to acquire an audio signal to be detected and divide the audio signal into a plurality of frame signals.
For example, the framing module 301 may receive audio files obtained through various channels, such as a network, a mobile phone, or a video, extract the audio signal to be detected from the files, and then divide the audio signal into a plurality of frame signals.
In order to improve detection efficiency, a detection time period may be set at the beginning of the audio signal in the time domain, and framing may be performed only on the audio signal within that period. That is, the framing module may include a selection sub-module and a framing sub-module, as follows:
the selection sub-module is used for selecting, in the time domain, a signal of a preset time period starting from the first frame of the audio signal, to obtain a beginning audio signal;
and the framing sub-module is used for dividing the beginning audio signal into a plurality of frame signals.
(2) A calculation module 302;
a calculating module 302, configured to calculate a short-time energy difference between two adjacent frame signals.
For example, the calculation module 302 may include an energy sub-module, an acquisition sub-module, and an energy difference sub-module, as follows:
the energy submodule is used for calculating the short-time energy of each frame signal;
the acquisition submodule is used for acquiring the time of each frame signal;
and the energy difference submodule is used for sequentially calculating the difference between the short-time energies of two adjacent frame signals according to the time sequence of the frame signal to obtain the short-time energy difference of the two adjacent frame signals.
The short-time energy represents the intensity of signals at different moments. The calculation of the short-time energy E of each frame signal may be as follows:
Figure BDA0002092128710000151
where N is the number of sampling points in each frame signal, n is the index of a sampling point within the frame signal, t is the position of the frame signal, and E(t) is the short-time energy of the t-th frame signal.
The short-time energy difference of two adjacent frame signals may be calculated as follows:
p_t = E(t) - E(t-1)
where t is the position of the frame and p_t is the short-time energy difference of two adjacent frame signals.
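The formula image for E(t) is not reproduced in this text. For orientation only, a conventional form of short-time energy that is consistent with the symbols defined above is sketched below; the sample notation x_t(n) and the index range 1..N are assumptions and are not taken from the original figure.

```latex
E(t) = \sum_{n=1}^{N} x_t(n)^2 , \qquad p_t = E(t) - E(t-1)
```

As a small worked example under this assumed form, with N = 2, a silent frame x_2 = (0, 0) followed by a frame x_3 = (1, 2) gives E(2) = 0, E(3) = 1 + 4 = 5, and therefore p_3 = 5. If the next frame is x_4 = (0, 1), then E(4) = 1 and p_4 = 1 - 5 = -4.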
(3) An acquisition module 303;
an obtaining module 303, configured to obtain a frame signal meeting a preset condition interval according to the short-time energy difference, so as to obtain a sudden-change audio signal.
The preset condition may be set in various ways, for example, the preset condition may be flexibly set according to the requirements of the actual application, or may be preset and stored in the network device. In addition, the preset condition may be built in the network device, or may be stored in the memory and transmitted to the network device, and so on.
For example, the obtaining module 303 may specifically obtain two frame signals with the short-time energy difference being greater than a preset threshold, determine a next frame signal of the two frame signals as a start frame signal according to a time sequence, obtain two frame signals with the short-time energy difference being smaller than a negative value of the preset threshold after the start frame signal, determine a next frame signal of the two frame signals as an end frame signal according to the time sequence, and obtain a signal between the start frame signal and the end frame signal, so as to obtain the abrupt change audio signal.
The preset threshold may be set in various manners, for example, the preset threshold may be flexibly set according to the requirements of the actual application, or may be preset and stored in the network device. In addition, the preset threshold may be built in the network device, or may be stored in the memory and transmitted to the network device, and so on.
In order to make the subsequently calculated spectral flatness closer to the true value for the preset condition interval, and thus make the detection result more accurate, the later of the two frame signals for which the short-time energy difference is first detected, after the start frame signal, to be smaller than the negative of the preset threshold may be taken as the end frame signal. That is, the obtaining module may specifically perform the following operations:
sequentially determining, in time order after the start frame signal, whether each short-time energy difference is smaller than the negative of the preset threshold;
and when a short-time energy difference smaller than the negative of the preset threshold is detected for the first time, determining the later of the two corresponding frame signals, in time order, as the end frame signal.
(4) A judging module 304;
the determining module 304 is configured to calculate a spectral flatness of the abrupt change audio signal, and determine that the audio signal has a pop sound if the spectral flatness is greater than a preset flatness value.
For example, the determining module 304 may perform a Fourier transform on the abrupt change audio signal to obtain a frequency-domain abrupt change audio signal, calculate the spectral flatness of the frequency-domain abrupt change audio signal, and then determine whether the spectral flatness is greater than a preset flatness value; if the spectral flatness is greater than the preset flatness value, it is determined that the audio signal has a pop; and if the spectral flatness is smaller than the preset flatness value, it is determined that the audio signal does not have a pop.
The preset flatness value may be set in various ways; for example, it may be flexibly set according to the requirements of the actual application, or may be preset and stored in the network device. In addition, the preset flatness value may be built into the network device, or may be stored in a memory and transmitted to the network device, and so on.
Spectral flatness, also called Wiener entropy, is a metric used in digital signal processing to characterize an audio spectrum. It can be measured as the ratio of the geometric mean (GM) to the arithmetic mean (AM) of the signal's spectrum. Namely:
Figure BDA0002092128710000161
where w(n) is a window function, k is a frequency bin of the frequency-domain abrupt change audio signal, and X is the frequency-domain abrupt change audio signal. The window function may be a rectangular window, a triangular window, a Hanning window, or the like.
Figure BDA0002092128710000171
Figure BDA0002092128710000172
F(t)=GM(t)/AM(t)
where GM(t) is the geometric mean of the frequency-domain abrupt change audio signal, AM(t) is the arithmetic mean of the frequency-domain abrupt change audio signal, and F(t) is the spectral flatness.
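The formula images for X, GM(t) and AM(t) are not reproduced in this text. As a hedged illustration, the usual definitions of the two means over a spectrum are given below; taking them over the power spectrum |X_t(k)|^2 and writing K for the number of frequency bins are assumptions, not details confirmed by the original figures.

```latex
\mathrm{GM}(t) = \Bigl( \prod_{k=1}^{K} |X_t(k)|^2 \Bigr)^{1/K}, \qquad
\mathrm{AM}(t) = \frac{1}{K} \sum_{k=1}^{K} |X_t(k)|^2, \qquad
0 \le F(t) = \frac{\mathrm{GM}(t)}{\mathrm{AM}(t)} \le 1
```

The upper bound follows from the inequality of arithmetic and geometric means, with F(t) = 1 exactly when all bins carry equal power. A click-like, broadband frame therefore gives a value near 1, while a frame dominated by a few tonal components gives a value near 0, which is consistent with comparing F(t) against a preset flatness value.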
For example, in order to further improve the detection accuracy and ensure that the audio experienced by the user has no defects, the peak position of the abrupt change audio signal may be detected first, and then N/2 sampling points are taken from the left and right sides to form a pop audio frame with the peak position as the center, that is, the pop audio frame has N sampling points in total. Therefore, the determining module may specifically include a detecting sub-module, a sampling sub-module and a calculating sub-module, as follows:
a detection submodule for detecting a peak position of the abrupt change audio signal;
the sampling sub-module is used for respectively taking a fixed number of sampling points before and after the peak position to form a pop audio frame;
and the calculating sub-module is used for calculating the spectral flatness of the pop audio frame.
After a pop is detected, the short-time energy differences may, for the accuracy of subsequent repair, continue to be examined to obtain further frame signals meeting the preset condition interval, until the entire audio signal to be detected has been examined. That is, as shown in fig. 3b, the audio pop detection device may further include a detection module 305, as follows:
the detection module 305 is configured to return to the step of obtaining the frame signal meeting the preset condition interval according to the short-time energy difference to obtain the abrupt change audio signal, until detection of the audio signal to be detected is finished.
It will be appreciated by those skilled in the art that the audio pop detection device shown in fig. 3a does not constitute a limitation of the device and may include more or fewer components than shown, or some components in combination, or a different arrangement of components. In addition, it should be noted that the specific implementation of each unit may refer to the foregoing method embodiment, and is not described herein again.
As can be seen from the above, in the audio pop detection device of this embodiment, when pop detection is performed on an audio signal, the framing module 301 may obtain the audio signal to be detected and divide it into a plurality of frame signals; the calculating module 302 then calculates the short-time energy difference between two adjacent frame signals; the obtaining module 303 then obtains, according to the short-time energy difference, the frame signals meeting the preset condition interval to obtain an abrupt change audio signal; and the determining module 304 then calculates the spectral flatness of the abrupt change audio signal and, if the spectral flatness is greater than the preset flatness value, determines that the audio signal has a pop. In this scheme, the audio signal is divided into frames, the time-domain short-time energy of each frame is calculated, the frame positions at which the energy changes abruptly are located through the short-time energy difference to find the abrupt change audio signal, the spectral flatness of the abrupt change audio signal is then calculated, and audio files with frequency band loss are accurately screened out by means of the spectral flatness.
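The division of work among modules 301 to 305 might be organized along the following lines. This is a hypothetical arrangement, not the device's actual implementation: the class name `PopDetectorSketch`, its method names, the sum-of-squares energy, the rectangular-window power spectrum, and the naive DFT are all assumptions chosen to keep the sketch dependency-free.

```python
import cmath
import math
from typing import List, Sequence, Tuple


class PopDetectorSketch:
    """Hypothetical grouping of modules 301-305 into one class."""

    def __init__(self, frame_len: int, energy_threshold: float,
                 flatness_threshold: float) -> None:
        self.frame_len = frame_len
        self.energy_threshold = energy_threshold
        self.flatness_threshold = flatness_threshold

    def framing(self, signal: Sequence[float]) -> List[List[float]]:
        """Role of framing module 301: non-overlapping frames."""
        n = self.frame_len
        return [list(signal[i:i + n])
                for i in range(0, len(signal) - n + 1, n)]

    def energy_diff(self, frames: Sequence[Sequence[float]]) -> List[float]:
        """Role of calculating module 302: p[t] = E(t) - E(t-1)."""
        e = [sum(x * x for x in f) for f in frames]
        return [0.0] + [e[t] - e[t - 1] for t in range(1, len(e))]

    def segments(self, p: Sequence[float]) -> List[Tuple[int, int]]:
        """Roles of obtaining module 303 and detection module 305."""
        found, start = [], None
        for t in range(1, len(p)):
            if start is None:
                if p[t] > self.energy_threshold:
                    start = t
            elif p[t] < -self.energy_threshold:
                found.append((start, t))
                start = None
        return found

    def flatness(self, samples: Sequence[float]) -> float:
        """Role of determining module 304: GM / AM of the power spectrum."""
        n = len(samples)
        power = []
        for k in range(n // 2 + 1):
            acc = sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                      for i, x in enumerate(samples))
            power.append(abs(acc) ** 2 + 1e-12)
        gm = math.exp(sum(math.log(v) for v in power) / len(power))
        return gm / (sum(power) / len(power))

    def detect(self, signal: Sequence[float]) -> List[Tuple[int, int]]:
        """Return the (start_frame, end_frame) pairs classified as pops."""
        frames = self.framing(signal)
        pops = []
        for start, end in self.segments(self.energy_diff(frames)):
            samples = [x for f in frames[start:end + 1] for x in f]
            if self.flatness(samples) > self.flatness_threshold:
                pops.append((start, end))
        return pops


detector = PopDetectorSketch(frame_len=8, energy_threshold=0.5,
                             flatness_threshold=0.5)
click = [0.0] * 64
click[26] = 1.0            # a single-sample click inside frame 3
print(detector.detect(click))
```

Keeping each module's role in its own method mirrors the device description and makes it straightforward to swap in, for example, a different window or the peak-centered pop audio frame without touching the other stages.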
Correspondingly, an embodiment of the invention also provides a network device, which may be a device such as a server or a terminal and which integrates any audio pop detection device provided by the embodiments of the invention. Fig. 4 is a schematic structural diagram of a network device according to an embodiment of the invention, specifically:
the network device may include components such as a processor 401 of one or more processing cores, memory 402 of one or more computer-readable storage media, a power supply 403, and an input unit 404. Those skilled in the art will appreciate that the network device architecture shown in fig. 4 does not constitute a limitation of network devices and may include more or fewer components than shown, or some components may be combined, or a different arrangement of components. Wherein:
the processor 401 is a control center of the network device, connects various parts of the entire network device by using various interfaces and lines, and performs various functions of the network device and processes data by running or executing software programs and/or modules stored in the memory 402 and calling data stored in the memory 402, thereby performing overall monitoring of the network device. Optionally, processor 401 may include one or more processing cores; preferably, the processor 401 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and data processing by operating the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to use of the network device, and the like. Further, the memory 402 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 access to the memory 402.
The network device further includes a power supply 403 for supplying power to each component, and preferably, the power supply 403 is logically connected to the processor 401 through a power management system, so that functions of managing charging, discharging, and power consumption are implemented through the power management system. The power supply 403 may also include any component of one or more dc or ac power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and the like.
The network device may also include an input unit 404, where the input unit 404 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the network device may further include a display unit and the like, which are not described in detail herein. Specifically, in this embodiment, the processor 401 in the network device loads the executable file corresponding to the process of one or more application programs into the memory 402 according to the following instructions, and the processor 401 runs the application program stored in the memory 402, thereby implementing various functions as follows:
the method comprises the steps of obtaining an audio signal to be detected, dividing the audio signal into a plurality of frame signals, calculating a short-time energy difference between two adjacent frame signals, obtaining a frame signal meeting a preset condition interval according to the short-time energy difference to obtain a sudden change audio signal, calculating the spectral flatness of the sudden change audio signal, and determining that the audio signal has a pop sound if the spectral flatness is larger than a preset flatness value.
Optionally, dividing the audio signal into a plurality of frame signals may include:
selecting a signal with a preset time period from a first frame in a time domain to obtain a beginning audio signal; the beginning audio signal is divided into a plurality of frame signals.
Optionally, calculating the short-time energy difference between two adjacent frame signals may include:
calculating the short-time energy of each frame signal; acquiring the time of each frame signal; and sequentially calculating the difference between the short-time energies of two adjacent frame signals according to the time sequence of the frame signals to obtain the short-time energy difference of the two adjacent frame signals.
Optionally, obtaining a frame signal meeting a preset condition interval according to the short-time energy difference to obtain a sudden-change audio signal, where the obtaining may include:
acquiring two frame signals of which the short-time energy difference is larger than a preset threshold value, and determining the next frame signal in the two frame signals as a starting frame signal according to a time sequence; acquiring two frame signals of which the short-time energy difference is smaller than a preset threshold negative value after the starting frame signal, and determining the latter frame signal of the two frame signals as an ending frame signal according to a time sequence; and acquiring signals between the starting frame signal and the ending frame signal to obtain the abrupt change audio signal.
Optionally, acquiring two frame signals with the short-time energy difference smaller than a preset threshold negative value after the start frame signal, and determining a next frame signal of the two frame signals as an end frame signal according to the time sequence, may include:
sequentially determining, in time order after the start frame signal, whether each short-time energy difference is smaller than the negative of the preset threshold; and when a short-time energy difference smaller than the negative of the preset threshold is detected for the first time, determining the later of the two corresponding frame signals, in time order, as the end frame signal.
Optionally, calculating the spectral flatness of the abrupt change audio signal may include:
detecting a peak position of the abrupt audio signal; a plurality of fixed sampling points are respectively taken before and after the peak position to form a plosive audio frame; the spectral flatness of the pop audio frame is calculated.
Optionally, if the spectral flatness is greater than a preset flatness value, determining that the audio signal has a pop sound may include:
determining whether the spectral flatness is greater than a preset flatness value; if the spectral flatness is greater than the preset flatness value, determining that the audio signal has a pop; and if the spectral flatness is smaller than the preset flatness value, determining that the audio signal does not have a pop.
Optionally, if the spectral flatness is greater than the preset flatness value, after determining that the pop exists in the audio signal, the method may further include:
and returning to the step of obtaining the frame signal meeting the preset condition interval according to the short-time energy difference to obtain the abrupt change audio signal until the detection of the audio signal to be detected is finished.
For details of the above operations, refer to the foregoing embodiments; they are not repeated here.
As can be seen from the above, when the network device of this embodiment performs pop detection on an audio signal, it may acquire the audio signal to be detected and divide it into a plurality of frame signals; then calculate the short-time energy difference between two adjacent frame signals; then obtain, according to the short-time energy difference, the frame signals meeting the preset condition interval to obtain an abrupt change audio signal; then calculate the spectral flatness of the abrupt change audio signal; and, if the spectral flatness is greater than the preset flatness value, determine that the audio signal has a pop. In this scheme, the audio signal is divided into frames, the time-domain short-time energy of each frame is calculated, the frame positions at which the energy changes abruptly are located through the short-time energy difference to find the abrupt change audio signal, the spectral flatness of the abrupt change audio signal is then calculated, and audio files with frequency band loss are accurately screened out by means of the spectral flatness.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor.
To this end, embodiments of the present application provide a storage medium, in which a plurality of instructions are stored, where the instructions can be loaded by a processor to execute the steps in any one of the audio pop detection methods provided in the embodiments of the present application. For example, the instructions may perform the steps of:
acquiring an audio signal to be detected, dividing the audio signal into a plurality of frame signals, calculating a short-time energy difference between two adjacent frame signals, acquiring a frame signal that meets a preset condition interval according to the short-time energy difference to obtain a sudden change audio signal, calculating the spectral flatness of the sudden change audio signal, and determining that the audio signal has a pop if the spectral flatness is greater than a preset flatness value.
For the specific implementation of the above operations, refer to the foregoing embodiments; details are not repeated here.
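One way to read the step of acquiring a frame signal that meets the preset condition interval is sketched below, following the thresholds recited in the claims: the search starts at the later frame of the first pair whose energy difference exceeds a threshold, and ends at the later frame of the first subsequent pair whose difference falls below the negative of that threshold. The function name, the `start_search` parameter and the return convention are assumptions made for illustration.

```python
def find_abrupt_segment(energy_diffs, threshold, start_search=0):
    """Return (start_frame, end_frame) of the first sudden change segment, or None.

    energy_diffs[i] is assumed to be E[i + 1] - E[i] for adjacent frames i and i + 1.
    start_search is the index in energy_diffs at which scanning begins, which lets a
    caller resume after a previously reported segment.
    """
    start = None
    for i in range(start_search, len(energy_diffs)):
        if start is None:
            if energy_diffs[i] > threshold:
                start = i + 1  # later frame of the pair becomes the starting frame
        elif energy_diffs[i] < -threshold:
            return start, i + 1  # later frame of the pair becomes the ending frame
    return None
```

Calling the function again with `start_search` set to the previous ending frame index corresponds to returning to this step until the whole signal has been examined.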
The storage medium may include: a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.
Since the instructions stored in the storage medium can execute the steps of any audio pop detection method provided in the embodiments of the present application, they can achieve the beneficial effects of any audio pop detection method provided in the embodiments of the present application. These effects are detailed in the foregoing embodiments and are not repeated here.
The audio pop detection method, device and storage medium provided by the embodiments of the present application have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present application, and the description of the above embodiments is intended only to help in understanding the method and core idea of the present application. Meanwhile, those skilled in the art may, according to the idea of the present application, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (9)

1. An audio pop detection method, comprising:
acquiring an audio signal to be detected, and dividing the audio signal into a plurality of frame signals;
calculating the short-time energy difference of two adjacent frame signals;
acquiring a frame signal meeting a preset condition interval according to the short-time energy difference to obtain a sudden change audio signal;
detecting the peak position of the sudden change audio signal, taking a plurality of fixed sampling points before and after the peak position to form a pop audio frame, calculating the geometric mean and the arithmetic mean of the pop audio frame in the frequency domain, calculating the spectral flatness according to the geometric mean and the arithmetic mean, and determining that the audio signal has a pop if the spectral flatness is greater than a preset flatness value.
2. The audio pop detection method according to claim 1, wherein the dividing the audio signal into a plurality of frame signals comprises:
selecting a signal with a preset time period from a first frame in a time domain to obtain a beginning audio signal;
and dividing the beginning audio signal into a plurality of frame signals.
3. The audio pop detection method according to claim 1, wherein the calculating the short-time energy difference between two adjacent frame signals comprises:
calculating the short-time energy of each frame signal;
acquiring the time of each frame signal;
and sequentially calculating the difference between the short-time energies of two adjacent frame signals according to the time sequence of the frame signals to obtain the short-time energy difference of the two adjacent frame signals.
4. The audio pop detection method according to claim 3, wherein the acquiring a frame signal meeting a preset condition interval according to the short-time energy difference to obtain a sudden change audio signal comprises:
acquiring two frame signals of which the short-time energy difference is larger than a preset threshold value, and determining the latter one of the two frame signals as a starting frame signal according to a time sequence;
acquiring, after the starting frame signal, two frame signals of which the short-time energy difference is smaller than the negative of the preset threshold value, and determining the latter one of the two frame signals as an ending frame signal according to a time sequence;
and acquiring signals between the starting frame signal and the ending frame signal to obtain a sudden change audio signal.
5. The audio pop detection method according to claim 4, wherein the acquiring, after the starting frame signal, two frame signals of which the short-time energy difference is smaller than the negative of the preset threshold value, and determining the latter one of the two frame signals as an ending frame signal according to a time sequence comprises:
sequentially determining, in time order after the starting frame signal, whether the short-time energy difference is smaller than the negative of the preset threshold value;
and when the short-time energy difference is detected to be smaller than the negative of the preset threshold value for the first time, determining the latter one of the two frame signals having that short-time energy difference as the ending frame signal according to the time sequence.
6. The audio pop detection method according to claim 1, wherein the determining that the audio signal has a pop if the spectral flatness is greater than a preset flatness value comprises:
determining whether the spectral flatness is greater than the preset flatness value;
if the spectral flatness is greater than the preset flatness value, determining that the audio signal has a pop;
and if the spectral flatness is smaller than the preset flatness value, determining that the audio signal does not have a pop.
7. The audio pop detection method according to claim 1, wherein, after determining that the audio signal has a pop if the spectral flatness is greater than the preset flatness value, the method further comprises:
returning to the step of acquiring a frame signal meeting a preset condition interval according to the short-time energy difference to obtain a sudden change audio signal, until detection of the audio signal to be detected is complete.
8. An audio pop detection device, comprising:
the framing module is used for acquiring an audio signal to be detected and dividing the audio signal into a plurality of frame signals;
the calculating module is used for calculating the short-time energy difference of two adjacent frame signals;
the acquisition module is used for acquiring a frame signal meeting a preset condition interval according to the short-time energy difference to obtain a sudden change audio signal;
and the judging module is used for detecting the peak position of the sudden change audio signal, taking a plurality of fixed sampling points before and after the peak position to form a pop audio frame, calculating the geometric mean and the arithmetic mean of the pop audio frame in the frequency domain, calculating the spectral flatness according to the geometric mean and the arithmetic mean, and determining that the audio signal has a pop if the spectral flatness is greater than a preset flatness value.
9. A storage medium storing instructions adapted to be loaded by a processor to perform the steps of the audio pop detection method according to any one of claims 1 to 7.
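The peak-centred frame recited in claims 1 and 8 can be pictured with the short sketch below. It is only an illustration: it assumes the peak is the sample with the largest absolute amplitude, that the same number of samples (`half_width`, a hypothetical parameter) is taken on each side, and that the window is clipped at the edges of the segment. The claims do not fix any of these details.

```python
def pop_audio_frame(segment, half_width):
    """Return (peak_index, frame) for a sudden change segment given as a list of samples.

    The peak is assumed to be the sample with the largest absolute value; the frame
    holds up to half_width samples on each side of it, clipped to the segment bounds.
    """
    if not segment:
        raise ValueError("segment must contain at least one sample")
    peak = max(range(len(segment)), key=lambda i: abs(segment[i]))
    lo = max(0, peak - half_width)
    hi = min(len(segment), peak + half_width + 1)
    return peak, segment[lo:hi]
```

The returned frame is what would then be transformed to the frequency domain for the geometric mean, arithmetic mean and spectral flatness calculation.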

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910506938.3A CN110265064B (en) 2019-06-12 2019-06-12 Audio frequency crackle detection method, device and storage medium
PCT/CN2019/093409 WO2020248308A1 (en) 2019-06-12 2019-06-27 Audio pop detection method and apparatus, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910506938.3A CN110265064B (en) 2019-06-12 2019-06-12 Audio frequency crackle detection method, device and storage medium

Publications (2)

Publication Number Publication Date
CN110265064A CN110265064A (en) 2019-09-20
CN110265064B true CN110265064B (en) 2021-10-08

Family

ID=67917850

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910506938.3A Active CN110265064B (en) 2019-06-12 2019-06-12 Audio frequency crackle detection method, device and storage medium

Country Status (2)

Country Link
CN (1) CN110265064B (en)
WO (1) WO2020248308A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111312285B (en) * 2020-01-14 2023-02-14 腾讯音乐娱乐科技(深圳)有限公司 Beginning popping detection method and device
CN113542863B (en) * 2020-04-14 2023-05-23 深圳Tcl数字技术有限公司 Sound processing method, storage medium and intelligent television
CN112151055A (en) * 2020-09-25 2020-12-29 北京猿力未来科技有限公司 Audio processing method and device
CN112735481B (en) * 2020-12-18 2022-08-05 Oppo(重庆)智能科技有限公司 POP sound detection method and device, terminal equipment and storage medium
CN113035223B (en) * 2021-03-12 2023-11-14 北京字节跳动网络技术有限公司 Audio processing method, device, equipment and storage medium
CN113611330A (en) * 2021-07-29 2021-11-05 杭州网易云音乐科技有限公司 Audio detection method and device, electronic equipment and storage medium
CN113744756A (en) * 2021-08-11 2021-12-03 浙江讯飞智能科技有限公司 Equipment quality inspection and audio data expansion method and related device, equipment and medium
CN113613159B (en) * 2021-08-20 2023-07-21 贝壳找房(北京)科技有限公司 Microphone blowing signal detection method, device and system
CN115243183A (en) * 2022-06-29 2022-10-25 上海勤宽科技有限公司 Audio detection method, device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103918030A (en) * 2011-09-29 2014-07-09 杜比国际公司 High quality detection in fm stereo radio signals
CN105118520A (en) * 2015-07-13 2015-12-02 腾讯科技(深圳)有限公司 Elimination method and device of audio beginning sonic boom
CN109616135A (en) * 2018-11-14 2019-04-12 腾讯音乐娱乐科技(深圳)有限公司 Audio-frequency processing method, device and storage medium
CN109658955A (en) * 2019-01-07 2019-04-19 环鸿电子(昆山)有限公司 Sonic boom detection method and device
CN109801646A (en) * 2019-01-31 2019-05-24 北京嘉楠捷思信息技术有限公司 Voice endpoint detection method and device based on fusion features

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050120870A1 (en) * 1998-05-15 2005-06-09 Ludwig Lester F. Envelope-controlled dynamic layering of audio signal processing and synthesis for music applications
EP1685554A1 (en) * 2003-10-09 2006-08-02 TEAC America, Inc. Method, apparatus, and system for synthesizing an audio performance using convolution at multiple sample rates
US8433582B2 (en) * 2008-02-01 2013-04-30 Motorola Mobility Llc Method and apparatus for estimating high-band energy in a bandwidth extension system
CN103650040B (en) * 2011-05-16 2017-08-25 谷歌公司 Use the noise suppressing method and device of multiple features modeling analysis speech/noise possibility
CN105989853B (en) * 2015-02-28 2020-08-18 科大讯飞股份有限公司 Audio quality evaluation method and system
CN107346665A (en) * 2017-06-29 2017-11-14 广州视源电子科技股份有限公司 Method, apparatus, equipment and the storage medium of audio detection
CN108198572A (en) * 2017-12-29 2018-06-22 珠海市君天电子科技有限公司 A kind of audio-frequency processing method and device
CN108492837B (en) * 2018-03-23 2020-10-13 腾讯音乐娱乐科技(深圳)有限公司 Method, device and storage medium for detecting audio burst white noise

Also Published As

Publication number Publication date
WO2020248308A1 (en) 2020-12-17
CN110265064A (en) 2019-09-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant