CN110265064A

CN110265064A - Audio sonic boom detection method, device and storage medium

Info

Publication number: CN110265064A
Application number: CN201910506938.3A
Authority: CN
Inventors: 陈洲旋
Original assignee: Tencent Music Entertainment Technology Shenzhen Co Ltd
Current assignee: Tencent Music Entertainment Technology Shenzhen Co Ltd
Priority date: 2019-06-12
Filing date: 2019-06-12
Publication date: 2019-09-20
Anticipated expiration: 2039-06-12
Also published as: CN110265064B; WO2020248308A1

Abstract

Embodiments of the present application disclose an audio sonic boom detection method, device and storage medium. To detect a sonic boom in an audio signal, the method obtains the audio signal to be detected and divides it into multiple frame signals; calculates the short-time energy difference between each two adjacent frame signals; obtains, according to the short-time energy difference, the frame signals within an interval that meets a preset condition, yielding a mutation audio signal; and calculates the spectral flatness of the mutation audio signal. If the spectral flatness is greater than a preset flatness value, the audio signal is determined to contain a sonic boom. This scheme can accurately detect whether a sonic boom is present in an audio signal.

Description

Audio sonic boom detection method, device and storage medium

Technical field

The present application relates to the field of communication technology, and in particular to an audio sonic boom detection method, device and storage medium.

Background art

With the continuous development of Internet technology, a massive number of audio files of all kinds are available on the Internet, such as music, speech, storytelling, and chat recordings. Because audio passes through a series of complex steps such as recording, processing, transmission, and storage, "distortion" phenomena may occur, such as an opening sonic boom, burrs, and breakpoints. The opening sonic boom is a relatively common distortion. An "opening sonic boom" is a brief pulse at the beginning of a music waveform that sounds like a "click"; this harsh, unnatural sound gives the listener a poor experience. Statistics on one library show that audio with an opening sonic boom accounts for as much as 10% of its files, and the presence of the sonic boom degrades audio quality. Correctly detecting an opening sonic boom in audio is therefore very important.

Summary of the invention

Embodiments of the present application provide an audio sonic boom detection method, device and storage medium, which can be used to detect whether a frequency band is missing from an audio signal, so that audio files with a missing frequency band can be filtered out effectively and quickly.

An embodiment of the present application provides an audio sonic boom detection method, comprising:

obtaining an audio signal to be detected, and dividing the audio signal into multiple frame signals;

calculating the short-time energy difference between two adjacent frame signals;

obtaining, according to the short-time energy difference, the frame signals within an interval that meets a preset condition, to obtain a mutation audio signal; and

calculating the spectral flatness of the mutation audio signal, and, if the spectral flatness is greater than a preset flatness value, determining that a sonic boom is present in the audio signal.

Optionally, in some embodiments of the audio sonic boom detection method, dividing the audio signal into multiple frame signals comprises:

selecting, in the time domain, the signal of a preset time period of the audio signal starting from the first frame, to obtain a beginning audio signal; and

dividing the beginning audio signal into multiple frame signals.

Optionally, in some embodiments of the audio sonic boom detection method, calculating the short-time energy difference between two adjacent frame signals comprises:

calculating the short-time energy of each frame signal;

obtaining the time of each frame signal; and

calculating, in the time order of the frame signals, the difference between the short-time energies of each two adjacent frame signals in turn, to obtain the short-time energy difference between two adjacent frame signals.

Optionally, in some embodiments of the audio sonic boom detection method, obtaining, according to the short-time energy difference, the frame signals within an interval that meets a preset condition, to obtain a mutation audio signal, comprises:

obtaining two frame signals whose short-time energy difference is greater than a preset threshold, and determining the later of the two frame signals in time order as a start frame signal;

obtaining, after the start frame signal, two frame signals whose short-time energy difference is less than the negative of the preset threshold, and determining the later of the two frame signals in time order as an end frame signal; and

obtaining the signal from the start frame signal to the end frame signal, to obtain the mutation audio signal.

Optionally, in some embodiments of the audio sonic boom detection method, obtaining, after the start frame signal, two frame signals whose short-time energy difference is less than the negative of the preset threshold, and determining the later of the two frame signals in time order as an end frame signal, comprises:

judging in turn, in time order after the start frame signal, whether the short-time energy difference is less than the negative of the preset threshold; and

when the short-time energy difference is detected for the first time to be less than the negative of the preset threshold, determining the later, in time order, of the two frame signals that produced that difference as the end frame signal.

Optionally, in some embodiments of the audio sonic boom detection method, calculating the spectral flatness of the mutation audio signal comprises:

detecting the peak position of the mutation audio signal;

taking a fixed number of sampling points both before and after the peak position to form a sonic boom audio frame; and

calculating the spectral flatness of the sonic boom audio frame.

Optionally, in some embodiments of the audio sonic boom detection method, determining, if the spectral flatness is greater than a preset flatness value, that a sonic boom is present in the audio signal comprises:

judging whether the spectral flatness is greater than the preset flatness value;

if the spectral flatness is greater than the preset flatness value, determining that a sonic boom is present in the audio signal; and

if the spectral flatness is less than the preset flatness value, determining that no sonic boom is present in the audio signal.

Optionally, in some embodiments of the audio sonic boom detection method, after determining, if the spectral flatness is greater than the preset flatness value, that a sonic boom is present in the audio signal, the method further comprises:

returning to the step of obtaining, according to the short-time energy difference, the frame signals within an interval that meets a preset condition, to obtain a mutation audio signal, until detection of the audio signal to be detected is finished.

Correspondingly, an embodiment of the present application further provides an audio sonic boom detection device, comprising:

a framing module, configured to obtain an audio signal to be detected and divide the audio signal into multiple frame signals;

a computing module, configured to calculate the short-time energy difference between two adjacent frame signals;

an obtaining module, configured to obtain, according to the short-time energy difference, the frame signals within an interval that meets a preset condition, to obtain a mutation audio signal; and

a judgment module, configured to calculate the spectral flatness of the mutation audio signal and, if the spectral flatness is greater than a preset flatness value, determine that a sonic boom is present in the audio signal.

Optionally, in some embodiments of the audio sonic boom detection device, the framing module comprises:

a selecting submodule, configured to select, in the time domain, the signal of a preset time period of the audio signal starting from the first frame, to obtain a beginning audio signal; and

a framing submodule, configured to divide the beginning audio signal into multiple frame signals.

Optionally, in some embodiments of the audio sonic boom detection device, the computing module comprises:

an energy submodule, configured to calculate the short-time energy of each frame signal;

an obtaining submodule, configured to obtain the time of each frame signal; and

an energy difference submodule, configured to calculate, in the time order of the frame signals, the difference between the short-time energies of each two adjacent frame signals in turn, to obtain the short-time energy difference between two adjacent frame signals.

Optionally, in some embodiments of the audio sonic boom detection device, the energy difference submodule is specifically configured to: obtain two frame signals whose short-time energy difference is greater than a preset threshold, and determine the later of the two frame signals in time order as a start frame signal; obtain, after the start frame signal, two frame signals whose short-time energy difference is less than the negative of the preset threshold, and determine the later of the two frame signals in time order as an end frame signal; and obtain the signal from the start frame signal to the end frame signal, to obtain the mutation audio signal.

Optionally, in some embodiments of the audio sonic boom detection device, the energy difference submodule is specifically configured to: judge in turn, in time order after the start frame signal, whether the short-time energy difference is less than the negative of the preset threshold; and, when the short-time energy difference is detected for the first time to be less than the negative of the preset threshold, determine the later, in time order, of the two frame signals that produced that difference as the end frame signal.

Optionally, in some embodiments of the audio sonic boom detection device, the judgment module comprises:

a detection submodule, configured to detect the peak position of the mutation audio signal;

a sampling submodule, configured to take a fixed number of sampling points both before and after the peak position to form a sonic boom audio frame; and

a calculation submodule, configured to calculate the spectral flatness of the sonic boom audio frame.

Optionally, in some embodiments of the audio sonic boom detection device, the judgment module is specifically configured to: judge whether the spectral flatness is greater than the preset flatness value; if the spectral flatness is greater than the preset flatness value, determine that a sonic boom is present in the audio signal; and if the spectral flatness is less than the preset flatness value, determine that no sonic boom is present in the audio signal.

Optionally, in some embodiments, the audio sonic boom detection device further comprises:

a detection module, configured to return to the step of obtaining, according to the short-time energy difference, the frame signals within an interval that meets a preset condition, to obtain a mutation audio signal, until detection of the audio signal to be detected is finished.

In addition, an embodiment of the present application further provides a storage medium storing a plurality of instructions, the instructions being suitable for loading by a processor to execute the steps of any audio sonic boom detection method provided by the embodiments of the present application.

When performing sonic boom detection on an audio signal, the present application obtains the audio signal to be detected and divides it into multiple frame signals; calculates the short-time energy difference between two adjacent frame signals; obtains, according to the short-time energy difference, the frame signals within an interval that meets a preset condition, yielding a mutation audio signal; and calculates the spectral flatness of the mutation audio signal. If the spectral flatness is greater than a preset flatness value, the audio signal is determined to contain a sonic boom. The scheme frames the audio signal, calculates the time-domain short-time energy of each audio frame, uses the short-time energy difference to locate the audio frames where the energy changes abruptly and thus the mutation audio signal, and then calculates its spectral flatness, using the spectral flatness to accurately filter out audio files with a missing frequency band.

Brief description of the drawings

To explain the technical solutions in the embodiments of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1a is a schematic diagram of a scenario of the audio sonic boom detection method provided by an embodiment of the present application;

Fig. 1b is a first schematic flowchart of the audio sonic boom detection method provided by an embodiment of the present application;

Fig. 2a is a second schematic flowchart of the audio sonic boom detection method provided by an embodiment of the present application;

Fig. 2b is a schematic diagram of an audio signal in the audio sonic boom detection method provided by an embodiment of the present application;

Fig. 3a is a first schematic structural diagram of the audio sonic boom detection device provided by an embodiment of the present application;

Fig. 3b is a second schematic structural diagram of the audio sonic boom detection device provided by an embodiment of the present application;

Fig. 4 is a schematic structural diagram of the network device provided by an embodiment of the present application.

Detailed description of the embodiments

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments in the present application without creative effort fall within the protection scope of the present application.

The terms "first", "second", "third", and the like in the present application are used to distinguish different objects, not to describe a particular order. In addition, the terms "include" and "have", and any variants of them, are intended to cover non-exclusive inclusion.

Embodiments of the present application provide an audio sonic boom detection method, device and storage medium.

The audio sonic boom detection device can be integrated in a network device, which can be a terminal, a server, or similar equipment. For example, referring to Fig. 1a, when a user needs to perform opening sonic boom detection on a massive number of audio files, the user can trigger the network device to process these files. The network device obtains the audio signal to be detected and divides it into multiple frame signals; calculates the short-time energy difference between two adjacent frame signals; obtains, according to the short-time energy difference, the frame signals within an interval that meets a preset condition, yielding a mutation audio signal; and calculates the spectral flatness of the mutation audio signal. If the spectral flatness is greater than a preset flatness value, the audio signal is determined to contain a sonic boom.

Each is described in detail below. Note that the order of the following embodiments does not imply any order of preference among them.

This embodiment is described from the perspective of the audio sonic boom detection device. The device can be integrated in a network device, which can be a terminal, a server, or similar equipment; the terminal may include a tablet computer, a laptop, a personal computer (PC), and the like.

An embodiment of the present application provides an audio sonic boom detection method, comprising: obtaining an audio signal to be detected and dividing it into multiple frame signals; calculating the short-time energy difference between two adjacent frame signals; obtaining, according to the short-time energy difference, the frame signals within an interval that meets a preset condition, to obtain a mutation audio signal; and calculating the spectral flatness of the mutation audio signal, and, if the spectral flatness is greater than a preset flatness value, determining that a sonic boom is present in the audio signal.

As shown in Fig. 1b, the specific flow of the audio sonic boom detection method can be as follows:

101. Obtain an audio signal to be detected, and divide the audio signal into multiple frame signals.

For example, audio files can first be obtained from various sources such as a network, a mobile phone, or a video, and then provided to the audio sonic boom detection device. That is, the device receives the audio files obtained from these sources and extracts the audio signals to be detected from them. These audio signals are then divided into multiple frame signals.

Audio files can be sound files or Musical Instrument Digital Interface (MIDI) files. A sound file is the original sound recorded by a sound recording device and directly stores the binary sampled data of the real sound. A MIDI file is a sequence of musical performance instructions that is played by a sound output device or by an electronic musical instrument connected to a computer. An audio signal is an information carrier of the frequency and amplitude variations of regular sound waves, including speech, music, and sound effects. According to the characteristics of the sound wave, audio information can be classified into regular audio and irregular sound, and regular audio can be further divided into speech, music, and sound effects. Regular audio is a continuously varying analog signal that can be represented by a continuous curve, called a sound wave.

To improve detection efficiency, a detection time period can be set at the beginning of the audio signal in the time domain, and framing is performed on the audio signal within that period. That is, the step "dividing the audio signal into multiple frame signals" can specifically be as follows:

Select, in the time domain, the signal of a preset time period of the audio signal starting from the first frame, to obtain a beginning audio signal;

Divide the beginning audio signal into multiple frame signals.
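The selection and framing step can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `take_opening` and `split_frames`, the non-overlapping frame layout, and the choice to drop a trailing partial frame are all assumptions, since the text does not specify a frame length or overlap.

```python
def take_opening(samples, sample_rate, seconds):
    """Keep only the first `seconds` of the signal (the preset time period)."""
    return samples[: int(sample_rate * seconds)]


def split_frames(samples, frame_len):
    """Split into consecutive, non-overlapping frames of `frame_len` samples.

    Assumption: a trailing partial frame is discarded.
    """
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


# Hypothetical input: 100 samples at 10 samples per second, inspect the first 5 s.
samples = [0.0] * 100
opening = take_opening(samples, sample_rate=10, seconds=5)
frames = split_frames(opening, frame_len=8)
print(len(opening), len(frames))
```

Restricting the analysis to the opening segment keeps the cost per file small, which matters when a large library is scanned.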

102. Calculate the short-time energy difference between two adjacent frame signals.

For example, the short-time energy of each frame signal can be calculated, and the time of each frame signal obtained; the difference between the short-time energies of each two adjacent frame signals is then calculated in turn in the time order of the frame signals, giving the short-time energy difference between two adjacent frame signals.

Short-time energy reflects the strength of the signal at different moments. The short-time energy E of each frame signal can be calculated as follows:

$$E(t) = \sum_{n=0}^{N-1} x_t(n)^2$$

where N is the number of sampling points in each frame signal, n is the index of a sampling point within the frame signal, t is the position of the frame signal, x_t(n) is the n-th sampling point of the t-th frame signal, and E(t) is the short-time energy of the t-th frame signal.
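As a small illustration, short-time energy is commonly computed as the sum of squared samples in a frame; that common definition is assumed here, and the function name `short_time_energy` and the sample values are hypothetical.

```python
def short_time_energy(frame):
    """Sum of squared samples of one frame (common definition, assumed here)."""
    return sum(x * x for x in frame)


# Three hypothetical frames: silence, a loud burst, and a quiet passage.
frames = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, -1.0, 0.5, 0.5],
    [0.1, 0.0, -0.1, 0.0],
]
energies = [short_time_energy(f) for f in frames]
print(energies)
```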

The short-time energy difference between two adjacent frame signals can be calculated as follows:

$$p_t = E(t) - E(t-1)$$

where t is the position of the frame and p_t is the short-time energy difference between two adjacent frame signals.
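The difference between adjacent frames can be sketched in a few lines. The helper name `energy_diffs` and the energy values are hypothetical; the list it returns starts at the second frame, because the first frame has no predecessor.

```python
def energy_diffs(energies):
    """Return E(t) - E(t-1) for every frame t that has a predecessor."""
    return [energies[t] - energies[t - 1] for t in range(1, len(energies))]


# Hypothetical per-frame energies: quiet, quiet, sudden burst, quiet again.
energies = [0.0, 0.5, 4.5, 0.5]
diffs = energy_diffs(energies)
print(diffs)
```

A large positive value marks a sudden rise in energy and a large negative value marks a sudden fall, which is what the next step looks for.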

103. Obtain, according to the short-time energy difference, the frame signals within an interval that meets a preset condition, to obtain a mutation audio signal.

The preset condition can be set in many ways. For example, it can be set flexibly according to the needs of the actual application, or set in advance and stored in the network device. In addition, the preset condition can be built into the network device, or saved in a memory and sent to the network device, and so on.

For example, two frame signals whose short-time energy difference is greater than a preset threshold can be obtained, and the later of the two in time order determined as the start frame signal; after the start frame signal, two frame signals whose short-time energy difference is less than the negative of the preset threshold are obtained, and the later of the two in time order determined as the end frame signal; the signal from the start frame signal to the end frame signal is then obtained as the mutation audio signal.

The preset threshold, abbreviated Th, can also be set in many ways. For example, it can be set flexibly according to the needs of the actual application, or set in advance and stored in the network device. In addition, the preset threshold can be built into the network device, or saved in a memory and sent to the network device, and so on.

So that the subsequent spectral flatness calculation is closer to the true value for the interval meeting the preset condition, and so that the detection result is more accurate, the end frame signal can be taken as the later of the two frame signals at which the short-time energy difference is first detected to be less than the negative of the preset threshold after the start frame signal. That is, the step "obtaining, after the start frame signal, two frame signals whose short-time energy difference is less than the negative of the preset threshold, and determining the later of the two frame signals in time order as the end frame signal" can specifically be as follows:

Judge in turn, in time order after the start frame signal, whether the short-time energy difference is less than the negative of the preset threshold;

When the short-time energy difference is detected for the first time to be less than the negative of the preset threshold, determine the later, in time order, of the two frame signals that produced that difference as the end frame signal.
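The start-frame and end-frame search described in this step can be sketched as a single pass over the frame energies. The function name `find_burst`, the zero-based frame indices, and the example values are assumptions made for illustration only.

```python
def find_burst(energies, th):
    """Return (start_frame, end_frame) as zero-based indices, or None.

    start_frame: later frame of the first adjacent pair whose difference > th.
    end_frame:   later frame of the first subsequent pair whose difference < -th.
    """
    start = None
    for t in range(1, len(energies)):
        p = energies[t] - energies[t - 1]
        if start is None:
            if p > th:
                start = t
        elif p < -th:
            return start, t
    return None


# Hypothetical energies with a burst in the third frame (index 2).
print(find_burst([0.1, 0.2, 5.0, 0.3, 0.2], th=2.0))
print(find_burst([0.1, 0.2, 0.3], th=2.0))
```

Stopping at the first sufficiently negative difference keeps the extracted segment short, so the later spectral analysis sees mostly the burst itself rather than the audio that follows it.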

104. Calculate the spectral flatness of the mutation audio signal; if the spectral flatness is greater than a preset flatness value, determine that a sonic boom is present in the audio signal.

For example, a Fourier transform can be applied to the mutation audio signal to obtain a frequency-domain mutation audio signal, and the spectral flatness of the frequency-domain mutation audio signal calculated. It is then judged whether the spectral flatness is greater than the preset flatness value: if it is greater, a sonic boom is determined to be present in the audio signal; if it is less, no sonic boom is determined to be present.

The preset flatness value can also be set in many ways. For example, it can be set flexibly according to the needs of the actual application, or set in advance and stored in the network device. In addition, the preset flatness value can be built into the network device, or saved in a memory and sent to the network device, and so on.

Spectral flatness, also known as Wiener entropy, is a measure used in digital signal processing to characterize an audio spectrum. It can be measured as the ratio of the geometric mean (GM) to the arithmetic mean (AM) of the signal's spectrum, and is also called the spectral flatness measure (SFM). That is:

$$X_t(k) = \sum_{n=0}^{N-1} x_t(n)\, w(n)\, e^{-j 2\pi k n / N}$$

where x_t(n) is the n-th sampling point of the mutation audio signal, w(n) is the window function, k is the frequency bin of the frequency-domain mutation audio signal, and X is the frequency-domain mutation audio signal. The window function can be a rectangular window, a triangular window, a Hanning window, or the like.

$$F(t) = \frac{GM(t)}{AM(t)}$$

where GM(t) is the geometric mean of the frequency-domain mutation audio signal, AM(t) is the arithmetic mean of the frequency-domain mutation audio signal, and F(t) is the spectral flatness.
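The following sketch shows one way to compute the ratio of geometric mean to arithmetic mean. Several details are assumptions rather than statements from the text: the means are taken over DFT magnitudes (implementations also use the power spectrum), only the non-negative frequency bins are used, a Hann window is chosen, and a small `eps` guards against taking the logarithm of zero. The function names are hypothetical.

```python
import cmath
import math


def hann(n_points):
    """Symmetric Hann window of length n_points (n_points >= 2)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def magnitude_spectrum(frame, window):
    """Magnitudes of the windowed DFT for bins 0 .. N/2 (naive O(N^2) DFT)."""
    n_points = len(frame)
    xw = [x * w for x, w in zip(frame, window)]
    return [abs(sum(xw[n] * cmath.exp(-2j * math.pi * k * n / n_points)
                    for n in range(n_points)))
            for k in range(n_points // 2 + 1)]


def spectral_flatness(mags, eps=1e-12):
    """Geometric mean divided by arithmetic mean of the magnitudes."""
    gm = math.exp(sum(math.log(m + eps) for m in mags) / len(mags))
    am = sum(mags) / len(mags)
    return gm / (am + eps)


n_points = 64
window = hann(n_points)
click = [0.0] * n_points
click[n_points // 2] = 1.0                      # a single-sample pulse
tone = [math.sin(2 * math.pi * 4 * n / n_points) for n in range(n_points)]

flat_click = spectral_flatness(magnitude_spectrum(click, window))
flat_tone = spectral_flatness(magnitude_spectrum(tone, window))
print(flat_click, flat_tone)
```

A pulse spreads its energy evenly over all bins, so its flatness is close to 1, while a pure tone concentrates energy in a few bins and scores much lower. This is why a flatness above a threshold suggests a click-like burst rather than a loud musical onset.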

To further improve detection accuracy, and to ensure that the audio delivered to users is free of flaws, the peak position of the mutation audio signal can first be detected; then, centered on the peak position, N/2 sampling points are taken on each side to form a sonic boom audio frame, so that the sonic boom audio frame has N sampling points in total. The step "calculating the spectral flatness of the mutation audio signal" can therefore specifically be as follows:

Detect the peak position of the mutation audio signal;

Take a fixed number of sampling points both before and after the peak position to form a sonic boom audio frame;

Calculate the spectral flatness of the sonic boom audio frame.
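Extracting the frame around the peak can be sketched as below. The name `frame_around_peak` is hypothetical, the peak is taken to be the sample of largest absolute value inside the mutation segment, and the N-sample window is drawn from the full signal and shifted inward when the peak lies near either end of the signal. These edge rules are assumptions, since the text does not describe them.

```python
def frame_around_peak(samples, seg_start, seg_end, n_points):
    """Return (peak_index, frame) for the segment samples[seg_start:seg_end].

    The frame has n_points samples, roughly centered on the peak, and is
    clamped so that it never runs past either end of `samples`.
    """
    peak = max(range(seg_start, seg_end), key=lambda i: abs(samples[i]))
    half = n_points // 2
    start = max(0, min(peak - half, len(samples) - n_points))
    return peak, samples[start:start + n_points]


# Hypothetical signal whose mutation segment covers indices 2..4.
samples = [0.0, 0.1, -0.2, 0.9, -0.3, 0.1, 0.0, 0.0]
peak, frame = frame_around_peak(samples, seg_start=2, seg_end=5, n_points=4)
print(peak, frame)
```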

After one sonic boom has been detected, for the sake of accurate subsequent repair, detection can continue: the frame signals within an interval that meets the preset condition continue to be obtained according to the short-time energy difference until all audio signals to be detected have been checked. That is, after the step "if the spectral flatness is greater than a preset flatness value, determining that a sonic boom is present in the audio signal", the method can further include:

Return to the step of obtaining, according to the short-time energy difference, the frame signals within an interval that meets a preset condition, to obtain a mutation audio signal, until detection of the audio signal to be detected is finished.
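Continuing the scan until the signal is exhausted can be sketched as a loop that collects every candidate interval instead of stopping at the first one. The name `find_all_bursts` and the energy values are hypothetical; in a full detector each returned interval would then go through the spectral flatness check.

```python
def find_all_bursts(energies, th):
    """Collect every (start_frame, end_frame) pair, using zero-based indices."""
    bursts = []
    start = None
    for t in range(1, len(energies)):
        p = energies[t] - energies[t - 1]
        if start is None:
            if p > th:
                start = t                   # energy jumped up: burst begins
        elif p < -th:
            bursts.append((start, t))       # energy dropped: burst ends
            start = None                    # resume searching after this burst
    return bursts


# Hypothetical energies containing two separate bursts.
energies = [0.0, 5.0, 0.1, 0.2, 6.0, 6.1, 0.3]
print(find_all_bursts(energies, th=2.0))
```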

After detection of the audio signal is finished, a detection result page can be generated. The page includes a detection interface that receives the detection result of the audio signal to be detected; once detection is complete, the page indicates whether a sonic boom signal was detected.

As can be seen from the above, when performing sonic boom detection on an audio signal, this embodiment obtains the audio signal to be detected and divides it into multiple frame signals; calculates the short-time energy difference between two adjacent frame signals; obtains, according to the short-time energy difference, the frame signals within an interval that meets a preset condition, yielding a mutation audio signal; and calculates the spectral flatness of the mutation audio signal. If the spectral flatness is greater than a preset flatness value, the audio signal is determined to contain a sonic boom. The scheme frames the audio signal, calculates the time-domain short-time energy of each audio frame, uses the short-time energy difference to locate the audio frames where the energy changes abruptly and thus the mutation audio signal, and then calculates its spectral flatness, using the spectral flatness to accurately filter out audio files with a missing frequency band.

Based on the method described in the preceding embodiment, a further detailed description is given below, taking as an example the case in which the audio sonic boom detection device is integrated in a network device.

As shown in Fig. 2a, the specific flow of an audio sonic boom detection method can be as follows:

201. The network device obtains an audio signal to be detected.

For example, a user can obtain audio files from various sources such as a network, a mobile phone, or a video, and provide them to the network device; the network device receives the audio files obtained from these sources and extracts the audio signal to be detected from them.

202. The network device divides the audio signal into frames to obtain frame signals.

For example, to improve detection efficiency, the network device can set a detection time period at the beginning of the audio signal in the time domain and perform framing on the audio signal within that period. That is, the step "dividing the audio signal into multiple frame signals" can specifically be as follows:

Select, in the time domain, the signal of a preset time period of the audio signal starting from the first frame, to obtain a beginning audio signal;

Divide the beginning audio signal into multiple frame signals.

203. The network device calculates the short-time energy difference between two adjacent frame signals.

For example, the network device can calculate the short-time energy of each frame signal and obtain the time of each frame signal, then calculate in turn, in the time order of the frame signals, the difference between the short-time energies of each two adjacent frame signals, giving the short-time energy difference between two adjacent frame signals:

$$p_t = E(t) - E(t-1)$$

204. The network device obtains, according to the short-time energy difference, the frame signals within an interval that meets a preset condition, to obtain a mutation audio signal.

For example, the network device can obtain two frame signals whose short-time energy difference is greater than a preset threshold and determine the later of the two in time order as the start frame signal; after the start frame signal, obtain two frame signals whose short-time energy difference is less than the negative of the preset threshold and determine the later of the two in time order as the end frame signal; and obtain the signal from the start frame signal to the end frame signal as the mutation audio signal. For instance, as shown in Fig. 2b, the short-time energy difference p_3 between E(2) and E(3) is calculated. If p_3 > Th, the start frame signal is the third frame signal a. The short-time energy differences of adjacent frame signals after the third frame signal continue to be calculated; if the short-time energy difference p_4 between E(3) and E(4) satisfies p_4 < -Th, the end frame signal is the fourth frame signal b, and the signal from the third frame signal a to the fourth frame signal b is taken as the mutation audio signal of the audio signal.

The preset threshold can also be set in many ways. For example, it can be set flexibly according to the needs of the actual application, or set in advance and stored in the network device. In addition, the preset threshold can be built into the network device, or saved in a memory and sent to the network device, and so on.

205, the network device calculates the frequency spectrum flatness of the mutation audio signal.

For example, the network device may apply a Fourier transform to the mutation audio signal to obtain a frequency-domain mutation audio signal, and then calculate the frequency spectrum flatness of the frequency-domain mutation audio signal.

Frequency spectrum flatness, also known as Wiener entropy, is a measure used in digital signal processing to characterize an audio spectrum. It is measured as the ratio of the geometric mean (GM) of the signal spectrum to its arithmetic mean (AM). That is:

F(t) = GM(t) / AM(t)
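The ratio GM/AM can be sketched in a few lines of Python. The sketch assumes the input is a list of non-negative power-spectrum values; the small constant `eps`, added to avoid taking the logarithm of zero, is an implementation assumption and not part of the formula above.

```python
import math


def spectral_flatness(power_spectrum, eps=1e-12):
    """F = GM / AM of a power spectrum (eps guards against log(0))."""
    values = [p + eps for p in power_spectrum]
    # Geometric mean computed in the log domain for numerical stability.
    geometric_mean = math.exp(sum(math.log(v) for v in values) / len(values))
    arithmetic_mean = sum(values) / len(values)
    return geometric_mean / arithmetic_mean


flat = spectral_flatness([1.0, 1.0, 1.0, 1.0])      # energy spread evenly
peaky = spectral_flatness([100.0, 0.01, 0.01, 0.01])  # energy in one bin
print(flat, peaky)
```

A spectrum whose energy is spread evenly gives a value close to 1, while a spectrum dominated by one bin gives a value close to 0.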

For example, to further improve detection accuracy and ensure that the audio presented to the user is free of flaws, the network device may first detect the peak position of the mutation audio signal and then, centered on that peak position, take the same number of sampling points to the left and to the right to form a sonic boom audio frame. Specifically, it may detect the peak position of the mutation audio signal; take a fixed number of sampling points before and after the peak position to form a sonic boom audio frame; and calculate the frequency spectrum flatness of the sonic boom audio frame.

For example, as shown in Fig. 2b, centered on the peak position of the mutation audio signal, N/2 sampling points are taken to the left and to the right to form a sonic boom audio frame c, so that sonic boom audio frame c contains N sampling points in total. The frequency spectrum flatness of sonic boom audio frame c is then calculated.
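A minimal sketch of the peak-centered framing follows. It assumes that the peak is the sample with the largest absolute amplitude, which this passage does not specify, and it clips the window at the segment boundaries, so fewer than N samples may be returned near an edge. The name `burst_frame_around_peak` is illustrative.

```python
def burst_frame_around_peak(segment, n):
    """Return (peak_index, frame) with n // 2 samples taken on each side of the peak.

    Assumption: the peak is the sample with the largest absolute value.
    """
    peak = max(range(len(segment)), key=lambda i: abs(segment[i]))
    half = n // 2
    lo = max(0, peak - half)             # clip at the left edge
    hi = min(len(segment), peak + half)  # clip at the right edge
    return peak, segment[lo:hi]


segment = [0.0, 0.1, -0.2, 0.9, 0.3, -0.1, 0.0, 0.05]
peak, frame = burst_frame_around_peak(segment, n=4)
print(peak, frame)
```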

206, the network device judges whether the frequency spectrum flatness is greater than a preset flat value; if the frequency spectrum flatness is greater than the preset flat value, it determines that a sonic boom exists in the audio signal.

For example, the network device may judge whether the frequency spectrum flatness is greater than the preset flat value. If the frequency spectrum flatness is greater than the preset flat value, it determines that a sonic boom exists in the audio signal; if the frequency spectrum flatness is less than the preset flat value, it determines that no sonic boom exists in the audio signal.
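To illustrate why this comparison separates a short pulse from ordinary tonal content, the sketch below applies a naive discrete Fourier transform to a single-sample click and to a pure tone and compares each flatness with a preset flat value. The value 0.5 is an arbitrary assumption for the example, not a value given in this document, and the function names are illustrative. Equality with the preset flat value, which the text does not address, is treated here as "no sonic boom".

```python
import cmath
import math


def power_spectrum(samples):
    """Naive O(n^2) DFT power spectrum for bins 0 .. n // 2."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                  for i, x in enumerate(samples))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def flatness(power, eps=1e-12):
    """GM / AM of the power spectrum; eps avoids log(0)."""
    vals = [p + eps for p in power]
    gm = math.exp(sum(math.log(v) for v in vals) / len(vals))
    return gm / (sum(vals) / len(vals))


def has_burst(samples, preset_flat_value):
    """True only if the flatness is strictly greater than the preset flat value."""
    return flatness(power_spectrum(samples)) > preset_flat_value


n = 64
click = [1.0 if i == n // 2 else 0.0 for i in range(n)]       # one-sample pulse
tone = [math.sin(2 * math.pi * 4 * i / n) for i in range(n)]  # pure sinusoid
print(has_burst(click, 0.5), has_burst(tone, 0.5))
```

The pulse has energy in every frequency bin, so its flatness is close to 1, whereas the tone concentrates its energy in one bin and its flatness is close to 0.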

207, the network device judges whether detection of the audio signal to be detected is complete; if not, it returns to the step of obtaining, according to the short-time energy difference, the frame signals that fall within the preset condition interval to obtain a mutation audio signal (that is, it returns to step 204), until detection of the audio signal to be detected is complete.

For example, after one sonic boom has been detected, and for the accuracy of subsequent repair, the network device may continue to use the short-time energy difference to obtain frame signals that fall within the preset condition interval until all of the audio signal to be detected has been examined. That is, it returns to the step of obtaining, according to the short-time energy difference, the frame signals that fall within the preset condition interval to obtain a mutation audio signal, until detection of the audio signal to be detected is complete. For instance, after judging whether the frequency spectrum flatness of the mutation audio signal is greater than the preset flat value, and regardless of the result of that judgment, the network device may continue to examine the frame signals after the fourth frame signal until all frame signals have been examined, obtaining the detection result.
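The repeated scan can be sketched as a single pass over the frame energies that records every mutation segment together with its judgment, continuing whether or not the judgment was positive. The callback `is_burst` stands in for the frequency spectrum flatness test and, like the name `scan_all_segments`, is a hypothetical placeholder.

```python
def scan_all_segments(energies, threshold, is_burst):
    """Return [(start, end, verdict), ...] for every mutation segment found.

    Scanning continues after each segment regardless of the verdict.
    """
    results = []
    start = None
    for t in range(1, len(energies)):
        p_t = energies[t] - energies[t - 1]
        if start is None:
            if p_t > threshold:
                start = t
        elif p_t < -threshold:
            results.append((start, t, is_burst(start, t)))
            start = None  # resume the search after the end frame
    return results


energies = [0.0, 2.0, 0.1, 0.1, 3.0, 3.1, 0.2]
# Placeholder verdict: treat only one-frame-long jumps as sonic booms.
results = scan_all_segments(energies, 1.0, lambda s, e: e - s == 1)
print(results)
```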

Optionally, after detection of the audio signal is complete, an interface for the detection result may be generated. The interface includes a detection interface that receives the detection result of the audio signal to be detected and, after detection is complete, indicates on the interface whether a sonic boom was detected in the audio signal.

Optionally, after a beginning sonic boom is detected, these signals with missing frequency bands may also be repaired or replaced, to ensure that the user hears a good-quality audio file.

From the above, when performing sonic boom detection on an audio signal, the network device of this embodiment may obtain the audio signal to be detected, divide the audio signal into multiple frame signals, calculate the short-time energy difference of adjacent frame signals, obtain, according to the short-time energy difference, the frame signals that fall within the preset condition interval to obtain a mutation audio signal, and then calculate the frequency spectrum flatness of the mutation audio signal; if the frequency spectrum flatness is greater than the preset flat value, it determines that a sonic boom exists in the audio signal. By framing the audio signal, calculating the time-domain short-time energy of each frame, locating the audio frames where the energy changes abruptly through the short-time energy difference, and then calculating the frequency spectrum flatness of the resulting mutation audio signal, the scheme can accurately screen out audio files with missing frequency bands.

In addition, the scheme can also repair or replace the beginning sonic boom, thereby improving the quality of the audio file and the user experience.

To better implement the audio sonic boom detection method provided by the embodiments of the present application, an embodiment of the present application further provides an audio sonic boom detection device, which may be integrated in a network device such as a mobile phone, a tablet computer, or a palmtop computer. The terms have the same meanings as in the audio sonic boom detection method above, and implementation details can be found in the description of the method embodiments.

For example, as shown in Fig. 3a, the audio sonic boom detection device may include a framing module 301, a computing module 302, an obtaining module 303, and a judgment module 304, as follows:

(1) framing module 301；

The framing module 301 is configured to obtain the audio signal to be detected and divide the audio signal into multiple frame signals.

For example, the framing module 301 may first obtain audio files from various sources such as a network, a mobile phone, or a video, and provide them to the audio sonic boom detection device; that is, the audio sonic boom detection device may receive audio files obtained from various sources and extract the audio signal to be detected from these files. The audio signals are then divided into multiple frame signals.

To improve detection efficiency, the detection period may be set at the beginning of the audio signal in the time domain, and framing is performed on the audio signal within that period; that is, the framing module may include a selection submodule and a framing submodule, as follows:

The selection submodule is configured to select, starting from the first frame in the time domain, the signal within a preset time period of the audio signal, obtaining a beginning audio signal;

(2) computing module 302；

The computing module 302 is configured to calculate the short-time energy difference of adjacent frame signals.

For example, the computing module 302 may include an energy submodule, an acquisition submodule, and an energy difference submodule, as follows:

The energy submodule is configured to calculate the short-time energy of each frame signal;

The acquisition submodule is configured to obtain the time of each frame signal;

The energy difference submodule is configured to calculate, in the time order of the frame signals, the difference between the short-time energies of each pair of adjacent frame signals, obtaining the short-time energy difference of adjacent frame signals.

p_t = E(t) - E(t-1)

(3) obtaining module 303;

The obtaining module 303 is configured to obtain, according to the short-time energy difference, the frame signals that fall within the preset condition interval, obtaining a mutation audio signal.

For example, the obtaining module 303 may obtain the two frame signals whose short-time energy difference is greater than the preset threshold and, in time order, determine the later of the two as the start frame signal; after the start frame signal, obtain the two frame signals whose short-time energy difference is less than the negative of the preset threshold and, in time order, determine the later of the two as the end frame signal; and obtain the signal from the start frame signal to the end frame signal, obtaining the mutation audio signal.

So that the subsequent frequency spectrum flatness calculation is closer to the true value for the preset condition interval, and so that the detection result is more accurate, the later of the two frame signals at which the short-time energy difference is first detected to be less than the negative of the preset threshold after the start frame signal may be taken as the end frame signal; that is, the obtaining module may perform the following operations:

(4) judgment module 304；

The judgment module 304 is configured to calculate the frequency spectrum flatness of the mutation audio signal and, if the frequency spectrum flatness is greater than the preset flat value, determine that a sonic boom exists in the audio signal.

For example, the judgment module 304 may apply a Fourier transform to the mutation audio signal to obtain a frequency-domain mutation audio signal, calculate the frequency spectrum flatness of the frequency-domain mutation audio signal, and then judge whether the frequency spectrum flatness is greater than the preset flat value. If the frequency spectrum flatness is greater than the preset flat value, it determines that a sonic boom exists in the audio signal; if the frequency spectrum flatness is less than the preset flat value, it determines that no sonic boom exists in the audio signal.

F(t) = GM(t) / AM(t)

For example, to further improve detection accuracy and ensure that the audio presented to the user is free of flaws, the peak position of the mutation audio signal may first be detected; then, centered on that peak position, N/2 sampling points are taken to the left and to the right to form a sonic boom audio frame, so that the sonic boom audio frame contains N sampling points in total. The judgment module may therefore include a detection submodule, a sampling submodule, and a calculation submodule, as follows:

The sampling submodule is configured to take a fixed number of sampling points before and after the peak position to form a sonic boom audio frame;

The calculation submodule is configured to calculate the frequency spectrum flatness of the sonic boom audio frame.

After one sonic boom has been detected, and for the accuracy of subsequent repair, detection may continue, using the short-time energy difference to obtain frame signals that fall within the preset condition interval, until all of the audio signal to be detected has been examined. That is, as shown in Fig. 3b, the audio sonic boom detection device may further include a detection module 305, as follows:

The detection module 305 is configured to return to the step of obtaining, according to the short-time energy difference, the frame signals that fall within the preset condition interval to obtain a mutation audio signal, until detection of the audio signal to be detected is complete.

Those skilled in the art will understand that the audio sonic boom detection device shown in Fig. 3a does not constitute a limitation on the device, which may include more or fewer components than illustrated, combine certain components, or use a different arrangement of components. It should also be noted that the specific implementation of each of the above units can be found in the preceding method embodiments and is not repeated here.

From the above, when the audio sonic boom detection device of this embodiment performs sonic boom detection on an audio signal, the framing module 301 may obtain the audio signal to be detected and divide the audio signal into multiple frame signals; the computing module 302 then calculates the short-time energy difference of adjacent frame signals; the obtaining module 303 then obtains, according to the short-time energy difference, the frame signals that fall within the preset condition interval, obtaining a mutation audio signal; and the judgment module 304 then calculates the frequency spectrum flatness of the mutation audio signal and, if the frequency spectrum flatness is greater than the preset flat value, determines that a sonic boom exists in the audio signal. By framing the audio signal, calculating the time-domain short-time energy of each frame, locating the audio frames where the energy changes abruptly through the short-time energy difference, and then calculating the frequency spectrum flatness of the resulting mutation audio signal, the scheme can accurately screen out audio files with missing frequency bands.

Correspondingly, an embodiment of the present invention further provides a network device, which may be a server, a terminal, or similar equipment, and which integrates any of the audio sonic boom detection devices provided by the embodiments of the present invention. Fig. 4 shows a schematic structural diagram of the network device involved in the embodiments of the present invention. Specifically:

The network device may include components such as a processor 401 with one or more processing cores, a memory 402 comprising one or more computer-readable storage media, a power supply 403, and an input unit 404. Those skilled in the art will understand that the network device structure shown in Fig. 4 does not constitute a limitation on the network device, which may include more or fewer components than illustrated, combine certain components, or use a different arrangement of components. Wherein:

The processor 401 is the control center of the network device. It connects the various parts of the entire network device through various interfaces and lines, and performs the various functions of the network device and processes data by running or executing the software programs and/or modules stored in the memory 402 and calling the data stored in the memory 402, thereby monitoring the network device as a whole. Optionally, the processor 401 may include one or more processing cores; preferably, the processor 401 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It will be understood that the modem processor need not be integrated into the processor 401.

The memory 402 may be used to store software programs and modules; the processor 401 executes various functional applications and data processing by running the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, where the program storage area may store the operating system, the application programs required by at least one function (such as a sound playback function or an image playback function), and the like, and the data storage area may store data created through use of the network device, and the like. In addition, the memory 402 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device. Correspondingly, the memory 402 may also include a memory controller to provide the processor 401 with access to the memory 402.

The network device further includes a power supply 403 that supplies power to the components. Preferably, the power supply 403 may be logically connected to the processor 401 through a power management system, so that functions such as charging management, discharging management, and power consumption management are implemented through the power management system. The power supply 403 may also include any components such as one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.

The network device may further include an input unit 404, which may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.

Although not shown, the network device may also include a display unit and the like, which are not described here. Specifically, in this embodiment, the processor 401 in the network device loads the executable files corresponding to the processes of one or more application programs into the memory 402 according to the following instructions, and runs the application programs stored in the memory 402, thereby implementing various functions, as follows:

Obtain the audio signal to be detected and divide the audio signal into multiple frame signals; then calculate the short-time energy difference of adjacent frame signals; then obtain, according to the short-time energy difference, the frame signals that fall within the preset condition interval, obtaining a mutation audio signal; then calculate the frequency spectrum flatness of the mutation audio signal and, if the frequency spectrum flatness is greater than the preset flat value, determine that a sonic boom exists in the audio signal.
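The whole sequence can be sketched end to end as follows. This is a hedged illustration under several assumptions that the text does not fix: short-time energy is the sum of squared samples, frames do not overlap, the peak is the sample of largest absolute value, the spectrum comes from a naive DFT, and the parameter values in the example (frame length 32, threshold 0.5, preset flat value 0.5, window of 16 samples) are arbitrary. All function names are hypothetical.

```python
import cmath
import math


def power_spectrum(samples):
    """Naive O(n^2) DFT power spectrum for bins 0 .. n // 2."""
    n = len(samples)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, x in enumerate(samples))) ** 2
            for k in range(n // 2 + 1)]


def flatness(power, eps=1e-12):
    """GM / AM of the power spectrum; eps avoids log(0)."""
    vals = [p + eps for p in power]
    gm = math.exp(sum(math.log(v) for v in vals) / len(vals))
    return gm / (sum(vals) / len(vals))


def detect_burst(samples, frame_len, threshold, preset_flat_value, n=16):
    """Return True if any mutation segment has flatness above the preset flat value."""
    # Framing and short-time energy (non-overlapping frames, sum of squares).
    count = len(samples) // frame_len
    energies = [sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len])
                for i in range(count)]
    start = None
    for t in range(1, count):
        p_t = energies[t] - energies[t - 1]  # p_t = E(t) - E(t-1)
        if start is None:
            if p_t > threshold:
                start = t  # start frame
        elif p_t < -threshold:
            # Mutation audio signal: start frame through end frame t.
            segment = samples[start * frame_len:(t + 1) * frame_len]
            peak = max(range(len(segment)), key=lambda i: abs(segment[i]))
            window = segment[max(0, peak - n // 2):peak + n // 2]
            if flatness(power_spectrum(window)) > preset_flat_value:
                return True
            start = None  # keep scanning the remaining frames
    return False


click = [0.0] * 256
click[100] = 1.0  # a single-sample pulse in otherwise silent audio
tone = [math.sin(2 * math.pi * 8 * i / 256) for i in range(256)]
print(detect_burst(click, 32, 0.5, 0.5), detect_burst(tone, 32, 0.5, 0.5))
```

In this toy input the pulse produces an energy rise and fall across adjacent frames and a nearly flat spectrum, while the steady tone produces no energy jump at all.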

Optionally, dividing the audio signal into multiple frame signals may include:

Selecting, starting from the first frame in the time domain, the signal within a preset time period of the audio signal, obtaining a beginning audio signal; and dividing the beginning audio signal into multiple frame signals.
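A minimal sketch of this optional step is given below. It assumes the preset time period is expressed in seconds, that frames do not overlap, and that a trailing partial frame is dropped; none of these choices is specified in the text, and the function names are illustrative.

```python
def select_beginning(samples, sample_rate, seconds):
    """Keep only the first `seconds` of audio (the beginning audio signal)."""
    return samples[: int(sample_rate * seconds)]


def split_into_frames(samples, frame_length):
    """Split into non-overlapping frames; a trailing partial frame is dropped."""
    count = len(samples) // frame_length
    return [samples[i * frame_length:(i + 1) * frame_length] for i in range(count)]


samples = list(range(10))  # stand-in for audio samples
beginning = select_beginning(samples, sample_rate=4, seconds=2.0)
frames = split_into_frames(beginning, frame_length=3)
print(beginning, frames)
```

Restricting the analysis to the beginning audio signal reduces the number of frames that the later steps must examine.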

Optionally, calculating the short-time energy difference of adjacent frame signals may include:

Calculating the short-time energy of each frame signal; obtaining the time of each frame signal; and calculating, in the time order of the frame signals, the difference between the short-time energies of each pair of adjacent frame signals, obtaining the short-time energy difference of adjacent frame signals.

Optionally, obtaining, according to the short-time energy difference, the frame signals that fall within the preset condition interval to obtain a mutation audio signal may include:

Obtaining the two frame signals whose short-time energy difference is greater than the preset threshold and, in time order, determining the later of the two as the start frame signal; after the start frame signal, obtaining the two frame signals whose short-time energy difference is less than the negative of the preset threshold and, in time order, determining the later of the two as the end frame signal; and obtaining the signal from the start frame signal to the end frame signal, obtaining the mutation audio signal.

Optionally, obtaining, after the start frame signal, the two frame signals whose short-time energy difference is less than the negative of the preset threshold, and determining, in time order, the later of the two as the end frame signal, may include:

After the start frame signal, judging in time order whether each short-time energy difference is less than the negative of the preset threshold; and, when a short-time energy difference less than the negative of the preset threshold is detected for the first time, determining, in time order, the later of the two frame signals concerned as the end frame signal.

Optionally, calculating the frequency spectrum flatness of the mutation audio signal may include:

Detecting the peak position of the mutation audio signal; taking a fixed number of sampling points before and after the peak position to form a sonic boom audio frame; and calculating the frequency spectrum flatness of the sonic boom audio frame.

Optionally, determining that a sonic boom exists in the audio signal if the frequency spectrum flatness is greater than the preset flat value may include:

Judging whether the frequency spectrum flatness is greater than the preset flat value; if the frequency spectrum flatness is greater than the preset flat value, determining that a sonic boom exists in the audio signal; and if the frequency spectrum flatness is less than the preset flat value, determining that no sonic boom exists in the audio signal.

Optionally, after determining that a sonic boom exists in the audio signal if the frequency spectrum flatness is greater than the preset flat value, the method may further include:

For details of each of the above operations, see the preceding embodiments; they are not repeated here.

Those of ordinary skill in the art will understand that all or some of the steps in the various methods of the above embodiments can be completed by instructions, or by instructions controlling related hardware; the instructions can be stored in a computer-readable storage medium and loaded and executed by a processor.

To this end, an embodiment of the present application provides a storage medium storing a plurality of instructions that can be loaded by a processor to execute the steps in any of the audio sonic boom detection methods provided by the embodiments of the present application. For example, the instructions can execute the following steps:

Obtain the audio signal to be detected and divide the audio signal into multiple frame signals; then calculate the short-time energy difference of adjacent frame signals; then obtain, according to the short-time energy difference, the frame signals that fall within the preset condition interval, obtaining a mutation audio signal; then calculate the frequency spectrum flatness of the mutation audio signal and, if the frequency spectrum flatness is greater than the preset flat value, determine that a sonic boom exists in the audio signal.

The specific implementation of each of the above operations can be found in the preceding embodiments and is not repeated here.

The storage medium may include a read-only memory (Read Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disc, and the like.

Because the instructions stored in the storage medium can execute the steps in any of the audio sonic boom detection methods provided by the embodiments of the present application, they can achieve the beneficial effects achievable by any of those methods; see the preceding embodiments for details, which are not repeated here.

The audio sonic boom detection method, device, and storage medium provided by the embodiments of the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help in understanding the method of the present application and its core idea. Meanwhile, those skilled in the art may, following the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims

1. An audio sonic boom detection method, characterized by comprising:

Calculating the short-time energy difference of adjacent frame signals;

Calculating the frequency spectrum flatness of the mutation audio signal and, if the frequency spectrum flatness is greater than a preset flat value, determining that a sonic boom exists in the audio signal.

2. The audio sonic boom detection method according to claim 1, characterized in that said dividing the audio signal into multiple frame signals comprises:

Dividing the beginning audio signal into multiple frame signals.

3. The audio sonic boom detection method according to claim 1, characterized in that said calculating the short-time energy difference of adjacent frame signals comprises:

Calculating the short-time energy of each frame signal;

Obtaining the time of each frame signal;

Calculating, in the time order of the frame signals, the difference between the short-time energies of each pair of adjacent frame signals, obtaining the short-time energy difference of adjacent frame signals.

4. The audio sonic boom detection method according to claim 3, characterized in that said obtaining, according to the short-time energy difference, the frame signals that fall within the preset condition interval to obtain a mutation audio signal comprises:

Obtaining the two frame signals whose short-time energy difference is greater than a preset threshold and, in time order, determining the later of the two as the start frame signal;

After the start frame signal, obtaining the two frame signals whose short-time energy difference is less than the negative of the preset threshold and, in time order, determining the later of the two as the end frame signal;

5. The audio sonic boom detection method according to claim 4, characterized in that said obtaining, after the start frame signal, the two frame signals whose short-time energy difference is less than the negative of the preset threshold, and determining, in time order, the later of the two as the end frame signal, comprises:

After the start frame signal, judging in time order whether each short-time energy difference is less than the negative of the preset threshold;

When a short-time energy difference less than the negative of the preset threshold is detected for the first time, determining, in time order, the later of the two frame signals concerned as the end frame signal.

6. The audio sonic boom detection method according to claim 1, characterized in that said calculating the frequency spectrum flatness of the mutation audio signal comprises:

Detecting the peak position of the mutation audio signal;

Calculating the frequency spectrum flatness of the sonic boom audio frame.

7. The audio sonic boom detection method according to claim 1, characterized in that said determining that a sonic boom exists in the audio signal if the frequency spectrum flatness is greater than the preset flat value comprises:

8. The audio sonic boom detection method according to claim 1, characterized in that, after said determining that a sonic boom exists in the audio signal if the frequency spectrum flatness is greater than the preset flat value, the method further comprises:

Returning to the step of obtaining, according to the short-time energy difference, the frame signals that fall within the preset condition interval to obtain a mutation audio signal, until detection of the audio signal to be detected is complete.

9. An audio sonic boom detection device, characterized by comprising:

An obtaining module, configured to obtain, according to the short-time energy difference, the frame signals that fall within the preset condition interval, obtaining a mutation audio signal;

A judgment module, configured to calculate the frequency spectrum flatness of the mutation audio signal and, if the frequency spectrum flatness is greater than a preset flat value, determine that a sonic boom exists in the audio signal.

10. A storage medium, characterized in that the storage medium stores a plurality of instructions, the instructions being adapted to be loaded by a processor to perform the steps in the audio sonic boom detection method according to any one of claims 1 to 8.