CN110838302B - Audio frequency segmentation method based on signal energy peak identification - Google Patents

Audio frequency segmentation method based on signal energy peak identification

Info

Publication number: CN110838302B
Application number: CN201911121998.XA
Authority: CN (China)
Legal status: Active
Prior art keywords: matrix, frequency, audio, audio signal, time
Other versions: CN110838302A
Other languages: Chinese (zh)
Inventors: 王旻轩, 鲍亭文, 金超
Assignee: Beijing Cyberinsight Technology Co ltd
Application filed by Beijing Cyberinsight Technology Co ltd; priority to CN201911121998.XA; application granted; publication of CN110838302B.

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 — characterised by the type of extracted parameters
    • G10L25/27 — characterised by the analysis technique
    • G10L25/48 — specially adapted for particular use
    • G10L25/51 — specially adapted for particular use for comparison or discrimination

Abstract

The application relates to an audio segmentation method based on signal energy spike identification, which comprises the following steps: performing a short-time Fourier transform on an input audio signal to convert it into a power-spectrum matrix; extracting intermediate-frequency energy features based on the power spectrum; performing peak identification on the extracted intermediate-frequency energy features; performing error division correction on the peak-identified signal; and outputting the time coordinates of the segmentation points of the audio signal. The audio segmentation method requires no preset threshold and no prior training, can analyze the audio signal in real time, quickly and accurately, can be deployed at the edge, needs access to no other operating parameters, and essentially achieves parameter-free dynamic segmentation.

Description

Audio frequency segmentation method based on signal energy peak identification
Technical Field
The application relates to an audio segmentation method based on signal energy peak identification, and belongs to the technical field of audio signal processing.
Background
The main existing implementation schemes for simple audio segmentation are as follows:
1. Segmentation methods based on endpoint detection, e.g. the Chinese patent with application number CN200510061358.6. All silent points are detected as candidate speaker-change points, exploiting the pauses a speaker makes between utterances. Such methods are inaccurate because silent points are difficult to detect under different SNR environments.
2. Model-based segmentation methods, as disclosed in Chinese patent applications CN201710512310.5 and CN201811581291.2. Corresponding models are established for different types of audio segments, maximum-likelihood model selection is then performed on the input audio stream within a sliding window, and a position where the audio category changes is taken as an audio segmentation point. To build generalized models, various model-based segmentation methods have been successively proposed and implemented; for example, a UBM is used to distinguish speech segments from non-speech segments, while a UGM is used to distinguish male from female speakers. However, such "a priori knowledge" is generally not available, so this class of methods has no detection capability for unknown acoustic features.
3. Distance-based segmentation methods compute, for each sample point in the audio stream, the "difference" between the data in the left and right windows, represented by a distance measure. When the "difference" reaches a certain level, i.e. the distance measure exceeds a given threshold or attains a local maximum, the point is considered an audio segmentation point. Although such methods require no a priori knowledge for the decision and achieve high segmentation accuracy, the threshold selection depends largely on the audio characteristics, so they lack stability and robustness and are computationally expensive.
Taking the fan-blade scenario as an example, the main existing implementation of audio segmentation is to access the real-time rotation speed of the fan blades and, after computation, obtain the approximate position of the segmentation point between blades. This solution is simple and efficient, but has two prominent problems:
1. The positioning of the segmentation points is inaccurate. The actual rotation is continuously variable; if the rotation time of each blade is calculated from the average rotation speed within a time window of a certain resolution, only approximately uniform segment lengths are obtained, whereas the time actually taken by each blade is not necessarily equal. The method is therefore suitable only as a rough reference and not as accurate input to other analysis algorithms;
2. Accessing the real-time blade rotation speed places high demands on sensor installation. Acquiring a high-precision rotation speed requires additional sensor hardware, which is difficult to engineer, costly, and hard to maintain. Moreover, the main-shaft rotation speed is acquired at the nacelle of the fan while the acquisition device is placed at the tower base; the overlong signal transmission line introduces interference into the acquired signal, so the data quality is poor and segmentation and interpretation are seriously affected.
Disclosure of Invention
The application provides an audio segmentation method based on signal energy peak recognition which requires no preset threshold and no prior training, can analyze an audio signal in real time, quickly and accurately, can be deployed at the edge, needs access to no other operating parameters, and essentially achieves parameter-free dynamic segmentation.
The audio segmentation method based on signal energy spike identification comprises the following steps:
(1) carrying out short-time Fourier transform on an input audio signal, and converting the input audio signal into a power spectrum matrix;
(2) extracting intermediate frequency energy characteristics based on a power spectrum;
(3) carrying out peak identification on the extracted intermediate frequency energy characteristics;
(4) performing error division correction on the signal subjected to peak identification;
(5) and outputting the time coordinate of the division point of the audio signal.
The method for extracting the energy features comprises the following steps:
(1) performing a short-time Fourier transform on the original audio signal to convert it into a time-frequency matrix M0;
(2) converting the time-frequency matrix M0 into a spectrogram matrix M1 expressed in decibels;
(3) determining the frequency range in which the audio signal is the principal component, and band-pass filtering the spectrogram matrix M1 to filter out low-frequency environmental noise and high-frequency abnormal sounds;
(4) cutting the spectrogram matrix M1 along the frequency axis, retaining the sub-power-spectrum matrix M2 dominated by the audio signal;
(5) summing M2 column-wise to obtain the sum of each time-domain power-spectrum vector.
The method for performing peak identification on the extracted intermediate-frequency energy features comprises the following steps:
(1) determining the rated rotation speed rs of the fan-blade rotation and the duration t of the input audio;
(2) calculating the conversion relation prop between the feature index and the time index from the duration t and the length k of the Energy feature:

prop = t / k

(3) obtaining the rated segmentation step distance of the feature index from the rated rotation speed rs and prop:

distance = 60 / (3 × rs × prop)

(4) searching the feature vectors by a binary search method until no further peak is found.
The method for performing error division correction on the signal subjected to peak identification comprises the following steps:
(1) setting a misclassification judgment threshold;
(2) removing the segmentation points whose values exceed the misclassification judgment threshold to obtain the final segmentation point coordinates m′;
(3) converting the coordinates m′ back to time indices according to the conversion relation prop.
The band-pass filtering is performed by determining the vertical-axis coordinate indices, in the matrix M1, of the selected upper and lower cut-off frequency limits by the following formulas:

UpperBound = round(Freq_up / (sr / 2) × length(M1))
LowerBound = round(Freq_low / (sr / 2) × length(M1))

where UpperBound is the vertical-axis coordinate index in the matrix M1 of the selected upper cut-off frequency limit, LowerBound is the vertical-axis coordinate index in the matrix M1 of the selected lower cut-off frequency limit, sr is the sampling frequency of the audio, and [Freq_low, Freq_up] is the frequency range in which the audio signal is the principal component.
For the specific scenario of segmenting the wind-sweeping sound of fan blades, and drawing on algorithmic results from speech analysis, the method provides an energy feature extraction approach that targets the blade wind-sweeping sound and has strong generalization and robustness. The energy-feature-based audio segmentation method is parameter-free, has low computational cost, accurately realizes variable-speed cutting, and adds a misclassification post-processing mechanism to further improve segmentation accuracy. The method also provides, based on prior knowledge, an energy feature extraction scheme for the wind-sweeping sound of wind-turbine blades that serves as preprocessing and input for segmentation and is highly robust. Specifically, the audio segmentation method based on signal energy spike identification has the following technical advantages:
(1) The method for extracting the wind-sweeping sound energy features of wind-turbine blades band-pass filters the power-spectrum matrix, takes the energy matrix of a specific frequency band to filter out low- and high-frequency environmental noise, and takes the sum of the frequency-domain energy of each time-domain segment as the mid-frequency energy feature of the blade wind-sweeping sound. This feature effectively filters out noise-point interference caused by excessive sampling points and environmental sounds, and extracts from the disordered raw audio signal information that stably represents the regularity of the blade wind sweeping;
(2) The method of extracting features and searching for energy troughs with a peak identification method is parameter-free: no real-time fan rotation speed needs to be accessed, no threshold needs to be set during segmentation, no prior knowledge of the audio is required, and real-time segmentation is possible without prior training. A correction mechanism is added after segmentation to further improve accuracy; the method is fast, stable, and accurate;
(3) The deployment requirements are low: no sensor needs to be installed when the fan is built, and only audio acquisition equipment needs to be installed around the device, saving engineering deployment cost and avoiding errors caused by signal interference. Because no real-time rotation speed information is needed during operation, operation is unaffected when the fan is idling, not generating, or stopped.
Drawings
Fig. 1 shows a diagram of an original audio signal of an audio a in an embodiment.
Fig. 2 shows the mid-frequency energy profile of audio a after vertical summing.
Fig. 3 shows a schematic diagram of finding all valley locations in fig. 2 using a spike identification algorithm.
Fig. 4 shows an effect diagram showing the division points of the audio a in the original waveform diagram.
Fig. 5 shows an effect diagram showing the division points of the audio a in the power spectrogram.
Fig. 6 shows a diagram of the original audio signal of audio B in an embodiment.
Fig. 7 shows the mid-frequency energy profile of audio B after vertical summation.
FIG. 8 shows a schematic diagram of finding all valley locations in FIG. 7 using a spike identification algorithm.
Fig. 9 shows an effect diagram showing the division points of the audio B in the original waveform diagram.
Fig. 10 shows an effect diagram showing the division point of the audio B in the power spectrogram.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It should be noted that the embodiments and features of the embodiments in the present application may be arbitrarily combined with each other without conflict.
The audio segmentation method based on signal energy spike identification comprises the following steps:
(1) carrying out short-time Fourier transform on an input audio signal, and converting the input audio signal into a power spectrum matrix;
(2) extracting intermediate frequency energy characteristics based on a power spectrum;
(3) carrying out peak identification on the extracted intermediate frequency energy characteristics;
(4) performing error division correction on the signal subjected to peak identification;
(5) and outputting the time coordinate of the division point of the audio signal.
The following uses the audio signal produced by blade wind-sweeping in a wind turbine generator as an example to explain the specific techniques used in the audio segmentation method. It should be noted that the method of the present application can be applied to the segmentation of any periodic audio signal and is not limited to the field of blade wind-sweeping. The audio sensor can be mounted on the periphery of the wind-turbine blades, for example on the tower, to acquire the audio signal generated by the blade sweeping.
Energy feature extraction method
(1) Performing a short-time Fourier transform (STFT) on the original audio signal converts it into a time-frequency matrix M0. The dimensions of the matrix depend on the parameter settings of the short-time Fourier transform: the window length is usually set equal to the number of FFT points n_fft, generally in the range 1024 to 8192, and determines the frequency dimension (number of rows) of M0; the overlap length n_overlap between windows, generally equal to one half of the number of FFT points, together with the audio duration determines the time dimension (number of columns) of M0; the window function applied to the signal is typically a Hamming window.
For a particular input signal x[n] and window w[n], the short-time Fourier transform is defined as

X(m, ω) = Σₙ x[n] · w[n − m] · e^(−jωn)
The STFT is typically visualized as a log-spectrogram: M0 is converted into a spectrogram matrix M1 expressed in decibels by converting each element of the matrix to decibel form, after which M1 can be displayed as a heat map. The dimensions of M1 are the same as those of M0 and are jointly determined by n_fft and n_overlap.

The conversion from STFT values to decibels is defined as

20 · log10(|X[n]|).
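The STFT-to-decibel conversion described above can be sketched in Python with NumPy. This is a minimal sketch, not the patent's implementation: the function name, the hop-length parameter, and the small epsilon guarding log(0) are illustrative choices.

```python
import numpy as np

def stft_db(x, n_fft=8192, hop=1024):
    """Short-time Fourier transform of a 1-D signal, returned as the
    decibel-scaled spectrogram matrix M1 (rows: frequency, cols: time)."""
    window = np.hamming(n_fft)                # Hamming window, as in the text
    n_frames = 1 + (len(x) - n_fft) // hop
    # gather overlapping frames into an (n_fft, n_frames) matrix
    idx = np.arange(n_fft)[:, None] + hop * np.arange(n_frames)[None, :]
    frames = x[idx] * window[:, None]
    M0 = np.fft.rfft(frames, axis=0)          # time-frequency matrix M0
    M1 = 20.0 * np.log10(np.abs(M0) + 1e-12)  # decibel spectrogram M1
    return M1
```

With n_fft = 8192 the matrix has 4097 rows (n_fft/2 + 1), matching the frequency dimension reported in the embodiment.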
(2) The sensor receives the wind-sweeping sound signal of the fan blades, which is an approximately periodic signal: as each blade moves away from, approaches, and again moves away from the sensor, the audio volume rises and falls accordingly, and the fade-out of one blade overlaps the fade-in of the next. However, because of noise, the raw audio waveform does not necessarily exhibit this characteristic clearly, nor is it necessarily periodic; moreover, because of the high sampling rate, the raw audio contains many noise points at high resolution, which easily interfere with subsequent segmentation. The method therefore requires extracting, after some filtering or down-sampling, features that more clearly represent the blades' wind-sweeping trend and periodicity, namely the mid-frequency energy based on the power spectrum. The specific operations are as follows:
(1) the sampling frequency sr of the input audio is known, generally between 12800 and 51200 Hz;
(2) determining through experiments the frequency range [Freq_low, Freq_up] in which the blade wind-sweeping sound is the principal component, and band-pass filtering the spectrogram matrix M1 (rather than the original audio signal) to filter out low-frequency environmental noise (mainly wind, thunder, speech, vehicles, etc.) and high-frequency abnormal sounds (e.g. bird calls, or whistling caused by blade damage). The dominant frequency range of wind sweeping is generally between 100 and 1000 Hz. The specific filtering method is:

UpperBound = round(Freq_up / (sr / 2) × length(M1))
LowerBound = round(Freq_low / (sr / 2) × length(M1))

where UpperBound is the vertical-axis coordinate index in the matrix M1 of the selected upper cut-off frequency limit, LowerBound is the vertical-axis coordinate index in the matrix M1 of the selected lower cut-off frequency limit, round is the well-known function rounding to a specified number of decimal places, and length gives the matrix length along the frequency axis. Because the processing is based on the short-time-Fourier-transformed matrix, the number of sample points is reduced in the time domain, reducing the influence of noise points; in the frequency domain the vertical axis represents frequency information, so M1 is cut along the frequency axis and the sub-power-spectrum matrix M2 dominated by the wind-sweeping sound is retained:

M2 = M1[LowerBound:UpperBound, :]
(3) summing M2 column-wise to obtain the sum Energy of each time-domain power-spectrum vector:

Energy[j] = Σᵢ M2[i, j]
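The band-pass cropping and column-wise summation can be sketched as follows. The bin-index mapping row = freq / (sr/2) × (rows − 1) is an assumption of this sketch and may differ from the patent's exact indexing; the function name and defaults are illustrative.

```python
import numpy as np

def midband_energy(M1, sr, freq_low=100.0, freq_up=800.0):
    """Crop the dB spectrogram M1 to [freq_low, freq_up] and sum each
    column, yielding the mid-frequency Energy feature vector."""
    n_rows = M1.shape[0]
    # map the cut-off frequencies to vertical-axis indices (assumed mapping)
    lower = int(round(freq_low / (sr / 2.0) * (n_rows - 1)))
    upper = int(round(freq_up / (sr / 2.0) * (n_rows - 1)))
    M2 = M1[lower:upper, :]   # sub-matrix dominated by the sweep sound
    return M2.sum(axis=0)     # column-wise sum: one value per time frame
```

Each element of the result is the summed band energy of one STFT frame, so the vector has one entry per time-domain column of M1.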
The obtained mid-frequency Energy feature vector can fully represent the periodic and gradual-change characteristics of the wind-sweeping sound signal. Observing the feature pattern, a fairly regular waveform with clearly alternating peaks and troughs indicates that the feature has captured the blade wind-sweeping sound. Each peak segment is the wind-sweeping sound of one blade, and each trough is the position where two blades alternate; segmenting the blade sounds therefore amounts to locating all trough positions in the audio segment.
Peak identification method for energy characteristics
Energy is a one-dimensional signal with a segmentation point at each trough position, and the peak identification method locates all local minima by comparing each point with its immediate neighbours. A trough A[m] is defined as any sample point in the signal whose immediate neighbours both have higher values; the array is treated as approaching infinity at its start and end, where no segmentation point exists:

A[m−1] ≥ A[m], A[m+1] ≥ A[m]
the specific method for identifying the peak is as follows:
(1) defining the rated rotation speed rs of the fan blades (in RPM) and the duration t of the input audio;
(2) calculating the conversion relation prop between the feature index and the time index from the duration t and the length k of the Energy feature:

prop = t / k

(3) obtaining the rated segmentation step distance in feature-index units from the rated rotation speed rs and prop:

distance = 60 / (3 × rs × prop)

where the factor 3 corresponds to the three blades that pass the sensor per revolution;
(4) searching the feature vector by a binary search method: examine the middle element of the array; if it is a peak, return it directly; otherwise, if the left element is larger, recursively process the left half of the array, and if the right element is larger, the right half, until no further peak is found. Additionally, once the rated segmentation step distance, i.e. the minimum horizontal distance between adjacent peaks, is set, the smaller peaks are removed first until all remaining peaks satisfy it.
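A sketch of the trough search with the rated-distance constraint. Instead of the recursive binary search described above, this sketch scans for all local minima and then prunes conflicting ones, keeping the deeper trough, which yields the same minimum-horizontal-distance property. `n_blades` (assumed to be 3, consistent with the worked numbers in the embodiment) and all names are illustrative.

```python
import numpy as np

def find_troughs(energy, rated_rpm, duration_s, n_blades=3):
    """Locate blade-alternation troughs in the Energy feature vector."""
    energy = np.asarray(energy, dtype=float)
    k = len(energy)
    prop = duration_s / k                            # feature index -> seconds
    distance = 60.0 / (n_blades * rated_rpm * prop)  # rated step, in indices
    # all interior local minima: both neighbours at least as high
    mins = [m for m in range(1, k - 1)
            if energy[m - 1] >= energy[m] and energy[m + 1] >= energy[m]]
    # enforce the minimum horizontal distance, keeping deeper troughs first
    mins.sort(key=lambda m: energy[m])
    kept = []
    for m in mins:
        if all(abs(m - q) >= distance for q in kept):
            kept.append(m)
    return sorted(kept), prop
```

On a clean periodic signal the troughs fall one rated step apart, so none are pruned; on noisy data the distance constraint suppresses spurious shallow minima.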
Error correction method
Because the rated segmentation step distance is based on the rated rotation speed of the fan blades, which the blades rarely reach in ordinary rotation, the actual number of true segmentation points is often smaller than the number output by the previous step, so a misclassification post-processing mechanism is added. When a misclassification occurs, the value of the corresponding segmentation point in the feature vector is not at a trough position, and the value at the wrongly cut point is greater than that at a genuine segmentation point. Using this characteristic, the misclassification judgment threshold is set as:

mean(Energy[m]) + std(Energy[m]), where mean denotes the mean and std the standard deviation over the detected segmentation points m;

segmentation points whose values exceed the misclassification judgment threshold are removed, yielding the final segmentation point coordinates m′. Finally, the coordinates m′ are converted back to time indices according to the conversion relation prop.
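The misclassification correction rule above can be sketched in Python (names are illustrative): split points whose Energy value exceeds mean + std of all split-point values are dropped, and the survivors are converted to seconds via prop.

```python
import numpy as np

def correct_splits(energy, split_idx, prop):
    """Drop misclassified split points and convert the rest to seconds."""
    energy = np.asarray(energy, dtype=float)
    vals = energy[list(split_idx)]
    threshold = vals.mean() + vals.std()   # misclassification threshold
    kept = [int(m) for m in split_idx if energy[m] <= threshold]
    times = [m * prop for m in kept]       # feature index -> seconds
    return kept, times
```

A split sitting on a genuine trough has a low Energy value, so only outliers well above the cluster of trough values cross the mean + std line.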
Examples of embodiment
Data were collected from an idling wind-farm fan, and two groups of audio signals, A and B, were used for the experiment. Under the experimental conditions the rated blade-tip rotation speed of the fan is 8.5 RPM (revolutions per minute) and the audio sampling frequency is 51.2 kHz. To verify the segmentation effect, the two recordings were manually labeled to determine the exact segmentation point positions.
The original audio signal image of audio signal A is shown in fig. 1. The duration of audio A is 65.53125 seconds, and no obvious regular wind-sweeping form can be observed in the raw audio. The time-frequency matrix of A is obtained by short-time Fourier transform and converted, by the conversion method used in this scheme, into a power-spectrum matrix in decibel (dB) units; under the experimental parameters (n_fft = 8192, n_overlap = 1024) its dimensions are (4097 × 3277). From the audio duration and the time-domain dimension of the power-spectrum matrix, the conversion relation prop = 0.019997329874885564 is obtained, and the rated segmentation step distance calculated from the rated rotation speed and the conversion relation is 117.66276753906142.
From the sampling frequency of 51.2 kHz and the filtering upper/lower limits of 800 Hz/100 Hz, the upper- and lower-limit coordinates on the vertical axis of the power-spectrum matrix are calculated as (256, 32). The mid-frequency energy feature of A after column-wise summation is shown in fig. 2, where each peak segment clearly represents the periodic wind-sweeping sound of one fan blade. Using the spike identification algorithm, all trough positions in fig. 2 found with the calculated rated segmentation step distance are shown in fig. 3. The dotted line is the threshold line for misclassification correction: any segmentation point whose value lies above it is removed; no misclassified point above the threshold line appears in the segmentation of A. The ×-shaped markers are all located segmentation points m.
Converting m to time indices (in seconds) m′ via the conversion relation prop, the algorithm's segmentation point coordinates for A are:
[1.3198237717424472,4.95933780897162,8.538859856576137,11.158510070186145,15.277960024412572,18.677506103143116,21.31715364662801,25.2166329722307,28.03625648458956,30.935869316447967,34.55538602380226,37.5349881751602,40.634574305767465,44.114109703997556,46.87374122673177,49.993324687213914,52.952929508696975,55.93253166005493,58.67216585291425,61.35180805614891,64.31141287763198];
the coordinates of the manual accurate segmentation points of A are as follows:
[1.03,4.65,8.26,11.72,15.22,18.21,21.6,25.08,28.07,30.94,34.42,37.39,40.58,43.72,46.77,49.83,52.88,55.73,58.43,61.18,64.24];
The overall average error is within 0.1 s. The effect of displaying the segmentation points in the raw waveform and the power spectrogram is shown in figs. 4 and 5.
Similarly, the original audio signal image of audio B is shown in fig. 6. The duration of audio B is 112.53125 seconds, and no obvious regular wind-sweeping form can be observed in the raw audio. The time-frequency matrix of B is obtained by short-time Fourier transform and converted, by the conversion method used in this scheme, into a power-spectrum matrix in decibel (dB) units; under the experimental parameters (n_fft = 8192, n_overlap = 1024) its dimensions are (4097 × 5627). From the audio duration and the time-domain dimension of the power-spectrum matrix, the conversion relation prop = 0.019998444997334282 is obtained, and the rated segmentation step distance calculated from the rated rotation speed and the conversion relation is 117.6562066092752.
From the sampling frequency of 51.2 kHz and the filtering upper/lower limits of 800 Hz/100 Hz, the upper- and lower-limit coordinates on the vertical axis of the power-spectrum matrix are calculated as (256, 32). The mid-frequency energy feature of B after column-wise summation is shown in fig. 7, where each peak segment clearly represents the periodic wind-sweeping sound of one fan blade. Using the spike identification algorithm, all trough positions in fig. 7 found with the calculated rated segmentation step distance are shown in fig. 8. The dotted line is the threshold line for misclassification correction: any segmentation point whose value lies above it is removed; no misclassified point above the threshold line appears in the segmentation of B. The ×-shaped markers are all located segmentation points m.
Converting m to time indices (in seconds) m′ via the conversion relation prop, the algorithm's segmentation point coordinates for B are:
[1.3198973698240626,4.179675004442865,6.659482184112316,9.33927381375511,12.039063888395237,14.598864848054026,17.378648702683492,20.098437222320953,22.578244401990403,25.158043806646525,28.01782144126533,30.39763639594811,33.19741869557491,36.077194775191046,38.65699417984717,41.05680757952728,44.03657588413009,46.61637528878621,49.33616380842367,51.79597254309579,54.45576572774125,56.855579127421365,59.7353552070375,62.5951328416563,65.11493691132043,67.77473009596588,70.45452172560867,73.17431024524613,75.8940987648836,78.61388728452106,81.27368046916652,83.87347831881998,86.3332870534921,89.21306313310824,91.73286720277235,94.51265105740181,97.23243957703927,99.95222809667673,102.6120212813222,105.13182535098632,108.05159832059712,110.47141016527458];
the coordinates of the manual accurate segmentation points of the B are as follows:
[1.3,3.84,6.44,9.34,12.04,14.6,17.22,19.99,22.34,25.13,27.71,30.45,33.14,35.84,38.38,41.14,43.72,46.3,49.09,51.67,54.39,57.04,59.8,62.34,65.1,67.68,70.37,73.09,75.72,78.32,81.06,83.74,86.21,89.21,91.65,94.42,97.11,99.74,102.34,105.1,107.87,110.3];
The overall average error is likewise around 0.1 s. The effect of displaying the segmentation points in the original waveform and the power spectrogram is shown in figs. 9 and 10.
Although the embodiments of the present invention have been described above, the above descriptions are only for the convenience of understanding the present invention, and are not intended to limit the present invention. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (4)

1. An audio segmentation method based on signal energy spike identification, comprising the steps of:
(1) carrying out short-time Fourier transform on an input audio signal, and converting the input audio signal into a power spectrum matrix;
(2) extracting intermediate frequency energy characteristics based on a power spectrum;
(3) carrying out peak identification on the extracted intermediate frequency energy characteristics;
(4) performing mis-segmentation correction on the signal after spike identification;
(5) outputting the time coordinate of the division point of the audio signal;
wherein the method for identifying spikes in the extracted intermediate-frequency energy feature comprises the following steps:
(1) determining the rated rotation speed rs of the fan blade and the duration t of the input audio;
(2) calculating the conversion relation prop between the feature index and the time index from the duration t and the length k of the energy feature;
prop = k / t
(3) obtaining the rated segmentation step length distance, in feature indices, from the rated rotation speed rs and prop;
distance = 60 × prop / rs
(4) searching the feature vector with a binary search method until no further spike is found.
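The spike search of claim 1 can be sketched in Python as follows, assuming numpy and scipy are available. Scipy's distance-constrained peak picking stands in here for the claim's binary-search strategy, whose details the claim does not spell out; the 0.8 tolerance factor and all names are our choices:

```python
import numpy as np
from scipy.signal import find_peaks

def identify_spikes(energy, t, rs):
    """Locate energy spikes in the mid-frequency energy feature.

    energy : 1-D energy feature vector of length k
    t      : duration of the input audio in seconds
    rs     : rated rotation speed of the fan blade (revolutions per minute)
    """
    k = len(energy)
    prop = k / t                  # conversion relation: feature indices per second
    distance = 60.0 * prop / rs   # rated segmentation step in feature indices
    # Require neighbouring spikes to be at least ~80% of the rated step apart,
    # which suppresses spurious local maxima between genuine blade passes.
    peaks, _ = find_peaks(energy, distance=0.8 * distance)
    return peaks, prop
```

For example, a 20 s feature at 100 indices per second with a 30 rpm rated speed yields prop = 100 and distance = 200, so spikes closer than 160 indices apart are merged.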
2. The audio segmentation method according to claim 1, wherein the method of extracting energy features comprises the steps of:
(1) carrying out short-time Fourier transform on the original audio signal to convert it into a time-frequency domain matrix M0;
(2) converting the time-frequency domain matrix M0 into a spectrogram matrix M1 expressed in decibels;
(3) determining the frequency range in which the audio signal is the principal component, and performing band-pass filtering on the spectrogram matrix M1 to filter out low-frequency environmental noise and high-frequency abnormal sound;
(4) cutting the spectrogram matrix M1 along the frequency axis, retaining the sub-power-spectrum matrix M2 in which the audio signal is dominant;
(5) summing M2 column-wise to obtain the sum of each time-domain power spectrum vector.
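The feature-extraction pipeline of claim 2, steps (1)-(5), can be sketched compactly with numpy and scipy; the window length nperseg, the dB floor 1e-12, and all names are our choices, not the patent's:

```python
import numpy as np
from scipy.signal import stft

def extract_energy_feature(audio, sr, freq_low, freq_up, nperseg=1024):
    """Mid-frequency energy feature following claim 2, steps (1)-(5)."""
    # (1) short-time Fourier transform -> time-frequency domain matrix M0
    f, t, M0 = stft(audio, fs=sr, nperseg=nperseg)
    power = np.abs(M0) ** 2
    # (2) convert to a decibel-scaled spectrogram matrix M1
    M1 = 10.0 * np.log10(power + 1e-12)
    # (3)+(4) band-pass and cut along the frequency axis: keep only the rows
    # whose frequency lies in the dominant range [freq_low, freq_up] -> M2
    band = (f >= freq_low) & (f <= freq_up)
    M2 = M1[band, :]
    # (5) column-wise sum: one energy value per time frame
    return M2.sum(axis=0)
```

Each column of M2 corresponds to one STFT frame, so the returned vector has one energy value per time frame, ready for the spike search of claim 1.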
3. The audio segmentation method according to claim 1 or 2, wherein the method of correcting mis-segmentation of the spike-identified signal comprises the following steps:
(1) setting a mis-segmentation judgment threshold;
(2) removing the division points whose values are larger than the mis-segmentation judgment threshold to obtain the final division point coordinates m';
(3) converting the coordinates m' back to time indices according to the conversion relation prop.
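A minimal sketch of claim 3's correction step. The claim does not say which quantity is compared against the threshold; here we assume it is the energy value at each candidate point, and all names are ours:

```python
import numpy as np

def correct_missplits(peaks, energy, prop, threshold):
    """Drop spurious division points and convert the rest to seconds.

    peaks     : candidate division-point indices in the feature vector
    energy    : the mid-frequency energy feature vector
    prop      : feature indices per second (the conversion relation of claim 1)
    threshold : mis-segmentation judgment threshold
    """
    # (2) remove points whose value exceeds the mis-segmentation threshold
    kept = np.asarray([p for p in peaks if energy[p] <= threshold])
    # (3) convert the remaining feature indices back to time coordinates
    return kept / prop
```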
4. The audio segmentation method according to claim 1 or 2, wherein the band-pass filtering is performed by determining the vertical-axis coordinate index of the selected upper cut-off frequency in the matrix M1 and the vertical-axis coordinate index of the selected lower cut-off frequency in the matrix M1 according to the following formulas:
UpperBound = Freq_up / (sr / 2) × N
LowerBound = Freq_low / (sr / 2) × N
wherein UpperBound represents the vertical-axis coordinate index of the selected upper cut-off frequency in the matrix M1, LowerBound represents the vertical-axis coordinate index of the selected lower cut-off frequency in the matrix M1, N is the number of frequency rows (the vertical-axis length) of M1, sr is the sampling frequency of the audio, and Freq_low, Freq_up delimit the frequency range in which the audio signal is the principal component.
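In code, the index mapping of claim 4 might look like the following. The rounding directions (floor for the lower bound, ceil for the upper) and the use of N−1 intervals spanning 0..sr/2 are our assumptions, since the original formula image is not recoverable:

```python
import math

def band_indices(sr, n_bins, freq_low, freq_up):
    """Row indices of M1 bounding the dominant band [freq_low, freq_up].

    sr     : audio sampling frequency in Hz
    n_bins : number of frequency rows of M1 (they span 0 .. sr/2)
    """
    nyquist = sr / 2.0
    # Round outward so the retained band fully covers [freq_low, freq_up].
    lower_bound = math.floor(freq_low / nyquist * (n_bins - 1))
    upper_bound = math.ceil(freq_up / nyquist * (n_bins - 1))
    return lower_bound, upper_bound
```

For sr = 16000 Hz and a 1024-point STFT (513 frequency rows), the band 500-2000 Hz maps to rows 32 through 128.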
CN201911121998.XA 2019-11-15 2019-11-15 Audio frequency segmentation method based on signal energy peak identification Active CN110838302B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911121998.XA CN110838302B (en) 2019-11-15 2019-11-15 Audio frequency segmentation method based on signal energy peak identification


Publications (2)

Publication Number Publication Date
CN110838302A CN110838302A (en) 2020-02-25
CN110838302B true CN110838302B (en) 2022-02-11

Family

ID=69576610

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911121998.XA Active CN110838302B (en) 2019-11-15 2019-11-15 Audio frequency segmentation method based on signal energy peak identification

Country Status (1)

Country Link
CN (1) CN110838302B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112727703B (en) * 2020-12-15 2022-02-11 北京天泽智云科技有限公司 Fan blade protective film damage monitoring method and system based on audio signal
CN112727704B (en) * 2020-12-15 2021-11-30 北京天泽智云科技有限公司 Method and system for monitoring corrosion of leading edge of blade
CN114764570A (en) * 2020-12-30 2022-07-19 北京金风科创风电设备有限公司 Blade fault diagnosis method, device and system and storage medium
CN112927715A (en) * 2021-02-26 2021-06-08 腾讯音乐娱乐科技(深圳)有限公司 Audio processing method and device and computer readable storage medium

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7895036B2 (en) * 2003-02-21 2011-02-22 Qnx Software Systems Co. System for suppressing wind noise
JP2010244602A (en) * 2009-04-03 2010-10-28 Sony Corp Signal processing device, method, and program
EP2306108B1 (en) * 2009-09-25 2013-11-20 Hans Östberg A ventilating arrangement
US20130332159A1 (en) * 2012-06-08 2013-12-12 Apple Inc. Using fan throttling to enhance dictation accuracy
US9347432B2 (en) * 2014-07-31 2016-05-24 General Electric Company System and method for enhanced operation of wind parks
JP6589131B2 (en) * 2015-02-23 2019-10-16 パナソニックIpマネジメント株式会社 Blower
CN106593781A (en) * 2016-11-29 2017-04-26 上海电机学院 Wind driven generator fault detecting system and method based on Android platform
CN109214318B (en) * 2018-08-22 2021-10-22 北京天泽智云科技有限公司 Method for searching weak peak of unsteady time sequence
CN110107461B (en) * 2019-05-22 2021-06-25 华润新能源(太原)有限公司 Fan fault early warning method, device, equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN110838302A (en) 2020-02-25

Similar Documents

Publication Publication Date Title
CN110838302B (en) Audio frequency segmentation method based on signal energy peak identification
JP4310371B2 (en) Sound determination device, sound detection device, and sound determination method
JP4547042B2 (en) Sound determination device, sound detection device, and sound determination method
CN108896878B (en) Partial discharge detection method based on ultrasonic waves
JP4545233B2 (en) Sound determination device, sound determination method, and sound determination program
US20100303254A1 (en) Audio source direction detecting device
JPS63259696A (en) Voice pre-processing method and apparatus
US9646592B2 (en) Audio signal analysis
CN110890087A (en) Voice recognition method and device based on cosine similarity
CN108962285A (en) A kind of sound end detecting method dividing subband based on human ear masking effect
CN115467787A (en) Motor state detection system and method based on audio analysis
CN114694640A (en) Abnormal sound extraction and identification method and device based on audio frequency spectrogram
CN102438191B (en) For the method that acoustic signal is followed the tracks of
CN112782421A (en) Audio-based rotating speed identification method
CN116312623B (en) Whale signal overlapping component direction ridge line prediction tracking method and system
CN114242085A (en) Fault diagnosis method and device for rotating equipment
CN111611686A (en) Detection method for communication signal time-frequency domain
CN111755025A (en) State detection method, device and equipment based on audio features
Ziólko et al. Phoneme segmentation of speech
CN107548007B (en) Detection method and device of audio signal acquisition equipment
Tahliramani et al. Performance analysis of speaker identification system with and without spoofing attack of voice conversion
CN115376548B (en) Audio signal voiced segment endpoint detection method and system
Sattar et al. Automatic event detection for noisy hydrophone data using relevance features
CN110189765B (en) Speech feature estimation method based on spectrum shape
JP2639353B2 (en) Acoustic signal detection device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant