CN115002642A - Feature extraction method for abnormal sound of loudspeaker based on combination of auditory masking and SVD-MRMR - Google Patents
- Publication number
- CN115002642A (application CN202210561080.2A)
- Authority
- CN
- China
- Prior art keywords
- frequency
- feature
- sub-band
- loudspeaker
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R29/00—Monitoring arrangements; Testing arrangements
- H04R29/001—Monitoring arrangements; Testing arrangements for loudspeakers
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
Abstract
The invention relates to the technical field of loudspeaker quality detection, and in particular to a loudspeaker abnormal-sound feature extraction method based on auditory masking combined with SVD-MRMR. First, the collected loudspeaker response signal is analyzed by simulating human listening, and the auditory-perception masking spectrum of the loudspeaker is computed. SVD decomposition is then applied to the auditory-perception masking spectrum to obtain a singular-value feature sequence. Finally, the optimal features in the singular-value feature sequence are selected by the MRMR algorithm, eliminating redundant information and yielding reduced-dimension features. This avoids the problem that too many features raise the dimensionality of the classification model during classification and degrade the model's efficiency, and it is more conducive to unified measurement and judgment.
Description
Technical Field
The invention relates to the technical field of loudspeaker quality detection, in particular to a loudspeaker abnormal sound feature extraction method based on auditory masking and SVD-MRMR.
Background
The loudspeaker and its system are key elements of audio equipment, and loudspeaker quality directly affects the sound quality of the whole reproduction system. In the production process, however, a loudspeaker may sound abnormal because of voice-coil bottoming, diaphragm breakage, excessive impedance and the like. It is therefore of great practical value to detect and classify the abnormal sounds of produced loudspeakers.
In the traditional loudspeaker abnormal-sound detection method, an inspector detects abnormal sounds by listening with the ear. The listening result is affected by the difficulty of the task, the worker's physical and mental state, and the degree of work fatigue, so the results vary subjectively from listener to listener; the detection accuracy and stability for loudspeakers are hard to improve further, hearing damage is easily caused, and it is difficult for enterprises to raise production efficiency and product quality.
Loudspeaker abnormal sounds can also be detected by signal-processing methods, for example by inspecting the time-frequency diagram obtained from a short-time Fourier transform or wavelet transform, or by decomposing the loudspeaker response signal and then extracting time-frequency features, such as the LMD energy entropy (Chen Xiulou, Research on loudspeaker abnormal-sound fault diagnosis [D], Guilin University of Electronic Technology, 2020) and the VMD energy entropy (Zhou Jing, Yan Ting, Loudspeaker abnormal-sound classification [J], Acta Acustica, 2021, 46(02): 263-270). Although such methods can recognize and classify loudspeaker abnormal sounds well, they do not account for the auditory-perception characteristics of the human ear. Other methods extract the psychoacoustic energy mean as a feature to classify loudspeaker abnormal sounds (Guo Qing et al., A loudspeaker abnormal-sound detection algorithm based on psychoacoustics and a support vector machine [J], (Natural Science Edition), 2020, 46(02): 275-281), but the classification results are poor because the feature dimension is too high.
Disclosure of Invention
In order to solve the above problems, the present invention provides a loudspeaker abnormal-sound feature extraction method based on auditory masking combined with SVD-MRMR. The specific technical solution is as follows:
a method for extracting the characteristics of speaker abnormal sounds based on auditory masking and SVD-MRMR comprises the following steps:
Step S1: collect the sound signal of a loudspeaker with a microphone, with the microphone sampling frequency f_s in Hz; select a 20-20000 Hz logarithmic swept-sine signal as the excitation signal, with sound-signal duration T in s; compute the auditory-perception masking spectrum E[k, n] of the loudspeaker response signal x(t) according to the ITU-R BS.1387-1 psychoacoustic model, where k is the sub-band index and n is the frame index obtained by framing the duration-T sound signal with a frame length of 2048 samples and an overlap rate of 50%;
Step S2: arrange the sub-band auditory-perception masking spectra E[k, n] of the loudspeaker response signal into a k×n matrix E of rank r = min(k, n); then perform SVD on the matrix E to obtain the singular-value feature vector, and collect the feature vectors obtained from all samples into a feature set X_{h×r} of the loudspeaker's psychoacoustic energy spectrum, where h is the number of loudspeaker samples and r is the feature dimension;
Step S3: select the optimal feature subset Z from the data set X_{h×r} with the MRMR algorithm to eliminate redundant information.
Preferably, the step S1 specifically includes the following steps:
Step S11: perform framing processing on the loudspeaker response signal x(t); with 50% overlap, each frame signal x_n[t, n] is:
x_n[t, n] = x(t + (n - 1)·N_F/2), 0 ≤ t < N_F;
where t is the time-domain sample index within each frame and N_F is the number of samples per frame;
Step S12: apply Hanning-window filtering to the framed loudspeaker response signal x_n[t, n] and use the fast Fourier transform to convert the signal from the time domain to the frequency domain, obtaining the spectrum F_f[k_f, n]:
F_f[k_f, n] = fft(h[t]·x_n[t, n]);
where k_f is the frequency-domain bin index and h[t] is the Hanning window function;
Step S13: simulate the auditory characteristics of the human outer and middle ear by weighting the spectrum F_f[k_f, n] in the frequency domain, obtaining the ear-weighted output F_e[k_f, n]:
where W[f] is the frequency response function of the outer and middle ear;
Step S14: divide the Bark scale uniformly into quarter-Bark bands, i.e. bandwidth Δz = 1/4, so that the Bark domain over the 20 Hz-18 kHz hearing range is divided into 109 non-overlapping critical-frequency sub-bands, k = 1, 2, ..., 109. Sum the signal energy values within each sub-band to obtain the energy output spectrum P_e[k, n] of each frequency sub-band, where k denotes the k-th group of critical-frequency sub-bands;
wherein the energy distribution of each sub-band is calculated by:
where f_u[k] and f_l[k] are the upper and lower frequency limits of the k-th group of critical-frequency sub-bands;
the energy output spectrum P_e[k, n] within a frequency sub-band is calculated by:
P_e[k, n] = max(Σ_{k_f} U[k, k_f]·(F_e[k_f, n])², 10^(-12));
Step S15: add an internal-noise term P_noise[k] to the energy output spectrum P_e[k, n] to obtain P_p[k, n]:
P_p[k, n] = P_e[k, n] + P_noise[k];
where f_c is the centre frequency of the frequency sub-band;
Step S16: simulate the frequency-domain masking effect with the frequency spreading function of the Terhardt psychoacoustic model. The energy of each sub-band is spread over the whole Bark-domain space by the spreading function, so the energy of the i-th sub-band is the weighted sum of the contributions of all sub-band energies to that sub-band. The spreading function is denoted S_dB(i, k, n) and represents the contribution of the k-th sub-band's energy to the i-th sub-band, as follows:
calculate the energy distribution E_s[k, n] of each sub-band after frequency-domain spreading:
where B_s(i) is a normalization factor;
Step S17: perform time-domain spreading, simulating the time-domain masking effect with a first-order low-pass filter:
E_f[k, n] = a·E_f[k, n-1] + (1 - a)·E_s[k, n];
where the parameter a is determined by a time constant and depends on the centre frequency of each sub-band; the auditory-perception masking spectrum E[k, n] of each sub-band after time-domain masking is then obtained as:
E[k, n] = max(E_f[k, n], E_s[k, n]).
Preferably, the relation between the parameter a and the time constant τ is:
a = e^(-Δt/τ[k]), τ[k] = τ_min + (100 / f_c[k])·(τ_100 - τ_min);
where Δt is the frame hop duration, τ_100 and τ_min are constants, generally τ_100 = 0.03 s and τ_min = 0.008 s, and f_c[k] is the centre frequency of each sub-band.
Preferably, the expression of performing SVD decomposition on the matrix E in step S2 is as follows:
wherein S is a matrix of k, V is a matrix of n, S and V are unitary matrices, and Σ is diag (σ) 1 ,σ 2 ,...,σ n ) (ii) a Singular values of matrix E are σ i I 1,2, r and σ 1 ≥σ 2 ≥...≥σ r Not less than 0; the loudspeaker response signal feature vector s ═ σ ═ s 1 ,σ 2 ,...,σ r ) (ii) a Forming X by the characteristic vector s obtained by each sample h×r ={X 1 ,X 2 ,...X r Is a set of features of a psychoacoustic energy map of the loudspeaker, where X i ( i 1, 2.. r.) is the singular value σ of the auditory perceptual masking spectrum E of the response signal from each loudspeaker i ( i 1, 2.. r.) and then assigning the labels corresponding to the loudspeaker response signals of four different states of normal, circle collision, air leakage and small sound as 1,2, 3 and 4 respectively, assuming that Y is h×1 ={Y 1 ,Y 2 ,...Y h And h is the number of speaker samples, and r is the feature dimension.
Preferably, the step S3 specifically includes the following steps:
Step S31: normalize the feature set X_{h×r} and compute the importance R_x of each feature X_i in X_{h×r} according to the MRMR method:
where L_x is the correlation between each feature X_i and the class label Y, determined by their mutual information, i.e. L_x = I(X_i, Y);
and N_x is the redundancy between feature X_i and the other features X_l in the set Z, where i ≠ l, determined by the mutual information between each pair of features;
Step S32: sort the features in descending order of importance R_x and input them one by one into the classifier for evaluation, selecting the feature subset with the highest classification accuracy; if inputting the first m features gives the classifier its highest accuracy, the first m features are selected, and the MRMR criterion thereby yields the optimal feature subset Z.
The invention has the following beneficial effects. With conventional loudspeaker abnormal-sound detection, it is difficult to judge the type of abnormal sound by a stable and consistent standard. The invention analyzes the loudspeaker response signal by simulating human listening with the ITU-R BS.1387-1 psychoacoustic model, eliminating the influence of subjective listening impressions. It applies singular value decomposition (SVD) to the loudspeaker response signal after psychoacoustic-model processing, taking the psychoacoustic-energy singular values as the candidate feature set, which facilitates extracting fault information. Finally, it selects the optimal features from the feature set with the max-relevance min-redundancy (MRMR) algorithm and eliminates redundant information, which is more conducive to unified measurement and judgment.
Drawings
In order to more clearly illustrate the detailed description of the invention or the technical solutions in the prior art, the drawings that are needed in the detailed description of the invention or the prior art will be briefly described below. Throughout the drawings, like elements or portions are generally identified by like reference numerals. In the drawings, elements or portions are not necessarily drawn to scale.
FIG. 1 is a flow chart of a detection implementation of the present invention;
FIG. 2 is a time-domain plot of the response signal of a normal loudspeaker;
FIG. 3 is a time-domain plot of the response signal of a voice-coil-rub loudspeaker;
FIG. 4 is a time-domain plot of the response signal of an air-leakage loudspeaker;
FIG. 5 is a time-domain plot of the response signal of a low-sound loudspeaker;
FIG. 6 is the psychoacoustic masking energy spectrum of the response signal of a normal loudspeaker;
FIG. 7 is the psychoacoustic masking energy spectrum of the response signal of a voice-coil-rub loudspeaker;
FIG. 8 is the psychoacoustic masking energy spectrum of the response signal of an air-leakage loudspeaker;
FIG. 9 is the psychoacoustic masking energy spectrum of the response signal of a low-sound loudspeaker;
FIG. 10 shows the psychoacoustic-energy singular values of the loudspeaker signals in the four states after SVD;
FIG. 11 shows the test-set classification accuracy curves for different numbers of features;
FIG. 12 shows a visualization of the features extracted by the method of the invention;
FIG. 13 shows a visualization of the features extracted by the LMD energy-entropy method;
FIG. 14 shows a visualization of the features extracted by the VMD energy-entropy method;
FIG. 15 shows a visualization of the features extracted by the psychoacoustic-energy-mean method.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As shown in fig. 1, the embodiment of the invention provides a method for extracting characteristics of speaker abnormal sounds based on auditory masking in combination with SVD-MRMR, comprising the following steps:
Step S1: collect the sound signal of a loudspeaker with a microphone, with the microphone sampling frequency f_s in Hz; select a 20-20000 Hz logarithmic swept-sine signal as the excitation signal, with sound-signal duration T in s; compute the auditory-perception masking spectrum E[k, n] of the loudspeaker response signal x(t) according to the ITU-R BS.1387-1 psychoacoustic model. Here k is the sub-band index: the Bark scale is divided uniformly into quarter-Bark bands (bandwidth Δz = 1/4), so that the Bark domain over the 20 Hz-18 kHz hearing range contains 109 non-overlapping critical-frequency sub-bands, k = 1, 2, ..., 109; and n is the frame index obtained by framing the duration-T sound signal with a frame length of 2048 samples and an overlap rate of 50%.
the test object of this experiment was a 4cm diameter loudspeaker unit with a power of 0.5W and an impedance of 8 Ω. In the embodiment of the invention, a logarithmic frequency sweep signal of 20-20000 Hz is selected as an excitation signal, and a microphone is Bruel, Denmark&The model 4966-H-041 microphone of company Type, the main performance parameters of the microphone of model 4966 are as follows: the dynamic range is 14.6-144 dB, the frequency range is 5 Hz-20000 Hz, the inherent noise is 144.6dB A, and the lower limit frequency is-3 dB: 3Hz, a sensitivity of 50mV/Pa and a sampling rate of 65536 Hz. The experimental data acquisition system adopts Denmark B&And the K acoustic and vibration acquisition and analysis system is used as an acquisition system and stores the acquired loudspeaker response signals.
Loudspeakers in four states (normal, low sound, voice-coil rub, and air leakage) were tested, with 50 loudspeakers per state, for 200 loudspeakers under test in total; each loudspeaker was tested 5 times, giving 1000 samples in all. Time-domain plots of the various loudspeaker response signals are shown in FIGS. 2-5: FIG. 2 shows the response signal of a normal loudspeaker, FIG. 3 that of a voice-coil-rub loudspeaker, FIG. 4 that of an air-leakage loudspeaker, and FIG. 5 that of a low-sound loudspeaker.
The auditory-perception masking spectrum E[k, n] of each loudspeaker response signal is calculated from the time-domain signals in FIGS. 2-5 according to the ITU-R BS.1387-1 psychoacoustic model, as follows.
the method specifically comprises the following steps:
Step S11: perform framing processing on the loudspeaker response signal x(t); with 50% overlap, each frame signal x_n[t, n] is:
x_n[t, n] = x(t + (n - 1)·N_F/2), 0 ≤ t < N_F;
where t is the time-domain sample index within each frame and N_F is the number of samples per frame;
Step S12: apply Hanning-window filtering to the framed loudspeaker response signal x_n[t, n] and use the Fast Fourier Transform (FFT) to convert the signal from the time domain to the frequency domain, obtaining the spectrum F_f[k_f, n]:
F_f[k_f, n] = fft(h[t]·x_n[t, n]);
where k_f is the frequency-domain bin index and h[t] is the Hanning window function;
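As an illustrative sketch (not part of the patent) of steps S11-S12, the framing and windowed FFT can be written in a few lines of Python. The 2048-sample frame length, 50% overlap and 65536 Hz sampling rate follow the values stated in this embodiment; the 1 kHz test tone is only a stand-in for a real swept-sine response:

```python
import numpy as np

N_F = 2048           # samples per frame (frame length from step S1)
HOP = N_F // 2       # 50% overlap

def frame_and_fft(x):
    """Steps S11-S12: split x into overlapping frames, apply a Hanning
    window h[t], and return the per-frame spectrum F_f[k_f, n]."""
    n_frames = 1 + (len(x) - N_F) // HOP
    h = np.hanning(N_F)
    frames = np.stack([x[i * HOP:i * HOP + N_F] for i in range(n_frames)])
    return np.fft.rfft(frames * h, axis=1).T   # shape (k_f, n)

fs = 65536                                          # sampling rate of the experiments
x = np.sin(2 * np.pi * 1000 * np.arange(fs) / fs)   # 1 s, 1 kHz stand-in tone
F_f = frame_and_fft(x)                              # shape (1025, 63) for this input
```

For a 1 s signal at 65536 Hz this yields 63 frames of 1025 one-sided frequency bins each, with the spectral peak near bin 31 (1000 Hz / 32 Hz-per-bin).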
Step S13: simulate the auditory characteristics of the human outer and middle ear by weighting the spectrum F_f[k_f, n] in the frequency domain, obtaining the ear-weighted output F_e[k_f, n]:
where W[f] is the frequency response function of the outer and middle ear;
Step S14: Bark-scale critical sub-band mapping. The scale transformation is a nonlinear processing of sound frequency that simulates the Bark-like resonance behaviour of the basilar membrane in the human cochlea, mapping the linear frequency domain to the psychoacoustic frequency domain (the Bark domain). After the signal is transformed to the Bark domain, the response signal undergoes a high-resolution nonlinear stretching with a pronounced magnifying effect, so the high- and low-frequency components of the signal are depicted more finely and intuitively.
Divide the Bark scale uniformly into quarter-Bark bands, i.e. bandwidth Δz = 1/4, so that the Bark domain over the 20 Hz-18 kHz hearing range is divided into 109 non-overlapping critical-frequency sub-bands; sum the signal energy values within each sub-band to obtain the energy output spectrum P_e[k, n] of each frequency sub-band, where k denotes the k-th group of critical-frequency sub-bands;
wherein the energy distribution of each sub-band is calculated by:
where f_u[k] and f_l[k] are the upper and lower frequency limits of the k-th group of critical-frequency sub-bands;
the energy output spectrum P_e[k, n] within a frequency sub-band is calculated by:
P_e[k, n] = max(Σ_{k_f} U[k, k_f]·(F_e[k_f, n])², 10^(-12));
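The text does not reproduce the band-edge formula or the distribution U[k, k_f]. The sketch below assumes the ITU-R BS.1387-1 (PEAQ) Bark mapping z = 7·asinh(f/650) with Δz = 1/4 starting at 80 Hz, which yields exactly 109 sub-bands and reproduces the centre frequencies 2026.266 Hz and 3519.344 Hz for bands 49 and 64 quoted in the analysis below; U[k, k_f] is taken, also as an assumption, to be the fraction of FFT bin k_f lying inside sub-band k:

```python
import numpy as np

DZ = 0.25  # quarter-Bark bandwidth, as in step S14

def bark(f):
    return 7.0 * np.arcsinh(f / 650.0)

def bark_inv(z):
    return 650.0 * np.sinh(z / 7.0)

z0 = bark(80.0)
n_bands = int(round((bark(18000.0) - z0) / DZ))        # -> 109 sub-bands
edges = bark_inv(z0 + DZ * np.arange(n_bands + 1))
f_l, f_u = edges[:-1], np.minimum(edges[1:], 18000.0)  # band limits f_l[k], f_u[k]
f_c = bark_inv(z0 + DZ * (np.arange(n_bands) + 0.5))   # centre frequencies f_c[k]

def band_energy(F_e, fs, N_F):
    """P_e[k, n] = max(sum_kf U[k, k_f] * |F_e[k_f, n]|^2, 1e-12), with
    U[k, k_f] the fraction of FFT bin k_f lying inside sub-band k."""
    kf = np.arange(F_e.shape[0]) * fs / N_F            # bin centre frequencies
    half = 0.5 * fs / N_F                              # half an FFT bin width
    U = np.clip((np.minimum(f_u[:, None], kf + half)
                 - np.maximum(f_l[:, None], kf - half)) / (2 * half), 0.0, 1.0)
    return np.maximum(U @ (np.abs(F_e) ** 2), 1e-12)
```

The close match of the derived centre frequencies to the values cited later supports the assumed mapping, although the patent itself states a 20 Hz lower limit.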
Step S15: because factors such as blood flow and heartbeat affect human hearing to some extent, an internal-noise term P_noise[k] is added to the energy output spectrum P_e[k, n] to obtain P_p[k, n]:
P_p[k, n] = P_e[k, n] + P_noise[k];
where f_c is the centre frequency of the frequency sub-band.
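The internal-noise term P_noise[k] is not spelled out in the text; in the ITU-R BS.1387-1 model it depends only on the band centre frequency, P_noise[k] = 10^(0.4·0.364·(f_c[k]/1000)^(-0.8)). A sketch under that assumption:

```python
import numpy as np

def internal_noise(f_c):
    """Internal-noise term as defined in ITU-R BS.1387-1: a function of the
    band centre frequencies f_c[k] only (in Hz)."""
    return 10.0 ** (0.4 * 0.364 * (np.asarray(f_c) / 1000.0) ** -0.8)

def add_internal_noise(P_e, f_c):
    """Step S15: P_p[k, n] = P_e[k, n] + P_noise[k]."""
    return P_e + internal_noise(f_c)[:, None]
```

Note that the noise term grows toward low frequencies, mirroring the reduced sensitivity of human hearing there.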
Step S16: simulate the frequency-domain masking effect with the frequency spreading function of the Terhardt psychoacoustic model. The energy of each sub-band is spread over the whole Bark-domain space by the spreading function, so the energy of the i-th sub-band is the weighted sum of the contributions of all sub-band energies to that sub-band. The spreading function is denoted S_dB(i, k, n) and represents the contribution of the k-th sub-band's energy to the i-th sub-band, as follows:
calculate the energy distribution E_s[k, n] of each sub-band after frequency-domain spreading:
where B_s(i) is a normalization factor;
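The spreading function S_dB(i, k, n) itself is not reproduced in the text, and the full BS.1387-1 function is level-dependent. As a simplified, level-independent stand-in, the sketch below uses fixed slopes of +27 dB/Bark below the masker and -24 dB/Bark above it, then normalizes each target band i by B_s(i), the sum of its spreading weights, as described in step S16:

```python
import numpy as np

DZ = 0.25  # Bark per sub-band

def spread(P_p):
    """Simplified frequency-domain spreading: spread each band's energy over
    the Bark axis with fixed slopes, then normalize each target band i by
    B_s(i).  A level-independent approximation, not the full PEAQ function."""
    n_bands = P_p.shape[0]
    dz = DZ * (np.arange(n_bands)[:, None] - np.arange(n_bands)[None, :])  # z_i - z_k
    s_db = np.where(dz >= 0.0, -24.0 * dz, 27.0 * dz)   # attenuation in dB
    w = 10.0 ** (s_db / 10.0)
    E_s = w @ P_p                                       # weighted sum of contributions
    B_s = w.sum(axis=1, keepdims=True)                  # normalization factor B_s(i)
    return E_s / B_s
```

A single excited band thus leaks energy into its neighbours, with the shallower upper slope reproducing the upward spread of masking.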
Step S17: perform time-domain spreading, simulating the time-domain masking effect with a first-order low-pass filter:
E_f[k, n] = a·E_f[k, n-1] + (1 - a)·E_s[k, n];
where the parameter a is determined by a time constant and depends on the centre frequency of each sub-band; after time-domain masking, the auditory-perception masking spectrum of each sub-band is:
E[k, n] = max(E_f[k, n], E_s[k, n]).
The relation between the parameter a and the time constant τ is:
a = e^(-Δt/τ[k]), τ[k] = τ_min + (100 / f_c[k])·(τ_100 - τ_min);
where Δt is the frame hop duration, τ_100 and τ_min are constants, generally τ_100 = 0.03 s and τ_min = 0.008 s, and f_c[k] is the centre frequency of each sub-band.
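Step S17 can be sketched as follows, assuming the BS.1387-1 form of the time constant, τ[k] = τ_min + (100/f_c[k])·(τ_100 - τ_min) with a[k] = exp(-Δt/τ[k]) and Δt the frame hop in seconds (1024/f_s for 2048-sample frames at 50% overlap):

```python
import numpy as np

TAU_100, TAU_MIN = 0.03, 0.008   # time constants from the text, in seconds

def time_spread(E_s, f_c, fs, hop=1024):
    """Step S17: first-order low-pass smoothing across frames, followed by
    E[k, n] = max(E_f[k, n], E_s[k, n])."""
    tau = TAU_MIN + (100.0 / np.asarray(f_c)) * (TAU_100 - TAU_MIN)
    a = np.exp(-hop / (fs * tau))                    # per-band smoothing factor
    E_f = np.zeros_like(E_s)
    for n in range(E_s.shape[1]):
        prev = E_f[:, n - 1] if n > 0 else np.zeros(E_s.shape[0])
        E_f[:, n] = a * prev + (1.0 - a) * E_s[:, n]
    return np.maximum(E_f, E_s)
```

A constant input passes through unchanged, while an impulse leaves a decaying trail in later frames, which is the forward (post-)masking effect being modelled.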
The resulting psychoacoustic masking energy spectra are shown in FIGS. 6-9: FIG. 6 for a normal loudspeaker, FIG. 7 for a voice-coil-rub loudspeaker, FIG. 8 for an air-leakage loudspeaker, and FIG. 9 for a low-sound loudspeaker. FIGS. 6-9 show that, for every fault type, the largest share of the response-signal energy lies in critical-frequency sub-bands 49-64, whose centre frequencies span 2026.266-3519.344 Hz; this agrees with the human ear being most sensitive to sounds of about 1000-3000 Hz. Since the energy amplitudes of the different fault types differ within this interval, the auditory-perception masking spectrum can emulate a keen human auditory-perception system: it suppresses the low- and high-frequency parts of the collected loudspeaker response signal and enhances the mid-frequency part, so the analysis result agrees better with human auditory perception.
Step S2: arrange the sub-band auditory-perception masking spectra E[k, n] of the loudspeaker response signal into a k×n matrix E of rank r = min(k, n), then perform SVD decomposition on E to obtain the singular-value feature sequence. The SVD decomposition of E is expressed as:
E = S·Σ·V^T;
where S is a k×k unitary matrix, V is an n×n unitary matrix, and Σ = diag(σ_1, σ_2, ..., σ_r); the singular values of E are σ_i, i = 1, 2, ..., r, with σ_1 ≥ σ_2 ≥ ... ≥ σ_r ≥ 0. These singular values carry the essential information of the auditory-perception masking spectrum E[k, n], so they are used to analyze the abnormal-sound type for classification; the loudspeaker response-signal feature vector is s = (σ_1, σ_2, ..., σ_r). FIG. 10 shows the psychoacoustic-energy singular values of the loudspeaker signals in the different states; as can be seen, the singular values are arranged in descending order for loudspeakers in every state, so the ordering of the features is preserved regardless of the loudspeaker's state, which shows the stability advantage of SVD.
The feature vectors s obtained from all samples form X_{h×r} = {X_1, X_2, ..., X_r}, the feature set of the loudspeaker's psychoacoustic energy spectrum, where X_i (i = 1, 2, ..., r) collects the i-th singular value σ_i of the auditory-perception masking spectrum E of each loudspeaker response signal. The labels corresponding to the loudspeaker response signals of the four states (normal, voice-coil rub, air leakage and low sound) are assigned as 1, 2, 3 and 4 respectively, giving Y_{h×1} = {Y_1, Y_2, ..., Y_h}, where h is the number of loudspeaker samples and r is the feature dimension.
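Step S2 reduces to a single SVD call per sample. In the sketch below a random 109×63 matrix stands in for a real auditory-perception masking spectrum; the singular values come out sorted in descending order by construction, which is the stability property noted for FIG. 10:

```python
import numpy as np

rng = np.random.default_rng(0)
E = rng.random((109, 63))               # stand-in for a masking spectrum E[k, n]
s = np.linalg.svd(E, compute_uv=False)  # feature vector s = (sigma_1, ..., sigma_r)
r = s.size                              # r = min(109, 63) = 63 features per sample
```

Stacking the vectors s from all h samples row-wise then gives the feature set X_{h×r} described above.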
Step S3: select the optimal feature subset Z from the data set X_{h×r} = {X_1, X_2, ..., X_r} with the MRMR algorithm, eliminating redundant information. The method specifically comprises the following steps:
step S31, set X of characteristics h×r Normalization, feature set X is calculated according to the MRMR method h×r Each feature X in i R of importance x The formula is as follows:
wherein L_x is the correlation between each feature X_i and the class Y, determined by the mutual information between X_i and its class Y:
L_x = I(X_i, Y);
N_x is the redundancy between each feature X_i and the other features X_l in the set Z, where i ≠ l; N_x is determined by the mean mutual information between each feature and the others:

N_x = (1/|Z|) Σ_{X_l∈Z} I(X_i, X_l);
Assume two continuous random variables x and y with probability density functions p(x) and p(y) and joint probability density function p(x, y); the mutual information between x and y is:

I(x, y) = ∬ p(x, y) log( p(x, y) / (p(x)p(y)) ) dx dy
step S32: sort the features in descending order of their importance R_x, input them into the classifier one by one for evaluation, and select the feature subset with the highest classification accuracy; if the classifier accuracy is highest after the mth feature has been input, the first m features are selected, and from these the MRMR criterion selects the optimal features to form the feature subset Z.
Specifically: the feature set is normalized, R_x of each feature is calculated according to the MRMR method, the features are sorted in descending order of importance and then input into the classifier one by one, and the resulting change in classification accuracy is shown in fig. 11. As can be seen from fig. 11, the classification accuracy gradually increases with the number of feature values, but after reaching its maximum it essentially plateaus and may even decrease. This shows that, beyond the maximum, adding further features contributes little to accuracy and can even harm the classification effect, which illustrates the importance and necessity of feature selection in feature identification and classification. As fig. 11 shows, the accuracy is highest when the number of features is 10, giving the optimal classification effect, so the first 10 features are selected as the optimal feature subset.
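Steps S31-S32 can be sketched as a greedy MRMR ranking with the difference criterion R_x = L_x − N_x. This is a hedged illustration, not the patent's implementation: mutual information is estimated here by binning the normalized features, and the bin count is an assumed parameter.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def mrmr_rank(X, y, n_bins=8):
    """Rank features of X (h samples x r features) by greedy MRMR."""
    h, r = X.shape
    # min-max normalize, then discretize each feature for MI estimation
    Xn = (X - X.min(axis=0)) / (np.ptp(X, axis=0) + 1e-12)
    Xd = np.floor(Xn * (n_bins - 1)).astype(int)
    relevance = np.array([mutual_info_score(Xd[:, i], y) for i in range(r)])
    selected = [int(np.argmax(relevance))]        # start with most relevant
    remaining = set(range(r)) - {selected[0]}
    while remaining:
        scores = {}
        for i in remaining:
            redundancy = np.mean([mutual_info_score(Xd[:, i], Xd[:, l])
                                  for l in selected])
            scores[i] = relevance[i] - redundancy  # R_x = L_x - N_x
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected  # feature indices in MRMR order
```

Feeding the top-m ranked features into the classifier and keeping the m with the highest accuracy reproduces the selection of the subset Z described above.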
To illustrate the algorithm more intuitively, it is then analyzed visually using the t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm. t-SNE is a technique that integrates dimensionality reduction and visualization: it converts the high-dimensional Euclidean distances between data points into conditional probabilities representing similarity, so that after the high-dimensional data are mapped to a low-dimensional space, the neighborhood relationships between points in the high-dimensional space are preserved in the low-dimensional space. After learning converges, t-SNE can project the data set into a two-dimensional or three-dimensional space. The feature subset Z selected by the MRMR algorithm is visualized by t-SNE, and the result is shown in fig. 12.
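A minimal sketch of this visualization step using scikit-learn follows; the feature subset, label values, and perplexity are illustrative assumptions, not the patent's data:

```python
import numpy as np
from sklearn.manifold import TSNE

# toy stand-in for the selected feature subset Z: 120 samples x 10 features
rng = np.random.default_rng(0)
Z = rng.random((120, 10))
labels = rng.integers(1, 5, 120)  # 1=normal, 2=circle collision, 3=leak, 4=small sound

# embed into two dimensions for plotting (as in fig. 12)
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(Z)
assert emb.shape == (120, 2)
```

A scatter plot of `emb` colored by `labels` then gives a figure of the same kind as fig. 12.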
In order to better verify the accuracy of the proposed feature extraction method for loudspeaker abnormal sound classification, features are also extracted by LMD energy entropy (Chen Xiunong; Study on loudspeaker abnormal sound fault diagnosis based on LMD and LSSVM [D]. Guilin University of Electronic Technology, 2020), VMD energy entropy (Zhou Jinglei, Yang Ting; Loudspeaker abnormal sound classification using variational mode decomposition and energy entropy [J]. Acta Acustica, 2021, 46(02): 263-270) and the psychoacoustic energy mean (Guo et al.; Loudspeaker abnormal sound detection algorithm based on psychoacoustics and support vector machine [J]. Journal of Donghua University (Natural Science Edition), 2020, 46(02): 275-281); each data set is then normalized and visualized by t-SNE, with the results shown in figs. 13-15: fig. 13 shows the visualization of the features extracted by the LMD energy entropy method, fig. 14 that of the VMD energy entropy method, and fig. 15 that of the psychoacoustic energy mean.
As can be seen from the analysis of figs. 12 to 15, the loudspeaker abnormal sound feature extraction method of the present invention achieves a high recognition effect on the normal, circle collision, air leakage and small sound states; in particular, the feature distributions of normal, circle collision and air leakage are completely separated and each gathered in its corresponding region. Among the features extracted by the LMD energy entropy method, only small sound is completely separated from the normal, circle collision and air leakage features, whose distributions show a tendency to separate gradually but still cluster together. Among the features extracted by the VMD energy entropy method, the circle collision and small sound distributions tend to separate gradually from the other features but are not tightly clustered and have large dispersion; the extraction of faults such as air leakage is particularly poor, the air leakage features being mixed in the figure with the normal, circle collision and small sound distributions. Among the features extracted from the psychoacoustic energy mean, only circle collision is completely separated from the normal, air leakage and small sound distributions, which tend to separate gradually but remain mixed together. Therefore, the feature extraction method provided by the present invention is superior to existing loudspeaker abnormal sound feature extraction methods, in particular for the classification of normal loudspeakers.
In the present method, the loudspeaker response signal is first analyzed by simulating human listening to the collected sound, and the auditory perception masking spectrum of the loudspeaker is calculated; SVD decomposition is then performed on the auditory perception masking spectrum to obtain the singular value feature sequence; finally, the optimal features in the singular value feature sequence are selected by the MRMR algorithm, eliminating redundant information and yielding dimension-reduced features. This avoids the problem that too many features make the dimensionality of the classification model too high and reduce its efficiency, and is more favorable to unified measurement and judgment.
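The pipeline summarized above can be sketched end to end as follows. This is a minimal sketch with the masking-spectrum stage stubbed out (a real implementation would follow the ITU-R BS.1387-1 model of steps S11-S17); all function names and shapes are assumptions:

```python
import numpy as np

def masking_spectrum(x, fs):
    """Placeholder for step S1: framing, FFT, ear weighting, Bark grouping,
    internal noise, and frequency/time-domain spreading (ITU-R BS.1387-1)."""
    n_frames = max(1, 1 + (len(x) - 2048) // 1024)  # 2048-pt frames, 50% overlap
    return np.random.default_rng(0).random((109, n_frames))  # 109 sub-bands

def sample_features(x, fs):
    E = masking_spectrum(x, fs)                 # step S1: E[k, n]
    return np.linalg.svd(E, compute_uv=False)   # step S2: singular values

fs = 48000
x = np.random.default_rng(1).standard_normal(fs)  # 1 s toy response signal
s = sample_features(x, fs)
# step S3 would then rank the stacked feature set of all samples with
# MRMR and keep the top-m singular values as the subset Z
assert s.ndim == 1 and np.all(np.diff(s) <= 0)
```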
Those of ordinary skill in the art will appreciate that the elements of the examples described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the components of the examples have been described above generally in terms of their functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present application, it should be understood that the division of the unit is only one division of logical functions, and other division manners may be used in actual implementation, for example, multiple units may be combined into one unit, one unit may be split into multiple units, or some features may be omitted.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present invention, and they should be construed as being included in the following claims and description.
Claims (5)
1. A method for extracting the characteristics of speaker abnormal sounds based on auditory masking and SVD-MRMR is characterized by comprising the following steps:
step S1: using a microphone to collect the sound signal of the loudspeaker, the sampling frequency of the microphone being f_s in Hz; selecting a 20-20000 Hz logarithmic swept-frequency signal as the excitation signal, the duration of the sound signal being T in s; calculating the auditory perception masking spectrum E[k, n] of the loudspeaker response signal x(t) according to the ITU-R BS.1387-1 psychoacoustic model, where k is the number of sub-bands and n is the number of frames obtained by framing the sound signal of duration T with a frame length of 2048 and an overlap rate of 50%;
step S2, constructing the sub-band auditory perception masking spectra E[k, n] of the loudspeaker response signal into a k×n matrix E of rank r, r = min(k, n), then performing SVD on the matrix E to obtain the singular value feature vectors, and letting the feature vectors obtained from all samples form the feature set X_{h×r} of the loudspeaker psychoacoustic energy spectrum, wherein h is the number of loudspeaker samples and r is the feature dimension;
step S3, selecting the optimal feature subset Z from the data set X_{h×r} by the MRMR algorithm to eliminate redundant information.
2. The method for extracting the feature of the abnormal sound of the speaker based on the auditory masking in combination with the SVD-MRMR as claimed in claim 1, wherein the step S1 specifically comprises the following steps:
step S11, framing the response signal x(t) of the loudspeaker; each frame signal x_n[t, n] is:

x_n[t, n] = x(t + (n − 1)·N_F/2), t = 1, 2, ..., N_F;
wherein t is the index of the time-domain sampling points of each frame and N_F is the number of sampling points of each frame;
step S12, applying Hanning-window filtering and the fast Fourier transform to the framed loudspeaker response signal x_n[t, n] to convert the signal from the time domain to the frequency domain, obtaining the spectrum F_f[k_f, n]:
F_f[k_f, n] = fft(h[t]·x_n[t, n]);
wherein k_f is the number of frequency-domain points and h[t] is the Hanning window function;
step S13, simulating the auditory characteristics of the human middle and outer ear by weighting the spectrum F_f[k_f, n] in the frequency domain, obtaining the output F_e[k_f, n] after middle- and outer-ear frequency weighting:

F_e[k_f, n] = W[f]·F_f[k_f, n];

wherein W[f] is the frequency response function of the middle and outer ear;
step S14, dividing each Bark into 4 frequency bands, i.e. a band width Δz = 1/4 Bark, so that the Bark domain is uniformly divided into 109 non-overlapping critical frequency sub-bands over the 20 Hz-18 kHz hearing range, with k = 1, 2, ..., 109; adding the signal energy values within each sub-band to obtain the energy output spectrum P_e[k, n] of one frequency sub-band, wherein k denotes the kth critical frequency sub-band;
wherein the energy distribution U[k, k_f] of each sub-band is determined by f_l[k] and f_u[k], the lower and upper cut-off frequencies of the kth critical frequency sub-band, respectively;
the energy output spectrum P_e[k, n] within one frequency sub-band is calculated from:

P_e[k, n] = max( Σ_{k_f} U[k, k_f]·(F_e[k_f, n])², 10⁻¹² );
step S15, adding internal noise P_noise[k] to the energy output spectrum P_e[k, n] to obtain P_p[k, n]:

P_p[k, n] = P_e[k, n] + P_noise[k];
wherein P_noise[k] = 10^{0.4·0.364·(f_c[k]/1000)^{−0.8}}, and f_c[k] is the center frequency of the kth frequency sub-band;
step S16: applying the frequency spreading function of the Terhardt psychoacoustic model to simulate the frequency-domain masking effect; the energy of each sub-band is spread over the whole Bark-domain space by the spreading function, so that the energy of the ith sub-band can be expressed as the weighted sum of the contributions of all sub-band energies; the spreading function is denoted S_dB(i, k, n) and represents the contribution of the kth sub-band energy to the ith sub-band.
The energy distribution E_s[k, n] of each sub-band after frequency-domain spreading is then calculated, with B_s(i) a normalization factor.
step S17, performing time-domain spreading, using a first-order low-pass filter to simulate the time-domain masking effect:
E_f[k, n] = a·E_f[k, n−1] + (1 − a)·E_s[k, n];
wherein, the parameter a is determined by a time coefficient and is related to the center frequency of each sub-band; thus, the auditory perception masking spectrum E [ k, n ] of each sub-band after time domain masking can be obtained:
E[k, n] = max(E_f[k, n], E_s[k, n]).
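The framing and FFT of steps S11-S12 above can be sketched as follows. This is a hedged illustration, not the patent's reference implementation; the frame length of 2048 and 50% overlap are taken from claim 1, and the 48 kHz test signal is an assumption:

```python
import numpy as np

def frame_and_fft(x, n_fft=2048):
    """Steps S11-S12: 2048-point frames at 50% overlap, Hanning window, FFT."""
    hop = n_fft // 2                        # 50% overlap
    n_frames = 1 + (len(x) - n_fft) // hop
    w = np.hanning(n_fft)                   # Hanning window h[t]
    # columns are frames: F_f[k_f, n]
    F = np.stack([np.fft.rfft(w * x[i * hop : i * hop + n_fft])
                  for i in range(n_frames)], axis=1)
    return F

x = np.random.default_rng(0).standard_normal(48000)  # 1 s toy signal at 48 kHz
F = frame_and_fft(x)
assert F.shape == (2048 // 2 + 1, 1 + (48000 - 2048) // 1024)
```

Squaring |F| and grouping bins by the Bark-band weights U[k, k_f] would then yield the sub-band energy spectrum P_e[k, n] of step S14.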
3. The method for extracting the characteristics of the abnormal sounds of the loudspeaker based on the auditory masking and SVD-MRMR as claimed in claim 2, wherein the relation between the parameter a and the time constant τ is:

a = e^{−N_F/(2·f_s·τ)}, with τ = τ_min + (100/f_c[k])·(τ_100 − τ_min);
wherein τ_100 and τ_min are constants, generally τ_100 = 0.030 and τ_min = 0.008, and f_c[k] is the center frequency of each sub-band.
4. The method for extracting the feature of the abnormal sound of the speaker based on the auditory masking in combination with the SVD-MRMR as claimed in claim 1, wherein the SVD decomposition of the matrix E in step S2 is expressed as:

E = SΣV^T;
wherein S is a k×k matrix, V is an n×n matrix, S and V are unitary matrices, and Σ = diag(σ_1, σ_2, ..., σ_r); the singular values of matrix E are σ_i (i = 1, 2, ..., r), with σ_1 ≥ σ_2 ≥ ... ≥ σ_r ≥ 0; the loudspeaker response signal feature vector is s = (σ_1, σ_2, ..., σ_r); the feature vectors s obtained from all samples form X_{h×r} = {X_1, X_2, ..., X_r}, the feature set of the loudspeaker psychoacoustic energy spectrum, where X_i consists of the singular values σ_i of the auditory perception masking spectrum E of each loudspeaker response signal; the labels corresponding to the loudspeaker response signals in the four states of normal, circle collision, air leakage and small sound are then assigned as 1, 2, 3 and 4 respectively, giving Y_{h×1} = {Y_1, Y_2, ..., Y_h}, where h is the number of loudspeaker samples and r is the feature dimension.
5. The method for extracting the feature of the abnormal sound of the speaker based on the auditory masking in combination with the SVD-MRMR as claimed in claim 1, wherein the step S3 specifically comprises the following steps:
step S31, normalizing the feature set X_{h×r}, then calculating the importance R_x of each feature X_i in X_{h×r} according to the MRMR method, using the difference criterion:

R_x = L_x − N_x;
wherein L_x is the correlation between each feature X_i and the class Y, determined by the mutual information between X_i and its class Y, i.e. L_x = I(X_i, Y);
N_x is the redundancy between each feature X_i and the other features X_l in the set Z, where i ≠ l; N_x is determined by the mean mutual information between each feature and the others:

N_x = (1/|Z|) Σ_{X_l∈Z} I(X_i, X_l);
step S32: sort the features in descending order of their importance R_x, input them into the classifier one by one for evaluation, and select the feature subset with the highest classification accuracy; if the classifier accuracy is highest after the mth feature has been input, the first m features are selected, and from these the MRMR criterion selects the optimal features to form the feature subset Z.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210561080.2A CN115002642A (en) | 2022-05-23 | 2022-05-23 | Feature extraction method for abnormal sound of loudspeaker based on combination of auditory masking and SVD-MRMR |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115002642A true CN115002642A (en) | 2022-09-02 |
Family
ID=83028181
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210561080.2A Pending CN115002642A (en) | 2022-05-23 | 2022-05-23 | Feature extraction method for abnormal sound of loudspeaker based on combination of auditory masking and SVD-MRMR |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115002642A (en) |
Cited By (2)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN117351988A (en) * | 2023-12-06 | 2024-01-05 | 方图智能(深圳)科技集团股份有限公司 | Remote audio information processing method and system based on data analysis
CN117351988B (en) * | 2023-12-06 | 2024-02-13 | 方图智能(深圳)科技集团股份有限公司 | Remote audio information processing method and system based on data analysis
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||