CN115002642A - Feature extraction method for abnormal sound of loudspeaker based on combination of auditory masking and SVD-MRMR - Google Patents
- Publication number
- CN115002642A (application CN202210561080.2A)
- Authority
- CN
- China
- Prior art keywords
- frequency
- feature
- sub-band
- loudspeaker
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R29/00—Monitoring arrangements; Testing arrangements
- H04R29/001—Monitoring arrangements; Testing arrangements for loudspeakers
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
Abstract
The invention relates to the technical field of loudspeaker quality detection, and in particular to a loudspeaker abnormal-sound feature extraction method based on auditory masking combined with SVD-MRMR. First, the collected loudspeaker response signal is analyzed by simulating human listening, and the auditory-perception masking spectrum of the loudspeaker is computed. SVD decomposition is then applied to the auditory-perception masking spectrum to obtain a singular-value feature sequence. Finally, the optimal features in the singular-value feature sequence are selected by the MRMR algorithm, eliminating redundant information and yielding reduced-dimension features. This avoids the problem that too many features raise the dimensionality of the classification model during classification and degrade the model's efficiency, and it is more conducive to unified measurement and judgment.
Description
Technical Field
The invention relates to the technical field of loudspeaker quality detection, in particular to a loudspeaker abnormal sound feature extraction method based on auditory masking and SVD-MRMR.
Background
The loudspeaker and its system are key elements of audio equipment, and loudspeaker quality directly affects the sound quality of the whole reproduction system. In the production process, however, a loudspeaker may sound abnormal because of voice-coil bottoming, diaphragm breakage, excessive impedance and the like. It is therefore of great practical value to detect and classify the abnormal sounds of produced loudspeakers.
In the traditional loudspeaker abnormal-sound detection method, an inspector detects abnormal sounds by listening with the ear. The listening result is affected by the difficulty of the task, the worker's physical and mental state, and the degree of work fatigue, so the results vary subjectively from listener to listener; the detection accuracy and stability for loudspeakers are hard to improve further, hearing damage is easily caused, and it is difficult for enterprises to raise production efficiency and product quality.
Loudspeaker abnormal sounds can also be detected by signal-processing methods, for example by inspecting the time-frequency diagram obtained from a short-time Fourier transform or wavelet transform, or by decomposing the loudspeaker response signal and then extracting time-frequency features, such as the LMD energy entropy (Chen Xiulou, Research on loudspeaker abnormal-sound fault diagnosis [D], Guilin University of Electronic Technology, 2020) and the VMD energy entropy (Zhou Jing, Yan Ting, Loudspeaker abnormal-sound classification [J], Acta Acustica, 2021, 46(02): 263-270). Although such methods can recognize and classify loudspeaker abnormal sounds well, they do not account for the auditory-perception characteristics of the human ear. Other methods extract the psychoacoustic energy mean as a feature to classify loudspeaker abnormal sounds (Guo Qing et al., A loudspeaker abnormal-sound detection algorithm based on psychoacoustics and a support vector machine [J], (Natural Science Edition), 2020, 46(02): 275-281), but the classification results are poor because the feature dimension is too high.
Disclosure of Invention
In order to solve the above problems, the present invention provides a loudspeaker abnormal-sound feature extraction method based on auditory masking combined with SVD-MRMR. The specific technical solution is as follows:
a method for extracting the characteristics of speaker abnormal sounds based on auditory masking and SVD-MRMR comprises the following steps:
Step S1: collect the sound signal of a loudspeaker with a microphone, with the microphone sampling frequency f_s in Hz; select a 20-20000 Hz logarithmic swept-sine signal as the excitation signal, with sound-signal duration T in s; compute the auditory-perception masking spectrum E[k, n] of the loudspeaker response signal x(t) according to the ITU-R BS.1387-1 psychoacoustic model, where k is the sub-band index and n is the frame index obtained by framing the duration-T sound signal with a frame length of 2048 samples and an overlap rate of 50%;
Step S2: arrange the sub-band auditory-perception masking spectra E[k, n] of the loudspeaker response signal into a k×n matrix E of rank r = min(k, n); then perform SVD on the matrix E to obtain the singular-value feature vector, and collect the feature vectors obtained from all samples into a feature set X_{h×r} of the loudspeaker's psychoacoustic energy spectrum, where h is the number of loudspeaker samples and r is the feature dimension;
Step S3: select the optimal feature subset Z from the data set X_{h×r} with the MRMR algorithm to eliminate redundant information.
Preferably, the step S1 specifically includes the following steps:
Step S11: perform framing processing on the loudspeaker response signal x(t); with 50% overlap, each frame signal x_n[t, n] is:
x_n[t, n] = x(t + (n - 1)·N_F/2), 0 ≤ t < N_F;
where t is the time-domain sample index within each frame and N_F is the number of samples per frame;
Step S12: apply Hanning-window filtering to the framed loudspeaker response signal x_n[t, n] and use the fast Fourier transform to convert the signal from the time domain to the frequency domain, obtaining the spectrum F_f[k_f, n]:
F_f[k_f, n] = fft(h[t]·x_n[t, n]);
where k_f is the frequency-domain bin index and h[t] is the Hanning window function;
Step S13: simulate the auditory characteristics of the human outer and middle ear by weighting the spectrum F_f[k_f, n] in the frequency domain, obtaining the ear-weighted output F_e[k_f, n]:
where W[f] is the frequency response function of the outer and middle ear;
Step S14: divide the Bark scale uniformly into quarter-Bark bands, i.e. bandwidth Δz = 1/4, so that the Bark domain over the 20 Hz-18 kHz hearing range is divided into 109 non-overlapping critical-frequency sub-bands, k = 1, 2, ..., 109. Sum the signal energy values within each sub-band to obtain the energy output spectrum P_e[k, n] of each frequency sub-band, where k denotes the k-th group of critical-frequency sub-bands;
wherein the energy distribution of each sub-band is calculated by:
where f_u[k] and f_l[k] are the upper and lower frequency limits of the k-th group of critical-frequency sub-bands;
the energy output spectrum P_e[k, n] within a frequency sub-band is calculated by:
P_e[k, n] = max(Σ_{k_f} U[k, k_f]·(F_e[k_f, n])², 10^(-12));
Step S15: add an internal-noise term P_noise[k] to the energy output spectrum P_e[k, n] to obtain P_p[k, n]:
P_p[k, n] = P_e[k, n] + P_noise[k];
where f_c is the centre frequency of the frequency sub-band;
Step S16: simulate the frequency-domain masking effect with the frequency spreading function of the Terhardt psychoacoustic model. The energy of each sub-band is spread over the whole Bark-domain space by the spreading function, so the energy of the i-th sub-band is the weighted sum of the contributions of all sub-band energies to that sub-band. The spreading function is denoted S_dB(i, k, n) and represents the contribution of the k-th sub-band's energy to the i-th sub-band, as follows:
calculate the energy distribution E_s[k, n] of each sub-band after frequency-domain spreading:
where B_s(i) is a normalization factor;
Step S17: perform time-domain spreading, simulating the time-domain masking effect with a first-order low-pass filter:
E_f[k, n] = a·E_f[k, n-1] + (1 - a)·E_s[k, n];
where the parameter a is determined by a time constant and depends on the centre frequency of each sub-band; the auditory-perception masking spectrum E[k, n] of each sub-band after time-domain masking is then obtained as:
E[k, n] = max(E_f[k, n], E_s[k, n]).
Preferably, the relation between the parameter a and the time constant τ is:
a = e^(-Δt/τ[k]), τ[k] = τ_min + (100 / f_c[k])·(τ_100 - τ_min);
where Δt is the frame hop duration, τ_100 and τ_min are constants, generally τ_100 = 0.03 s and τ_min = 0.008 s, and f_c[k] is the centre frequency of each sub-band.
Preferably, the expression of performing SVD decomposition on the matrix E in step S2 is as follows:
wherein S is a matrix of k, V is a matrix of n, S and V are unitary matrices, and Σ is diag (σ) 1 ,σ 2 ,...,σ n ) (ii) a Singular values of matrix E are σ i I 1,2, r and σ 1 ≥σ 2 ≥...≥σ r Not less than 0; the loudspeaker response signal feature vector s ═ σ ═ s 1 ,σ 2 ,...,σ r ) (ii) a Forming X by the characteristic vector s obtained by each sample h×r ={X 1 ,X 2 ,...X r Is a set of features of a psychoacoustic energy map of the loudspeaker, where X i ( i 1, 2.. r.) is the singular value σ of the auditory perceptual masking spectrum E of the response signal from each loudspeaker i ( i 1, 2.. r.) and then assigning the labels corresponding to the loudspeaker response signals of four different states of normal, circle collision, air leakage and small sound as 1,2, 3 and 4 respectively, assuming that Y is h×1 ={Y 1 ,Y 2 ,...Y h And h is the number of speaker samples, and r is the feature dimension.
Preferably, the step S3 specifically includes the following steps:
Step S31: normalize the feature set X_{h×r} and compute the importance R_x of each feature X_i in X_{h×r} according to the MRMR method:
where L_x is the correlation between each feature X_i and the class label Y, determined by their mutual information, i.e. L_x = I(X_i, Y);
and N_x is the redundancy between feature X_i and the other features X_l in the set Z, where i ≠ l, determined by the mutual information between each pair of features;
Step S32: sort the features in descending order of importance R_x and input them one by one into the classifier for evaluation, selecting the feature subset with the highest classification accuracy; if inputting the first m features gives the classifier its highest accuracy, the first m features are selected, and the MRMR criterion thereby yields the optimal feature subset Z.
The invention has the following beneficial effects. With conventional loudspeaker abnormal-sound detection, it is difficult to judge the type of abnormal sound by a stable and consistent standard. The invention analyzes the loudspeaker response signal by simulating human listening with the ITU-R BS.1387-1 psychoacoustic model, eliminating the influence of subjective listening impressions. It applies singular value decomposition (SVD) to the loudspeaker response signal after psychoacoustic-model processing, taking the psychoacoustic-energy singular values as the candidate feature set, which facilitates extracting fault information. Finally, it selects the optimal features from the feature set with the max-relevance min-redundancy (MRMR) algorithm and eliminates redundant information, which is more conducive to unified measurement and judgment.
Drawings
In order to more clearly illustrate the detailed description of the invention or the technical solutions in the prior art, the drawings that are needed in the detailed description of the invention or the prior art will be briefly described below. Throughout the drawings, like elements or portions are generally identified by like reference numerals. In the drawings, elements or portions are not necessarily drawn to scale.
FIG. 1 is a flow chart of a detection implementation of the present invention;
FIG. 2 is a time-domain plot of the response signal of a normal loudspeaker;
FIG. 3 is a time-domain plot of the response signal of a voice-coil-rub loudspeaker;
FIG. 4 is a time-domain plot of the response signal of an air-leakage loudspeaker;
FIG. 5 is a time-domain plot of the response signal of a low-sound loudspeaker;
FIG. 6 is the psychoacoustic masking energy spectrum of the response signal of a normal loudspeaker;
FIG. 7 is the psychoacoustic masking energy spectrum of the response signal of a voice-coil-rub loudspeaker;
FIG. 8 is the psychoacoustic masking energy spectrum of the response signal of an air-leakage loudspeaker;
FIG. 9 is the psychoacoustic masking energy spectrum of the response signal of a low-sound loudspeaker;
FIG. 10 shows the psychoacoustic-energy singular values of the loudspeaker signals in the four states after SVD;
FIG. 11 shows the test-set classification accuracy curves for different numbers of features;
FIG. 12 shows a visualization of the features extracted by the method of the invention;
FIG. 13 shows a visualization of the features extracted by the LMD energy-entropy method;
FIG. 14 shows a visualization of the features extracted by the VMD energy-entropy method;
FIG. 15 shows a visualization of the features extracted by the psychoacoustic-energy-mean method.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As shown in fig. 1, the embodiment of the invention provides a method for extracting characteristics of speaker abnormal sounds based on auditory masking in combination with SVD-MRMR, comprising the following steps:
Step S1: collect the sound signal of a loudspeaker with a microphone, with the microphone sampling frequency f_s in Hz; select a 20-20000 Hz logarithmic swept-sine signal as the excitation signal, with sound-signal duration T in s; compute the auditory-perception masking spectrum E[k, n] of the loudspeaker response signal x(t) according to the ITU-R BS.1387-1 psychoacoustic model. Here k is the sub-band index: the Bark scale is divided uniformly into quarter-Bark bands (bandwidth Δz = 1/4), so that the Bark domain over the 20 Hz-18 kHz hearing range contains 109 non-overlapping critical-frequency sub-bands, k = 1, 2, ..., 109; and n is the frame index obtained by framing the duration-T sound signal with a frame length of 2048 samples and an overlap rate of 50%.
the test object of this experiment was a 4cm diameter loudspeaker unit with a power of 0.5W and an impedance of 8 Ω. In the embodiment of the invention, a logarithmic frequency sweep signal of 20-20000 Hz is selected as an excitation signal, and a microphone is Bruel, Denmark&The model 4966-H-041 microphone of company Type, the main performance parameters of the microphone of model 4966 are as follows: the dynamic range is 14.6-144 dB, the frequency range is 5 Hz-20000 Hz, the inherent noise is 144.6dB A, and the lower limit frequency is-3 dB: 3Hz, a sensitivity of 50mV/Pa and a sampling rate of 65536 Hz. The experimental data acquisition system adopts Denmark B&And the K acoustic and vibration acquisition and analysis system is used as an acquisition system and stores the acquired loudspeaker response signals.
Loudspeakers in four states (normal, low sound, voice-coil rub, and air leakage) were tested, with 50 loudspeakers per state, for 200 loudspeakers under test in total; each loudspeaker was tested 5 times, giving 1000 samples in all. Time-domain plots of the various loudspeaker response signals are shown in FIGS. 2-5: FIG. 2 shows the response signal of a normal loudspeaker, FIG. 3 that of a voice-coil-rub loudspeaker, FIG. 4 that of an air-leakage loudspeaker, and FIG. 5 that of a low-sound loudspeaker.
The auditory-perception masking spectrum E[k, n] of each loudspeaker response signal is calculated from the time-domain signals in FIGS. 2-5 according to the ITU-R BS.1387-1 psychoacoustic model, as follows.
the method specifically comprises the following steps:
Step S11: perform framing processing on the loudspeaker response signal x(t); with 50% overlap, each frame signal x_n[t, n] is:
x_n[t, n] = x(t + (n - 1)·N_F/2), 0 ≤ t < N_F;
where t is the time-domain sample index within each frame and N_F is the number of samples per frame;
Step S12: apply Hanning-window filtering to the framed loudspeaker response signal x_n[t, n] and use the Fast Fourier Transform (FFT) to convert the signal from the time domain to the frequency domain, obtaining the spectrum F_f[k_f, n]:
F_f[k_f, n] = fft(h[t]·x_n[t, n]);
where k_f is the frequency-domain bin index and h[t] is the Hanning window function;
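As an illustrative sketch (not part of the patent) of steps S11-S12, the framing and windowed FFT can be written in a few lines of Python. The 2048-sample frame length, 50% overlap and 65536 Hz sampling rate follow the values stated in this embodiment; the 1 kHz test tone is only a stand-in for a real swept-sine response:

```python
import numpy as np

N_F = 2048           # samples per frame (frame length from step S1)
HOP = N_F // 2       # 50% overlap

def frame_and_fft(x):
    """Steps S11-S12: split x into overlapping frames, apply a Hanning
    window h[t], and return the per-frame spectrum F_f[k_f, n]."""
    n_frames = 1 + (len(x) - N_F) // HOP
    h = np.hanning(N_F)
    frames = np.stack([x[i * HOP:i * HOP + N_F] for i in range(n_frames)])
    return np.fft.rfft(frames * h, axis=1).T   # shape (k_f, n)

fs = 65536                                          # sampling rate of the experiments
x = np.sin(2 * np.pi * 1000 * np.arange(fs) / fs)   # 1 s, 1 kHz stand-in tone
F_f = frame_and_fft(x)                              # shape (1025, 63) for this input
```

For a 1 s signal at 65536 Hz this yields 63 frames of 1025 one-sided frequency bins each, with the spectral peak near bin 31 (1000 Hz / 32 Hz-per-bin).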
Step S13: simulate the auditory characteristics of the human outer and middle ear by weighting the spectrum F_f[k_f, n] in the frequency domain, obtaining the ear-weighted output F_e[k_f, n]:
where W[f] is the frequency response function of the outer and middle ear;
Step S14: Bark-scale critical sub-band mapping. The scale transformation is a nonlinear processing of sound frequency that simulates the Bark-like resonance behaviour of the basilar membrane in the human cochlea, mapping the linear frequency domain to the psychoacoustic frequency domain (the Bark domain). After the signal is transformed to the Bark domain, the response signal undergoes a high-resolution nonlinear stretching with a pronounced magnifying effect, so the high- and low-frequency components of the signal are depicted more finely and intuitively.
Divide the Bark scale uniformly into quarter-Bark bands, i.e. bandwidth Δz = 1/4, so that the Bark domain over the 20 Hz-18 kHz hearing range is divided into 109 non-overlapping critical-frequency sub-bands; sum the signal energy values within each sub-band to obtain the energy output spectrum P_e[k, n] of each frequency sub-band, where k denotes the k-th group of critical-frequency sub-bands;
wherein the energy distribution of each sub-band is calculated by:
where f_u[k] and f_l[k] are the upper and lower frequency limits of the k-th group of critical-frequency sub-bands;
the energy output spectrum P_e[k, n] within a frequency sub-band is calculated by:
P_e[k, n] = max(Σ_{k_f} U[k, k_f]·(F_e[k_f, n])², 10^(-12));
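The text does not reproduce the band-edge formula or the distribution U[k, k_f]. The sketch below assumes the ITU-R BS.1387-1 (PEAQ) Bark mapping z = 7·asinh(f/650) with Δz = 1/4 starting at 80 Hz, which yields exactly 109 sub-bands and reproduces the centre frequencies 2026.266 Hz and 3519.344 Hz for bands 49 and 64 quoted in the analysis below; U[k, k_f] is taken, also as an assumption, to be the fraction of FFT bin k_f lying inside sub-band k:

```python
import numpy as np

DZ = 0.25  # quarter-Bark bandwidth, as in step S14

def bark(f):
    return 7.0 * np.arcsinh(f / 650.0)

def bark_inv(z):
    return 650.0 * np.sinh(z / 7.0)

z0 = bark(80.0)
n_bands = int(round((bark(18000.0) - z0) / DZ))        # -> 109 sub-bands
edges = bark_inv(z0 + DZ * np.arange(n_bands + 1))
f_l, f_u = edges[:-1], np.minimum(edges[1:], 18000.0)  # band limits f_l[k], f_u[k]
f_c = bark_inv(z0 + DZ * (np.arange(n_bands) + 0.5))   # centre frequencies f_c[k]

def band_energy(F_e, fs, N_F):
    """P_e[k, n] = max(sum_kf U[k, k_f] * |F_e[k_f, n]|^2, 1e-12), with
    U[k, k_f] the fraction of FFT bin k_f lying inside sub-band k."""
    kf = np.arange(F_e.shape[0]) * fs / N_F            # bin centre frequencies
    half = 0.5 * fs / N_F                              # half an FFT bin width
    U = np.clip((np.minimum(f_u[:, None], kf + half)
                 - np.maximum(f_l[:, None], kf - half)) / (2 * half), 0.0, 1.0)
    return np.maximum(U @ (np.abs(F_e) ** 2), 1e-12)
```

The close match of the derived centre frequencies to the values cited later supports the assumed mapping, although the patent itself states a 20 Hz lower limit.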
Step S15: because factors such as blood flow and heartbeat affect human hearing to some extent, an internal-noise term P_noise[k] is added to the energy output spectrum P_e[k, n] to obtain P_p[k, n]:
P_p[k, n] = P_e[k, n] + P_noise[k];
where f_c is the centre frequency of the frequency sub-band.
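The internal-noise term P_noise[k] is not spelled out in the text; in the ITU-R BS.1387-1 model it depends only on the band centre frequency, P_noise[k] = 10^(0.4·0.364·(f_c[k]/1000)^(-0.8)). A sketch under that assumption:

```python
import numpy as np

def internal_noise(f_c):
    """Internal-noise term as defined in ITU-R BS.1387-1: a function of the
    band centre frequencies f_c[k] only (in Hz)."""
    return 10.0 ** (0.4 * 0.364 * (np.asarray(f_c) / 1000.0) ** -0.8)

def add_internal_noise(P_e, f_c):
    """Step S15: P_p[k, n] = P_e[k, n] + P_noise[k]."""
    return P_e + internal_noise(f_c)[:, None]
```

Note that the noise term grows toward low frequencies, mirroring the reduced sensitivity of human hearing there.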
Step S16: simulate the frequency-domain masking effect with the frequency spreading function of the Terhardt psychoacoustic model. The energy of each sub-band is spread over the whole Bark-domain space by the spreading function, so the energy of the i-th sub-band is the weighted sum of the contributions of all sub-band energies to that sub-band. The spreading function is denoted S_dB(i, k, n) and represents the contribution of the k-th sub-band's energy to the i-th sub-band, as follows:
calculate the energy distribution E_s[k, n] of each sub-band after frequency-domain spreading:
where B_s(i) is a normalization factor;
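The spreading function S_dB(i, k, n) itself is not reproduced in the text, and the full BS.1387-1 function is level-dependent. As a simplified, level-independent stand-in, the sketch below uses fixed slopes of +27 dB/Bark below the masker and -24 dB/Bark above it, then normalizes each target band i by B_s(i), the sum of its spreading weights, as described in step S16:

```python
import numpy as np

DZ = 0.25  # Bark per sub-band

def spread(P_p):
    """Simplified frequency-domain spreading: spread each band's energy over
    the Bark axis with fixed slopes, then normalize each target band i by
    B_s(i).  A level-independent approximation, not the full PEAQ function."""
    n_bands = P_p.shape[0]
    dz = DZ * (np.arange(n_bands)[:, None] - np.arange(n_bands)[None, :])  # z_i - z_k
    s_db = np.where(dz >= 0.0, -24.0 * dz, 27.0 * dz)   # attenuation in dB
    w = 10.0 ** (s_db / 10.0)
    E_s = w @ P_p                                       # weighted sum of contributions
    B_s = w.sum(axis=1, keepdims=True)                  # normalization factor B_s(i)
    return E_s / B_s
```

A single excited band thus leaks energy into its neighbours, with the shallower upper slope reproducing the upward spread of masking.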
Step S17: perform time-domain spreading, simulating the time-domain masking effect with a first-order low-pass filter:
E_f[k, n] = a·E_f[k, n-1] + (1 - a)·E_s[k, n];
where the parameter a is determined by a time constant and depends on the centre frequency of each sub-band; after time-domain masking, the auditory-perception masking spectrum of each sub-band is:
E[k, n] = max(E_f[k, n], E_s[k, n]).
The relation between the parameter a and the time constant τ is:
a = e^(-Δt/τ[k]), τ[k] = τ_min + (100 / f_c[k])·(τ_100 - τ_min);
where Δt is the frame hop duration, τ_100 and τ_min are constants, generally τ_100 = 0.03 s and τ_min = 0.008 s, and f_c[k] is the centre frequency of each sub-band.
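Step S17 can be sketched as follows, assuming the BS.1387-1 form of the time constant, τ[k] = τ_min + (100/f_c[k])·(τ_100 - τ_min) with a[k] = exp(-Δt/τ[k]) and Δt the frame hop in seconds (1024/f_s for 2048-sample frames at 50% overlap):

```python
import numpy as np

TAU_100, TAU_MIN = 0.03, 0.008   # time constants from the text, in seconds

def time_spread(E_s, f_c, fs, hop=1024):
    """Step S17: first-order low-pass smoothing across frames, followed by
    E[k, n] = max(E_f[k, n], E_s[k, n])."""
    tau = TAU_MIN + (100.0 / np.asarray(f_c)) * (TAU_100 - TAU_MIN)
    a = np.exp(-hop / (fs * tau))                    # per-band smoothing factor
    E_f = np.zeros_like(E_s)
    for n in range(E_s.shape[1]):
        prev = E_f[:, n - 1] if n > 0 else np.zeros(E_s.shape[0])
        E_f[:, n] = a * prev + (1.0 - a) * E_s[:, n]
    return np.maximum(E_f, E_s)
```

A constant input passes through unchanged, while an impulse leaves a decaying trail in later frames, which is the forward (post-)masking effect being modelled.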
The resulting psychoacoustic masking energy spectra are shown in FIGS. 6-9: FIG. 6 for a normal loudspeaker, FIG. 7 for a voice-coil-rub loudspeaker, FIG. 8 for an air-leakage loudspeaker, and FIG. 9 for a low-sound loudspeaker. FIGS. 6-9 show that, for every fault type, the largest share of the response-signal energy lies in critical-frequency sub-bands 49-64, whose centre frequencies span 2026.266-3519.344 Hz; this agrees with the human ear being most sensitive to sounds of about 1000-3000 Hz. Since the energy amplitudes of the different fault types differ within this interval, the auditory-perception masking spectrum can emulate a keen human auditory-perception system: it suppresses the low- and high-frequency parts of the collected loudspeaker response signal and enhances the mid-frequency part, so the analysis result agrees better with human auditory perception.
Step S2: arrange the sub-band auditory-perception masking spectra E[k, n] of the loudspeaker response signal into a k×n matrix E of rank r = min(k, n), then perform SVD decomposition on E to obtain the singular-value feature sequence. The SVD decomposition of E is expressed as:
E = S·Σ·V^T;
where S is a k×k unitary matrix, V is an n×n unitary matrix, and Σ = diag(σ_1, σ_2, ..., σ_r); the singular values of E are σ_i, i = 1, 2, ..., r, with σ_1 ≥ σ_2 ≥ ... ≥ σ_r ≥ 0. These singular values carry the essential information of the auditory-perception masking spectrum E[k, n], so they are used to analyze the abnormal-sound type for classification; the loudspeaker response-signal feature vector is s = (σ_1, σ_2, ..., σ_r). FIG. 10 shows the psychoacoustic-energy singular values of the loudspeaker signals in the different states; as can be seen, the singular values are arranged in descending order for loudspeakers in every state, so the ordering of the features is preserved regardless of the loudspeaker's state, which shows the stability advantage of SVD.
The feature vectors s obtained from all samples form X_{h×r} = {X_1, X_2, ..., X_r}, the feature set of the loudspeaker's psychoacoustic energy spectrum, where X_i (i = 1, 2, ..., r) collects the i-th singular value σ_i of the auditory-perception masking spectrum E of each loudspeaker response signal. The labels corresponding to the loudspeaker response signals of the four states (normal, voice-coil rub, air leakage and low sound) are assigned as 1, 2, 3 and 4 respectively, giving Y_{h×1} = {Y_1, Y_2, ..., Y_h}, where h is the number of loudspeaker samples and r is the feature dimension.
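Step S2 reduces to a single SVD call per sample. In the sketch below a random 109×63 matrix stands in for a real auditory-perception masking spectrum; the singular values come out sorted in descending order by construction, which is the stability property noted for FIG. 10:

```python
import numpy as np

rng = np.random.default_rng(0)
E = rng.random((109, 63))               # stand-in for a masking spectrum E[k, n]
s = np.linalg.svd(E, compute_uv=False)  # feature vector s = (sigma_1, ..., sigma_r)
r = s.size                              # r = min(109, 63) = 63 features per sample
```

Stacking the vectors s from all h samples row-wise then gives the feature set X_{h×r} described above.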
Step S3: select the optimal feature subset Z from the data set X_{h×r} = {X_1, X_2, ..., X_r} with the MRMR algorithm, eliminating redundant information. The method specifically comprises the following steps:
step S31, set X of characteristics h×r Normalization, feature set X is calculated according to the MRMR method h×r Each feature X in i R of importance x The formula is as follows:
wherein L_x is the correlation between each feature X_i and the class Y, determined by the mutual information between X_i and its class Y:
L_x = I(X_i, Y);
N_x is the redundancy between each feature X_i and the other features X_l in the set Z, where i ≠ l; N_x is determined by the mean mutual information between each feature and the others:

N_x = (1/|Z|) Σ_{X_l∈Z} I(X_i, X_l);
Assume two continuous random variables x and y with probability density functions p(x) and p(y) and joint probability density function p(x, y); the mutual information between x and y is:

I(x, y) = ∬ p(x, y) log( p(x, y) / (p(x)p(y)) ) dx dy
step S32: sort the features in descending order of their importance R_x, input them into the classifier one by one for evaluation, and select the feature subset with the highest classification accuracy; if the classifier accuracy is highest after the mth feature has been input, the first m features are selected, and from these the MRMR criterion selects the optimal features to form the feature subset Z.
Specifically: the feature set is normalized, R_x of each feature is calculated according to the MRMR method, the features are sorted in descending order of importance and then input into the classifier one by one, and the resulting change in classification accuracy is shown in fig. 11. As can be seen from fig. 11, the classification accuracy gradually increases with the number of feature values, but after reaching its maximum it essentially plateaus and may even decrease. This shows that, beyond the maximum, adding further features contributes little to accuracy and can even harm the classification effect, which illustrates the importance and necessity of feature selection in feature identification and classification. As fig. 11 shows, the accuracy is highest when the number of features is 10, giving the optimal classification effect, so the first 10 features are selected as the optimal feature subset.
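Steps S31-S32 can be sketched as a greedy MRMR ranking with the difference criterion R_x = L_x − N_x. This is a hedged illustration, not the patent's implementation: mutual information is estimated here by binning the normalized features, and the bin count is an assumed parameter.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def mrmr_rank(X, y, n_bins=8):
    """Rank features of X (h samples x r features) by greedy MRMR."""
    h, r = X.shape
    # min-max normalize, then discretize each feature for MI estimation
    Xn = (X - X.min(axis=0)) / (np.ptp(X, axis=0) + 1e-12)
    Xd = np.floor(Xn * (n_bins - 1)).astype(int)
    relevance = np.array([mutual_info_score(Xd[:, i], y) for i in range(r)])
    selected = [int(np.argmax(relevance))]        # start with most relevant
    remaining = set(range(r)) - {selected[0]}
    while remaining:
        scores = {}
        for i in remaining:
            redundancy = np.mean([mutual_info_score(Xd[:, i], Xd[:, l])
                                  for l in selected])
            scores[i] = relevance[i] - redundancy  # R_x = L_x - N_x
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected  # feature indices in MRMR order
```

Feeding the top-m ranked features into the classifier and keeping the m with the highest accuracy reproduces the selection of the subset Z described above.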
To illustrate the algorithm more intuitively, it is then analyzed visually using the t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm. t-SNE is a technique that integrates dimensionality reduction and visualization: it converts the high-dimensional Euclidean distances between data points into conditional probabilities representing similarity, so that after the high-dimensional data are mapped to a low-dimensional space, the neighborhood relationships between points in the high-dimensional space are preserved in the low-dimensional space. After learning converges, t-SNE can project the data set into a two-dimensional or three-dimensional space. The feature subset Z selected by the MRMR algorithm is visualized by t-SNE, and the result is shown in fig. 12.
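A minimal sketch of this visualization step using scikit-learn follows; the feature subset, label values, and perplexity are illustrative assumptions, not the patent's data:

```python
import numpy as np
from sklearn.manifold import TSNE

# toy stand-in for the selected feature subset Z: 120 samples x 10 features
rng = np.random.default_rng(0)
Z = rng.random((120, 10))
labels = rng.integers(1, 5, 120)  # 1=normal, 2=circle collision, 3=leak, 4=small sound

# embed into two dimensions for plotting (as in fig. 12)
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(Z)
assert emb.shape == (120, 2)
```

A scatter plot of `emb` colored by `labels` then gives a figure of the same kind as fig. 12.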
In order to better verify the accuracy of the proposed feature extraction method for loudspeaker abnormal sound classification, features are also extracted by LMD energy entropy (Chen Xiunong; Study on loudspeaker abnormal sound fault diagnosis based on LMD and LSSVM [D]. Guilin University of Electronic Technology, 2020), VMD energy entropy (Zhou Jinglei, Yang Ting; Loudspeaker abnormal sound classification using variational mode decomposition and energy entropy [J]. Acta Acustica, 2021, 46(02): 263-270) and the psychoacoustic energy mean (Guo et al.; Loudspeaker abnormal sound detection algorithm based on psychoacoustics and support vector machine [J]. Journal of Donghua University (Natural Science Edition), 2020, 46(02): 275-281); each data set is then normalized and visualized by t-SNE, with the results shown in figs. 13-15: fig. 13 shows the visualization of the features extracted by the LMD energy entropy method, fig. 14 that of the VMD energy entropy method, and fig. 15 that of the psychoacoustic energy mean.
As can be seen from the analysis of figs. 12 to 15, the loudspeaker abnormal sound feature extraction method of the present invention achieves a high recognition effect on the normal, circle collision, air leakage and small sound states; in particular, the feature distributions of normal, circle collision and air leakage are completely separated and each gathered in its corresponding region. Among the features extracted by the LMD energy entropy method, only small sound is completely separated from the normal, circle collision and air leakage features, whose distributions show a tendency to separate gradually but still cluster together. Among the features extracted by the VMD energy entropy method, the circle collision and small sound distributions tend to separate gradually from the other features but are not tightly clustered and have large dispersion; the extraction of faults such as air leakage is particularly poor, the air leakage features being mixed in the figure with the normal, circle collision and small sound distributions. Among the features extracted from the psychoacoustic energy mean, only circle collision is completely separated from the normal, air leakage and small sound distributions, which tend to separate gradually but remain mixed together. Therefore, the feature extraction method provided by the present invention is superior to existing loudspeaker abnormal sound feature extraction methods, in particular for the classification of normal loudspeakers.
In the present method, the loudspeaker response signal is first analyzed by simulating human listening to the collected sound, and the auditory perception masking spectrum of the loudspeaker is calculated; SVD decomposition is then performed on the auditory perception masking spectrum to obtain the singular value feature sequence; finally, the optimal features in the singular value feature sequence are selected by the MRMR algorithm, eliminating redundant information and yielding dimension-reduced features. This avoids the problem that too many features make the dimensionality of the classification model too high and reduce its efficiency, and is more favorable to unified measurement and judgment.
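The pipeline summarized above can be sketched end to end as follows. This is a minimal sketch with the masking-spectrum stage stubbed out (a real implementation would follow the ITU-R BS.1387-1 model of steps S11-S17); all function names and shapes are assumptions:

```python
import numpy as np

def masking_spectrum(x, fs):
    """Placeholder for step S1: framing, FFT, ear weighting, Bark grouping,
    internal noise, and frequency/time-domain spreading (ITU-R BS.1387-1)."""
    n_frames = max(1, 1 + (len(x) - 2048) // 1024)  # 2048-pt frames, 50% overlap
    return np.random.default_rng(0).random((109, n_frames))  # 109 sub-bands

def sample_features(x, fs):
    E = masking_spectrum(x, fs)                 # step S1: E[k, n]
    return np.linalg.svd(E, compute_uv=False)   # step S2: singular values

fs = 48000
x = np.random.default_rng(1).standard_normal(fs)  # 1 s toy response signal
s = sample_features(x, fs)
# step S3 would then rank the stacked feature set of all samples with
# MRMR and keep the top-m singular values as the subset Z
assert s.ndim == 1 and np.all(np.diff(s) <= 0)
```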
Those of ordinary skill in the art will appreciate that the elements of the examples described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the components of the examples have been described above generally in terms of their functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present application, it should be understood that the division of the unit is only one division of logical functions, and other division manners may be used in actual implementation, for example, multiple units may be combined into one unit, one unit may be split into multiple units, or some features may be omitted.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present invention, and they should be construed as being included in the following claims and description.
Claims (5)
1. A method for extracting the characteristics of speaker abnormal sounds based on auditory masking and SVD-MRMR is characterized by comprising the following steps:
step S1: using a microphone to collect the sound signal of the loudspeaker, the sampling frequency of the microphone being f_s in Hz; selecting a 20-20000 Hz logarithmic swept-frequency signal as the excitation signal, the duration of the sound signal being T in s; calculating the auditory perception masking spectrum E[k, n] of the loudspeaker response signal x(t) according to the ITU-R BS.1387-1 psychoacoustic model, where k is the number of sub-bands and n is the number of frames obtained by framing the sound signal of duration T with a frame length of 2048 and an overlap rate of 50%;
step S2, constructing the sub-band auditory perception masking spectra E[k, n] of the loudspeaker response signal into a k×n matrix E of rank r, r = min(k, n), then performing SVD on the matrix E to obtain the singular value feature vectors, and letting the feature vectors obtained from all samples form the feature set X_{h×r} of the loudspeaker psychoacoustic energy spectrum, wherein h is the number of loudspeaker samples and r is the feature dimension;
step S3, selecting the optimal feature subset Z from the data set X_{h×r} by the MRMR algorithm to eliminate redundant information.
2. The method for extracting the feature of the abnormal sound of the speaker based on the auditory masking in combination with the SVD-MRMR as claimed in claim 1, wherein the step S1 specifically comprises the following steps:
step S11, framing the response signal x(t) of the loudspeaker; each frame signal x_n[t, n] is:

x_n[t, n] = x(t + (n − 1)·N_F/2), t = 1, 2, ..., N_F;
wherein t is the index of the time-domain sampling points of each frame and N_F is the number of sampling points of each frame;
step S12, applying Hanning-window filtering and the fast Fourier transform to the framed loudspeaker response signal x_n[t, n] to convert the signal from the time domain to the frequency domain, obtaining the spectrum F_f[k_f, n]:
F_f[k_f, n] = fft(h[t]·x_n[t, n]);
wherein k_f is the number of frequency-domain points and h[t] is the Hanning window function;
step S13, simulating the auditory characteristics of the human middle and outer ear by weighting the spectrum F_f[k_f, n] in the frequency domain, obtaining the output F_e[k_f, n] after middle- and outer-ear frequency weighting:

F_e[k_f, n] = W[f]·F_f[k_f, n];

wherein W[f] is the frequency response function of the middle and outer ear;
step S14, dividing each Bark into 4 frequency bands, i.e. a band width Δz = 1/4 Bark, so that the Bark domain is uniformly divided into 109 non-overlapping critical frequency sub-bands over the 20 Hz-18 kHz hearing range, with k = 1, 2, ..., 109; adding the signal energy values within each sub-band to obtain the energy output spectrum P_e[k, n] of one frequency sub-band, wherein k denotes the kth critical frequency sub-band;
wherein the energy distribution U[k, k_f] of each sub-band is determined by f_l[k] and f_u[k], the lower and upper cut-off frequencies of the kth critical frequency sub-band, respectively;
the energy output spectrum P_e[k, n] within one frequency sub-band is calculated from:

P_e[k, n] = max( Σ_{k_f} U[k, k_f]·(F_e[k_f, n])², 10⁻¹² );
step S15, adding internal noise P_noise[k] to the energy output spectrum P_e[k, n] to obtain P_p[k, n]:

P_p[k, n] = P_e[k, n] + P_noise[k];
wherein P_noise[k] = 10^{0.4·0.364·(f_c[k]/1000)^{−0.8}}, and f_c[k] is the center frequency of the kth frequency sub-band;
step S16: applying the frequency spreading function of the Terhardt psychoacoustic model to simulate the frequency-domain masking effect; the energy of each sub-band is spread over the whole Bark-domain space by the spreading function, so that the energy of the ith sub-band can be expressed as the weighted sum of the contributions of all sub-band energies; the spreading function is denoted S_dB(i, k, n) and represents the contribution of the kth sub-band energy to the ith sub-band.
The energy distribution E_s[k, n] of each sub-band after frequency-domain spreading is then calculated, with B_s(i) a normalization factor.
step S17, performing time-domain spreading, using a first-order low-pass filter to simulate the time-domain masking effect:
E_f[k, n] = a·E_f[k, n−1] + (1 − a)·E_s[k, n];
wherein, the parameter a is determined by a time coefficient and is related to the center frequency of each sub-band; thus, the auditory perception masking spectrum E [ k, n ] of each sub-band after time domain masking can be obtained:
E[k, n] = max(E_f[k, n], E_s[k, n]).
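The framing and FFT of steps S11-S12 above can be sketched as follows. This is a hedged illustration, not the patent's reference implementation; the frame length of 2048 and 50% overlap are taken from claim 1, and the 48 kHz test signal is an assumption:

```python
import numpy as np

def frame_and_fft(x, n_fft=2048):
    """Steps S11-S12: 2048-point frames at 50% overlap, Hanning window, FFT."""
    hop = n_fft // 2                        # 50% overlap
    n_frames = 1 + (len(x) - n_fft) // hop
    w = np.hanning(n_fft)                   # Hanning window h[t]
    # columns are frames: F_f[k_f, n]
    F = np.stack([np.fft.rfft(w * x[i * hop : i * hop + n_fft])
                  for i in range(n_frames)], axis=1)
    return F

x = np.random.default_rng(0).standard_normal(48000)  # 1 s toy signal at 48 kHz
F = frame_and_fft(x)
assert F.shape == (2048 // 2 + 1, 1 + (48000 - 2048) // 1024)
```

Squaring |F| and grouping bins by the Bark-band weights U[k, k_f] would then yield the sub-band energy spectrum P_e[k, n] of step S14.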
3. The method for extracting the characteristics of the abnormal sounds of the loudspeaker based on the auditory masking and SVD-MRMR as claimed in claim 2, wherein the relation between the parameter a and the time constant τ is:

a = e^{−N_F/(2·f_s·τ)}, with τ = τ_min + (100/f_c[k])·(τ_100 − τ_min);
wherein τ_100 and τ_min are constants, generally τ_100 = 0.030 and τ_min = 0.008, and f_c[k] is the center frequency of each sub-band.
4. The method for extracting the feature of the abnormal sound of the speaker based on the auditory masking in combination with the SVD-MRMR as claimed in claim 1, wherein the SVD decomposition of the matrix E in step S2 is expressed as:

E = SΣV^T;
wherein S is a k×k matrix, V is an n×n matrix, S and V are unitary matrices, and Σ = diag(σ_1, σ_2, ..., σ_r); the singular values of matrix E are σ_i (i = 1, 2, ..., r), with σ_1 ≥ σ_2 ≥ ... ≥ σ_r ≥ 0; the loudspeaker response signal feature vector is s = (σ_1, σ_2, ..., σ_r); the feature vectors s obtained from all samples form X_{h×r} = {X_1, X_2, ..., X_r}, the feature set of the loudspeaker psychoacoustic energy spectrum, where X_i consists of the singular values σ_i of the auditory perception masking spectrum E of each loudspeaker response signal; the labels corresponding to the loudspeaker response signals in the four states of normal, circle collision, air leakage and small sound are then assigned as 1, 2, 3 and 4 respectively, giving Y_{h×1} = {Y_1, Y_2, ..., Y_h}, where h is the number of loudspeaker samples and r is the feature dimension.
5. The method for extracting the feature of the abnormal sound of the speaker based on the auditory masking in combination with the SVD-MRMR as claimed in claim 1, wherein the step S3 specifically comprises the following steps:
step S31, normalizing the feature set X_{h×r}, then calculating the importance R_x of each feature X_i in X_{h×r} according to the MRMR method, using the difference criterion:

R_x = L_x − N_x;
wherein L_x is the correlation between each feature X_i and the class Y, determined by the mutual information between X_i and its class Y, i.e. L_x = I(X_i, Y);
N_x is the redundancy between each feature X_i and the other features X_l in the set Z, where i ≠ l; N_x is determined by the mean mutual information between each feature and the others:

N_x = (1/|Z|) Σ_{X_l∈Z} I(X_i, X_l);
step S32: sort the features in descending order of their importance R_x, input them into the classifier one by one for evaluation, and select the feature subset with the highest classification accuracy; if the classifier accuracy is highest after the mth feature has been input, the first m features are selected, and from these the MRMR criterion selects the optimal features to form the feature subset Z.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210561080.2A CN115002642A (en) | 2022-05-23 | 2022-05-23 | Feature extraction method for abnormal sound of loudspeaker based on combination of auditory masking and SVD-MRMR |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115002642A true CN115002642A (en) | 2022-09-02 |
Family
ID=83028181
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210561080.2A Pending CN115002642A (en) | 2022-05-23 | 2022-05-23 | Feature extraction method for abnormal sound of loudspeaker based on combination of auditory masking and SVD-MRMR |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115002642A (en) |
Cited By (2)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN117351988A (en) * | 2023-12-06 | 2024-01-05 | 方图智能(深圳)科技集团股份有限公司 | Remote audio information processing method and system based on data analysis
CN117351988B (en) * | 2023-12-06 | 2024-02-13 | 方图智能(深圳)科技集团股份有限公司 | Remote audio information processing method and system based on data analysis
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||