CN115002642A - Feature extraction method for abnormal sound of loudspeaker based on combination of auditory masking and SVD-MRMR - Google Patents

Feature extraction method for abnormal sound of loudspeaker based on combination of auditory masking and SVD-MRMR

Info

Publication number
CN115002642A
CN115002642A (application CN202210561080.2A)
Authority
CN
China
Prior art keywords
frequency
feature
sub
band
loudspeaker
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210561080.2A
Other languages
Chinese (zh)
Inventor
徐翠锋
詹锦成
苏海涛
吴景
陈家钦
胡鸿志
许金
郭庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Electronic Technology filed Critical Guilin University of Electronic Technology
Priority to CN202210561080.2A priority Critical patent/CN115002642A/en
Publication of CN115002642A publication Critical patent/CN115002642A/en
Pending legal-status Critical Current


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R29/00: Monitoring arrangements; Testing arrangements
    • H04R29/001: Monitoring arrangements; Testing arrangements for loudspeakers
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Otolaryngology (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention relates to the technical field of loudspeaker quality detection, and in particular to a loudspeaker abnormal-sound feature extraction method based on auditory masking combined with SVD-MRMR. First, the loudspeaker response signal is analyzed in a way that simulates human listening to the collected sound, and an auditory perception masking spectrum of the loudspeaker is calculated. Next, SVD decomposition is applied to the auditory perception masking spectrum to obtain a sequence of singular-value features. Finally, the optimal features in the singular-value feature sequence are selected by the MRMR algorithm, which eliminates redundant information and yields reduced-dimension features. This avoids the problem that too many features make the dimensionality of the classification model too high and degrade its efficiency, and makes unified measurement and judgment easier.

Description

Feature extraction method for abnormal sound of loudspeaker based on combination of auditory masking and SVD-MRMR
Technical Field
The invention relates to the technical field of loudspeaker quality detection, in particular to a loudspeaker abnormal sound feature extraction method based on auditory masking and SVD-MRMR.
Background
The loudspeaker and its drive system are key elements of audio equipment, and the quality of the loudspeaker directly affects the sound quality of the whole sound-reproduction system. During production, however, abnormal sounds can arise from voice-coil bottoming, diaphragm breakage, excessive impedance and similar defects. Detecting and classifying abnormal sounds of finished loudspeakers is therefore of great practical value.
In the traditional loudspeaker abnormal-sound detection method, an inspector listens for abnormal sounds by ear. The listening result depends on the inspector's skill, physical and mental state and degree of fatigue, so results vary strongly between individuals; the detection accuracy and stability are hard to improve further, hearing damage is easily caused, and the production efficiency and product quality of enterprises are difficult to raise.
Loudspeaker abnormal sounds can also be detected by signal-processing methods, for example by inspecting the time-frequency map obtained from a short-time Fourier transform or wavelet transform, or by decomposing the loudspeaker response signal and then extracting time-frequency features such as the LMD energy entropy (Chen X. Research on loudspeaker abnormal-sound fault diagnosis [D]. Guilin University of Electronic Technology, 2020) or the VMD energy entropy (Zhou J. et al. Loudspeaker abnormal-sound classification using variational mode decomposition and energy entropy [J]. Acta Acustica, 2021, 46(02): 263-270). Although these methods can recognize and classify loudspeaker abnormal sounds reasonably well, they do not take the auditory perception characteristics of the human ear into account. Other work extracts the psychoacoustic energy mean (Guo Q. et al. Loudspeaker abnormal-sound detection algorithm based on psychoacoustics and a support vector machine [J]. Journal of Donghua University (Natural Science Edition), 2020, 46(02): 275-281) as a feature for classifying abnormal sounds, but the feature dimension is too high and the classification result is poor.
Disclosure of Invention
In order to solve the above problems, the present invention provides a method for extracting loudspeaker abnormal-sound features based on auditory masking combined with SVD-MRMR; the specific technical solution is as follows:
a method for extracting the characteristics of speaker abnormal sounds based on auditory masking and SVD-MRMR comprises the following steps:
step S1: a microphone is used to collect the sound signal of the loudspeaker, with the sampling frequency of the microphone denoted f_s (in Hz); a logarithmic swept-frequency signal of 20-20000 Hz is selected as the excitation signal, and the duration of the sound signal is T (in s); the auditory perception masking spectrum E[k,n] of the loudspeaker response signal x(t) is calculated according to the ITU-R BS.1387-1 psychoacoustic model, where k is the sub-band index and n is the frame index obtained by framing the sound signal of duration T with a frame length of 2048 samples and an overlap rate of 50%,
Figure BDA0003656603050000021
step S2: the sub-band auditory perception masking spectra E[k,n] of the loudspeaker response signal are assembled into a k×n matrix E of rank r, r = min(k,n); SVD decomposition is then applied to the matrix E to obtain singular-value feature vectors, and the feature vectors obtained from all samples form the feature set X_{h×r} of the loudspeaker psychoacoustic energy spectrum, where h is the number of loudspeaker samples and r is the feature dimension;
step S3: the optimal feature subset Z is selected from the feature set X_{h×r} by the MRMR algorithm, eliminating redundant information.
Preferably, the step S1 specifically includes the following steps:
step S11: the loudspeaker response signal x(t) is framed, and each frame signal x_n[t,n] is:
Figure BDA0003656603050000022
where t is the index of the time-domain sampling points in each frame and N_F is the number of samples per frame;
step S12: the framed loudspeaker response signal x_n[t,n] is filtered with a Hanning window and transformed from the time domain to the frequency domain by the fast Fourier transform, giving the spectrum F_f[k_f,n]:
F_f[k_f,n] = fft(h[t]·x_n[t,n]);
where k_f is the frequency-domain bin index and h[t] is the Hanning window function;
step S13: the auditory characteristics of the human middle and outer ear are simulated by weighting the spectrum F_f[k_f,n] in the frequency domain, giving the output F_e[k_f,n] after middle- and outer-ear frequency weighting:
Figure BDA0003656603050000031
Wherein, W [ f ] is the frequency response function of middle and outer ear;
Figure BDA0003656603050000032
step S14: one Bark is divided uniformly into 4 bands, so the band width is Δz = 1/4 Bark, and the Bark domain over the hearing range 20 Hz to 18 kHz is divided uniformly into 109 non-overlapping critical frequency sub-bands, k = 1, 2, ..., 109. The signal energy values within each sub-band are summed to obtain the energy output spectrum P_e[k,n] of that frequency sub-band, where k denotes the k-th critical frequency sub-band;
the energy distribution of each sub-band is calculated by:
Figure BDA0003656603050000033
where f_u[k] and f_l[k] are the two edge frequencies (upper and lower limits) of the k-th critical frequency sub-band;
the energy output spectrum P_e[k,n] within one frequency sub-band is calculated from the following formula:
P_e[k,n] = max(∑U[k,k_f]·(F_e[k_f,n])^2, 10^-12);
step S15: an internal-noise term P_noise[k] is added to the energy output spectrum P_e[k,n] to obtain P_p[k,n]:
Figure BDA0003656603050000034
P_p[k,n] = P_e[k,n] + P_noise[k];
where f_c is the center frequency of the frequency sub-band;
step S16: the frequency-domain masking effect is simulated with the frequency spreading function of the Terhardt psychoacoustic model; the energy of each sub-band is spread over the whole Bark domain by the spreading function, so the energy of the i-th sub-band is the weighted sum of the contributions of all sub-band energies to that sub-band. The spreading function is denoted S_dB(i,k,n) and represents the contribution of the k-th sub-band energy to the i-th sub-band, as follows:
Figure BDA0003656603050000041
the energy distribution E_s[k,n] of each sub-band after frequency-domain spreading is calculated as:
Figure BDA0003656603050000042
where B_s(i) is a normalization factor;
Figure BDA0003656603050000043
step S17: time-domain spreading is performed, and the time-domain masking effect is simulated with a first-order low-pass filter:
E_f[k,n] = a·E_f[k,n-1] + (1-a)·E_s[k,n];
where the parameter a is determined by a time constant and is related to the center frequency of each sub-band; the auditory perception masking spectrum E[k,n] of each sub-band after time-domain masking is then obtained as:
E[k,n] = max(E_f[k,n], E_s[k,n]).
Preferably, the relation between the parameter a and the time constant τ is as follows:
Figure BDA0003656603050000044
Figure BDA0003656603050000045
where τ_100 and τ_min are constants, typically τ_100 = 0.03 and τ_min = 0.008, and f_c[k] is the center frequency of each sub-band.
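By way of illustration only, a minimal Python sketch of steps S11-S15 (framing, Hanning-window FFT, middle/outer-ear weighting, grouping into the 109 critical sub-bands and addition of internal noise) is given below. The exact middle/outer-ear weighting W[f] and internal-noise term P_noise[k] appear only as formula images in the original filing, so they are left here as neutral placeholder functions to be filled in from the ITU-R BS.1387-1 model, and the sub-band edge frequencies f_l[k], f_u[k] are assumed to be supplied by the caller.

```python
import numpy as np

def frame_signal(x, frame_len=2048, hop=1024):
    """Step S11: split x into 50%-overlapping frames of length frame_len."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]                                   # shape (n_frames, frame_len)

def ear_weight_db(f_hz):
    """Step S13: middle/outer-ear weighting W[f] in dB.  Neutral placeholder
    (0 dB); replace with the ITU-R BS.1387-1 curve from the original filing."""
    return np.zeros_like(f_hz)

def internal_noise_db(fc_hz):
    """Step S15: internal-noise term P_noise[k] in dB.  Placeholder (very low
    level); replace with the formula from the original filing."""
    return np.full_like(fc_hz, -120.0)

def banded_energy(x, fs, f_lo, f_hi):
    """Steps S11-S15: per-frame energy P_p[k, n] in each critical sub-band.

    f_lo, f_hi: arrays with the lower/upper edge frequencies of the 109
    critical sub-bands (1/4 Bark wide, 20 Hz - 18 kHz), assumed given."""
    frames = frame_signal(x)                                   # S11
    win = np.hanning(frames.shape[1])
    F = np.fft.rfft(win * frames, axis=1)                      # S12
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / fs)
    Fe = F * 10.0 ** (ear_weight_db(freqs) / 20.0)             # S13
    power = np.abs(Fe) ** 2
    fc = np.sqrt(f_lo * f_hi)                                  # assumed center frequencies (geometric mean)
    Pe = np.empty((len(f_lo), frames.shape[0]))
    for k, (lo, hi) in enumerate(zip(f_lo, f_hi)):             # S14: sum bin energies per band
        band = (freqs >= lo) & (freqs < hi)
        Pe[k] = np.maximum(power[:, band].sum(axis=1), 1e-12)
    Pp = Pe + 10.0 ** (internal_noise_db(fc)[:, None] / 10.0)  # S15
    return Pp, fc
```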
Preferably, the expression of performing SVD decomposition on the matrix E in step S2 is as follows:
Figure BDA0003656603050000051
where S is a k×k matrix, V is an n×n matrix, S and V are unitary matrices, and Σ = diag(σ_1, σ_2, ..., σ_r); the singular values of the matrix E are σ_i, i = 1, 2, ..., r, with σ_1 ≥ σ_2 ≥ ... ≥ σ_r ≥ 0; the loudspeaker response signal feature vector is s = (σ_1, σ_2, ..., σ_r); the feature vectors s obtained from all samples form X_{h×r} = {X_1, X_2, ..., X_r}, the feature set of the loudspeaker psychoacoustic energy spectrum, where X_i (i = 1, 2, ..., r) is the singular value σ_i (i = 1, 2, ..., r) of the auditory perception masking spectrum E of each loudspeaker response signal; the labels corresponding to the loudspeaker response signals in the four states of normal, circle collision, air leakage and small sound are then assigned as 1, 2, 3 and 4 respectively, giving Y_{h×1} = {Y_1, Y_2, ..., Y_h}, where h is the number of loudspeaker samples and r is the feature dimension.
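A minimal sketch of step S2 follows, assuming numpy and a hypothetical list `spectra` holding the k×n masking-spectrum matrix E of each loudspeaker sample; np.linalg.svd returns the singular values already sorted in descending order, matching σ_1 ≥ σ_2 ≥ ... ≥ σ_r.

```python
import numpy as np

def svd_features(E, r=None):
    """Step S2: singular-value feature vector s = (sigma_1, ..., sigma_r)
    of the k x n auditory perception masking spectrum E."""
    sigma = np.linalg.svd(E, compute_uv=False)   # descending singular values
    return sigma if r is None else sigma[:r]

# Feature set X_{h x r}: one singular-value vector per loudspeaker sample
# (`spectra` is a hypothetical list of per-sample E matrices).
# X = np.vstack([svd_features(E) for E in spectra])
```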
Preferably, the step S3 specifically includes the following steps:
step S31: the feature set X_{h×r} is normalized, and the importance R_x of each feature X_i in the feature set X_{h×r} is calculated according to the MRMR method, with the formula:
Figure BDA0003656603050000052
where L_x is the correlation between each feature X_i and the class Y, determined by the mutual information between X_i and its class Y, i.e. L_x = I(X_i, Y);
and N_x is the redundancy between each feature X_i and the other features X_l in the set Z, where i ≠ l; N_x is determined by the mutual information between each feature and the others:
Figure BDA0003656603050000053
step S32: the features are sorted in descending order of their importance R_x and fed into the classifier one by one for evaluation, and the feature subset with the highest classification accuracy is selected; if the classifier accuracy is highest when the first m features have been input, the first m features are selected, i.e. the MRMR criterion selects the optimal features to form the feature subset Z.
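The exact expression for the importance score R_x is given as a formula image in the original filing; the sketch below therefore assumes the common MRMR difference form R_x = L_x - N_x and estimates the mutual-information terms with scikit-learn, which is only one possible estimator and is not specified by the patent. For simplicity, the redundancy N_x is computed against all other features at once rather than against an incrementally grown subset Z.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression
from sklearn.preprocessing import MinMaxScaler

def mrmr_rank(X, y):
    """Step S31 (sketch): rank features by R_x = L_x - N_x (assumed form).

    X : (h, r) feature set, y : (h,) class labels 1..4."""
    Xn = MinMaxScaler().fit_transform(X)                  # normalization
    r = Xn.shape[1]
    L = mutual_info_classif(Xn, y)                        # L_x = I(X_i, Y)
    # N_x: mean mutual information between feature i and the other features
    N = np.array([mutual_info_regression(np.delete(Xn, i, axis=1), Xn[:, i]).mean()
                  for i in range(r)])
    return np.argsort(L - N)[::-1]                        # most important first
```

The returned ordering can then be used for the one-by-one classifier evaluation of step S32.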
The invention has the following beneficial effects. With the conventional loudspeaker abnormal-sound detection method, it is difficult to judge the type of abnormal sound against a stable and consistent standard. The invention analyzes the loudspeaker response signal by simulating human listening on the basis of the ITU-R BS.1387-1 psychoacoustic model, eliminating the influence of the subjective perception of human listeners. The invention performs singular value decomposition (SVD) on the loudspeaker response signal after processing by the psychoacoustic model and uses the psychoacoustic-energy singular values as the candidate feature set, which facilitates the extraction of fault information. The invention selects the optimal features from this feature set with the maximum-relevance minimum-redundancy (MRMR) algorithm and eliminates redundant information, which makes unified measurement and judgment easier.
Drawings
In order to more clearly illustrate the detailed description of the invention or the technical solutions in the prior art, the drawings that are needed in the detailed description of the invention or the prior art will be briefly described below. Throughout the drawings, like elements or portions are generally identified by like reference numerals. In the drawings, elements or portions are not necessarily drawn to scale.
FIG. 1 is a flow chart of a detection implementation of the present invention;
FIG. 2 is a time-domain plot of the response signal of a normal loudspeaker;
FIG. 3 is a time-domain plot of the response signal of a circle-collision (voice-coil rubbing) loudspeaker;
FIG. 4 is a time-domain plot of the response signal of an air-leakage loudspeaker;
FIG. 5 is a time-domain plot of the response signal of a small-sound (low-volume) loudspeaker;
FIG. 6 is the psychoacoustic masking energy spectrum of the response signal of a normal loudspeaker;
FIG. 7 is the psychoacoustic masking energy spectrum of the response signal of a circle-collision loudspeaker;
FIG. 8 is the psychoacoustic masking energy spectrum of the response signal of an air-leakage loudspeaker;
FIG. 9 is the psychoacoustic masking energy spectrum of the response signal of a small-sound loudspeaker;
FIG. 10 shows the psychoacoustic-energy singular values of the loudspeaker signals in the four states after SVD;
FIG. 11 shows test-set classification accuracy curves for different numbers of features;
FIG. 12 shows the visualization of the features extracted by the method of the present invention;
FIG. 13 shows the visualization of the features extracted by the LMD energy-entropy method;
FIG. 14 shows the visualization of the features extracted by the VMD energy-entropy method;
FIG. 15 shows the visualization of the features extracted by the psychoacoustic energy-mean method.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As shown in fig. 1, the embodiment of the invention provides a method for extracting characteristics of speaker abnormal sounds based on auditory masking in combination with SVD-MRMR, comprising the following steps:
step S1: a microphone is used to collect the sound signal of the loudspeaker, with the sampling frequency of the microphone denoted f_s (in Hz); a logarithmic swept-frequency signal of 20-20000 Hz is selected as the excitation signal, and the duration of the sound signal is T (in s); the auditory perception masking spectrum E[k,n] of the loudspeaker response signal x(t) is calculated according to the ITU-R BS.1387-1 psychoacoustic model, where k is the sub-band index: one Bark is divided uniformly into 4 bands, the band width is Δz = 1/4 Bark, and the Bark domain over the hearing range 20 Hz to 18 kHz is divided uniformly into 109 non-overlapping critical frequency sub-bands, k = 1, 2, ..., 109; n is the frame index obtained by framing the sound signal of duration T with a frame length of 2048 samples and an overlap rate of 50%,
Figure BDA0003656603050000081
the test object of this experiment was a 4cm diameter loudspeaker unit with a power of 0.5W and an impedance of 8 Ω. In the embodiment of the invention, a logarithmic frequency sweep signal of 20-20000 Hz is selected as an excitation signal, and a microphone is Bruel, Denmark&
Figure BDA0003656603050000083
The model 4966-H-041 microphone of company Type, the main performance parameters of the microphone of model 4966 are as follows: the dynamic range is 14.6-144 dB, the frequency range is 5 Hz-20000 Hz, the inherent noise is 144.6dB A, and the lower limit frequency is-3 dB: 3Hz, a sensitivity of 50mV/Pa and a sampling rate of 65536 Hz. The experimental data acquisition system adopts Denmark B&And the K acoustic and vibration acquisition and analysis system is used as an acquisition system and stores the acquired loudspeaker response signals.
The loudspeakers under 4 states of normal, small sound, circle collision and air leakage are mainly tested, wherein 50 loudspeakers of the normal, small sound, circle collision and air leakage are respectively tested, 200 loudspeakers to be tested are totally tested, 5 repeated tests are carried out on each loudspeaker, 1000 sample sets are formed in total, time domain graphs of various loudspeaker response signals are acquired and shown in the figures 2-5, wherein the figure 2 is the response signal of the normal loudspeaker, the figure 3 is the response signal of the circle collision loudspeaker, the figure 4 is the response signal of the air leakage loudspeaker, and the figure 5 is the response signal of the small sound loudspeaker.
The auditory perception masking spectrum E[k,n] of the loudspeaker response signal is calculated for the time-domain signals in FIGS. 2-5 according to the ITU-R BS.1387-1 psychoacoustic model, as follows:
the method specifically comprises the following steps:
step S11: the loudspeaker response signal x(t) is framed, so that each frame signal x_n[t,n] is:
Figure BDA0003656603050000082
where t is the index of the time-domain sampling points in each frame and N_F is the number of samples per frame;
step S12: the framed loudspeaker response signal x_n[t,n] is filtered with a Hanning window and transformed from the time domain to the frequency domain by the fast Fourier transform (FFT), giving the spectrum F_f[k_f,n]:
F_f[k_f,n] = fft(h[t]·x_n[t,n]);
where k_f is the frequency-domain bin index and h[t] is the Hanning window function;
step S13: the auditory characteristics of the human middle and outer ear are simulated by weighting the spectrum F_f[k_f,n] in the frequency domain, giving the output F_e[k_f,n] after middle- and outer-ear frequency weighting:
Figure BDA0003656603050000091
where W[f] is the frequency response function of the middle and outer ear;
Figure BDA0003656603050000092
step S14: Bark-scale critical sub-band mapping. The scale transformation is a non-linear processing of the sound frequency that simulates the Bark-like resonance of the basilar membrane in the human cochlea; it maps the linear frequency domain onto the psychoacoustic frequency domain (Bark domain). After the signal is transformed to the Bark domain, the response signal undergoes a high-resolution non-linear stretching with a clear magnifying effect, so the high- and low-frequency components of the signal are depicted in a finer and more intuitive way.
One Bark is divided uniformly into 4 bands, so the band width is Δz = 1/4 Bark, and the Bark domain over the hearing range 20 Hz to 18 kHz is divided uniformly into 109 non-overlapping critical frequency sub-bands; the signal energy values within each sub-band are summed to obtain the energy output spectrum P_e[k,n] of that frequency sub-band, where k denotes the k-th critical frequency sub-band;
wherein the energy distribution of each sub-band is calculated by:
Figure BDA0003656603050000093
where f_u[k] and f_l[k] are the two edge frequencies (upper and lower limits) of the k-th critical frequency sub-band;
the energy output spectrum P_e[k,n] within one frequency sub-band is calculated from the following formula:
P_e[k,n] = max(∑U[k,k_f]·(F_e[k_f,n])^2, 10^-12);
step S15: because factors such as a person's blood flow and heartbeat affect hearing to some extent, an internal-noise term P_noise[k] is added to the energy output spectrum P_e[k,n] to obtain P_p[k,n]:
Figure BDA0003656603050000101
P_p[k,n] = P_e[k,n] + P_noise[k];
where f_c is the center frequency of the frequency sub-band.
step S16: the frequency-domain masking effect is simulated with the frequency spreading function of the Terhardt psychoacoustic model; the energy of each sub-band is spread over the whole Bark domain by the spreading function, so the energy of the i-th sub-band is the weighted sum of the contributions of all sub-band energies to that sub-band; the spreading function is denoted S_dB(i,k,n) and represents the contribution of the k-th sub-band energy to the i-th sub-band, as follows:
Figure BDA0003656603050000102
the energy distribution E_s[k,n] of each sub-band after frequency-domain spreading is calculated as:
Figure BDA0003656603050000103
where B_s(i) is a normalization factor;
Figure BDA0003656603050000104
step S17: time-domain spreading is performed, and the time-domain masking effect is simulated with a first-order low-pass filter:
E_f[k,n] = a·E_f[k,n-1] + (1-a)·E_s[k,n];
where the parameter a is determined by a time constant and is related to the center frequency of each sub-band; after time-domain masking, the auditory perception masking spectrum of each sub-band is:
E[k,n] = max(E_f[k,n], E_s[k,n]).
The relation between the parameter a and the time constant τ is as follows:
Figure BDA0003656603050000111
Figure BDA0003656603050000112
where τ_100 and τ_min are constants, typically τ_100 = 0.03 and τ_min = 0.008, and f_c[k] is the center frequency of each sub-band.
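Continuing the illustrative Python sketch of steps S11-S15 given earlier, the fragment below covers steps S16-S17. The spreading function S_dB(i,k,n), the normalization factor B_s(i) and the exact relation between a and τ appear as formula images in the original filing; a simple level-independent triangular Bark-domain spreading and the relations τ = τ_min + (100/f_c)(τ_100 - τ_min), a = exp(-(N_F/2)/(f_s·τ)) are used here purely as assumed stand-ins.

```python
import numpy as np

def spread_and_smooth(Pp, fc, fs, frame_len=2048,
                      tau_100=0.03, tau_min=0.008,
                      s_lower_db=27.0, s_upper_db=10.0, dz=0.25):
    """Steps S16-S17 (sketch): frequency-domain spreading and time-domain smoothing.

    Pp : (K, N) per-band energies P_p[k, n] from steps S11-S15.
    The spreading slopes and the a/tau relations are assumptions, not the
    exact formulas of the filing."""
    K, N = Pp.shape
    z = dz * np.arange(K)                      # Bark position of each sub-band
    # Spreading matrix: contribution of band k to band i (assumed triangular form)
    S = np.empty((K, K))
    for i in range(K):
        for k in range(K):
            d = z[i] - z[k]
            S[i, k] = 10.0 ** ((-s_upper_db * d if d >= 0 else s_lower_db * d) / 10.0)
    S /= S.sum(axis=1, keepdims=True)          # plays the role of B_s(i)
    Es = S @ Pp                                # E_s[k, n]

    # First-order low-pass time smoothing (step S17)
    tau = tau_min + (100.0 / fc) * (tau_100 - tau_min)   # assumed relation
    a = np.exp(-(frame_len / 2) / (fs * tau))            # hop = frame_len / 2 (50% overlap)
    E = np.empty_like(Es)
    Ef = np.zeros(K)
    for n in range(N):
        Ef = a * Ef + (1 - a) * Es[:, n]       # E_f[k, n]
        E[:, n] = np.maximum(Ef, Es[:, n])     # E[k, n] = max(E_f, E_s)
    return E
```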
The resulting psychoacoustic masking energy spectra are shown in FIGS. 6-9: FIG. 6 is the psychoacoustic masking energy spectrum of a normal loudspeaker, FIG. 7 that of a circle-collision loudspeaker, FIG. 8 that of an air-leakage loudspeaker, and FIG. 9 that of a small-sound loudspeaker. From FIGS. 6-9 it can be seen that, for all fault types, the loudspeaker response signals have the largest energy proportion in the interval of critical frequency sub-bands Nos. 49-64, whose center frequencies are 2026.266-3519.344 Hz; this agrees with the fact that the human ear is most sensitive to sounds of about 1000-3000 Hz, and the energy amplitudes of the response signals of different fault types differ within this interval. The auditory perception masking spectrum can therefore simulate the acute human auditory perception system: the low-frequency and high-frequency parts of the collected loudspeaker response signal are suppressed while the mid-frequency part is enhanced, so that the analysis result agrees better with the auditory perception of the human ear.
Step S2, constructing each sub-band auditory perception masking spectrum E [ k, n ] of the loudspeaker response signal into a k multiplied by n order matrix E, wherein the rank is r, (r is min (k, n)), and then carrying out SVD decomposition on the matrix E to obtain a singular value characteristic sequence; the expression for the SVD decomposition of matrix E is as follows:
Figure BDA0003656603050000113
where S is a k×k matrix, V is an n×n matrix, S and V are unitary matrices, and Σ = diag(σ_1, σ_2, ..., σ_r); the singular values of the matrix E are σ_i, i = 1, 2, ..., r, with σ_1 ≥ σ_2 ≥ ... ≥ σ_r ≥ 0. Since the singular values contain the information of the auditory perception masking spectrum E[k,n], this information is used to analyze the type of abnormal sound for classification, and the feature vector of the loudspeaker response signal is s = (σ_1, σ_2, ..., σ_r). FIG. 10 shows the psychoacoustic-energy singular values of the loudspeaker signals in the different states; it can be seen from FIG. 10 that, for loudspeakers in different states, the singular values are arranged in descending order, and this ordering of the feature indices holds regardless of the state of the loudspeaker, which shows the advantage of the stability of SVD.
The feature vectors s obtained from all samples form X_{h×r} = {X_1, X_2, ..., X_r}, the feature set of the loudspeaker psychoacoustic energy spectrum, where X_i (i = 1, 2, ..., r) is the singular value σ_i of the auditory perception masking spectrum E of each loudspeaker response signal; the labels corresponding to the loudspeaker response signals in the four states of normal, circle collision, air leakage and small sound are then assigned as 1, 2, 3 and 4 respectively, giving Y_{h×1} = {Y_1, Y_2, ..., Y_h}, where h is the number of loudspeaker samples and r is the feature dimension.
Step S3 from data set X by MRMR algorithm h×r ={X 1 ,X 2 ,...X r The optimal feature subset Z is selected, eliminating redundant information. The method specifically comprises the following steps:
step S31: the feature set X_{h×r} is normalized, and the importance R_x of each feature X_i in the feature set X_{h×r} is calculated according to the MRMR method, with the formula:
Figure BDA0003656603050000121
where L_x is the correlation between each feature X_i and the class Y, determined by the mutual information between X_i and its class Y:
L_x = I(X_i, Y);
and N_x is the redundancy between each feature X_i and the other features X_l in the set Z, where i ≠ l; N_x is determined by the mutual information between each feature and the others:
Figure BDA0003656603050000122
Let x and y be two continuous random variables with probability density functions p(x) and p(y) and joint probability density function p(x,y); the mutual information between x and y is:
Figure BDA0003656603050000123
step S32: the features are sorted in descending order of their importance R_x and fed into the classifier one by one for evaluation, and the feature subset with the highest classification accuracy is selected; if the classifier accuracy is highest when the first m features have been input, the first m features are selected, i.e. the MRMR criterion selects the optimal features to form the feature subset Z.
Specifically, the feature set is normalized, R_x is calculated for each feature according to the MRMR method, the features are sorted in descending order of importance, and the features are fed one by one into a classifier for classification; the change of the classification accuracy is shown in FIG. 11. As can be seen from FIG. 11, the classification accuracy of the feature set increases gradually with the number of feature values, but after reaching its maximum it essentially stays the same or even decreases. This shows that, once the accuracy has reached its maximum, adding further features contributes little to the classification accuracy and can even harm the classification effect. The result illustrates the importance and necessity of feature selection in feature recognition and classification. As can be seen from FIG. 11, the accuracy is highest and the classification effect optimal when the number of features is 10, so the first 10 features are selected as the optimal feature subset.
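A sketch of the incremental evaluation behind FIG. 11 is given below: features are added one by one in MRMR order and the count giving the best cross-validated accuracy is kept. The filing does not state which classifier or validation scheme produced the curve, so the SVM and the 5-fold cross-validation used here are assumptions.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def select_top_m(X, y, order, max_m=None):
    """Step S32 (sketch): grow the feature subset in MRMR order and keep the
    size with the best mean cross-validated accuracy (classifier assumed)."""
    order = np.asarray(order)
    max_m = max_m or len(order)
    scores = [cross_val_score(SVC(), X[:, order[:m]], y, cv=5).mean()
              for m in range(1, max_m + 1)]
    best_m = int(np.argmax(scores)) + 1
    return order[:best_m], scores
```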
To illustrate the algorithm more vividly, it is then analyzed visually with the t-distributed stochastic neighbor embedding algorithm. t-distributed Stochastic Neighbor Embedding (t-SNE) is a technique that combines dimensionality reduction and visualization; it converts high-dimensional Euclidean distances between data points into conditional probabilities that represent similarity. After the visualization algorithm maps the high-dimensional data to a low-dimensional space, points that are separated from each other in the high-dimensional space remain so in the low-dimensional space. After learning converges, t-SNE can project the data set into a two- or three-dimensional space. The feature subset Z selected by the MRMR algorithm is visualized with t-SNE, and the result is shown in FIG. 12.
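A sketch of the t-SNE visualization used for FIG. 12 (and, with the other feature sets, FIGS. 13-15) is shown below, assuming scikit-learn and matplotlib; the t-SNE hyper-parameters are not given in the filing, so library defaults are used.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE

def visualize_tsne(Z, y, title="MRMR-selected features"):
    """Project the selected feature subset Z to 2-D with t-SNE and plot it by
    class (1 normal, 2 circle collision, 3 air leakage, 4 small sound)."""
    emb = TSNE(n_components=2, init="pca", random_state=0).fit_transform(Z)
    y = np.asarray(y)
    for label, name in zip((1, 2, 3, 4),
                           ("normal", "circle collision", "air leakage", "small sound")):
        pts = emb[y == label]
        plt.scatter(pts[:, 0], pts[:, 1], s=12, label=name)
    plt.title(title)
    plt.legend()
    plt.show()
```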
In order to better verify the accuracy of the proposed feature extraction method for loudspeaker abnormal-sound classification, the LMD energy entropy (Chen X. Research on loudspeaker abnormal-sound fault diagnosis based on LMD and LSSVM [D]. Guilin University of Electronic Technology, 2020), the VMD energy entropy (Zhou J. et al. Loudspeaker abnormal-sound classification using variational mode decomposition and energy entropy [J]. Acta Acustica, 2021, 46(02): 263-270) and the psychoacoustic energy mean (Guo Q. et al. Loudspeaker abnormal-sound detection algorithm based on psychoacoustics and a support vector machine [J]. Journal of Donghua University (Natural Science Edition), 2020, 46(02): 275-281) are each extracted as features; the data sets are then normalized and visualized with t-SNE, and the results are shown in FIGS. 13-15: FIG. 13 is the visualization of the features extracted by the LMD energy-entropy method, FIG. 14 that of the VMD energy-entropy method, and FIG. 15 that of the psychoacoustic energy-mean method.
Analysis of FIGS. 12-15 shows that the abnormal-sound feature extraction method of the present invention gives a high recognition effect for the normal, circle-collision, air-leakage and small-sound states; in particular, the feature distributions of the normal, circle-collision and air-leakage states are completely separated and each gathered in its own region. Among the features extracted by the LMD energy-entropy method, only the small-sound state is completely separated from the normal, circle-collision and air-leakage features, while the feature distributions of normal, circle collision and air leakage show a tendency to separate gradually but still cluster together. Among the features extracted by the VMD energy-entropy method, the feature distributions of circle collision and small sound show a tendency to separate gradually from the other features but do not form tight clusters and are rather dispersed; the extraction of air-leakage features in particular is poor, and the figure shows that the air-leakage features are mixed with the feature distributions of the normal, circle-collision and small-sound states. Among the features extracted from the psychoacoustic energy mean, only the circle-collision state is completely separated from the normal, air-leakage and small-sound feature distributions, which show a tendency to separate gradually but are still mixed together. The feature extraction method proposed by the invention is therefore superior to the existing loudspeaker abnormal-sound feature extraction methods, especially for the classification of normal loudspeakers.
In summary, the invention first analyzes the loudspeaker response signal by simulating human listening to the collected sound and calculates the auditory perception masking spectrum of the loudspeaker; SVD decomposition is then applied to the auditory perception masking spectrum to obtain the singular-value feature sequence; finally, the optimal features in the singular-value feature sequence are selected by the MRMR algorithm and redundant information is eliminated, giving reduced-dimension features. This avoids the problem that too many features make the dimensionality of the classification model too high during classification and degrade its efficiency, and makes unified measurement and judgment easier.
Those of ordinary skill in the art will appreciate that the elements of the examples described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the components of the examples have been described above generally in terms of their functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present application, it should be understood that the division of the unit is only one division of logical functions, and other division manners may be used in actual implementation, for example, multiple units may be combined into one unit, one unit may be split into multiple units, or some features may be omitted.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present invention, and they should be construed as being included in the following claims and description.

Claims (5)

1. A method for extracting the characteristics of speaker abnormal sounds based on auditory masking and SVD-MRMR is characterized by comprising the following steps:
step S1: a microphone is used to collect the sound signal of the loudspeaker, with the sampling frequency of the microphone denoted f_s (in Hz); a logarithmic swept-frequency signal of 20-20000 Hz is selected as the excitation signal, and the duration of the sound signal is T (in s); the auditory perception masking spectrum E[k,n] of the loudspeaker response signal x(t) is calculated according to the ITU-R BS.1387-1 psychoacoustic model, where k is the sub-band index and n is the frame index obtained by framing the sound signal of duration T with a frame length of 2048 samples and an overlap rate of 50%,
Figure FDA0003656603040000011
step S2: the sub-band auditory perception masking spectra E[k,n] of the loudspeaker response signal are assembled into a k×n matrix E of rank r, r = min(k,n); SVD decomposition is then applied to the matrix E to obtain singular-value feature vectors, and the feature vectors obtained from all samples form the feature set X_{h×r} of the loudspeaker psychoacoustic energy spectrum, where h is the number of loudspeaker samples and r is the feature dimension;
step S3: the optimal feature subset Z is selected from the feature set X_{h×r} by the MRMR algorithm, eliminating redundant information.
2. The method for extracting the feature of the abnormal sound of the speaker based on the auditory masking in combination with the SVD-MRMR as claimed in claim 1, wherein the step S1 specifically comprises the following steps:
step S11: the loudspeaker response signal x(t) is framed, and each frame signal x_n[t,n] is:
Figure FDA0003656603040000012
where t is the index of the time-domain sampling points in each frame and N_F is the number of samples per frame;
step S12: the framed loudspeaker response signal x_n[t,n] is filtered with a Hanning window and transformed from the time domain to the frequency domain by the fast Fourier transform, giving the spectrum F_f[k_f,n]:
F_f[k_f,n] = fft(h[t]·x_n[t,n]);
where k_f is the frequency-domain bin index and h[t] is the Hanning window function;
step S13: the auditory characteristics of the human middle and outer ear are simulated by weighting the spectrum F_f[k_f,n] in the frequency domain, giving the output F_e[k_f,n] after middle- and outer-ear frequency weighting:
Figure FDA0003656603040000021
where W[f] is the frequency response function of the middle and outer ear;
Figure FDA0003656603040000022
step S14: one Bark is divided uniformly into 4 bands, so the band width is Δz = 1/4 Bark, and the Bark domain over the hearing range 20 Hz to 18 kHz is divided uniformly into 109 non-overlapping critical frequency sub-bands, k = 1, 2, ..., 109; the signal energy values within each sub-band are summed to obtain the energy output spectrum P_e[k,n] of that frequency sub-band, where k denotes the k-th critical frequency sub-band;
the energy distribution of each sub-band is calculated by:
Figure FDA0003656603040000023
where f_u[k] and f_l[k] are the two edge frequencies (upper and lower limits) of the k-th critical frequency sub-band;
the energy output spectrum P_e[k,n] within one frequency sub-band is calculated from the following formula:
P_e[k,n] = max(∑U[k,k_f]·(F_e[k_f,n])^2, 10^-12);
step S15: an internal-noise term P_noise[k] is added to the energy output spectrum P_e[k,n] to obtain P_p[k,n]:
Figure FDA0003656603040000024
P_p[k,n] = P_e[k,n] + P_noise[k];
where f_c is the center frequency of the frequency sub-band;
step S16: the frequency-domain masking effect is simulated with the frequency spreading function of the Terhardt psychoacoustic model; the energy of each sub-band is spread over the whole Bark domain by the spreading function, so the energy of the i-th sub-band can be expressed as the weighted sum of the contributions of all sub-band energies to that sub-band; the spreading function is denoted S_dB(i,k,n) and represents the contribution of the k-th sub-band energy to the i-th sub-band, as follows:
Figure FDA0003656603040000031
the energy distribution E_s[k,n] of each sub-band after frequency-domain spreading is calculated as:
Figure FDA0003656603040000032
where B_s(i) is a normalization factor;
Figure FDA0003656603040000033
step S17: time-domain spreading is performed, and the time-domain masking effect is simulated with a first-order low-pass filter:
E_f[k,n] = a·E_f[k,n-1] + (1-a)·E_s[k,n];
where the parameter a is determined by a time constant and is related to the center frequency of each sub-band; the auditory perception masking spectrum E[k,n] of each sub-band after time-domain masking is then obtained as:
E[k,n] = max(E_f[k,n], E_s[k,n]).
3. The method for extracting the features of loudspeaker abnormal sounds based on auditory masking combined with SVD-MRMR as claimed in claim 2, wherein the relation between the parameter a and the time constant τ is as follows:
Figure FDA0003656603040000034
Figure FDA0003656603040000035
where τ_100 and τ_min are constants, typically τ_100 = 0.03 and τ_min = 0.008, and f_c[k] is the center frequency of each sub-band.
4. The method for extracting the feature of the abnormal sound of the speaker based on the auditory masking in combination with the SVD-MRMR as claimed in claim 1, wherein the expression of the SVD decomposition of the matrix E in the step S2 is as follows:
Figure FDA0003656603040000036
where S is a k×k matrix, V is an n×n matrix, S and V are unitary matrices, and Σ = diag(σ_1, σ_2, ..., σ_r); the singular values of the matrix E are σ_i, i = 1, 2, ..., r, with σ_1 ≥ σ_2 ≥ ... ≥ σ_r ≥ 0; the loudspeaker response signal feature vector is s = (σ_1, σ_2, ..., σ_r); the feature vectors s obtained from all samples form X_{h×r} = {X_1, X_2, ..., X_r}, the feature set of the loudspeaker psychoacoustic energy spectrum, where X_i is the singular value σ_i of the auditory perception masking spectrum E of each loudspeaker response signal; the labels corresponding to the loudspeaker response signals in the four states of normal, circle collision, air leakage and small sound are then assigned as 1, 2, 3 and 4 respectively, giving Y_{h×1} = {Y_1, Y_2, ..., Y_h}, where h is the number of loudspeaker samples and r is the feature dimension.
5. The method for extracting the feature of the abnormal sound of the speaker based on the auditory masking in combination with the SVD-MRMR as claimed in claim 1, wherein the step S3 specifically comprises the following steps:
step S31: the feature set X_{h×r} is normalized, and the importance R_x of each feature X_i in the feature set X_{h×r} is calculated according to the MRMR method, with the formula:
Figure FDA0003656603040000041
where L_x is the correlation between each feature X_i and the class Y, determined by the mutual information between X_i and its class Y, i.e. L_x = I(X_i, Y);
and N_x is the redundancy between each feature X_i and the other features X_l in the set Z, where i ≠ l; N_x is determined by the mutual information between each feature and the others:
Figure FDA0003656603040000042
step S32: the features are sorted in descending order of their importance R_x and fed into the classifier one by one for evaluation, and the feature subset with the highest classification accuracy is selected; if the classifier accuracy is highest when the first m features have been input, the first m features are selected, i.e. the MRMR criterion selects the optimal features to form the feature subset Z.
CN202210561080.2A 2022-05-23 2022-05-23 Feature extraction method for abnormal sound of loudspeaker based on combination of auditory masking and SVD-MRMR Pending CN115002642A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210561080.2A CN115002642A (en) 2022-05-23 2022-05-23 Feature extraction method for abnormal sound of loudspeaker based on combination of auditory masking and SVD-MRMR

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210561080.2A CN115002642A (en) 2022-05-23 2022-05-23 Feature extraction method for abnormal sound of loudspeaker based on combination of auditory masking and SVD-MRMR

Publications (1)

Publication Number Publication Date
CN115002642A true CN115002642A (en) 2022-09-02

Family

ID=83028181

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210561080.2A Pending CN115002642A (en) 2022-05-23 2022-05-23 Feature extraction method for abnormal sound of loudspeaker based on combination of auditory masking and SVD-MRMR

Country Status (1)

Country Link
CN (1) CN115002642A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117351988A (en) * 2023-12-06 2024-01-05 方图智能(深圳)科技集团股份有限公司 Remote audio information processing method and system based on data analysis
CN117351988B (en) * 2023-12-06 2024-02-13 方图智能(深圳)科技集团股份有限公司 Remote audio information processing method and system based on data analysis

Similar Documents

Publication Publication Date Title
Eaton et al. Estimation of room acoustic parameters: The ACE challenge
Falk et al. Modulation spectral features for robust far-field speaker identification
CN101426168B (en) Sounding body abnormal sound detection method and system
CN108417228A (en) Voice tone color method for measuring similarity under instrument tamber migration
CN103546853A (en) Speaker abnormal sound detecting method based on short-time Fourier transformation
CN113763986B (en) Abnormal sound detection method for air conditioner indoor unit based on sound classification model
CN104900238A (en) Audio real-time comparison method based on sensing filtering
Nossier et al. Mapping and masking targets comparison using different deep learning based speech enhancement architectures
CN115002642A (en) Feature extraction method for abnormal sound of loudspeaker based on combination of auditory masking and SVD-MRMR
Roßbach et al. A model of speech recognition for hearing-impaired listeners based on deep learning
Eklund Data augmentation techniques for robust audio analysis
Chiang et al. Hasa-net: A non-intrusive hearing-aid speech assessment network
Shifas et al. A non-causal FFTNet architecture for speech enhancement
Záviška et al. Psychoacoustically motivated audio declipping based on weighted l 1 minimization
Valero et al. Narrow-band autocorrelation function features for the automatic recognition of acoustic environments
CN112735468A (en) MFCC-based automobile seat motor abnormal noise detection method
CN117676446A (en) Method for measuring perceived audio quality
CN117690452A (en) Motor signal processing method, device, equipment and medium
Gandhiraj et al. Auditory-based wavelet packet filterbank for speech recognition using neural network
Strauss et al. Improved normalizing flow-based speech enhancement using an all-pole gammatone filterbank for conditional input representation
Jassim et al. NSQM: A non-intrusive assessment of speech quality using normalized energies of the neurogram
CN111755025B (en) State detection method, device and equipment based on audio features
Donai et al. Identification of high-pass filtered male, female, and child vowels: The use of high-frequency cues
CN109788410A (en) A kind of method and apparatus inhibiting loudspeaker noise
Jassim et al. Speech quality assessment using 2D neurogram orthogonal moments

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination