CN109215633A - Method for recognizing nasal air emission in cleft palate speech based on recurrence plot analysis


Info

Publication number
CN109215633A
Authority
CN
China
Prior art keywords
recurrence
recurrence plot
matrix
signal
rhinorrhea
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811176054.8A
Other languages
Chinese (zh)
Inventor
何凌
刘新怡
尹恒
何飞
付佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University
Priority to CN201811176054.8A
Publication of CN109215633A
Legal status: Pending

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/02Feature extraction for speech recognition; Selection of recognition unit
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147Distances to closest patterns, e.g. nearest neighbour classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/20Image enhancement or restoration using local operators
    • G06T5/30Erosion or dilatation, e.g. thinning
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/66Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20024Filtering details
    • G06T2207/20032Median filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20048Transform domain processing
    • G06T2207/20056Discrete and fast Fourier transform, [DFT, FFT]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

The invention discloses a method for recognizing nasal air emission in cleft palate speech based on recurrence plot analysis, and relates to the field of speech signal processing. The method comprises: (1) speech signal preprocessing, in which the input speech signal is downsampled, normalized and framed, pre-emphasized, and amplitude-normalized; (2) computing the recurrence plot and recurrence matrix of the preprocessed speech signal; (3) trend analysis of the recurrence plot; (4) direct region partitioning of the recurrence plot, with matrix processing of the signal recurrence plot; (5) image processing of the recurrence plot, in which the format-converted image matrix is binarized and then filtered twice with specific structuring templates; (6) classification of the signal with a classifier to obtain the automatic recognition result. Compared with the prior art, the detection results are objective and accurate and a higher degree of automatic measurement is achieved. The method provides reliable reference data for the clinical digital assessment of nasal air emission in cleft palate speech and meets the needs of precision medicine through more accurate and effective signal classification and recognition.

Description

Method for recognizing nasal air emission in cleft palate speech based on recurrence plot analysis
Technical field
The present invention relates to the field of speech signal processing, and in particular to a method for recognizing nasal air emission in cleft palate speech based on recurrence plot analysis.
Background
One of the most evident functional impairments of congenital cleft palate is speech disorder. Because the cleft removes the normal bony and muscular separation between the oral and nasal cavities, the patient cannot control the volume and direction of airflow during articulation. In patients with velopharyngeal insufficiency, the velopharyngeal valve cannot close completely during speech, so air passes through the nasal and oral cavities simultaneously, producing typical cleft palate speech features such as abnormal resonance, nasal air emission, and hypernasality. When the speaker produces consonants, air leaks through the nasal cavity; the sound of this airflow passing through the nasal cavity, which is sometimes audible during speech, is called nasal air emission. Nasal air emission occurs mainly in pressure consonants such as plosives, affricates, and fricatives, and it alters the sound quality.
In recent years, researchers in China and abroad have worked on more objective, digital speech processing and recognition methods. However, the prior art contains little research on the diagnosis and assessment of nasal air emission speech, and lacks an effective method for its automatic recognition and detection.
Summary of the invention
To address the above technical problem in the prior art, the present invention provides a method for recognizing nasal air emission in cleft palate speech based on recurrence plot analysis. Improved parameters are obtained by trend analysis of the recurrence plot; the recurrence plot is then partitioned into regions to obtain the minimal-region matrices and compute the associated parameters; and image analysis of the recurrence plot yields an image matrix. These parameters serve as the feature parameters of the speech signal, and a classifier is used to recognize nasal air emission in cleft palate speech automatically.
The invention adopts the following technical solution:
A method for recognizing nasal air emission in cleft palate speech based on recurrence plot analysis, comprising the following steps:
(1) Speech signal preprocessing: the input speech signal is downsampled, normalized and framed, pre-emphasized, and amplitude-normalized;
(2) Computing the recurrence matrix of the preprocessed speech signal: the time series of the system is extracted and its recurrent behavior is reproduced. With a suitable embedding dimension m and delay time τ, the one-dimensional nonlinear time series
$$\{s(i),\ i = 1, 2, \ldots\} \tag{2}$$
is reconstructed into the vectors
$$S_i = [s(i),\ s(i+\tau),\ \ldots,\ s(i+(m-1)\tau)] \tag{3}$$
The m-dimensional phase-space trajectory of the system is formed by the time-indexed sequence of vectors
$$\{S_i,\ i = 1, 2, \ldots, N\} \tag{4}$$
These phase-space points are then used as rows and columns to form the N × N recurrence matrix;
Each node $R_{i,j}$ of the plot is determined by the distance between the corresponding row and column vectors:
$$R_{i,j} = \theta(\varepsilon - \lVert S_i - S_j \rVert), \quad i, j = 1, 2, \ldots, N \tag{5}$$
where ε is a threshold constant set from prior experience, representing the critical distance; $\lVert \cdot \rVert$ denotes the Euclidean norm of a vector; and θ(x) is the Heaviside step function. If $R_{i,j} = 1$, the point (i, j) of the recurrence plot is drawn as a black dot; if $R_{i,j} = 0$, it is drawn as a white dot. This yields the two-dimensional recurrence matrix RP;
(3) Trend analysis of the recurrence plot: five quantification parameters from recurrence quantification analysis are used for the recurrence plot, namely determinism DET, recurrence rate $RR_L$, longest diagonal LL, entropy ENTR, and trend RT. In addition, the probability density of the diagonal-line distribution in the recurrence plot is converted into the diagonal feature function LLF in the frequency domain, and the entropy corresponding to the region under each diagonal is converted into the frequency-domain entropy PENTR;
(4) Direct region partitioning of the recurrence plot: matrix processing is applied to the signal recurrence plot, its row and column values are obtained, and a suitable blocking factor is chosen according to statistical properties. After the recurrence plot is divided into the corresponding regions, the minimal-region matrices at its four corners are extracted as follows:
where RP is the recurrence matrix and m is the required blocking factor. The blocking factor m divides the entire recurrence plot into m × m parts; based on the total number of points in the matrix, the division is made so that each region contains 1000 points, calculated as:
The recurrence matrices $RP_a$ at the four corners of the recurrence plot (upper left, lower left, upper right, lower right) are extracted, where a is the index of each sub-matrix after partitioning. These four corner matrices, the minimal-region matrices, serve as feature parameters. Each minimal region is then processed further by computing its regional mean-square value, which models the weight of the densely distributed regions when the one-dimensional signal is mapped onto the two-dimensional image, calculated as:
where the matrix point $R_{i,j} \in RP_a$ and $N_a$ is the total number of points in the minimal-region matrix;
(5) Image processing of the recurrence plot: the format-converted image matrix is binarized and then filtered twice with specific structuring templates; the resulting matrix features are used for classification;
(6) The signal is classified with a classifier to obtain the automatic recognition result.
Step (1) specifically comprises the following steps:
(1.1) Downsampling: the speech signal is downsampled proportionally from a sampling frequency of 44100 Hz to 15000 Hz or 8000 Hz before processing;
(1.2) Normalization: the maximum absolute amplitude is found for each speech recording, and every value of the speech signal is divided by this maximum, calculated as:
$$s = \frac{xx}{\max\lvert xx \rvert} \tag{1}$$
where xx is the input speech signal and s is the processed sequence.
The quantification parameters in step (3) are as follows:
Because the recurrence plot is symmetric about the main diagonal, the lower-right part is divided at equal spacing along lines parallel to the main diagonal. $RR_L$ denotes the recurrence rate in region L of the recurrence plot, calculated as:
The trend RT characterizes the rate of change of the recurrence rate, calculated as:
where the mean recurrence rate is the average of $\{RR_L,\ L = 1, 2, \ldots, K\}$. RT is therefore the linear regression slope of the sequence $\{RR_L,\ L = 1, 2, \ldots, K\}$ with respect to $\{L = 1, 2, \ldots, K\}$. A smaller slope means the system is more stable, while a larger slope indicates instability or an abrupt change in the dynamics of the system;
Determinism DET is calculated as:
where p(l) is the probability density of the diagonal-line distribution in the signal recurrence plot, calculated as:
where $l_{\min}$ is the initial diagonal length used for the statistics, satisfying $2 \le l_{\min} \le N_L - 1$, and P denotes the probability computation. Because signal lengths differ, the longest diagonal LL is normalized and calculated as:
where $l_{\max}$ is the counterpart of $l_{\min}$: since $l_{\min}$ is the initial diagonal length used for the statistics, $l_{\max}$ is the maximum diagonal length found, and LL is this maximum after normalization;
Entropy ENTR describes the complexity of the signal recurrence plot: the higher the complexity, the larger the entropy. If all line segments in the recurrence plot had the same length, the entropy would be 0. According to the speech entropy formula:
In step (3), converting the probability density of the diagonal-line distribution into the diagonal feature function LLF in the frequency domain, and converting the entropy corresponding to the region under each diagonal into the frequency-domain entropy PENTR, specifically comprises:
The probability density of the diagonal-line distribution in the recurrence plot is converted into the diagonal feature function LLF in the frequency domain, calculated as:
where β is a correction factor that prevents the result from exceeding the frequency-domain computation range and is chosen according to the actual situation;
The entropy corresponding to the region under each diagonal is converted into the frequency-domain entropy PENTR, calculated as:
Step (5) specifically comprises:
(5.1) Let rp(x, y) denote the RP recurrence plot matrix of size N × N pixels, where x = 0, 1, 2, ..., N − 1 and y = 0, 1, 2, ..., N − 1;
(5.2) Fast Fourier transform:
Let RP(u, v) denote the two-dimensional discrete Fourier transform of rp(x, y), calculated as:
where u = 0, 1, 2, ..., $N_L$ − 1 and v = 0, 1, 2, ..., $N_L$ − 1;
(5.3) Filtering with specific templates:
For the recurrence plot RP(u, v) after the fast Fourier transform, the binarization threshold is determined adaptively and the image is binarized;
After binarization, a first filtering pass is applied: a first specific structuring element is chosen and an opening operation is performed;
A second filtering pass then blurs the image to remove some detail; finally, a second specific structuring element is chosen and a closing operation is performed to remove the sparse components that are not of interest. The resulting image matrix is passed to the classifier for feature discrimination.
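The image-processing chain of step (5) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function names, the global-mean threshold (standing in for the unspecified adaptive method), and the 3 × 3 square structuring element are all hypothetical, the DFT is the standard unnormalized form, and the intermediate blurring pass is not sketched.

```python
import cmath


def dft2_magnitude(img):
    """Magnitude of the 2-D DFT of a square image (naive O(N^4); small N only)."""
    n = len(img)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            acc = 0j
            for x in range(n):
                for y in range(n):
                    phase = -2j * cmath.pi * (u * x + v * y) / n
                    acc += img[x][y] * cmath.exp(phase)
            out[u][v] = abs(acc)
    return out


def binarize(img):
    """Threshold at the global mean (assumption: the source only says 'adaptive')."""
    flat = [v for row in img for v in row]
    thr = sum(flat) / len(flat)
    return [[1 if v > thr else 0 for v in row] for row in img]


def _morph(img, radius, want_all):
    """Square-window erosion (want_all=True) or dilation (want_all=False).

    Windows are clipped at the image border, so pixels outside the image
    are simply ignored.
    """
    n_rows, n_cols = len(img), len(img[0])
    out = [[0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            window = [
                img[rr][cc]
                for rr in range(max(0, r - radius), min(n_rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(n_cols, c + radius + 1))
            ]
            out[r][c] = int(all(window)) if want_all else int(any(window))
    return out


def opening(img, radius=1):
    """Erosion followed by dilation: removes isolated points."""
    return _morph(_morph(img, radius, True), radius, False)


def closing(img, radius=1):
    """Dilation followed by erosion: fills small holes."""
    return _morph(_morph(img, radius, False), radius, True)
```

A typical call order would be `closing(opening(binarize(dft2_magnitude(rp))))`, where `rp` is a small recurrence matrix given as a list of lists.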
Step (6) specifically comprises:
(6.1) Computing distances:
For a given test object, its distance to every object in the training set is computed, usually the Euclidean distance $d_1(x, y)$ and the Manhattan distance $d_2(x, y)$;
(6.2) Finding neighbors: the training objects are sorted by increasing distance, and the k nearest are selected as the neighbors of the test object;
(6.3) Determining how often each class occurs among the first k points;
(6.4) Returning the class that occurs most often among the first k points as the predicted class of the test data.
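Steps (6.1) to (6.4) describe a k-nearest-neighbor classifier. A minimal sketch, assuming feature vectors are plain lists of numbers; the function names, the default k = 3, and the class labels in the usage note are illustrative and not taken from the source.

```python
import math
from collections import Counter


def euclidean(a, b):
    """d1(x, y): straight-line distance between two feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def manhattan(a, b):
    """d2(x, y): sum of absolute coordinate differences."""
    return sum(abs(p - q) for p, q in zip(a, b))


def knn_predict(train_x, train_y, query, k=3, distance=euclidean):
    """Return the majority label among the k training points nearest to query."""
    # (6.1)-(6.2): rank training objects by increasing distance to the query.
    ranked = sorted(zip(train_x, train_y), key=lambda pair: distance(pair[0], query))
    # (6.3): count how often each class occurs among the k nearest.
    votes = Counter(label for _, label in ranked[:k])
    # (6.4): the most frequent class is the prediction.
    return votes.most_common(1)[0][0]
```

For example, with three training vectors labelled "normal" near the origin and three labelled "nasal_emission" near (5, 5), a query at (0.2, 0.1) is assigned "normal" under either distance.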
In summary, by adopting the above technical solution, the present invention has the following beneficial effects:
1. The method provided by the invention can detect nasal air emission automatically from the acquired cleft palate speech signal, extracting signal features with speech signal processing methods. Compared with the prior art, its detection results are objective and accurate, and it achieves a higher degree of automatic measurement;
2. The method extracts recurrence trend analysis features from the recurrence plot. These features better reflect the subtle differences in the speech data caused by changes in the phonation mechanism. The differences are transformed into the frequency domain for further processing, and the corresponding image matrix features are obtained by filtering with specific templates. The feature extraction is based on the human auditory system and improves computation speed while highlighting the subtle differences between nasal air emission speech and normal speech.
3. In addition to extracting recurrence matrix features from the recurrence plot in the frequency domain, the method chooses suitable specific templates for further processing according to the characteristics of the signal; the resulting features better capture the subtle differences between speech samples.
4. The method is well matched to the characteristics of speech signals: recurrence plot analysis is an analysis method based on nonlinear dynamics, speech is a nonlinear signal, and a recurrence plot therefore characterizes it well. Moreover, voiced speech and airflow noise differ markedly in their nonlinear dynamics, so the feature difference between the two can be characterized effectively.
Description of the drawings
Fig. 1 is a block diagram of the automatic detection method for nasal air emission in cleft palate speech provided by an embodiment of the present invention.
Fig. 2 shows the signal recurrence plots of normal speech and of speech with nasal air emission.
Fig. 3 is a schematic diagram of the frequency-domain entropy PENTR after conversion.
Fig. 4 shows enlarged excerpts of the individual parts of the PENTR diagram after conversion.
Fig. 5 is a schematic of the observed recurrence plots of normal speech and of speech with nasal air emission.
Fig. 6 is a schematic diagram of the block partition of the recurrence plot.
Fig. 7 is a flow chart of the image analysis processing applied to the recurrence plot.
Fig. 8 shows the transformed recurrence plots of normal speech and of speech with nasal air emission.
Fig. 9 is a flow chart of the filtering applied to the transformed image.
Fig. 10 shows normal speech and speech with nasal air emission after binarization.
Fig. 11 shows normal speech and speech with nasal air emission after the first filtering pass.
Fig. 12 shows normal speech and speech with nasal air emission after the second filtering pass.
Detailed description of the embodiments
To help those skilled in the art better understand the technical solution of the present invention, it is described clearly and completely below with reference to the accompanying drawings. All other similar embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments in this application, without creative effort, fall within the scope of protection of this application.
It should be noted that the speech data used in this embodiment was recorded according to a Mandarin articulation test table. The data was first assessed perceptually by professional speech therapists, and the nasal air emission data was annotated manually in detail. Based on the articulation characteristics of nasal air emission, word samples containing consonants such as /c/, /ch/, /d/, /f/, /k/, /g/, /j/, /s/, /sh/, /t/, /x/, /p/, /q/, /z/, and /zh/ were selected.
It should further be noted that the signal feature processing, feature extraction, and signal recognition and classification methods proposed by the present invention and its embodiments study and improve only the processing and recognition of the signal itself. Although the method targets nasal air emission signals and its automatic classification result can serve as a reference for assessment, in clinical or medical practice the result is only an auxiliary assessment; the specific treatment still depends on the clinical experience of the physician and the treatment the physician prescribes.
As shown in Fig. 1, the method of this embodiment for recognizing nasal air emission in cleft palate speech based on recurrence plot analysis is as follows:
(1) Speech signal preprocessing: the input speech signal is downsampled and amplitude-normalized:
(1.1) Downsampling: the speech signal is downsampled proportionally from a sampling frequency of 44100 Hz to 15000 Hz or 8000 Hz before processing;
(1.2) Normalization: because recording equipment differs, acquisition conditions are not uniform when the speech is collected. To avoid problems such as inconsistent volume and noise, the maximum absolute amplitude is found for each speech recording and every value of the speech signal is divided by this maximum, calculated as:
$$s = \frac{xx}{\max\lvert xx \rvert} \tag{1}$$
where xx is the input speech signal and s is the sequence obtained after processing.
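The preprocessing of step (1) can be sketched as below. Formula (1) is implemented directly; the resampler is a deliberately crude linear interpolation without an anti-aliasing filter, chosen only to keep the example dependency-free, and is an assumption rather than the resampling method of the embodiment. Both function names are illustrative.

```python
def normalize_amplitude(samples):
    """Formula (1): divide every sample by the largest absolute amplitude."""
    peak = max(abs(v) for v in samples)
    if peak == 0:
        # A silent (all-zero) signal has no peak to divide by; return it unchanged.
        return list(samples)
    return [v / peak for v in samples]


def downsample_linear(samples, src_rate, dst_rate):
    """Resample by linear interpolation (assumption: no anti-aliasing filter)."""
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for k in range(n_out):
        pos = k * src_rate / dst_rate      # fractional index into the input
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]
        out.append(samples[i] * (1 - frac) + nxt * frac)
    return out
```

For a recording `x` sampled at 44100 Hz, `normalize_amplitude(downsample_linear(x, 44100, 8000))` yields a sequence at 8000 Hz whose peak magnitude is 1. A production pipeline would normally low-pass filter before reducing the sampling rate.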
(2) Computing the recurrence matrix of the speech signal. A chaotic system exhibits recurrence, formed by unstable trajectories, as one of its essential characteristics. Because a chaotic attractor occupies a bounded volume in its phase space, the unstable trajectories that make up the attractor repeatedly approach one another and then diverge again within the bounded attractor space. The recurrence plot method applies principles of system dynamics: the time series of the system is extracted and its recurrent behavior is reproduced. With a suitable embedding dimension m and delay time τ, the one-dimensional nonlinear time series
$$\{s(i),\ i = 1, 2, \ldots\} \tag{2}$$
can be reconstructed into the vectors
$$S_i = [s(i),\ s(i+\tau),\ \ldots,\ s(i+(m-1)\tau)] \tag{3}$$
The m-dimensional phase-space trajectory of the system is formed by the time-indexed sequence of vectors
$$\{S_i,\ i = 1, 2, \ldots, N\} \tag{4}$$
These phase-space points are then used as rows and columns to form the N × N recurrence matrix. Each node $R_{i,j}$ of the plot is determined by the distance between the corresponding row and column vectors:
$$R_{i,j} = \theta(\varepsilon - \lVert S_i - S_j \rVert), \quad i, j = 1, 2, \ldots, N \tag{5}$$
In the formula above, ε is a threshold constant set from prior experience, representing the critical distance; $\lVert \cdot \rVert$ denotes the Euclidean norm of a vector; and θ(x) is the Heaviside step function. If $R_{i,j} = 1$, the point (i, j) of the recurrence plot is drawn as a black dot; likewise, if $R_{i,j} = 0$, it is drawn as a white dot. The recurrence plot thus maps the trajectory relationships of an m-dimensional phase space onto a two-dimensional plot. RP denotes the resulting two-dimensional matrix, the recurrence matrix. Because the speech signal is one-dimensional, m = 1. Fig. 2 shows the recurrence plots of normal speech /sha/ and of /sha/ with nasal air emission, respectively.
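Formulas (2) to (5) can be sketched as follows. The function names and the default threshold `eps=0.1` are illustrative assumptions, and the comparison `<=` assumes the convention θ(0) = 1, which the source does not state.

```python
import math


def embed(series, dim, delay):
    """Formula (3): S_i = [s(i), s(i + tau), ..., s(i + (m - 1) * tau)]."""
    n_vectors = len(series) - (dim - 1) * delay
    return [[series[i + k * delay] for k in range(dim)] for i in range(n_vectors)]


def recurrence_matrix(series, dim=1, delay=1, eps=0.1):
    """Formula (5): R[i][j] = 1 when the distance between S_i and S_j is within eps."""
    vectors = embed(series, dim, delay)
    n = len(vectors)
    rp = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            if math.dist(vectors[i], vectors[j]) <= eps:
                # The Euclidean norm is symmetric, so fill both triangles at once.
                rp[i][j] = rp[j][i] = 1
    return rp
```

With `dim=1`, as in this embodiment, each phase-space vector is a single sample and the distance reduces to the absolute difference between two samples. The main diagonal is always 1 because every point recurs with itself.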
(3) Trend analysis of the recurrence plot. The recurrence plot is a two-dimensional map of the phase-space trajectories of a dynamical system. To quantify the recurrence phenomena it displays, five quantification parameters from recurrence quantification analysis (RQA) are used, namely determinism DET, recurrence rate $RR_L$, diagonal feature LLF, frequency-domain entropy ENTR, and trend RT. Different RQA parameters describe different dynamical behaviors of the system. For example, determinism describes the degree to which the dynamical trajectory recurs periodically, so it is the opposite of randomness: a larger value indicates stronger determinism, and a smaller value stronger randomness.
Because the recurrence plot is symmetric about the main diagonal, the lower-right part is divided at equal spacing along lines parallel to the main diagonal. $RR_L$ denotes the recurrence rate in region L of the recurrence plot, calculated as:
where $N_L = N \times N$;
The distribution of points in the recurrence plot reflects the internal structure of the speech and non-speech segments of the signal. Because their dynamics differ, the proportion of dense points in the recurrence plot is smaller for speech segments than for non-speech segments. The recurrence rate $RR_L$ is therefore used as the main discriminating parameter. Perceptually, nasal air emission is heard as airflow noise replacing normal phonation, so speech with nasal air emission contains a stretch of airflow noise in the consonant stage, which distinguishes it from voiced segments.
The trend RT characterizes the rate of change of the recurrence rate, calculated as:
where the mean recurrence rate is the average of $\{RR_L,\ L = 1, 2, \ldots, K\}$, so RT is the linear regression slope of the sequence $\{RR_L,\ L = 1, 2, \ldots, K\}$ with respect to $\{L = 1, 2, \ldots, K\}$. A smaller slope means the system is more stable, while a larger slope indicates instability or an abrupt change in the dynamics of the system.
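The trend RT is described as a regression slope, which can be sketched as below. The per-diagonal recurrence rate used here is the conventional RQA definition (fraction of recurrence points on the diagonal at offset L) and is an assumption; the embodiment's own $RR_L$ formula may differ. Function names are illustrative.

```python
def diagonal_recurrence_rates(rp, max_offset):
    """RR_L for L = 1..max_offset: share of 1s on the diagonal L steps off the main one."""
    n = len(rp)
    rates = []
    for offset in range(1, max_offset + 1):
        points = [rp[i][i + offset] for i in range(n - offset)]
        rates.append(sum(points) / len(points))
    return rates


def trend(rates):
    """Least-squares slope of RR_L against L = 1..K (needs K >= 2)."""
    k = len(rates)
    offsets = range(1, k + 1)
    mean_l = sum(offsets) / k
    mean_rr = sum(rates) / k
    num = sum((l - mean_l) * (rr - mean_rr) for l, rr in zip(offsets, rates))
    den = sum((l - mean_l) ** 2 for l in offsets)
    return num / den
```

`max_offset` must be smaller than the side length of `rp`. A slope near zero corresponds to the stable case described above, and a slope of large magnitude to a drifting or abruptly changing system.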
Determinism DET is calculated as:
where p(l) is the probability density of the diagonal-line distribution in the signal recurrence plot, calculated as:
where $l_{\min}$ is the initial diagonal length used for the statistics, satisfying $2 \le l_{\min} \le N_L - 1$, and P denotes the probability computation. Because the signals analyzed in this embodiment differ in length, the longest diagonal LL is normalized and calculated as:
where $l_{\max}$ is the counterpart of $l_{\min}$: since $l_{\min}$ is the initial diagonal length used for the statistics, $l_{\max}$ is the maximum diagonal length found, and LL is this maximum after normalization.
This embodiment modifies the calculation of p(l) because the commonly used algorithm was designed around the characteristics of experimental signals; here the algorithm is adapted to the characteristics of speech signals. Since the recurrence plot of a speech signal exhibits a degree of symmetry, the probability density is computed using the diagonal as the main dividing line.
Furthermore, trend analysis of recurrence plots has so far been carried out only in the time domain. This embodiment finds that recurrence analysis in the frequency domain is also effective, as analyzed in detail below.
The probability density of the diagonal-line distribution in the recurrence plot can be converted into the diagonal feature function LLF in the frequency domain, calculated as:
where β is a correction factor that prevents the result from exceeding the frequency-domain computation range and is chosen according to the actual situation.
Entropy ENTR describes the complexity of the signal recurrence plot: the higher the complexity, the larger the entropy. If all line segments in the recurrence plot had the same length, the entropy would be 0. According to the speech entropy formula:
It is then mapped onto the frequency domain: the entropy corresponding to the region under each diagonal is converted into the frequency-domain entropy PENTR, calculated as:
Fig. 3 shows the overall frequency-domain entropy PENTR, which appears there as a set of fine data points; enlarged excerpts of the individual parts of Fig. 3 are shown in Fig. 4. The frequency-domain entropy PENTR thus characterizes the density of points in the frequency domain. Finally, the five improved parameters, determinism DET, recurrence rate $RR_L$, diagonal feature LLF, frequency-domain entropy ENTR, and trend RT, are used as the signal features of nasal air emission speech.
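For orientation, the time-domain diagonal measures discussed in this step can be sketched with the conventional RQA definitions. This is not the embodiment's modified p(l) or its frequency-domain LLF and PENTR, whose formulas are not reproduced in this text; dividing the longest line by the matrix side length is likewise an assumed normalization, and the function names are illustrative.

```python
import math
from collections import Counter


def diagonal_line_lengths(rp):
    """Lengths of runs of consecutive 1s on each diagonal above the main one."""
    n = len(rp)
    lengths = []
    for offset in range(1, n):
        run = 0
        for i in range(n - offset):
            if rp[i][i + offset]:
                run += 1
            elif run:
                lengths.append(run)
                run = 0
        if run:
            lengths.append(run)
    return lengths


def rqa_measures(rp, l_min=2):
    """Conventional DET, normalized longest diagonal LL, and Shannon entropy ENTR."""
    hist = Counter(diagonal_line_lengths(rp))
    total_points = sum(length * count for length, count in hist.items())
    lines = {length: count for length, count in hist.items() if length >= l_min}
    line_points = sum(length * count for length, count in lines.items())
    n_lines = sum(lines.values())
    # p(l): relative frequency of each diagonal length of at least l_min.
    probs = [count / n_lines for count in lines.values()] if n_lines else []
    return {
        "DET": line_points / total_points if total_points else 0.0,
        "LL": max(lines, default=0) / len(rp),   # assumed normalization by N
        "ENTR": sum(-p * math.log(p) for p in probs),
    }
```

When every diagonal line has the same length, `probs` holds a single value of 1 and `ENTR` is 0, matching the property stated above.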
(4) Region division is applied directly to the recurrence plot. Comparing the recurrence plots of speech with and without rhinorrhea gas shows that the recurrence plot of normal speech divides mainly into four regions, is stable, and has lower entropy, as shown in the left panel of Fig. 5. The recurrence plot of speech with rhinorrhea gas divides mainly into eight regions, has higher entropy than normal speech, and shows dense point distributions in its four corner regions, as shown in the right panel of Fig. 5.
The signal recurrence plot is processed as a matrix: its row and column dimensions are obtained and a suitable blocking coefficient is chosen according to statistical properties. After the recurrence plot is divided into the corresponding regions, the minimum-region submatrices at its four corners are extracted. The blocking coefficient m divides the entire recurrence plot into m*m parts: all points of the matrix are summed and, assuming a uniform distribution, divided so that each region holds 1000 points. The calculation formula is:
The relationship between the block matrices and the original recurrence matrix is therefore expressed as follows:
where RP is the recurrence matrix.
The recurrence submatrices $RP_a$ at the four corners of the recurrence plot (upper left, lower left, upper right, lower right) are extracted, where a is the label of each submatrix after blocking, as shown in Fig. 6. These four corner minimum-region matrices serve as feature parameters. In speech with rhinorrhea gas, the points of the minimum-region matrices are more densely distributed than in normal speech, so they can be used as an effective signal feature for judging whether rhinorrhea gas is present.
Each minimum region is then processed further: the region mean-square value is computed to model the weight of densely distributed regions when the one-dimensional signal is mapped onto a two-dimensional image. The calculation formula is:
Here the matrix point $R_{i,j} \in RP_a$, and $N_a$ is the total number of points in that minimum-region matrix.
The formula above gives the weights $E_a$ of the four minimum-region matrices, which are used as feature parameters of rhinorrhea gas speech.
(5) Image analysis is applied to the recurrence plot; the processing flow is shown in Fig. 7.
(5.1) The RP recurrence plot matrix is stored in uint8 format and defaults to jpg format when read into MATLAB. Let rp(x, y) denote a digital image of N × N pixels, where x = 0, 1, 2, ..., N-1 and y = 0, 1, 2, ..., N-1.
(5.2) Fast Fourier Transform (FFT):
Let RP(u, v) denote the two-dimensional discrete Fourier transform of rp(x, y), calculated as follows:
In the formula, u = 0, 1, 2, ..., $N_L - 1$ and v = 0, 1, 2, ..., $N_L - 1$. The frequency variables u and v correspond to the spatial variables x and y, and the exponential term can be expanded into sine and cosine functions.
The transformed image is shown in Fig. 8. Observation shows that for speech with rhinorrhea gas the points in the central part are denser and a cross-shaped blank area appears. In aerodynamic terms, speech with rhinorrhea gas exhibits the difference between a speech signal and a non-speech signal, namely that the sound of air flowing through the nasal cavity replaces the articulated sound. The frequency domain thus confirms the result of the time-domain recurrence analysis and shows the difference more clearly.
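A minimal Python sketch of this transform step, assuming NumPy's `fft2` as the two-dimensional DFT. The centring with `fftshift` and the log-magnitude scaling are display conventions assumed here, not details stated in the text, and the function name is hypothetical.

```python
import numpy as np

def recurrence_spectrum(rp):
    """Centred log-magnitude of the 2D DFT of a recurrence plot matrix."""
    spectrum = np.fft.fft2(rp.astype(float))   # RP(u, v)
    centered = np.fft.fftshift(spectrum)       # move the zero frequency to the centre
    return np.log1p(np.abs(centered))          # compress the dynamic range for viewing
```

With the zero frequency at the centre, dense low-frequency content appears in the middle of the image, which is where the text reports the difference between the two classes.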
(5.3) specific template filter process:
As shown in Fig. 9, the recurrence plot RP(u, v) obtained from the Fast Fourier Transform is converted in format, saved as a picture, and read into MATLAB. Because the points of the recurrence plot are dense in some places and sparse in others, an adaptive method is used to determine the binarization threshold, and the image is binarized. The result is shown in Fig. 10.
The binarized image is then filtered for the first time. This first filtering is median filtering, and a specific structuring element is chosen for the opening operation.
Given the point characteristics of the recurrence plot, choosing B as the structuring element effectively merges densely distributed or clustered points into a single region, which makes the features easier to distinguish. The result is shown in Fig. 11.
A second filtering follows. This second filtering is Gaussian filtering, which blurs the image and removes some details. Finally, a specific structuring element (a disk of radius 15) is chosen for the closing operation, which removes non-dense components that are not of interest. The final result is shown in Fig. 12.
The processing results clearly show that speech with rhinorrhea gas has, in the central part of the image, a densely distributed point region that differs from normal speech and becomes more prominent after the specific filtering.
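The binarization and two-stage filtering can be sketched in NumPy as below. Several details are assumptions rather than the embodiment's settings: the adaptive threshold is taken as the image mean, the median filter uses a 3 × 3 window, the Gaussian width `sigma` is arbitrary, and the first structuring element B (not reproduced in the text) is passed in as `open_se`. Only symmetric structuring elements are supported, and all function names are hypothetical.

```python
import numpy as np

def disk(radius):
    """Boolean disk-shaped (symmetric) structuring element."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x * x + y * y <= radius * radius

def neighbor_count(img, se):
    """For each pixel, count the set pixels covered by the structuring element."""
    r = se.shape[0] // 2
    h, w = img.shape
    padded = np.pad(img.astype(int), r)            # zeros outside the image
    total = np.zeros((h, w), dtype=int)
    for dy, dx in zip(*np.nonzero(se)):
        total += padded[dy:dy + h, dx:dx + w]
    return total

def dilate(img, se):
    return neighbor_count(img, se) > 0

def erode(img, se):
    # Dual of dilation; pixels outside the image are treated as set.
    return ~dilate(~img, se)

def median3(img):
    """3x3 median of a binary image, i.e. a majority vote over 9 pixels."""
    return neighbor_count(img, np.ones((3, 3), dtype=bool)) >= 5

def gaussian_blur(img, sigma):
    """Separable Gaussian blur; each image side must exceed the kernel length."""
    radius = int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()
    smooth = lambda v: np.convolve(v, kernel, mode="same")
    rows = np.apply_along_axis(smooth, 1, img.astype(float))
    return np.apply_along_axis(smooth, 0, rows)

def filter_image(img, open_se, close_radius=15, sigma=1.0):
    binary = img > img.mean()                       # assumed adaptive threshold
    first = median3(binary)                         # first filtering: median
    first = dilate(erode(first, open_se), open_se)  # opening
    second = gaussian_blur(first, sigma) > 0.5      # second filtering: Gaussian
    close_se = disk(close_radius)
    return erode(dilate(second, close_se), close_se)  # closing
```

A call such as `filter_image(spectrum_image, np.ones((3, 3), dtype=bool))` uses a 3 × 3 square purely as a placeholder for B. The shift-and-accumulate morphology is written for clarity and is only practical for small images.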
(6) the characteristic parameter matrix of rhinorrhea gas voice signal is constructed:
The single-row parameters of steps (3) and (4) are concatenated: the five recurrence trend analysis parameters computed in step (3) and the $E_a$ values of the four minimum-region matrices from step (4) are joined into a single-row parameter matrix. For the processing in step (5), the uint8 matrix is converted into a jpg image before processing and converted back into matrix format afterwards. A uniform cropping standard is then chosen: after all speech data have been analyzed, the minimum point count among them is used as the cropping size. Let this size be M, so that an M*M matrix is obtained. The first row holds M values: the first 9 are the trend analysis parameters and $E_a$, and the remaining M-9 values are padded with 0. The result is a feature matrix of M+1 rows × M columns, which is passed to the classifier for recognition.
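The assembly of the (M+1) × M feature matrix can be sketched as follows. How the image matrix is cropped to M × M is not specified in the text; the sketch assumes the top-left M × M block, and the function and argument names are hypothetical.

```python
import numpy as np

def build_feature_matrix(trend_params, corner_weights, image_matrix, M):
    """Stack a zero-padded parameter row on top of the cropped image matrix."""
    head = np.concatenate([np.asarray(trend_params, dtype=float),
                           np.asarray(corner_weights, dtype=float)])  # 5 + 4 = 9 values
    first_row = np.zeros(M)
    first_row[:head.size] = head                  # remaining M - 9 entries stay 0
    cropped = np.asarray(image_matrix, dtype=float)[:M, :M]  # assumed top-left crop
    return np.vstack([first_row, cropped])        # shape (M + 1, M)
```

M must be at least 9 and no larger than the side of the smallest processed image, which is what taking the minimum over all speech data guarantees.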
(7) Classify the signal using a classifier
The KNN classifier is implemented according to the KNN algorithm, also known as the k-nearest neighbor classification algorithm. KNN classifies by measuring the distance between feature values. If most of the k most similar samples of a given sample (that is, its nearest neighbors in feature space) belong to a certain class, the sample is assigned to that class as well. k is usually an integer not greater than 20. In the KNN algorithm, the selected neighbors are all objects that have already been correctly classified. The method decides the class of the sample to be classified based only on the classes of the nearest one or several samples. The implementation steps are as follows:
(7.1) Calculate the distance
For a given test object, compute its distance to each object in the training set, usually the Euclidean distance $d_1(x, y)$ and the Manhattan distance $d_2(x, y)$.
(7.2) Find the neighboring objects: sort by increasing distance and select the k nearest training objects as the neighbors of the test object.
(7.3) Determine the frequency of each class among the first k points;
(7.4) Return the most frequent class among the first k points as the predicted class of the test data.
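Steps (7.1) to (7.4) can be sketched in Python as below. The sketch assumes each sample has been flattened into a one-dimensional feature vector; the function name and the default `k=3` are hypothetical.

```python
import numpy as np
from collections import Counter

def knn_predict(train_x, train_y, test_x, k=3, metric="euclidean"):
    """Predict the class of one test vector by majority vote of its k nearest neighbors."""
    diff = np.asarray(train_x, dtype=float) - np.asarray(test_x, dtype=float)
    if metric == "euclidean":
        dist = np.sqrt((diff ** 2).sum(axis=1))    # d1(x, y)
    elif metric == "manhattan":
        dist = np.abs(diff).sum(axis=1)            # d2(x, y)
    else:
        raise ValueError("metric must be 'euclidean' or 'manhattan'")
    nearest = np.argsort(dist)[:k]                 # (7.2) k closest training objects
    votes = Counter(train_y[i] for i in nearest)   # (7.3) class frequencies
    return votes.most_common(1)[0][0]              # (7.4) most frequent class
```

With the class labels used in the experiments (1 for speech with rhinorrhea gas, 2 for speech without), `train_y` would simply be a list of 1s and 2s.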
(8) The processing steps above accomplish the automatic classification and recognition of rhinorrhea gas speech in this embodiment. The method was further verified, with the following experimental results:
Following the principle of consistent variables, the notation matches that used above: number of sampling points N, embedding dimension m, delay time τ, and critical distance ε.
Class 1 is speech with rhinorrhea gas, and class 2 is speech without rhinorrhea gas.
The automatic recognition accuracy for rhinorrhea gas in cleft palate speech reaches up to 84.63%. The influence on recognition accuracy is analyzed for three factors: the number of down-sampled points, the delay time and the critical distance. These three factors are decisive for the recurrence plot of the cleft palate speech signal: the number of down-sampled points reflects the resolution of the recurrence plot, while the delay time and the critical distance reflect its internal structure, and together these two parameters generate a recurrence plot that follows a particular pattern of motion. The recurrence plot is the basis of the entire algorithm here, so the three main factors are discussed in turn, and the best value found for each is applied when studying the next.
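How these factors enter the recurrence plot can be sketched in Python, following equations (3) to (5) of claim 1. The sketch assumes the kernel θ is a step function that returns 1 when its argument is non-negative; it builds the full pairwise distance array, so it is only suitable for short signals. The function name is hypothetical.

```python
import numpy as np

def recurrence_matrix(s, m, tau, eps):
    """Binary recurrence matrix from a 1D signal via delay embedding."""
    s = np.asarray(s, dtype=float)
    n_vec = len(s) - (m - 1) * tau                 # number of embedded vectors
    if n_vec <= 0:
        raise ValueError("signal too short for the chosen m and tau")
    # Row i is S_i = [s(i), s(i + tau), ..., s(i + (m - 1) tau)]
    vectors = np.stack([s[j * tau: j * tau + n_vec] for j in range(m)], axis=1)
    # Pairwise Euclidean distances ||S_i - S_j||
    dist = np.linalg.norm(vectors[:, None, :] - vectors[None, :, :], axis=2)
    return (dist <= eps).astype(np.uint8)          # R_ij = theta(eps - ||S_i - S_j||)
```

The signal length fixes the size of the matrix, τ and m fix which samples form each phase-space point, and ε decides how close two points must be to count as a recurrence.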
(1) Influence of the number of down-sampled points
Let the number of down-sampled points be N (points), the delay time τ (ms) and the critical distance ε (unit). The delay time is set to τ = 3 and the critical distance to ε = 5.
Table 1. Influence of the number of down-sampled points on the recognition of rhinorrhea gas in cleft palate speech
(2) Influence of the delay time
The delay time is varied for the two cases N = 8000 and N = 15000, with the critical distance set to ε = 5.
Table 2. Influence of the delay time on the recognition of rhinorrhea gas in cleft palate speech
(3) Influence of the critical distance
The two cases N = 8000 and N = 15000 are considered, with the delay time set to τ = 3.
Table 3. Influence of the critical distance on the recognition of rhinorrhea gas in cleft palate speech
(4) Influence of the speech unit
In the discussion of the three factors above, only one recurrence plot matrix was built per speech sample, producing five recurrence parameters and the feature parameters of four minimum-region matrices. Because the information carried by a single recurrence plot matrix is limited, it is hypothesized that the speech unit also affects the experimental results. The speech signal is therefore divided into frames using a Hamming window, with a frame length of 200 ms and a frame shift of 40 ms. The number of frames chosen as feature parameters is fn. The number of down-sampled points is set to N = 30000: framing reduces the amount of data to compute, and Table 1 shows that more data points give a higher rhinorrhea gas recognition accuracy. Based on Tables 2 and 3, delay times τ = 1, 3 and critical distances ε = 5, 7 are tested, giving the following results:
Table 4. Influence of the number of speech unit frames on the recognition of rhinorrhea gas in cleft palate speech
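The framing described here can be sketched as follows, assuming the sampling rate `fs` of the down-sampled signal is known (it is not stated in the text) and that trailing samples that do not fill a whole frame are dropped. The function name is hypothetical.

```python
import numpy as np

def frame_signal(s, fs, frame_ms=200, shift_ms=40):
    """Split a signal into overlapping Hamming-windowed frames."""
    s = np.asarray(s, dtype=float)
    frame_len = int(round(fs * frame_ms / 1000))   # samples per frame
    shift = int(round(fs * shift_ms / 1000))       # samples between frame starts
    if len(s) < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(s) - frame_len) // shift
    window = np.hamming(frame_len)
    return np.stack([s[i * shift: i * shift + frame_len] * window
                     for i in range(n_frames)])
```

Each row of the result can then be turned into its own recurrence plot, and fn of these rows are selected as the speech units.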

Claims (7)

1. A method for recognizing rhinorrhea gas in cleft palate speech based on recurrence plot analysis, comprising the following steps:
(1) speech signal preprocessing: the input speech signal is down-sampled, normalized, framed, pre-emphasized and amplitude-normalized;
(2) the recurrence plot matrix of the preprocessed speech signal is computed: the time series of the system is extracted and the recurrent signal is reconstructed; with a suitable embedding dimension m and delay time τ selected, the one-dimensional nonlinear time series:
$\{s(i),\ i = 1, 2, \ldots\}$ (2)
is reconstructed into the vectors:
$S_i = [s(i),\ s(i+\tau),\ \ldots,\ s(i+(m-1)\tau)]$ (3),
The m-dimensional phase-space trajectory of the system consists of the time-indexed vector sequence:
$\{S_i,\ i = 1, 2, \ldots, N\}$ (4)
The points in this phase space are then used as rows and columns to form the N × N recurrence plot matrix;
Each node $R_{i,j}$ in the plot is determined by the distance between the corresponding row and column vector points:
$R_{i,j} = \theta(\varepsilon - \|S_i - S_j\|),\ i, j = 1, 2, \ldots, N$ (5),
where ε is a threshold constant set according to prior experience and represents the critical distance; the symbol $\|\cdot\|$ denotes the Euclidean norm of a vector; θ(x) is the kernel function; if the value of $R_{i,j}$ is 1, the position (i, j) in the recurrence plot is drawn as a black point, and if the value of $R_{i,j}$ is 0, the position (i, j) is drawn as a white point, which yields the two-dimensional recurrence matrix RP;
(3) trend analysis is performed on the recurrence plot using five recurrence quantification analysis parameters, namely determinism DET, recurrence rate RRL, longest diagonal LL, entropy ENTR and trend RT; the probability density of the diagonal-line distribution in the recurrence plot is converted into the diagonal feature function LLF in the frequency domain, and the entropy of the region under each diagonal is converted into the frequency-domain entropy PENTR;
(4) region division is applied directly to the recurrence plot: the signal recurrence plot is processed as a matrix, its row and column dimensions are obtained, and a suitable blocking coefficient is chosen according to statistical properties; after the recurrence plot is divided into the corresponding regions, the minimum-region submatrices at its four corners are extracted as follows:
where RP is the recurrence matrix and m is the required blocking coefficient; the blocking coefficient m divides the entire recurrence plot into m*m parts, with all points of the matrix summed and divided so that each region holds 1000 points; the calculation formula is:
The recurrence submatrices $RP_a$ at the four corners of the recurrence plot (upper left, lower left, upper right, lower right) are extracted, where a is the label of each submatrix after blocking; these four corner minimum-region matrices serve as feature parameters; each minimum region is then processed further by computing the region mean-square value, which models the weight of densely distributed regions when the one-dimensional signal is mapped onto a two-dimensional image; the calculation formula is:
Here the matrix point $R_{i,j} \in RP_a$, and $N_a$ is the total number of points in that minimum-region matrix;
(5) image processing is applied to the recurrence plot: the format-converted image matrix undergoes binarization followed by two rounds of filtering with specific structuring templates, and the resulting matrix signal features are used for classification;
(6) a classifier is used to classify the signal, yielding the automatic recognition result.
2. The method for recognizing rhinorrhea gas in cleft palate speech based on recurrence plot analysis according to claim 1, characterized in that said step 3 specifically comprises the following steps:
(1.1) down-sampling: the speech signal, with a sampling frequency of 44100, is proportionally down-sampled to 15000 or 8000 data points;
(1.2) normalization: for each speech recording, the maximum absolute amplitude max is found, and every value in the speech signal is divided by this maximum; the corresponding calculation formula is:
$s = xx / \max|xx|$ (1)
where xx is the input speech signal and s is the processed sequence.
3. The method for recognizing rhinorrhea gas in cleft palate speech based on recurrence plot analysis according to claim 1, characterized in that the quantization parameters in said step 3 are specifically:
Because the recurrence plot is symmetric about the main diagonal, the lower-right part is divided into regions that are equidistant from and parallel to the main diagonal, and $RR_L$ denotes the recurrence rate in region L of the recurrence plot, calculated as follows:
The trend RT is characterized by the rate of change of the recurrence rate, calculated as follows:
where the averaged term is the mean of $\{RR_L,\ L = 1, 2, \ldots, K\}$, so RT is also the linear regression slope of the sequence $\{RR_L,\ L = 1, 2, \ldots, K\}$ with respect to $\{L = 1, 2, \ldots, K\}$; a smaller slope means the system is more stable, and a larger slope indicates instability of the system or an abrupt change in its dynamics;
The determinism DET is calculated as follows:
where p(l) denotes the probability density of the diagonal-line distribution in the signal recurrence plot, and p(l) is calculated as follows:
$l_{min}$ is the starting value for counting diagonal lengths and satisfies $2 \le l_{min} \le N_L - 1$; P denotes the probability calculation; because signal lengths differ, the longest diagonal LL is normalized and calculated as follows:
where $l_{max}$ corresponds to $l_{min}$: since $l_{min}$ is the starting value for counting diagonal lengths, $l_{max}$ is the maximum diagonal length among the counted lengths, and LL is the normalized diagonal maximum;
The entropy ENTR describes the complexity of the signal recurrence plot: the higher the complexity, the larger the entropy; if all line lengths in the recurrence plot have the same value, the entropy is 0; according to the speech entropy formula:
4. The method for recognizing rhinorrhea gas in cleft palate speech based on recurrence plot analysis according to claim 1, characterized in that, in said step 3, converting the probability density of the diagonal-line distribution in the recurrence plot into the diagonal feature function LLF in the frequency domain, and converting the entropy of the region under each diagonal into the frequency-domain entropy PENTR, is specifically:
The probability density of the diagonal-line distribution in the recurrence plot is converted into the diagonal feature function LLF in the frequency domain, calculated as follows:
where β is a correction factor that keeps the result within the frequency-domain calculation range and is selected according to the actual situation,
The entropy of the region under each diagonal is converted into the frequency-domain entropy PENTR, calculated as follows:
5. The method for recognizing rhinorrhea gas in cleft palate speech based on recurrence plot trend analysis according to claim 1, characterized in that said step 5 is specifically:
(5.1) let rp(x, y) denote the RP recurrence plot matrix of N × N pixels, where x = 0, 1, 2, ..., N-1 and y = 0, 1, 2, ..., N-1;
(5.2) Fast Fourier Transform (FFT):
Let RP(u, v) denote the two-dimensional discrete Fourier transform of rp(x, y), calculated as follows:
In the formula, u = 0, 1, 2, ..., $N_L - 1$ and v = 0, 1, 2, ..., $N_L - 1$;
(5.3) filtering with specific templates:
For the recurrence plot RP(u, v) obtained from the Fast Fourier Transform, the binarization threshold is determined with an adaptive method and the image is binarized;
The binarized image is filtered for the first time, and a first specific structuring element is chosen for the opening operation;
A second filtering then blurs the image and removes some details; finally, a second specific structuring element is chosen for the closing operation, which removes non-dense components that are not of interest; the resulting image matrix is passed to the classifier for feature discrimination.
6. The method for recognizing rhinorrhea gas in cleft palate speech based on recurrence plot analysis according to claim 5, characterized in that the first specific structuring element
7. The method for recognizing rhinorrhea gas in cleft palate speech based on recurrence plot analysis according to claim 1, characterized in that said step 6 is specifically:
(6.1) calculate the distance:
For a given test object, compute its distance to each object in the training set, usually the Euclidean distance $d_1(x, y)$ and the Manhattan distance $d_2(x, y)$,
(6.2) find the neighboring objects: sort by increasing distance and select the k nearest training objects as the neighbors of the test object;
(6.3) determine the frequency of each class among the first k points;
(6.4) return the most frequent class among the first k points as the predicted class of the test data.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811176054.8A CN109215633A (en) 2018-10-10 2018-10-10 The recognition methods of cleft palate speech rhinorrhea gas based on recurrence map analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811176054.8A CN109215633A (en) 2018-10-10 2018-10-10 The recognition methods of cleft palate speech rhinorrhea gas based on recurrence map analysis

Publications (1)

Publication Number Publication Date
CN109215633A true CN109215633A (en) 2019-01-15

Family

ID=64983054

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811176054.8A Pending CN109215633A (en) 2018-10-10 2018-10-10 The recognition methods of cleft palate speech rhinorrhea gas based on recurrence map analysis

Country Status (1)

Country Link
CN (1) CN109215633A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114358091A (en) * 2022-03-03 2022-04-15 中山大学 Pile damage identification method, equipment and medium based on convolutional neural network
CN114358091B (en) * 2022-03-03 2022-06-10 中山大学 Pile damage identification method, equipment and medium based on convolutional neural network

Similar Documents

Publication Publication Date Title
CN105976809B (en) Identification method and system based on speech and facial expression bimodal emotion fusion
CN109856517B (en) Method for distinguishing partial discharge detection data of extra-high voltage equipment
CN110633725B (en) Method and device for training classification model and classification method and device
JP5897107B2 (en) Detection of speech syllable / vowel / phoneme boundaries using auditory attention cues
CN109599120B (en) Abnormal mammal sound monitoring method based on large-scale farm plant
CN110033756B (en) Language identification method and device, electronic equipment and storage medium
WO2016155047A1 (en) Method of recognizing sound event in auditory scene having low signal-to-noise ratio
CN111724770B (en) Audio keyword identification method for generating confrontation network based on deep convolution
US6205422B1 (en) Morphological pure speech detection using valley percentage
CN117095694B (en) Bird song recognition method based on tag hierarchical structure attribute relationship
CN110647656A (en) Audio retrieval method utilizing transform domain sparsification and compression dimension reduction
Fagerlund et al. New parametric representations of bird sounds for automatic classification
CN116861303A (en) Digital twin multisource information fusion diagnosis method for transformer substation
CN115510909A (en) Unsupervised algorithm for DBSCAN to perform abnormal sound features
CN116842460A (en) Cough-related disease identification method and system based on attention mechanism and residual neural network
CN114694640A (en) Abnormal sound extraction and identification method and device based on audio frequency spectrogram
Murugaiya et al. Probability enhanced entropy (PEE) novel feature for improved bird sound classification
Xiao et al. AMResNet: An automatic recognition model of bird sounds in real environment
CN109215633A (en) The recognition methods of cleft palate speech rhinorrhea gas based on recurrence map analysis
Rahman et al. Dynamic thresholding on speech segmentation
CN110675858A (en) Terminal control method and device based on emotion recognition
CN110443276A (en) Time series classification method based on depth convolutional network Yu the map analysis of gray scale recurrence
JP4219539B2 (en) Acoustic classification device
CN115238738A (en) Method and device for constructing underwater acoustic target recognition model
CN113571050A (en) Voice depression state identification method based on Attention and Bi-LSTM

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190115