CN117198340A - Dysarthria correction effect analysis method based on optimized acoustic parameters - Google Patents

Dysarthria correction effect analysis method based on optimized acoustic parameters

Info

Publication number
CN117198340A
Authority
CN
China
Prior art keywords
dysarthria
acoustic
parameter
real
correction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311219168.7A
Other languages
Chinese (zh)
Other versions
CN117198340B (en)
Inventor
何燕姬
陈国新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Youdao Speech Rehabilitation Research Institute
Original Assignee
Nanjing Youdao Speech Rehabilitation Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Youdao Speech Rehabilitation Research Institute filed Critical Nanjing Youdao Speech Rehabilitation Research Institute
Priority to CN202311219168.7A
Publication of CN117198340A
Application granted
Publication of CN117198340B
Legal status: Active


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

The invention belongs to the technical field of voice recognition and discloses a dysarthria correction effect analysis method based on optimized acoustic parameters, which comprises the following steps: establishing correct pronunciation for the dysarthria patient and marking the established correct pronunciation as voice fingerprint data; performing recognition analysis on the voice data from the patient's daily correction training against the voice fingerprint data to generate voice marks, the voice marks comprising standard pronunciation marks and abnormal pronunciation marks; extracting the patient's acoustic characteristic parameters, emotion influence parameters and lip image features within the time corresponding to the abnormal pronunciation marks; combining the acoustic characteristic parameters, emotion influence parameters and lip image features into a dysarthria correction real-time feature vector, introducing it into a convolutional neural network model and integrating it through a maximized normalization formula to generate an abnormal pronunciation evaluation result for the dysarthria patient; and gradually guiding the patient to recover standard pronunciation according to the abnormal pronunciation evaluation result.

Description

Dysarthria correction effect analysis method based on optimized acoustic parameters
Technical Field
The invention belongs to the technical field of voice recognition, and particularly relates to a dysarthria correction effect analysis method based on optimized acoustic parameters.
Background
Dysarthria refers to a speech disorder in which articulation is unclear and pronunciation is inaccurate due to physiological, neurological or psychological causes. The people who currently need dysarthria correction include children, teenagers, adults, patients with developmental disorders and performing-arts practitioners, among others. These groups already have a certain degree of language ability, and professional instructors or instruments can help them correct their pronunciation and improve their social expression ability. Common acoustic characterization parameters include fundamental frequency, formant frequency and formant bandwidth; these parameters reflect the pitch, harmonic structure and formant information in the speech signal. In correcting dysarthria, the patient's language ability can be improved by adjusting these acoustic parameters.
In the prior art, the Chinese patent application with publication No. CN114093206A discloses an intelligent correction and treatment robot for children with language disorders, which relates to a correction and treatment scheme for people with language disorders, but the following problems remain: the training content displayed by the device, in particular the tongue-position map and its 3D display, is difficult to acquire. If detection equipment is placed inside the oral cavity, it affects the patient's training; if a camera is used, the lips may occlude the tongue position so that a clear image cannot be acquired; and if the tongue position is reconstructed indirectly through other equipment, an additional positioning algorithm must be provided, which increases the difficulty of the whole training device.
In addition, existing dysarthria correction equipment mechanically evaluates and compares acoustic characteristic parameters of speech without considering the patient's contextual and psychological factors. As a result, language disorders may appear when the patient is tense or anxious, or psychological problems such as low self-esteem and shame may leave the patient reluctant during treatment, making it difficult to persist and preventing the correction effect from being achieved. The correction effect therefore needs to be analyzed so that the factors affecting it can be resolved at the root.
In view of this, the present inventors propose a dysarthria correction effect analysis method based on optimized acoustic parameters.
Disclosure of Invention
The application aims to solve the above technical problems and provides a dysarthria correction effect analysis method based on optimized acoustic parameters.
According to one aspect of the present application, there is provided a method for analyzing dysarthria correction effects based on preferred acoustic parameters, comprising the steps of:
establishing correct pronunciation for the dysarthria patient, marking the established correct pronunciation as voice fingerprint data, and importing the voice fingerprint data into a comparison database;
performing recognition analysis on voice data and voice fingerprint data of the daily correction training of the dysarthria patient to generate a voice mark, wherein the voice mark comprises a standard pronunciation mark and an abnormal pronunciation mark;
Extracting acoustic characteristic parameters sxtz, emotion influence parameters qxyx and lip image characteristics Hcxt of a patient in time t corresponding to the abnormal pronunciation marks;
the acoustic characteristic parameters sxtz, the emotion influence parameters qxyx and the lip image characteristics Hcxt are combined to form dysarthria correction real-time characteristic vectors, the dysarthria correction real-time characteristic vectors are introduced into a convolutional neural network model and are integrated through a maximized normalization formula, and abnormal pronunciation evaluation results of dysarthria patients are generated;
and gradually guiding the dysarthria patient toward standard pronunciation according to the abnormal pronunciation evaluation result.
Further, the voice data includes characters, words and sentences, and the tones and syllables corresponding to the characters, words and sentences.
Further, the logic for generating the standard pronunciation tag and the abnormal pronunciation tag is as follows:
the voice data is divided into N content areas according to characters, N = {1, 2, 3, ..., n, ...}; n is a positive integer, n being the n-th content area among the N content areas;
and comparing and analyzing the voice data corresponding to the nth content area with the voice fingerprint data, marking the corresponding content area as a standard pronunciation mark if the corresponding voice data is matched with the voice fingerprint data respectively, marking the corresponding content area as an abnormal pronunciation mark if the corresponding voice data is not matched with the voice fingerprint data respectively, and recording the time t corresponding to the abnormal pronunciation mark.
Further, the comparison of the voice data and the voice fingerprint data comprises one or more modes of characters, words and sentences, and tones and syllables corresponding to the characters, words and sentences.
Further, the logic for generating the acoustic feature parameter sxtz is:
the voice data is imported into a voice analyzer, and a spectrogram, a waveform chart and a frequency response chart are generated through the voice analyzer;
obtaining a pitch acoustic parameter sg and a formant acoustic parameter sz of the voice digital signal according to the spectrogram analysis;
analyzing the waveform diagram to obtain the intensity acoustic parameter sq and the duration acoustic parameter sc of the voice digital signal;
obtaining a phoneme acoustic parameter sw of the voice digital signal according to the frequency response curve graph;
generating the acoustic characteristic parameter sxtz according to the pitch acoustic parameter sg, the formant acoustic parameter sz, the intensity acoustic parameter sq, the duration acoustic parameter sc and the phoneme acoustic parameter sw, and calculating the acoustic characteristic parameter sxtz through a formula, wherein the specific formula is as follows:
wherein a1, a2, a3, a4 and a5 are weights, 0 ≤ a1 ≤ 1, 0 ≤ a2 ≤ 1, 0 ≤ a3 ≤ 1, 0 ≤ a4 ≤ 1, 0 ≤ a5 ≤ 1, and a1 + a2 + a3 + a4 + a5 = 1.
Further, the logic for generating the mood influencing parameter qxyx is:
Extracting the maximum amplitude Rmax of the heart beat, the minimum amplitude Rmin of the heart beat and the average amplitude Pmax of the heart beat signal, which are influenced by respiration, of the patient suffering from dysarthria within the time t corresponding to the abnormal pronunciation mark by a respiratory heart beat measuring instrument, and calculating the amplitude to generate the emotion influence parameter qxyx by a formula, wherein the calculation formula is as follows:
further, the analysis logic of the lip image feature Hcxt is:
extracting lip-shaped images within a time t corresponding to the abnormal pronunciation marks by using image acquisition equipment, wherein the lip-shaped images are equal in image size, and the image size is 100 multiplied by 100 pixels in an intermediate area taking the lips of a dysarthria patient as the center; and carrying out convolution processing on the lip image to determine a first feature matrix of the lip image, wherein the first feature matrix characterizes the lip image feature Hcxt.
Further, the analysis logic of the dysarthria correction real-time feature vector based on the convolutional neural network model is as follows:
the convolutional neural network model comprises a convolutional layer, a pooling layer, a full-connection layer and a classification layer, wherein the convolutional layer comprises a first convolutional layer, a second convolutional layer and a third convolutional layer; the pooling layer comprises a first pooling layer, a second pooling layer and a third pooling layer; the full connection layer comprises a first full connection layer and a second full connection layer;
The classification layer comprises the dysarthria correction real-time feature vector, and the dysarthria correction real-time feature vector comprises the acoustic feature parameter sxtz, the emotion influence parameter qxyx and the lip-shaped image feature Hcxt;
the first convolution layer comprises 16 feature mappings with a convolution kernel size of 3×3; convolution processing is carried out on the dysarthria correction real-time feature vector to obtain first feature mapping information of the dysarthria correction real-time feature vector;
the size of the extraction area of the first pooling layer is 2×2, and the first feature mapping information is integrated through the first pooling layer using a maximized normalization formula to generate a once-processed real-time feature vector;
the once-processed real-time feature vector is processed through the first full-connection layer to obtain a first real-time key feature vector;
the second convolution layer contains 32 feature maps and the convolution kernel size is 3×3; the first real-time key feature vector is convolved to obtain second feature mapping information of the first real-time key feature vector;
the size of the extraction area of the second pooling layer is 2×2, and the second feature mapping information is integrated through the second pooling layer using a maximized normalization formula to generate a twice-processed real-time feature vector;
the twice-processed real-time feature vector is processed through the second full-connection layer to obtain a second real-time key feature vector;
the third convolution layer contains 64 feature maps and the convolution kernel size is 3×3; the second real-time key feature vector is convolved to obtain third feature mapping information of the second real-time key feature vector;
the size of the extraction area of the third pooling layer is 2×2, and the third feature mapping information is integrated through the third pooling layer using a maximized normalization formula to generate a three-times-processed real-time feature vector;
the three-times-processed real-time feature vector is the dysarthria correction feature vector, and the abnormal pronunciation evaluation result of the dysarthria patient is generated according to the dysarthria correction feature vector.
Further, the application logic for generating the once processed real-time feature vector is:
the dysarthria correction real-time feature vector is denoted GYXL, where GYXL = [sxtz, qxyx, Hcxt]; after maximized normalization, the once-processed real-time feature vector GYXL_G is obtained; the specific formula is as follows:
wherein C1 is a correction coefficient that corrects the once-processed real-time feature vector GYXL_G; alpha and beta are weight coefficients, 0 ≤ alpha ≤ 1, 0 ≤ beta ≤ 1, and alpha² + beta² = 1; GYXL_max and GYXL_min are the maximum and minimum values of the dysarthria correction real-time feature vector at the current convolution layer.
Further, the analysis logic of the abnormal pronunciation evaluation result of the dysarthria patient is as follows:
the abnormal pronunciation assessment result includes: acoustic feature parameter disqualification, and in case of acoustic feature parameter disqualification, one or more modes of emotion influencing parameter disqualification and lip image feature disqualification are combined;
the specific abnormal pronunciation assessment results are shown as follows:
a. if the dysarthria correction feature vector is [1,0,0], the acoustic feature parameter is unqualified;
b. if the dysarthria correction feature vector is [1,0,1], the acoustic feature parameter is unqualified and the lip-shaped image feature is unqualified;
c. if the dysarthria correction feature vector is [1,1,0], the acoustic feature parameter is unqualified and the emotion influence parameter is unqualified;
d. if the dysarthria correction feature vector is [1,1,1], the acoustic feature parameter is unqualified, the lip image feature is unqualified and the emotion influence parameter is unqualified.
By adopting the technical scheme, the invention has the beneficial effects that:
according to the invention, voice fingerprint data are established for the dysarthria patient by a professional instructor, and by comparing the voice data from the patient's daily correction training with the patient's own voice fingerprint data, the dysarthria problems of the patient are identified more accurately. Acoustic characteristic parameters are currently the most objective evaluation standard for dysarthria, but they are also influenced by contextual and psychological factors; considering the patient's dysarthria from multiple dimensions makes the analysis result more accurate, and at the same time allows training to be targeted and reinforced more specifically.
On the other hand, the invention can be used directly on existing dysarthria correction equipment. It improves on the way existing equipment performs only mechanical evaluation and training of acoustic characteristic parameters, comprehensively considers the patient's contextual and psychological factors, can provide professional instructors with an analysis report of reference value, and corrects dysarthria patients more scientifically and effectively.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the following description will briefly explain the drawings needed in the embodiments, it being understood that the following drawings illustrate only some examples of the invention and are therefore not to be considered limiting of its scope, since it is possible for a person skilled in the art to obtain other related drawings from these drawings without inventive effort.
FIG. 1 is a flow chart of a method for analyzing dysarthria correction effect in an embodiment of the present invention;
fig. 2 is a schematic diagram of a system for analyzing dysarthria correction effect according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following description of the technical solutions in the embodiments of the present invention will be clear and complete, and it is obvious that the described embodiments are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
As shown in fig. 1, the present embodiment provides a method for analyzing dysarthria correction effect based on preferred acoustic parameters, including the following steps:
establishing correct pronunciation for the dysarthria patient, marking the established correct pronunciation as voice fingerprint data, and importing the voice fingerprint data into a comparison database;
performing recognition analysis on voice data and voice fingerprint data of the daily correction training of the dysarthria patient to generate a voice mark, wherein the voice mark comprises a standard pronunciation mark and an abnormal pronunciation mark;
the voice data includes characters, words and sentences, and the tones and syllables corresponding to the characters, words and sentences.
What needs to be explained here is: the voice data consist of characters, words and sentences set in a targeted manner according to the patient's degree of dysarthria, with corresponding tones and syllables set for those characters, words and sentences in combination with different contexts.
The logic for generating the standard pronunciation marks and the abnormal pronunciation marks is as follows:
the voice data is divided into N content areas according to characters, N = {1, 2, 3, ..., n, ...}; n is a positive integer, n being the n-th content area among the N content areas;
And comparing and analyzing the voice data corresponding to the nth content area with the voice fingerprint data, marking the corresponding content area as a standard pronunciation mark if the corresponding voice data is matched with the voice fingerprint data respectively, marking the corresponding content area as an abnormal pronunciation mark if the corresponding voice data is not matched with the voice fingerprint data respectively, and recording the time t corresponding to the abnormal pronunciation mark.
The content of the comparison of the voice data and the voice fingerprint data comprises characters, words and sentences, and one or more modes of tone and syllable corresponding to the characters, words and sentences are combined for comparison.
What needs to be explained here is: the voice data are the dysarthria patient's own speech recorded during training, while the voice fingerprint data are the correct pronunciations established by the same patient under the guidance of a professional instructor. The two therefore correspond to each other to a certain extent, and comparing them makes it possible to judge which characters, words and sentences, and which of their corresponding tones and syllables, the patient pronounces in a nonstandard way.
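To make the marking logic above concrete, the following is a minimal Python sketch. It assumes each content area is represented as a small record with character, tone, syllable and time fields; these field names and the helper function are illustrative assumptions, not part of the patent text.

```python
# Minimal sketch of the per-content-area marking logic described above.
# The dict layout ("char", "tone", "syllable", "t") and the function name
# are illustrative assumptions.

def mark_content_areas(voice_data, fingerprint_data):
    """Compare each of the N content areas against the voice fingerprint data.

    voice_data, fingerprint_data: lists of N dicts, one per character-level
    content area, e.g. {"char": "...", "tone": 3, "syllable": "shi", "t": (start, end)}.
    Returns a list of (mark, time) tuples, where mark is "standard" or "abnormal".
    """
    marks = []
    for area, ref in zip(voice_data, fingerprint_data):
        matched = (
            area["char"] == ref["char"]
            and area["tone"] == ref["tone"]
            and area["syllable"] == ref["syllable"]
        )
        if matched:
            marks.append(("standard", area["t"]))
        else:
            # record the time t corresponding to the abnormal pronunciation mark
            marks.append(("abnormal", area["t"]))
    return marks
```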
Extracting acoustic characteristic parameters sxtz, emotion influence parameters qxyx and lip image characteristics Hcxt of a patient in time t corresponding to the abnormal pronunciation marks;
the logic for generating the acoustic feature parameter sxtz is:
the voice data is imported into a voice analyzer, and a spectrogram, a waveform chart and a frequency response chart are generated through the voice analyzer;
obtaining a pitch acoustic parameter sg and a formant acoustic parameter sz of the voice digital signal according to the spectrogram analysis;
analyzing the waveform diagram to obtain the intensity acoustic parameter sq and the duration acoustic parameter sc of the voice digital signal;
obtaining a phoneme acoustic parameter sw of the voice digital signal according to the frequency response curve graph;
generating the acoustic characteristic parameter sxtz according to the pitch acoustic parameter sg, the formant acoustic parameter sz, the intensity acoustic parameter sq, the duration acoustic parameter sc and the phoneme acoustic parameter sw, and calculating the acoustic characteristic parameter sxtz through a formula, wherein the specific formula is as follows:
wherein a1, a2, a3, a4 and a5 are weights, 0 ≤ a1 ≤ 1, 0 ≤ a2 ≤ 1, 0 ≤ a3 ≤ 1, 0 ≤ a4 ≤ 1, 0 ≤ a5 ≤ 1, and a1 + a2 + a3 + a4 + a5 = 1.
What needs to be explained here is: the pitch acoustic parameter sg refers to the fundamental frequency component in the voice signal; in a dysarthria patient, inaccurate pronunciation or unclear articulation may affect the fundamental frequency, which manifests as a pitch that is too high or too low.
The formant acoustic parameter sz refers to the formant frequencies generated by the vocal organs such as the lips, tongue and throat during phonation. In dysarthria patients, the formant acoustic parameter sz may be abnormal, resulting in mispronunciation or unclear articulation.
The intensity acoustic parameter sq refers to the sound intensity or volume level in the speech signal. In dysarthria patients, inaccurate pronunciation or unclear articulation may affect the sound intensity, which may be too light or too heavy.
The duration acoustic parameter sc refers to the speaking rate in the speech signal. In dysarthria patients, inaccurate pronunciation or unclear articulation may affect the speaking rate, which may be too slow or too fast.
The phoneme acoustic parameter sw refers to the degree of discrimination between different phonemes in the speech signal. In dysarthria patients, poor pronunciation or unclear articulation may reduce phoneme discrimination, affecting communication and understanding.
These acoustic characteristic parameters may be measured and analyzed by means of acoustic analysis software or a voice analyzer. By analyzing the acoustic parameters, the pronunciation situation of the dysarthria patient can be known, personalized treatment schemes can be formulated, the treatment effect can be estimated, and the method is applied to the fields of voice recognition, voice synthesis and the like in scientific research; the higher the acoustic feature parameter sxtz, the greater the influence of the pitch acoustic parameter sg, the formant acoustic parameter sz, the intensity acoustic parameter sq, the duration acoustic parameter sc, and the phoneme acoustic parameter sw on the generation of the acoustic feature parameter sxtz, and vice versa.
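The formula image for sxtz is not reproduced in the text above. Given that the weights a1 through a5 lie in [0, 1] and sum to 1, one plausible reading is a linear weighted combination of the five (suitably normalized) acoustic parameters; the sketch below shows that reading and should be treated as an assumption rather than the patent's exact formula.

```python
# Assumed linear weighted combination of the five acoustic parameters.
# The default weights are placeholders; in practice they would be tuned.

def acoustic_feature_parameter(sg, sz, sq, sc, sw, weights=(0.2, 0.2, 0.2, 0.2, 0.2)):
    """Combine pitch (sg), formant (sz), intensity (sq), duration (sc) and
    phoneme (sw) parameters into a single acoustic characteristic parameter sxtz."""
    a1, a2, a3, a4, a5 = weights
    assert abs((a1 + a2 + a3 + a4 + a5) - 1.0) < 1e-9, "weights must sum to 1"
    return a1 * sg + a2 * sz + a3 * sq + a4 * sc + a5 * sw
```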
Logic for generating the mood influencing parameter qxyx is:
extracting the maximum amplitude Rmax of the heart beat, the minimum amplitude Rmin of the heart beat and the average amplitude Pmax of the heart beat signal, which are influenced by respiration, of the patient suffering from dysarthria within the time t corresponding to the abnormal pronunciation mark by a respiratory heart beat measuring instrument, and calculating the amplitude to generate the emotion influence parameter qxyx by a formula, wherein the calculation formula is as follows:
what needs to be explained here is: in daily correction training, detecting the respiratory rate, heart rate, respiration and heartbeat of a dysarthria patient through a respiratory heartbeat measuring instrument; and separating the respiratory rate, the heart rate, the respiration and the heartbeat of the dysarthria patient from a variation part caused by the respiration in a heartbeat signal through signal processing, wherein Rmax is the maximum amplitude of the heartbeat influenced by the respiration, rmin is the minimum amplitude of the heartbeat influenced by the respiration, pmax is the average amplitude of the heartbeat signal, and the average amplitude of the heartbeat signal is the average value of all the amplitudes in the heartbeat signal, so that the emotion influence parameter qxyx is calculated and generated.
The mood influencing parameter qxyx refers to the relative amplitude of the respiration-induced heart beat variation, which can be used to analyze the interaction between respiration and heart rate; the higher the mood influencing parameter qxyx, the greater the influence of respiration on heart rate and vice versa.
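The qxyx formula image is likewise not reproduced above. Since qxyx is described as the relative amplitude of the respiration-induced heartbeat variation, one plausible reconstruction is the peak-to-peak respiratory swing (Rmax - Rmin) normalized by the average heartbeat amplitude Pmax; the sketch below shows that reconstruction as an assumption, not the patent's exact formula.

```python
# Assumed reconstruction: respiration-induced heartbeat swing relative to the
# average heartbeat amplitude.

def emotion_influence_parameter(r_max, r_min, p_max):
    """qxyx ~ (Rmax - Rmin) / Pmax."""
    if p_max <= 0:
        raise ValueError("average heartbeat amplitude Pmax must be positive")
    return (r_max - r_min) / p_max
```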
The analysis logic of the lip image feature Hcxt is:
extracting lip-shaped images within a time t corresponding to the abnormal pronunciation marks by using image acquisition equipment, wherein the lip-shaped images are equal in image size, and the image size is 100 multiplied by 100 pixels in an intermediate area taking the lips of a dysarthria patient as the center; and carrying out convolution processing on the lip image to determine a first feature matrix of the lip image, wherein the first feature matrix characterizes the lip image feature Hcxt.
What needs to be explained here is: the lip image is preprocessed so that image areas with the same number of pixels are extracted, ensuring that each lip image contains the lip region and that all extracted lip images have the same size; feature extraction is then carried out with a convolutional neural network to obtain lip-related features used for tasks such as lip-shape recognition and emotion analysis. In addition, because the image area is restricted to the lips, the running time of the convolutional neural network model is reduced and computational efficiency is improved.
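A minimal sketch of the lip-image preprocessing just described: crop a fixed 100 × 100 pixel region centred on the lips and convolve it to obtain a first feature matrix Hcxt. The lip-centre coordinates and the particular 3 × 3 kernel are illustrative assumptions; the patent does not specify them.

```python
import numpy as np
from scipy.signal import convolve2d

def lip_image_feature(frame, lip_center):
    """Crop a 100 x 100 region centred on the lips and return its first
    feature matrix (Hcxt) from a single convolution pass."""
    cy, cx = lip_center
    crop = frame[cy - 50:cy + 50, cx - 50:cx + 50]          # 100 x 100 region
    assert crop.shape == (100, 100), "lip centre too close to the image border"
    kernel = np.array([[0, -1, 0],
                       [-1, 4, -1],
                       [0, -1, 0]], dtype=float)            # simple edge-like filter (assumed)
    hcxt = convolve2d(crop.astype(float), kernel, mode="same")
    return hcxt                                             # first feature matrix
```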
The acoustic characteristic parameters sxtz, the emotion influence parameters qxyx and the lip image characteristics Hcxt are combined to form dysarthria correction real-time characteristic vectors, the dysarthria correction real-time characteristic vectors are introduced into a convolutional neural network model and are integrated through a maximized normalization formula, and abnormal pronunciation evaluation results of dysarthria patients are generated;
The analysis logic of the dysarthria correction real-time feature vector based on the convolutional neural network model is as follows:
the convolutional neural network model comprises a convolutional layer, a pooling layer, a full-connection layer and a classification layer, wherein the convolutional layer comprises a first convolutional layer, a second convolutional layer and a third convolutional layer; the pooling layer comprises a first pooling layer, a second pooling layer and a third pooling layer; the full connection layer comprises a first full connection layer and a second full connection layer;
the classification layer comprises the dysarthria correction real-time feature vector, and the dysarthria correction real-time feature vector comprises the acoustic feature parameter sxtz, the emotion influence parameter qxyx and the lip-shaped image feature Hcxt;
the first convolution layer comprises 16 feature mappings with a convolution kernel size of 3×3; convolution processing is carried out on the dysarthria correction real-time feature vector to obtain first feature mapping information of the dysarthria correction real-time feature vector;
the size of the extraction area of the first pooling layer is 2×2, and the first feature mapping information is integrated through the first pooling layer using a maximized normalization formula to generate a once-processed real-time feature vector;
the once-processed real-time feature vector is processed through the first full-connection layer to obtain a first real-time key feature vector;
the second convolution layer contains 32 feature maps and the convolution kernel size is 3×3; the first real-time key feature vector is convolved to obtain second feature mapping information of the first real-time key feature vector;
the size of the extraction area of the second pooling layer is 2×2, and the second feature mapping information is integrated through the second pooling layer using a maximized normalization formula to generate a twice-processed real-time feature vector;
the twice-processed real-time feature vector is processed through the second full-connection layer to obtain a second real-time key feature vector;
the third convolution layer contains 64 feature maps and the convolution kernel size is 3×3; the second real-time key feature vector is convolved to obtain third feature mapping information of the second real-time key feature vector;
the size of the extraction area of the third pooling layer is 2×2, and the third feature mapping information is integrated through the third pooling layer using a maximized normalization formula to generate a three-times-processed real-time feature vector;
the three-times-processed real-time feature vector is the dysarthria correction feature vector, and the abnormal pronunciation evaluation result of the dysarthria patient is generated according to the dysarthria correction feature vector.
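For illustration, the following PyTorch sketch mirrors the three-stage structure described above (16/32/64 feature maps, 3 × 3 kernels, 2 × 2 pooling, two fully connected layers, four output cases a-d). The input length of 64, the use of 1-D convolutions over the combined feature vector, and placing both fully connected layers after the last pooling stage rather than interleaving them are simplifying assumptions, and standard max pooling stands in for the patent's maximized-normalization pooling step.

```python
import torch
import torch.nn as nn

class DysarthriaCorrectionCNN(nn.Module):
    """Simplified sketch of the described network, not the patent's exact model."""
    def __init__(self, in_len: int = 64, n_classes: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool1d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * (in_len // 8), 128), nn.ReLU(),   # first fully connected layer
            nn.Linear(128, n_classes),                       # second fully connected layer -> cases a-d
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, in_len) combined real-time feature vector [sxtz, qxyx, Hcxt features, ...]
        return self.classifier(self.features(x))

# Usage example: logits = DysarthriaCorrectionCNN()(torch.randn(8, 1, 64))
```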
The application logic for generating the once processed real-time feature vector is as follows:
The dysarthria correction real-time feature vector is denoted GYXL, where GYXL = [sxtz, qxyx, Hcxt]; after maximized normalization, the once-processed real-time feature vector GYXL_G is obtained; the specific formula is as follows:
wherein C1 is a correction coefficient that corrects the once-processed real-time feature vector GYXL_G; alpha and beta are weight coefficients, 0 ≤ alpha ≤ 1, 0 ≤ beta ≤ 1, and alpha² + beta² = 1; GYXL_max and GYXL_min are the maximum and minimum values of the dysarthria correction real-time feature vector at the current convolution layer.
What needs to be explained here is: the logic that integrates and generates the once-processed real-time feature vector through the maximized normalization formula likewise generates the twice-processed and three-times-processed real-time feature vectors; in this way the features are normalized, which improves the performance and convergence speed of the algorithm. Maximum normalization is a common feature normalization method that scales feature values into the range [0, 1].
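A minimal sketch of this maximized (min-max) normalization step, assuming plain min-max scaling followed by the correction coefficient C1; the exact role of the weight coefficients alpha and beta (with alpha² + beta² = 1) is not shown in the text above, so this should be read as an assumption rather than the patent's exact formula.

```python
import numpy as np

def max_normalize(gyxl, c1: float = 1.0):
    """Scale the feature vector into [0, c1] with min-max normalization."""
    gyxl = np.asarray(gyxl, dtype=float)
    gyxl_min, gyxl_max = gyxl.min(), gyxl.max()
    if gyxl_max == gyxl_min:
        return np.zeros_like(gyxl)          # degenerate case: constant vector
    return c1 * (gyxl - gyxl_min) / (gyxl_max - gyxl_min)
```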
The correction effect of the dysarthria patient is then analyzed according to the abnormal pronunciation evaluation result, and the patient is gradually guided toward standard pronunciation.
The analysis logic of the abnormal pronunciation evaluation result of the dysarthria patient is as follows:
the abnormal pronunciation assessment result includes: acoustic feature parameter disqualification, and in case of acoustic feature parameter disqualification, one or more modes of emotion influencing parameter disqualification and lip image feature disqualification are combined;
The specific abnormal pronunciation assessment results are shown as follows:
a. if the dysarthria correction feature vector is [1,0,0], the acoustic feature parameter is unqualified;
b. if the dysarthria correction feature vector is [1,0,1], the acoustic feature parameter is unqualified and the lip-shaped image feature is unqualified;
c. if the dysarthria correction feature vector is [1,1,0], the acoustic feature parameter is unqualified and the emotion influence parameter is unqualified;
d. if the dysarthria correction feature vector is [1,1,1], the acoustic feature parameter is unqualified, the lip image feature is unqualified and the emotion influence parameter is unqualified.
If the prediction result is case a, the acoustic characteristic parameters are unqualified, which indicates that the pitch acoustic parameter sg, formant acoustic parameter sz, intensity acoustic parameter sq, duration acoustic parameter sc and phoneme acoustic parameter sw of the dysarthria patient are unqualified, i.e. the sound intensity, frequency, pitch, duration, harmonics, volume and timbre need to be trained and adjusted;
if the prediction result is case b, the acoustic characteristic parameters are unqualified and the lip image feature is unqualified; the unqualified lip image feature indicates that the lip shape of the dysarthria patient during phonation differs from the preset lip shape, and the pronunciation manner needs to be trained and adjusted, i.e. the acoustic parameters sg, sz, sq, sc and sw are unqualified in a situation where the pronunciation manner is problematic;
if the prediction result is case c, the acoustic characteristic parameters are unqualified and the emotion influence parameter is unqualified; the unqualified emotion influence parameter indicates that the patient's emotion fluctuates strongly during phonation and affects the training result, and emotion during phonation needs to be trained and adjusted, i.e. the acoustic parameters sg, sz, sq, sc and sw are unqualified in a situation of strong emotional fluctuation;
if the prediction result is case d, the acoustic characteristic parameters, the lip image feature and the emotion influence parameter are all unqualified, which indicates that the acoustic parameters sg, sz, sq, sc and sw of the dysarthria patient are unqualified in a situation where emotion fluctuates strongly and the pronunciation manner is problematic. A small lookup table illustrating these four cases follows below.
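The table below assumes the dysarthria correction feature vector is ordered [acoustic, emotion, lip], as implied by cases b and c; the function name and return strings are illustrative only, not from the patent.

```python
# Lookup of the four evaluation cases a-d, with the assumed ordering
# [acoustic, emotion, lip] for the dysarthria correction feature vector.

EVALUATION_CASES = {
    (1, 0, 0): "acoustic characteristic parameter unqualified",
    (1, 0, 1): "acoustic parameter and lip image feature unqualified",
    (1, 1, 0): "acoustic parameter and emotion influence parameter unqualified",
    (1, 1, 1): "acoustic parameter, lip image feature and emotion influence parameter unqualified",
}

def evaluate(feature_vector):
    return EVALUATION_CASES.get(tuple(feature_vector), "no abnormal pronunciation detected")
```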
What needs to be specifically stated here is: first, after the dysarthria patient's voice data are compared, the content that does not match the voice fingerprint data is given an abnormal pronunciation mark, which means that unqualified acoustic characteristic parameters exist within that marked content; the purpose of calculating the acoustic characteristic parameters is therefore to compute and verify, with reference to the prior art, exactly where the acoustic characteristic parameters are unqualified. The result is then analyzed and predicted through the convolutional neural network model to generate the abnormal pronunciation evaluation result of the dysarthria patient.
The correction effect of the dysarthria patient is then analyzed according to the abnormal pronunciation evaluation result, and the patient is gradually guided toward standard pronunciation.
What needs to be explained here is: the database can distinguish various syllable data according to the similarity of spoken syllables, and pronunciation correction can be carried out according to the data stored in the database, so that the goal of pronunciation correction training is achieved; the dysarthria patient thus carries out pronunciation correction training with training materials selected by a professional instructor. The voice data produced in each training session are analyzed, focusing on the abnormal pronunciation mark portions, to determine whether the unqualified acoustic characteristic parameters are influenced by the pronunciation manner and by emotion; if so, subsequent work emphasizes the pronunciation manner and emotion management, and if not, subsequent work targets the purely acoustic characteristic parameters that are unqualified;
of course, no matter what causes the acoustic characteristic parameters to be unqualified, comprehensive analysis and judgment are needed in combination with clinical practice and phonetic theory. Therefore, in the diagnosis and treatment of dysarthria, a professional medical institution and professional personnel should be sought to guide the correction training according to the patient's abnormal pronunciation evaluation result, so as to achieve the correction effect.
Example two
Referring to fig. 2, for the parts of this embodiment not described in detail, reference is made to the description of the first embodiment. This embodiment provides a dysarthria correction effect analysis system based on preferred acoustic parameters, implemented on the basis of the above-mentioned dysarthria correction effect analysis method based on preferred acoustic parameters, and comprising a construction module, a first data analysis module, a second data analysis module, a third data analysis module and a guide module, wherein the modules are connected in a wired and/or wireless manner to realize data transmission among them;
the construction module is used for guiding correction of the dysarthria patient to establish correct pronunciation, marking the established correct pronunciation as voice fingerprint data and importing the voice fingerprint data into the comparison database;
the first data analysis module is used for carrying out recognition analysis on voice data and voice fingerprint data of the daily correction training of the dysarthria patient to generate a voice mark, wherein the voice mark comprises a standard pronunciation mark and an abnormal pronunciation mark;
the second data analysis module is used for extracting acoustic characteristic parameters sxtz, emotion influence parameters qxyx and lip-shaped image characteristics Hcxt of the patient in the time t corresponding to the abnormal pronunciation marks;
The third data analysis module combines the acoustic characteristic parameters sxtz, the emotion influence parameters qxyx and the lip image characteristics Hcxt to form dysarthria correction real-time characteristic vectors, introduces the dysarthria correction real-time characteristic vectors into a convolutional neural network model, and integrates the dysarthria correction real-time characteristic vectors through a maximized normalization formula to generate abnormal pronunciation evaluation results of dysarthria patients;
and the guiding module gradually guides the dysarthria patient toward standard pronunciation according to the abnormal pronunciation evaluation result.
The voice data includes characters, words and sentences, and the tones and syllables corresponding to the characters, words and sentences.
The logic for generating the standard pronunciation marks and the abnormal pronunciation marks is as follows:
the voice data is divided into N content areas according to characters, N = {1, 2, 3, ..., n, ...}; n is a positive integer, n being the n-th content area among the N content areas;
and comparing and analyzing the voice data corresponding to the nth content area with the voice fingerprint data, marking the corresponding content area as a standard pronunciation mark if the corresponding voice data is matched with the voice fingerprint data respectively, marking the corresponding content area as an abnormal pronunciation mark if the corresponding voice data is not matched with the voice fingerprint data respectively, and recording the time t corresponding to the abnormal pronunciation mark.
The content of the comparison of the voice data and the voice fingerprint data comprises one or more modes of combination and comparison of characters, words and sentences and tones and syllables corresponding to the characters, words and sentences.
The logic for generating the acoustic feature parameter sxtz is:
the voice data is imported into a voice analyzer, and a spectrogram, a waveform chart and a frequency response chart are generated through the voice analyzer;
obtaining a pitch acoustic parameter sg and a formant acoustic parameter sz of the voice digital signal according to the spectrogram analysis;
analyzing the waveform diagram to obtain the intensity acoustic parameter sq and the duration acoustic parameter sc of the voice digital signal;
obtaining a phoneme acoustic parameter sw of the voice digital signal according to the frequency response curve graph;
generating the acoustic characteristic parameter sxtz according to the pitch acoustic parameter sg, the formant acoustic parameter sz, the intensity acoustic parameter sq, the duration acoustic parameter sc and the phoneme acoustic parameter sw, and calculating the acoustic characteristic parameter sxtz through a formula, wherein the specific formula is as follows:
wherein a1, a2, a3, a4 and a5 are weights, 0 ≤ a1 ≤ 1, 0 ≤ a2 ≤ 1, 0 ≤ a3 ≤ 1, 0 ≤ a4 ≤ 1, 0 ≤ a5 ≤ 1, and a1 + a2 + a3 + a4 + a5 = 1.
Logic for generating the mood influencing parameter qxyx is:
extracting the maximum amplitude Rmax of the heart beat, the minimum amplitude Rmin of the heart beat and the average amplitude Pmax of the heart beat signal, which are influenced by respiration, of the patient suffering from dysarthria within the time t corresponding to the abnormal pronunciation mark by a respiratory heart beat measuring instrument, and calculating the amplitude to generate the emotion influence parameter qxyx by a formula, wherein the calculation formula is as follows:
The analysis logic of the lip image feature Hcxt is:
extracting lip-shaped images within a time t corresponding to the abnormal pronunciation marks by using image acquisition equipment, wherein the lip-shaped images are equal in image size, and the image size is 100 multiplied by 100 pixels in an intermediate area taking the lips of a dysarthria patient as the center; and carrying out convolution processing on the lip image to determine a first feature matrix of the lip image, wherein the first feature matrix characterizes the lip image feature Hcxt.
The analysis logic of the dysarthria correction real-time feature vector based on the convolutional neural network model is as follows:
the convolutional neural network model comprises a convolutional layer, a pooling layer, a full-connection layer and a classification layer, wherein the convolutional layer comprises a first convolutional layer, a second convolutional layer and a third convolutional layer; the pooling layer comprises a first pooling layer, a second pooling layer and a third pooling layer; the full connection layer comprises a first full connection layer and a second full connection layer;
the classification layer comprises the dysarthria correction real-time feature vector, and the dysarthria correction real-time feature vector comprises the acoustic feature parameter sxtz, the emotion influence parameter qxyx and the lip-shaped image feature Hcxt;
the first convolution layer comprises 16 feature mappings with a convolution kernel size of 3×3; convolution processing is carried out on the dysarthria correction real-time feature vector to obtain first feature mapping information of the dysarthria correction real-time feature vector;
the size of the extraction area of the first pooling layer is 2×2, and the first feature mapping information is integrated through the first pooling layer using a maximized normalization formula to generate a once-processed real-time feature vector;
the once-processed real-time feature vector is processed through the first full-connection layer to obtain a first real-time key feature vector;
the second convolution layer contains 32 feature maps and the convolution kernel size is 3×3; the first real-time key feature vector is convolved to obtain second feature mapping information of the first real-time key feature vector;
the size of the extraction area of the second pooling layer is 2×2, and the second feature mapping information is integrated through the second pooling layer using a maximized normalization formula to generate a twice-processed real-time feature vector;
the twice-processed real-time feature vector is processed through the second full-connection layer to obtain a second real-time key feature vector;
the third convolution layer contains 64 feature maps and the convolution kernel size is 3×3; the second real-time key feature vector is convolved to obtain third feature mapping information of the second real-time key feature vector;
the size of the extraction area of the third pooling layer is 2×2, and the third feature mapping information is integrated through the third pooling layer using a maximized normalization formula to generate a three-times-processed real-time feature vector;
the three-times-processed real-time feature vector is the dysarthria correction feature vector, and the abnormal pronunciation evaluation result of the dysarthria patient is generated according to the dysarthria correction feature vector.
The application logic for generating the once processed real-time feature vector is as follows:
the dysarthria correction real-time feature vector is denoted GYXL, where GYXL = [sxtz, qxyx, Hcxt]; after maximized normalization, the once-processed real-time feature vector GYXL_G is obtained; the specific formula is as follows:
wherein C1 is a correction coefficient that corrects the once-processed real-time feature vector GYXL_G; alpha and beta are weight coefficients, 0 ≤ alpha ≤ 1, 0 ≤ beta ≤ 1, and alpha² + beta² = 1; GYXL_max and GYXL_min are the maximum and minimum values of the dysarthria correction real-time feature vector at the current convolution layer.
The analysis logic of the abnormal pronunciation evaluation result of the dysarthria patient is as follows:
the abnormal pronunciation assessment result includes: acoustic feature parameter disqualification, and in case of acoustic feature parameter disqualification, one or more modes of emotion influencing parameter disqualification and lip image feature disqualification are combined;
the specific abnormal pronunciation assessment results are shown as follows:
a. if the dysarthria correction feature vector is [1,0,0], the acoustic feature parameter is unqualified;
b. if the dysarthria correction feature vector is [1,0,1], the acoustic feature parameter is unqualified and the lip-shaped image feature is unqualified;
c. if the dysarthria correction feature vector is [1,1,0], the acoustic feature parameter is unqualified and the emotion influence parameter is unqualified;
d. if the dysarthria correction feature vector is [1,1,1], the acoustic feature parameter is unqualified, the lip image feature is unqualified and the emotion influence parameter is unqualified.
Example III
The present embodiment provides an electronic device including: a processor and a memory, wherein the memory stores a computer program for the processor to call;
the processor executes the above-described dysarthria correction effect analysis method based on the preferred acoustic parameters by calling the computer program stored in the memory.
Fig. 3 is a schematic structural diagram of the electronic device provided in this embodiment. The electronic device may differ considerably depending on configuration or performance, and may include one or more processors (Central Processing Units, CPU) and one or more memories, the memories storing at least one computer program that is loaded and executed by the processors to implement the dysarthria correction effect analysis method based on preferred acoustic parameters provided in the foregoing method embodiments. The electronic device can also include other components for implementing the functions of the device; for example, it can have wired or wireless network interfaces, input-output interfaces, and the like, for inputting and outputting data. These details are not repeated in this embodiment.
Example IV
The present embodiment proposes a computer-readable storage medium having stored thereon an erasable computer program;
the computer program, when run on a computer device, causes the computer device to perform a dysarthria correction effect analysis method as described above, based on preferred acoustic parameters.
For example, the computer readable storage medium can be Read-Only Memory (ROM), random access Memory (Random Access Memory, RAM), compact disk Read-Only Memory (Compact Disc Read-Only Memory, CD-ROM), magnetic tape, floppy disk, optical data storage device, etc.
It should be understood that, in various embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present application.
It should be understood that determining B from A does not mean determining B from A alone; B can also be determined from A and/or other information.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any other combination. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. When the computer instructions or computer program are loaded or executed on a computer, the processes or functions described in accordance with embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be transmitted from one website site, computer, server, or data center to another website site, computer, server, or data center over a wired network or/and a wireless network. The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains one or more sets of available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. The semiconductor medium may be a solid state disk.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided by the present invention, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely one, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
In the description of the present specification, the descriptions of the terms "one embodiment," "example," "specific example," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The preferred embodiments of the invention disclosed above are intended only to assist in the explanation of the invention. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise form disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best understand and utilize the invention. The invention is limited only by the claims and the full scope and equivalents thereof.

Claims (10)

1. A dysarthria correction effect analysis method based on optimized acoustic parameters, characterized by comprising the following steps:
establishing correct pronunciation for the dysarthria patient, marking the established correct pronunciation as voice fingerprint data, and importing the voice fingerprint data into a comparison database;
performing recognition analysis on voice data and voice fingerprint data of the daily correction training of the dysarthria patient to generate a voice mark, wherein the voice mark comprises a standard pronunciation mark and an abnormal pronunciation mark;
extracting the acoustic characteristic parameter sxtz, the emotion influence parameter qxyx and the lip image feature Hcxt of the patient within the time t corresponding to the abnormal pronunciation marks;
combining the acoustic characteristic parameter sxtz, the emotion influence parameter qxyx and the lip image feature Hcxt to form a dysarthria correction real-time feature vector, introducing the dysarthria correction real-time feature vector into a convolutional neural network model, integrating it through a maximized normalization formula, and generating an abnormal pronunciation evaluation result of the dysarthria patient;
and gradually guiding the dysarthria patient back to standard pronunciation according to the abnormal pronunciation evaluation result.
2. A dysarthria correction effect analysis method based on preferred acoustic parameters according to claim 1, characterized in that the voice data comprises characters, words and sentences, and the tones and syllables corresponding to the characters, words and sentences.
3. A method of analyzing dysarthria correction effects based on preferred acoustic parameters according to claim 2, wherein the logic for generating the standard pronunciation mark and the abnormal pronunciation mark is as follows:
the voice data is divided into N content areas according to characters, n ∈ {1, 2, 3, ..., N}; N is a positive integer, and n denotes the nth content area of the N content areas;
and comparing the voice data corresponding to the nth content area with the voice fingerprint data; if the voice data matches the voice fingerprint data, marking the corresponding content area with a standard pronunciation mark; if the voice data does not match the voice fingerprint data, marking the corresponding content area with an abnormal pronunciation mark and recording the time t corresponding to the abnormal pronunciation mark.
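For illustration only, a minimal Python sketch of the marking logic described in claim 3 is given below. The similarity measure, the 0.8 threshold and all function and field names are assumptions added for readability; they are not taken from the patent.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Mark:
        area_index: int        # n, index of the content area
        kind: str              # "standard" or "abnormal"
        time: Optional[float]  # time t, recorded only for abnormal marks

    def mark_content_areas(area_features, fingerprint_features, area_times, threshold=0.8):
        """Compare each content area of the training speech against the voice
        fingerprint data and attach a standard or abnormal pronunciation mark."""
        marks = []
        for n, (feat, ref) in enumerate(zip(area_features, fingerprint_features), start=1):
            # Toy similarity score standing in for the tone/syllable comparison.
            similarity = 1.0 - abs(feat - ref) / max(abs(ref), 1e-9)
            if similarity >= threshold:
                marks.append(Mark(n, "standard", None))
            else:
                marks.append(Mark(n, "abnormal", area_times[n - 1]))  # record time t
        return marks

    # Example: three content areas; the second one deviates from the fingerprint.
    print(mark_content_areas([1.00, 0.55, 0.98], [1.0, 1.0, 1.0], [0.0, 1.2, 2.4]))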
4. A dysarthria correction effect analysis method based on preferred acoustic parameters according to claim 3, wherein the content compared between the voice data and the voice fingerprint data comprises one or more of characters, words and sentences, and the tones and syllables corresponding to the characters, words and sentences.
5. The method of claim 4, wherein the logic for generating the acoustic characteristic parameter sxtz is:
the voice data is imported into a voice analyzer, and a spectrogram, a waveform chart and a frequency response chart are generated through the voice analyzer;
obtaining a pitch acoustic parameter sg and a formant acoustic parameter sz of the voice digital signal according to the spectrogram analysis;
analyzing the waveform diagram to obtain the intensity acoustic parameter sq and the duration acoustic parameter sc of the voice digital signal;
obtaining a phoneme acoustic parameter sw of the voice digital signal according to the frequency response curve graph;
generating the acoustic characteristic parameter sxtz according to the pitch acoustic parameter sg, the formant acoustic parameter sz, the intensity acoustic parameter sq, the duration acoustic parameter sc and the phoneme acoustic parameter sw, and calculating the acoustic characteristic parameter sxtz through a formula, wherein the specific formula is as follows:
wherein a1, a2, a3, a4, a5 are weights, 0 ≤ a1 ≤ 1, 0 ≤ a2 ≤ 1, 0 ≤ a3 ≤ 1, 0 ≤ a4 ≤ 1, 0 ≤ a5 ≤ 1, and a1 + a2 + a3 + a4 + a5 = 1.
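The formula itself is reproduced as an image in the published text and is not shown here; given the weight constraints, a weighted linear combination of the five parameters is the natural reading. The Python sketch below encodes only that assumed reading, and the weight values are arbitrary placeholders.

    def acoustic_characteristic_parameter(sg, sz, sq, sc, sw,
                                          weights=(0.3, 0.25, 0.2, 0.15, 0.1)):
        """Assumed weighted combination of the five acoustic parameters:
        sg pitch, sz formant, sq intensity, sc duration, sw phoneme.
        The weights satisfy 0 <= a_i <= 1 and sum to 1, as claim 5 requires."""
        a1, a2, a3, a4, a5 = weights
        assert abs(a1 + a2 + a3 + a4 + a5 - 1.0) < 1e-9
        return a1 * sg + a2 * sz + a3 * sq + a4 * sc + a5 * sw

    # Example call with normalized parameter values.
    print(round(acoustic_characteristic_parameter(0.8, 0.7, 0.9, 0.6, 0.75), 3))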
6. A method of analyzing dysarthria correction effects based on preferred acoustic parameters according to claim 5, wherein the logic for generating the emotion influence parameter qxyx is:
extracting, by a respiratory heartbeat measuring instrument, the maximum heartbeat amplitude Rmax, the minimum heartbeat amplitude Rmin and the average heartbeat signal amplitude Pmax, as influenced by respiration, of the dysarthria patient within the time t corresponding to the abnormal pronunciation mark, and calculating the emotion influence parameter qxyx from these amplitudes through a formula, wherein the calculation formula is as follows:
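The calculation formula is likewise an image in the published text. Purely as an illustration, the sketch below assumes the emotion influence parameter grows with the respiration-driven spread of the heartbeat amplitude relative to its mean; this is an editorial assumption, not the patented formula.

    def emotion_influence_parameter(r_max, r_min, p_mean):
        """Assumed form: heartbeat amplitude spread normalised by the mean amplitude.
        r_max, r_min: maximum / minimum heartbeat amplitude within time t.
        p_mean: average heartbeat signal amplitude within time t."""
        if p_mean <= 0:
            raise ValueError("average heartbeat amplitude must be positive")
        return (r_max - r_min) / p_mean

    # A wide amplitude swing relative to the mean suggests stronger emotional influence.
    print(round(emotion_influence_parameter(1.8, 0.9, 1.2), 3))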
7. The method for analyzing dysarthria correction effects based on preferred acoustic parameters according to claim 6, wherein the analysis logic of the lip image feature Hcxt is:
extracting lip images within the time t corresponding to the abnormal pronunciation mark by using an image acquisition device, wherein the lip images are of equal size, namely 100 × 100 pixels covering the central region centered on the lips of the dysarthria patient; and performing convolution processing on the lip images to determine a first feature matrix of the lip image, the first feature matrix characterizing the lip image feature Hcxt.
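A minimal numpy sketch of the lip image processing in claim 7 follows. The lip-centre coordinates, the frame size and the 3 × 3 averaging kernel are assumptions; the claim only fixes the 100 × 100 crop and the convolution step.

    import numpy as np

    def lip_image_feature(frame, lip_center, kernel=None):
        """Crop a 100 x 100 patch centred on the lips and run one convolution pass,
        returning the first feature matrix that characterises Hcxt."""
        if kernel is None:
            kernel = np.ones((3, 3)) / 9.0            # placeholder averaging kernel
        r, c = lip_center
        patch = frame[r - 50:r + 50, c - 50:c + 50]   # 100 x 100 central lip region
        kh, kw = kernel.shape
        out_h, out_w = patch.shape[0] - kh + 1, patch.shape[1] - kw + 1
        feat = np.zeros((out_h, out_w))
        for i in range(out_h):                        # plain sliding-window convolution
            for j in range(out_w):
                feat[i, j] = np.sum(patch[i:i + kh, j:j + kw] * kernel)
        return feat

    # Example with a synthetic 240 x 320 grayscale frame and an assumed lip centre.
    hcxt = lip_image_feature(np.random.rand(240, 320), (120, 160))
    print(hcxt.shape)   # (98, 98)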
8. The method for analyzing dysarthria correction effects based on preferred acoustic parameters according to claim 7, wherein the analysis logic of the dysarthria correction real-time feature vector based on the convolutional neural network model is as follows:
the convolutional neural network model comprises a convolutional layer, a pooling layer, a full-connection layer and a classification layer, wherein the convolutional layer comprises a first convolutional layer, a second convolutional layer and a third convolutional layer; the pooling layer comprises a first pooling layer, a second pooling layer and a third pooling layer; the full connection layer comprises a first full connection layer and a second full connection layer;
the classification layer comprises the dysarthria correction real-time feature vector, and the dysarthria correction real-time feature vector comprises the acoustic characteristic parameter sxtz, the emotion influence parameter qxyx and the lip image feature Hcxt;
the first convolution layer comprises 16 feature maps with a convolution kernel size of 3 × 3; convolution processing is carried out on the dysarthria correction real-time feature vector to obtain first feature mapping information of the dysarthria correction real-time feature vector;
the extraction area of the first pooling layer is 2 × 2; the first feature mapping information is integrated through the first pooling layer using the maximized normalization formula to generate a once-processed real-time feature vector;
the once-processed real-time feature vector is passed through the first full connection layer to obtain a first real-time key feature vector;
the second convolution layer contains 32 feature maps, the size of the convolution kernel is 3×3; convolving the first real-time key feature vector to obtain second feature mapping information of the first real-time key feature vector;
the extraction area of the second pooling layer is 2 × 2; the second feature mapping information is integrated through the second pooling layer using the maximized normalization formula to generate a twice-processed real-time feature vector;
the twice-processed real-time feature vector is passed through the second full connection layer to obtain a second real-time key feature vector;
the third convolution layer contains 64 feature maps, the size of the convolution kernel is 3×3; convolving the second real-time key feature vector to obtain third feature mapping information of the second real-time key feature vector;
the extraction area of the third pooling layer is 2 × 2; the third feature mapping information is integrated through the third pooling layer using the maximized normalization formula to generate a thrice-processed real-time feature vector;
the thrice-processed real-time feature vector is the dysarthria correction feature vector, and the abnormal pronunciation evaluation result of the dysarthria patient is generated according to the dysarthria correction feature vector.
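The sketch below mirrors the layer order of claim 8 (convolution, pooling and full connection layers interleaved, with 16/32/64 feature maps, 3 × 3 kernels and 2 × 2 pooling) as a PyTorch module. The input is treated as a one-dimensional real-time feature vector of assumed length 128, and every layer width other than those fixed by the claim is an assumption.

    import torch
    import torch.nn as nn

    class DysarthriaCorrectionCNN(nn.Module):
        """conv1 -> pool1 -> fc1 -> conv2 -> pool2 -> fc2 -> conv3 -> pool3 -> classify,
        following the order of claim 8; intermediate sizes are illustrative assumptions."""

        def __init__(self, in_len=128, num_outcomes=4):
            super().__init__()
            self.conv1 = nn.Conv1d(1, 16, kernel_size=3, padding=1)
            self.conv2 = nn.Conv1d(1, 32, kernel_size=3, padding=1)
            self.conv3 = nn.Conv1d(1, 64, kernel_size=3, padding=1)
            self.pool = nn.MaxPool1d(2)
            self.fc1 = nn.Linear(16 * (in_len // 2), 128)   # first full connection layer
            self.fc2 = nn.Linear(32 * 64, 64)               # second full connection layer
            self.classify = nn.Linear(64 * 32, num_outcomes)

        def forward(self, gyxl):                                      # gyxl: (batch, in_len)
            x = self.pool(torch.relu(self.conv1(gyxl.unsqueeze(1))))  # first feature mapping
            x = torch.relu(self.fc1(torch.flatten(x, 1)))             # first real-time key vector
            x = self.pool(torch.relu(self.conv2(x.unsqueeze(1))))     # second feature mapping
            x = torch.relu(self.fc2(torch.flatten(x, 1)))             # second real-time key vector
            x = self.pool(torch.relu(self.conv3(x.unsqueeze(1))))     # third feature mapping
            return self.classify(torch.flatten(x, 1))                 # evaluation logits

    model = DysarthriaCorrectionCNN()
    print(model(torch.rand(2, 128)).shape)   # torch.Size([2, 4])

Note that the "maximized normalization" integration of claims 8 and 9 is not reproduced here; the sketch uses plain max pooling in its place.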
9. The method for analyzing dysarthria correction effects based on preferred acoustic parameters according to claim 8, wherein the application logic for generating the once-processed real-time feature vector is:
marking the dysarthria correction real-time feature vector as GYXL, where GYXL = [sxtz, qxyx, Hcxt]; obtaining the once-processed real-time feature vector GYXL_G after maximized normalization; the specific formula is as follows:
wherein C1 is a correction coefficient used to correct the once-processed real-time feature vector GYXL_G; α and β are weight coefficients, 0 ≤ α ≤ 1, 0 ≤ β ≤ 1, and α² + β² = 1; GYXL_max and GYXL_min are the maximum and minimum values of the dysarthria correction real-time feature vector at the current convolution layer.
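The normalization formula itself is again an image; only the constraints on C1, α, β and the layer-wise minimum and maximum survive in the text. The sketch below is therefore a guess at a min-max style rescaling consistent with those constraints, not the patented formula.

    import numpy as np

    def maximized_normalization(gyxl, c1=1.0, alpha=0.8, beta=0.6):
        """Assumed min-max rescaling of the dysarthria correction real-time feature
        vector, with correction coefficient c1 and weights alpha, beta such that
        alpha**2 + beta**2 == 1 (as stated in claim 9)."""
        assert abs(alpha ** 2 + beta ** 2 - 1.0) < 1e-9
        gyxl = np.asarray(gyxl, dtype=float)
        g_min, g_max = gyxl.min(), gyxl.max()
        scaled = (gyxl - g_min) / (g_max - g_min)   # rescale with the layer's min / max
        return c1 * (alpha * scaled + beta * scaled ** 2)

    print(maximized_normalization([0.82, 0.31, 0.55]).round(3))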
10. The method for analyzing dysarthria correction effects based on preferred acoustic parameters according to claim 9, wherein the analysis logic of the abnormal pronunciation evaluation result of the dysarthria patient is as follows:
the abnormal pronunciation evaluation result includes: an unqualified acoustic characteristic parameter, combined, in the case of an unqualified acoustic characteristic parameter, with one or both of an unqualified emotion influence parameter and an unqualified lip image feature;
the specific abnormal pronunciation evaluation results are as follows:
a. if the dysarthria correction feature vector is [1, 0, 0], the acoustic characteristic parameter is unqualified;
b. if the dysarthria correction feature vector is [1, 0, 1], the acoustic characteristic parameter is unqualified and the lip image feature is unqualified;
c. if the dysarthria correction feature vector is [1, 1, 0], the acoustic characteristic parameter is unqualified and the emotion influence parameter is unqualified;
d. if the dysarthria correction feature vector is [1, 1, 1], the acoustic characteristic parameter is unqualified, the emotion influence parameter is unqualified and the lip image feature is unqualified.
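How the CNN output is thresholded into the three-bit vector is not specified in the claim; assuming such a vector ordered as (acoustic, emotion, lip), cases a-d reduce to the lookup below. All names are illustrative.

    # Decoding of the dysarthria correction feature vector per cases a-d of claim 10.
    EVALUATION_TABLE = {
        (1, 0, 0): "acoustic characteristic parameter unqualified",
        (1, 0, 1): "acoustic characteristic parameter and lip image feature unqualified",
        (1, 1, 0): "acoustic characteristic parameter and emotion influence parameter unqualified",
        (1, 1, 1): "acoustic characteristic parameter, emotion influence parameter "
                   "and lip image feature unqualified",
    }

    def decode_evaluation(vector):
        """Return the abnormal pronunciation evaluation text for a 3-bit result vector."""
        return EVALUATION_TABLE.get(tuple(vector), "no acoustic abnormality flagged")

    print(decode_evaluation([1, 0, 1]))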
CN202311219168.7A 2023-09-20 2023-09-20 Dysarthria correction effect analysis method based on optimized acoustic parameters Active CN117198340B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311219168.7A CN117198340B (en) 2023-09-20 2023-09-20 Dysarthria correction effect analysis method based on optimized acoustic parameters

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311219168.7A CN117198340B (en) 2023-09-20 2023-09-20 Dysarthria correction effect analysis method based on optimized acoustic parameters

Publications (2)

Publication Number Publication Date
CN117198340A true CN117198340A (en) 2023-12-08
CN117198340B CN117198340B (en) 2024-04-30

Family

ID=88984851

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311219168.7A Active CN117198340B (en) 2023-09-20 2023-09-20 Dysarthria correction effect analysis method based on optimized acoustic parameters

Country Status (1)

Country Link
CN (1) CN117198340B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108630225A (en) * 2018-03-29 2018-10-09 太原理工大学 Barrier children's vowel appraisal procedure is listened based on fuzzy overall evaluation
WO2019194843A1 (en) * 2018-04-05 2019-10-10 Google Llc System and method for generating diagnostic health information using deep learning and sound understanding
CN113241065A (en) * 2021-05-11 2021-08-10 北京工商大学 Dysarthria voice recognition method and system based on visual facial contour motion
CN113658584A (en) * 2021-08-19 2021-11-16 北京智精灵科技有限公司 Intelligent pronunciation correction method and system
KR20230119609A (en) * 2022-02-07 2023-08-16 가톨릭대학교 산학협력단 Apparatus and method for examining articulatory phonological disorders using artificial intelligence

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117976141A (en) * 2024-04-01 2024-05-03 四川大学华西医院 Voice rehabilitation analysis method and system based on acoustic analysis algorithm

Also Published As

Publication number Publication date
CN117198340B (en) 2024-04-30

Similar Documents

Publication Publication Date Title
Kent et al. Static measurements of vowel formant frequencies and bandwidths: A review
US20200294509A1 (en) Method and apparatus for establishing voiceprint model, computer device, and storage medium
Khan et al. Classification of speech intelligibility in Parkinson's disease
Rudzicz et al. The TORGO database of acoustic and articulatory speech from speakers with dysarthria
CN107622797B (en) Body condition determining system and method based on sound
Panek et al. Acoustic analysis assessment in speech pathology detection
Dejonckere Perceptual and laboratory assessment of dysphonia
CN106073706B (en) A kind of customized information and audio data analysis method and system towards Mini-mental Status Examination
CN108922563B (en) Based on the visual verbal learning antidote of deviation organ morphology behavior
JP2017532082A (en) A system for speech-based assessment of patient mental status
CN117198340B (en) Dysarthria correction effect analysis method based on optimized acoustic parameters
EP2862169A2 (en) Cepstral separation difference
Freitas et al. An introduction to silent speech interfaces
Almaghrabi et al. Bio-acoustic features of depression: A review
Usman et al. Heart rate detection and classification from speech spectral features using machine learning
Wand Advancing electromyographic continuous speech recognition: Signal preprocessing and modeling
Dutta et al. A Fine-Tuned CatBoost-Based Speech Disorder Detection Model
Sahoo et al. Analyzing the vocal tract characteristics for out-of-breath speech
McGlashan Evaluation of the Voice
Castellanos et al. Acoustic speech analysis for hypernasality detection in children
Loakes et al. From IPA to Praat and beyond
Jeyalakshmi et al. Deaf speech assessment using digital processing techniques
Karakoc et al. Visual and auditory analysis methods for speaker recognition in digital forensic
Koniaris et al. On mispronunciation analysis of individual foreign speakers using auditory periphery models
Tavakoli et al. Statistics in Phonetics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant