CN117976141A - Voice rehabilitation analysis method and system based on acoustic analysis algorithm - Google Patents
- Publication number: CN117976141A
- Application number: CN202410381430.6A
- Authority
- CN
- China
- Prior art keywords: real-time, voice rehabilitation, model, voice
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention belongs to the technical field of audio data identification, and discloses a voice rehabilitation analysis method and system based on an acoustic analysis algorithm. The method comprises the following steps: constructing a voice rehabilitation training knowledge graph, an acoustic analysis model, a spectrum feature extraction model and a voice rehabilitation scoring model; generating a voice rehabilitation training scheme and acquiring the corresponding basic voice rehabilitation training scheme; performing acoustic analysis on the user's real-time voice rehabilitation training audio data with the acoustic analysis model; extracting spectrum features from the same audio data with the spectrum feature extraction model; scoring voice rehabilitation with the voice rehabilitation scoring model; and correcting the voice rehabilitation training scheme according to the real-time voice rehabilitation scoring result. The invention solves the prior-art problems of poorly adaptable training schemes, rehabilitation analysis that lacks consistency and objectivity, high time cost, and poor accuracy.
Description
Technical Field
The invention belongs to the technical field of audio data identification, and particularly relates to a voice rehabilitation analysis method and system based on an acoustic analysis algorithm.
Background
Sound is the carrier of human spoken communication. The vocal system produces sound driven by vocal cord vibration, which is then transmitted through the channel formed by the throat and the oral cavity. Because it carries information, sound is an indispensable means of person-to-person communication. The sound emitted by the vocal system can be described by a waveform signal, called voice. When the vocal organs are in a normal state, vocal cord vibration follows a clear periodic pattern and the propagation channel formed by the throat and mouth also changes regularly, so the resulting voice varies periodically as well. Patients with damaged vocal cords need voice rehabilitation training to help them recover or improve voice function.
In the prior art, voice rehabilitation training schemes are usually standardized and cannot fully adapt to each patient's unique voice characteristics and rehabilitation needs. Voice rehabilitation analysis often depends on manual evaluation and guidance by professional therapists, so results may vary from person to person and lack consistency and objectivity. The process of evaluating patients and formulating training schemes by professional therapists is very time-consuming and difficult to popularize on a large scale, and the accuracy of manual voice rehabilitation analysis is poor.
Disclosure of Invention
The invention aims to solve the prior-art problems of poorly adaptable training schemes, rehabilitation analysis that lacks consistency and objectivity, high time cost, and poor accuracy, and provides a voice rehabilitation analysis method and system based on an acoustic analysis algorithm.
The technical scheme adopted by the invention is as follows:
a voice rehabilitation analysis method based on an acoustic analysis algorithm comprises the following steps:
constructing a voice rehabilitation training knowledge graph, an acoustic analysis model, a frequency spectrum feature extraction model and a voice rehabilitation scoring model;
According to basic inquiry information of a user, using a voice rehabilitation training knowledge graph to generate a voice rehabilitation training scheme, and acquiring a corresponding basic voice rehabilitation training scheme;
according to the real-time voice rehabilitation training audio data of the user, performing acoustic analysis by using an acoustic analysis model to obtain corresponding real-time acoustic parameters;
according to the real-time voice rehabilitation training audio data of the user, using a frequency spectrum feature extraction model to extract frequency spectrum features and obtaining corresponding real-time frequency spectrum features;
According to the real-time acoustic parameter characteristics of the real-time acoustic parameters and the corresponding real-time frequency spectrum characteristics, carrying out voice rehabilitation scoring by using a voice rehabilitation scoring model, and obtaining a corresponding real-time voice rehabilitation scoring result;
And according to the real-time voice rehabilitation scoring result, using a voice rehabilitation training knowledge graph to correct the voice rehabilitation training scheme, and obtaining a corresponding correction voice rehabilitation training scheme.
Further, constructing a voice rehabilitation training knowledge graph, an acoustic analysis model, a frequency spectrum feature extraction model and a voice rehabilitation scoring model, wherein the method comprises the following steps of:
Acquiring professional knowledge big data in the voice rehabilitation field, and constructing a corresponding voice rehabilitation training knowledge graph by using a natural language processing algorithm according to the professional knowledge big data;
collecting historical voice rehabilitation training audio data of a plurality of users, and preprocessing the historical voice rehabilitation training audio data to obtain a plurality of historical model sample data provided with real scoring labels;
According to a plurality of preset acoustic parameter indexes, constructing a corresponding acoustic analysis model by using an acoustic analysis algorithm, and constructing a corresponding spectral feature extraction model by using a spectral feature extraction algorithm;
according to the historical model sample data, performing acoustic analysis by using an acoustic analysis model to obtain a plurality of corresponding historical acoustic parameters, and extracting historical acoustic parameter characteristics corresponding to the historical acoustic parameters;
According to the historical model sample data, using a spectrum feature extraction model to extract spectrum features to obtain a plurality of corresponding historical spectrum features;
Carrying out feature fusion on the historical acoustic parameter features and the corresponding historical spectrum features of the same historical model sample data to obtain a plurality of corresponding historical fusion features;
According to a plurality of historical fusion characteristics, performing optimization training by using a deep learning algorithm, constructing an initial voice rehabilitation scoring model, and generating a plurality of corresponding prediction scoring labels;
Obtaining a corresponding model prediction accuracy according to the plurality of prediction scoring labels and the corresponding plurality of real scoring labels;
if the model prediction accuracy is greater than the preset model prediction accuracy threshold, outputting an optimal voice rehabilitation scoring model, otherwise, continuing to perform optimization training.
Further, acquiring professional knowledge big data in the voice rehabilitation field, and constructing a corresponding voice rehabilitation training knowledge graph by using a natural language processing algorithm according to the professional knowledge big data, wherein the method comprises the following steps of:
Acquiring professional knowledge big data in the voice rehabilitation field, and performing knowledge preprocessing on the professional knowledge big data to obtain preprocessed professional knowledge;
Constructing a corresponding named entity extraction model and an entity relation extraction model by using a natural language processing algorithm;
Extracting a plurality of knowledge named entities in the preprocessed professional knowledge by using a named entity extraction model, and extracting a plurality of knowledge entity relations among the plurality of knowledge named entities by using an entity relation extraction model;
And constructing a corresponding voice rehabilitation training knowledge graph according to the relationships between the knowledge naming entities and the knowledge entities.
Further, the acoustic parameter indexes comprise volume indexes, tone indexes, formant indexes and time length indexes;
The acoustic analysis model consists of a volume calculation sub-model, a tone calculation sub-model, a formant calculation sub-model and a time length calculation sub-model;
The spectral feature extraction model is constructed based on logfBank algorithm.
Further, the voice rehabilitation scoring model is constructed based on a CNN algorithm and comprises an input layer, a feature fusion module, a feature transformation module, a scoring calculation module and an output layer which are sequentially connected.
Further, according to the basic inquiry information, using a voice rehabilitation training knowledge graph to generate a voice rehabilitation training scheme, and obtaining a corresponding basic voice rehabilitation training scheme, wherein the method comprises the following steps:
Collecting basic inquiry information of a user, and extracting a plurality of inquiry named entities of the basic inquiry information by using a named entity extraction model;
Inputting a plurality of inquiry named entities into a voice rehabilitation training knowledge graph, and searching the named entities to obtain a plurality of corresponding matching knowledge named entities;
And generating a voice rehabilitation training scheme according to the plurality of knowledge entity relations among the plurality of matched knowledge named entities, and acquiring a corresponding basic voice rehabilitation training scheme.
Further, according to the real-time voice rehabilitation training audio data of the user, using an acoustic analysis model to perform acoustic analysis, and obtaining corresponding real-time acoustic parameters, the method comprises the following steps:
Performing first preprocessing on the real-time voice rehabilitation training audio data of the user to obtain real-time voice rehabilitation training audio data after the first preprocessing; the first preprocessing comprises denoising, downsampling and filtering which are sequentially carried out;
Acquiring corresponding real-time volume parameters, real-time tone parameters, real-time formant parameters and real-time duration parameters by using the volume calculation sub-model, tone calculation sub-model, formant calculation sub-model and duration calculation sub-model according to the first preprocessed real-time voice rehabilitation training audio data;
and integrating the real-time volume parameter, the real-time tone parameter, the real-time formant parameter and the real-time duration parameter to obtain the corresponding real-time acoustic parameter.
Further, according to the real-time voice rehabilitation training audio data of the user, using a spectrum feature extraction model to extract spectrum features and obtain corresponding real-time spectrum features, the method comprises the following steps:
Performing second preprocessing on the real-time voice rehabilitation training audio data of the user to obtain a plurality of second preprocessed real-time voice rehabilitation training audio data frames; the second preprocessing comprises denoising, pre-emphasis and framing which are sequentially carried out;
Performing STFT processing on the second preprocessed real-time voice rehabilitation training audio data frames to obtain a corresponding spectrogram;
Acquiring a Mel frequency cepstrum coefficient corresponding to the spectrogram, and acquiring a first-order difference and a second-order difference corresponding to the Mel frequency cepstrum coefficient;
And integrating the Mel frequency cepstrum coefficient, the first-order difference and the second-order difference to obtain corresponding real-time frequency spectrum characteristics.
Further, according to the real-time acoustic parameter characteristics of the real-time acoustic parameters and the corresponding real-time spectrum characteristics, using a voice rehabilitation scoring model to score voice rehabilitation, and obtaining a corresponding real-time voice rehabilitation scoring result, comprising the following steps:
extracting real-time acoustic parameter characteristics of the real-time acoustic parameters;
inputting the real-time acoustic parameter characteristics and the corresponding real-time frequency spectrum characteristics into an input layer of a voice rehabilitation scoring model;
a feature fusion module is used for carrying out feature fusion on the real-time acoustic parameter features and the real-time spectrum features to obtain fusion features;
Performing feature transformation on the fusion features by using a feature transformation module to obtain corresponding output features;
according to the output characteristics, using a score calculation module to calculate a clarity score and an accuracy score, obtaining the corresponding clarity score and accuracy score;
and using the output layer to take the clarity score and the accuracy score as the corresponding real-time voice rehabilitation scoring result.
Further, a voice rehabilitation analysis system based on an acoustic analysis algorithm comprises a cloud computing center and a plurality of user terminals; the cloud computing center is in communication connection with each of the user terminals and comprises a model construction unit, a training scheme generation unit, an acoustic analysis unit, a spectrum feature extraction unit, a voice rehabilitation scoring unit and a training scheme correction unit.
The beneficial effects of the invention are as follows:
The invention discloses a voice rehabilitation analysis method and system based on an acoustic analysis algorithm. By constructing an acoustic analysis model, a spectrum feature extraction model and a voice rehabilitation scoring model, voice rehabilitation analysis is performed on real-time audio of voice rehabilitation training, which improves the consistency and objectivity of the analysis and realizes a standardized analysis system; manual evaluation and guidance by professional therapists are avoided, reducing the labor and time cost of voice rehabilitation analysis; automatic, intelligent analysis based on audio-data features improves the accuracy and practicality of the analysis and enables large-scale popularization; and by analyzing the scoring results of voice rehabilitation training, the training scheme is adaptively corrected in real time for each individual, so that it better matches the user's actual situation and improves both the training effect and the user experience.
Other advantageous effects of the present invention will be further described in the detailed description.
Drawings
Fig. 1 is a flow chart of a voice rehabilitation analysis method based on an acoustic analysis algorithm in the invention.
Fig. 2 is a block diagram of a voice rehabilitation analysis system based on an acoustic analysis algorithm in the present invention.
Detailed Description
The invention is further illustrated by the following description of specific embodiments in conjunction with the accompanying drawings.
Example 1:
As shown in fig. 1, the embodiment provides a voice rehabilitation analysis method based on an acoustic analysis algorithm, which includes the following steps:
S1: constructing a voice rehabilitation training knowledge graph, an acoustic analysis model, a spectrum feature extraction model and a voice rehabilitation scoring model, comprising the following steps:
S1-1: acquiring professional knowledge big data in the voice rehabilitation field and, according to the professional knowledge big data, constructing the corresponding voice rehabilitation training knowledge graph with a natural language processing algorithm, comprising the following steps:
S1-1-1: acquiring professional knowledge big data in the voice rehabilitation field, and performing knowledge preprocessing on the professional knowledge big data to obtain preprocessed professional knowledge;
The knowledge preprocessing comprises data deduplication and data screening performed in sequence on the professional knowledge big data: deduplication deletes repeated data to reduce the data volume, and screening eliminates knowledge data unsuitable for knowledge graph construction, such as erroneous or incomplete data, improving the degree of data standardization;
S1-1-2: constructing a corresponding named entity extraction model and an entity relation extraction model by using a natural language processing algorithm;
The named entity extraction model is constructed based on the BERT-BiLSTM-CRF algorithm: a pre-trained language sub-model (BERT, Bidirectional Encoder Representations from Transformers) vectorizes the preprocessed professional knowledge, a bidirectional long short-term memory network (BiLSTM, Bidirectional Long Short-Term Memory) extracts semantic features from the resulting vectors, and a linear-chain conditional random field module (CRF, Conditional Random Field) labels the knowledge named entities in the preprocessed professional knowledge according to those semantic features;
The entity relation extraction model is constructed based on the BiGRU-Attention algorithm: a bidirectional gated recurrent unit network (BiGRU, Bidirectional Gated Recurrent Unit) encodes the preprocessed professional knowledge vectors, and an attention mechanism assigns an attention weight to each channel of the BiGRU, reducing the influence of incorrectly labeled relation knowledge; the corresponding entity relations are then output by the BiGRU network according to the attention weights, the preprocessed knowledge vectors and the named entities output by the named entity extraction model;
s1-1-3: extracting a plurality of knowledge named entities in the preprocessed professional knowledge by using a named entity extraction model, and extracting a plurality of knowledge entity relations among the plurality of knowledge named entities by using an entity relation extraction model;
S1-1-4: constructing a corresponding voice rehabilitation training knowledge graph according to the relationships between the knowledge naming entities and the knowledge entities;
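As an illustrative sketch only (not part of the patent), the triples produced by the named entity and entity relation extraction models of S1-1-2 and S1-1-3 can be stored as a simple adjacency map; every entity and relation name below is hypothetical:

```python
from collections import defaultdict

# Hypothetical (head entity, relation, tail entity) triples; all names are
# illustrative, not taken from the patent.
triples = [
    ("vocal cord paralysis", "treated_by", "breathing exercise"),
    ("vocal cord paralysis", "treated_by", "pitch glide exercise"),
    ("breathing exercise", "has_duration", "10 min/day"),
]

def build_graph(triples):
    """Store the knowledge graph as an adjacency map: head -> [(relation, tail)]."""
    graph = defaultdict(list)
    for head, rel, tail in triples:
        graph[head].append((rel, tail))
    return graph

graph = build_graph(triples)
```

A production knowledge graph would use a dedicated graph store; the adjacency map only illustrates the entity/relation structure the claim describes.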
s1-2: collecting historical voice rehabilitation training audio data of a plurality of users, and preprocessing the historical voice rehabilitation training audio data to obtain a plurality of historical model sample data provided with real scoring labels;
S1-3: according to a plurality of preset acoustic parameter indexes, constructing a corresponding acoustic analysis model by using an acoustic analysis algorithm, and constructing a corresponding spectral feature extraction model by using a spectral feature extraction algorithm;
The acoustic parameter indexes comprise volume indexes, tone indexes, formant indexes and time length indexes;
The acoustic analysis model consists of a volume calculation sub-model, a tone calculation sub-model, a formant calculation sub-model and a time length calculation sub-model;
The formula of the volume calculation sub-model is:

V = F_{vol}(x) = \sum_{n=1}^{N} |x(n)|

where F_{vol} is the volume calculation function; n is the sampling-point index; N is the total number of sampling points; and x(n) is the amplitude of the audio data x at sampling point n;
the formula of the pitch calculation sub-model is:

F_0 = F_{pitch}(x(n)) + \delta

where F_{pitch} is the pitch calculation function, for which an automatic pitch detection algorithm is generally adopted; x(n) is the amplitude of the audio data x at sampling point n; and \delta is a pitch compensation value;
The formulas of the formant calculation sub-model are:

\hat{x}(n) = \sum_{i=1}^{p} a_i\, x(n-i), \quad \hat{X}(f) = \mathrm{FFT}(\hat{x}(n)), \quad f_{formant} = \mathrm{argmax}_f |\hat{X}(f)|, \quad B = F_{bw}(f_{formant})

where \hat{x}(n) is the predicted data; a_i is the prediction coefficient; x is the audio data; n is the sampling-point index, and i indexes the p sampling points preceding the current sampling point n; p is the prediction order; \hat{X}(f) is the predicted data after the fast Fourier transform; FFT is the fast Fourier transform function; f_{formant} is the formant frequency; argmax is the maximum-value retaining function; B is the formant bandwidth; and F_{bw} is the bandwidth calculation function;
The formula of the duration calculation sub-model is:

T = F_{dur}(x) = \sum_{n=1}^{N} \left( t_a(n) - t_s(n) \right) + \varepsilon

where F_{dur} is the duration calculation function; t_a(n) is the measured audio duration at sampling point n; t_s(n) is the silence-interval duration at sampling point n; \varepsilon is a duration compensation value; n is the sampling-point index; and N is the total number of sampling points;
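The patent does not disclose concrete implementations of the four calculation sub-models; the following is a minimal numpy sketch of one plausible realization. The autocorrelation pitch tracker, the LPC order, the silence threshold and all function names are illustrative assumptions, not the patent's method:

```python
import numpy as np

def volume(x):
    """Volume sub-model: sum of absolute amplitudes over the N sampling points."""
    return float(np.sum(np.abs(x)))

def pitch(x, fs, fmin=50.0, fmax=400.0, delta=0.0):
    """Pitch sub-model: autocorrelation-peak F0 estimate in the speech range,
    plus the pitch compensation value delta."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(r[lo:hi]))
    return fs / lag + delta

def formants(x, fs, order=4):
    """Formant sub-model: LPC predictor coefficients a_i from the Yule-Walker
    normal equations; formant frequencies from the angles of the roots of
    A(z) = 1 - sum_i a_i z^-i."""
    xw = x * np.hamming(len(x))
    r = np.array([xw[: len(xw) - k] @ xw[k:] for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-8 * r[0] * np.eye(order)          # tiny ridge for numerical safety
    a = np.linalg.solve(R, r[1 : order + 1])
    roots = np.roots(np.concatenate(([1.0], -a)))
    freqs = np.angle(roots[roots.imag > 0.01]) * fs / (2 * np.pi)
    return np.sort(freqs)

def duration(x, fs, silence_thresh=0.01, eps=0.0):
    """Duration sub-model: voiced time = total time minus silence intervals,
    plus the duration compensation value eps."""
    return float(np.sum(np.abs(x) > silence_thresh)) / fs + eps

fs = 16000
t = np.arange(0, 0.5, 1 / fs)
tone = np.sin(2 * np.pi * 220 * t)            # 220 Hz test tone
```

On the 220 Hz test tone, the pitch estimate lands within a few Hz of 220, and on a mixture of 700 Hz and 1200 Hz sinusoids the LPC root angles recover both frequencies.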
The spectral feature extraction model is constructed based on the logfBank algorithm. The logfBank algorithm is similar to the Mel-frequency cepstral coefficient (MFCC, Mel-Frequency Cepstrum) algorithm in that both build on fBank feature extraction, but logfBank requires less computation than MFCC and its features retain more correlation; a deep learning model can exploit that correlation between features, improving recognition accuracy while reducing computational cost;
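As an illustrative sketch of filterbank-style spectral features (the exact logfBank variant the patent uses is not specified; the filter count, FFT size and Mel-scale constants below are conventional assumptions):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def logfbank(frames, fs, n_fft=512, n_filters=26):
    """log-Mel filterbank (fBank) features: power spectrum -> triangular Mel
    filters -> log. Unlike MFCC, the final DCT is skipped, which preserves
    correlation between adjacent filter channels."""
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):                  # build triangular filters
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for k in range(l, c):
            fbank[i, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[i, k] = (r - k) / max(r - c, 1)
    return np.log(np.maximum(power @ fbank.T, 1e-10))

# Demo on random stand-in frames (5 frames of 400 samples at 16 kHz):
feats = logfbank(np.random.default_rng(0).normal(size=(5, 400)), fs=16000)
```

Each input frame yields one 26-dimensional log-filterbank vector; skipping the DCT is precisely what distinguishes logfBank from MFCC in the paragraph above.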
s1-4: according to the historical model sample data, performing acoustic analysis by using an acoustic analysis model to obtain a plurality of corresponding historical acoustic parameters, and extracting historical acoustic parameter characteristics corresponding to the historical acoustic parameters;
S1-5: according to the historical model sample data, using a spectrum feature extraction model to extract spectrum features to obtain a plurality of corresponding historical spectrum features;
s1-6: carrying out feature fusion on the historical acoustic parameter features and the corresponding historical spectrum features of the same historical model sample data to obtain a plurality of corresponding historical fusion features;
S1-7: according to a plurality of historical fusion characteristics, performing optimization training by using a deep learning algorithm, constructing an initial voice rehabilitation scoring model, and generating a plurality of corresponding prediction scoring labels;
The voice rehabilitation scoring model is constructed based on a convolutional neural network (Convolutional Neural Networks, CNN) algorithm and comprises an input layer, a feature fusion module, a feature transformation module, a scoring calculation module and an output layer which are sequentially connected; the feature transformation module performs a series of pooling, convolution and feature processing on the input features;
S1-8: obtaining a corresponding model prediction accuracy according to the plurality of prediction scoring labels and the corresponding plurality of real scoring labels;
s1-9: if the model prediction accuracy is greater than a preset model prediction accuracy threshold, outputting an optimal voice rehabilitation scoring model, otherwise, continuing to perform optimization training;
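The optimization-training loop of S1-7 to S1-9 can be illustrated with a deliberately tiny stand-in model, a perceptron trained on synthetic "fusion features". The real model is a CNN, so everything below is only a schematic of the accuracy-threshold loop, with hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for historical fusion features with real scoring labels (0/1),
# kept linearly separable with a margin so the loop below terminates.
X = rng.normal(size=(300, 4))
margin = X[:, 0] + X[:, 1]
keep = np.abs(margin) > 0.3
X, y = X[keep], (margin[keep] > 0).astype(float)

w, b = np.zeros(4), 0.0
threshold, max_epochs = 0.95, 200     # preset model prediction accuracy threshold

for epoch in range(max_epochs):
    acc = (((X @ w + b) > 0) == y).mean()   # model prediction accuracy
    if acc > threshold:                     # accuracy sufficient: output model
        break
    for xi, yi in zip(X, y):                # otherwise continue optimization training
        p = float((xi @ w + b) > 0)
        w, b = w + (yi - p) * xi, b + (yi - p)
```

The structure (predict, measure accuracy against the real scoring labels, stop once the preset threshold is exceeded) mirrors S1-8/S1-9; only the model inside the loop is a placeholder.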
S2: according to basic inquiry information of a user, using a voice rehabilitation training knowledge graph to generate a voice rehabilitation training scheme, and acquiring a corresponding basic voice rehabilitation training scheme, wherein the voice rehabilitation training scheme comprises the following steps of:
S2-1: collecting basic inquiry information of a user, and extracting a plurality of inquiry named entities of the basic inquiry information by using a named entity extraction model;
S2-2: inputting a plurality of inquiry named entities into a voice rehabilitation training knowledge graph, and searching the named entities to obtain a plurality of corresponding matching knowledge named entities;
s2-3: according to a plurality of knowledge entity relations among a plurality of matched knowledge named entities, voice rehabilitation training scheme generation is carried out, and a corresponding basic voice rehabilitation training scheme is obtained;
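Steps S2-1 to S2-3 can be sketched as a lookup over a mini knowledge graph. Every entity and relation name below is hypothetical, chosen only to show the match-then-collect flow:

```python
# Hypothetical mini knowledge graph: head entity -> list of (relation, tail entity).
kg = {
    "hoarseness": [("suggests_exercise", "humming warm-up"),
                   ("suggests_exercise", "soft glide /u/")],
    "vocal fatigue": [("suggests_exercise", "breathing exercise")],
    "humming warm-up": [("has_intensity", "low"), ("has_duration", "5 min")],
}

def generate_scheme(inquiry_entities, kg):
    """Match inquiry named entities against the graph (named-entity search) and
    collect the exercises, with their attributes, reachable from the matches."""
    scheme = []
    for ent in inquiry_entities:
        for rel, tail in kg.get(ent, []):
            if rel == "suggests_exercise":
                attrs = dict(kg.get(tail, []))      # attributes of the exercise
                scheme.append({"exercise": tail, **attrs})
    return scheme

# "tinnitus" has no matching knowledge named entity and is simply skipped.
plan = generate_scheme(["hoarseness", "tinnitus"], kg)
```

The resulting list of exercise dictionaries plays the role of the basic voice rehabilitation training scheme.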
S3: according to the real-time voice rehabilitation training audio data of the user, using an acoustic analysis model to carry out acoustic analysis and obtaining corresponding real-time acoustic parameters, comprising the following steps:
s3-1: performing first preprocessing on the real-time voice rehabilitation training audio data of the user to obtain real-time voice rehabilitation training audio data after the first preprocessing; the first preprocessing comprises denoising, downsampling and filtering which are sequentially carried out;
S3-2: acquiring corresponding real-time volume parameters, real-time tone parameters, real-time formant parameters and real-time duration parameters by using the volume calculation sub-model, tone calculation sub-model, formant calculation sub-model and duration calculation sub-model according to the first preprocessed real-time voice rehabilitation training audio data;
s3-3: integrating the real-time volume parameter, the real-time tone parameter, the real-time formant parameter and the real-time duration parameter to obtain a corresponding real-time acoustic parameter;
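The first preprocessing chain of S3-1 can be sketched with numpy as below. The 3-point moving average stands in for denoising, and the windowed-sinc low-pass is applied before decimation to avoid aliasing, a slight reordering of the claim's denoise/downsample/filter sequence; real systems would use proper noise suppression:

```python
import numpy as np

def first_preprocess(x, fs, target_fs=16000, numtaps=101):
    """Sketch of the first preprocessing: crude denoising (moving average),
    anti-alias FIR low-pass (windowed sinc), then integer-factor downsampling."""
    x = np.convolve(x, np.ones(3) / 3, mode="same")   # denoise (placeholder)
    k = fs // target_fs                               # decimation factor
    cutoff = 0.5 / k                                  # new Nyquist, normalized to fs
    n = np.arange(numtaps) - (numtaps - 1) / 2
    h = 2 * cutoff * np.sinc(2 * cutoff * n) * np.hamming(numtaps)
    x = np.convolve(x, h, mode="same")                # anti-alias low-pass filter
    return x[::k], target_fs                          # downsample

raw = np.random.default_rng(0).normal(size=48000)     # 1 s stand-in at 48 kHz
audio, fs = first_preprocess(raw, 48000)
```

One second of 48 kHz input comes out as one second at 16 kHz, ready for the calculation sub-models of S3-2.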
S4: according to the real-time voice rehabilitation training audio data of the user, a spectrum feature extraction model is used for extracting spectrum features, and corresponding real-time spectrum features are obtained, and the method comprises the following steps:
s4-1: performing second preprocessing on the real-time voice rehabilitation training audio data of the user to obtain a plurality of second preprocessed real-time voice rehabilitation training audio data frames; the second preprocessing comprises denoising, pre-emphasis and framing which are sequentially carried out;
the pre-emphasis formula is:

y(t) = x(t) - \alpha\, x(t-1)

where y(t) is the pre-emphasized real-time voice rehabilitation training audio data; x(t) is the real-time voice rehabilitation training audio data; t is the time index; and \alpha is the pre-emphasis coefficient, set according to the pre-emphasis frequency, which is typically 80 Hz;
S4-2: performing Short-time Fourier transform (STFT) processing on the plurality of second preprocessed real-time voice rehabilitation training audio data frames to obtain a corresponding spectrogram;
the formula of the spectrogram is:

S(\tau, f) = \sum_{t} y(t)\, w(t - \tau)\, e^{-j 2 \pi f t}

where S(\tau, f) is the spectrogram; y(t) is the pre-emphasized real-time voice rehabilitation training audio data at time t; w is the window function; t and \tau are time indices; and f is the frequency index;
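The pre-emphasis and STFT steps above can be sketched in a few lines of numpy. The coefficient α = 0.97 and the frame/hop/FFT sizes are conventional assumptions (the patent parameterizes pre-emphasis by a frequency, typically 80 Hz, instead of a coefficient):

```python
import numpy as np

def preemphasis(x, alpha=0.97):
    """y(t) = x(t) - alpha * x(t-1); alpha = 0.97 is a conventional choice."""
    return np.append(x[0], x[1:] - alpha * x[:-1])

def stft(y, frame_len=400, hop=160, n_fft=512):
    """Frame the signal, apply a Hann window, and FFT each frame: a direct
    realization of the short-time Fourier transform."""
    w = np.hanning(frame_len)
    n_frames = 1 + (len(y) - frame_len) // hop
    frames = np.stack([y[i * hop : i * hop + frame_len] * w
                       for i in range(n_frames)])
    return np.fft.rfft(frames, n_fft)     # shape: (n_frames, n_fft // 2 + 1)

fs = 16000
t = np.arange(0, 0.2, 1 / fs)
S = stft(preemphasis(np.sin(2 * np.pi * 1000 * t)))   # 1 kHz test tone
```

For the 1 kHz test tone at 16 kHz with a 512-point FFT, the spectral peak of every frame falls in bin 32 (= 1000 Hz / 31.25 Hz per bin), confirming the spectrogram construction.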
s4-3: acquiring a Mel frequency cepstrum coefficient corresponding to the spectrogram, and acquiring a first-order difference and a second-order difference corresponding to the Mel frequency cepstrum coefficient;
The mel frequency cepstrum coefficients are obtained as:

c(\tau) = \mathrm{DCT}\left( \log \left| S(\tau, f) \right| \right)

where c(\tau) is the Mel frequency cepstrum coefficient vector corresponding to the spectrogram S; DCT is the discrete cosine transform function; log is the logarithmic function; and |S(\tau, f)| is the modulus of the spectrogram (its phase is discarded);
The formulas of the first-order and second-order differences are:
ΔC(n) = C(n + d) − C(n − d)
Δ²C(n) = ΔC(n + d) − ΔC(n − d)
where C(n) is the Mel frequency cepstrum coefficient of frame n; ΔC(n) is the first-order difference; Δ²C(n) is the second-order difference; and d is the difference step;
S4-4: integrating the Mel frequency cepstrum coefficients, the first-order differences and the second-order differences to obtain the corresponding real-time frequency spectrum characteristics F_s = (C, ΔC, Δ²C);
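A numpy-only sketch of steps S4-3 and S4-4. Two simplifications are assumed here: a DCT over the log-magnitude spectrogram stands in for the full Mel filterbank stage, and the difference step d = 1 is an assumed default:

```python
import numpy as np

def dct_ortho(x):
    """Orthonormal DCT-II along the last axis (numpy stand-in for
    scipy.fftpack.dct(type=2, norm='ortho'))."""
    n_bins = x.shape[-1]
    n = np.arange(n_bins)
    basis = np.cos(np.pi * (n[None, :] + 0.5) * n[:, None] / n_bins)
    c = 2.0 * x @ basis.T
    scale = np.full(n_bins, np.sqrt(1.0 / (2 * n_bins)))
    scale[0] = np.sqrt(1.0 / (4 * n_bins))
    return c * scale

def cepstral_features(mag, n_coef=13, d=1):
    """Cepstral coefficients plus first- and second-order differences,
    stacked into one feature matrix of shape (frames, 3 * n_coef)."""
    c = dct_ortho(np.log(mag + 1e-10))[:, :n_coef]
    pad = np.pad(c, ((d, d), (0, 0)), mode='edge')
    delta = pad[2 * d:] - pad[:-2 * d]          # C(n+d) - C(n-d)
    pad2 = np.pad(delta, ((d, d), (0, 0)), mode='edge')
    delta2 = pad2[2 * d:] - pad2[:-2 * d]       # second-order difference
    return np.hstack([c, delta, delta2])

feat = cepstral_features(np.ones((10, 201)))    # constant input -> zero deltas
```

Edge padding keeps the difference arrays the same length as the coefficient track, so all three blocks stack cleanly.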
S5: according to the real-time acoustic parameter characteristics of the real-time acoustic parameters and the corresponding real-time frequency spectrum characteristics, using a voice rehabilitation scoring model to score voice rehabilitation and obtain a corresponding real-time voice rehabilitation scoring result, comprising the following steps:
S5-1: extracting the real-time acoustic parameter characteristics of the real-time acoustic parameters, F_a = f_ext(P), where P denotes the real-time acoustic parameters and f_ext(·) is the feature extraction function;
S5-2: inputting the real-time acoustic parameter characteristics and the corresponding real-time frequency spectrum characteristics into an input layer of a voice rehabilitation scoring model;
S5-3: using a feature fusion module to perform feature fusion on the real-time acoustic parameter features and the real-time spectrum features to obtain the fusion features F = f_fuse(F_a, F_s), where f_fuse(·) is the feature fusion function;
S5-4: performing feature transformation on the fusion features by using a feature transformation module to obtain the corresponding output features F_out = f_CNN(F), where f_CNN(·) is the feature transformation function of the CNN network;
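Steps S5-3 and S5-4 can be sketched as follows. Concatenation as the fusion operator and a single dense ReLU layer standing in for the CNN feature transformation are both illustrative assumptions; the patent fixes neither choice:

```python
import numpy as np

def fuse(acoustic_feat, spectrum_feat):
    """Simplest feature fusion: concatenate the two feature vectors."""
    return np.concatenate([acoustic_feat, spectrum_feat])

def transform(fused, weights, bias):
    """One dense layer with ReLU activation as a stand-in for the CNN
    feature transformation module (sketch only)."""
    return np.maximum(0.0, weights @ fused + bias)

rng = np.random.default_rng(0)
fused = fuse(np.ones(4), np.ones(39))       # 4 acoustic + 39 spectral dims (assumed)
out = transform(fused, rng.standard_normal((16, 43)), np.zeros(16))
```

A learned fusion (e.g. attention-weighted) or a real convolutional stack would slot into the same interfaces.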
S5-5: according to the output features, using a score calculation module to calculate a definition score and an accuracy score, obtaining the corresponding definition score and accuracy score;
The score calculation formula is:
(p_d, p_a) = softmax(F_out)
where F_out is the output features; p_d and p_a are the predicted values of definition and accuracy corresponding to the prediction scoring labels; the definition score S_d = p_d; the accuracy score S_a = p_a; and softmax(·) is the classification function that converts the activations of the CNN network output into a probability distribution;
S5-6: using the output layer to take the definition score and the accuracy score as the corresponding real-time voice rehabilitation scoring result;
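A sketch of the score calculation in steps S5-5 and S5-6. Using two independent two-class softmax heads (one for definition, one for accuracy) and reading each score off as the probability of the positive class are assumptions for illustration; the head weights below are placeholders, not trained parameters:

```python
import numpy as np

def softmax(z):
    """Convert raw activations into a probability distribution."""
    e = np.exp(z - z.max())
    return e / e.sum()

def score(out_feat, w_definition, w_accuracy):
    """Map the output features to a definition (clarity) score and an
    accuracy score via two 2-class softmax heads."""
    s_d = softmax(w_definition @ out_feat)[1]   # P(clear)
    s_a = softmax(w_accuracy @ out_feat)[1]     # P(accurate)
    return s_d, s_a

# With zero weights both heads are maximally uncertain: each score is 0.5.
s_d, s_a = score(np.ones(16), np.zeros((2, 16)), np.zeros((2, 16)))
```

Both scores land in [0, 1], which matches a scoring result that can be compared against a threshold or reported directly to the user.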
S6: according to the real-time voice rehabilitation scoring result, using the voice rehabilitation training knowledge graph to correct the voice rehabilitation training scheme, and obtaining the corresponding corrected voice rehabilitation training scheme.
Example 2:
As shown in fig. 2, the present embodiment provides a voice rehabilitation analysis system based on an acoustic analysis algorithm for implementing the voice rehabilitation analysis method. The system comprises a cloud computing center and a plurality of user terminals, the cloud computing center being in communication connection with each of the user terminals; the cloud computing center comprises a model building unit, a training scheme generating unit, an acoustic analysis unit, a frequency spectrum feature extracting unit, a voice rehabilitation scoring unit and a training scheme correcting unit;
The user terminal is used for collecting basic inquiry information and real-time voice rehabilitation training audio data of a user and sending the basic inquiry information and the real-time voice rehabilitation training audio data to the cloud computing center, and for receiving the basic voice rehabilitation training scheme and the corrected voice rehabilitation training scheme sent by the cloud computing center;
the model building unit is used for building a voice rehabilitation training knowledge graph, an acoustic analysis model, a frequency spectrum feature extraction model and a voice rehabilitation scoring model;
the training scheme generating unit is used for generating a voice rehabilitation training scheme by using a voice rehabilitation training knowledge graph according to basic inquiry information of a user, acquiring a corresponding basic voice rehabilitation training scheme and transmitting the basic voice rehabilitation training scheme to a corresponding user terminal;
the acoustic analysis unit is used for carrying out acoustic analysis by using an acoustic analysis model according to the real-time voice rehabilitation training audio data of the user to obtain corresponding real-time acoustic parameters;
The spectrum feature extraction unit is used for extracting spectrum features by using a spectrum feature extraction model according to the real-time voice rehabilitation training audio data of the user to obtain corresponding real-time spectrum features;
The voice rehabilitation scoring unit is used for scoring voice rehabilitation according to the real-time acoustic parameter characteristics of the real-time acoustic parameters and the corresponding real-time frequency spectrum characteristics by using a voice rehabilitation scoring model, and obtaining a corresponding real-time voice rehabilitation scoring result;
and the training scheme correction unit is used for correcting the voice rehabilitation training scheme by using the voice rehabilitation training knowledge graph according to the real-time voice rehabilitation scoring result to obtain a corresponding corrected voice rehabilitation training scheme.
The invention discloses a voice rehabilitation analysis method and system based on an acoustic analysis algorithm. By constructing an acoustic analysis model, a frequency spectrum feature extraction model and a voice rehabilitation scoring model, voice rehabilitation analysis is carried out on the real-time audio of voice rehabilitation training, which improves the consistency and objectivity of voice rehabilitation analysis and realizes a standardized system for it. Manual evaluation and guidance by a professional therapist are avoided, reducing the labor and time costs of voice rehabilitation analysis. Automated and intelligent analysis based on audio data features improves the accuracy and practicality of voice rehabilitation analysis and allows large-scale popularization. By analyzing the voice rehabilitation scoring results of the voice rehabilitation training, the voice rehabilitation training scheme is adaptively corrected in real time and in a personalized manner, so that it better matches the actual situation of the user, improving both the effect of the voice rehabilitation training and the user experience.
The invention is not limited to the alternative embodiments described above; other products in various forms may be derived in light of the present invention. The above detailed description should not be construed as limiting the scope of the invention, which is defined by the claims; the description may be used to interpret the claims.
Claims (10)
1. A voice rehabilitation analysis method based on an acoustic analysis algorithm is characterized in that: the method comprises the following steps:
constructing a voice rehabilitation training knowledge graph, an acoustic analysis model, a frequency spectrum feature extraction model and a voice rehabilitation scoring model;
According to basic inquiry information of a user, using a voice rehabilitation training knowledge graph to generate a voice rehabilitation training scheme, and acquiring a corresponding basic voice rehabilitation training scheme;
according to the real-time voice rehabilitation training audio data of the user, performing acoustic analysis by using an acoustic analysis model to obtain corresponding real-time acoustic parameters;
according to the real-time voice rehabilitation training audio data of the user, using a frequency spectrum feature extraction model to extract frequency spectrum features and obtaining corresponding real-time frequency spectrum features;
According to the real-time acoustic parameter characteristics of the real-time acoustic parameters and the corresponding real-time frequency spectrum characteristics, carrying out voice rehabilitation scoring by using a voice rehabilitation scoring model, and obtaining a corresponding real-time voice rehabilitation scoring result;
And according to the real-time voice rehabilitation scoring result, using the voice rehabilitation training knowledge graph to correct the voice rehabilitation training scheme, and obtaining the corresponding corrected voice rehabilitation training scheme.
2. The voice rehabilitation analysis method based on the acoustic analysis algorithm according to claim 1, wherein: the method comprises the following steps of constructing a voice rehabilitation training knowledge graph, an acoustic analysis model, a frequency spectrum feature extraction model and a voice rehabilitation scoring model:
Acquiring professional knowledge big data in the voice rehabilitation field, and constructing a corresponding voice rehabilitation training knowledge graph by using a natural language processing algorithm according to the professional knowledge big data;
collecting historical voice rehabilitation training audio data of a plurality of users, and preprocessing the historical voice rehabilitation training audio data to obtain a plurality of historical model sample data provided with real scoring labels;
According to a plurality of preset acoustic parameter indexes, constructing a corresponding acoustic analysis model by using an acoustic analysis algorithm, and constructing a corresponding spectral feature extraction model by using a spectral feature extraction algorithm;
according to the historical model sample data, performing acoustic analysis by using an acoustic analysis model to obtain a plurality of corresponding historical acoustic parameters, and extracting historical acoustic parameter characteristics corresponding to the historical acoustic parameters;
According to the historical model sample data, using a spectrum feature extraction model to extract spectrum features to obtain a plurality of corresponding historical spectrum features;
Carrying out feature fusion on the historical acoustic parameter features and the corresponding historical spectrum features of the same historical model sample data to obtain a plurality of corresponding historical fusion features;
According to a plurality of historical fusion characteristics, performing optimization training by using a deep learning algorithm, constructing an initial voice rehabilitation scoring model, and generating a plurality of corresponding prediction scoring labels;
Obtaining a corresponding model prediction accuracy according to the plurality of prediction scoring labels and the corresponding plurality of real scoring labels;
if the model prediction accuracy is greater than the preset model prediction accuracy threshold, outputting an optimal voice rehabilitation scoring model, otherwise, continuing to perform optimization training.
3. The voice rehabilitation analysis method based on the acoustic analysis algorithm according to claim 2, wherein: the method comprises the steps of collecting professional knowledge big data in the voice rehabilitation field, and constructing a corresponding voice rehabilitation training knowledge graph by using a natural language processing algorithm according to the professional knowledge big data, wherein the method comprises the following steps of:
Acquiring professional knowledge big data in the voice rehabilitation field, and carrying out knowledge preprocessing on the professional knowledge big data to obtain preprocessed professional knowledge;
Constructing a corresponding named entity extraction model and an entity relation extraction model by using a natural language processing algorithm;
Extracting a plurality of knowledge named entities in the preprocessed professional knowledge by using a named entity extraction model, and extracting a plurality of knowledge entity relations among the plurality of knowledge named entities by using an entity relation extraction model;
And constructing a corresponding voice rehabilitation training knowledge graph according to the relationships between the knowledge naming entities and the knowledge entities.
4. The voice rehabilitation analysis method based on the acoustic analysis algorithm according to claim 2, wherein: the acoustic parameter indexes comprise volume indexes, tone indexes, formant indexes and time length indexes;
The acoustic analysis model consists of a volume calculation sub-model, a tone calculation sub-model, a formant calculation sub-model and a time length calculation sub-model;
The spectrum characteristic extraction model is constructed based on logfBank algorithm.
5. The voice rehabilitation analysis method based on the acoustic analysis algorithm according to claim 4, wherein: the voice rehabilitation scoring model is constructed based on a CNN algorithm and comprises an input layer, a feature fusion module, a feature transformation module, a scoring calculation module and an output layer which are sequentially connected.
6. A method for voice rehabilitation analysis based on an acoustic analysis algorithm according to claim 3, wherein: according to basic inquiry information, using a voice rehabilitation training knowledge graph to generate a voice rehabilitation training scheme, and acquiring a corresponding basic voice rehabilitation training scheme, wherein the voice rehabilitation training scheme comprises the following steps of:
Collecting basic inquiry information of a user, and extracting a plurality of inquiry named entities of the basic inquiry information by using a named entity extraction model;
Inputting a plurality of inquiry named entities into a voice rehabilitation training knowledge graph, and searching the named entities to obtain a plurality of corresponding matching knowledge named entities;
And generating a voice rehabilitation training scheme according to the plurality of knowledge entity relations among the plurality of matched knowledge named entities, and acquiring a corresponding basic voice rehabilitation training scheme.
7. The voice rehabilitation analysis method based on the acoustic analysis algorithm according to claim 4, wherein: according to the real-time voice rehabilitation training audio data of the user, using an acoustic analysis model to carry out acoustic analysis and obtaining corresponding real-time acoustic parameters, comprising the following steps:
Performing first preprocessing on the real-time voice rehabilitation training audio data of the user to obtain real-time voice rehabilitation training audio data after the first preprocessing; the first preprocessing comprises denoising, downsampling and filtering which are sequentially carried out;
Acquiring corresponding real-time volume parameters, real-time tone parameters, real-time formant parameters and real-time duration parameters by using the volume calculation sub-model, the tone calculation sub-model, the formant calculation sub-model and the time length calculation sub-model according to the first preprocessed real-time voice rehabilitation training audio data;
and integrating the real-time volume parameter, the real-time tone parameter, the real-time formant parameter and the real-time duration parameter to obtain the corresponding real-time acoustic parameter.
8. The voice rehabilitation analysis method based on the acoustic analysis algorithm according to claim 4, wherein: according to the real-time voice rehabilitation training audio data of the user, a spectrum feature extraction model is used for extracting spectrum features, and corresponding real-time spectrum features are obtained, and the method comprises the following steps:
Performing second preprocessing on the real-time voice rehabilitation training audio data of the user to obtain a plurality of second preprocessed real-time voice rehabilitation training audio data frames; the second preprocessing comprises denoising, pre-emphasis and framing which are sequentially carried out;
Performing STFT processing on the second preprocessed real-time voice rehabilitation training audio data frames to obtain a corresponding spectrogram;
Acquiring a Mel frequency cepstrum coefficient corresponding to the spectrogram, and acquiring a first-order difference and a second-order difference corresponding to the Mel frequency cepstrum coefficient;
And integrating the Mel frequency cepstrum coefficient, the first-order difference and the second-order difference to obtain corresponding real-time frequency spectrum characteristics.
9. The voice rehabilitation analysis method based on the acoustic analysis algorithm according to claim 5, wherein the voice rehabilitation analysis method comprises the following steps: according to the real-time acoustic parameter characteristics of the real-time acoustic parameters and the corresponding real-time frequency spectrum characteristics, using a voice rehabilitation scoring model to score voice rehabilitation and obtain a corresponding real-time voice rehabilitation scoring result, comprising the following steps:
extracting real-time acoustic parameter characteristics of the real-time acoustic parameters;
inputting the real-time acoustic parameter characteristics and the corresponding real-time frequency spectrum characteristics into an input layer of a voice rehabilitation scoring model;
a feature fusion module is used for carrying out feature fusion on the real-time acoustic parameter features and the real-time spectrum features to obtain fusion features;
Performing feature transformation on the fusion features by using a feature transformation module to obtain corresponding output features;
according to the output characteristics, a score calculating module is used for calculating a definition score and an accuracy score to obtain a corresponding definition score and accuracy score;
and using an output layer to take the definition score and the accuracy score as corresponding real-time voice rehabilitation score results.
10. A voice rehabilitation analysis system based on an acoustic analysis algorithm, for implementing the voice rehabilitation analysis method according to any one of claims 1 to 9, characterized in that: the system comprises a cloud computing center and a plurality of user terminals, wherein the cloud computing center is respectively in communication connection with the plurality of user terminals, and comprises a model building unit, a training scheme generating unit, an acoustic analysis unit, a frequency spectrum characteristic extracting unit, a voice rehabilitation scoring unit and a training scheme correcting unit.
Priority Applications (1)
- CN202410381430.6A (priority date 2024-04-01, filing date 2024-04-01): Voice rehabilitation analysis method and system based on acoustic analysis algorithm
Publications (1)
- CN117976141A, published 2024-05-03
Family ID: 90859869
Citations (8)
- CN103405217A (priority 2013-07-08, published 2013-11-27): System and method for multi-dimensional measurement of dysarthria based on real-time articulation modeling technology
- TWI622980B (priority 2017-09-05, published 2018-05-01): Disease detecting and classifying system of voice
- CN109727608A (priority 2017-10-25, published 2019-05-07): A kind of ill voice appraisal procedure based on Chinese speech
- CN114373452A (priority 2020-10-15, published 2022-04-19): Voice abnormity identification and evaluation method and system based on deep learning
- CN116312469A (priority 2023-05-17, published 2023-06-23): Pathological voice restoration method based on voice conversion
- CN116831533A (priority 2023-08-03, published 2023-10-03): Intelligent voice and sound quality disorder rehabilitation system based on ICF-RFT framework
- CN117198340A (priority 2023-09-20, published 2023-12-08): Dysarthria correction effect analysis method based on optimized acoustic parameters
- CN117409819A (priority 2023-12-15, published 2024-01-16): Human voice detection and analysis method based on artificial intelligence
Non-Patent Citations (2)
- 刘智勇 (ed.): 《卫生信息学教程》, vol. 01, 华中科技大学出版, 31 December 2021, page 39
- 王昌辉, 谢湘, 赵胜辉: 基于语音识别的汉语发音教学系统, 计算机应用研究, no. 11, 28 November 2005, pages 11-13
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination